Selecting subsets of a data.frame is easy in R if you define the predicates manually.
But if you need to define many conditions, the standard slicing and subsetting methods
become cumbersome.
For this illustration I want to pick some large number of numerical ranges and label
all of the rows that match any of the predicates.
The key is using outer to test every element against every range at once, and then checking, row by row, whether any of the ranges matched.
peaks <- pi*c(0,2,4,6,8,10)
low <- peaks - pi/4
high <- peaks + pi/4
ranges <- data.frame(low=low, high=high)
x <- seq(0, 10*pi, 0.01)
y <- cos(x)
df <- data.frame(x=x, y=y)

# given a vector x
# which elements are contained in one of the ranges
# defined by the high and low columns of the ranges data.frame
library(plyr)
inranges <- function(x, ranges) {
  a <- outer(x, ranges$low, ">")
  b <- outer(x, ranges$high, "<")
  c <- a & b
  aaply(c, 1, function(y) any(y))
}

# I can now add a new column that indicates which rows matched
df$peaks <- inranges(df$x, ranges)
library(ggplot2)
p <- ggplot(df, aes(x=x, y=y))
p <- p + geom_point(aes(color=peaks))
p

# or I can subset the data to only the matching rows:
df.peaks <- subset(df, inranges(x, ranges))
p <- ggplot(df.peaks, aes(x=x, y=y))
p <- p + geom_point()
p
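To make the outer step concrete, here is a minimal sketch on a toy input. The vector x_small and the two ranges are made-up values chosen only for illustration; they are not part of the example above.

```r
# Toy data (hypothetical values, chosen only to illustrate the technique)
x_small <- c(0.5, 1.5, 2.5, 3.5)
low_small  <- c(0, 3)
high_small <- c(1, 4)

# outer() builds a length(x_small) by length(low_small) logical matrix:
# entry [i, j] is TRUE when x_small[i] lies strictly inside range j
inside <- outer(x_small, low_small, ">") & outer(x_small, high_small, "<")
print(inside)

# an element matches if any entry in its row is TRUE
matched <- apply(inside, 1, any)
print(matched)
```

Each row of the matrix corresponds to one element of the input and each column to one range, so collapsing the rows with any gives one logical value per element.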
Thanks – very useful. Instead of using “aaply” from “plyr”, would “any” do?
Great point. I could replace T %in% y with any(y). I still need to compute any for each row, so I do not see a way to get rid of the aaply off the top of my head. Thanks!
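One possible way to drop the plyr dependency, offered as a sketch rather than as the post's own method: because TRUE counts as 1, rowSums over the logical matrix gives the number of ranges each element falls in, and a count above zero should be equivalent to calling any on each row. The function name inranges_base is hypothetical.

```r
# Base-R variant of inranges (hypothetical name, not from the original post)
inranges_base <- function(x, ranges) {
  inside <- outer(x, ranges$low, ">") & outer(x, ranges$high, "<")
  # rowSums counts the TRUE entries per row; > 0 means at least one range matched
  rowSums(inside) > 0
}

# Same setup as in the post
peaks  <- pi * c(0, 2, 4, 6, 8, 10)
ranges <- data.frame(low = peaks - pi / 4, high = peaks + pi / 4)
x <- seq(0, 10 * pi, 0.01)

matched <- inranges_base(x, ranges)

# Cross-check against a row-wise any() computed with base apply()
reference <- apply(outer(x, ranges$low, ">") & outer(x, ranges$high, "<"), 1, any)
stopifnot(identical(matched, reference))
```

Using apply(inside, 1, any) directly would also avoid plyr; rowSums is simply a vectorised alternative that tends to be faster on long inputs.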
It is not working:
Error in get(x, envir = this, inherits = inh)(this, …) :
attempt to apply non-function
Works for me. If you provide the line-by-line output I’ll be glad to help you debug.
platform       x86_64-apple-darwin9.8.0
arch           x86_64
os             darwin9.8.0
system         x86_64, darwin9.8.0
status
major          2
minor          14.1
year           2011
month          12
day            22
svn rev        57956
language       R
version.string R version 2.14.1 (2011-12-22)
Good tip!