In the ideal case all it takes to start using futures in R is to replace any standard assignment (<-
) in your R code with a future assignment (%<=%
) and make sure your right-hand side (RHS) expression is within curly brackets ({ ... }
). Also, if you assign these to lists (e.g. in a for loop), you need to use a list environment (listenv
) instead of a plain list.
However, there are a few cases where you have to take extra precautions. These are often related to how global variables are falsely identified when non-standard evaluation is used, e.g. subset(data, x < 3)
. Global variables need to be identified when futures are created, but this is a particularly hard task when non-standard evaluation is involved.
If you identify other use cases, please consider reporting them so they can be documented here and possibly even fixed.
Consider the following use of subset()
:
> data <- data.frame(x=1:5, y=1:5)
> v <- subset(data, x < 3)$y
> v
[1] 1 2
From a static code inspection point of view, the expression x < 3
asks for variable x
to be compared to 3, and there is nothing specifying that x
is part of data
and not the global environment. That x
is indeed part of the data
object can only safely be inferred at run time when subset()
is called. This is not a problem in the above snippet, but when using futures all global/unknown variables need to be captured when the future is created (it is too late to do it when the future is resolved), e.g.
> library(future)
> data <- data.frame(x=1:5, y=1:5)
> v %<=% subset(data, x < 3)$y
Error in globalsOf(expr, envir = envir, tweak = tweakExpression, dotdotdot = "return", :
Identified a global by static code inspection, but failed to locate the corresponding
object in the relevant environments: 'x'
Above, code inspection of the future expression subset(data, x < 3)$y
incorrectly identifies x
as a global variable that needs to be captured (“frozen”) for the (lazy) future. Since no such variable x
exists, we get an error.
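To see why static inspection stumbles here, the following rough sketch uses only base R. It is not the algorithm used by the globals package; all.vars() is a cruder stand-in, and the names candidates, env and unresolved are made up for this illustration. The expression is never evaluated, so nothing tells the inspection that x is a column of data.

```r
## Base-R illustration only; the globals package uses a more careful algorithm.
data <- data.frame(x = 1:5, y = 1:5)
expr <- quote(subset(data, x < 3))

## all.vars() lists every symbol used as a variable, without evaluating anything
candidates <- all.vars(expr)

## An environment holding only 'data', standing in for the calling environment
env <- new.env(parent = emptyenv())
assign("data", data, envir = env)

## Which candidates cannot be located there?
found <- vapply(candidates, exists, logical(1), envir = env, inherits = FALSE)
unresolved <- candidates[!found]
print(unresolved)
```

The symbol x is reported as unresolved even though subset() would have found it inside data at run time.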
The same error would be reported when using plan(eager, globals=TRUE)
or plan(multicore, globals=TRUE)
, which validates globals before the future is created.
The clearest and most backward-compatible solution to this problem is to explicitly specify the context of x
, i.e.
> data <- data.frame(x=1:5, y=1:5)
> v %<=% subset(data, data$x < 3)$y
> v
[1] 1 2
An alternative is to use a dummy variable. In contrast to the code-inspection algorithm used to identify globals, we know from reading the documentation that subset()
will look for x
in the data
object, not in the parent environments. Armed with this knowledge, we can trick the future package (more precisely the globals package) to pickup a dummy variable x
instead, e.g.
> data <- data.frame(x=1:5, y=1:5)
> x <- NULL ## To please future et al.
> v %<=% subset(data, x < 3)$y
> v
[1] 1 2
Another common use case for non-standard evaluation is when creating ggplot2 figures. For instance, in
> library(ggplot2)
> p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
> p
fields mpg
and wt
of the mtcars
data object are plotted against each other. That mpg
and wt
are actually fields of mtcars
cannot be inferred from code inspection alone; you need to know that this is how ggplot2 works. Analogously to the above subset()
example, this explains why we get the following error:
> library(future)
> library(ggplot2)
> p %<=% { ggplot(mtcars, aes(wt, mpg)) + geom_point() }
Error in globalsOf(expr, envir = envir, tweak = tweakExpression, dotdotdot = "return", :
Identified a global by static code inspection, but failed to locate the corresponding
object in the relevant environments: 'wt'
A few comments are needed here. First of all, because %<=%
has higher precedence than +
, we need to place all of the ggplot2 expression within curly brackets, otherwise we get an error. Second, the reason for only wt
being listed as a missing global variable and not mpg
is that the latter is (incorrectly) resolved as ggplot2::mpg
.
One workaround is to make use of the *_string()
functions of ggplot2, here aes_string(), e.g.
> p %<=% { ggplot(mtcars, aes_string('wt', 'mpg')) + geom_point() }
> p
Another is to explicitly specify mtcars$wt
and mtcars$mpg
, which may become a bit tedious.
A third alternative is to make use of dummy variables wt
and mpg
, i.e.
> p %<=% {
+ wt <- mpg <- NULL
+ ggplot(mtcars, aes(wt, mpg)) + geom_point()
+ }
> p
By the way, since all futures are evaluated in a local environment, the dummy variables are not assigned to the calling environment.
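As a loose analogy in plain R (no futures involved), local() also evaluates its block in a separate environment. The sketch below uses an explicitly created environment, here called caller, as a stand-in for the calling environment; that name and the variables value and leaked are assumptions made for this illustration only.

```r
## Plain R analogy; local() stands in for the environment a future is evaluated in.
caller <- new.env()

value <- local({
  wt <- mpg <- NULL            # dummy variables, created inside the block only
  is.null(wt) && is.null(mpg)
}, envir = new.env(parent = caller))

## Nothing was assigned to the stand-in calling environment
leaked <- ls(caller)
print(value)
print(leaked)
```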
When a global variable is a vector, a matrix, a list, a data frame, an environment, or any other type of object that can be assigned via subsetting, the globals package fails to identify it as a global variable if its first occurrence in the future expression is as part of a subsetting assignment. For example,
> library(future)
> x <- matrix(1:12, nrow=3, ncol=4)
> y %<=% {
+ x[1,1] <- 3
+ 42
+ }
> rm(x)
> y
Error in x[1, 1] <- 3 : object 'x' not found
Another example is
> library(future)
> x <- list(a=1, b=2)
> y %<=% {
+ x$c <- 3
+ 42
+ }
> rm(x)
> y
Error in x$c <- 3 : object 'x' not found
A workaround is to explicitly tell the future package about the global variable by simply listing it at the beginning of the expression, e.g.
> library(future)
> x <- list(a=1, b=2)
> y %<=% {
+ x ## Force 'x' to be global
+ x$c <- 3
+ 42
+ }
> rm(x)
> y
[1] 42
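Why x has to be treated as a global in the examples above can be seen in plain R, without any futures: a subsetting assignment first reads the existing object and then assigns a modified copy. The sketch below uses local() as a stand-in for the separate environment in which a future is evaluated; that substitution is an assumption made for illustration.

```r
## Plain R, no futures involved; local() stands in for the future's environment.
x <- list(a = 1, b = 2)

res <- local({
  x$c <- 3   # reads the outer 'x', then assigns a modified copy locally
  x
})

print(names(res))  # the local copy has the new element
print(names(x))    # the outer 'x' is unchanged
```

Because the outer x is read before anything is assigned, the expression depends on it, which is why it must be available when the future is evaluated.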
When calling a function using do.call()
make sure to specify the function as the object itself and not by name. This will help identify the function as a global object in the future expression. For instance, use
do.call(file_ext, list("foo.txt"))
instead of
do.call("file_ext", list("foo.txt"))
so that file_ext()
is properly located and exported. Although you may not notice a difference when evaluating futures in the same R session, it may become a problem when futures are evaluated in external R sessions, such as on a cluster.
It may also become a problem with lazy futures if the intended function is redefined after the future is created but before it is resolved. For example,
> library("future")
> library("listenv")
> library("tools")
> plan(lazy)
> pathnames <- c("foo.txt", "bar.png", "yoo.md")
> res <- listenv()
> for (ii in seq_along(pathnames)) {
+ res[[ii]] %<=% do.call("file_ext", list(pathnames[ii]))
+ }
> file_ext <- function(...) "haha!"
> unlist(res)
[1] "haha!" "haha!" "haha!"
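The difference between the two do.call() forms can be made visible with a rough base-R sketch. It again uses all.vars() as a crude stand-in for the code inspection done by the globals package, which is an assumption for illustration; the variable names by_object and by_name are made up here. The expressions are only quoted, so file_ext() does not need to exist.

```r
## Base-R illustration only; not the inspection algorithm of the globals package.
by_object <- quote(do.call(file_ext, list("foo.txt")))
by_name   <- quote(do.call("file_ext", list("foo.txt")))

## A function passed as an object shows up as a symbol in the expression ...
print(all.vars(by_object))

## ... whereas a function passed by name is just a character string
print(all.vars(by_name))
```

A function referred to by a string leaves no symbol for the inspection to find, so it cannot be captured when the future is created and is only looked up when the call is finally made.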
The future assignment operator %<=%
is a binary infix operator, which means it has higher precedence than most other binary operators but also higher than some of the unary operators in R. For instance, this explains why we get the following error:
> x %<=% 2 * runif(1)
Error in x %<=% 2 * runif(1) : non-numeric argument to binary operator
What effectively is happening here is that because of the higher priority of %<=%
, we first create a future x %<=% 2
and then we try to multiply it (not its value) with the value of runif(1)
, which makes no sense. In order to properly assign the future variable, we need to put the future expression within curly brackets:
> x %<=% { 2 * runif(1) }
> x
[1] 1.030209
Parentheses will also do. For details on precedence on operators in R, see Section 'Infix and prefix operators' in the 'R Language Definition' document.
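The grouping described above can be checked without loading the future package, because quote() only parses the code and never looks up the operator. A minimal sketch, with unbraced and braced as names chosen for this illustration:

```r
## quote() only parses; neither %<=% nor the future package is needed here.
unbraced <- quote(x %<=% 2 * runif(1))
braced   <- quote(x %<=% { 2 * runif(1) })

## The outermost call is the one R would evaluate last
print(as.character(unbraced[[1]]))  # the multiplication is outermost
print(as.character(braced[[1]]))    # the future assignment is outermost

## Left operand of the multiplication in the unbraced form
print(unbraced[[2]])
```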
Copyright Henrik Bengtsson, 2015