A Future for R: Common Issues with Solutions

In the ideal case all it takes to start using futures in R is to replace select standard assignments (<-) in your R code with future assignments (%<=%) and make sure the right-hand side (RHS) expressions are within curly brackets ({ ... }). Also, if you assign these to lists (e.g. in a for loop), you need to use a list environment (listenv) instead of a plain list.

However, there are a few cases where you have to take extra precautions. These are often related to global variables being falsely identified when non-standard evaluation is used, e.g. subset(data, x < 3). Global variables need to be identified when futures are created, but this is a particularly hard task when non-standard evaluation is involved.

If you identify other use cases, please consider reporting them so they can be documented here and possibly even be fixed.

Issues with globals

False globals due to non-standard evaluation

subset(data, x < 3)

Consider the following use of subset():

> data <- data.frame(x=1:5, y=1:5)
> v <- subset(data, x < 3)$y
> v
[1] 1 2

From a static code inspection point of view, the expression x < 3 asks for variable x to be compared to 3, and there is nothing specifying that x is part of data and not the global environment. That x is indeed part of the data object can only safely be inferred at run time when subset() is called. This is not a problem in the above snippet, but when using futures all global/unknown variables need to be captured when the future is created (it is too late to do it when the future is resolved), e.g.

> library(future)
> data <- data.frame(x=1:5, y=1:5)
> v %<=% subset(data, x < 3)$y
Error in globalsOf(expr, envir = envir, tweak = tweakExpression, dotdotdot = "return",  :
  Identified a global by static code inspection, but failed to locate the corresponding
  object in the relevant environments: 'x'

Above, code inspection of the future expression subset(data, x < 3)$y incorrectly identifies x as a global variable that needs to be captured (“frozen”) for the (lazy) future. Since no such variable x exists, we get an error.
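The same kind of static inspection can be reproduced without creating any futures. The sketch below uses codetools::findGlobals(), from the codetools package that ships with R, as a stand-in for the inspection done by the globals package. The wrapper function f is made up for illustration, and the exact set of names reported by globals may differ.

```r
library(codetools)

data <- data.frame(x = 1:5, y = 1:5)

# Wrap the future expression in a function so that codetools can inspect it
# (f is a hypothetical helper, not part of the future package)
f <- function() subset(data, x < 3)$y

# merge = FALSE reports function names and variable names separately
g <- findGlobals(f, merge = FALSE)

# 'x' is listed next to 'data', although only 'data' is a true global
print(g$variables)
```

Static inspection has no way of knowing that subset() evaluates its second argument within data, so x is reported just like any other free variable.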

The clearest and most backward-compatible solution to this problem is to explicitly specify the context of x, i.e.

> data <- data.frame(x=1:5, y=1:5)
> v %<=% subset(data, data$x < 3)$y
> v
[1] 1 2

An alternative is to use a dummy variable. In contrast to the code-inspection algorithm used to identify globals, we know from reading the documentation that subset() will look for x in the data object, not in the parent environments. Armed with this knowledge, we can trick the future package (more precisely the globals package) to pick up a dummy variable x instead, e.g.

> data <- data.frame(x=1:5, y=1:5)
> x <- NULL ## To please future et al.
> v %<=% subset(data, x < 3)$y
> v
[1] 1 2
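That the dummy variable is never actually used can be checked in plain R, without any futures involved. A minimal sketch using only base R (the names v_dummy and v_explicit are chosen for illustration):

```r
data <- data.frame(x = 1:5, y = 1:5)
x <- NULL  # dummy; subset() finds column 'x' in 'data' first

v_dummy    <- subset(data, x < 3)$y       # relies on non-standard evaluation
v_explicit <- subset(data, data$x < 3)$y  # context of 'x' given explicitly

# Both approaches select the same rows
stopifnot(identical(v_dummy, v_explicit))
print(v_dummy)
```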

ggplot2

Another common use case for non-standard evaluation is when creating ggplot2 figures. For instance, in

> library(ggplot2)
> p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
> p

fields mpg and wt of the mtcars data object are plotted against each other. That mpg and wt are actually fields of mtcars cannot be inferred from code inspection alone; you need to know that this is how ggplot2 works. Analogously to the above subset() example, this explains why we get the following error:

> library(future)
> library(ggplot2)
> p %<=% { ggplot(mtcars, aes(wt, mpg)) + geom_point() }
Error in globalsOf(expr, envir = envir, tweak = tweakExpression, dotdotdot = "return",  :
  Identified a global by static code inspection, but failed to locate the corresponding
  object in the relevant environments: 'wt'

A few comments are needed here. First of all, because %<=% has higher precedence than +, we need to place all of the ggplot2 expression within curly brackets, otherwise we get an error. Second, the reason that only wt is listed as a missing global variable and not mpg is that the latter is (incorrectly) located as ggplot2::mpg.

One workaround is to make use of the *_string() functions of ggplot2, e.g.

> p %<=% { ggplot(mtcars, aes_string('wt', 'mpg')) + geom_point() }
> p

Another is to explicitly specify mtcars$wt and mtcars$mpg, which may become a bit tedious. A third alternative is to make use of dummy variables wt and mpg, i.e.

> p %<=% {
+   wt <- mpg <- NULL
+   ggplot(mtcars, aes(wt, mpg)) + geom_point()
+ }
> p

By the way, since all futures are evaluated in a local environment, the dummy variables are not assigned to the calling environment.

Missing or incorrect globals

Global variable obscured by subassignment

When a global variable is a vector, a matrix, a list, a data frame, an environment, or any other type of object that can be assigned via subsetting, the globals package fails to identify it as a global variable if its first occurrence in the future expression is as part of a subsetting assignment. For example,

> library(future)
> x <- matrix(1:12, nrow=3, ncol=4)
> y %<=% {
+   x[1,1] <- 3
+   42
+ }
> rm(x)
> y
Error in x[1, 1] <- 3 : object 'x' not found

Another example is

> library(future)
> x <- list(a=1, b=2)
> y %<=% {
+   x$c <- 3
+   42
+ }
> rm(x)
> y
Error in x$c <- 3 : object 'x' not found

A workaround is to explicitly tell the future package about the global variable by simply listing it at the beginning of the expression, e.g.

> library(future)
> x <- list(a=1, b=2)
> y %<=% {
+   x  ## Force 'x' to be global
+   x$c <- 3
+   42
+ }
> rm(x)
> y
[1] 42

do.call() - function not found

When calling a function using do.call(), make sure to specify the function as the object itself and not by name. This will help identify the function as a global object in the future expression. For instance, use

do.call(file_ext, list("foo.txt"))

instead of

do.call("file_ext", list("foo.txt"))

so that file_ext() is properly located and exported. Although you may not notice a difference when evaluating futures in the same R session, using a character string instead of a function object may become a problem when futures are evaluated in external R sessions, such as on a cluster. It may also become a problem with lazy futures if the intended function is redefined after the future is created but before it is resolved. For example,

> library("future")
> library("listenv")
> library("tools")
> plan(lazy)
> pathnames <- c("foo.txt", "bar.png", "yoo.md")
> res <- listenv()
> for (ii in seq_along(pathnames)) {
+   res[[ii]] %<=% do.call("file_ext", list(pathnames[ii]))
+ }
> file_ext <- function(...) "haha!"
> unlist(res)
[1] "haha!" "haha!" "haha!"
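The difference between the two forms can be mimicked without futures. The sketch below uses plain closures as a stand-in for delayed evaluation, and local() as a stand-in for how a global is captured when a future is created. The helper names by_name and by_object are hypothetical, and this is only an analogy under those assumptions, not how the future package is implemented.

```r
library(tools)

# Function given by name: looked up only when the call is finally evaluated
by_name <- function(path) do.call("file_ext", list(path))

# Function given as an object: captured here, at creation time
by_object <- local({
  file_ext <- tools::file_ext
  function(path) do.call(file_ext, list(path))
})

# Redefine file_ext() before the delayed calls are evaluated
file_ext <- function(...) "haha!"

print(by_name("foo.txt"))    # picks up the redefinition
print(by_object("foo.txt"))  # still uses tools::file_ext()
```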

Trying to pass an unresolved future to another future

It is not possible for a future to resolve another one unless it was created by the future trying to resolve it. For instance, the following gives an error:

> library("future")
> plan(multisession, maxCores=3L)
> f1 <- future({ Sys.getpid() })
> f2 <- future({ value(f1) })
> v1 <- value(f1)
> v1
[1] 7464
> v2 <- value(f2)
Error: Invalid usage of futures: A future whose value has not yet been collected
 can only be queried by the R process (cdd013cb-e045-f4a5-3977-9f064c31f188; pid
 1276 on MyMachine) that created it, not by any other R processes (5579f789-e7b6
 -bace-c50d-6c7a23ddb5a3; pid 2352 on MyMachine): {; Sys.getpid(); }

This is because the main R process creates two futures, but then the second future tries to retrieve the value of the first one. This is an invalid request because the second future has no channel to communicate with the first future; only the process that created a future can communicate with it (*).

Note that it is only unresolved futures that cannot be queried this way. Thus, the solution to the above problem is to make sure all futures are resolved before they are passed to other futures, e.g.

> f1 <- future({ Sys.getpid() })
> v1 <- value(f1)
> v1
[1] 7464
> f2 <- future({ value(f1) })
> v2 <- value(f2)
> v2
[1] 7464

This works because the value has already been collected and stored inside future f1 before future f2 is created. Since the value is already stored internally, value(f1) is readily available everywhere. Of course, instead of using value(f1) for the second future, it would be more readable and cleaner to simply use v1.

The above is typically not a problem when future assignments are used. For example:

> v1 %<=% { Sys.getpid() }
> v2 %<=% { v1 }
> v1
[1] 2352
> v2
[1] 2352

This approach works out of the box because, in the second future assignment, v1 is identified as a global variable, which is then retrieved. Up to this point, v1 is a promise (“delayed assignment” in R), but when it is retrieved as a global variable its value is resolved and v1 becomes a regular variable.

However, there are cases where future assignments can be passed via global variables without being resolved. This can happen if the future assignment is done to an element of an environment (including list environments). For instance,

> library("listenv")
> x <- listenv()
> x$a %<=% { Sys.getpid() }
> x$b %<=% { x$a }
> x$a
[1] 2352
> x$b
Error: Invalid usage of futures: A future whose value has not yet been collected
 can only be queried by the R process (cdd013cb-e045-f4a5-3977-9f064c31f188; pid
 1276 on localhost) that created it, not by any other R processes (2ce86ccd-5854
 -7a05-1373-e1b20022e4d8; pid 7464 on localhost): {; Sys.getpid(); }

As previously, this can be avoided by making sure x$a is resolved first, which can be done in various ways, e.g. dummy <- x$a, resolve(x$a) and force(x$a).

Footnote: (*) Although certain types of futures, such as eager and lazy ones, could be passed on to other futures and be resolved there because they share the same evaluation process, by definition of the Future API it is invalid to do so regardless of future type. This conservative approach is taken in order to make future expressions behave consistently regardless of type of futures.

Miscellaneous

Error “non-numeric argument to binary operator”

The future assignment operator %<=% is a special (%...%) binary infix operator, which means it has higher precedence than most other binary operators, and also higher than some of the unary operators, in R. For instance, this explains why we get the following error:

> x %<=% 2 * runif(1)
Error in x %<=% 2 * runif(1) : non-numeric argument to binary operator

What effectively happens here is that, because of the higher priority of %<=%, we first create a future x %<=% 2 and then try to multiply it (not its value) by the value of runif(1), which makes no sense. In order to properly assign the future variable, we need to put the future expression within curly brackets:

> x %<=% { 2 * runif(1) }
> x
[1] 1.030209

Parentheses will also do. For details on operator precedence in R, see Section 'Infix and prefix operators' in the 'R Language Definition' document.
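How R groups such an expression can be inspected directly from its parse tree. The following is a small sketch using only base R; since quote() parses the code without evaluating it, the future package does not need to be loaded, and the names e1 and e2 are chosen for illustration.

```r
# quote() only parses the code, so %<=% does not need to be defined
e1 <- quote(x %<=% 2 * runif(1))
e2 <- quote(x %<=% { 2 * runif(1) })

# Without brackets the outermost call is `*`, with `x %<=% 2` as its
# first operand
print(as.character(e1[[1]]))
print(deparse(e1[[2]]))

# With curly brackets the outermost call is %<=%, as intended
print(as.character(e2[[1]]))
```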


Copyright Henrik Bengtsson, 2015-2016