The future package provides a generic API for using futures in R. A future is a simple yet powerful mechanism to evaluate an R expression and retrieve its value at some point in time. Futures can be resolved in many different ways depending on which strategy is used. There are various types of synchronous and asynchronous futures to choose from in the future package.
This package, future.callr, provides a type of futures that utilizes the callr package.
For example,
> library("future.callr")
> plan(callr)
>
> x %<-% { Sys.sleep(5); 3.14 }
> y %<-% { Sys.sleep(5); 2.71 }
> x + y
[1] 5.85
This is obviously a toy example to illustrate what futures look like and how to work with them. For further examples on how to use futures, see the vignettes of the future package as well as those of future.apply and doFuture.
The future.callr package implements a future backend wrapper for callr.
Backend | Description | Alternative in future package |
---|---|---|
callr |
parallel evaluation in a separate R process (on current machine) | plan(multisession) |
When using callr
futures, each future is resolved in a
fresh background R session which ends as soon as the value of the future
has been collected. In contrast, multisession
futures are
resolved in background R worker sessions that serve multiple futures
over their life spans. The advantage of using a new R process for each
future is that the R environment is guaranteed not to be contaminated by
previous futures, e.g. memory allocations, finalizers, modified options,
and loaded and attached packages. The disadvantage, is an added overhead
of launching a new R process. (At the moment, I am neither aware of
formal benchmarking of this extra overhead nor of performance
comparisons of callr
to alternative future backends.)
Another advantage with callr
futures compared to
multisession
futures is that they do not communicate via R
(socket) connections. This avoids the limitation in the number of
parallel futures that can be active at any time that
multisession
futures and cluster
futures in
general have, which they inherit from SOCKcluster
clusters
as defined by the parallel package. The number of
parallel futures these can serve is limited by the maximum
number of open connections in R, which currently is 125 (excluding
the three reserved by R itself). Note that these 125 slots have to be
shared with file connections etc. To increase this limit, R has to be
rebuilt from source. However, since callr
futures rely on
the callr package
which does not make use of R-specific connections, there is no limit
in the number of background R processes that can be used
simultaneously.
A third advantage with callr
futures, is that there is
not risk for port-clashing with other processes on the system when
clusters are set up (*), because callr does not rely on
ports. Furthermore, on Windows, the firewall triggers an alert that the
user needs to approve whenever a not-previously-approved port is
requested by R - which
happens also for local, non-public ports that are used by
SOCKcluster
:s. When using callr
futures, no
sockets and therefore no ports are involved.
(*) To lower the risk for such clashes SOCKcluster
:s (of
the parallel package) request random ports, but clashes
still occur at times.
The future
package provides a demo using futures for calculating a set of
Mandelbrot planes. The demo does not assume anything about what type of
futures are used. The user has full control of how futures are
evaluated. For instance, to use callr
futures, run the
demo as:
library("future.callr")
plan(callr)
demo("mandelbrot", package = "future", ask = FALSE)