RLink User Guide
The RLink application consists of two parts: RLink proper and the RLinkRuntime paclet. The former contains the Wolfram Language code implementing the RLink API and the Java code interfacing RLink with R, and comes with the Wolfram Language. The latter essentially contains the R distribution, with a few additional libraries, and is packaged as a downloadable paclet.
In cases where you want to use the R distribution bundled with RLink (as opposed to using your own R installation), you will need to have the RLinkRuntime paclet downloaded and installed on your machine before you can use RLink. By default, the InstallR function (described in the next section), which starts the R runtime, does this for you automatically if it finds no paclet installed.
Sometimes, however, it may be more convenient to download and install the paclet manually. You can do this with the RLinkResourcesInstall function.
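A minimal sketch of the manual installation, assuming the package is first loaded with Needs:

```wolfram
(* Load the RLink package so that its functions are available *)
Needs["RLink`"]

(* Download and install the RLinkRuntime paclet manually *)
RLinkResourcesInstall[]
```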
If the result is $Failed, this means that the installation failed, which may happen for various reasons, such as if you are not connected to the internet or have disabled the Wolfram System's internet access. This has to be corrected before RLink can function properly (unless you use your own R distribution, in which case you do not need RLinkResourcesInstall at all—this is described in the next section).
While the package has been loaded, the R runtime has not been started yet. To do that, you need to call the InstallR function.
When you use InstallR with the default R distribution (which comes with RLink), InstallR checks for the RLinkRuntime paclet (containing that R distribution). If it does not find it installed, it calls internally the RLinkResourcesInstall function, which attempts to download the paclet from the Wolfram paclet server and install it. This is a one-time operation, but you will need to have internet connectivity enabled for it to succeed. Once the paclet has been successfully downloaded and installed, it will be used by RLink for all your subsequent RLink sessions automatically (unless you give an explicit option to InstallR to point it to a different R distribution, in which case that one will be used).
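With the default settings, the call can be as simple as the following sketch (it assumes the RLink` package is already loaded):

```wolfram
(* Start the R runtime using the default R distribution bundled with RLink *)
InstallR[]
```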
To stop and uninstall the R runtime, you can use the UninstallR command. This will uninstall R.
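A minimal sketch of the call, assuming an R runtime was previously started with InstallR:

```wolfram
(* Stop the R runtime for the current session *)
UninstallR[]
```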
Now you can no longer use RLink functions that communicate with R until you call InstallR again. For example, this attempts to send some data to R.
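A sketch of such an attempt follows; the variable name and the data are illustrative only. With no R runtime running, the call is expected to fail rather than perform the assignment.

```wolfram
(* Try to assign a vector to the R variable "x" while R is not running *)
RSet["x", {1, 2, 3}]
```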
This installs R again, using InstallR.
Chances are that you will rarely need to use UninstallR. One situation where it is useful is when you need to relaunch the R runtime with a different set of command-line options. Another such situation is when you have several different R distributions installed on the same machine and would like to point RLink to a different one than currently used.
RLink supports connecting to an external R distribution for all platforms where RLink and R are supported. In simple cases, no additional configuration steps will be required to establish the connection. However, in general, one has to go through the one-time R configuration process, described in detail in "Configuring an External R Installation to Work with RLink".
The only required option for connecting to an external R installation is "RHomeLocation", which tells RLink where to look for the R home directory. The configuration process is necessary only if InstallR fails when passed this option alone. Once the configuration process has been completed for the specified R installation, you should be able to connect to that installation by calling InstallR with some extra options.
Specify the Location of the R Distribution When Installing RLink
In simple cases, you can specify the location by using the "RHomeLocation" option to InstallR, calling it as follows.
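A minimal sketch of such a call is shown here. The path is a placeholder, not a real location, and must be replaced with the R home directory on your machine.

```wolfram
(* Point RLink at an external R installation; the path below is hypothetical *)
InstallR["RHomeLocation" -> "/path/to/your/R/home"]
```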
Specify Additional Options
In cases when the configuration process was required to set up an external R distribution for use with RLink, you will in general need to pass additional information to InstallR: the location of the native JRI library built during that setup process (the "JRINativeLibraryLocation" option) and the version number of this specific R installation (the "RVersion" option). This can be done as follows.
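A sketch of a call with all three options follows. Both paths are placeholders, and the form of the "RVersion" value shown here (a major version number) is an assumption; consult the configuration tutorial for the values appropriate to your platform.

```wolfram
(* All option values below are hypothetical placeholders *)
InstallR[
  "RHomeLocation" -> "/path/to/your/R/home",
  "JRINativeLibraryLocation" -> "/path/to/directory/with/jri/library",
  "RVersion" -> 3  (* assumed value form; adjust to your R installation *)
]
```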
More details and examples for all supported platforms can be found in "Configuring an External R Installation to Work with RLink".
To send data to R, you have to use the RSet function. Your data will have to be expressed in a form that RLink can understand. For most common data types, such as (multidimensional) arrays, you can use the usual Wolfram Language nested list representation of them. More details on this can be found in "R Data Types in RLink".
You can test the assignment with the help of REvaluate.
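A minimal sketch of an assignment followed by a check, using an illustrative variable name and a scalar value:

```wolfram
(* Assign the scalar 5 to the R variable "myint" *)
RSet["myint", 5]

(* Read the value back from the R workspace *)
REvaluate["myint"]
```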
Since scalars are interpreted as one-element vectors, the result is a list. More details on this can also be found in "R Data Types in RLink".
You do not have to indicate the type of data you are sending to R, in the majority of cases. The data type is determined for you automatically by RLink, based on the form of your data. The procedure for the automatic type detection is described in more detail in the reference page for ToRForm, and also in "R Data Types in RLink".
You can also use RSet on expressions more general than variables. In particular, you can make part assignments to elements of lists and arrays.
In general, the only requirement is that the R expression represented by the string passed as a first argument to RSet can be assigned a value (is an L-value in R).
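A sketch of a part assignment, with illustrative names: the string "myvec[2]" is a valid assignment target in R, so it can be used as the first argument of RSet.

```wolfram
(* Create a vector in the R workspace *)
RSet["myvec", {1, 2, 3}];

(* Assign to its second element; "myvec[2]" is an L-value in R *)
RSet["myvec[2]", 10];

(* Retrieve the modified vector *)
REvaluate["myvec"]
```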
To execute any string of valid R code and get the results back to the Wolfram Language, you can use the REvaluate function. You have seen some examples of its use already.
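For instance, the following sketch asks R for a few normally distributed random numbers and returns them to the Wolfram Language (the choice of R code is illustrative):

```wolfram
(* Evaluate R code and return the result as a Wolfram Language list *)
REvaluate["rnorm(5)"]
```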
In this case, the result was transferred back to the Wolfram Language, but not saved anywhere in the R workspace. If you wish to also save the result, you can assign it to some variable in the R workspace.
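A sketch of saving the result on the R side, using an illustrative variable name:

```wolfram
(* Assign the result to the R variable "rn"; the value is also returned *)
REvaluate["rn <- rnorm(5)"]

(* The stored value can be retrieved later *)
REvaluate["rn"]
```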
You can execute multiline chunks of R code with REvaluate, but in this case, you have to enclose the code in curly braces.
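A minimal sketch of a multiline chunk wrapped in curly braces; the value of the last expression in the block is what gets returned:

```wolfram
REvaluate["{
  x <- seq(0, 1, by = 0.25)
  x^2
}"]
```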
Many more examples can be found on the documentation page for REvaluate.
While the topic of this section is logically connected to the discussion in the previous section, it is important enough to have a separate discussion. If you go slightly beyond using the functions already available in R or its various extensions, one of the main things you may want to do is to define your own R functions, from the Wolfram Language.
This is perfectly possible with RLink. The details of how it is done are discussed in "Functions". Here only a few simple examples will be considered. Generally, functions in RLink are represented by opaque references, which point to functions defined in the R workspace. Such references can be stored in variables or used directly, to call R functions on Wolfram Language expressions as arguments and get the result back to the Wolfram Language.
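As a simple sketch, the following defines a squaring function in R through REvaluate and keeps the returned reference in a Wolfram Language variable (the names are illustrative):

```wolfram
(* Define sq in the R workspace and store the function reference in sq *)
sq = REvaluate["sq <- function(x) x^2"]
```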
A function reference was just generated, and stored in a variable sq. But also, an assignment in the R workspace was performed. There are now a number of ways you can call this function. First, you can call it directly in R.
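Two ways of calling the function can be sketched as follows, assuming that sq is defined both in the R workspace and as a Wolfram Language variable, as described above:

```wolfram
(* Call the function inside R, by its R name *)
REvaluate["sq(1:5)"]

(* Call it from the Wolfram Language, through the function reference *)
sq[{1, 2, 3}]
```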
However, constructing function references through REvaluate and using them in such a manner is often not the best option, in particular because a new copy of a function reference is generated at every call, and also because such references have a lifetime of only the current RLink session (this is explained in detail in "Functions"). There is a special device for creating "better" function references, which are cached and have an indefinite lifespan, by using RFunction.
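A minimal sketch of the RFunction approach, with an illustrative variable name:

```wolfram
(* Create a cached reference to an R function *)
sqCached = RFunction["function(x) x^2"];

(* Apply it to Wolfram Language data *)
sqCached[{1, 2, 3}]
```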
Such a use does not produce a new copy of a function reference on each call, since references produced by RFunction are cached.
You can use function references as you would other objects in RLink; in particular, you can send them to R and pass them as arguments to other (higher-order) functions, etc. For example, now a previous function reference will be assigned to a variable in R.
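A sketch of such an assignment; for self-containment the reference is created here with RFunction, and the R variable name is illustrative:

```wolfram
(* Create a function reference *)
f = RFunction["function(x) x^2"];

(* Assign it to the R variable "mysq" *)
RSet["mysq", f];

(* The function can now be called by that name inside R *)
REvaluate["mysq(1:3)"]
```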
You can pass function references as arguments to other functions. For example, you can define an analog of the Wolfram Language's Select function for R as follows.
You can now use it with some custom filtering function, which you can also define with RFunction.
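One possible version of both steps is sketched here. The R implementation and all names are illustrative choices, not necessarily the ones used originally.

```wolfram
(* An R analog of Select: keep the elements of x for which pred returns TRUE *)
rselect = RFunction["function(x, pred) x[sapply(x, pred)]"];

(* A custom filter that tests whether a number is even *)
isEven = RFunction["function(n) n %% 2 == 0"];

(* Pass the filter reference as an argument to the higher-order function *)
rselect[Range[10], isEven]
```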
There are a number of more subtle points on using function references in RLink, discussed in "Functions".
For example, if you try to transfer to R some general symbolic expression, you get an error message telling you that RLink does not know how to convert this input to a data type that it can transfer to R.
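A sketch of such an attempt, assuming the symbols a and b have no values assigned:

```wolfram
(* A general symbolic expression has no R counterpart, so this is expected to fail *)
RSet["expr", Sin[a] + b]
```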
To learn which inputs can and cannot be transferred to R, please see "R Data Types in RLink", which has a detailed discussion on this.
Some errors do not manifest themselves during the data transfer to R, but show up as R runtime errors. In such cases, RLink attempts to deliver the R error message generated in the R workspace to the Wolfram System.
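One way to trigger such an error deliberately, as a sketch, is to call R's stop function; the message text is arbitrary:

```wolfram
(* The string is valid R code, but it raises an error when executed in R *)
REvaluate["stop('something went wrong in R')"]
```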
RLink is a rather high-level interface, built on top of J/Link, which itself is built on top of the Wolfram Symbolic Transfer Protocol (WSTP), and rJava/JRI, which is a Java interface to the R runtime (the latter used as a standalone set of dynamic native libraries). Also, RLink often uses flexible means of data transfer and execution, sometimes involving runtime R code generation and execution. This flexibility allows RLink to handle a rather large subset of possible R objects, and also things like part assignments to arrays and lists, in a uniform way. But the price to pay for this is an (often very considerable) overhead. In cases when this overhead is not acceptable, there are ways to optimize the data transfer between the Wolfram Language and R. Some of them are described in this section.
Vectors versus Lists
The main advice here is to avoid sending and returning R lists whenever you can, and to send and return R vectors (arrays) instead. The reason is that arrays are much more efficient at every stage of communication with R. An array is transformed more efficiently to the internal representation (since Wolfram Language packed arrays can often be utilized), it is transferred more efficiently to R, it is processed more efficiently by R, and the result is again obtained by the Wolfram Language much faster.
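One rough way to observe the difference, sketched here with an arbitrary size, is to time the return of a vector against the return of a list of the same length. Actual timings depend on the machine and on the state of the session.

```wolfram
n = 10000;  (* arbitrary number of elements *)

(* Time the transfer of an R vector back to the Wolfram Language *)
vectorTime = First@AbsoluteTiming[REvaluate["1:" <> ToString[n]];];

(* Time the transfer of an R list with the same number of elements *)
listTime = First@AbsoluteTiming[REvaluate["as.list(1:" <> ToString[n] <> ")"];];

{vectorTime, listTime}
```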
You can see that in the latter case, the time complexity is also linear, but with a much larger constant, which is about 40 times that of a vector (which is, more or less, the typical speed ratio between top-level iteration—when done right—and the one using packed arrays).
The overhead is very noticeable (it was not even feasible to go to the same larger number of elements as for vectors). While future versions of RLink will likely have more efficient means of data transfer for lists, the current advice is to avoid sending back and forth lists of more than a few thousand entries. Since R lists are frequently used as an aggregate data structure, chances are that huge R lists will not often appear naturally.
The second run is much faster (the first one was necessary since some Java class loading and other events were triggered by the first call on a fresh kernel, which makes the measurement based on the first call inaccurate).
There is a constant overhead of a function call, which dominated the running time for the previous examples. Here, the number of points will be increased 100 times, but the time it takes to compute the result is almost the same.
One general piece of advice is that, whatever you do, you should try to minimize both the amount of data transferred in either direction and, sometimes even more importantly, the number of times that functions like REvaluate, RSet, and RFunction are called.
The worst possible scenario here is a lightweight function defined in R (or a piece of R code doing very little), called a large number of times from the Wolfram Language. In such a case, you can be almost certain that the total running time will be dominated by the time spent on data transfer and other inner workings of RLink, rather than time spent in R doing the actual computation.
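The contrast can be sketched as follows, with an illustrative function and an arbitrary number of elements: the first timing makes one RLink call per element, while the second passes the whole vector in a single call.

```wolfram
f = RFunction["function(x) x^2"];

(* Many calls to a lightweight R function: per-call overhead is paid each time *)
manyCalls = First@AbsoluteTiming[f /@ Range[100];];

(* One call on the whole vector: the overhead is paid only once *)
oneCall = First@AbsoluteTiming[f[Range[100]];];

{manyCalls, oneCall}
```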
The best scenario is when most of the hard, computationally intensive work is done in R, and data is transferred to R and back in an efficient manner (for example, using data structures containing mostly vectors). Lists are OK as long as they do not have a huge number of elements. Since lists are most frequently used as an aggregate data structure to hold together heterogeneous collections of vectors (and possibly other data structures), they do not usually have a huge number of elements unless used inappropriately in cases where vectors should be used. One exception that is quite problematic for RLink is when the result of computation in R is a large ragged array (for example, of integers), which can only be represented as an R list. In such cases, one way to speed up the transfer would be to pad such an array to a rectangular one, whenever this is possible.
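A sketch of the padding idea, done on the R side before the result is returned. The sample data and the filler value 0 are assumptions for illustration; padding is only appropriate when some filler value can be told apart from real data or when the original lengths are known.

```wolfram
REvaluate["{
  ragged <- list(c(1L, 2L, 3L), 4L, c(5L, 6L))
  n <- max(sapply(ragged, length))
  t(sapply(ragged, function(v) c(v, rep(0L, n - length(v)))))
}"]
```

The padded result is a rectangular integer matrix, which can be transferred as an array rather than as an R list.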