Application Structure

R is a programming language and software environment for statistical computing and graphics. RLink is a Wolfram System application that uses JLink and RJava/JRI Java libraries to link to the R functionality. It allows the user to communicate data between the Wolfram Language and R and execute R code from within the Wolfram Language.

The R runtime is available as a set of shared libraries (.dll/.so/.dylib). R defines a C-level interface that can be used to call R from external programs. The JRI Java library forms a layer on top of the lower-level C interface and provides a similar (but cross-platform) Java interface, allowing Java programs to call R and exchange information with R. JRI also provides a higher-level interface for the most common operations, and it is this interface that is used extensively in RLink.

RLink is a Wolfram System application written in a mixture of Java and the Wolfram Language, using JLink as a bridge between the two. Note that the choice of Java is more of an implementation detail. It was a convenient choice because it provides a high-level cross-platform interface to R and access to Java's object system, but Java does not play a fundamental role here. Future versions of RLink may even switch to another language for this purpose, e.g. C++.

The Java part contains two hierarchies of classes, used to transfer data to and from R, and a few higher-level classes to hold it all together. Both in- and out-hierarchies map to the simplified object model of R, in the way it is implemented and used by RLink.

The Wolfram Language part contains higher-level constructs built on top of the Java component. That includes communication with Java objects, automatic type identification and function dispatch, data transformations (RLink uses a longer internal form of expressions to communicate data between R and the Wolfram Language, but provides a shorter and more convenient form for the end user), and implementation of the higher-level logic and commands, such as RSet, REvaluate, and RFunction.

Application Layout

RLink is constructed as a Wolfram System application that contains the Wolfram Language code, documentation, shared libraries, and Java libraries (.jar files). In addition, the file PacletInfo.m is a descriptor for the application. A rough outline follows.

RLink
    PacletInfo.m
    RLink.m
    Kernel
        init.m
        RDataTypes.m
        DataTypes
            Base.m
            Common.m
    Java
        RLink.jar
        jna.jar
        JRI.jar
        ...
    Documentation
        English
            guide, reference, and tutorial pages

You can find more information on the application layout and how it can be built in the documentation for the Wolfram Workbench at http://www.wolfram.com/products/workbench, particularly in the sections that describe application development.

Note that a full R distribution is bundled with RLink, packaged as a separate paclet called RLinkRuntime. This distribution, however, has been augmented with additional shared libraries (the JRI library and possibly others, depending on the platform). Moreover, on Linux and Mac OS, some of these shared libraries (as well as the R shared libraries) have been modified so that they can be found automatically during loading. The modifications did not affect the source code, only the way the libraries are loaded. Therefore, you cannot ordinarily replace the bundled R distribution with your own and expect it to work. Instead, RLink has an explicit option that allows you to switch to your own R distribution, which must then be located outside the RLink project (currently this option only works on Windows).

Here is a rough outline of the RLinkRuntime paclet layout (the Windows layout is shown):

    SystemFiles
        Windows
            R
                bin
                    i386
                        jri.dll
                        R.dll
                        ...
                doc
                ...
        PacletInfo.m

The paclet is downloaded automatically from the Wolfram paclet server upon the first use of RLink. RLink also supports manual downloads of this paclet.

RLink has a type extension system, which allows the user to define Wolfram Language wrappers (data types) for R data types that are not core R types but have enough functionality to warrant a separate type. This type extension system provides tools to define such new types and register them with RLink, so that they can be used (exchanged with R) on an equal footing with the core types, without any changes to the source code of core RLink.

The Wolfram System will find different parts of the application as necessary. For example, when the Wolfram Language Documentation Center is used, it locates the RLink documentation. When the application is loaded with Needs["RLink`"], the init.m file is loaded. Finally, when the InstallR function is executed, the RLink project settings are configured and the R runtime is loaded. RLink automatically adds the relevant .jar files to the Java class path and sets up or updates the relevant environment variables (for the running process and subprocesses), such as R_HOME and, on Windows, PATH.
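
The environment setup mentioned above can be illustrated with a small Java sketch. It is not RLink's actual code: the class REnvSketch, the method withREnvironment, and the use of bin\i386 as the library directory (borrowed from the Windows layout shown earlier) are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of RLink: computes the environment a
// process would need in order to locate a given R installation.
final class REnvSketch {
    private REnvSketch() {}

    // Returns a copy of baseEnv with R_HOME set and, on Windows, the R
    // library directory prepended to PATH. The input map is not modified.
    static Map<String, String> withREnvironment(Map<String, String> baseEnv,
                                                String rHome,
                                                boolean isWindows) {
        Map<String, String> env = new HashMap<>(baseEnv);
        env.put("R_HOME", rHome);
        if (isWindows) {
            // Assumption: the DLLs live under bin\i386, as in the layout above.
            String libDir = rHome + "\\bin\\i386";
            String oldPath = env.get("PATH");
            env.put("PATH", (oldPath == null || oldPath.isEmpty())
                    ? libDir
                    : libDir + ";" + oldPath);
        }
        return env;
    }
}
```

A map produced this way could, for instance, be copied into the map returned by ProcessBuilder.environment() before starting a subprocess. How RLink applies the variables to the running process is not shown here.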

Simplified R Object Model

The following scheme illustrates the simplified object model of R, in the way it is used by RLink.

                |-- RCode
RObject     --> |-- RCoreObject + RAttributes
                |-- REnvironment

                |-- NULL
RCoreObject --> |-- RVector
                |-- RList
                |-- RFunction

RFunction   --> |-- builtin
                |-- closure

RVector     --> [(RNativeType|NA)..]

NA          --> Missing element, can be at any position in a vector

                |-- integer
                |-- double
RNativeType --> |-- complex
                |-- logical: TRUE|FALSE
                |-- character (string)

RList       --> {RObject..}

RAttributes --> RList

As you can see, within this model, any R object can be represented as an R vector, R list, R function, or R NULL object, and any R object can have attributes, which themselves are stored in an R list. There are also objects of types REnvironment and RCode, which represent R environments and generic R objects that do not have special support in RLink, respectively. These two types differ from the rest in that they are used for representation (on the Wolfram Language side) only, and you generally cannot correctly reconstruct R objects from objects of those types.

An R vector can only contain elements of a single native type, of which RLink supports five: integer, double, complex, logical, and character. Any R vector may contain missing elements, denoted by NA in R. R lists can contain arbitrary R objects as their elements, including other R lists. Note that lists are a recursive data structure, which is what makes this simple object model also powerful.
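
To make the recursive nature of the model concrete, here is a minimal Java sketch of it. All class names (RObj, RNullObj, RVec, RLst, RModelSketch) are hypothetical and chosen for this example only; they are not the classes used by RLink. Functions, environments, and code objects are left out, and NA is represented by a null element, which is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model classes for illustration only.
abstract class RObj {
    // Attributes are themselves a list of R objects (empty in this sketch).
    private final List<RObj> attributes = Collections.emptyList();
    List<RObj> attributes() { return attributes; }
}

final class RNullObj extends RObj {}

// A vector holds elements of one native type; a null element stands for NA.
final class RVec<T> extends RObj {
    private final List<T> elements;
    RVec(List<T> elements) {
        this.elements = Collections.unmodifiableList(new ArrayList<>(elements));
    }
    List<T> elements() { return elements; }
    long countNA() { return elements.stream().filter(e -> e == null).count(); }
}

// A list may hold any R objects, including other lists.
final class RLst extends RObj {
    private final List<RObj> items;
    RLst(List<RObj> items) {
        this.items = Collections.unmodifiableList(new ArrayList<>(items));
    }
    List<RObj> items() { return items; }
}

final class RModelSketch {
    private RModelSketch() {}

    // Nesting depth: 0 for non-lists, 1 + deepest element for lists.
    static int depth(RObj o) {
        if (!(o instanceof RLst)) return 0;
        int max = 0;
        for (RObj item : ((RLst) o).items()) {
            max = Math.max(max, depth(item));
        }
        return 1 + max;
    }
}
```

The depth method shows the pattern that any code working with this model follows: handle the non-list cases directly and recurse into list elements.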

Java Implementation

The core of the Java implementation consists of two hierarchies of classes/interfaces that are responsible for sending data to R ("In" classes) and receiving data from R ("Out" classes), plus the class RExecutor.java, which is responsible for execution of R code, and the class RLinkInit.java, which is responsible for the project initialization.

Both "In" and "Out" class hierarchies implement a mapping between Java and the simplified R object model described in the previous section. Classes are intended to be immutable (no setter methods; all data can be set only in constructors or, for the "Out" types, as a result of the action of the rGet() method). Their instances are intended to be used only once, for a single data transfer, and then disposed of. This makes the data transfer stateless.

Each of the "In" classes implements the following interface.

public interface IRInType {
    public boolean rPut(String rVar, RExecutor exec);
}

Here is a (simplified) class hierarchy for the "In" classes.

IRInType
    RListInType
    RNullInType
    RVectorInType
        RCharacterVectorInType
        RComplexVectorInType
        RDoubleVectorInType
        RIntegerVectorInType
        RLogicalVectorInType
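
As an illustration of the "In" side, the following sketch shows what an immutable implementation might look like. Because the API of RExecutor is not described here, the sketch uses a hypothetical stand-in, RecordingExecutor, that simply records assignments. The names InTypeSketch and IntegerVectorInSketch are likewise invented for this example and are not RLink classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for RExecutor: instead of talking to R, it records
// each assignment so the sketch can run without an R installation.
final class RecordingExecutor {
    private final Map<String, int[]> assigned = new HashMap<>();

    boolean assign(String rVar, int[] values) {
        assigned.put(rVar, values.clone());
        return true;
    }

    int[] lookup(String rVar) {
        int[] v = assigned.get(rVar);
        return v == null ? null : v.clone();
    }
}

// Same shape as IRInType, but typed against the stand-in executor.
interface InTypeSketch {
    boolean rPut(String rVar, RecordingExecutor exec);
}

// Immutable: data is set only in the constructor and there are no setters.
final class IntegerVectorInSketch implements InTypeSketch {
    private final int[] data;

    IntegerVectorInSketch(int[] data) {
        this.data = data.clone(); // defensive copy keeps the instance immutable
    }

    @Override
    public boolean rPut(String rVar, RecordingExecutor exec) {
        if (rVar == null || rVar.isEmpty()) {
            return false; // nothing sensible to assign to
        }
        return exec.assign(rVar, data);
    }
}
```

The defensive copy in the constructor is one way to honor the "data can be set only in constructors" rule even when the caller later modifies the array it passed in.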

Each of the "Out" classes implements the following interface.

public interface IROutType  {
    public boolean rGet(RExecutor exec);
    
    public String getVariableNameOrCodeString();
    
    public ROutTypes getType();
    
    public IROutAttributes getAttributes();
}

The class hierarchy for "Out" classes is similar to that for "In" classes.
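
The single-use convention for "Out" instances can be sketched as follows. FakeRSession is a hypothetical stand-in for RExecutor, and DoubleVectorOutSketch is an invented class. Throwing an exception on a second rGet call is an assumption of the sketch, not documented RLink behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for RExecutor that serves values from a map.
final class FakeRSession {
    private final Map<String, double[]> vars = new HashMap<>();

    void define(String name, double... values) {
        vars.put(name, values.clone());
    }

    double[] fetch(String name) {
        double[] v = vars.get(name);
        return v == null ? null : v.clone();
    }
}

// An "Out"-style object: the data is filled in by rGet, at most once.
final class DoubleVectorOutSketch {
    private final String variableName;
    private double[] data;
    private boolean used;

    DoubleVectorOutSketch(String variableName) {
        this.variableName = variableName;
    }

    boolean rGet(FakeRSession session) {
        if (used) {
            // Assumption: reuse is rejected to keep transfers stateless.
            throw new IllegalStateException("instance already used for a transfer");
        }
        used = true;
        data = session.fetch(variableName);
        return data != null;
    }

    String getVariableNameOrCodeString() {
        return variableName;
    }

    double[] getData() {
        return data == null ? null : data.clone();
    }
}
```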

The class RLinkInit.java contains methods that perform the project initialization. There can be at most a single instance of it active at any given time, which is realized through the use of a singleton design pattern. The R runtime is started by the method installR of the RLinkInit.java class.

public static synchronized boolean installR(String[] args)

This sets the relevant environment variables, loads the shared libraries, and starts the R runtime. The shared libraries are loaded in this order: first, the jri library is loaded by the code in the JRI.jar Java library. Then, R and Rblas (and possibly other shared libraries) are loaded by jri.
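
The combination of a singleton and a synchronized static entry point can be sketched as shown here. The class RuntimeInitSketch is hypothetical. Treating the first argument as the R home directory, and returning false when a runtime is already installed, are both assumptions of this sketch rather than the behavior of RLinkInit.java. No libraries are actually loaded.

```java
// Hypothetical sketch of a singleton-style initializer.
final class RuntimeInitSketch {
    // At most one active instance at any given time.
    private static RuntimeInitSketch instance;

    private final String rHome;

    private RuntimeInitSketch(String rHome) {
        this.rHome = rHome;
    }

    // synchronized: concurrent callers cannot both create an instance.
    static synchronized boolean installR(String[] args) {
        if (instance != null) {
            return false; // assumption: a second install attempt is refused
        }
        if (args == null || args.length == 0) {
            return false; // assumption: args[0] names the R home directory
        }
        instance = new RuntimeInitSketch(args[0]);
        return true;
    }

    static synchronized boolean isInstalled() {
        return instance != null;
    }

    static synchronized String currentRHome() {
        return instance == null ? null : instance.rHome;
    }

    static synchronized void uninstallR() {
        instance = null;
    }
}
```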

Wolfram Language Implementation

The Wolfram Language implementation consists of several components, or layers. The first component implements automatic type identification and function dispatch: based on the type information, it binds the appropriate Java classes to Wolfram Language functions.

The second module uses the automatic dispatch to actually implement data exchange between R and the Wolfram Language. For data going to R, the type of each element in the data is determined by analysis performed on the Wolfram Language side, based on the special internal form of expressions that RLink uses to represent the data. For data coming from R, types are determined by querying the type of the R object (for R code strings) or the Java types (for Java class instances). As a result, the data gets automatically sent to R or received from R. For R lists, the procedure is recursive. The Wolfram Language side of data exchange operates on the internal (long) form of data, where data is represented by Wolfram Language expressions with the heads RList, RVector, RNull, RAttributes, RCode, REnvironment, and RFunction. Such expressions contain the full information needed to construct the corresponding R object, in a form that is convenient for sending the data to R.
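
The idea of dispatching on the head of an expression, recursively for lists, can be illustrated in Java as follows. This is only an analogy: in RLink the dispatch happens on the Wolfram Language side and data is not transferred as R source text. The classes Expr and HeadDispatchSketch are hypothetical, only integer vectors are handled, and the type tag and attributes that the real long form carries are omitted.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical expression node: a head name plus arguments.
final class Expr {
    final String head;
    final List<Object> args;

    Expr(String head, Object... args) {
        this.head = head;
        this.args = Arrays.asList(args);
    }
}

// Dispatch table from head name to handler; renders R source text.
final class HeadDispatchSketch {
    private final Map<String, Function<Expr, String>> handlers = new HashMap<>();

    HeadDispatchSketch() {
        handlers.put("RNull", e -> "NULL");
        handlers.put("RVector", this::vectorToR);
        handlers.put("RList", this::listToR);
    }

    String toR(Expr e) {
        Function<Expr, String> handler = handlers.get(e.head);
        if (handler == null) {
            throw new IllegalArgumentException("no handler for head " + e.head);
        }
        return handler.apply(e);
    }

    // Simplification: integer elements only; a null element stands for NA.
    private String vectorToR(Expr e) {
        return e.args.stream()
                .map(x -> x == null ? "NA" : x + "L")
                .collect(Collectors.joining(", ", "c(", ")"));
    }

    // Lists recurse: each element is dispatched again through toR.
    private String listToR(Expr e) {
        return e.args.stream()
                .map(x -> toR((Expr) x))
                .collect(Collectors.joining(", ", "list(", ")"));
    }
}
```

New heads can be supported by adding an entry to the handler table, without touching the existing handlers.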

Since the (longer) internal form used by RLink to represent the data is not very convenient to work with, there is also a conversion layer, which converts this long form to a shorter and more convenient form that better corresponds to the usual Wolfram Language workflow. It also converts the shorter form back to the longer one. The two functions that realize these conversions are ToRForm and FromRForm. Strictly speaking, these are internal-purpose functions, but they are exposed to the end user because they are convenient in some circumstances. The same is true for the internal form of expressions: the user can use it instead of the short form, if so inclined.

There is also a top layer, implementing the top-level functions RSet, REvaluate, and function calls via function references (RFunction), as well as functions related to the RLink installation (InstallR and UninstallR). This layer is also concerned with error handling, both on the Wolfram Language side and the R side. Finally, there is a type extension system, consisting of the functions RDataTypeRegister, RDataTypeUnregister, RDataTypeDefinitionsReload, and a few others. This system allows the user to create new Wolfram Language representations (wrappers) for various R data types, and make RLink treat those on an equal footing with already-existing core data types handled by RLink.