# R Data Types in *RLink*

R has a simple yet powerful type system. Being an interface between R and the Wolfram Language, *RLink* implements a mapping between R types and Wolfram Language expressions. It is important to understand this mapping in some detail, in order to work with *RLink* effectively.

### Simplified R Object Model

The following scheme illustrates the simplified object model of R, in the way it is used by *RLink*.

RObject --> RCode | RCoreObject + RAttributes | REnvironment

RCoreObject --> NULL | RVector | RList | RFunction

RFunction --> builtin | closure

RVector --> [(RNativeType|NA)..]

NA --> Missing element, can be at any position in a vector

RNativeType --> integer | double | complex | logical: TRUE|FALSE | character (string)

RList --> {RObject..}

RAttributes --> RList

As you can see, within this model any R object can be represented as an R vector, R list, R function, or R NULL object. In addition, any R object can have attributes, which are themselves stored in an R list. There are also objects of the types REnvironment and RCode, which represent R environments and generic R objects that do not have special support in *RLink*, respectively. These two types differ from the rest in that they are used for representation (on the Wolfram Language side) only, and you generally cannot correctly reconstruct R objects from objects of those types.
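One way to make the scheme concrete is to model it as a handful of record types. The following is a minimal sketch in Python, not *RLink* code: the class names mirror the heads used in this tutorial, while the field names and the `is_valid` helper are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical Python stand-ins for the node types of the simplified model.
NATIVE_TYPES = {"integer", "double", "complex", "logical", "character"}


@dataclass
class RNull:
    pass


@dataclass
class RVector:
    rtype: str                                      # one of NATIVE_TYPES
    data: list                                      # flat list of values (or NA markers)
    attributes: dict = field(default_factory=dict)  # attribute name -> R object


@dataclass
class RList:
    elements: list                                  # arbitrary R objects
    attributes: dict = field(default_factory=dict)


@dataclass
class RFunction:
    kind: str                                       # "builtin" or "closure"
    attributes: dict = field(default_factory=dict)


@dataclass
class REnvironment:                                 # representation only
    pass


@dataclass
class RCode:                                        # representation only
    source: str


def is_valid(obj) -> bool:
    """Check an object against the simplified grammar, recursively."""
    if isinstance(obj, (RNull, REnvironment, RCode)):
        return True
    if isinstance(obj, RVector):
        ok = obj.rtype in NATIVE_TYPES
    elif isinstance(obj, RList):
        ok = all(is_valid(e) for e in obj.elements)
    elif isinstance(obj, RFunction):
        ok = obj.kind in ("builtin", "closure")
    else:
        return False
    # Attributes are name -> R object pairs, so they are checked the same way.
    return ok and all(
        isinstance(k, str) and is_valid(v) for k, v in obj.attributes.items()
    )
```

In this model, a matrix is simply an `RVector` whose `attributes` hold a `"dim"` entry that is itself an `RVector`, which reflects the recursive nature of the scheme.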

### Long and Short Forms of Data Representation in *RLink*

There are two different but equivalent ways in which *RLink* allows R objects to be represented by Wolfram Language expressions. One such representation is as close to a standard Wolfram Language way of representing similar objects as possible, and it is this form you will likely work with most of the time. Another representation is an internal *RLink* representation, which is typically longer and harder to read, but is completely unambiguous and better suited for communication with R. The closest analogy here is that the shorter form acts like the Wolfram Language's InputForm, while the longer form is similar to the Wolfram Language's FullForm (this is in fact a pretty close analogy).

Apart from a few special cases, which are detailed later in this tutorial, the mapping between the two forms is unambiguous and is realized by the functions ToRForm (short to long form) and FromRForm (long to short form). In the short form, R objects can have, apart from the usual Wolfram Language heads (List, Wolfram Language atoms as described in "Atomic Objects", and so on), several special heads: RObject, RAttributes, RCode, REnvironment, and RFunction. The latter three heads are also present in the long form; in fact, they are not transformed in any way by ToRForm. The head RObject is a container used to carry the data for the attributes of a given R object (in cases where the set of attributes is non-empty), and the head RAttributes is a container to store the attributes. In the long form, R objects are represented by three additional heads: RVector, RList, and RNull, plus RAttributes, as before. The head RObject never appears in the long form.

You have to load the package before you can work with it.

In[1]:= |

A simple vector of integers can be represented in the short form.

In[3]:= |

Out[3]= |

Its long form is given by the following.

In[4]:= |

Out[4]= |

As another example, an integer matrix can be represented in the short form.

In[5]:= |

Out[5]= |

In the long form, it is given as follows.

In[6]:= |

Out[6]= |

You can use the functions ToRForm and FromRForm to convert one form to another.

In[7]:= |

Out[7]= |

In[8]:= |

Out[8]= |

In[9]:= |

Out[9]= |

In[10]:= |

Out[10]= |

Most of the time, you will not need the long form of R objects. It is, however, useful in some circumstances; in particular, sometimes you may want to check the interpretation of your short-form input by *RLink*.

For more examples of the short versus the long form of *RLink* expressions, see the reference pages for the functions ToRForm and FromRForm.

### Automatic Type Detection

When you send some data to R through *RLink*, it tries to automatically detect the type of the data being sent. This is needed to map the data correctly to the type of R object where the data will be stored in your R session. What technically happens is that your input is first transformed into a Wolfram Language expression giving its long internal *RLink* form, as described in the previous section (thus, type detection is part of the ToRForm functionality). Then, *RLink* sends the data expressed in this internal form to your R session.

The type detection is based on the following set of rules:

1. Scalars (atomic elements) of the type String, Integer, Real, Complex, or True|False are interpreted as one-element vectors of the corresponding R type. The following table shows the correspondence between the Wolfram Language and R basic types:

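Type correspondence between the Wolfram Language and R for vector types:

| Wolfram Language | R |
|---|---|
| Integer | integer |
| Real | double |
| Complex | complex |
| String | character |
| True or False | logical |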

2. A Missing[] element, when found inside a (possibly nested) list representing an otherwise valid R vector, is interpreted as the R missing element NA.

3. A list or regular array (list of lists) of elements of the same basic type (String, Integer, Real, Complex, or True|False), possibly with some Missing[] elements, is considered an R vector. If it is a multidimensional array, the "dim" attribute is added to the resulting R object, storing the dimensions of the array.

4. Any other list of elements (including lists of elements of different types, or lists of non-atomic elements) is interpreted as an R list, provided that the elements themselves have a valid *RLink* interpretation (meaning that this type identification procedure is applied to them recursively).

5. The Wolfram Language Null is interpreted as an R NULL object (represented by the head RNull).

6. Any data carrying explicit R attributes must be entered as RObject[data,RAttributes["name1":> value1,…]]. The type of such an object is determined by the type of its data part. The attribute values (value1 and so on) must themselves have a valid *RLink* interpretation (they can be any R objects supported by *RLink*).

7. Elements with the heads RCode, REnvironment, and RFunction are not transformed in any way (except for the attributes possibly present in them); in other words, their short and long forms coincide.

There are some ambiguities in the scheme just described. They are important enough to warrant a separate section, "Type Detection Ambiguities".

Any Wolfram Language expression that cannot be interpreted with these rules does not constitute a valid R object representation from *RLink*'s viewpoint, and cannot be communicated to R via *RLink*. An attempt to call ToRForm on such an expression will result in an error ($Failed will be returned).
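The first five rules can be sketched as a small recursive function. The Python model below illustrates the rules as stated here and is not the implementation of ToRForm: Python lists stand in for Wolfram Language lists, `None` stands in for Null, the `MISSING` sentinel stands in for Missing[], and long forms are written as plain tuples. Rules 6 and 7 are left out, and a list made up only of `MISSING` elements is rejected by this sketch, which is a simplification rather than a statement about *RLink*.

```python
from itertools import product

MISSING = object()   # hypothetical stand-in for Missing[]


def basic_type(x):
    """Map a Python scalar to the name of the matching R vector type."""
    if isinstance(x, bool):          # must be tested before int
        return "logical"
    if isinstance(x, int):
        return "integer"
    if isinstance(x, float):
        return "double"
    if isinstance(x, complex):
        return "complex"
    if isinstance(x, str):
        return "character"
    return None


def dims_of(x):
    """Dimensions of a rectangular nested list, or None if it is ragged or empty."""
    if not isinstance(x, list):
        return ()
    if not x:
        return None
    sub = [dims_of(e) for e in x]
    if sub[0] is None or any(s != sub[0] for s in sub):
        return None
    return (len(x),) + sub[0]


def element(x, index):
    for i in index:
        x = x[i]
    return x


def to_long(expr):
    if expr is None:                                   # rule 5
        return ("RNull",)
    t = basic_type(expr)
    if t is not None:                                  # rule 1
        return ("RVector", t, [expr], {})
    if isinstance(expr, list):
        dims = dims_of(expr)
        if dims is not None:
            # Column-major traversal: the first index varies fastest.
            leaves = [element(expr, idx[::-1])
                      for idx in product(*map(range, reversed(dims)))]
            types = {basic_type(v) for v in leaves if v is not MISSING}
            if len(types) == 1 and None not in types:  # rules 2 and 3
                attrs = {}
                if len(dims) > 1:
                    attrs["dim"] = ("RVector", "integer", list(dims), {})
                return ("RVector", types.pop(), leaves, attrs)
        return ("RList", [to_long(e) for e in expr], {})  # rule 4
    raise ValueError("no valid interpretation")        # models the $Failed case
```

For instance, `to_long([1, "a"])` falls through to rule 4 and yields a list of two one-element vectors, while `to_long([[1, 2, 3], [4, 5, 6]])` yields a single integer vector with a `"dim"` attribute of `[2, 3]`.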

### Vectors

R vectors are a core data type in R, combining collections of elements of the same basic types. The types supported by *RLink* are integer, double, complex, logical, and character. Note that multidimensional arrays are also represented in R by vectors, where the dimensions are specified via a special attribute "dim". On the Wolfram Language side, R vectors are represented as (possibly nested) lists, just in the usual way.

First, load the package and install the R runtime.

In[1]:= |

You can enter a vector of integers.

In[3]:= |

Out[3]= |

Its internal form is as follows.

In[4]:= |

Out[4]= |

The long form of an R vector will always have the head RVector. The first element inside this head is a string giving the vector type, the second is a one-dimensional list of data, and the last is a container for attributes possibly attached to a vector, RAttributes.

Vectors can contain missing elements, represented by Missing[].

In[5]:= |

Out[5]= |

In[6]:= |

Out[6]= |

Multidimensional arrays can also be entered normally.

In[7]:= |

Out[7]= |

Dimensions of an array are stored in the "dim" attribute (which corresponds to how such arrays are handled in R).

In[8]:= |

Out[8]= |

One important difference to note here is that while the Wolfram Language stores arrays in row-major order (which is also how they are stored in C, for example), R stores them in column-major order (similar to Fortran). When an array is sent to R, it is converted to the column-major form. In the preceding example, this is why the data list inside the long form of the array is reshuffled relative to what you would get by calling Flatten on the array. When an array is sent back from R to the Wolfram Language, it is converted back to row-major order. This allows you to work with arrays consistently in the Wolfram Language and R. This topic is discussed in more detail in the documentation for REvaluate. For more examples of how R vectors are represented in *RLink*, see the RVector documentation.
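The reordering between the two storage conventions can be illustrated with a short, self-contained sketch. The Python functions below are hypothetical helpers written for this explanation (they are not part of *RLink*); they convert a rectangular nested list to column-major data plus dimensions, and back again.

```python
from itertools import product


def shape(a):
    """Dimensions of a non-empty rectangular nested list."""
    dims = []
    while isinstance(a, list):
        dims.append(len(a))
        a = a[0]
    return dims


def to_column_major(a):
    """Flatten a row-major nested list into column-major data plus dimensions."""
    dims = shape(a)
    data = []
    # Iterating over the reversed dimensions makes the first index vary fastest.
    for rev in product(*(range(d) for d in reversed(dims))):
        x = a
        for i in reversed(rev):
            x = x[i]
        data.append(x)
    return data, dims


def from_column_major(data, dims):
    """Rebuild the row-major nested list from column-major data and dimensions."""
    def build(prefix):
        if len(prefix) == len(dims):
            # Column-major offset: the first index has stride 1.
            offset, stride = 0, 1
            for i, d in zip(prefix, dims):
                offset += i * stride
                stride *= d
            return data[offset]
        return [build(prefix + (i,)) for i in range(dims[len(prefix)])]
    return build(())


matrix = [[1, 2, 3], [4, 5, 6]]
data, dims = to_column_major(matrix)   # data is [1, 4, 2, 5, 3, 6], dims is [2, 3]
restored = from_column_major(data, dims)
```

A plain row-major flattening of `matrix` would give `[1, 2, 3, 4, 5, 6]` instead, which is the difference described above; `restored` is equal to `matrix`, mirroring the conversion applied on the way back.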

### Lists

R lists are containers for more general, possibly heterogeneous, collections of R objects. Elements of R lists can be any R objects, including other R lists. In the context of the simplified R object model used by *RLink*, this means that R lists can contain R vectors, R NULL elements, other R lists, R function references, R environment objects, and other R objects represented by expressions with the head RCode.

First, load the package and install the R runtime.

In[1]:= |

Normally, you can enter an R list as a Wolfram Language list.

In[3]:= |

Out[3]= |

Any R list is represented by *RLink* internally as a Wolfram Language expression with the head RList. For the previous example, here is the long form.

In[4]:= |

Out[4]= |

As you can see, elements of this list were interpreted as length-1 R vectors.

There is one important case, however, when a list will be interpreted by *RLink* differently: as previously discussed, this is when its elements are all of the same basic type; in that case, the entered list is interpreted as a vector. This interpretation ambiguity will be addressed in more detail in "Type Detection Ambiguities". See the RList documentation page for more examples of how *RLink* treats R lists.

### Null

*RLink* represents the R NULL object internally as the Wolfram Language expression RNull[].

First, load the package and install the R runtime.

In[1]:= |

The Wolfram Language Null is interpreted as RNull[] as well.

In[3]:= |

Out[3]= |

This is true in both directions.

In[4]:= |

Out[4]//InputForm= | |

It may be worth mentioning for Wolfram Language users that an R NULL object plays a role in R somewhat similar to a combination of Null and Sequence[] in the Wolfram Language. In particular, setting an element of an R list to NULL in R will effectively shrink the list, just as Sequence[] would in the Wolfram Language. However, in other instances NULL is used in R in ways similar to Null in the Wolfram Language.

### Attributes of R Objects

Any object in R may have one or more attributes. An attribute is a key-value pair, where the key is a string (name of the attribute), while the value can be any R object. Attributes themselves are stored in an R list, linked to a given object.

Attributes play an important role in R. In particular, for matrices and multidimensional arrays, the attribute "dim" stores the dimensions of a given array. For any R object, the attribute "class" (when present) stores the information about the class of which this object is an instance. In both these examples, great flexibility is achieved because attributes can be changed dynamically. This means that you can perform complex array reshuffling quite easily by simply manipulating the "dim" attribute, and you can change the class of a given object at run time, something not possible in most OO languages.
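To see why changing "dim" alone is enough to reshape an array, consider the following sketch. It models an R object in Python as a dictionary holding column-major data and an attribute table; the dictionary layout and the `as_rows` helper are assumptions made for this illustration and do not correspond to any *RLink* function.

```python
def as_rows(obj):
    """View a two-dimensional object (column-major data plus "dim") as a list of rows."""
    nrow, ncol = obj["attributes"]["dim"]
    data = obj["data"]
    return [[data[i + nrow * j] for j in range(ncol)] for i in range(nrow)]


x = {"data": [1, 2, 3, 4, 5, 6], "attributes": {"dim": [2, 3]}}
two_by_three = as_rows(x)             # [[1, 3, 5], [2, 4, 6]]

# Reshape by touching only the attribute; the data itself is unchanged.
x["attributes"]["dim"] = [3, 2]
three_by_two = as_rows(x)             # [[1, 4], [2, 5], [3, 6]]

# The class can be changed at run time in the same way.
x["attributes"]["class"] = "myClass"
```

The underlying data never moves; only the interpretation given by the attributes changes.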

*RLink* uses the head RAttributes as a container for the attributes of a given R object. Attributes themselves are entered as delayed rules, with the string lhs of the rule being a name of an attribute, and the rhs being the value. When your input represents objects that do not have explicit attributes (the "dim" attribute is inferred from the dimensions of an array and does not need to be added explicitly), you do not have to use RAttributes. However, internally, it is used in all cases.

First, load the package and install the R runtime.

In[1]:= |

For example, a simple vector has an empty set of attributes.

In[3]:= |

Out[3]= |

When you need to provide explicit attributes for an R object whose short form would otherwise be some expression data, you have to wrap data in an RObject head (container) and add an RAttributes container holding the attributes as the second element.

For example, you want to add an attribute "myAttr" with a value being another list of integers. Here is how.

In[4]:= |

Out[4]= |

Note that RObject is a container used for the short form of an R object. RObject never shows up in the long form, because any data handled by *RLink* that uses RObject will be a list, vector, or NULL, and in the long form will be represented by the heads RList, RVector, or RNull.

As an example, here is the long form of the preceding object.

In[5]:= |

Out[5]= |

As you see, the value of the attribute was itself transformed into the long form. Of course, the reverse transformation brings you back to the original object in its short form.

In[6]:= |

Out[6]= |

You can, if you like, use the long form in all your communications with R (through functions such as REvaluate and RSet), in which case you will never need RObject (which is the only non-system head used only for the short form representation of an R object).

As a more interesting example, consider conversion of a given integer vector into an R table, returning the latter to a Wolfram System session. This generates a list of random integers (an R vector).

In[7]:= |

Out[7]= |

This sends it to R, assigning it to a variable rnd in the R workspace.

In[8]:= |

Out[8]= |

This computes the frequencies of elements and returns a table object (*RLink* representation of it).

In[9]:= |

Out[9]= |

You can see that the list of attributes is non-empty, that RObject is used for the short representation of the result, and moreover that one of the attribute values is itself an R object with a non-empty set of attributes, also represented by an RObject head.

### Environments

R environments are a separate data type in R. They are the fundamental mechanism behind encapsulation: every R function is defined in a certain environment and has access to the variables defined in that environment. *RLink* currently has very limited support for environments. Every environment that explicitly appears as part of some R object is represented by *RLink* as REnvironment[], so the information about non-global environments is lost during the import to the Wolfram Language. Therefore, R objects referring to non-global environments cannot be exported back to R from the Wolfram Language. Closures, however, are fully supported through the mechanism of function references.

First, load the package and install the R runtime.

In[1]:= |

This will return the current environment (which is global).

In[3]:= |

Out[3]= |

It has the type "environment".

In[4]:= |

Out[4]= |

In[5]:= |

Out[5]= |

This will query the environment of the closure, which cannot be a global one.

In[6]:= |

Out[6]= |

Thus, the information about this environment is now gone.

To summarize, environments are a special data type in R, used mostly by the inner workings of R. Sometimes, however, they are referred to explicitly by certain R objects. To be able to import these objects into the Wolfram Language, *RLink* has a head REnvironment that is used to generically represent an environment object. However, it does not differentiate between environments, so objects explicitly referring to non-global R environments cannot be correctly exported back to R. Closures are the exception to this rule; they are handled by a different mechanism in *RLink*. More details on environments in *RLink* can be found in the reference page for REnvironment.

### R Code in String Form

*RLink* does not support all core data types present in R. However, most of those data types that it does not support are usually not used for anything by the user (or, are used in rather special circumstances), and are mostly needed for R itself. In any case, it is useful to be able to import into the Wolfram Language arbitrary R objects, whether or not they contain objects of unsupported data types as their elements. To do that, *RLink* uses the following strategy: when it sees an object of such an unsupported type, it constructs a string code representation of that object, so that when this string is parsed and evaluated on the R side (R functions parse and eval), the original R object gets reconstructed. The R function deparse is used to construct such a string. Not all R objects will be correctly reconstructed by this procedure (environments are one notable exception and cannot be deparsed), but most will. The resulting deparsed code string is returned to the Wolfram Language, wrapped in an RCode wrapper.
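The deparse, parse, and eval round trip has a close analogue in many languages. As a rough illustration only, the Python sketch below uses `repr` in place of R's deparse and `ast.literal_eval` in place of parse followed by eval; the `RCode` class and both helper functions are hypothetical names for this example and say nothing about how *RLink* is implemented.

```python
import ast


class RCode:
    """Hypothetical wrapper holding the source text of an otherwise unsupported object."""
    def __init__(self, source):
        self.source = source


def export_unsupported(obj):
    """Analogue of deparse: keep only a source string that describes the object."""
    return RCode(repr(obj))


def reimport(code):
    """Analogue of parse followed by eval: rebuild the object from its source string."""
    return ast.literal_eval(code.source)


value = {"a": (1, 2.5), "b": [True, None]}
wrapped = export_unsupported(value)
rebuilt = reimport(wrapped)           # equal to value

# Some objects have no source form that evaluates back to them; this plays
# the role that environments play in the text above.
opaque = export_unsupported(object())
try:
    reimport(opaque)
    round_trip_ok = True
except (SyntaxError, ValueError):
    round_trip_ok = False
```

As in the R case, the string form is a faithful stand-in for many objects but not for all of them, which is why reconstruction cannot be guaranteed in general.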

One particular example of this procedure at work is when you create function references.

First, load the package and install the R runtime.

In[1]:= |

For example, for the R built-in function rank (which is partly implemented in the R top-level code), you can obtain the reference.

In[2]:= |

Out[2]= |

You can now look at the FullForm of this reference.

In[3]:= |

Out[3]//FullForm= | |

What you see here is the code of the function wrapped in RCode, obtained through the deparsing procedure. You can extract this code in a more eye-friendly form.

In[4]:= |

Out[4]= |

You could, in principle, use that code to define a function manually in the R workspace.

One point to stress here is that R objects represented by RCode[code] are not generally guaranteed to identically reconstruct the original R objects when exported back to R, although in many cases they will. More details on such objects can be found in the reference page for RCode.

### Function References

Function references are *RLink*'s mechanism for representing R functions (both built-in and written in R) and for calling them with Wolfram Language arguments from within the Wolfram Language. Formally, they represent R objects of the types "builtin" and "closure". Both types are fully supported in *RLink*, in the sense that they can be retrieved from R, sent to R, used as parts of other R objects, and so on. There is a separate tutorial, "Functions", describing function references in detail; only a couple of examples are considered here for illustration.

First, load the package and install the R runtime.

In[1]:= |

This creates a function reference for a simple user-defined function.

In[3]:= |

Out[3]= |

In[4]:= |

Out[4]= |

You could have created a similar reference via REvaluate.

In[5]:= |

Out[5]= |

And then it could have been used.

In[6]:= |

Out[6]= |

There are some differences between the two references created previously, and the method based on RFunction is preferred, although the method based on REvaluate is more general (for example, REvaluate can return closures—functions that are returned by other functions as their results—while the method based on RFunction normally should not be used to create closures). This is described in more detail in "Functions".

You can also create function references for built-in (primitive) R functions.

In[7]:= |

Out[7]= |

In[8]:= |

Out[8]= |

All function references have the head RFunction.

In[9]:= |

Out[9]//FullForm= | |

Also, function references are not changed by ToRForm and FromRForm; their short and long forms coincide.

More details on function references can be found in the reference page for RFunction and in "Functions".

### Other (Non-core) Data Types

R has a powerful type system, and the types described here represent only a subset of the core R types. You may be wondering about some other types not covered here, such as factors and data frames. Since these (and other non-core) types are subtypes of some of the core types (for example, factors are integer vectors, and data frames are lists), *RLink* can work with these objects. However, always having to work with the most general form of them may not be convenient. To address this issue, *RLink* has a type extension system, which is described in "Extending *RLink* Type System by Defining Your Own Data Types". Also, *RLink* comes with very basic support for the factor and data frame data types, implemented using this type extension system. For other data types, this system provides the means for the user to add support without affecting the code of the core *RLink*.

### Type Detection Ambiguities, and How to Force a Given Data Interpretation

#### Vectors versus Lists

The single most important ambiguity in the way *RLink* interprets input data arises when you provide a (possibly nested) list of elements of the same basic type, which, in principle, can be interpreted both as an R vector and as an R list. The default behavior of *RLink* is then to pick the R vector interpretation. You have seen examples of this already in "Vectors".

Sometimes, however, you may wish to force the R list interpretation for such objects. You should really think carefully before doing so, since *RLink* is much less efficient with R lists than with R vectors, as discussed in "Performance—Tuning" in the *RLink* user guide. But, assuming that this is what you would like to do, here is how: you have to explicitly use the RList head, wrapping your data as RList[data,RAttributes[]].

First, load the package and install the R runtime.

In[1]:= |

For example, consider a list of integers.

In[3]:= |

Out[3]= |

This will be interpreted as a vector by default.

In[4]:= |

Out[4]= |

Here the R list interpretation is forced.

In[5]:= |

Out[5]= |

And now the input list is interpreted as an R list, whose elements are one-element vectors (since R does not have scalars of basic types, treating those as one-element vectors).

In[6]:= |

Out[6]= |

Similar ambiguities happen for multidimensional lists.

In[7]:= |

Out[7]= |

These are interpreted by default as a multidimensional array with one singleton dimension.

In[8]:= |

Out[8]= |

By using the same construct, you can force a list interpretation.

In[9]:= |

Out[9]= |

For such inputs, the composition of ToRForm and FromRForm will *not* give a result that is identical to the input, as it would in most other cases.

In[10]:= |

Out[10]= |
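The reason the round trip is not an identity for forced lists can be seen in a deliberately tiny model. The Python sketch below handles only integers and flat lists of integers; `ForcedList` is a hypothetical stand-in for wrapping the input as RList[data,RAttributes[]], and the tuple-based long forms are an assumption of this illustration rather than *RLink* output.

```python
class ForcedList:
    """Hypothetical marker: interpret these items as an R list, not as a vector."""
    def __init__(self, items):
        self.items = items


def to_long(short):
    if isinstance(short, ForcedList):
        # Each element is converted on its own, so scalars become 1-element vectors.
        return ("RList", [to_long(e) for e in short.items])
    if isinstance(short, int):
        return ("RVector", "integer", [short])
    if isinstance(short, list) and short and all(isinstance(e, int) for e in short):
        return ("RVector", "integer", list(short))
    raise ValueError("outside the scope of this sketch")


def from_long(long_form):
    if long_form[0] == "RVector":
        return list(long_form[2])
    return [from_long(e) for e in long_form[1]]


default_round_trip = from_long(to_long([1, 2, 3]))             # [1, 2, 3]
forced_round_trip = from_long(to_long(ForcedList([1, 2, 3])))  # [[1], [2], [3]]
```

The forced interpretation turns every scalar into a one-element vector on the way in, and nothing in the long form records that the original input was a flat list, so the way back yields nested one-element lists.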

Care must be taken only when sending the data to R, since expressions received from R will always be the same.

In[11]:= |

Out[11]= |

#### Scalars

Another ambiguity worth mentioning is that scalars of the basic types, when used as input data on the Wolfram Language side, are always interpreted as one-element vectors of the corresponding R vector type.

First, load the package and install the R runtime.

In[1]:= |

In[3]:= |

Out[3]= |

This is consistent with the R interpretation of such data, but has the side effect that when returned to the Wolfram Language, such scalars are wrapped in an extra List.

In[4]:= |

Out[4]= |

You should keep that in mind when working with *RLink*.

## Extending *RLink* Type System by Defining Your Own Data Types

The *RLink* type system is designed to be user extensible. This is important, since R itself is a very extensible language and system, and support for just the core types may not be enough to work conveniently with many extended R data types. This section explains how you can extend the core type system with new data types.

Note that the discussion in this section is somewhat more technical and less formal than in the rest of the *RLink* documentation. This section is aimed at more advanced users, since extending the type system is generally a more advanced task than simply using *RLink*, and the user who is extending the type system becomes involved, in a (purely technical) sense, in further development of *RLink* itself.

### Examples: Type Extensions Already Present in *RLink*—Data Frames and Factors

Apart from the end-user convenience, the type extension system is used by *RLink* itself to implement some R data types, such as factors and data frames. Since factors are actually integer vectors, and data frames are lists, neither one has to be among the core data types supported by *RLink*. However, the core R object representation based on the RObject head that *RLink* would generically provide for them will often be inconvenient, particularly if you would like to define or overload certain functions specifically on these types.

First, load the package and install the R runtime.

In[1]:= |

Now consider how it works. The following will construct a simple data frame.

In[3]:= |

Out[3]= |

Notice the heads RDataFrame, RNames, RData, RFactor, and RFactorLevels, which are not in the core *RLink* API. These are heads representing the API for data frames and factors, implemented via the *RLink* type extension system. They live in the contexts RLink`DataTypes`Base` and RLink`DataTypes`Common`.

In[4]:= |

Out[4]= |

They are defined in /Kernel/DataTypes/Base.m and /Kernel/DataTypes/Common.m packages within the *RLink* project, respectively.

The core *RLink* representation of the preceding data frame is easy to obtain: you can temporarily unregister the data frame and factor types.

In[5]:= |

Out[5]= |

Evaluating the same code, you obtain the core *RLink* representation.

In[6]:= |

Out[6]= |

The advantages of having the representation based on extra heads are several. One is stronger typing, since the type is then associated with a specific head. Another is the ability to define and/or overload a number of helper functions. Working always with RObject would require more complex patterns for such definitions, which would be slower to match and to get right in the first place. Besides, overloading some system functions would require adding rules to RObject (UpValues), which is both undesirable and prone to errors.

To illustrate, first you need to re-enable the definitions that were disabled previously with RDataTypeUnregister. The easiest way to do that is to call the RDataTypeDefinitionsReload function, which dynamically reloads all extended data types definitions *RLink* knows about.

In[7]:= |

Some functionality can now be illustrated. First, it is possible to extract some parts of the data frame. For example, this extracts the data from the data frame.

In[8]:= |

Out[8]= |

In[9]:= |

Out[9]= |

In[10]:= |

Out[10]= |

You can extract the factor(s) present in the data.

In[11]:= |

Out[11]= |

You can also extract data from a factor.

In[12]:= |

Out[12]= |

Note that RGetData itself does not carry any rules.

The rules are attached to the heads representing specific types (RDataFrame or RFactor here). This means that different data types can safely overload the same generic heads without being concerned about how they were overloaded by other types. This would not be possible had you worked with the core representation based on RObject only, since in that case, either the functions you overload or the RObject head itself would have to accumulate rules from various data types. This may not look like a big issue, but this is what determines whether or not the type system is truly extensible, since the prerequisite for extensibility is that two different users should be able to extend the system with two different new types and be guaranteed that these new types will work in concert without consulting each other's implementations.
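An analogy from object-oriented code may help here. In the Python sketch below, the generic selector `r_get_data` contains no type-specific logic; each type carries its own definition, so a new type can be added without touching the selector or the other types. All class and function names are hypothetical, and the values returned are chosen for illustration rather than taken from *RLink*.

```python
class RFactor:
    def __init__(self, codes, levels):
        self.codes = codes        # 1-based integer codes, as in R factors
        self.levels = levels

    def get_data(self):
        # The behavior for factors lives on the factor type itself.
        return [self.levels[c - 1] for c in self.codes]


class RDataFrame:
    def __init__(self, names, columns):
        self.names = names
        self.columns = columns

    def get_data(self):
        return self.columns


def r_get_data(obj):
    """Generic selector: it holds no rules of its own and defers to the type."""
    return obj.get_data()


class Temperature:
    """A type added later by a different author; nothing above needs to change."""
    def __init__(self, kelvin):
        self.kelvin = kelvin

    def get_data(self):
        return self.kelvin
```

Because the selector never accumulates per-type cases, two authors can extend the system independently, which is the property the preceding paragraph describes.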

Apart from selectors, you can also implement data transformations. For example, the following will convert factors to integer vectors in the previously defined data frame.

In[14]:= |

Out[14]= |

You can also define or overload various functions for display and representation of a given data type. Here is an example.

In[15]:= |

Out[15]//TableForm= | |

The newly defined data types become first-class citizens in the *RLink* type system, in the sense that they can be used in all high-level functions (RSet, RFunction, returned by REvaluate). For example, you can assign the preceding data frame to some other variable in R workspace.

In[16]:= |

Out[16]= |

You can now use it with any R code you like. For example, you can filter it to keep only the records for people older than 20.

In[17]:= |

Out[17]= |

You can also pass it as an argument to functions.

In[18]:= |

Out[18]= |

But most of all, you can add more functionality to your data type, without thinking about the rest of the system.

### Defining a Simple Data Type Interactively

This section explains how to define a new data type and make it known to *RLink*. There are two ways of doing this. You can execute the relevant code to register a new data type interactively (in the front end), in which case the definitions will be available to *RLink* only for the current *RLink* session. Alternatively, you can place those definitions into a .m file (package) and tell *RLink* where it is, in which case the definitions will be loaded when *RLink* starts (InstallR) and can be reloaded at any time with the RDataTypeDefinitionsReload function.

The plan now is to first illustrate an interactive way of registering a data type and then look at how to make these definitions persistent. To register a data type, you need to call the RDataTypeRegister function. As an example, definitions for a simple new type will now be constructed. The type identity will be conveyed by the "class" attribute, which is used in R to identify an object as an instance of one or another class.

First, load the package and install the R runtime.

In[1]:= |

This will register a very simple data type (wrapper) that wraps around some core R data type, such as a vector.

In[3]:= |

Note that RDataTypeRegister has five arguments. The first gives the name of the type (which is strongly recommended to be a string, although not strictly required), the second gives the "high-level pattern" that should be identified with instances of this type, the third gives the transformation rule to transform such high-level representation to the lower-level RObject-based representation, the fourth is the pattern that should match the representation based on RObject, and the last one is the "reverse rule" to convert the representation based on RObject to the higher-level one. The RInstanceOf function is a helper function that is defined in the RDataTypeTools.m package located in the /Kernel/DataTypes subfolder and tests whether an expression represents an instance of a given type, based on the value of the "class" attribute. There are a few other helper functions defined in that package that may be useful for defining your data types.

Note that while it is not required that the rules (and the dispatch mechanism) are necessarily tied to the value of the "class" attribute, it is strongly recommended to do it in this way, since this minimizes the chances of conflicting rules for different data types, and since it corresponds to the function dispatch mechanism of R.
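The five-part shape of a type definition (name, high-level pattern, forward rule, low-level pattern, reverse rule) can be modeled in a few lines. The following Python sketch is a simplified analogue and not the *RLink* API: `TypeRegistry`, `MyNewType`, and `instance_of` are hypothetical names, patterns are modeled as predicate functions, and the core representation is a plain dictionary. Refusing a second registration under the same name follows the tutorial's remark that new definitions are not registered until the old ones are unregistered; returning `False` in that case is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class MyNewType:
    """Hypothetical high-level wrapper around some core data."""
    data: list


def instance_of(expr, class_name):
    """Rough analogue of RInstanceOf: test the "class" attribute of a core object."""
    return isinstance(expr, dict) and expr.get("attributes", {}).get("class") == class_name


class TypeRegistry:
    def __init__(self):
        self._types = {}

    def register(self, name, is_high, to_core, is_core, from_core):
        # A definition is not accepted while one with the same name is registered.
        if name in self._types:
            return False
        self._types[name] = (is_high, to_core, is_core, from_core)
        return True

    def unregister(self, name):
        return self._types.pop(name, None) is not None

    def registered(self, name):
        return name in self._types

    def to_core(self, expr):
        """Apply the forward rule of the first type whose high-level pattern matches."""
        for is_high, to_core, _, _ in self._types.values():
            if is_high(expr):
                return to_core(expr)
        return expr

    def from_core(self, expr):
        """Apply the reverse rule of the first type whose core pattern matches."""
        for _, _, is_core, from_core in self._types.values():
            if is_core(expr):
                return from_core(expr)
        return expr


registry = TypeRegistry()
registry.register(
    "myNewType",                                                       # type name
    lambda e: isinstance(e, MyNewType),                                # high-level pattern
    lambda e: {"data": e.data, "attributes": {"class": "myNewType"}},  # forward rule
    lambda e: instance_of(e, "myNewType"),                             # core pattern
    lambda e: MyNewType(e["data"]),                                    # reverse rule
)
```

Keying both patterns on the "class" attribute, as recommended above, keeps the forward and reverse rules of different types from overlapping.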

You can test that the definition is now effective by using ToRForm.

In[4]:= |

Out[4]= |

You can now send your data to R, using the custom data container just defined.

In[5]:= |

Out[5]= |

When the result is returned from R, it is automatically converted according to the inverse conversion rule provided earlier in the call to RDataTypeRegister.

In[6]:= |

Out[6]= |

This will also work on derivative R objects obtained through manipulations with the original object that do not change its type (class in R).

In[7]:= |

Out[7]= |

You can use RDataTypeRegisteredQ to test whether or not the type is currently registered.

In[8]:= |

Out[8]= |

You can now unregister the data type.

In[9]:= |

Out[10]= |

The following will get back the usual representation based on RObject.

In[11]:= |

Out[11]= |

Because types can be registered and unregistered dynamically, you can develop your data type interactively. Just remember that a new definition is not registered until the old one has been unregistered.

In[12]:= |

### Making Persistent Data Type Definitions

This section explains how to make the definitions you register with *RLink* persistent, so that *RLink* finds them automatically and you do not have to execute them manually in every *RLink* session. To do that, store the definitions in a .m file (package) and tell *RLink* where it is located.

The same example as before is used. The first step is to create a file with the type definitions.

Before anything else, load the package and install the R runtime.

In[1]:= |

The following creates a temporary directory where the sample definitions for new data types will be stored.

In[3]:= |

Here is the string version of the sample data type code used in the reference page for the RDataTypeRegister function, wrapped in a package (namespace).

In[5]:= |

Export this to create a file with this definition.

In[6]:= |

Out[6]= |
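The file-creation steps above can be sketched as follows. This is only an outline under stated assumptions: the file name myNewType.m is hypothetical, and the package string is a placeholder skeleton rather than the actual sample definition.

```wolfram
(* Create a fresh temporary directory for the type-definition files *)
dir = CreateDirectory[];

(* Placeholder package text; the real string would contain the
   RDataTypeRegister call between BeginPackage and EndPackage *)
code = "BeginPackage[\"RLink`DataTypes`myNewType`\"];\n" <>
   "(* type definitions would go here *)\n" <>
   "EndPackage[];\n";

(* Write the string to a .m file; Export returns the file name *)
file = Export[FileNameJoin[{dir, "myNewType.m"}], code, "Text"];

FileExistsQ[file]
```

The directory stored in dir is the one that would later be added to the *RLink* search path.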

Now you are ready to reload the data type definitions, including the new file.

In[7]:= |

The "AddToRDataTypePath" option adds a list of directories containing files with type definitions to the *RLink* search path.

You can test that the definition is now effective by using ToRForm.

In[8]:= |

Out[8]= |

You can run all the other tests shown in the previous section to confirm that the definition is fully working. Unlike in the previous section, the MyNewType head now belongs to a specific context, namely RLink`DataTypes`myNewType`, rather than to the Global` context. In general, the context assigned to the heads describing your new types is up to you. You can even omit the BeginPackage/EndPackage pair, in which case these heads are created in the current working context (usually Global`).
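The effect of the package wrapper on the context of the head can be checked with a minimal skeleton. The usage message below is a stand-in added for illustration; the real package would also contain the type definition itself.

```wolfram
(* Symbols first mentioned between BeginPackage and EndPackage are
   created in the package context *)
BeginPackage["RLink`DataTypes`myNewType`"];
MyNewType::usage = "MyNewType[...] is the head of the sample container.";
EndPackage[];

(* Reports the context in which the head was created *)
Context[MyNewType]
```

Without the BeginPackage and EndPackage lines, the same symbol would be created in whatever context is current when the file is loaded.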

Of course, you can place more than one type definition in a single package (context), and this may often be a sensible thing to do—particularly when several data types are related in some way or use some common functionality, which can then be made private to that package.

InstallR also accepts the "AddToRDataTypePath" option, so passing it to InstallR at the start of the *RLink* session is enough to load your definitions, with no explicit call to RDataTypeDefinitionsReload. Because RDataTypeDefinitionsReload reloads type definitions dynamically, it is also useful for developing and debugging type definitions stored in files. Note, however, that a call to RDataTypeDefinitionsReload removes all interactively registered definitions; only definitions that persist on disk (in files) are loaded.
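A session start along these lines might look like the following sketch. It requires *RLink* and a working R runtime, the directory name is hypothetical, and the option value is assumed to be a list of directories, as described above.

```wolfram
Needs["RLink`"];

(* Hypothetical directory holding the .m files with type definitions *)
typeDir = FileNameJoin[{$TemporaryDirectory, "RLinkTypes"}];

(* Load the definitions found there while starting the R runtime *)
InstallR["AddToRDataTypePath" -> {typeDir}];

(* Later, after editing the files, reload them without restarting.
   This discards any interactively registered definitions. *)
RDataTypeDefinitionsReload["AddToRDataTypePath" -> {typeDir}];
```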

### More Examples

Two less trivial examples are the implementations of the factor and data frame data types discussed earlier, located at /Kernel/DataTypes/Base.m. These implementations currently contain only very basic functionality, but they still illustrate how this is done. For example, here is the current implementation of factors; see that package for more details.

    partWithMissing[expr_, inds_List] :=
      With[{posNA = Position[inds, Missing[]]},
        MapAt[Missing[] &, Part[expr, MapAt[1 &, inds, posNA]], posNA]];

    ClearAll[RFactor];

    RFactorQ[_RFactor] := True;
    RFactorQ[r_RObject] := RInstanceOf["factor"][r];
    RFactorQ[_] := False;

    RFactor /:
      RGetFactorLevels[RFactor[_List, RFactorLevels[levs__], a : _ : None]] := {levs};

    RFactor /: RGetData[RFactor[p_List, __]] := p;

    RFactor /:
      RGetAttributes[RFactor[_List, RFactorLevels[__], a : (_RAttributes | None) : None]] :=
        RGetAllAttributes[a];

    Clear[RFactorToVector];

    RFactorToVector[f_RFactor] :=
      With[{data = partWithMissing[RGetFactorLevels[f], RGetData[f]]},
        FromRForm @ ToRForm @
          RObject[data, RRemoveAttributes[RAttributes @@ RGetAttributes[f], {"class", "levels"}]]
      ];

    RFactorToVector[_] = $Failed;

    (* Register the type *)
    RDataTypeRegister["factor",
      RFactor[_List, RFactorLevels[__], a : (_RAttributes | None) : None],
      RFactor[p_List, RFactorLevels[levs__], a : (_RAttributes | None) : None] :>
        RObject[p, RAddAttributes[a, {"levels" :> {levs}, "class" :> "factor"}]],
      _RObject ? RFactorQ,
      RObject[p_List, a_RAttributes] ? RFactorQ :>
        RFactor[p,
          RFactorLevels @@ RExtractAttribute[a, "levels"],
          RRemoveAttributesComplete[a, {"levels", "class"}]
        ]
    ]
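The partWithMissing helper at the top of this listing is plain Wolfram Language and can be tried on its own. The sketch below copies its definition and applies it to made-up level names and integer codes to show how missing codes are carried through.

```wolfram
(* Copied from the listing above: Part that tolerates Missing[] indices.
   Missing indices are temporarily replaced by 1 so that Part succeeds,
   and the corresponding results are then overwritten with Missing[]. *)
partWithMissing[expr_, inds_List] :=
  With[{posNA = Position[inds, Missing[]]},
    MapAt[Missing[] &, Part[expr, MapAt[1 &, inds, posNA]], posNA]];

levels = {"low", "medium", "high"};   (* illustrative factor levels *)
codes = {3, 1, Missing[], 2};         (* illustrative integer codes *)

partWithMissing[levels, codes]
(* {"high", "low", Missing[], "medium"} *)
```

This is the step RFactorToVector relies on to turn a factor's integer codes back into the corresponding level values.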