R Data Types in RLink

R has a simple yet powerful type system. Being an interface between R and Mathematica, RLink implements a mapping between R types and Mathematica expressions. It is important to understand this mapping in some detail, in order to work with RLink effectively.

Simplified R Object Model

The following scheme illustrates the simplified object model of R, in the way it is used by RLink.

                |-- RCode
RObject     --> |-- RCoreObject + RAttributes
                |-- REnvironment

                |-- NULL
RCoreObject --> |-- RVector
                |-- RList
                |-- RFunction

RFunction   --> |-- builtin
                |-- closure

RVector     --> [(RNativeType|NA)..]

NA          --> Missing element, can be at any position in a vector

                |-- integer
                |-- double
RNativeType --> |-- complex
                |-- logical: TRUE|FALSE
                |-- character (string)

RList       --> {RObject..}

RAttributes --> RList

As you can see, within this model, any R object can be represented as an R vector, R list, R function, or R NULL object, and any R object can also have attributes, which are themselves stored in an R list. There are also objects of the types REnvironment and RCode, which represent R environments and generic R objects that do not have special support in RLink, respectively. These two types differ from the rest in that they are used only for representation on the Mathematica side, and you generally cannot correctly reconstruct R objects from objects of those types.

Long and Short Forms of Data Representation in RLink

There are two different but equivalent ways in which RLink allows R objects to be represented by Mathematica expressions. One such representation is as close to a standard Mathematica way of representing similar objects as possible, and it is this form you will likely work with most of the time. Another representation is an internal RLink representation, which is typically longer and harder to read, but is completely unambiguous and better suited for communication with R. The closest analogy here is that the shorter form acts like Mathematica's InputForm, while the longer form is similar to Mathematica's FullForm (this is in fact a pretty close analogy).

Apart from a few special cases, which are detailed here, the mapping between the two forms is unambiguous and is realized by the functions ToRForm (short to long form) and FromRForm (long to short form). In the short form, R objects can have, apart from the usual Mathematica heads (List, Mathematica atoms as described in "Atomic Objects", etc.), several special heads: RObject, RAttributes, RCode, REnvironment, and RFunction. The latter three heads are also present in the long form; in fact, they are not transformed in any way by ToRForm. The head RObject is a container that pairs the data of a given R object with its attributes (in cases where the set of attributes is non-empty), and the head RAttributes is the container that stores the attributes. In the long form, R objects are represented by three additional heads: RVector, RList, and RNull, plus RAttributes, as before. The head RObject never appears in the long form.

You have to load the package before you can work with it.
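
A minimal sketch of the loading step. Needs is the standard package-loading call; InstallR, which this tutorial mentions later as the function that starts the R runtime, is included so that the later examples have a running R session.

```mathematica
(* Load the RLink package into the current kernel session. *)
Needs["RLink`"]

(* Start the R runtime; this is needed before data can be exchanged with R. *)
InstallR[]
```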


A simple vector of integers can be represented in the short form.


Its long form is given by the following.


As another example, an integer matrix can be represented in the short form.


In the long form, it is given as follows.


You can use the functions ToRForm and FromRForm to convert one form to another.
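
The following sketch illustrates the conversions described in this section. The vector and matrix values are illustrative, not the ones from the original examples, and the outputs are not shown because they depend on the installed RLink version.

```mathematica
Needs["RLink`"]
InstallR[]

(* Short forms: ordinary Mathematica lists. *)
vec = {1, 2, 3};
mat = {{1, 2, 3}, {4, 5, 6}};

(* Short -> long: both results should carry the head RVector. *)
longVec = ToRForm[vec]
longMat = ToRForm[mat]

(* Long -> short: this should give back the original lists. *)
FromRForm[longVec]
FromRForm[longMat]
```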


Most of the time, you will not need the long form of R objects. It is, however, useful in some circumstances; in particular, sometimes you may want to check the interpretation of your short-form input by RLink.

For more examples of the short versus the long form of RLink expressions, see the reference pages for the functions ToRForm and FromRForm.

Automatic Type Detection

When you send some data to R through RLink, it tries to automatically detect the type of the data being sent. This is needed to map the data correctly to the type of R object where the data will be stored in your R session. What technically happens is that your input is transformed to a Mathematica expression giving the long internal form of it in RLink, as described in the previous section (thus, the type detection is a part of the ToRForm functionality). Then, RLink sends the data expressed in this internal form to your R session.

The type detection is based on the following set of rules:

1. Scalars (atomic elements) of the type String, Integer, Real, Complex, or True|False are interpreted as one-element vectors of the corresponding R type. The following table shows the correspondence between Mathematica and R basic types:

Mathematica type    R type
Integer             integer
Real                double
Complex             complex
String              character
True|False          logical

Type correspondence between Mathematica and R for vector types.

2. A Missing[] element, when found inside a (possibly nested) list representing an otherwise valid R vector, is interpreted as the R missing element NA.

3. A list or regular array (list of lists) of elements of the same basic type (String, Integer, Real, Complex, or True|False), possibly with some Missing[] elements, is considered an R vector. If it is a multidimensional array, the "dim" attribute is added to the resulting R object, storing the dimensions of the array.

4. Any other list of elements (including lists of elements of different types, or lists of non-atomic elements) is interpreted as an R list, provided that the elements themselves have a valid RLink interpretation (meaning that this type identification procedure is applied to them recursively).

5. Mathematica Null is interpreted as an R NULL object (represented by the head RNull).

6. Any data carrying explicit R attributes must be entered as RObject[data,RAttributes["name1":> value1,...]]. The type of such data is determined by the type of data. The values value1 etc. for the attributes must themselves have valid RLink interpretation (they can be any R objects supported by RLink).

7. Elements with the heads RCode, REnvironment, and RFunction are not transformed in any way (except for the attributes possibly present in them); in other words, their short and long forms coincide.

There are some ambiguities in the scheme just described. They are important enough to warrant a separate section, "Type Detection Ambiguities".

Any Mathematica expression that cannot be interpreted with these rules does not constitute a valid R object representation from RLink's viewpoint, and cannot be communicated to R via RLink. An attempt to call ToRForm on such an expression will result in an error ($Failed will be returned).
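
The rules above can be explored directly with ToRForm. This is a sketch with made-up inputs; someUndefinedSymbol is a hypothetical name used only to show an expression that none of the rules covers.

```mathematica
Needs["RLink`"]
InstallR[]

(* Rule 1: a scalar becomes a one-element vector. *)
ToRForm[42]

(* Rules 2 and 3: a same-type list with a Missing[] element is an R vector. *)
ToRForm[{1.5, Missing[], 2.5}]

(* Rule 3: a regular array is an R vector carrying a "dim" attribute. *)
ToRForm[{{1, 2}, {3, 4}}]

(* Rule 4: mixed content is an R list. *)
mixed = ToRForm[{1, "a", {True, False}}]

(* Rule 5: Null is the R NULL object. *)
ToRForm[Null]

(* No rule applies to a bare symbol, so this call is expected to fail. *)
ToRForm[someUndefinedSymbol]
```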

Vectors

R vectors are a core data type in R, combining collections of elements of the same basic types. The types supported by RLink are integer, double, complex, logical, and character. Note that multidimensional arrays are also represented in R by vectors, where the dimensions are specified via a special attribute "dim". On the Mathematica side, R vectors are represented as (possibly nested) lists, just in the usual way.

First, load the package and install the R runtime.


You can enter a vector of integers.


Its internal form is as follows.


The long form of an R vector will always have the head RVector. The first element inside this head is a string giving the vector type, the second is a one-dimensional list of data, and the last is a container for attributes possibly attached to a vector, RAttributes.

Vectors can contain missing elements, represented by Missing[].
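
A short sketch of missing elements in both directions. The values are illustrative; the R expression c(1L, NA, 3L) is ordinary R code for an integer vector with one missing element.

```mathematica
Needs["RLink`"]
InstallR[]

(* Mathematica -> R: Missing[] marks the NA position. *)
longNA = ToRForm[{1, Missing[], 3}]

(* R -> Mathematica: NA in an R vector comes back as Missing[]. *)
fromR = REvaluate["c(1L, NA, 3L)"]
```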


Multidimensional arrays can also be entered normally.


Dimensions of an array are stored in the "dim" attribute (which corresponds to how such arrays are handled in R).


One important difference to note here is that while Mathematica stores arrays in row-major order (which is also how they are stored in C, for example), R stores them in column-major order (similar to Fortran). When an array is sent to R, it is converted to column-major form. In the preceding example, this shows up in the data list inside the long form of the array, which is reshuffled with respect to what you would get by calling Flatten (for example) on the array. When an array is sent back from R to Mathematica, it is converted back to row-major order. This allows you to work with arrays consistently in Mathematica and R. This topic is discussed in more detail in the documentation for REvaluate. For more examples of how R vectors are represented in RLink, see the RVector documentation.
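
The difference between the two storage orders can be seen in plain Mathematica, without RLink. This sketch only illustrates the reordering for a matrix; it is not RLink's internal conversion code, and the matrix values are made up.

```mathematica
(* A 2 x 3 matrix. *)
m = {{1, 2, 3}, {4, 5, 6}};

(* Row-major flattening, the native Mathematica order. *)
rowMajor = Flatten[m]                          (* {1, 2, 3, 4, 5, 6} *)

(* Column-major flattening, the order R uses for the same matrix. *)
colMajor = Flatten[Transpose[m]]               (* {1, 4, 2, 5, 3, 6} *)

(* Rebuild the matrix from column-major data and its 2 rows. *)
restored = Transpose[Partition[colMajor, 2]]   (* {{1, 2, 3}, {4, 5, 6}} *)
```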

Lists

R lists are containers for more general, possibly heterogeneous, collections of R objects. Elements of R lists can be any R objects, including other R lists. In the context of the simplified R object model used by RLink, this means that R lists can contain R vectors, R NULL elements, other R lists, R function references, R environment objects, and other R objects represented by expressions with the head RCode.

Normally, you can enter an R list as a Mathematica list.


Any R list is represented by RLink internally as a Mathematica expression with the head RList. For the previous example, here is the long form.


As you can see, elements of this list were interpreted as length-1 R vectors.
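
A sketch of the kind of input described above, using made-up elements of different basic types so that the vector interpretation does not apply.

```mathematica
Needs["RLink`"]
InstallR[]

(* Elements of different basic types cannot form an R vector. *)
lst = {1, "two", 3.5};

(* Long form: an RList whose elements are one-element vectors. *)
longList = ToRForm[lst]
```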

There is one important case, however, when a list will be interpreted by RLink differently: as previously discussed, this is when its elements are all of the same basic type; in that case, the entered list is interpreted as a vector. This interpretation ambiguity will be addressed in more detail in "Type Detection Ambiguities". See the RList documentation page for more examples of how RLink treats R lists.

Null

RLink represents the R NULL object internally as the Mathematica expression RNull[]. Mathematica Null is interpreted as RNull[] as well.


This is true in both directions.
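
The mapping in both directions can be sketched as follows; the R expression "NULL" is ordinary R code that evaluates to the NULL object.

```mathematica
Needs["RLink`"]
InstallR[]

(* Mathematica Null -> long form RNull[]. *)
ToRForm[Null]

(* Long form -> short form. *)
FromRForm[RNull[]]

(* R NULL returned to Mathematica. *)
REvaluate["NULL"]
```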


It may be worth mentioning for Mathematica users that an R NULL object plays a role in R somewhat similar to a combination of Null and Sequence[] in Mathematica. In particular, setting an element of an R list to NULL in R will effectively shrink the list, just as Sequence[] would in Mathematica. However, in other instances NULL is used in R in ways similar to Null in Mathematica.
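
A small sketch of the list-shrinking behavior. The R code is standard R; the variable name lst is arbitrary, and the multi-statement block in braces is simply evaluated as one R expression.

```mathematica
Needs["RLink`"]
InstallR[]

(* In R, assigning NULL to a list element removes that element. *)
REvaluate["{
  lst <- list(1, 2, 3)
  lst[[2]] <- NULL
  length(lst)
}"]

(* The Mathematica analogue: Sequence[] splices away inside a list. *)
Length[{1, Sequence[], 3}]   (* 2 *)
```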

Attributes of R Objects

Any object in R may have one or more attributes. An attribute is a key-value pair, where the key is a string (name of the attribute), while the value can be any R object. Attributes themselves are stored in an R list, linked to a given object.

Attributes play an important role in R. In particular, for matrices and multidimensional arrays, the attribute "dim" stores the dimensions of a given array. For any R object, the attribute "class" (when present) identifies the class of which the object is an instance. In both cases, great flexibility is achieved because attributes can be changed dynamically. This means that you can perform complex array reshuffling quite easily by simply manipulating the "dim" attribute, and you can change the class of a given object at run time, something that is not possible in most object-oriented languages.

RLink uses the head RAttributes as a container for the attributes of a given R object. Attributes themselves are entered as delayed rules, with the string lhs of the rule being a name of an attribute, and the rhs being the value. When your input represents objects that do not have explicit attributes (the "dim" attribute is inferred from the dimensions of an array and does not need to be added explicitly), you do not have to use RAttributes. However, internally, it is used in all cases.

For example, a simple vector has an empty set of attributes.


When you need to provide explicit attributes for an R object that is otherwise represented in the short form by some expression data, you have to wrap the RObject head (container) around data and add an RAttributes container with the attributes as the second element.

For example, you want to add an attribute "myAttr" with a value being another list of integers. Here is how.


Note that RObject is a container used for the short form of an R object. RObject never shows up in the long form, because any data handled by RLink that uses RObject will be a list, vector, or NULL, and in the long form will be represented by the heads RList, RVector, or RNull.

As an example, here is the long form of the preceding object.


As you see, the value of the attribute was itself transformed into the long form. Of course, the reverse transformation brings you back to the original object in its short form.
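
The steps above can be sketched as follows. The data and the attribute value are illustrative; only the attribute name "myAttr" comes from the text.

```mathematica
Needs["RLink`"]
InstallR[]

(* Short form: data wrapped in RObject, attributes given as delayed rules. *)
obj = RObject[{1, 2, 3}, RAttributes["myAttr" :> {4, 5, 6}]];

(* Long form: RObject disappears and the attribute value is converted too. *)
long = ToRForm[obj]

(* Converting back should reproduce the original short form. *)
FromRForm[long]
```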


You can, if you like, use the long form in all your communications with R (through functions such as REvaluate and RSet), in which case you will never need RObject (which is the only non-system head used only for the short form representation of an R object).

As a more interesting example, consider conversion of a given integer vector into an R table, returning the latter to a Mathematica session. This generates a list of random integers (an R vector).


This sends it to R, assigning it to a variable rnd in the R workspace.


This computes the frequencies of elements and returns a table object (RLink representation of it).


You can see that the list of attributes is non-empty, RObject is used for a short representation of the result, and moreover, one of the attribute values is itself an R object with a non-empty set of attributes, also represented by RObject head.
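
A sketch of the three steps just described. The range and length of the random data are arbitrary choices; the variable name rnd is the one used in the text, and table is the standard R function for frequency counts.

```mathematica
Needs["RLink`"]
InstallR[]

(* Random integers between 1 and 5. *)
data = RandomInteger[{1, 5}, 20];

(* Assign the vector to the variable rnd in the R workspace. *)
RSet["rnd", data];

(* Count the occurrences of each value; the result is an R table object. *)
tbl = REvaluate["table(rnd)"]

(* The attributes of the table travel inside an RObject wrapper. *)
Head[tbl]
```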

Environments

R environments are a separate data type in R. They are used as a fundamental mechanism behind encapsulation. Every R function is defined in a certain environment and has access to variables defined in that environment. RLink currently has very limited support for environments. Basically, every environment explicitly appearing as a part of some R object is represented by RLink as REnvironment[], meaning that the information about non-global environments is lost during the import to Mathematica. Therefore, R objects referring to non-global environments cannot be exported back to R from Mathematica. Closures, however, are fully supported through the mechanism of function references.

This will return the current environment (which is global).


It has the type "environment".


This will create a closure.


This will query the environment of the closure, which cannot be a global one.


Thus, the information about this environment is now gone.
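
The sequence above might look like the following sketch. The R calls globalenv, typeof, and environment are standard R; makeAdder is a hypothetical function name, and the exact form of the returned REnvironment expressions is not shown because it is not spelled out here.

```mathematica
Needs["RLink`"]
InstallR[]

(* The top-level (global) environment of the R session. *)
REvaluate["globalenv()"]
envType = REvaluate["typeof(globalenv())"]

(* A closure: the inner function captures x from the enclosing call. *)
REvaluate["makeAdder <- function(x) function(y) x + y"];

(* The environment of the returned closure is not the global one. *)
closureEnv = REvaluate["environment(makeAdder(1))"]
```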

To summarize, environments are a special data type in R, used mostly by the inner workings of R. Sometimes, however, they are referred to explicitly by certain R objects. To be able to import these objects into Mathematica, RLink has a head REnvironment that is used to generically represent an environment object. However, it does not differentiate between environments, so objects explicitly referring to some non-global R environments cannot be correctly exported back to R. Closures are the exception to this rule, since they are handled by a different mechanism in RLink. More details on environments in RLink can be found in the reference page for REnvironment.

R Code in String Form

RLink does not support all core data types present in R. However, most of those data types that it does not support are usually not used for anything by the user (or, are used in rather special circumstances), and are mostly needed for R itself. In any case, it is useful to be able to import into Mathematica arbitrary R objects, whether or not they contain objects of unsupported data types as their elements. To do that, RLink uses the following strategy: when it sees an object of such an unsupported type, it constructs a string code representation of that object, so that when this string is parsed and evaluated on the R side (R functions parse and eval), the original R object gets reconstructed. The R function deparse is used to construct such a string. Not all R objects will be correctly reconstructed by this procedure (environments are one notable exception and cannot be deparsed), but most will. The resulting deparsed code string is returned to Mathematica, wrapped in an RCode wrapper.
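
The deparse, parse, and eval round trip that this strategy relies on can be tried on the R side through REvaluate. This sketch uses an ordinary numeric vector only to show the mechanism; RLink applies it to objects of unsupported types.

```mathematica
Needs["RLink`"]
InstallR[]

(* deparse turns an R object into a string of R code that recreates it. *)
REvaluate["deparse(c(1.5, 2.5))"]

(* parse and eval turn that code string back into the object. *)
roundTrip = REvaluate["eval(parse(text = deparse(c(1.5, 2.5))))"]
```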

One particular example of this procedure at work is when you create function references. For example, for the R built-in function rank (which is partly implemented in the R top-level code), you can obtain the reference.


You can now look at the FullForm of this reference.


What you see here is the code of the function wrapped in RCode, obtained through the deparsing procedure. You can extract this code in a more readable form.


You could, in principle, use that code to define a function manually in the R workspace.

One point to stress here is that R objects represented by RCode[code] are not generally guaranteed to identically reconstruct the original R objects when exported back to R, although in many cases they will. More details on such objects can be found in the reference page for RCode.

Function References

Function references are RLink's mechanism to represent R functions (both built-in and written in R) and enable you to call them with Mathematica arguments, from within Mathematica. Formally, they represent in RLink R objects of the types "builtin" and "closure". Both types are fully supported in RLink, in the sense that they can be retrieved from R, sent to R, and used as parts of other R objects, etc. There is a separate tutorial, "Functions", describing function references in detail; here only a couple of examples will be considered for an illustration.

This creates a function reference for a simple user-defined function.


It can now be used.


You could have created a similar reference via REvaluate.


And then it could have been used.


There are some differences between the two references created previously, and the method based on RFunction is preferred, although the method based on REvaluate is more general (for example, REvaluate can return closures—functions that are returned by other functions as their results—while the method based on RFunction normally should not be used to create closures). This is described in more detail in "Functions".

You can also create function references for built-in (primitive) R functions.


Those can be used as well.


All function references have the head RFunction.
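
A sketch of the three ways of obtaining a function reference mentioned above. The squaring function and the choice of sum as the built-in are illustrative assumptions, as is passing a bare function name to RFunction; outputs are omitted.

```mathematica
Needs["RLink`"]
InstallR[]

(* Reference to a user-defined R function, created from its source code. *)
sq = RFunction["function(x) x^2"];
sq[{1, 2, 3}]

(* A similar reference obtained by evaluating the definition in R. *)
sq2 = REvaluate["function(x) x^2"];
sq2[{1, 2, 3}]

(* Reference to a built-in R function. *)
rsum = RFunction["sum"];
rsum[{1, 2, 3}]

(* All three references should share the same head. *)
Head /@ {sq, sq2, rsum}
```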


Also, they are not changed under the action of ToRForm and FromRForm.


More details on function references can be found in the reference page for RFunction and in "Functions".

Other (Non-core) Data Types

R has a powerful type system, and the types described here represent only a subset of the core R types. You may be wondering about some other types not covered here, such as factors and data frames. Since these (and other non-core) types are subtypes of some of the core types (e.g. factors are integer vectors, and data frames are lists), RLink can work with these objects. However, having to always work with the most general form of them may not be convenient. To address this issue, RLink has a type extension system, which is described in "Extending RLink Type System by Defining Your Own Data Types". RLink also comes with very basic support for the factor and data frame data types, implemented using this type extension system. For other data types, this system provides the means for the user to add support without affecting the code of the core RLink.

Type Detection Ambiguities, and How to Force a Given Data Interpretation

Vectors versus Lists

The single most important ambiguity in the way RLink interprets the input data is when you provide a (possibly nested) list of elements of the same basic types, which, in principle, can be interpreted both as an R vector and as an R list. The default behavior of RLink is then to pick the R vector interpretation. You have seen examples of this already in "Vectors".

Sometimes, however, you may wish to force the R list interpretation for such objects. You should really think carefully before doing so, since RLink is much less efficient with R lists than with R vectors, as discussed in "Performance—Tuning" in the RLink user guide. But, assuming that this is what you would like to do, here is how: you have to explicitly use the RList head, wrapping your data as RList[data, RAttributes[]].

For example, consider a list of integers.


This will be interpreted as a vector by default.


Here the R list interpretation is forced.


And now the input list is interpreted as an R list, whose elements are one-element vectors (since R does not have scalars of basic types, treating those as one-element vectors).
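
The default and the forced interpretation can be compared with a sketch like this. The list values and the R variable names asVector and asList are made up; typeof is the standard R function for inspecting the storage type.

```mathematica
Needs["RLink`"]
InstallR[]

ints = {1, 2, 3};

(* Default: a list of same-type atoms becomes an R vector. *)
asVec = ToRForm[ints]

(* Forced: wrapping in RList requests the R list interpretation. *)
forced = RList[ints, RAttributes[]];
asLst = ToRForm[forced]

(* Check on the R side how the two inputs were stored. *)
RSet["asVector", ints];
RSet["asList", forced];
REvaluate["c(typeof(asVector), typeof(asList))"]
```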


Similar ambiguities happen for multidimensional lists.


These are interpreted by default as a multidimensional array with one singleton dimension.


By using the same construct, you can force a list interpretation.


For such inputs, the composition of ToRForm and FromRForm will not give a result that is identical to the input, as it would in most other cases.


Care must be taken only when sending the data to R, since expressions received from R will always be the same.


Scalars

Another ambiguity worth mentioning is that scalars of the basic types, when used as input data on the Mathematica side, are always interpreted as one-element vectors of the corresponding R vector type.


This is consistent with the R interpretation of such data, but has a side effect that when returned back to Mathematica, such scalars are wrapped in an extra List.
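
A sketch of the scalar round trip, with an arbitrary value and the hypothetical R variable name x. Unwrapping with First is one simple way to recover a bare scalar when you know the result has exactly one element.

```mathematica
Needs["RLink`"]
InstallR[]

(* A scalar is sent to R as a one-element vector. *)
ToRForm[3.14]
RSet["x", 3.14];

(* On the way back, the value arrives wrapped in a list. *)
result = REvaluate["x"]

(* Unwrap explicitly when a bare scalar is needed. *)
First[result]
```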


You should keep that in mind when working with RLink.

Extending RLink Type System by Defining Your Own Data Types

The RLink type system is designed to be user extensible. This is important, since R itself is a very extensible language/system, and support for just the core types may not be enough to work conveniently with the many extended R data types. This section explains how you can extend the core type system with new data types.

Note that the discussion in this section is somewhat more technical and less formal than in the rest of the RLink documentation. This section is aimed at more advanced users, since extending the type system is generally a more advanced task than simply using RLink, and the user who is extending the type system becomes involved, in a (purely technical) sense, in further development of RLink itself.

Examples: Type Extensions Already Present in RLink—Data Frames and Factors

Apart from the end-user convenience, the type extension system is used by RLink itself to implement some R data types, such as factors and data frames. Since factors are actually integer vectors, and data frames are lists, neither one has to be among the core data types supported by RLink. However, the core R object representation based on the RObject head that RLink would generically provide for them will often be inconvenient, particularly if you would like to define or overload certain functions specifically on these types.

First consider how it works. The following will construct a simple data frame.


You can notice the heads RDataFrame, RNames, RData, RFactor, and RFactorLevels, which are not in the core RLink API. These are heads representing the API for data frames and factors, implemented via the RLink type extension system. They live in the contexts RLink`DataTypes`Base` and RLink`DataTypes`Common`.


They are defined in /Kernel/DataTypes/Base.m and /Kernel/DataTypes/Common.m packages within the RLink project, respectively.

The core RLink representation of the preceding data frame is easy to obtain: you can temporarily unregister the data frame and factor types.


Evaluating the same code, you obtain the core RLink representation.


The advantages of having the representation based on extra heads are several. One is stronger typing, since the type is then associated with a specific head. Another is the ability to define and/or overload a number of helper functions. Working always with RObject would require more complex patterns for such definitions, which would be slower to match and to get right in the first place. Besides, overloading some system functions would require adding rules to RObject (UpValues), which is both undesirable and prone to errors.

To illustrate, first you need to re-enable the definitions that were disabled previously with RDataTypeUnregister. The easiest way to do that is to call the RDataTypeDefinitionsReload function, which dynamically reloads all extended data types definitions RLink knows about.


Some functionality can now be illustrated. First, it is possible to extract some parts of the data frame. For example, this extracts the data from the data frame.


This extracts the names.


This extracts the row names.


You can extract the factor(s) present in the data.


You can also extract data from a factor.


Note that RGetData itself does not carry any rules.


The rules are attached to the heads representing specific types (RDataFrame or RFactor here). This means that different data types can safely overload the same generic heads without being concerned about how they were overloaded by other types. This would not be possible had you worked with the core representation based on RObject only, since in that case, either the functions you overload or the RObject head itself would have to accumulate rules from various data types. This may not look like a big issue, but this is what determines whether or not the type system is truly extensible, since the prerequisite for extensibility is that two different users should be able to extend the system with two different new types and be guaranteed that these new types will work in concert without consulting each other's implementations.

Apart from selectors, you can also implement data transformations. For example, the following will convert factors to integer vectors in the previously defined data frame.


You can also define or overload various functions for display and representation of a given data type. Here is an example.


The newly defined data types become first-class citizens in the RLink type system, in the sense that they can be used in all high-level functions (RSet, RFunction, returned by REvaluate). For example, you can assign the preceding data frame to some other variable in R workspace.


You can now use it with any R code you like. For example, you can select only the records for people older than 20.
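
A sketch of this workflow with a made-up data frame. The column names, the values, and the R variable name people are all hypothetical; data.frame and the row-filtering syntax are standard R.

```mathematica
Needs["RLink`"]
InstallR[]

(* A small data frame built in R. *)
df = REvaluate[
   "data.frame(name = c('Ann', 'Bob', 'Eve'), age = c(25, 18, 31))"];

(* Send it back to R under a new variable name. *)
RSet["people", df];

(* Keep only the rows with age above 20. *)
REvaluate["people[people$age > 20, ]"]
```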


You can also pass it as an argument to functions.


But most of all, you can add more functionality to your data type, without thinking about the rest of the system.

Defining a Simple Data Type Interactively

This section explains how to define a new data type and make RLink aware of it. There are two ways of doing this. You can execute the relevant code to register a new data type interactively (in the front end), in which case the definitions will be available to RLink only for the current RLink session. Alternatively, you can place those definitions into a .m file (package) and tell RLink where it is, in which case the definitions will be loaded when RLink starts (InstallR) and can be reloaded at any time with the RDataTypeDefinitionsReload function.

The plan now is to first illustrate an interactive way of registering a data type and then look at how to make these definitions persistent. To register a data type, you need to call the RDataTypeRegister function. As an example, definitions for a simple new type will now be constructed. The type identity will be conveyed by the "class" attribute, which is used in R to identify an object as an instance of one or another class.

This will register a very simple data type (wrapper) that wraps around some core R data type, such as a vector.


Note that RDataTypeRegister has five arguments. The first gives the name of the type (which is strongly recommended to be a string, although not strictly required), the second gives the "high-level pattern" that should be identified with instances of this type, the third gives the transformation rule to transform such high-level representation to the lower-level RObject-based representation, the fourth is the pattern that should match the representation based on RObject, and the last one is the "reverse rule" to convert the representation based on RObject to the higher-level one. The RInstanceOf function is a helper function that is defined in the RDataTypeTools.m package located in the /Kernel/DataTypes subfolder and tests whether an expression represents an instance of a given type, based on the value of the "class" attribute. There are a few other helper functions defined in that package that may be useful for defining your data types.

Note that while it is not required that the rules (and the dispatch mechanism) are necessarily tied to the value of the "class" attribute, it is strongly recommended to do it in this way, since this minimizes the chances of conflicting rules for different data types, and since it corresponds to the function dispatch mechanism of R.
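
The five arguments described above could be filled in along the following lines. This is a sketch, not the original example: MyNewType and "myNewType" are hypothetical names, and the fourth argument uses an explicit pattern on the "class" attribute instead of the RInstanceOf helper, whose exact calling convention is not given here.

```mathematica
Needs["RLink`"]
InstallR[]

RDataTypeRegister[
  (* 1: name of the type *)
  "myNewType",
  (* 2: high-level pattern identifying instances *)
  MyNewType[_],
  (* 3: rule from the high-level form to the RObject-based form *)
  MyNewType[data_] :>
    RObject[data, RAttributes["class" :> {"myNewType"}]],
  (* 4: pattern matching the RObject-based form *)
  RObject[_, RAttributes["class" :> {"myNewType"}]],
  (* 5: reverse rule back to the high-level form *)
  RObject[data_, RAttributes["class" :> {"myNewType"}]] :>
    MyNewType[data]
]

(* Check the forward conversion, then send an instance to R and read it back. *)
ToRForm[MyNewType[{1, 2, 3}]]
RSet["obj", MyNewType[{1, 2, 3}]];
REvaluate["obj"]
```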

You can test that the definition is now effective by using ToRForm.


You can now send your data to R, using the custom data container just defined.


The result is automatically converted according to the inverse conversion rule provided in the call to RDataTypeRegister previously, when returned back from R.


This will also work on derivative R objects obtained through manipulations with the original object that do not change its type (class in R).


You can use RDataTypeRegisteredQ to test whether or not the type is currently registered.


You can now unregister the data type.


The following will get back the usual representation based on RObject.


Since new types can be registered and unregistered dynamically, you can develop your data type interactively. It is just important to remember that the new definitions will not be registered until the old ones get unregistered.
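
During interactive development, one way to respect this ordering is a small wrapper that unregisters an existing definition before registering the new one. The helper reRegister is hypothetical and not part of RLink; the type definition passed to it is the same illustrative "class"-based wrapper with made-up names.

```mathematica
Needs["RLink`"]
InstallR[]

(* Hypothetical helper: drop the old registration, then register again. *)
reRegister[name_String, args__] := (
  If[RDataTypeRegisteredQ[name], RDataTypeUnregister[name]];
  RDataTypeRegister[name, args]
)

(* Safe to evaluate repeatedly while the definition is being refined. *)
reRegister["myNewType",
  MyNewType[_],
  MyNewType[data_] :> RObject[data, RAttributes["class" :> {"myNewType"}]],
  RObject[_, RAttributes["class" :> {"myNewType"}]],
  RObject[data_, RAttributes["class" :> {"myNewType"}]] :> MyNewType[data]
]

RDataTypeRegisteredQ["myNewType"]
```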


Making Persistent Data Type Definitions

This section explains how to make the definitions you register with RLink become persistent, so that RLink can find them automatically, and you do not have to execute them manually in every RLink session. The way to do that is to store those definitions in a .m file (package) and let RLink know about its location.

The same example as before will be used. First, you have to create a file with the type definitions. The following creates a temporary directory where the sample definitions for new data types will be stored.

Here is the string version of the sample data type code used in the reference page for the RDataTypeRegister function, wrapped in a package (namespace).

Export this to create a file with this definition.
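
The three steps just described (create a directory, hold the package code as a string, export it) might look roughly like the following sketch. The directory name RLinkTypes, the file name myNewType.m, and the list of needed contexts are assumptions made for illustration, and the registration call itself is left as a placeholder rather than reproduced.

```mathematica
(* Hypothetical directory for the type definition files *)
dir = FileNameJoin[{$TemporaryDirectory, "RLinkTypes"}];
If[! DirectoryQ[dir], CreateDirectory[dir]];

(* Package skeleton held as a string; the registration call is elided *)
code = "
BeginPackage[\"RLink`DataTypes`myNewType`\", {\"RLink`\"}]

MyNewType

Begin[\"`Private`\"]

(* the RDataTypeRegister call for this type goes here *)

End[]
EndPackage[]
";

(* Write the string to a .m file in that directory *)
file = FileNameJoin[{dir, "myNewType.m"}];
Export[file, code, "String"]
```

Keeping the code as a string and exporting it with the "String" format writes the text as is, so the resulting file is an ordinary package that can also be edited by hand later.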

Now you are ready to reload the data type definitions, including the new file.

The option "AddToRDataTypePath" is used to add a list of directories where files with type definitions reside to the RLink search path.
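
As a sketch, a reload that adds one directory to the search path could be written as follows. The directory is the hypothetical RLinkTypes folder under $TemporaryDirectory, and passing it wrapped in a list is an assumption based on the description of the option above.

```mathematica
Needs["RLink`"]
InstallR[]

(* Hypothetical directory that holds the type definition file(s) *)
dir = FileNameJoin[{$TemporaryDirectory, "RLinkTypes"}];

(* Reload the type definitions, also searching the added directory *)
RDataTypeDefinitionsReload["AddToRDataTypePath" -> {dir}]
```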

You can test that the definition is now effective by using ToRForm.

You can run all the other tests shown in the previous section to confirm that the definition is fully working. In contrast to the example in the previous section, the MyNewType head now belongs to a specific context, namely RLink`DataTypes`myNewType`, rather than to the Global` context. In general, the context assigned to the heads describing your new types is up to you. You can even omit the BeginPackage/EndPackage wrapper, in which case these heads are created in the current working context (usually Global`).

Of course, you can place more than one type definition in a single package (context), and this may often be a sensible thing to do—particularly when several data types are related in some way or use some common functionality, which can then be made private to that package.

InstallR also takes the "AddToRDataTypePath" option, so passing this option to InstallR (at the start of the RLink session) is enough to have your definitions loaded, and you will not need explicit calls to RDataTypeDefinitionsReload. Also, RDataTypeDefinitionsReload reloads type definitions dynamically and can be very useful for development and/or debugging of your external type definitions stored in files. However, after the call to RDataTypeDefinitionsReload, all interactively registered definitions will be removed—only those definitions that persist on disk (in files) will be loaded.
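
For a fresh session, the same option can be given to InstallR directly, as in this short sketch. The directory is again the hypothetical RLinkTypes folder, and wrapping it in a list is an assumption.

```mathematica
Needs["RLink`"]

(* Hypothetical directory that holds the type definition file(s) *)
dir = FileNameJoin[{$TemporaryDirectory, "RLinkTypes"}];

(* Start RLink and load the type definitions found in dir *)
InstallR["AddToRDataTypePath" -> {dir}]
```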

More Examples

Two less trivial examples are the already discussed implementations of the factor and data frame data types, located in /Kernel/DataTypes/Base.m. These implementations currently provide only basic functionality, but they still illustrate how a data type is defined. The current implementation of factors is shown here; refer to that package for more details.

(* Like Part, but positions given as Missing[] yield Missing[] in the result *)
partWithMissing[expr_, inds_List] :=
    With[{posNA = Position[inds, Missing[]]},
        MapAt[Missing[] &, Part[expr, MapAt[1 &, inds, posNA]], posNA]
    ];

(* Predicate: does the expression represent a factor, in either form? *)
ClearAll[RFactor];
RFactorQ[_RFactor] := True;
RFactorQ[r_RObject] := RInstanceOf["factor"][r];
RFactorQ[_] := False;

(* Accessors for the levels, the data, and the extra attributes *)
RFactor /:
    RGetFactorLevels[RFactor[_List, RFactorLevels[levs__], a : _ : None]] := {levs};

RFactor /: RGetData[RFactor[p_List, __]] := p;

RFactor /:
    RGetAttributes[RFactor[_List, RFactorLevels[__], a : (_RAttributes | None) : None]] :=
        RGetAllAttributes[a];

(* Convert a factor to a plain vector of level values,
   dropping the "class" and "levels" attributes *)
Clear[RFactorToVector];
RFactorToVector[f_RFactor] :=
    With[{data = partWithMissing[RGetFactorLevels[f], RGetData[f]]},
        FromRForm @ ToRForm @
            RObject[data, RRemoveAttributes[RAttributes @@ RGetAttributes[f], {"class", "levels"}]]
    ];

RFactorToVector[_] = $Failed;

(*             Register the type             *)

RDataTypeRegister["factor",

    (* High-level pattern for factor instances *)
    RFactor[_List, RFactorLevels[__], a : (_RAttributes | None) : None],

    (* Rule: high-level form to the RObject-based form *)
    RFactor[p_List, RFactorLevels[levs__], a : (_RAttributes | None) : None] :>
        RObject[p, RAddAttributes[a, {"levels" :> {levs}, "class" :> "factor"}]],

    (* Pattern for the RObject-based form *)
    _RObject ? RFactorQ,

    (* Reverse rule: RObject-based form back to the high-level form *)
    RObject[p_List, a_RAttributes] ? RFactorQ :>
        RFactor[p,
            RFactorLevels @@ RExtractAttribute[a, "levels"],
            RRemoveAttributesComplete[a, {"levels", "class"}]
        ]
]
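
To see how the listing handles missing values, the partWithMissing helper can be tried on its own, without R. The definition is repeated so that the example runs by itself; the sample levels and codes are made up for illustration.

```mathematica
(* Same definition as in the listing, repeated so the example is self-contained *)
partWithMissing[expr_, inds_List] :=
    With[{posNA = Position[inds, Missing[]]},
        MapAt[Missing[] &, Part[expr, MapAt[1 &, inds, posNA]], posNA]
    ];

levels = {"low", "mid", "high"};   (* made-up factor levels *)
codes = {3, 1, Missing[], 2};      (* made-up integer codes, one of them missing *)

result = partWithMissing[levels, codes]
(* {"high", "low", Missing[], "mid"} *)
```

The missing index is temporarily replaced by 1 so that Part does not fail, and the corresponding position in the result is then overwritten with Missing[].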