Visualizing XML Data
Creating a 3D Graphic from an XML File
The following example illustrates how to use Mathematica programming and SymbolicXML to visualize data in XML format. The molecule description markup language (MoDL) is an XML application that describes molecules. For details, see http://www.oasis-open.org/cover/modl.html. In this example, we convert a MoDL description of the methane molecule into a Mathematica 3D graphic.
The following is the MoDL file that contains the description of the methane molecule.
```xml
<modl>
<head animation="on" clockperiod="5s" stepsize="1s" loop="true">
  <meta name="title" content="Methane Dance" />
  <DEFINE name="C">
    <atom radius="0.3" color="1 1 0" />
  </DEFINE>
  <DEFINE name="H">
    <atom radius="0.25" color="1 0 0" />
  </DEFINE>
  <DEFINE name="CH4">
    <molecule>
      <atom type="C" id="c1" position="0 0 0.4" />
      <atom type="H" id="h1" position="0.5 0.1 -0.4" />
      <atom type="H" id="h2" position="-1 0.1 0.3" />
      <atom type="H" id="h3" position="0.3 0.7 1" />
      <atom type="H" id="h4" position="0.2 -0.9 0.8" />
      <bond atom1="c1" atom2="h1" />
      <bond atom1="c1" atom2="h2" />
      <bond atom1="c1" atom2="h3" />
      <bond atom1="c1" atom2="h4" />
    </molecule>
  </DEFINE>
</head>
<body>
  <molecule type="CH4" id="m" />
  <TRANSLATE object="m" t="0.2" position="-3 1 -2" />
  <TRANSLATE object="m" t="0.4" position="-1 0 -3" />
  <TRANSLATE object="m" t="0.6" position="1 -1 -1" />
  <TRANSLATE object="m" t="0.8" position="1 0 1" />
  <TRANSLATE object="m" t="1" position="0 0 0" />
  <ROTATE object="m" t="0.2" axis="1 0 0" />
  <ROTATE object="m" t="0.4" axis="0 0 1" angle="-1.571" />
  <ROTATE object="m" t="0.6" axis="0 -1 0" />
  <ROTATE object="m" t="0.8" axis="-1 0 0" angle="-0.78" />
  <ROTATE object="m" t="1" axis="0 0 1" />
</body>
</modl>
```
Here we import the file into Mathematica in the form of a SymbolicXML expression.
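SymbolicXML represents an element as XMLElement[tag, {attribute -> value, ...}, {children}]. As a rough, non-Mathematica illustration of that shape, the Python sketch below walks a document parsed with the standard xml.etree.ElementTree module and builds matching nested tuples. The function name to_symbolic, the tuple layout, and the trimmed input document (including its modl root element) are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

def to_symbolic(elem):
    """Rough analogue of SymbolicXML's XMLElement[tag, {attr -> value, ...}, {children}].

    Attributes become (name, value) pairs standing in for rules. Text content is
    ignored, which is enough for the example file, where all data lives in attributes."""
    return ("XMLElement", elem.tag, list(elem.attrib.items()),
            [to_symbolic(child) for child in elem])

# Trimmed, hypothetical input; the full example file is much longer
doc = ET.fromstring("""<modl>
<head><meta name="title" content="Methane Dance" /></head>
<body><molecule type="CH4" id="m" /></body>
</modl>""")

tree = to_symbolic(doc)
print(tree[1])           # modl
print(tree[3][1][3][0])  # the molecule element inside body
```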
In order to convert the resulting SymbolicXML expression into a Graphics3D expression, we will need the standard package Graphics`Shapes`.
The following code defines a function called MoDLToGraphics3D that turns the SymbolicXML expression into a Graphics3D expression. It relies on several auxiliary functions, which are defined later in this section together with the other implementation details.
Applying this function to the original SymbolicXML expression generates a 3D graphic representing the methane molecule.
The implementation details of MoDLToGraphics3D, which performs the actual transformation from SymbolicXML to a 3D graphic, are given below.
Notice that the original MoDL file contains a head and a body. The head holds a number of definitions that are used throughout the body. We have extracted these definitions into the variable defs and then map the function ProcessDefinition across this list. ProcessDefinition constructs a Mathematica expression from each definition and stores it in the variable moldef, which MoDLToGraphics3D localizes with Block. Because Block scopes variables dynamically, ProcessDefinition can assign to moldef even though moldef is not passed to it as an argument.
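Python has no counterpart to the dynamic scoping that Block provides, so the following sketch passes the definition table explicitly. It covers only this first step: collecting the DEFINE elements of the head into defs, locating the body, and handing each definition to a processing function. It assumes the head wraps each definition in a DEFINE element with a name attribute, as described in the text; split_document and process_definition are hypothetical names, and process_definition here is only a placeholder that records the tag of the defined element.

```python
import xml.etree.ElementTree as ET

# Trimmed MoDL document; assumes each definition is a DEFINE element with a name attribute
MODL = """<modl>
<head>
  <DEFINE name="C"><atom radius="0.3" color="1 1 0" /></DEFINE>
  <DEFINE name="H"><atom radius="0.25" color="1 0 0" /></DEFINE>
</head>
<body><molecule type="CH4" id="m" /></body>
</modl>"""

def split_document(root):
    """Return (defs, body): the DEFINE elements in the head, and the body element."""
    defs = root.find("head").findall("DEFINE")
    body = root.find("body")
    return defs, body

def process_definition(define, moldef):
    """Placeholder (hypothetical): record only the tag of the defined element.
    Building the real Atom or molecule data is a separate step."""
    moldef[define.get("name")] = define[0].tag

root = ET.fromstring(MODL)
defs, body = split_document(root)
moldef = {}  # explicit stand-in for the Block-scoped moldef
for d in defs:
    process_definition(d, moldef)
print(moldef)  # {'C': 'atom', 'H': 'atom'}
```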
A DEFINE element in the head typically defines either an atom or a molecule. First, consider an atom definition.
The DEFINE element essentially associates a unique key (in this case C) with an atom element, which specifies the atom's color and radius. We turn this into a Mathematica expression of the form Atom[radius, color] and store it in moldef[name], where name is the key given in the name attribute of the DEFINE element.
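A minimal Python sketch of this step, assuming the atom element sits directly inside a DEFINE element whose name attribute holds the key. The NamedTuple classes Atom and RGBColor are hypothetical stand-ins for the Mathematica expressions Atom[radius, color] and RGBColor[r, g, b], and the radius and color parsing is inlined here rather than split out into GetRad and GetColor.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

class RGBColor(NamedTuple):
    """Hypothetical stand-in for Mathematica's RGBColor[r, g, b]."""
    r: float
    g: float
    b: float

class Atom(NamedTuple):
    """Hypothetical stand-in for the Atom[radius, color] expression."""
    radius: float
    color: RGBColor

def process_atom_definition(define, moldef):
    """Store Atom(radius, color) under the DEFINE element's name attribute."""
    atom = define.find("atom")
    radius = float(atom.get("radius"))
    color = RGBColor(*(float(x) for x in atom.get("color").split()))
    moldef[define.get("name")] = Atom(radius, color)

moldef = {}
define = ET.fromstring('<DEFINE name="C"><atom radius="0.3" color="1 1 0" /></DEFINE>')
process_atom_definition(define, moldef)
print(moldef["C"])  # Atom(radius=0.3, color=RGBColor(r=1.0, g=1.0, b=0.0))
```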
In this case, a is the entire sequence of attributes of the atom element. GetRad and GetColor, which we define later, extract the radius and color from this sequence. For now, assume that GetRad returns a number and that GetColor returns an RGBColor expression.

Next we need to process molecule definitions. As with atom definitions, each molecule is given a unique key in the name attribute of its DEFINE element. The molecule element then contains atom elements and bond elements.
Each atom element has three attributes: type, id, and position. The type attribute refers to the key of one of the atom definitions, the id attribute is a unique key for this particular atom, and position gives its coordinates. In other words, the atom definitions define types of atoms, such as carbon or hydrogen, whereas each atom element inside a molecule element represents one distinct atom of a previously defined type.
The molecule element also contains bond elements, each with two attributes, atom1 and atom2, which reference the ids of atom elements in the same molecule.
When we call ProcessDefinition on a molecule definition, we will want to store a list of the atoms and bonds in moldef.
In the definition of ProcessDefinition, subdef is the list of atom and bond elements, and we map ProcessSubdef onto this list. That is, the value assigned to moldef[name] is the list of results of ProcessSubdef on each atom and bond element in the molecule. When called on an atom, ProcessSubdef looks up the atom's type in moldef, appends the position to that expression, stores the result under moldef[id], and returns it. When called on a bond, it simply returns a Bond expression containing the positions of the two atoms it references.
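The same logic can be sketched in Python under a few assumptions: moldef is a plain dictionary that already holds the atom types as ("Atom", radius, color) tuples (the values below are only illustrative), a placed atom is that tuple with its position appended, and atoms precede the bonds that reference them, as they do in the example file. process_subdef and process_molecule_definition are hypothetical names mirroring ProcessSubdef and ProcessDefinition.

```python
import xml.etree.ElementTree as ET

def parse_list(s):
    """MoDL number list such as "0 0 0.4" -> [0.0, 0.0, 0.4]."""
    return [float(x) for x in s.split()]

def process_subdef(elem, moldef):
    if elem.tag == "atom":
        # Look up the atom type, append this atom's position, store it under its id
        placed = moldef[elem.get("type")] + (parse_list(elem.get("position")),)
        moldef[elem.get("id")] = placed
        return placed
    if elem.tag == "bond":
        # A bond is just the positions (last entry) of the two atoms it connects
        p1 = moldef[elem.get("atom1")][-1]
        p2 = moldef[elem.get("atom2")][-1]
        return ("Bond", p1, p2)
    raise ValueError(f"unexpected element {elem.tag!r}")

def process_molecule_definition(define, moldef):
    """Store the list of placed atoms and bonds under the DEFINE element's name."""
    subdef = list(define.find("molecule"))
    moldef[define.get("name")] = [process_subdef(e, moldef) for e in subdef]

# Hypothetical atom-type entries, in the shape the atom-definition step produces
moldef = {"C": ("Atom", 0.3, (1.0, 1.0, 0.0)), "H": ("Atom", 0.25, (1.0, 0.0, 0.0))}
define = ET.fromstring("""<DEFINE name="CH4"><molecule>
  <atom type="C" id="c1" position="0 0 0.4" />
  <atom type="H" id="h1" position="0.5 0.1 -0.4" />
  <bond atom1="c1" atom2="h1" />
</molecule></DEFINE>""")
process_molecule_definition(define, moldef)
for item in moldef["CH4"]:
    print(item)
```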
We need one auxiliary function before we define GetRad, GetColor, and GetPos. Since MoDL writes positions and colors as space-separated lists of numbers, we first write a function, MolStringListToList, that turns such a string into a Mathematica list.
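In Python, an equivalent helper is a one-line list comprehension. This is a sketch rather than a port of MolStringListToList: the name mol_string_list_to_list simply mirrors it, the result always contains floats (so "1" becomes 1.0), and a token that is not a number raises ValueError.

```python
from typing import List

def mol_string_list_to_list(s: str) -> List[float]:
    """Convert a MoDL number list such as "0.5 0.1 -0.4" into [0.5, 0.1, -0.4].

    str.split() with no argument splits on any run of whitespace,
    so extra spaces between numbers are tolerated."""
    return [float(token) for token in s.split()]

print(mol_string_list_to_list("0.5 0.1 -0.4"))  # [0.5, 0.1, -0.4]
print(mol_string_list_to_list("1  1 0"))        # [1.0, 1.0, 0.0]
```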
The functions GetPos, GetRad, and GetColor each take a sequence of attributes of any length and extract the attribute they need. In SymbolicXML, attributes are stored as Mathematica rules, such as "radius" -> "0.3". GetPos converts the position attribute into a list and GetColor converts the color attribute into an RGBColor expression, so both use MolStringListToList. GetRad only needs to convert the radius string to a number.
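Because SymbolicXML stores attributes as rules, the Python sketch below models them as (name, value) pairs and gives each getter a variable-length argument list, mirroring a sequence of attributes of any length. The helper attribute_value and the ("RGBColor", r, g, b) tuple are hypothetical stand-ins, not Mathematica functions.

```python
def mol_string_list_to_list(s):
    """MoDL number list such as "0 0 0.4" -> [0.0, 0.0, 0.4]."""
    return [float(t) for t in s.split()]

def attribute_value(attrs, name):
    """Look up name in a sequence of (name, value) pairs, the stand-in for rules."""
    for key, value in attrs:
        if key == name:
            return value
    raise KeyError(name)

def get_pos(*attrs):
    """Position attribute as a list of numbers."""
    return mol_string_list_to_list(attribute_value(attrs, "position"))

def get_rad(*attrs):
    """Radius attribute as a number."""
    return float(attribute_value(attrs, "radius"))

def get_color(*attrs):
    """Color attribute as a stand-in for RGBColor[r, g, b]."""
    r, g, b = mol_string_list_to_list(attribute_value(attrs, "color"))
    return ("RGBColor", r, g, b)

atom_def = [("radius", "0.3"), ("color", "1 1 0")]
atom_use = [("type", "C"), ("id", "c1"), ("position", "0 0 0.4")]
print(get_rad(*atom_def))    # 0.3
print(get_color(*atom_def))  # ('RGBColor', 1.0, 1.0, 0.0)
print(get_pos(*atom_use))    # [0.0, 0.0, 0.4]
```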
Here is the definition of MoDLToGraphics3D again for reference.
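As a hedged, self-contained Python analogue of the whole pipeline (not the Mathematica definition itself), the sketch below combines the steps discussed in this section: processing the DEFINE elements into a moldef table, collecting themols and theatoms from the body, and mapping a drawing function over the result. It assumes the DEFINE/head/body layout described in the text and stops at renderer-neutral ("Sphere", ...) and ("Line", ...) tuples instead of building a Graphics3D expression. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Example input; assumes the DEFINE/head/body layout described in the text.
# Body elements other than molecule and atom (here TRANSLATE) are never matched.
MODL = """<modl>
<head>
  <DEFINE name="C"><atom radius="0.3" color="1 1 0" /></DEFINE>
  <DEFINE name="H"><atom radius="0.25" color="1 0 0" /></DEFINE>
  <DEFINE name="CH4"><molecule>
    <atom type="C" id="c1" position="0 0 0.4" />
    <atom type="H" id="h1" position="0.5 0.1 -0.4" />
    <atom type="H" id="h2" position="-1 0.1 0.3" />
    <atom type="H" id="h3" position="0.3 0.7 1" />
    <atom type="H" id="h4" position="0.2 -0.9 0.8" />
    <bond atom1="c1" atom2="h1" /><bond atom1="c1" atom2="h2" />
    <bond atom1="c1" atom2="h3" /><bond atom1="c1" atom2="h4" />
  </molecule></DEFINE>
</head>
<body>
  <molecule type="CH4" id="m" />
  <TRANSLATE object="m" t="1" position="0 0 0" />
</body>
</modl>"""

def to_list(s):
    """MoDL number list such as "0 0 0.4" -> [0.0, 0.0, 0.4]."""
    return [float(t) for t in s.split()]

def process_subdef(elem, moldef):
    """Placed atom (type data + position, also stored under its id) or Bond."""
    if elem.tag == "atom":
        placed = moldef[elem.get("type")] + (to_list(elem.get("position")),)
        moldef[elem.get("id")] = placed
        return placed
    return ("Bond", moldef[elem.get("atom1")][-1], moldef[elem.get("atom2")][-1])

def process_definition(define, moldef):
    """Store an atom type or a molecule's atom/bond list under the DEFINE name."""
    child = define[0]
    if child.tag == "atom":
        moldef[define.get("name")] = (
            "Atom", float(child.get("radius")), tuple(to_list(child.get("color"))))
    elif child.tag == "molecule":
        moldef[define.get("name")] = [process_subdef(e, moldef) for e in child]

def mol_to_graphics(expr):
    """Placed Atom -> sphere primitive; anything else is assumed to be a Bond -> line."""
    if expr[0] == "Atom":
        return ("Sphere", tuple(expr[3]), expr[1], expr[2])
    return ("Line", (tuple(expr[1]), tuple(expr[2])))

def modl_to_graphics3d(root):
    moldef = {}  # local table, passed explicitly instead of being scoped with Block
    defs = root.find("head").findall("DEFINE")
    body = root.find("body")
    for d in defs:
        process_definition(d, moldef)
    themols = [x for m in body.findall("molecule") for x in moldef[m.get("type")]]
    theatoms = [moldef[a.get("type")] + (to_list(a.get("position")),)
                for a in body.findall("atom")]
    return [mol_to_graphics(x) for x in themols + theatoms]

primitives = modl_to_graphics3d(ET.fromstring(MODL))
print(len(primitives))  # 9: five spheres and four lines
```

Because the table is local to modl_to_graphics3d, each call starts from an empty moldef, which is the effect Block has on the Mathematica version.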
Block scopes the variables defs, body, moldef, themols, and theatoms. We have already discussed moldef, and defs simply contains the list of DEFINE elements. The variable body contains the body element of the SymbolicXML expression. That leaves themols and theatoms.
After ProcessDefinition is mapped over defs, themols is defined. Molecules in the body have a type attribute, which references the key of a molecule type defined in the head. The Cases expression matches each molecule in the body and returns the definition stored in moldef under that molecule's type.
In our example, the body contains only one molecule. If there were more, Flatten would merge their lists of Atom and Bond expressions into a single list; as we will see, the Graphics3D expression is made simply by drawing each Atom and Bond. The body may also contain standalone atom elements. The definition of theatoms matches these elements, reads their type from moldef, and appends their positions, so theatoms holds a list of any additional atoms to be drawn.
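A Python sketch of how themols and theatoms could be assembled, assuming moldef has already been filled in. The nested comprehension concatenates the per-molecule lists, playing the role of Flatten. The H2 molecule type, the standalone atom, and the function name collect_body are hypothetical, included only to exercise both cases; as in the Mathematica version, only molecule and atom elements are matched, so the TRANSLATE element is ignored.

```python
import xml.etree.ElementTree as ET

def mol_string_list_to_list(s):
    """MoDL number list such as "0 0 0.4" -> [0.0, 0.0, 0.4]."""
    return [float(t) for t in s.split()]

def collect_body(body, moldef):
    """Mirror themols and theatoms: gather the Atom/Bond data to draw from the body."""
    # themols: replace each molecule by its stored definition; the nested
    # comprehension concatenates the per-molecule lists, as Flatten does
    themols = [item
               for mol in body.findall("molecule")
               for item in moldef[mol.get("type")]]
    # theatoms: standalone atoms take their type data from moldef plus their position
    theatoms = [moldef[a.get("type")] + (mol_string_list_to_list(a.get("position")),)
                for a in body.findall("atom")]
    return themols + theatoms

# Hypothetical table in the shape the earlier steps produce
moldef = {
    "H": ("Atom", 0.25, (1.0, 0.0, 0.0)),
    "H2": [("Atom", 0.25, (1.0, 0.0, 0.0), [0.0, 0.0, 0.0]),
           ("Atom", 0.25, (1.0, 0.0, 0.0), [0.7, 0.0, 0.0]),
           ("Bond", [0.0, 0.0, 0.0], [0.7, 0.0, 0.0])],
}
body = ET.fromstring("""<body>
  <molecule type="H2" id="m" />
  <atom type="H" id="lone" position="2 2 2" />
  <TRANSLATE object="m" t="1" position="0 0 0" />
</body>""")
for item in collect_body(body, moldef):
    print(item)
```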
The last line of MoDLToGraphics3D joins the Atom and Bond expressions in themols with the Atom expressions in theatoms and maps MolToGraphics onto the combined list. MolToGraphics returns a sphere for each Atom expression and a line for each Bond expression. We still need to define MolToGraphics; the definition is straightforward if you are familiar with Graphics`Shapes` and Mathematica's Graphics expressions.
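Graphics`Shapes` has no direct Python counterpart, so this sketch of the MolToGraphics idea stops at renderer-neutral tuples: each placed Atom becomes a ("Sphere", center, radius, color) tuple and each Bond a ("Line", (p1, p2)) tuple. The tuple layouts and the name mol_to_graphics are assumptions; actually drawing them would still require a 3D plotting library.

```python
def mol_to_graphics(expr):
    """Map a placed Atom to a sphere and a Bond to a line segment."""
    head = expr[0]
    if head == "Atom":
        _, radius, color, position = expr
        return ("Sphere", tuple(position), radius, color)
    if head == "Bond":
        _, p1, p2 = expr
        return ("Line", (tuple(p1), tuple(p2)))
    raise ValueError(f"cannot draw {head!r}")

# Hypothetical input in the shape produced by the earlier steps
items = [
    ("Atom", 0.3, (1.0, 1.0, 0.0), [0.0, 0.0, 0.4]),
    ("Atom", 0.25, (1.0, 0.0, 0.0), [0.5, 0.1, -0.4]),
    ("Bond", [0.0, 0.0, 0.4], [0.5, 0.1, -0.4]),
]
for primitive in map(mol_to_graphics, items):
    print(primitive)
```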
The result is a 3D graphic of methane or any other molecule you have defined in MoDL.