Manipulating XML Data
XML applications are used for more than just document layout. XML is also an excellent format for storing structured data. Many commercial database vendors are now adding XML support to their products, allowing you to work with databases using XML as an intermediate format.
Mathematica's symbolic pattern-matching capabilities make it an ideal tool for extracting and manipulating information from XML documents. To illustrate this, let us manipulate an XML file containing data on major league baseball players. We first import this file into Mathematica as a SymbolicXML expression.
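The import step can be sketched as follows. The original data file is not reproduced here, so this sketch parses a small inline string with ImportString instead. The element names PlayerRecord, PLAYER, and TEAM follow the text; the root element name and all player data are made up for illustration. For a file on disk, Import with the "XML" format plays the same role.

```mathematica
(* Hypothetical stand-in for the data file: two made-up player records. *)
xmlString = "<Players><PlayerRecord><PLAYER>Pat Example</PLAYER><TEAM>Yankees</TEAM></PlayerRecord><PlayerRecord><PLAYER>Sam Sample</PLAYER><TEAM>Orioles</TEAM></PlayerRecord></Players>";

(* Parse the string into a SymbolicXML expression. *)
hitters = ImportString[xmlString, "XML"];

(* The document is wrapped in XMLObject["Document"]; each element
   appears as XMLElement[tag, attributes, children]. *)
Head[hitters]
```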
Each player's information is stored in a PlayerRecord element. We can extract all of these elements with Cases.
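A minimal sketch of this extraction, assuming a miniature document written directly as SymbolicXML. The root element name and the three records are hypothetical placeholders, not the contents of the real file.

```mathematica
(* Hypothetical miniature of the imported document. *)
hitters = XMLObject["Document"][{},
   XMLElement["Players", {}, {
     XMLElement["PlayerRecord", {}, {XMLElement["PLAYER", {}, {"Pat Example"}], XMLElement["TEAM", {}, {"Yankees"}]}],
     XMLElement["PlayerRecord", {}, {XMLElement["PLAYER", {}, {"Sam Sample"}], XMLElement["TEAM", {}, {"Orioles"}]}],
     XMLElement["PlayerRecord", {}, {XMLElement["PLAYER", {}, {"Lee Placeholder"}], XMLElement["TEAM", {}, {"Yankees"}]}]
     }], {}];

(* Collect every PlayerRecord element, at any depth in the document. *)
players = Cases[hitters, XMLElement["PlayerRecord", _, _], Infinity];

(* Number of records found. *)
Length[players]
```

The level specification Infinity matters here: without it, Cases only inspects the top level of the expression and would miss elements nested inside the root.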
The XML document contains records for 294 players. Rather than sift through all the American League hitters, we will just look at the Yankees. Inside each PlayerRecord element, there is a TEAM element that specifies the player's team. By passing a slightly more specific pattern to Cases, we can extract a list of all players on the Yankees.
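One way to write such a pattern is sketched below. The helper record and the sample players are hypothetical, introduced only to keep the example self-contained.

```mathematica
(* Hypothetical helper that builds one made-up PlayerRecord element. *)
record[name_, team_] := XMLElement["PlayerRecord", {}, {
    XMLElement["PLAYER", {}, {name}],
    XMLElement["TEAM", {}, {team}]}];

players = {
   record["Pat Example", "Yankees"],
   record["Sam Sample", "Orioles"],
   record["Lee Placeholder", "Yankees"]};

(* Keep only records whose children include <TEAM>Yankees</TEAM>.
   The ___ on either side lets the TEAM element appear at any position. *)
yankees = Cases[players,
   XMLElement["PlayerRecord", _, {___, XMLElement["TEAM", _, {"Yankees"}], ___}]];

Length[yankees]
```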
The variable yankees now contains a list of SymbolicXML expressions for all the Yankees players. To see the structure of a PlayerRecord element, let's look at the first element of yankees.
We can see that the player's name is stored in the PLAYER element of each PlayerRecord element. Suppose we just want to look at the names of the Yankees hitters we have already extracted. We can extract the name from one PlayerRecord element easily enough.
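Extracting the name from a single record might look like the following sketch. The function name getName and the sample record are assumptions made for illustration.

```mathematica
(* One made-up PlayerRecord element. *)
player = XMLElement["PlayerRecord", {}, {
    XMLElement["PLAYER", {}, {"Pat Example"}],
    XMLElement["TEAM", {}, {"Yankees"}]}];

(* Find the PLAYER element inside a record and return its text content. *)
getName[rec_] := First[Cases[rec, XMLElement["PLAYER", _, {name_String}] :> name, Infinity]];

getName[player]
```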
We can then use Map to extract all the names from yankees.
Alternatively, we could have used Cases on yankees with an appropriate pattern.
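The two approaches can be compared side by side. This is a sketch under the same assumptions as before: record and getName are hypothetical helpers, and the players are made up.

```mathematica
(* Hypothetical helper and sample data. *)
record[name_, team_] := XMLElement["PlayerRecord", {}, {
    XMLElement["PLAYER", {}, {name}],
    XMLElement["TEAM", {}, {team}]}];
yankees = {record["Pat Example", "Yankees"], record["Lee Placeholder", "Yankees"]};

(* Approach 1: apply a per-record extractor to each element of the list. *)
getName[rec_] := First[Cases[rec, XMLElement["PLAYER", _, {name_String}] :> name, Infinity]];
namesViaMap = Map[getName, yankees];

(* Approach 2: search the whole list at once for PLAYER elements. *)
namesViaCases = Cases[yankees, XMLElement["PLAYER", _, {name_String}] :> name, Infinity];

(* Both approaches give the same list of names. *)
namesViaMap === namesViaCases
```

The Map version fails visibly if a record has no PLAYER element, since First is applied to an empty list, while the Cases version silently skips such a record.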
SymbolicXML is a general-purpose format for expressing arbitrary XML data. In some cases, you may find it more useful to convert SymbolicXML into a different type of Mathematica expression. This type of conversion is easy to do using pattern-matching. In the following example, we import an XML file containing data about baseball pitchers and translate the resulting SymbolicXML expression into a list of Mathematica rules.
Here, we have transformed the SymbolicXML expression for a PlayerRecord node into a simpler expression. All the information about the player is stored in a list of Mathematica Rules with Pitcher as the head.
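A conversion of this kind could be sketched as follows. The field names TEAM and ERA for a pitcher record, the sample values, and the helper names fieldToRule and toPitcher are assumptions; only PlayerRecord and the Pitcher head come from the text.

```mathematica
(* Hypothetical pitcher record with made-up values. *)
pitcherXML = XMLElement["PlayerRecord", {}, {
    XMLElement["PLAYER", {}, {"Pat Example"}],
    XMLElement["TEAM", {}, {"Yankees"}],
    XMLElement["ERA", {}, {"3.50"}]}];

(* Turn one leaf element such as <TEAM>Yankees</TEAM> into "TEAM" -> "Yankees". *)
fieldToRule[XMLElement[tag_String, _, {value_String}]] := tag -> value;

(* Rewrite a whole record as Pitcher[rule1, rule2, ...]. *)
toPitcher[XMLElement["PlayerRecord", _, fields_List]] := Pitcher @@ (fieldToRule /@ fields);

toPitcher[pitcherXML]
```

Once the data is in rule form, a field can be looked up with an ordinary replacement, for example "TEAM" /. List @@ toPitcher[pitcherXML].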
In addition to transforming the data into a different expression syntax as above, we can also modify the data and leave the overall expression in SymbolicXML. This way we can alter our data, but still export it to an XML file for use with other applications. As an example, we will work with the salaries of our American League hitters. First, we delete any PlayerRecord entries where the salary is not available.
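The deletion step might be sketched as below. How the real file marks an unavailable salary is not stated in the text, so the "NA" marker, the SALARY tag name, and the sample records are all assumptions.

```mathematica
(* Hypothetical document; "NA" is an assumed marker for a missing salary. *)
hitters = XMLElement["Players", {}, {
    XMLElement["PlayerRecord", {}, {XMLElement["PLAYER", {}, {"Pat Example"}], XMLElement["SALARY", {}, {"5000000"}]}],
    XMLElement["PlayerRecord", {}, {XMLElement["PLAYER", {}, {"Sam Sample"}], XMLElement["SALARY", {}, {"NA"}]}],
    XMLElement["PlayerRecord", {}, {XMLElement["PLAYER", {}, {"Lee Placeholder"}], XMLElement["SALARY", {}, {"1200000"}]}]}];

(* Remove every PlayerRecord whose SALARY element holds the "NA" marker.
   The result is still a valid SymbolicXML expression. *)
cleaned = DeleteCases[hitters,
   XMLElement["PlayerRecord", _, {___, XMLElement["SALARY", _, {"NA"}], ___}],
   Infinity];

Length[Cases[cleaned, XMLElement["PlayerRecord", _, _], Infinity]]
```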
Next, we create a function to extract name-salary pairs from our PlayerRecord data. We will then sort these pairs by salary and look at the top ten.
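A minimal sketch of this step, assuming that salaries are stored as strings of digits in a SALARY element. The helper names record and nameSalary and the sample figures are hypothetical.

```mathematica
(* Hypothetical helper and made-up records. *)
record[name_, salary_] := XMLElement["PlayerRecord", {}, {
    XMLElement["PLAYER", {}, {name}],
    XMLElement["SALARY", {}, {salary}]}];
players = {
   record["Pat Example", "5000000"],
   record["Sam Sample", "7500000"],
   record["Lee Placeholder", "1200000"]};

(* Return {name, salary} for one record, converting the salary text to an integer. *)
nameSalary[XMLElement["PlayerRecord", _, fields_List]] := {
   First[Cases[fields, XMLElement["PLAYER", _, {name_String}] :> name]],
   FromDigits[First[Cases[fields, XMLElement["SALARY", _, {s_String}] :> s]]]};

pairs = nameSalary /@ players;

(* Sort by salary, highest first, and keep at most ten entries. *)
sorted = SortBy[pairs, -Last[#] &];
topTen = Take[sorted, Min[10, Length[sorted]]]
```

Converting the salary to a number before sorting matters: sorting the raw strings would order them alphabetically rather than numerically.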
As a simple example of how to change the data in our SymbolicXML expression, we will create a function that doubles players' salaries.
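Such a function could be sketched as a replacement rule applied to the whole expression. The name doubleSalaries, the SALARY tag, and the sample data are assumptions, as is the choice to store salaries as strings of digits.

```mathematica
(* Hypothetical document with made-up salaries. *)
hitters = XMLElement["Players", {}, {
    XMLElement["PlayerRecord", {}, {XMLElement["PLAYER", {}, {"Pat Example"}], XMLElement["SALARY", {}, {"5000000"}]}],
    XMLElement["PlayerRecord", {}, {XMLElement["PLAYER", {}, {"Sam Sample"}], XMLElement["SALARY", {}, {"7500000"}]}]}];

(* Replace each all-digit SALARY value with twice its amount.
   The value is converted to an integer, doubled, and written back as a string,
   so the result remains SymbolicXML. Non-numeric values are left untouched. *)
doubleSalaries[xml_] := xml /.
   XMLElement["SALARY", attrs_, {s_String}] /; StringMatchQ[s, DigitCharacter ..] :>
    XMLElement["SALARY", attrs, {ToString[2 FromDigits[s]]}];

doubled = doubleSalaries[hitters];

(* Inspect the new salary values. *)
Cases[doubled, XMLElement["SALARY", _, {s_}] :> s, Infinity]
```

Because the rest of the expression is unchanged, the result can be written back out with Export or ExportString using the "XML" format.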