5.1 XML


XML is a general-purpose data format that is becoming increasingly important. Data formatted in XML can readily be used by any application that is able to process XML, so choosing an XML format can save considerable development effort. In addition, an increasing number of existing data formats are based on XML. Some of the more important for mathematical and scientific purposes include XHTML (an XML-compliant version of HTML), MathML (a way to store mathematical information), and SVG (a graphics format). A large list of XML applications is available at http://www.xml.org/xml/registry.jsp.

Mathematica 4.2 introduced a large number of features for working with XML, all of which are available in webMathematica 2.0. XML is useful for webMathematica both through its support for specific XML applications and as a general format for data interchange. The use of MathML, SVG, and XHTML is covered in their own sections. This section gives an overview of XML and the XML features of Mathematica, along with some examples of why this functionality is useful to webMathematica.

5.1.1 Introduction to XML

This section gives a very brief introduction to XML. For more information, see one of the many references listed at http://www.w3.org/XML/, for example, http://www.w3.org/XML/1999/XML-in-10-points.

A sample XML document is shown below:

<?xml version="1.0"?>
<library>
   <book>
      <title>A New Kind of Science</title>
      <author>Stephen Wolfram</author>
   </book>
   <book>
      <title>The Lord of the Rings</title>
      <author>J.R.R. Tolkien</author>
   </book>
</library>

The example above shows a data format for a library. The library contains books, and each book has a title and an author, which shows how XML is suited to structured data. XML looks a little like HTML, except that the tags (words bracketed by '<' and '>') are not restricted to a fixed set: new tags can be introduced to suit a particular application. Unlike HTML, XML is strict: a document must be well-formed, following rules that HTML does not enforce. This is demonstrated in the next section.

XML Compliance

XML documents must be well-formed, that is, they must follow the syntax rules of XML. Some basic examples are described in this section.

An XML document should begin with an XML declaration. In XML 1.0 the declaration is recommended rather than strictly required, but it is good practice to start every document with something like the following.

<?xml version="1.0"?>

Empty elements must either have an end tag, or the start tag must end with />. Thus, the following is legal.

<br/><hr/>

However, this is not legal.

<br><hr>

For nonempty elements the end tag is required. Thus, the following is legal.

<p>Here is a paragraph.</p><p>Here is another.</p>

However, this is not legal.

<p>Here is a paragraph.<p>Here is another.
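As a rough illustration of these rules, the following Mathematica sketch tests whether a string parses as XML. It uses ImportString, which is introduced in Section 5.1.2. The helper name wellFormedQ is hypothetical and is not part of Mathematica or webMathematica; it assumes that ImportString returns $Failed, along with a parser message, when the input is not well-formed. The fragments from this section are wrapped in a single div element because an XML document must have exactly one root element.

```mathematica
(* Hypothetical helper: True if the string parses as XML. *)
(* Assumes ImportString returns $Failed for input that is not well-formed. *)
wellFormedQ[s_String] := ImportString[s, "XML"] =!= $Failed

wellFormedQ["<div><br/><hr/></div>"]   (* empty elements closed with /> *)
wellFormedQ["<div><br><hr></div>"]     (* start tags without end tags *)
wellFormedQ["<div><p>Here is a paragraph.</p><p>Here is another.</p></div>"]
wellFormedQ["<div><p>Here is a paragraph.<p>Here is another.</div>"]
```

Under that assumption, the first and third calls should give True and the other two should give False.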

5.1.2 Mathematica Support for XML

This section gives a very brief introduction to the Mathematica tools for working with XML; more information is available in the online documentation. One important point is that XML is suitable for holding structured data, which also applies to Mathematica expressions (the basic data type of Mathematica). This makes it easy to import XML data into Mathematica.

The following is a simple example.

xml=
"<?xml version=\"1.0\"?>\n <library>\n <book>\n <title>A New Kind of Science</title><author>Stephen Wolfram</author>\n </book>\n <book> \n <title>The Lord of the Rings</title> <author>J.R.R. Tolkien</author>\n </book>\n</library>";

This XML can be imported into Mathematica, which represents it as Symbolic XML. Because Mathematica expressions are themselves tree-structured, Symbolic XML is a native Mathematica form of XML that is isomorphic to textual XML.

sym = ImportString[xml, "XML"]

XMLObject["Document"][{XMLObject["Declaration"]["Version" -> "1.0"]}, XMLElement["library", {}, {XMLElement["book", {}, {XMLElement["title", {}, {"A New Kind of Science"}], XMLElement["author", {}, {"Stephen Wolfram"}]}], XMLElement["book", {}, {XMLElement["title", {}, {"The Lord of the Rings"}], XMLElement["author", {}, {"J.R.R. Tolkien"}]}]}], {}]

We can use standard Mathematica programming features to process Symbolic XML; for example, we can extract all the authors.

Cases[sym, XMLElement[ "author", a_, {d_}] -> d, Infinity]

{"Stephen Wolfram", "J.R.R. Tolkien"}
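Patterns can also match the structure of an element rather than a single tag. As a small sketch along the same lines, the following collects each title together with its author. The variable names lib and pairs are illustrative, and the input string is written without whitespace between elements so that each book has exactly two children.

```mathematica
(* Illustrative data: the library example written without inter-element whitespace. *)
lib = ImportString[
   "<library><book><title>A New Kind of Science</title><author>Stephen Wolfram</author></book><book><title>The Lord of the Rings</title><author>J.R.R. Tolkien</author></book></library>",
   "XML"];

(* Match a book whose children are a title followed by an author. *)
pairs = Cases[lib,
   XMLElement["book", _,
     {XMLElement["title", _, {t_}], XMLElement["author", _, {a_}]}] :> {t, a},
   Infinity]
```

Each entry of the result should be a {title, author} pair, one per book.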

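We can also modify Symbolic XML with replacement rules. The following converts the text content of every element to lowercase.

newSym = sym /. XMLElement[t_, a_, {d_}] :> XMLElement[t, a, {ToLowerCase[d]}]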

XMLObject["Document"][{XMLObject["Declaration"]["Version" -> "1.0"]}, XMLElement["library", {}, {XMLElement["book", {}, {XMLElement["title", {}, {"a new kind of science"}], XMLElement["author", {}, {"stephen wolfram"}]}], XMLElement["book", {}, {XMLElement["title", {}, {"the lord of the rings"}], XMLElement["author", {}, {"j.r.r. tolkien"}]}]}], {}]

Here we export the modified Symbolic XML back to textual XML.

ExportString[newSym, "XML"]

"<?xml version='1.0'?>\n<library>\n <book>\n  <title>a new kind of science</title>\n  <author>stephen wolfram</author>\n </book>\n <book>\n  <title>the lord of the rings</title>\n  <author>j.r.r. tolkien</author>\n </book>\n</library>"

This type of transformation can, of course, be done in other ways, for example with an XSLT stylesheet. However, there is an overhead to setting up an XSLT stylesheet for the transformation. Mathematica, with its uniform programming principles, is often a quick and simple way to carry out the task.

There are many more features of the Mathematica XML tools, for example, working with attributes, entities, namespaces, validation, and CDATA. More information is available from the Mathematica documentation.
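As one small illustration of working with attributes, the sketch below extracts an attribute value from each element. In Symbolic XML the attributes of an element appear as a list of rules in the second argument of XMLElement. The input string and the variable name doc are illustrative only.

```mathematica
(* Illustrative input with attributes; not taken from a real data source. *)
doc = ImportString[
   "<EmployeeList><Person Name=\"Tom Jones\" Phone=\"235-1231\"/><Person Name=\"Janet Rogers\" Phone=\"235-1129\"/></EmployeeList>",
   "XML"];

(* Pick out the value of the Name attribute, wherever it appears in the attribute list. *)
Cases[doc, XMLElement["Person", {___, "Name" -> n_, ___}, _] :> n, Infinity]
```

The ___ patterns on either side of the rule allow the Name attribute to appear at any position among the element's attributes.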

5.1.3 webMathematica XML Applications

Many webMathematica applications involve generating HTML to be read by browsers. However, the output from a webMathematica site may not go to a browser; it may involve some data to be read by an application that will then do further processing. This section will study an example that shows how this can be done.

The source for this example is in webMathematica/Examples/XML/Phone.jsp. It also uses an XML file webMathematica/Examples/XML/phone.xml. If you installed webMathematica as described above, you should be able to connect to this JSP via http://localhost:8080/webMathematica/Examples/XML/Phone.jsp. (You may have some other URL for accessing your server.)

We first see the XML data.

<?xml version="1.0"?>

<EmployeeList>
<Person Name="Tom Jones" Email="tomj" Phone="235-1231" />
<Person Name="Janet Rogers" Email="jrogers" Phone="235-1129" />
<Person Name="Bob Norris" Email="bobn" Phone="235-1237" />
<Person Name="Kit Smithers" Email="ksmit" Phone="235-0729" />
<Person Name="Jamie Lemay" Email="jlemay" Phone="235-6393" />
</EmployeeList>

The contents of Phone.jsp are shown below.

<%@ page language="java" %>
<%@ taglib uri="/webMathematica-taglib" prefix="msp" %>

<msp:evaluate>
   xml = Import[ ToFileName[ MSPPageDirectory[], "phone.xml"], "XML"] ;
   If[ MSPValueQ[ $$name],
      patt = "*" <> $$name <> "*";
      xml = DeleteCases[xml,
         XMLElement[ "Person", {___, "Name"->n_/;!StringMatchQ[n, patt], ___}, _ ], Infinity]] ;
   MSPReturn[ ExportString[ xml, "XML"], "text/xml"]
</msp:evaluate>

This example first imports the XML file into Mathematica. It uses the command MSPPageDirectory because the XML data is located in the same directory as Phone.jsp. It then checks whether a parameter called name was sent with the request. If so, it builds the wildcard pattern *name* and deletes every Person element whose Name attribute does not match it, so that only entries containing the given text remain. Finally, it returns the result as textual XML with the content type text/xml. You should be able to see the operation of this parameter with a URL such as http://localhost:8080/webMathematica/Examples/XML/Phone.jsp?name=T. (You may have some other URL for accessing your server.)
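The filtering step can be tried in an ordinary Mathematica session without a server. The sketch below applies the same DeleteCases rule to a two-entry list written directly as Symbolic XML. The variable names people and name are illustrative, with name standing in for the request parameter $$name.

```mathematica
(* Illustrative stand-in for the imported phone.xml, written directly as Symbolic XML. *)
people = XMLElement["EmployeeList", {}, {
    XMLElement["Person", {"Name" -> "Tom Jones", "Email" -> "tomj", "Phone" -> "235-1231"}, {}],
    XMLElement["Person", {"Name" -> "Kit Smithers", "Email" -> "ksmit", "Phone" -> "235-0729"}, {}]}];

name = "T";                   (* stands in for the request parameter $$name *)
patt = "*" <> name <> "*";    (* wildcard pattern: any name containing "T" *)

(* Delete every Person whose Name attribute does not match the pattern. *)
DeleteCases[people,
  XMLElement["Person", {___, "Name" -> (n_ /; ! StringMatchQ[n, patt]), ___}, _],
  Infinity]
```

Because StringMatchQ is case-sensitive by default, only the Tom Jones entry should remain: "Kit Smithers" contains a lowercase t but no uppercase T.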

Of course, you may want to use this XML data for further processing. If you have a system that is XML-aware, this is quite straightforward. One useful XML-aware application is, of course, Mathematica. For example, the following calls your webMathematica site and retrieves the entries whose names contain the letter o.

XML`Parser`XMLGet["http://localhost:8080/webMathematica/Examples/XML/Phone.jsp?name=o"]

XMLObject["Document"][{XMLObject["Declaration"]["Version" -> "1.0"]}, XMLElement["EmployeeList", {}, {XMLElement["Person", {"Name" -> "Tom Jones", "Email" -> "tomj", "Phone" -> "235-1231"}, {}], XMLElement["Person", {"Name" -> "Janet Rogers", "Email" -> "jrogers", "Phone" -> "235-1129"}, {}], XMLElement["Person", {"Name" -> "Bob Norris", "Email" -> "bobn", "Phone" -> "235-1237"}, {}]}], {}]

You may even wish to use this in a Mathematica program.

Contact[query_String] :=
   Cases[
      XML`Parser`XMLGet["http://localhost:8080/webMathematica/Examples/XML/Phone.jsp?name=" <> query],
      XMLElement["Person", x_List, {}] :> x, Infinity]

Contact[ "Tom"]

{{"Name" -> "Tom Jones", "Email" -> "tomj", "Phone" -> "235-1231"}}
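Each contact is returned as a list of rules, so individual fields can be read with a replacement. A minimal sketch, using the result shown above as literal data rather than calling the server (the variable name entry is illustrative):

```mathematica
(* One contact, copied from the output above. *)
entry = {"Name" -> "Tom Jones", "Email" -> "tomj", "Phone" -> "235-1231"};

"Phone" /. entry             (* look up a single attribute *)
{"Name", "Email"} /. entry   (* look up several attributes at once *)
```

The first replacement gives "235-1231" and the second gives {"Tom Jones", "tomj"}.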

Of course your client could be written in some system other than Mathematica, such as Visual Basic, Python, or Java.



THIS IS DOCUMENTATION FOR AN OBSOLETE PRODUCT. CURRENT WEBMATHEMATICA DOCUMENTATION IS NOW AVAILABLE.