WEBMATHEMATICA TUTORIAL

XML

XML is a general format for structured data that is becoming increasingly important. Data formatted in XML can be read by any application that has an XML parser, so choosing an XML format often saves considerable development effort. In addition, an increasing number of data formats are based on XML. Some of the more important for mathematical and scientific purposes include XHTML (an XML-compliant version of HTML), MathML (a way to store mathematical information), and SVG (a graphics format). A large list of XML applications is available at http://www.xml.org.

Mathematica contains a large number of features for working with XML, all of which are available in webMathematica. XML is useful to webMathematica in two ways: through its support for specific XML applications and as a general format for data interchange. The use of MathML, SVG, and XHTML will be covered in their own sections. This section gives an overview of XML and the XML features of Mathematica. It also gives some examples of why this functionality is useful to webMathematica.

Introduction to XML

This section will give a very brief introduction to XML. For more information, go to one of the many references such as those detailed at http://www.w3.org/XML/, for example, http://www.w3.org/XML/1999/XML-in-10-points.

A sample XML document is shown below.

<?xml version="1.0"?>
<library>
<book>
<title>A New Kind of Science</title>
<author>Stephen Wolfram</author>
</book>
<book>
<title>The Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
</book>
</library>

The example above shows a data format for a library. The library contains books, and each book has a title and an author. This shows how XML is suited to structured data. XML looks a little like HTML, except that the tags (names bracketed by '<' and '>') are not restricted to a fixed set: new tags suited to a particular application can be introduced. XML is also stricter than HTML, because an XML document must be well-formed, following syntax rules that do not apply to HTML. This is demonstrated in the next section.
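To make the structure concrete, the following is a minimal sketch in Python (chosen only as a neutral illustration; it is not part of webMathematica) that reads the sample document with the standard library parser and lists each book. The function name list_books is hypothetical.

```python
import xml.etree.ElementTree as ET

# The sample library document from this section.
LIBRARY_XML = """<?xml version="1.0"?>
<library>
<book>
<title>A New Kind of Science</title>
<author>Stephen Wolfram</author>
</book>
<book>
<title>The Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
</book>
</library>"""


def list_books(xml_text):
    """Return a list of (title, author) pairs, one per <book> element."""
    root = ET.fromstring(xml_text)
    return [(book.findtext("title"), book.findtext("author"))
            for book in root.iter("book")]


for title, author in list_books(LIBRARY_XML):
    print(f"{title} by {author}")
```

Because the tags name the data they contain, the code can ask for "title" and "author" directly instead of relying on position in the file.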

XML Compliance

One issue with XML is that documents must be well-formed, that is, they must follow the syntax rules of XML. Some basic examples are described in this section.

An XML document should begin with an XML declaration. The declaration is optional in XML 1.0 but required in XML 1.1. For example, a document typically starts with something like the following.

<?xml version="1.0"?>

Empty elements must either have an end tag, or the start tag must end with />. Thus, the following is legal.

<br/><hr/>

However, this is not legal.

<br><hr>

For nonempty elements, the end tag is required. Thus, the following is legal.

<p>Here is a paragraph.</p><p>Here is another.</p>

However, this is not legal.

<p>Here is a paragraph.<p>Here is another.
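One way to check these rules is to hand each fragment to an XML parser and see whether it is accepted. The sketch below uses Python's standard library parser as an illustration. The helper is_well_formed and the enclosing <body> element are assumptions of the sketch: the wrapper is needed because a well-formed document has exactly one root element.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# The four fragments from this section, each wrapped in a hypothetical
# <body> root element so that the document has a single root.
SAMPLES = [
    "<body><br/><hr/></body>",
    "<body><br><hr></body>",
    "<body><p>Here is a paragraph.</p><p>Here is another.</p></body>",
    "<body><p>Here is a paragraph.<p>Here is another.</body>",
]

for sample in SAMPLES:
    verdict = "well-formed" if is_well_formed(sample) else "not well-formed"
    print(f"{verdict}: {sample}")
```

The first and third fragments are accepted; the second and fourth are rejected because an element is left open when the parser reaches </body>.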

Mathematica Support for XML

Mathematica provides some very convenient ways to work with XML. Many of these are based on the strong correspondence between structured XML documents and Mathematica expressions (the basic data type of Mathematica). This makes it easy to import XML data into Mathematica and then work with it. This section gives a very brief introduction to working with XML in Mathematica; more information is available in the online documentation.

The following is a simple example.

In[1]:= [input not reproduced]

This XML can be imported into Mathematica, which represents it as symbolic XML. Symbolic XML is a native Mathematica expression form of XML that is isomorphic to textual XML: each element becomes a nested expression.

In[2]:= [input not reproduced]
Out[2]= [output not reproduced]
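Symbolic XML represents each element as an expression of the form XMLElement[tag, {attributes}, {children}]. As a rough, language-neutral illustration of that shape, the Python sketch below converts a parsed element into nested (tag, attributes, children) tuples. It is not Mathematica output: the function to_nested is hypothetical, and dropping whitespace-only text is a simplification made by the sketch.

```python
import xml.etree.ElementTree as ET


def to_nested(elem):
    """Convert an element to a (tag, attributes, children) tuple.

    Children are either strings (text) or nested tuples (child elements).
    Whitespace-only text is dropped, which is an assumption of this sketch.
    """
    children = []
    if elem.text and elem.text.strip():
        children.append(elem.text.strip())
    for child in elem:
        children.append(to_nested(child))
        # Text that follows a child element belongs to the parent.
        if child.tail and child.tail.strip():
            children.append(child.tail.strip())
    return (elem.tag, dict(elem.attrib), children)


root = ET.fromstring(
    "<book><title>A New Kind of Science</title>"
    "<author>Stephen Wolfram</author></book>"
)
print(to_nested(root))
```

The nesting of the tuples follows the nesting of the tags, which is the correspondence that makes pattern matching on XML natural.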

You can use standard Mathematica programming features to process symbolic XML; for example, to extract all the authors.

In[3]:= [input not reproduced]
Out[3]= [output not reproduced]
In[4]:= [input not reproduced]
Out[4]= [output not reproduced]

This outputs the new XML expression.

In[5]:= [input not reproduced]
Out[5]= [output not reproduced]

This type of transformation can of course be done in other ways, for example with an XSLT stylesheet. However, setting up an XSLT stylesheet carries some overhead. Mathematica, with its uniform programming principles, is often a quick and simple way to carry out the task.

There are many more features of the Mathematica XML tools, for example, working with attributes, entities, namespaces, validation, and CDATA. More information is available from the Mathematica documentation.

webMathematica XML Applications

Many webMathematica applications involve generating HTML to be read by browsers. However, the output from a webMathematica site may not go to a browser; it may involve some data to be read by an application that will then do further processing. This section will study an example that shows how this can be done.

The source for this example is in webMathematica/Examples/XML/Phone.jsp and webMathematica/Examples/XML/Processed.jsp. It also uses an XML file webMathematica/Examples/XML/phone.xml. If you installed webMathematica as described above, you should be able to connect to this JSP via http://localhost:8080/webMathematica/Examples/XML/Phone.jsp. (You may have some other URL for accessing your server.)

This shows the XML data.

<?xml version="1.0"?>

<EmployeeList>
<Person Name="Tom Jones" Email="tomj" Phone="235-1231" />
<Person Name="Janet Rogers" Email="jrogers" Phone="235-1129" />
<Person Name="Bob Norris" Email="bobn" Phone="235-1237" />
<Person Name="Kit Smithers" Email="ksmit" Phone="235-0729" />
<Person Name="Jamie Lemay" Email="jlemay" Phone="235-6393" />
</EmployeeList>

The contents of Processed.jsp are shown below.

<%@ page contentType="text/xml"%>
<%@ taglib uri="http://www.wolfram.com/msp" prefix="msp" %>

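<msp:evaluate>
    (* Load phone.xml from the directory that contains this JSP. *)
    xml = Import[ToFileName[MSPPageDirectory[], "phone.xml"], "XML"];
    (* Keep only the root element of the imported document. *)
    xml = First[Cases[xml, _XMLElement]];
    (* If patt was sent, drop every Person whose Name does not match it. *)
    If[MSPValueQ[$$patt],
        xml = DeleteCases[xml,
            XMLElement["Person", {___,
                "Name" -> n_ /; !StringMatchQ[n, $$patt], ___}, _], Infinity]
    ];
    ExportString[xml, "XML"]
</msp:evaluate>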

This example first imports the XML file into Mathematica. It uses MSPPageDirectory because the XML data is located in the same directory as Processed.jsp. It then checks whether a parameter patt was sent. If so, it discards every Person element whose Name attribute does not match the pattern. You can see the effect of this parameter with a URL such as http://localhost:8080/webMathematica/Examples/XML/Processed.jsp?patt=T*. (You may have some other URL for accessing your server.) Finally, it converts the symbolic XML into a string and returns it.
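To see what response to expect for a given pattern without running the server, the filtering step can be emulated offline. The Python sketch below is only an approximation of Processed.jsp: the names matches and filter_people are hypothetical, and it handles only the * wildcard (zero or more characters), not the other metacharacters that StringMatchQ accepts.

```python
import re
import xml.etree.ElementTree as ET

# The employee data from phone.xml, as listed in this section.
PHONE_XML = """<EmployeeList>
<Person Name="Tom Jones" Email="tomj" Phone="235-1231" />
<Person Name="Janet Rogers" Email="jrogers" Phone="235-1129" />
<Person Name="Bob Norris" Email="bobn" Phone="235-1237" />
<Person Name="Kit Smithers" Email="ksmit" Phone="235-0729" />
<Person Name="Jamie Lemay" Email="jlemay" Phone="235-6393" />
</EmployeeList>"""


def matches(name, patt):
    """Match name against patt, treating only * as a wildcard."""
    regex = re.escape(patt).replace(r"\*", ".*")
    return re.fullmatch(regex, name) is not None


def filter_people(xml_text, patt=None):
    """Return the XML with non-matching Person elements removed.

    When patt is None, the data is returned unfiltered, mirroring the
    case in which no patt parameter was sent.
    """
    root = ET.fromstring(xml_text)
    if patt is not None:
        # Copy the list first so that removal does not disturb iteration.
        for person in list(root.findall("Person")):
            if not matches(person.get("Name"), patt):
                root.remove(person)
    return ET.tostring(root, encoding="unicode")


print(filter_people(PHONE_XML, "T*"))
```

With the pattern T*, only the entry for Tom Jones remains, which is what the example URL above should return.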

Of course, you may want to use this XML data for further processing. If you have a system that is XML-aware, this is quite straightforward. One useful application that is XML-aware is of course Mathematica. For example, the following will call your webMathematica site and retrieve the information.

In[1]:= [input not reproduced]
Out[1]= [output not reproduced]

You may even wish to use this in a Mathematica program.

In[2]:= [input not reproduced]
In[3]:= [input not reproduced]
Out[3]= [output not reproduced]

Of course your client could be written in some system other than Mathematica, such as Visual Basic, Python, or Java.
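As an illustration of such a client, here is a minimal sketch in Python using only the standard library. It assumes the server is reachable at the localhost URL used earlier in this section and that the response is the EmployeeList XML shown above. The function names build_url, parse_people, and fetch_people are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed server address; adjust it to match your own installation.
BASE_URL = "http://localhost:8080/webMathematica/Examples/XML/Processed.jsp"


def build_url(base_url, patt=None):
    """Append the optional patt parameter, URL-encoding its value."""
    if patt is None:
        return base_url
    return base_url + "?" + urllib.parse.urlencode({"patt": patt})


def parse_people(xml_text):
    """Return one dict of attributes per Person element."""
    root = ET.fromstring(xml_text)
    return [dict(person.attrib) for person in root.iter("Person")]


def fetch_people(base_url=BASE_URL, patt=None, timeout=10):
    """Request the XML from the server and parse the response."""
    url = build_url(base_url, patt)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_people(response.read())


# Example call (requires a running webMathematica server):
# for person in fetch_people(patt="T*"):
#     print(person["Name"], person["Phone"])
```

Keeping the parsing separate from the network request means the parsing step can be tested against a saved response without a server.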

Using XML as an interchange format for communication between two programs is discussed in more detail in the section on web services.