Converting a Notebook to HTML
Suppose you need to export a notebook in a specific XML format (apart from the standard formats listed under the File > Save As Special menu). One option would be to export to NotebookML and then use an external tool (e.g., XSLT rules) to transform the result into the desired form of XML. But often it is just as easy to perform the manipulation within Mathematica, converting the notebook expression directly into SymbolicXML and saving the latter. Anyone with a basic command of Mathematica patterns and programming should be able to do this. Users coming from an XSLT background may even feel a sense of déjà vu: since Mathematica expressions are essentially trees, the techniques are much the same.
As an example, let us recreate an abridged version of the File > Save As Special > HTML functionality. First, create an example notebook.
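A minimal sketch of such a notebook, built directly as an expression rather than in the front end. The name nb and the cell contents are assumptions chosen for illustration; any notebook expression with a cell group, a section head, and a few Text cells would serve equally well.

```mathematica
(* A small notebook expression: one cell group holding a section head,
   a plain Text cell, and a Text cell with bold and italic runs.
   The name nb and all cell contents are illustrative assumptions. *)
nb = Notebook[{
    Cell[CellGroupData[{
       Cell["Sample Section", "Section"],
       Cell["A plain text cell.", "Text"],
       Cell[TextData[{
          "Text with ",
          StyleBox["bold", FontWeight -> "Bold"],
          " and ",
          StyleBox["italic", FontSlant -> "Italic"],
          " words."}], "Text"]
       }, Open]]
    }];
```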
Our method will be to define a recursive function, transform, to process the original notebook expression from top to bottom, similar to the templates of XSLT. First, we establish a default definition to discard anything not explicitly matched by other patterns. (Given our "top-down" approach, perhaps this should be the last definition, but we place it here to reduce extraneous output in the intermediate results.)
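One way to write such a default definition is sketched below. The single Blank pattern matches any one argument, so this rule fires only when no more specific definition applies.

```mathematica
(* Default rule: anything not matched by a more specific definition
   is replaced by an empty sequence, i.e. it is discarded. *)
transform[_] := Sequence[];
```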
The above definition uses Sequence for the following reason: since transform will be applied recursively, the best "null" result is one that can be dropped into the middle of a list of arguments without disrupting the syntax. An empty Sequence[] simply vanishes when it appears among the elements of a list.
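The splicing behavior can be seen in isolation with a short experiment. Nothing here depends on transform; the lists are arbitrary examples.

```mathematica
(* An empty Sequence disappears when it sits inside a list... *)
{"a", Sequence[], "b"}
(* {"a", "b"} *)

(* ...whereas Null would remain as an explicit element. *)
Length[{"a", Null, "b"}]
(* 3 *)
```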
We start with the notebook expression itself.
- The argument pattern must be robust enough to accept all variants. (Even though the notebook options are discarded in this conversion, a BlankNullSequence (___) is included to allow for them.)
- The only thing done with the contents argument is to pass it back to transform.
- The third argument of XMLElement is always a List. Forgetting this is a common pitfall.
- Those familiar with HTML will notice that we have dropped the head element.
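The points above can be sketched as a single definition. The choice of the html and body element names follows ordinary HTML; the exact form of the definition is an assumption, not necessarily the one used to produce the original results.

```mathematica
(* The notebook becomes an html element containing only a body.
   contents is the list of top-level cells; any notebook options
   are absorbed by ___ and ignored. *)
transform[Notebook[contents_List, ___]] :=
  XMLElement["html", {},
   {XMLElement["body", {}, transform /@ contents]}]

(* With an empty notebook there is nothing to recurse into: *)
transform[Notebook[{}, WindowSize -> {500, 400}]]
(* XMLElement["html", {}, {XMLElement["body", {}, {}]}] *)
```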
The same general theme is followed for the remaining definitions.
Next, we discard cell-grouping information, since HTML has no use for it.
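A possible definition is sketched here. It unwraps the group and returns the transformed member cells as a Sequence, so that they are spliced directly into the enclosing list of children.

```mathematica
(* A cell group contributes no element of its own: its member cells
   are transformed and spliced into the parent via Sequence.
   The second argument of CellGroupData (Open or Closed) is ignored. *)
transform[Cell[CellGroupData[contents_List, ___], ___]] :=
  Sequence @@ (transform /@ contents)
```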
Mathematica sectional heads are translated to their HTML counterparts.
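The following sketch shows one plausible mapping. The particular correspondence between cell styles and heading levels (Title to h1, Section to h2, Subsection to h3) is an assumption made for illustration.

```mathematica
(* Sectional cell styles mapped to HTML headings. The level chosen
   for each style is an illustrative assumption. The cell contents
   are handed back to transform, and the single result is wrapped
   in a List to form the third argument of XMLElement. *)
transform[Cell[contents_, "Title", ___]] :=
  XMLElement["h1", {}, {transform[contents]}]

transform[Cell[contents_, "Section", ___]] :=
  XMLElement["h2", {}, {transform[contents]}]

transform[Cell[contents_, "Subsection", ___]] :=
  XMLElement["h3", {}, {transform[contents]}]
```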
Now for the Text cells. This introduces a complication, as the contents of a Mathematica Text-style cell can be either a simple string or a TextData-wrapped list if the text has substructure of its own, such as font changes. Thus, we need a definition for each case.
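The two cases might be handled as follows. This is a sketch that covers only a plain string and a TextData wrapping a list; other forms that TextData can take are not addressed.

```mathematica
(* Case 1: the cell contents are a plain string. The string is
   passed back to transform and wrapped in a one-element list. *)
transform[Cell[contents_String, "Text", ___]] :=
  XMLElement["p", {}, {transform[contents]}]

(* Case 2: the contents are TextData wrapping a list of strings
   and boxes. Each item is transformed, and the resulting list
   serves directly as the children of the p element. *)
transform[Cell[TextData[contents_List], "Text", ___]] :=
  XMLElement["p", {}, transform /@ contents]
```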
Simple strings should just be passed on as is. Once again, this perhaps should be placed later in the sequence of definitions, in keeping with a top-down style, but it helps make the intermediate results more meaningful.
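Such a pass-through rule can be as short as the sketch below. Without a rule of this kind, strings would have no specific definition and so would be left to whatever default handling transform has.

```mathematica
(* Strings are leaf nodes: return them unchanged. *)
transform[contents_String] := contents

transform["plain text"]
(* "plain text" *)
```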
Finally, we deal with (simple) font changes.
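A sketch for bold and italic runs follows. It assumes that the font change appears as a FontWeight or FontSlant option of a StyleBox; a StyleBox carrying both options would match only the first applicable rule, which is one reason this handles only simple cases.

```mathematica
(* Bold text: a StyleBox with FontWeight -> "Bold" among its options. *)
transform[StyleBox[contents_, ___, FontWeight -> "Bold", ___]] :=
  XMLElement["b", {}, {transform[contents]}]

(* Italic text: a StyleBox with FontSlant -> "Italic" among its options. *)
transform[StyleBox[contents_, ___, FontSlant -> "Italic", ___]] :=
  XMLElement["i", {}, {transform[contents]}]
```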
Here is the final product.
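Producing it takes a single call. The sketch assumes that the example notebook expression is stored in a symbol named nb and that the transform definitions described so far have been entered; both names, and html, are illustrative.

```mathematica
(* Apply the recursive conversion to the whole notebook expression.
   nb and html are assumed names; the result is a SymbolicXML
   expression built from nested XMLElement objects. *)
html = transform[nb]
```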
You can get output in a more human-readable form by using ExportString.
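As a self-contained illustration, the sketch below serializes a small hand-written SymbolicXML expression; the expression itself is an arbitrary example standing in for the converted notebook.

```mathematica
(* ExportString with the "XML" format turns SymbolicXML into
   ordinary XML text. The sample element is an assumption made
   so that the example runs on its own. *)
sampleXML = XMLElement["p", {},
   {"Some ", XMLElement["b", {}, {"bold"}], " text."}];

ExportString[sampleXML, "XML"]
```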
We can verify that this is well-formed XML.
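One way to perform such a check is with XML`SymbolicXMLErrors, which inspects a SymbolicXML expression and reports the positions of any problems it finds. The sketch below applies it to a small hand-written element; to the best of our knowledge, an empty list means that no errors were detected.

```mathematica
(* Check a SymbolicXML expression for structural errors.
   The sample element is an illustrative assumption; in practice
   the converted notebook would be passed instead. *)
XML`SymbolicXMLErrors[
 XMLElement["p", {},
  {"Some ", XMLElement["b", {}, {"bold"}], " text."}]]
```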
And, of course, the SymbolicXML can be exported to a file, suitable for viewing with a web browser.
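A sketch of the export step follows. The file name and the use of the temporary directory are assumptions; the sample element again stands in for the converted notebook.

```mathematica
(* Write SymbolicXML to a file as XML text. The path is a
   hypothetical choice; any writable location will do. *)
file = FileNameJoin[{$TemporaryDirectory, "notebook.html"}];

Export[file,
 XMLElement["html", {},
  {XMLElement["body", {},
    {XMLElement["p", {}, {"Hello from Mathematica."}]}]}],
 "XML"]
```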
An alternative to a recursive function is to apply a list of replacement rules using ReplaceRepeated.
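The rule-based approach might look like the sketch below. The names rules and sample, the sample notebook, and the mapping of Section to h2 are assumptions for illustration. Note that the right-hand sides no longer mention any recursive call: ReplaceRepeated (//.) keeps applying the rules until the expression stops changing.

```mathematica
(* Replacement rules mirroring the recursive definitions. *)
rules = {
   Notebook[contents_List, ___] :>
    XMLElement["html", {}, {XMLElement["body", {}, contents]}],
   Cell[CellGroupData[contents_List, ___], ___] :>
    Sequence @@ contents,
   Cell[contents_, "Section", ___] :>
    XMLElement["h2", {}, {contents}],
   Cell[contents_, "Text", ___] :>
    XMLElement["p", {}, {contents}],
   (* TextData is unwrapped by its own rules, independent of Cell. *)
   TextData[contents_List] :> Sequence @@ contents,
   TextData[contents_] :> contents,
   StyleBox[contents_, ___, FontWeight -> "Bold", ___] :>
    XMLElement["b", {}, {contents}],
   StyleBox[contents_, ___, FontSlant -> "Italic", ___] :>
    XMLElement["i", {}, {contents}]
   };

(* A small stand-in notebook so the example runs on its own. *)
sample = Notebook[{
    Cell["A Section", "Section"],
    Cell[TextData[{"Some ",
       StyleBox["bold", FontWeight -> "Bold"], " text."}], "Text"]
    }];

sample //. rules
```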
The two methods produce identical results.
Here is how the two methods differ.
- Since the recursion occurs implicitly via ReplaceRepeated, the latter implementation is cleaner in spots. In particular, contrast the handling of Text cells: the TextData rule can be separated from the Cell rule. The same could be accomplished for the recursive function, but at the cost of additional patterns for the various forms that contents might take (for example, _List versus _String and so on). ReplaceRepeated, by acting on all subexpressions, obviates this need.
- There is no default rule for the second method. Any unhandled parts of the original Mathematica expression will pass through unchanged, probably resulting in invalid SymbolicXML.
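The second point is easy to demonstrate with a single rule and a cell style that the rule does not cover. The list and the rule here are a minimal illustration, not the full rule set.

```mathematica
(* Only Text cells are handled; the Input cell matches no rule
   and therefore survives untouched in the result. *)
{Cell["a", "Text"], Cell["b", "Input"]} //.
 {Cell[c_, "Text", ___] :> XMLElement["p", {}, {c}]}
(* {XMLElement["p", {}, {"a"}], Cell["b", "Input"]} *)
```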
Finally, we use Clear to remove the definitions of all the symbols.
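A sketch of the cleanup step is given below. Apart from transform, the symbol names are assumptions; substitute whatever names were actually used in the session.

```mathematica
(* Remove values and definitions for the symbols used above.
   Clear is harmless for symbols that have no definitions. *)
Clear[transform, nb, html, rules]
```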