XML (.xml)

Background & Context

    • MIME type: text/xml
    • XML is a general-purpose markup language and structured document format.
    • Primarily used for the exchange of data across different systems in computer networks.
    • Uses a hierarchical model for the representation of structured data.
    • Stores data in a tree-based structure consisting of markup tags, attributes, and character contents.
    • Plain text file, normally encoded as UTF-8.
    • XML is an acronym derived from Extensible Markup Language.
    • Is a subset of the Standard Generalized Markup Language (SGML).
    • Developed since 1996 by the XML Working Group.
    • Published in 1998 as a W3C Recommendation.

Import & Export

  • Import["file.xml"] uses a specific converter for XML-based file formats if possible; otherwise, it imports the file as generic XML and returns an XMLObject expression.
  • Import["file.xml","XML"] always imports as generic XML.
  • Since both XML and the Wolfram Language represent data as a tree structure, there is a natural mapping from one to the other. The Wolfram Language stores XML data structures as nested XMLElement objects, and an entire XML document as XML data embedded in an XMLObject.
  • Export["file.xml",expr] exports an XMLObject or XMLElement expression to XML.
  • Expressions of types other than XMLObject or XMLElement are exported as ExpressionML.
  • Import["file.xml"] returns an XMLObject expression, representing the entire XML document in symbolic form as a tree of XMLElement expressions.
  • Import by default returns numeric data stored in XML as strings.
  • Export["file.xml",XMLObject[…]] or Export["file.xml",XMLElement[…]] creates an XML file from a symbolic XML representation.
  • Import["file.xml",elem] imports the specified element from an XML file.
  • Import["file.xml",{elem,suba,subb,…}] imports a subelement.
  • Import["file.xml",{{elem1,elem2,…}}] imports multiple elements.
  • The import format can be specified with Import["file","XML"] or Import["file",{"XML",elem,…}].
  • Import["file.html","XML"] converts HTML to well-formed XML before importing.
  • Export["file.xml",expr, elem] creates an XML file by treating expr as specifying element elem.
  • Export["file.xml",{expr1,expr2,…},{{elem1,elem2,…}}] treats each expr_i as specifying the corresponding elem_i.
  • Export["file.xml",expr,opt1->val1,…] exports expr with the specified option elements taken to have the specified values.
  • Export["file.xml",{elem1->expr1,elem2->expr2,…},"Rules"] uses rules to specify the elements to be exported.
  • See the following reference pages for full general information:
  • Import, Export                      import from or export to a file
    CloudImport, CloudExport            import from or export to a cloud object
    ImportString, ExportString          import from or export to a string
    ImportByteArray, ExportByteArray    import from or export to a byte array
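
The mapping between XML and symbolic XML described above can be sketched as follows. This is a minimal illustration, not part of the original page: the inline document and the names xml, doc, skus and quantities are hypothetical, and ImportString and ExportString stand in for Import and Export so that no file on disk is needed.

```wolfram
(* Hypothetical inline document used in place of a file on disk *)
xml = "<inventory><item sku=\"A1\"><qty>3</qty></item><item sku=\"B2\"><qty>12</qty></item></inventory>";

(* Import as generic XML: an XMLObject wrapping nested XMLElement expressions *)
doc = ImportString[xml, "XML"];

(* Each element has the form XMLElement[tag, {attribute rules}, {children}] *)
skus = Cases[doc, XMLElement["item", attrs_, _] :> Lookup[attrs, "sku"], Infinity]

(* Numeric content arrives as strings, so convert explicitly where numbers are needed. *)
(* ToExpression evaluates its input; use it only on trusted data. *)
quantities = Cases[doc, XMLElement["qty", _, {s_String}] :> ToExpression[s], Infinity]

(* Build symbolic XML by hand and serialize it back to an XML string *)
ExportString[XMLElement["item", {"sku" -> "C3"}, {XMLElement["qty", {}, {"7"}]}], "XML"]
```

Pattern matching with Cases is only one way to walk the tree; it is used here because symbolic XML is an ordinary Wolfram Language expression.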

Import Elements

  • General Import elements:
  • "Elements"    list of elements and options available in this file
    "Summary"     summary of the file
    "Rules"       list of rules for all available elements
  • Data representation elements:
  • "CDATA"          CDATA sections as a list of strings
    "Comments"       XML comments as a list of strings
    "EmbeddedDTD"    DTD included in the XML document
    "Plaintext"      a plain text representation of the file
    "Tags"           list of all tags occurring in the file
    "XMLObject"      entire document as a symbolic XML expression
    "XMLElement"     nested XMLElement objects
  • Import uses the "XMLObject" element by default.

Options

  • Import options:
  • "AllowRemoteDTDAccess"          True         whether to attempt to retrieve an external DTD over a network
    "AllowUnrecognizedEntities"     Automatic    whether to allow parsing to work around unrecognized entities in the XML document
    "IncludeDefaultedAttributes"    False        whether to fill in default values for attributes
    "IncludeEmbeddedObjects"        None         embedded objects (of "Comments" and "ProcessingInstructions") to include
    "IncludeNamespaces"             Automatic    whether to return fully qualified tag and attribute names
    "NormalizeWhitespace"           Automatic    whether to remove leading and trailing whitespace and reduce consecutive spaces to a single space in character data
    "PreserveCDATASections"         False        whether to preserve character data sections as special objects
    "ReadDTD"                       True         whether to read an external DTD
    "ValidateAgainstDTD"            Automatic    whether to validate the document against the specified DTD
  • Export options:
  • "AttributeQuoting"     "'"          specifies the delimiter for attribute values
    "ElementFormatting"    Automatic    indentation of elements and line breaking of long strings in the exported document
    "Entities"             None         rules for replacing characters with named entities
    "NamespacePrefixes"    {}           namespace prefix designations, of the form "namespace"->"prefix"

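As a rough sketch of how the options listed above are passed, the following uses a hypothetical inline document and three of the documented options. The exact form of the returned expressions is not shown here, since it depends on the settings in effect.

```wolfram
(* Hypothetical document with padded text and a CDATA section *)
xml = "<doc><title>  Spaced   out  </title><raw><![CDATA[a < b]]></raw></doc>";

(* Keep whitespace in character data exactly as written *)
ImportString[xml, "XML", "NormalizeWhitespace" -> False]

(* Keep CDATA sections as special objects instead of merging them into plain strings *)
ImportString[xml, "XML", "PreserveCDATASections" -> True]

(* Export with double-quoted attribute values instead of the default single quote *)
ExportString[XMLElement["title", {"lang" -> "en"}, {"Spaced out"}], "XML", "AttributeQuoting" -> "\""]
```
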
Examples

Basic Examples

Import an XML sample file as symbolic XML:
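
The sample file used by this example is not reproduced here. As a stand-in, this sketch parses a hypothetical inline document with ImportString; with a file on disk, Import["file.xml","XML"] would take its place.

```wolfram
(* Hypothetical stand-in for the sample file *)
sample = "<?xml version=\"1.0\"?><library><book id=\"b1\"><title>First</title></book></library>";

(* Returns an XMLObject containing nested XMLElement expressions *)
ImportString[sample, "XML"]
```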

Show the Import elements available in this file:
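
A minimal sketch, assuming a hypothetical inline document in place of the sample file; the "Elements" element is requested through the {"XML", elem} form described earlier.

```wolfram
(* Hypothetical stand-in for the sample file *)
catalog = "<catalog><entry>one</entry><entry>two</entry></catalog>";

(* List the Import elements available for this document *)
ImportString[catalog, {"XML", "Elements"}]
```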

Import all CDATA sections:
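
A minimal sketch, assuming a hypothetical inline document that contains two CDATA sections:

```wolfram
(* Hypothetical document; CDATA sections hold text that would otherwise need escaping *)
notes = "<notes><note><![CDATA[5 < 7]]></note><note><![CDATA[<b>raw markup</b>]]></note></notes>";

(* Collect the CDATA sections as a list of strings *)
ImportString[notes, {"XML", "CDATA"}]
```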

Convert to plain text:
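
A minimal sketch, again with a hypothetical inline document standing in for the sample file:

```wolfram
(* Hypothetical document with text spread over several elements *)
letter = "<letter><greeting>Hello</greeting><body>See you soon.</body></letter>";

(* Request the plain text representation of the document *)
ImportString[letter, {"XML", "Plaintext"}]
```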

Get the list of all XML tags that occur in this sample file:
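
A minimal sketch, assuming a hypothetical inline document in place of the sample file:

```wolfram
(* Hypothetical document using four different tag names *)
feed = "<feed><entry><title>One</title></entry><entry><title>Two</title><link href=\"#\"/></entry></feed>";

(* List the tags that occur in the document *)
ImportString[feed, {"XML", "Tags"}]
```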