HTML (.html, .htm)

Background & Context

    • Registered MIME type: text/html
    • HTML markup language and file format.
    • Predominant language for the creation of web pages.
    • HTML is an acronym derived from Hypertext Markup Language.
    • Plain text format.
    • Describes the structure and aspects of the appearance of web pages.
    • First published in 1993 as an Internet Engineering Task Force (IETF) working draft.
    • Maintained since 1996 by the World Wide Web Consortium (W3C).
    • Most recent version is 4.01, published in 1999 as a W3C Recommendation.
    • International standard ISO/IEC 15445:2000.
    • Predecessor of XHTML.

Import & Export

  • Import["file.html"] gives a plain text representation of an HTML file.
  • Import["file.html","Data"] extracts tabular data from HTML.
  • Export["file.html",expr] creates an HTML version of expr.
  • Export["dir",expr] translates expr to HTML, saving the output in the specified directory.
  • Import["file.html"] returns a string, representing the textual content of the file as formatted plain text.
  • Export["file.html",expr] exports a notebook, a cell, a list of cells, or other notebook elements to HTML.
  • By default, Export creates a complete HTML document rather than a fragment; the "FullDocument" option controls this.
  • The output consists of one or more HTML files and two directories, HTMLFiles and HTMLLinks.
  • The Wolfram Language by default converts typeset expressions to GIF images when exporting to HTML.
  • The Wolfram Language can export Tooltip and Hyperlink expressions to HTML, creating HTML image maps if necessary.
  • Import["file.html",elem] imports the specified element from an HTML file.
  • Import["file.html",{elem,suba,subb,…}] imports a subelement.
  • Import["file.html",{{elem1,elem2,…}}] imports multiple elements.
  • The import format can be specified with Import["file","HTML"] or Import["file",{"HTML",elem,…}].
  • Export["file.html",expr,elem] creates an HTML file by treating expr as specifying element elem.
  • Export["file.html",{expr1,expr2,…},{{elem1,elem2,…}}] treats each expr_i as specifying the corresponding elem_i.
  • Export["file.html",expr,opt1->val1,…] exports expr with the specified option elements taken to have the specified values.
  • Export["file.html",{elem1->expr1,elem2->expr2,…},"Rules"] uses rules to specify the elements to be exported.
  • See the following reference pages for full general information:
    Import, Export                      import from or export to a file
    CloudImport, CloudExport            import from or export to a cloud object
    ImportString, ExportString          import from or export to a string
    ImportByteArray, ExportByteArray    import from or export to a byte array
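The element specifications above can be tried without touching the file system by using ImportString. The following is a minimal sketch; the HTML string and the variable name html are made up for illustration.

```wolfram
(* Hypothetical HTML source, used only for illustration *)
html = "<html><head><title>Demo page</title></head><body><p>Hello, world.</p><a href=\"https://example.com/\">a link</a></body></html>";

(* Default element: the document as formatted plain text *)
ImportString[html, "HTML"]

(* A single named element *)
ImportString[html, {"HTML", "Title"}]

(* Several elements at once, returned as a list *)
ImportString[html, {"HTML", {"Title", "Hyperlinks"}}]
```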

Notebook Interface

  • Save Selection As exports the selected part of a notebook as a web page.
  • Save As exports an entire notebook as a web page.

Import Elements

  • General Import elements:
    "Elements"    list of elements and options available in this file
    "Summary"     summary of the file
    "Rules"       list of rules for all available elements
  • Import elements:
    "Data"          textual and numerical content from HTML table and list elements
    "FullData"      full tabular content, including empty HTML table and list elements
    "Hyperlinks"    hyperlinks, given as a list of strings
    "Images"        images embedded in the HTML document
    "Plaintext"     HTML document formatted as text
    "Source"        raw HTML source as a single string
    "Title"         HTML page title
    "ImageLinks"    URLs of embedded images
    "XMLObject"     symbolic XML representation of the entire document
  • Import by default uses the "Plaintext" element.
  • When importing a plain HTML document as "XMLObject", the Wolfram Language will attempt to convert it to well-formed XHTML and import the resulting XML file.
  • Export elements:
    "Notebook"          a Notebook expression
    "NotebookObject"    a NotebookObject expression
    "Expression"        an arbitrary Wolfram Language expression

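As a rough illustration of the "XMLObject" and "Source" import elements, the sketch below imports a small made-up HTML string and pulls out the content of its paragraph elements. It assumes that tag names appear in the symbolic XML as plain lowercase strings; the variable names html and xml are hypothetical.

```wolfram
(* Hypothetical HTML source, used only for illustration *)
html = "<html><head><title>Demo</title></head><body><p>First</p><p>Second</p></body></html>";

(* Symbolic XML for the whole document *)
xml = ImportString[html, {"HTML", "XMLObject"}];

(* Collect the content of every <p> element;
   assumes tag names are imported as plain lowercase strings *)
Cases[xml, XMLElement["p", _, content_] :> content, Infinity]

(* The raw source, for comparison *)
ImportString[html, {"HTML", "Source"}]
```
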
Options

  • Import options:
    CharacterEncoding    Automatic    raw character encoding to use when importing the file
    "Numeric"            True         whether to import data fields as numbers when possible
  • If the HTML file does not declare a character encoding, Import uses the encoding specified by CharacterEncoding. A complete list of possible encodings is given by $CharacterEncodings.
  • With CharacterEncoding->Automatic, Import uses the encoding declared in the HTML file. If none is declared, it uses "UTF8". If any sequence of bytes in the file cannot be represented in "UTF8", "ISOLatin1" is used instead.
  • The Wolfram Language always uses the UTF-8 encoding when exporting to HTML.
  • Export options:
    "Content"                   False        whether to export MathML content elements
    "ConversionRules"           Automatic    specifies mappings from Wolfram System cell styles to HTML elements, including both inline and block-level versions of the markup
    "ConvertClosed"             True         whether to export forward-closed cell groups
    "ConvertLinkedNotebooks"    False        whether to convert linked notebooks to HTML
    "ConvertReverseClosed"      False        whether to export reverse-closed cell groups
    "CSS"                       Automatic    what CSS stylesheet to use or link to
    "FullDocument"              True         whether to export a complete HTML document
    "Graphics3DOutput"          Automatic    how to represent 3D graphics expressions
    "GraphicsOutput"            "GIF"        how to represent graphics expressions
    "HeadAttributes"            {}           attributes to be inserted in HTML <head> tag, given as a list of rules
    "HeadElements"              {}           subelements of HTML <head> tag
    "ManipulateOutput"          "CDF"        how to represent Manipulate expressions
    "MathOutput"                "GIF"        how to represent typeset expressions
  • Possible settings for "GraphicsOutput", "Graphics3DOutput", "ManipulateOutput", and "MathOutput" are:
    "CDF"            converts the targeted expressions to embedded CDF
    "DisplayForm"    converts the targeted expressions to ASCII approximations of their appearance
    "GIF"            converts the targeted expressions to GIF
    "InputForm"      converts the targeted expressions to their InputForm
    "PNG"            converts the targeted expressions to PNG
    "JPEG"           converts the targeted expressions to JPEG
    "SVG"            converts the targeted expressions to SVG
  • "MathOutput"->"MathML" will convert all typeset expressions to MathML.
  • "Graphics3DOutput"->Automatic converts 3D graphics using the same method specified for "GraphicsOutput".
  • The choice of option to use for output conversion is based upon the type of the object at the top level of the cell. For example, a cell that contains only a 2D graphic will use "GraphicsOutput", while a cell that contains a 2D graphic embedded in typesetting or code will use "MathOutput".
  • Possible settings for "CSS" are:
    Automatic     creates a CSS stylesheet from the Wolfram System stylesheet
    None          does not create a stylesheet file or inline CSS style
    "file.css"    uses a stylesheet file
  • Allowed settings for "ConvertClosed" and "ConvertReverseClosed" are:
    False           does not export closed groups
    True            exports all cell groups
    "LinkedPage"    exports each forward-closed group to a separate page

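The options above might be combined as in the following sketch. The scratch directory, the file names, and the sample content are assumptions made for illustration. The first part writes a small ISO Latin-1 file that declares no encoding and then imports it with an explicit CharacterEncoding; the second part exports an expression with two of the export options listed above.

```wolfram
(* Hypothetical scratch directory and file name *)
dir = CreateDirectory[];
latin = FileNameJoin[{dir, "latin1.html"}];

(* Write a small HTML file in ISO Latin-1 that does not declare its encoding *)
Export[latin, "<html><body><p>caf\[EAcute]</p></body></html>", "Text",
  CharacterEncoding -> "ISOLatin1"];

(* Import option: state the encoding explicitly *)
Import[latin, "Plaintext", CharacterEncoding -> "ISOLatin1"]

(* Export options: MathML for typeset expressions, no stylesheet *)
Export[FileNameJoin[{dir, "math.html"}], x^2 + Sqrt[y],
  "MathOutput" -> "MathML", "CSS" -> None]
```
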
Examples

Basic Examples  (3)

Import all images from a web page:
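A minimal sketch of such a call, assuming network access; the URL is an arbitrary choice for illustration and any reachable page would do.

```wolfram
(* The URL is an assumption chosen for illustration *)
images = Import["https://www.wolfram.com/", "Images"];

(* Number of images found on the page *)
Length[images]
```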

Read an HTML file as plain text:
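One way this might look, using a small made-up file written to the temporary directory so the sketch does not depend on any particular sample file:

```wolfram
(* Hypothetical sample file, created only for this illustration *)
file = FileNameJoin[{$TemporaryDirectory, "sample.html"}];
Export[file,
  "<html><head><title>Sample</title></head><body><h1>Heading</h1><p>Some body text.</p></body></html>",
  "Text"];

(* With no element given, Import uses "Plaintext" *)
Import[file]
```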

Show the Import elements available in this file:
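A sketch under the same kind of assumption, with a made-up file standing in for the one referred to above:

```wolfram
(* Hypothetical sample file, created only for this illustration *)
file = FileNameJoin[{$TemporaryDirectory, "elements-demo.html"}];
Export[file, "<html><head><title>Sample</title></head><body><p>Text</p></body></html>", "Text"];

(* List the elements that Import can extract from this file *)
Import[file, "Elements"]
```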

Import the tabular content from this file:
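A minimal sketch with a made-up table; the file name and the table contents are assumptions. The second call uses the "Numeric" option described above to keep the fields as strings.

```wolfram
(* Hypothetical sample file containing one HTML table *)
file = FileNameJoin[{$TemporaryDirectory, "table-demo.html"}];
Export[file,
  "<html><body><table><tr><th>Year</th><th>Count</th></tr><tr><td>2001</td><td>12</td></tr><tr><td>2002</td><td>15</td></tr></table></body></html>",
  "Text"];

(* Tabular content, with numeric fields converted to numbers by default *)
Import[file, "Data"]

(* Same content, leaving every field as a string *)
Import[file, "Data", "Numeric" -> False]
```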

This exports a mathematical expression to HTML:
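For instance, something along these lines; the scratch directory and file name are assumptions made for illustration.

```wolfram
(* Hypothetical scratch directory for the generated files *)
dir = CreateDirectory[];

(* Export a typeset expression; returns the path of the HTML file *)
Export[FileNameJoin[{dir, "math.html"}], Integrate[1/(1 + x^3), x]]
```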

Show the names of the files saved into the "HTMLFiles" directory:
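A self-contained sketch, repeating an export into a fresh scratch directory (a hypothetical location) and then listing what was written next to the HTML file:

```wolfram
(* Hypothetical scratch directory; export an expression into it *)
dir = CreateDirectory[];
Export[FileNameJoin[{dir, "math.html"}], Sqrt[a + b]/c];

(* Files written to the "HTMLFiles" directory alongside the HTML file *)
FileNames["*", FileNameJoin[{dir, "HTMLFiles"}]]
```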

Import the rasterized typeset expression:
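A sketch that assumes the default "MathOutput"->"GIF" setting places GIF files in the "HTMLFiles" directory; the exact file names are not assumed, so they are looked up by pattern.

```wolfram
(* Hypothetical scratch directory; export with the default settings *)
dir = CreateDirectory[];
Export[FileNameJoin[{dir, "math.html"}], Sqrt[a + b]/c];

(* Find any GIF files produced for the typeset expression *)
gifs = FileNames["*.gif", FileNameJoin[{dir, "HTMLFiles"}]];

(* Import the first one if present *)
If[gifs === {}, Missing["NotAvailable"], Import[First[gifs]]]
```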

Scope  (5)

This translates a Cell expression to HTML:
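A minimal sketch; the cell content, its style, and the file location are arbitrary choices for illustration.

```wolfram
(* Hypothetical output file in a fresh scratch directory *)
file = FileNameJoin[{CreateDirectory[], "cell.html"}];

(* Export a single Cell expression *)
Export[file, Cell["A section heading", "Section"]];

(* Inspect the generated markup *)
Import[file, "Source"]
```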

The Wolfram Language can export graphics with embedded tooltips and hyperlinks to HTML image maps:
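A sketch restricted to tooltips, with made-up shapes and labels. The final line only checks whether the generated markup appears to contain a map element; no particular result is assumed.

```wolfram
(* Hypothetical output file in a fresh scratch directory *)
file = FileNameJoin[{CreateDirectory[], "map.html"}];

(* Graphics with two tooltip regions *)
Export[file,
  Graphics[{
    Tooltip[Disk[{0, 0}], "a disk"],
    Tooltip[Rectangle[{2, -1}, {4, 1}], "a rectangle"]}]];

(* Look for an image map in the generated markup *)
StringContainsQ[Import[file, "Source"], "<map", IgnoreCase -> True]
```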

The mapping of Wolfram System style elements to HTML can be specified as "ConversionRules":

Export a formatted table to HTML:

Create an HTML fragment:
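A minimal sketch using the "FullDocument" option listed above; the exported string is an arbitrary example.

```wolfram
(* Export to a string, omitting the surrounding document structure *)
ExportString["Hello, world.", "HTML", "FullDocument" -> False]

(* For comparison, the default produces a complete document *)
ExportString["Hello, world.", "HTML"]
```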