HTML (.html, .htm)
- Import fully supports HTML version 4.01.
- Output from Export[…,"HTML"] conforms to the XHTML 1.1 standard.
Background & Context
- Registered MIME type: text/html
- HTML markup language and file format.
- Predominant language for the creation of web pages.
- HTML is an acronym derived from Hypertext Markup Language.
- Plain text format.
- Describes the structure and aspects of the appearance of web pages.
- First published in 1993 as an Internet Engineering Task Force (IETF) working draft.
- Maintained since 1996 by the World Wide Web Consortium (W3C).
- Version 4.01 was published in 1999 as a W3C recommendation.
- International standard ISO/IEC 15445:2000.
- Predecessor of XHTML.
Import & Export
- Import["file.html"] gives a plain text representation of an HTML file.
- Import["file.html","Data"] extracts tabular data from HTML.
- Export["file.html",expr] creates an HTML version of expr.
- Export["dir",expr] translates expr to HTML, saving the output in the specified directory.
- Import["file.html"] returns a string, representing the textual content of the file as formatted plain text.
- Export["file.html",expr] exports a notebook, a cell, a list of cells, or other notebook elements to HTML.
- By default, Export creates a complete HTML document rather than a fragment; the "FullDocument" option controls this.
- The output consists of one or more HTML files and two directories, HTMLFiles and HTMLLinks.
- The Wolfram Language by default converts typeset expressions to GIF images when exporting to HTML.
- The Wolfram Language can export Tooltip and Hyperlink expressions to HTML, creating HTML image maps if necessary.
- Import["file.html",elem] imports the specified element from an HTML file.
- Import["file.html",{elem,suba,subb,…}] imports a subelement.
- Import["file.html",{{elem1,elem2,…}}] imports multiple elements.
- The import format can be specified with Import["file","HTML"] or Import["file",{"HTML",elem,…}].
- Export["file.html",expr,elem] creates an HTML file by treating expr as specifying element elem.
- Export["file.html",{expr1,expr2,…},{{elem1,elem2,…}}] treats each expr_i as specifying the corresponding elem_i.
- Export["file.html",expr,opt1->val1,…] exports expr with the specified option elements taken to have the specified values.
- Export["file.html",{elem1->expr1,elem2->expr2,…},"Rules"] uses rules to specify the elements to be exported.
- See the following reference pages for full general information:
    Import, Export                      import from or export to a file
    CloudImport, CloudExport            import from or export to a cloud object
    ImportString, ExportString          import from or export to a string
    ImportByteArray, ExportByteArray    import from or export to a byte array
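The string-based variants make the import forms above easy to try without a file on disk. The following is a minimal sketch, assuming a small made-up HTML string; the name html and the example.com address are placeholders introduced for illustration and do not come from the reference page.

```wolfram
(* A small, made-up HTML document held in a string (hypothetical sample data) *)
html = "<html><head><title>Demo page</title></head><body><p>Hello, <a href=\"https://example.com\">world</a>.</p></body></html>";

(* Name only the format: the result is the document as formatted plain text *)
ImportString[html, "HTML"]

(* Name the format together with a single element *)
ImportString[html, {"HTML", "Title"}]
ImportString[html, {"HTML", "Hyperlinks"}]
```

The same element specifications should carry over to Import["file.html", …] when the document lives in a file.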
Import Elements
- General Import elements:
    "Elements"    list of elements and options available in this file
    "Summary"     summary of the file
    "Rules"       list of rules for all available elements
- Import elements:
    "Data"          textual and numerical content from HTML table and list elements
    "FullData"      full tabular content, including empty HTML table and list elements
    "Hyperlinks"    hyperlinks, given as a list of strings
    "Images"        images embedded in the HTML document
    "Plaintext"     HTML document formatted as text
    "Source"        raw HTML source as a single string
    "Title"         HTML page title
    "ImageLinks"    URLs of embedded images
    "XMLObject"     symbolic XML representation of the entire document
- Import by default uses the "Plaintext" element.
- When importing a plain HTML document as "XMLObject", the Wolfram Language will attempt to convert it to well-formed XHTML and import the resulting XML file.
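To make the import elements concrete, here is a hedged sketch that pulls several of them out of one small table. The HTML string and the name table are invented for illustration; the outputs are not shown because they depend on the Wolfram Language version.

```wolfram
(* Hypothetical sample: a two-row table, one numeric column *)
table = "<html><body><table><tr><td>a</td><td>1</td></tr><tr><td>b</td><td>2.5</td></tr></table></body></html>";

(* List what can be imported from this document *)
ImportString[table, {"HTML", "Elements"}]

(* Tabular content, with and without empty table and list elements *)
ImportString[table, {"HTML", "Data"}]
ImportString[table, {"HTML", "FullData"}]

(* Symbolic XML for the whole document, after conversion to well-formed XHTML *)
ImportString[table, {"HTML", "XMLObject"}]
```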
- Export elements:
    "Notebook"          a Notebook expression
    "NotebookObject"    a NotebookObject expression
    "Expression"        an arbitrary Wolfram Language expression
Options
- Import options:
    CharacterEncoding    Automatic    raw character encoding to use when importing the file
    "Numeric"            True         whether to import data fields as numbers when possible
- If the character encoding of the file is not specified in the HTML file, Import uses the encoding specified by CharacterEncoding. A complete list of possible encodings is given by $CharacterEncodings.
- Using CharacterEncoding->Automatic, Import uses the encoding specified in the HTML file. If not specified, it uses "UTF8" encoding. If any sequence of bytes stored in the file cannot be represented in "UTF8", "ISOLatin1" is used.
- The Wolfram Language always uses the UTF-8 encoding when exporting to HTML.
- Export options:
    "Content"                   False        whether to export MathML content elements
    "ConversionRules"           Automatic    specifies mappings from Wolfram System cell styles to HTML elements, including both inline and block-level versions of the markup
    "ConvertClosed"             True         whether to export forward-closed cell groups
    "ConvertLinkedNotebooks"    False        whether to convert linked notebooks to HTML
    "ConvertReverseClosed"      False        whether to export reverse-closed cell groups
    "CSS"                       Automatic    what CSS stylesheet to use or link to
    "FullDocument"              True         whether to export a complete HTML document
    "Graphics3DOutput"          Automatic    how to represent 3D graphics expressions
    "GraphicsOutput"            "GIF"        how to represent graphics expressions
    "HeadAttributes"            {}           attributes to be inserted in HTML <head> tag, given as a list of rules
    "HeadElements"              {}           subelements of HTML <head> tag
    "ManipulateOutput"          "CDF"        how to represent Manipulate expressions
    "MathOutput"                "GIF"        how to represent typeset expressions
- Possible settings for "GraphicsOutput", "Graphics3DOutput", "ManipulateOutput", and "MathOutput" are:
    "CDF"            converts the targeted expressions to embedded CDF
    "DisplayForm"    converts the targeted expressions to ASCII approximations of their appearance
    "GIF"            converts the targeted expressions to GIF
    "InputForm"      converts the targeted expressions to their InputForm
    "PNG"            converts the targeted expressions to PNG
    "JPEG"           converts the targeted expressions to JPEG
    "SVG"            converts the targeted expressions to SVG
- "MathOutput"->"MathML" will convert all typeset expressions to MathML.
- "Graphics3DOutput"->Automatic converts 3D graphics using the same method specified for "GraphicsOutput".
- The choice of option to use for output conversion is based upon the type of the object at the top level of the cell. For example, a cell that contains only a 2D graphic will use "GraphicsOutput", while a cell that contains a 2D graphic embedded in typesetting or code will use "MathOutput".
- Possible settings for "CSS" are:
    Automatic     creates a CSS stylesheet from the Wolfram System stylesheet
    None          does not create a stylesheet file or inline CSS style
    "file.css"    uses a stylesheet file
- Allowed settings for "ConvertClosed" and "ConvertReverseClosed" are:
    False           does not export closed groups
    True            exports all cell groups
    "LinkedPage"    exports each forward-closed group to a separate page
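The options above are passed after the expression or element specification. The sketch below is illustrative only: the sample table, the names table, expr and file, and the output file name are assumptions, and HTML export may require a running front end.

```wolfram
(* Import option: keep table fields as strings instead of converting them to numbers *)
table = "<html><body><table><tr><td>a</td><td>1</td></tr></table></body></html>";
ImportString[table, {"HTML", "Data"}, "Numeric" -> False]

(* Export option: represent typeset math as MathML rather than the default GIF images *)
expr = Integrate[1/(x^3 + 1), x];
ExportString[expr, "HTML", "MathOutput" -> "MathML"]

(* Export options: render the graphic as PNG and skip stylesheet creation *)
file = FileNameJoin[{$TemporaryDirectory, "plot.html"}];  (* hypothetical output path *)
Export[file, Plot[Sin[x], {x, 0, 2 Pi}], "GraphicsOutput" -> "PNG", "CSS" -> None]
```

Because the plot sits at the top level of its cell, "GraphicsOutput" rather than "MathOutput" is the setting that applies to it.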
Examples
Basic Examples (3)
Import all images from a web page:
Read an HTML file as plain text:
Show the Import elements available in this file:
Import the tabular content from this file:
This exports a mathematical expression to HTML:
Show the names of the files saved into the "HTMLFiles" directory:
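The inputs for the basic examples are not reproduced in this text. The following sketch shows one way they could look, assuming a sample page written to a temporary file first; the file names, the sample HTML, and the example.com address are placeholders, and the image import needs network access.

```wolfram
(* Write a small sample page to a temporary file (hypothetical name and content) *)
page = FileNameJoin[{$TemporaryDirectory, "sample.html"}];
Export[page,
  "<html><head><title>Sample</title></head><body><table><tr><td>x</td><td>1</td></tr></table></body></html>",
  "Text"];

(* Import all images from a web page; placeholder URL *)
Import["https://example.com", "Images"]

(* Read the HTML file as plain text *)
Import[page]

(* Show the Import elements available in this file *)
Import[page, "Elements"]

(* Import the tabular content from this file *)
Import[page, "Data"]

(* Export a mathematical expression to HTML *)
out = FileNameJoin[{$TemporaryDirectory, "expr.html"}];
Export[out, Integrate[1/(x^3 + 1), x]]

(* List the files saved into the HTMLFiles directory next to the exported page *)
FileNames["*", FileNameJoin[{$TemporaryDirectory, "HTMLFiles"}]]
```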
Scope (5)
This translates a Cell expression to HTML:
The Wolfram Language can export graphics with embedded tooltips and hyperlinks to HTML image maps:
The mapping of Wolfram System style elements to HTML can be specified as "ConversionRules":
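As a hedged illustration of the first two Scope items, the sketch below translates a Cell expression with ExportString and exports a graphic that carries a tooltip. The cell text, the graphic, and the output file name are invented for this example. "ConversionRules" is left out because its rule format is not given in this text.

```wolfram
(* Translate a Cell expression to an HTML string *)
ExportString[Cell["Some text", "Text"], "HTML"]

(* A graphic with an embedded tooltip; exporting it should produce an HTML image map *)
g = Graphics[{Tooltip[Disk[], "a disk"]}];
Export[FileNameJoin[{$TemporaryDirectory, "tooltip.html"}], g]  (* hypothetical output path *)
```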