PDF (.pdf)
Background & Context
- MIME type: application/pdf
- Adobe Acrobat format.
- Standard format for exchanging and archiving multi-page documents.
- PDF is an acronym for Portable Document Format.
- Binary file format.
- Stores text, fonts, images, and 2D vector graphics in a device‐ and resolution‐independent way.
- Can also store embedded raster images.
- Supports multiple lossy and lossless compression methods.
Import & Export
- Import["file.pdf"] imports a PDF file, returning a list of rasterized images, one for each page.
- Import["file.pdf",elem] imports the specified element from a PDF file.
- Import["file.pdf",{elem,suba,subb,…}] imports a subelement.
- The import format can be specified with Import["file","PDF"] or Import["file",{"PDF",elem,…}].
- Export["file.pdf",expr] creates a PDF file from an arbitrary expression, cell, or notebook object.
- Export["file.pdf",expr,elem] creates a PDF file by treating expr as specifying element elem.
- Export["file.pdf",{expr1,expr2,…},{{elem1,elem2,…}}] treats each expri as specifying the corresponding elemi.
- Export["file.pdf",expr,opt1->val1,…] exports expr with the specified option elements taken to have the specified values.
- Export["file.pdf",{elem1->expr1,elem2->expr2,…},"Rules"] uses rules to specify the elements to be exported.
- Export["file.pdf",expr] effectively renders expr as if it were being printed to the default printer. If expr is not a notebook, it will effectively create a notebook with the same properties as the notebook performing the evaluation, or a default notebook if the evaluation did not start in a notebook. The front end's PrintingStyleEnvironment option is used to pick the environment for printing.
- The Wolfram Language attempts to preserve a vector description of the content where possible, but content requiring modern rendering methods not supported by PDF will be rasterized. This includes all 3D graphics and 2D content with transparency, color gradients, textures or shading.
- See the following reference pages for full general information:
    Import, Export                      import from or export to a file
    CloudImport, CloudExport            import from or export to a cloud object
    ImportString, ExportString          import from or export to a string
    ImportByteArray, ExportByteArray    import from or export to a byte array
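As a minimal sketch of the basic forms above, the following exports a plot to a PDF file and reads it back. The file name "pdf-sketch.pdf" and the plotted functions are illustrative assumptions, and the sketch assumes a session in which a front end is available for rendering.

```wolfram
(* Hypothetical temporary file used only for this sketch *)
file = FileNameJoin[{$TemporaryDirectory, "pdf-sketch.pdf"}];

(* Export an arbitrary expression to PDF *)
Export[file, Plot[Sin[x], {x, 0, 2 Pi}]];

(* Default import: a list of rasterized images, one per page *)
pages = Import[file];

(* Import a single element, naming the format explicitly *)
count = Import[file, {"PDF", "PageCount"}];

(* The same round trip through a string instead of a file *)
str = ExportString[Plot[Cos[x], {x, 0, 2 Pi}], "PDF"];
ImportString[str, {"PDF", "PageCount"}]
```

Naming the format explicitly is mainly useful when the file name carries no .pdf extension or when the data comes from a string or byte array.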
Import Elements
- General Import elements:
    "Elements"    list of elements and options available in this file
    "Summary"     summary of the file
    "Rules"       list of rules for all available elements
- Structure elements:
    "ContentsGraph"        graph of the table of contents from the document
    "ContentsStartPage"    list of rules giving table of contents name and page numbers
    "PageCount"            number of pages
    "Summary"              summary of the file
- Data representation elements for the whole PDF document:
    "Plaintext"        a string giving the textual content of the whole document
    "FormattedText"    a sequence of formatted text for the whole document
- Data representation elements given as a list representing each page of the document:
    "PageFormattedText"        a list of formatted text, each representing a page
    "PageGraphics"             a list of Graphics objects, each representing a page
    "PageImages"               a list of Image objects, each representing a page
    "PagePlaintext"            a list of strings, each representing the plaintext of a page
    "PagePositionedText"       a list of Text objects including text coordinates
    "PageObjects"              a list of text and images
    "PagePositionedObjects"    a list of text and images together with their coordinates
- Import by default uses the "PageImages" element.
- Metadata elements:
    "Author"              author of the document
    "CreationDate"        creation date of the document, given as a DateObject
    "Creator"             program that created the content
    "Keywords"            keywords from the document
    "ModificationDate"    modification date of the document, given as a DateObject
    "MetaInformation"     metadata given as strings and date objects
    "Producer"            program that converted the data to PDF
    "Subject"             the subject of the document
    "Title"               document title
    "Version"             version of the PDF specification for the file
- Hyperlink, annotation, and form field elements:
    "FormFieldRules"     association of page numbers and lists of rules giving form field names and values
    "HighlightedText"    association of page numbers and list of strings for each highlighted section of text on each page
    "Hyperlinks"         association of page numbers and list of Hyperlink objects for each link on each page
    "TextAnnotations"    association of page numbers and text from annotations
    "URLs"               association of page numbers and list of URL objects for each link on each page
- Embedded image elements:
    "EmbeddedImageCount"    association of page numbers and number of images
    "EmbeddedImages"        association of page numbers and embedded images for each page
- Attachment elements:
    "AttachmentCount"      number of attachments
    "AttachmentList"       lists of processed attachments as expressions
    "AttachmentNames"      list of attachment names
    "AttachmentDetails"    lists of associations giving attachment content and metadata
    "RawAttachmentList"    attachments given as a list of byte arrays
    "AttachmentData"       list of associations giving raw attachment data and metadata
- The element "AttachmentDetails" is a list giving an association for each attachment. Each association typically has the following keys:
    "Name"                name assigned to the attachment
    "Content"             imported content
    "CreationDate"        creation date recorded for the attachment
    "ModificationDate"    modification date recorded for the attachment
    "ByteCount"           number of bytes in the attachment
- The element "AttachmentData" is a list giving an association for each attachment. Each association typically has the following keys:
    "Name"                name assigned to the attachment
    "RawContent"          raw content as a byte array
    "CreationDate"        creation date recorded for the attachment
    "ModificationDate"    modification date recorded for the attachment
    "ByteCount"           number of bytes in the attachment
- For elements with multiple parts, use subelements for partial data import in either of the {elem,page,index} or {elem,index} form, where page and index can be any of the following:
    n            nth item
    -n           counts from the end
    n;;m         from n through m
    n;;m;;s      from n through m with steps of s
    {n1,n2,…}    specific items ni
- Use {"FormFieldRules",page,names} to import form values corresponding to the fields names.
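The element and subelement forms above might be used as in the following sketch. It assumes that "file.pdf" (the placeholder name used throughout this page) is an existing PDF with at least three pages; what each call returns depends entirely on the document.

```wolfram
(* Placeholder: any existing PDF with at least three pages *)
file = "file.pdf";

(* List the elements available for this particular file *)
Import[file, "Elements"]

(* Whole-document and metadata elements *)
Import[file, "PageCount"]
Import[file, "Title"]

(* Per-page elements with a subelement specification *)
Import[file, {"PagePlaintext", 1}]        (* text of the first page *)
Import[file, {"PageImages", -1}]          (* image of the last page *)
Import[file, {"PageImages", 1 ;; 3}]      (* pages 1 through 3 *)
Import[file, {"PagePlaintext", {1, 3}}]   (* pages 1 and 3 only *)
```

Requesting only the pages needed avoids rasterizing or extracting text from the whole document, which can matter for long files.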
Options
- Import options:
    ImageResolution       $ImageResolution    image resolution in dpi for rasterization
    ImageSize             Automatic           final displayed image size in printer's points
    "Password"            None                document password given as a string
    RasterSize            Automatic           raster size in pixels for rasterization
    "RenderedElements"    All                 parts of the document to render in "PageImages"
- Possible settings for "RenderedElements":
    "Annotations"    annotations such as highlighting or additional text boxes
    "FormFields"     data from filled-out form fields
    All              render all elements from the document
    None             render no additional elements from the document
- Export options:
    ImageSize               Automatic    overall image size
    ImageResolution         72           image resolution for rasterization in dpi
    "AllowRasterization"    Automatic    whether to rasterize a graphic that requires advanced versions of PDF
- Possible settings for "AllowRasterization":
    Automatic    rasterize a graphic that contains features such as transparency or gradients that require advanced versions of PDF to render
    True         always rasterize graphics
    False        always use vector graphics, deploying advanced PDF features where necessary for faithful rendering
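The options above could be combined as in the following sketch. The file names "file.pdf", "protected.pdf", and "disk.pdf", the password "secret", and the resolution values are all illustrative assumptions rather than recommended settings.

```wolfram
(* Placeholder input files; both are assumed to exist *)
file = "file.pdf";
protected = "protected.pdf";

(* Rasterize pages at a higher resolution than the default *)
Import[file, "PageImages", ImageResolution -> 150]

(* Render pages without annotations or form field data *)
Import[file, "PageImages", "RenderedElements" -> None]

(* Supply the password for an encrypted document *)
Import[protected, "Plaintext", "Password" -> "secret"]

(* Hypothetical output file for the export examples *)
out = FileNameJoin[{$TemporaryDirectory, "disk.pdf"}];
g = Graphics[{Opacity[0.5], Disk[]}];

(* Keep the semitransparent disk as vector content *)
Export[out, g, "AllowRasterization" -> False]

(* Always rasterize, at 300 dpi instead of the default 72 *)
Export[out, g, "AllowRasterization" -> True, ImageResolution -> 300]
```

With "AllowRasterization" -> False the resulting file relies on more advanced PDF features, so it is worth checking that the intended viewers render it faithfully.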