PDF (.pdf)
Background & Context

- MIME type: application/pdf
- Adobe Acrobat format.
- Standard format for exchanging and archiving multi-page documents.
- PDF is an acronym for Portable Document Format.
- Binary file format.
- Stores text, fonts, images, and 2D vector graphics in a device‐ and resolution‐independent way.
- Can also store embedded raster images.
- Supports multiple lossy and lossless compression methods.
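You can see the binary layout from the Wolfram Language without writing a file. The following sketch exports a graphic to an in-memory PDF string and inspects its first bytes. Any graphic would do; the disk is an arbitrary choice.

```wolfram
(* Export a simple vector graphic to an in-memory PDF string *)
pdf = ExportString[Graphics[{Blue, Disk[]}], "PDF"];

(* Every PDF file begins with the "%PDF-" header, followed by the specification version *)
header = StringTake[pdf, 5]
```

The version digits that follow the header depend on the exporter, so they are not checked here.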
Import & Export

- Import["file.pdf"] imports a PDF file, returning a list of rasterized images for each page.
- Import["file.pdf",elem] imports the specified element from a PDF file.
- Import["file.pdf",{elem,suba,subb,…}] imports a subelement.
- The import format can be specified with Import["file","PDF"] or Import["file",{"PDF",elem,…}].
- Export["file.pdf",expr] creates a PDF file from an arbitrary expression, cell, or notebook object.
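- By default, the Wolfram Language does not rasterize fonts or 2D vector graphics when exporting to PDF. Graphics that need features from advanced versions of PDF, such as transparency or gradients, are handled as described under the "AllowRasterization" export option.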
- Export["file.pdf",expr,elem] creates a PDF file by treating expr as specifying element elem.
- Export["file.pdf",{expr1,expr2,…},{{elem1,elem2,…}}] treats each expression in {expr1,expr2,…} as specifying the corresponding element in {elem1,elem2,…}.
- Export["file.pdf",expr,opt1->val1,…] exports expr with the specified option elements taken to have the specified values.
- Export["file.pdf",{elem1->expr1,elem2->expr2,…},"Rules"] uses rules to specify the elements to be exported.
- See the reference pages for full general information on Import and Export.
- ImportString and ExportString support PDF.
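You can try the Import and Export forms above end to end without an existing PDF by exporting a plot first. The following is a minimal sketch. The file name under $TemporaryDirectory is hypothetical, and the comments restate the behavior described above.

```wolfram
(* Hypothetical output path for this sketch *)
file = FileNameJoin[{$TemporaryDirectory, "example.pdf"}];

(* Export writes the PDF file and returns its name *)
Export[file, Plot[Sin[x], {x, 0, 2 Pi}]];

(* Import with no element: a list with one rasterized image per page *)
pages = Import[file];

(* Import a single element, naming the format explicitly *)
count = Import[file, {"PDF", "PageCount"}];

(* ImportString and ExportString perform the same round trip in memory *)
inMemoryCount = ImportString[
   ExportString[Plot[Sin[x], {x, 0, 2 Pi}], "PDF"],
   {"PDF", "PageCount"}];
```

A single exported plot yields a one-page document, so both page counts should be 1.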
Import Elements

- General Import elements:
  - "Elements": list of elements and options available in this file
  - "Rules": full list of rules for each element and option
  - "Options": list of rules for options, properties, and settings
- Structure elements:
  - "ContentsGraph": graph of the table of contents from the document
  - "ContentsStartPage": list of rules giving table-of-contents entry names and their page numbers
  - "PageCount": number of pages
  - "Summary": summary of the file
- Data representation elements for the whole PDF document:
  - "Plaintext": a string giving the textual content of the whole document
  - "FormattedText": a sequence of formatted text for the whole document
- Data representation elements given as a list with one entry for each page of the document:
  - "PageFormattedText": a list of formatted text, each representing a page
  - "PageGraphics": a list of Graphics objects, each representing a page
  - "PageImages": a list of Image objects, each representing a page
  - "PagePlaintext": a list of strings, each representing the plaintext of a page
- Import by default uses the "ImageList" element.
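The following sketch shows the structure and data representation elements on a small PDF exported on the spot. The file name and the text shown in the graphic are arbitrary assumptions. Per-page elements return one entry per page, so a one-page file gives lists of length 1.

```wolfram
(* Hypothetical one-page PDF containing a single line of text *)
file = FileNameJoin[{$TemporaryDirectory, "example-text.pdf"}];
Export[file, Graphics[Text[Style["Hello PDF", 24]]]];

(* General and structure elements *)
elements = Import[file, "Elements"];   (* elements this file supports *)
pageCount = Import[file, "PageCount"];

(* Whole-document text versus per-page text *)
text = Import[file, "Plaintext"];
pageTexts = Import[file, "PagePlaintext"];

(* Per-page vector graphics; {elem, index} picks out a single page *)
firstPage = Import[file, {"PageGraphics", 1}];
```

The exact plaintext recovered depends on how the text was encoded in the PDF, so its contents are not asserted here.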
- Metadata elements:
  - "Author": author of the document
  - "CreationDate": creation date of the document, given as a DateObject
  - "Creator": program that created the content
  - "Keywords": keywords from the document
  - "ModificationDate": modification date of the document, given as a DateObject
  - "MetaInformation": metadata given as strings and date objects
  - "Producer": program that converted the data to PDF
  - "Subject": subject of the document
  - "Title": document title
  - "Version": version of the PDF specification for the file
- Hyperlink, annotation, and form field elements:
  - "FormFieldRules": association mapping page numbers to lists of rules giving form field names and values
  - "HighlightedText": association mapping page numbers to lists of strings, one for each highlighted section of text on the page
  - "Hyperlinks": association mapping page numbers to lists of Hyperlink objects, one for each link on the page
  - "TextAnnotations": association mapping page numbers to the text from annotations
  - "URLs": association mapping page numbers to lists of URL objects, one for each link on the page
- Embedded images elements:
  - "EmbeddedImageCount": association mapping page numbers to the number of images on each page
  - "EmbeddedImages": association mapping page numbers to the embedded images on each page
- Attachment elements:
  - "AttachmentCount": number of attachments
  - "AttachmentList": list of attachments, imported as Wolfram Language expressions where possible
  - "AttachmentNames": list of attachment names
  - "AttachmentDetails": list of associations giving content and attachment metadata
  - "RawAttachmentList": attachments given as a list of byte arrays
  - "AttachmentData": list of associations giving raw attachment data as byte arrays, together with metadata
- The element "AttachmentDetails" is a list giving an association for each attachment. Each association typically has the following keys:
  - "Name": name assigned to the attachment
  - "Content": imported content
  - "CreationDate": creation date recorded for the attachment
  - "ModificationDate": modification date recorded for the attachment
  - "ByteCount": number of bytes in the attachment
- The element "AttachmentData" is a list giving an association for each attachment. Each association typically has the following keys:
  - "Name": name assigned to the attachment
  - "RawContent": raw content as a byte array
  - "CreationDate": creation date recorded for the attachment
  - "ModificationDate": modification date recorded for the attachment
  - "ByteCount": number of bytes in the attachment
- For elements with multiple parts, use subelements for partial data import in either the {elem,page,index} or the {elem,index} form, where page and index can be any of the following:
  - n: nth item
  - -n: nth item, counting from the end
  - n;;m: items n through m
  - n;;m;;s: items n through m in steps of s
  - {n1,n2,…}: the specific items n1, n2, …
- Use {"FormFieldRules",page,names} to import the values of the form fields given by names.
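The following sketch shows metadata elements and the {elem,index} subelement forms on a self-made file. It assumes the exporting program fills in fields such as "Producer". Fields that a producer does not record may come back empty or missing, so treat those results as illustrative. The file name is hypothetical.

```wolfram
(* Hypothetical one-page PDF exported for this sketch *)
file = FileNameJoin[{$TemporaryDirectory, "example-meta.pdf"}];
Export[file, Graphics[Circle[]]];

(* Metadata elements; contents depend on what the producer recorded *)
producer = Import[file, "Producer"];
version = Import[file, "Version"];

(* Subelements: an index n picks one page, -n counts from the end *)
first = Import[file, {"PageImages", 1}];
last = Import[file, {"PageImages", -1}];

(* A span n;;m selects a range of pages *)
someText = Import[file, {"PagePlaintext", 1 ;; 1}];
```

In a one-page file, the first and last pages are the same page.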
Options

- Import options:
  - "Password" (default None): document password, given as a string
  - "TextOutlines" (default True): whether to import characters as outlines
  - "Render" (default All): parts of the document to render in "ImageList"
  - RasterSize (default Automatic): raster size in pixels for rasterization
  - ImageResolution (default $ImageResolution): image resolution in dpi for rasterization
  - "AttachmentRules" (default <||>): rules to control how attachments are imported
- Possible settings for "Render":
  - "Annotations": annotations such as highlighting or additional text boxes
  - "FormFields": data from filled-out form fields
  - All: render all elements from the document
  - None: render no additional elements from the document
- Export options:
  - ImageSize (default Automatic): overall image size
  - ImageResolution (default 72): image resolution for rasterization, in dpi
  - "AllowRasterization" (default Automatic): whether to rasterize a graphic that requires advanced versions of PDF
- Possible settings for "AllowRasterization":
  - Automatic: rasterize a graphic that contains features, such as transparency or gradients, that require advanced versions of PDF to render
  - True: always rasterize graphics
  - False: always use vector graphics, using advanced PDF features where necessary for faithful rendering
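The following sketch combines the options above. The graphic uses Opacity, so it contains transparency, the kind of feature that "AllowRasterization" -> Automatic would rasterize. The file names are hypothetical, and the values of 150 dpi and 300 pixels are arbitrary.

```wolfram
(* A graphic with transparency, which needs an advanced PDF feature *)
g = Graphics[{Opacity[0.5], Red, Disk[], Blue, Disk[{1, 0}]}];

(* Hypothetical output paths *)
vector = FileNameJoin[{$TemporaryDirectory, "vector.pdf"}];
raster = FileNameJoin[{$TemporaryDirectory, "raster.pdf"}];

(* Export options: keep vectors, or always rasterize at 150 dpi *)
Export[vector, g, "AllowRasterization" -> False];
Export[raster, g, "AllowRasterization" -> True, ImageResolution -> 150];

(* Import options: request page images at a given raster size,
   rendering no annotations or form fields on top of the page *)
pages = Import[vector, "PageImages", RasterSize -> 300, "Render" -> None];
```

The "Password" import option works the same way, for example "Password" -> "secret" for an encrypted file. It is left out here because the exported files are not encrypted.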