MBOX (.mbox)

Background & Context

    • MIME type: application/mbox
    • Unix mailbox format.
    • Holds a collection of email messages.
    • Native archive format of email clients such as Unix mail, Thunderbird, and many others.
    • Textual format with encoded binary data.
    • Stores messages in EML format, concatenated with separator lines.
    • Supports RFC 4155.
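To make the separator-line layout concrete, here is a minimal sketch that builds a two-message mailbox as a string and passes it to ImportString. The addresses, dates and subjects are invented for illustration. It is an assumption that this hand-written text is accepted in the same way as an MBOX file on disk.

```wolfram
(* Each message starts with a "From " separator line, followed by headers, a blank line and the body. *)
mbox = StringRiffle[{
    "From alice@example.com Mon Jan  1 09:00:00 2024",
    "From: Alice <alice@example.com>",
    "To: bob@example.com",
    "Subject: First message",
    "",
    "Hello Bob.",
    "",
    "From bob@example.com Mon Jan  1 10:00:00 2024",
    "From: Bob <bob@example.com>",
    "To: alice@example.com",
    "Subject: Second message",
    "",
    "Hello Alice.",
    ""
    }, "\n"];

(* Ask for the number of messages found in the string. *)
ImportString[mbox, {"MBOX", "MessageCount"}]
```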

Import

  • Import["file.mbox"] imports an MBOX file, returning a list of message summaries given as associations.
  • Import["file.mbox"] returns an expression of the form {msg1, msg2, …}, where the msgi are associations giving basic elements of individual mail messages.
  • Import["file.mbox", elem] imports the specified element from an MBOX file.
  • Import["file.mbox", {elem, suba, subb, …}] imports a subelement.
  • Import["file.mbox", {{elem1, elem2, …}}] imports multiple elements.
  • The import format can be specified with Import["file", "MBOX"] or Import["file", {"MBOX", elem, …}].
  • See the following reference pages for full general information:
  • Import             import from a file
    CloudImport        import from a cloud object
    ImportString       import from a string
    ImportByteArray    import from a byte array
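The calling forms above can be sketched as follows. The path "file.mbox" is a placeholder, and the element names used here ("Subject", "From", "OriginatingDate", "MessageCount") are taken from the element tables in this page.

```wolfram
(* "file.mbox" is a placeholder path. *)
summaries = Import["file.mbox"];              (* default: list of message-summary associations *)
subjects = Import["file.mbox", "Subject"];    (* a single element *)

(* Several elements at once; the result has one entry per requested element. *)
{senders, dates} = Import["file.mbox", {{"From", "OriginatingDate"}}];

(* Name the format explicitly when the file lacks the .mbox extension. *)
Import["file", {"MBOX", "MessageCount"}]
```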

Import Elements

  • General Import elements:
  • "Elements"    list of elements and options available in this file
    "Summary"     summary of the file
    "Rules"       list of rules for all available elements
  • Complete mailbox elements:
  • "MessageSummaries"       list of associations giving basic elements for each message
    "MessageElements"        list of associations giving main elements for each message
    "FullMessageElements"    list of associations giving all available message elements
    "MessageCount"           number of messages appearing in the mailbox
  • Import by default uses the "MessageSummaries" element.
  • Summary elements:
  • "From"               sender names and email addresses
    "ToList"             lists of recipient names and addresses
    "CcList"             lists of copied recipient names and addresses
    "BccList"            lists of blind-copied recipient names and addresses
    "OriginatingDate"    client dates and times from email headers
    "Subject"            subjects of the emails
    "BodyPreview"        list of short previews of message bodies
    "HasAttachments"     whether each message contains any attachments
    "MessageID"          message ID for each message
  • "MessageSummaries" includes all summary elements.
  • Additional message elements:
  • "FromAddress"           sender raw email addresses
    "FromName"              sender full names
    "ToAddressList"         lists of recipient addresses
    "ToNameList"            lists of recipient full names
    "CcAddressList"         lists of copied recipient addresses
    "CcNameList"            lists of copied recipient full names
    "BccAddressList"        lists of blind-copied recipient addresses
    "BccNameList"           lists of blind-copied recipient full names
    "ReplyToList"           lists of reply-to names and addresses
    "ReplyToAddressList"    lists of reply-to addresses
    "ReplyToNameList"       lists of reply-to full names
    "Body"                  message bodies as strings
    "AttachmentList"        lists of processed attachments as expressions
  • "MessageElements" includes all summary and message elements excluding "BodyPreview" and "HasAttachments".
  • More detailed information for each email can be imported from the following categories.
  • Message-body elements:
  • "BodyPreview"       list of short previews of message bodies
    "Body"              message bodies as strings
    "NewBodyContent"    parts of the bodies that are not replies or forwards
    "QuotedContent"     parts of the bodies that are quoted
  • Threading elements:
  • "ThreadCount"                number of threads in the mailbox
    "ThreadGraph"                threads in the mailbox represented as a Graph
    "ThreadEmailCount"           number of emails in each thread
    "ThreadTimeInterval"         interval from the first to last email in each thread
    "ThreadDuration"             duration from first to last email in each thread
    "ThreadMessageIDList"        list of message IDs for all emails in each thread
    "ThreadFromList"             list of senders for each thread
    "ReferenceMessageIDGraph"    a Graph of connections to "reference" messages
  • Message-routing elements:
  • "Precedence"                declared mail precedences
    "ReturnPath"                declared return paths for the mail
    "ReturnReceiptRequested"    whether return receipts are requested
    "DeliveryChainHostnames"    lists of hostnames on mail delivery chains
    "DeliveryChainRecords"      lists of full records on mail delivery chains
  • Mail-header elements:
  • "Plaintext"                 complete raw email as a string
    "HeaderString"              complete email headers as a string
    "HeaderRules"               list of rules for all headers
    "CharacterEncoding"         character encoding for email content
    "ContentType"               MIME content type of email body
    "MIMEVersion"               version of the MIME standard
    "ReplyToMessageID"          lists of any IDs of messages to which each message replies
    "ReferenceMessageIDList"    ID of "reference" messages, typically on a thread
  • Message-origination elements:
  • "OriginatingMailClient"        types of originating mail clients
    "OriginatingIPAddress"         IP addresses of originating client machines
    "OriginatingHostname"          hostnames of originating client machines
    "OriginatingCountry"           geoIP-inferred originating countries
    "OriginatingDate"              client dates and times from email headers
    "OriginatingTimeZone"          client time zones based on email headers
    "ServerOriginatingDate"        dates and times on originating servers
    "ServerOriginatingTimeZone"    time zones of originating servers
  • Attachment elements:
  • "HasAttachments"           whether each message contains any attachments
    "AttachmentNames"          list of attachment names
    "AttachmentList"           lists of processed attachments as expressions
    "AttachmentSummaries"      lists of associations giving basic attachment elements
    "AttachmentData"           lists of associations giving raw encoded data and metadata
    "AttachmentDecodedData"    lists of associations giving raw decoded data and metadata
    "AttachmentDetails"        lists of associations giving attachment content and metadata
  • The element "AttachmentDetails" gives, for each message, a list containing one association per attachment. The elements of this association are typically as follows:
  • "Name"                  name assigned to the attachment
    "MIMEType"              MIME type of the content
    "Content"               imported content
    "ContentDisposition"    content disposition of the attachment
    "ModificationDate"      modification date recorded for the attachment
    "ByteCount"             number of bytes in the decoded content
  • The element "AttachmentDecodedData" gives, for each message, a list containing one association per attachment. The elements of this association are typically as follows:
  • "Name"                  name assigned to the attachment
    "MIMEType"              MIME type of the content
    "DecodedContent"        raw decoded content as a byte array
    "ContentDisposition"    content disposition of the attachment
    "ModificationDate"      modification date recorded for the attachment
    "ByteCount"             number of bytes in the raw decoded content
  • The element "AttachmentData" gives, for each message, a list containing one association per attachment. The elements of this association are typically as follows:
  • "Name"                       name assigned to the attachment
    "MIMEType"                   MIME type of the content
    "RawContent"                 raw encoded content as a string
    "ContentTransferEncoding"    content transfer encoding of "RawContent"
    "ContentDisposition"         content disposition of the attachment
    "ModificationDate"           modification date recorded for the attachment
    "ByteCount"                  number of bytes in the raw encoded content
  • "AttachmentSummaries" includes "Name", "MIMEType" and the "ByteCount" of the decoded contents for each attachment.
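As a rough sketch of how the attachment elements fit together, the following reads attachment information for one message. It assumes a placeholder file "file.mbox" whose second message carries at least one attachment, and it selects that message with the {elem, n} form.

```wolfram
(* Names, MIME types and decoded sizes of the attachments in the second message. *)
Import["file.mbox", {"AttachmentSummaries", 2}]

(* Imported content plus metadata: one association per attachment. *)
details = Import["file.mbox", {"AttachmentDetails", 2}];

(* Pull a few keys out of every attachment association. *)
Lookup[details, {"Name", "MIMEType", "ByteCount"}]
```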
  • Subelements for partial data import for a message element elem can take message specifications in the form {elem,msgs}, where msgs can be any of the following:
  • n                   nth email
    -n                  counts from the end
    messageid           specific email message ID
    {spec1, spec2, …}   a list of email indices or message IDs
  • Subelements can also be given in the form {elem,msgs,keys} for "FullMessageElements", "MessageElements" and "MessageSummaries" where keys can be any element in the association.
  • Subelements for accessing part of a thread element elem in the form of {elem,spec} can take the following specification spec:
  • n            nth thread, based on the starting data
    messageid    the thread containing the specific message ID
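The message and thread specifications described above might be used as in this sketch. The file name is a placeholder, and the mailbox is assumed to contain at least three messages.

```wolfram
Import["file.mbox", {"Subject", 1}]       (* first message *)
Import["file.mbox", {"Subject", -1}]      (* last message *)
Import["file.mbox", {"Body", {1, 3}}]     (* messages 1 and 3 *)

(* Select a message by its message ID instead of its position. *)
ids = Import["file.mbox", "MessageID"];
Import["file.mbox", {"FullMessageElements", First[ids]}]

(* {elem, msgs, keys}: one key from the summary association of message 2. *)
Import["file.mbox", {"MessageSummaries", 2, "Subject"}]

(* Thread elements: by thread index, or by a message ID contained in the thread. *)
Import["file.mbox", {"ThreadMessageIDList", 1}]
Import["file.mbox", {"ThreadFromList", First[ids]}]
```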

Options

  • Import option:
  • "AttachmentRules"    <||>    rules to control how to import attachments
  • Possible settings for "AttachmentRules" are an association containing:
  • fmt -> None    import attachments of format fmt as None
    fmt -> elem    import element elem when importing fmt attachments
    fmt -> fun     use a pure function fun on the decoded byte array
  • The format specification fmt can be any format supported by $ImportFormats or a MIME type.
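The three kinds of setting could look like the sketch below, where each association key is either an import format name or a MIME type. The file name is a placeholder, and Length is chosen only as a simple example of a function applied to the decoded byte array.

```wolfram
(* Skip GIF attachments: they are returned as None. *)
Import["file.mbox", "AttachmentList", "AttachmentRules" -> <|"GIF" -> None|>]

(* Import only the "Length" element of WAV attachments. *)
Import["file.mbox", "AttachmentList", "AttachmentRules" -> <|"WAV" -> "Length"|>]

(* Apply a pure function to the decoded bytes, here keyed by MIME type. *)
Import["file.mbox", "AttachmentList", "AttachmentRules" -> <|"image/gif" -> (Length[#] &)|>]
```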

Examples


Basic Examples  (3)

Import message summaries for a sample MBOX file:

Determine the number of messages in an MBOX file:

Extract message subjects from an MBOX file:

Import message dates:

Scope  (6)

Determine the number of messages in the MBOX file:

Import message summaries:

Extract further information for a particular message using the "MessageID":

Import a message by its position in the mailbox:

Import specific elements of all messages in the mailbox:

Import message elements from a mailbox:

Import Elements  (62)

Available Elements  (1)

List of available elements:

Data Representation  (10)

"MessageSummaries"  (2)

Get summaries for the messages in the MBOX file:

View the message summaries as a Dataset:
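A minimal sketch, assuming a placeholder file "file.mbox": because the summaries are a list of associations with common keys, wrapping them in Dataset gives a table with one row per message.

```wolfram
Dataset[Import["file.mbox", "MessageSummaries"]]
```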

"MessageElements"  (2)

Import messages as a list of associations:

Import the second message in the MBOX file:

"FullMessageElements"  (1)

Import the first message in full:

Import specific elements of all messages in the mailbox:

"Subject"  (1)

Extract message subjects from an MBOX file:

"Body"  (1)

Import the message bodies:

"Plaintext"  (1)

Import the first message in its original internet message format:

"BodyPreview"  (2)

Import a summary of new message content:

"BodyPreview" extracts and summarizes new content in the message:

Compare with the full message content:

Content Parsing  (2)

"NewBodyContent"  (1)

Extract new unquoted body content from a message:

Compare the extracted content with the original body:

"QuotedContent"  (1)

Extract quoted content from a message:

Compare the extracted content with the original message body:

Threading Elements  (8)

"ThreadCount"  (1)

Import the number of threads in a mailbox:

"ThreadGraph"  (1)

Import the graph of messages in a mailbox:

"ThreadEmailCount"  (1)

Import the number of emails in each thread:

"ThreadTimeInterval"  (1)

Import the time interval for each thread in a mailbox:

"ThreadDuration"  (1)

Import the duration of each thread in a mailbox:

"ThreadMessageIDList"  (1)

Import all message IDs for each thread in a mailbox:

"ThreadFromList"  (1)

Import all sender names and email addresses for each thread in a mailbox:

"ReferenceMessageIDGraph"  (1)

Import the graph of all references in a mailbox:

Mail Address Header Elements  (19)

"From"  (1)

Import the sender's name and email address:

"FromName"  (1)

Import the sender's full name:

"FromAddress"  (1)

Import the sender's email address:

"ToList"  (1)

Import a list of recipient names and addresses:

"ToAddressList"  (1)

Import a list of recipient addresses:

"ToNameList"  (1)

Import a list of recipient names:

"CcList"  (1)

Import a list of names and addresses for copied recipients:

"CcAddressList"  (1)

Import a list of addresses for copied recipients:

"CcNameList"  (1)

Import a list of names for copied recipients:

"BccList"  (1)

Import a list of names and addresses for hidden recipients:

"BccAddressList"  (1)

Import a list of addresses for hidden recipients:

"BccNameList"  (1)

Import a list of names for hidden recipients:

"ReturnPath"  (1)

Import the return path:

"ReplyToList"  (1)

Import a list of reply-to names and addresses:

"ReplyToAddressList"  (1)

Import a list of reply-to addresses:

"ReplyToNameList"  (1)

Import a list of reply-to names:

"MessageID"  (1)

Retrieve a list of message IDs:

"ReplyToMessageID"  (1)

Retrieve the ID of the replied-to message:

"ReferenceMessageIDList"  (1)

Retrieve the IDs of the referenced messages:

General Header Elements  (4)

"HeaderString"  (1)

Import the complete mail header of the first message as a string:

"HeaderRules"  (1)

Import the header of the first message as a list of rules:

"CharacterEncoding"  (1)

Import the character encoding for the first message:

"ContentType"  (1)

Import the MIME content type of the email body:

Advanced Header Elements  (11)

"Precedence"  (1)

Import the declared mail precedence:

"ReturnReceiptRequested"  (1)

Import any return-receipt addresses:

"DeliveryChainHostnames"  (1)

Import the list of host names in the delivery chain:

"DeliveryChainRecords"  (1)

Import the delivery chain record as an Association:

"OriginatingMailClient"  (1)

Determine the mail client used to send each message:

"OriginatingIPAddress"  (1)

Import the IP address of the machine used to send a message:

"OriginatingHostname"  (1)

Import the hostname of the machine used to send a message:

"OriginatingDate"  (1)

Import the date and time each message was sent:

"OriginatingTimeZone"  (1)

Import the client time zone from the mail headers:

"ServerOriginatingDate"  (1)

Import the date and time on the originating server:

"ServerOriginatingTimeZone"  (1)

Import the time zone of the originating server:

Attachment Elements  (7)

"HasAttachments"  (1)

Determine whether a message has an attachment:

"AttachmentNames"  (1)

Get the file names of any attachments:

"AttachmentList"  (1)

Import any attachments as a list of rules:

"AttachmentSummaries"  (1)

Get summaries for attachments in a particular message:

"AttachmentData"  (1)

Import the raw attachment data of the second message:

"AttachmentDecodedData"  (1)

Import the decoded attachment data of the second message:

"AttachmentDetails"  (1)

Import the attachments and details of the second message:

Import Options  (3)

"AttachmentRules"  (3)

Import WAV files as the "Length" element:

Do not import any GIF images:

Specify a pure function to control the import of GIF images:

Applications  (6)

Basic Applications  (2)

Import the subject of all messages from "Alice Johnson":

Find the position of all messages sent after a particular date:
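One possible way to write these two applications is sketched here. The file name and the cutoff date are placeholders, and the sketch assumes that "OriginatingDate" yields DateObject values for the messages of interest.

```wolfram
(* Subjects of all messages whose sender name is "Alice Johnson". *)
msgs = Import["file.mbox", "MessageElements"];
#["Subject"] & /@ Select[msgs, #["FromName"] === "Alice Johnson" &]

(* Positions of messages sent after a hypothetical cutoff date. *)
dates = Import["file.mbox", "OriginatingDate"];
cutoff = DateObject[{2020, 1, 1}];
Flatten[Position[dates,
  d_DateObject /; AbsoluteTime[d] > AbsoluteTime[cutoff], {1}, Heads -> False]]
```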

Content Parsing  (1)

Find all mentions of the word "lunch" in an MBOX, ignoring quoted content:

Find all mentions of the word "lunch" in an MBOX, including duplicates in quoted content:
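Both searches can be sketched with the same test applied to different elements. The helper name containsLunchQ is hypothetical, and the sketch assumes that "NewBodyContent" returns one string per message. Entries that are not strings are simply skipped.

```wolfram
(* True for strings that mention "lunch", regardless of case. *)
containsLunchQ = StringQ[#] && StringContainsQ[#, "lunch", IgnoreCase -> True] &;

(* Positions of messages mentioning the word in new, unquoted content only. *)
new = Import["file.mbox", "NewBodyContent"];
Flatten[Position[new, _?containsLunchQ, {1}, Heads -> False]]

(* Positions of messages mentioning the word anywhere in the body, quoted text included. *)
bodies = Import["file.mbox", "Body"];
Flatten[Position[bodies, _?containsLunchQ, {1}, Heads -> False]]
```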

Handling Attachments  (2)

Import messages with attachments larger than 50000 bytes:
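A sketch of one approach, assuming a placeholder file and that each entry of "AttachmentSummaries" is a list of associations with an integer "ByteCount". The helper name bigQ is hypothetical.

```wolfram
summaries = Import["file.mbox", "AttachmentSummaries"];

(* True when a message has at least one attachment over 50000 bytes. *)
bigQ = Function[atts,
   ListQ[atts] && AnyTrue[atts, TrueQ[Lookup[#, "ByteCount", 0] > 50000] &]];

(* Positions of the matching messages, then their summaries. *)
big = Flatten[Position[summaries, _?bigQ, {1}, Heads -> False]];
Import["file.mbox", {"MessageSummaries", big}]
```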

Manually import the first attachment of the second message from the raw ASCII-encoded string:
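The manual route might look like the following sketch. It assumes that the attachment's "ContentTransferEncoding" is Base64 and that the attachment is a PNG image. Both are assumptions about the sample data, so check the association and adjust the format name as needed.

```wolfram
(* Raw encoded data and metadata for the first attachment of the second message. *)
att = First[Import["file.mbox", {"AttachmentData", 2}]];
att["ContentTransferEncoding"]   (* inspect before decoding; Base64 is assumed below *)

(* Strip line breaks from the encoded string and decode it to a byte array. *)
bytes = BaseDecode[StringDelete[att["RawContent"], WhitespaceCharacter]];

(* Import the decoded bytes, assuming PNG content. *)
ImportByteArray[bytes, "PNG"]
```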

Analyzing Email Threads  (1)

Gather some basic elements from each email in an MBOX and show each message ID:

Find all emails that are a reply to another email:

Construct a graph with message IDs being the vertices and the reply-to connections being the edges, labeling each message with the name of the sender:

Tooltip each message with the new body content: