WARC (.warc)

  • See the following reference pages for full general information:
      Import             import from a file
      CloudImport        import from a cloud object
      ImportString       import from a string
      ImportByteArray    import from a byte array
    Background & Context

      • MIME type: application/warc.
      • Web archive format.
      • Used to archive full webpages.
      • Revision of the Internet Archive's ARC File Format.
      • Standardized as ISO 28500.

    Import Elements

    • General Import elements:
        "Elements"    list of elements and options available in this file
        "Rules"       full list of rules for each element and option
        "Options"     list of rules for options, properties and settings
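
    As a minimal sketch, the general elements can be queried as shown here. The file name example.warc is a hypothetical local path used only for illustration, not a file shipped with the system:

```wolfram
(* "example.warc" is a hypothetical path; substitute a real WARC file *)
file = "example.warc";

(* List the elements and options available in this file *)
Import[file, "Elements"]

(* Full list of rules for each element and option *)
Import[file, "Rules"]
```
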
    • Additional elements include:
        "Dataset"             dataset containing common interpreted WARC elements
        "RawDataset"          dataset containing all interpreted WARC elements
        "RawStringDataset"    dataset containing common unformatted WARC headers
        "RawData"             dataset containing all unformatted WARC headers
    • The "Dataset" and "RawDataset" elements interpret a date as a DateObject, and a payload as an HTTPRequest.
    • The "RawStringDataset" and "RawData" elements do not perform any interpretation.
    • The "Dataset" and "Headers" elements always return the following information for each WARC element:
        "URL"             URL of the element
        "ContentType"     MIME content type
        "Content"         the main content of the element
        "AccessDate"      when the resource was accessed
        "WARCType"        the type of WARC element
        "WARCVersion"     version for the WARC element
        "WARCRecordID"    unique element identifier
    • The "RawDataset" and "RawData" elements may return additional elements, such as "WARC-Block-Digest".


    Basic Examples  (1)

    Import a WARC file:
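
    A minimal sketch, assuming a hypothetical local file named example.warc (substitute the path of any WARC file). With no element specified, Import returns the default element for the format:

```wolfram
(* "example.warc" is a hypothetical path, not a file from the original page *)
Import["example.warc"]
```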

    Import all headers:
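
    A minimal sketch, again using the hypothetical file example.warc. It assumes that "all headers" refers to the "RawData" element, which is described above as containing all unformatted WARC headers:

```wolfram
(* "example.warc" is a hypothetical path; "RawData" is assumed to be the intended element *)
Import["example.warc", "RawData"]
```

    To have dates and payloads interpreted instead, the "RawDataset" element can be requested in the same way.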