ArrowIPC (.arrow, .arrows, .feather, .ftr)

Background & Context

    • Registered MIME types: application/vnd.apache.arrow.file, application/vnd.apache.arrow.stream
    • Arrow IPC columnar data format.
    • Used for efficient serialization of large columnar datasets.
    • The primitive unit of serialized data in the columnar format is called a record batch.
    • The Arrow IPC file format is used for serializing a fixed number of record batches and supports random access.
    • The Arrow IPC streaming format is used for sending an arbitrary-length sequence of record batches.
    • Feather version 2 is a file format represented as an Arrow IPC file on disk.
    • Feather version 1 is a legacy file format distinct from Arrow IPC files.
    • Developed by the Apache Software Foundation.
    • Binary file format.
    • Supports multiple compression methods.

Import & Export

  • Import["file.arrow"] imports an ArrowIPC file as a Tabular object.
  • Import["file.arrow",elem] imports the specified elements.
  • Import["file.arrow",{elem,subelem1,…}] imports subelements subelem_i, useful for partial data import.
  • The import format can be specified with Import["file","ArrowIPC"] or Import["file",{"ArrowIPC",elem,…}].
  • Export["file.arrow",expr] exports a Tabular object to ArrowIPC file format.
  • Supported expressions expr include:
    {v1,v2,…}                      a single column of data
    {{v11,v12,…},{v21,v22,…},…}    lists of rows of data
    array                          an array such as SparseArray, QuantityArray, etc.
    dataset                        a Dataset or a Tabular object
  • See the following reference pages for full general information:
    Import, Export                      import from or export to a file
    CloudImport, CloudExport            import from or export to a cloud object
    ImportString, ExportString          import from or export to a string
    ImportByteArray, ExportByteArray    import from or export to a byte array
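
A minimal round-trip sketch of the forms listed above. The sample data, the column names and the temporary file name are hypothetical and chosen only for illustration:

```wolfram
(* Hypothetical sample data; column names are illustrative only *)
data = Tabular[{
    <|"id" -> 1, "name" -> "a", "value" -> 1.5|>,
    <|"id" -> 2, "name" -> "b", "value" -> 2.5|>,
    <|"id" -> 3, "name" -> "c", "value" -> 3.5|>}];

(* Write the table to a temporary Arrow IPC file *)
file = FileNameJoin[{$TemporaryDirectory, "sample.arrow"}];
Export[file, data];

(* With no element given, Import returns a Tabular object *)
tab = Import[file]

(* State the format explicitly, e.g. when the file extension is missing or ambiguous *)
Import[file, "ArrowIPC"]

(* The same data serialized to a byte array instead of a file *)
bytes = ExportByteArray[data, "ArrowIPC"];
ImportByteArray[bytes, "ArrowIPC"]
```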

Import Elements

  • General Import elements:
    "Elements"    list of elements and options available in this file
    "Summary"     summary of the file
    "Rules"       list of rules for all available elements
  • Data representation elements:
    "Data"       two-dimensional array
    "Dataset"    table data as a Dataset
    "Tabular"    a Tabular object
  • Import by default uses the "Tabular" element.
  • Subelements for partial data import for the "Tabular" element can take row and column specifications in the form {"Tabular",rows,cols}, where rows and cols can be any of the following:
    n            nth row or column
    -n           counts from the end
    n;;m         from n through m
    n;;m;;s      from n through m with steps of s
    {n1,n2,…}    specific rows or columns ni
  • Data descriptor elements:
    "ColumnLabels"    names of columns
    "ColumnTypes"     association with data type for each column
    "Schema"          TabularSchema object
  • Metadata elements:
    "ColumnCount"        number of columns stored in file
    "Dimensions"         data dimensions
    "RowCount"           number of rows stored in file
    "MetaInformation"    metadata
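
The elements above can be requested one at a time, and the "Tabular" element accepts row and column specifications for partial import. The sketch below is illustrative only: the three-row table, its column names and the file name are hypothetical.

```wolfram
(* Hypothetical three-row, three-column table written to a temporary file *)
file = FileNameJoin[{$TemporaryDirectory, "elements.arrow"}];
Export[file, Tabular[{
    <|"id" -> 1, "name" -> "a", "value" -> 1.5|>,
    <|"id" -> 2, "name" -> "b", "value" -> 2.5|>,
    <|"id" -> 3, "name" -> "c", "value" -> 3.5|>}]];

Import[file, "Elements"]       (* elements available for this file *)
Import[file, "ColumnLabels"]   (* names of columns *)
Import[file, "ColumnTypes"]    (* association of column types *)
Import[file, "Dimensions"]     (* data dimensions *)

(* Partial import: rows 1 through 2, columns 1 through 3 *)
Import[file, {"Tabular", 1 ;; 2, 1 ;; 3}]

(* Partial import: last row, first and third columns *)
Import[file, {"Tabular", -1, {1, 3}}]
```

Requesting only the rows and columns that are needed avoids building the full Tabular object for large files.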

Options

  • General Import options:
    IncludeMetaInformation    All     metadata types to import
    "UseMemoryMappedFile"     True    whether to use memory-mapped reader
  • General Export options:
    "Compression"       None         compression method
    CompressionLevel    Automatic    compression level
    "Schema"            Automatic    schema used to construct Tabular object
    "Streamable"        False        if true, then Arrow IPC streaming format is used
  • The following settings for "Compression" are supported:
    None          no compression
    "LZ4Frame"    LZ4 Frame compression
    "ZSTD"        ZSTD compression
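
A sketch comparing the "Compression" settings listed above by resulting file size. The data and file names are hypothetical, and CompressionLevel -> 1 assumes that this format follows the usual Wolfram Language convention of levels between 0 and 1; that range is not stated on this page.

```wolfram
(* Hypothetical table, large enough for compression to have a visible effect *)
data = Tabular[Table[<|"n" -> i, "square" -> i^2|>, {i, 1000}]];
dir = $TemporaryDirectory;

(* Export returns the name of the file it wrote *)
plain = Export[FileNameJoin[{dir, "plain.arrow"}], data];
lz4 = Export[FileNameJoin[{dir, "lz4.arrow"}], data, "Compression" -> "LZ4Frame"];
zstd = Export[FileNameJoin[{dir, "zstd.arrow"}], data, "Compression" -> "ZSTD"];

(* Assumption: 1 is the maximal level on the usual 0-to-1 scale *)
zstdMax = Export[FileNameJoin[{dir, "zstd-max.arrow"}], data,
   "Compression" -> "ZSTD", CompressionLevel -> 1];

(* File sizes in bytes, keyed by file name *)
AssociationMap[FileByteCount, {plain, lz4, zstd, zstdMax}]
```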

Examples

Basic Examples  (3)

Import a Tabular object from an Arrow IPC file:

Import the file summary:

Export a Tabular object to an Arrow IPC file:

Scope  (3)

Import  (3)

Show all elements available in the file:

By default, a Tabular object is returned:

Import column types:

Import Elements  (14)

"ColumnCount"  (1)

Get the number of columns:

"ColumnLabels"  (1)

Read column names:

"ColumnTypes"  (1)

Import column types:

"Data"  (2)

Get the data from a file:

Import only selected rows:

Import only selected columns:

"Dataset"  (2)

Get the data as a Dataset:

Import only selected rows:

Import only selected columns:

"Dimensions"  (1)

Import data dimensions:

"MetaInformation"  (1)

Import metadata:

"RowCount"  (1)

Get the number of rows:

"Schema"  (1)

Get the TabularSchema object:

"Summary"  (1)

Get the file summary:

"Tabular"  (2)

Get the data from a file as a Tabular object:

Import only selected rows:

Import only selected columns:

Import Options  (3)

IncludeMetaInformation  (1)

By default, all metadata stored in a file is imported and embedded in the Tabular object:

Do not import metadata:
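
A sketch of both settings, using a hypothetical two-row table and file name. The value None is assumed here to be the counterpart of the documented default All; it is not listed explicitly on this page.

```wolfram
(* Hypothetical file used only for illustration *)
file = FileNameJoin[{$TemporaryDirectory, "meta.arrow"}];
Export[file, Tabular[{<|"id" -> 1|>, <|"id" -> 2|>}]];

(* Metadata stored in the file *)
Import[file, "MetaInformation"]

(* Default behavior: embed all metadata in the imported Tabular object *)
withMeta = Import[file, IncludeMetaInformation -> All]

(* Assumed setting for skipping metadata on import *)
withoutMeta = Import[file, IncludeMetaInformation -> None]
```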

"Schema"  (1)

By default, column labels and their types stored in a file are used when Tabular or Dataset objects are imported:

Use the "Schema" option to specify column labels and types:

"UseMemoryMappedFile"  (1)

By default, memory mapping is disabled. Use "UseMemoryMappedFile"->True to enable memory mapping:
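
A sketch that sets the option explicitly in both directions, so that the result does not depend on the default. The table and file name are hypothetical.

```wolfram
(* Hypothetical file used only for illustration *)
file = FileNameJoin[{$TemporaryDirectory, "mmap.arrow"}];
Export[file, Tabular[Table[<|"n" -> i, "square" -> i^2|>, {i, 1000}]]];

(* Read through the memory-mapped reader *)
mapped = Import[file, "UseMemoryMappedFile" -> True]

(* Read without memory mapping *)
unmapped = Import[file, "UseMemoryMappedFile" -> False]
```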

Export Options  (6)

"Compression"  (2)

Compression is disabled by default:

Compare supported compression methods:

CompressionLevel  (2)

By default, CompressionLevel is set to Automatic, which corresponds to a different default level for each compression method:

Use maximal compression for each method:

"Streamable"  (2)

By default, Export uses the Arrow IPC file format:

Use the "Streamable" option to generate the Arrow IPC streaming format:
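
A sketch writing the same hypothetical table in both formats. The file names are illustrative; the .arrows extension is taken from the list of extensions in the page title.

```wolfram
(* Hypothetical data used only for illustration *)
data = Tabular[Table[<|"n" -> i, "square" -> i^2|>, {i, 100}]];
dir = $TemporaryDirectory;

(* Default: Arrow IPC file format *)
asFile = Export[FileNameJoin[{dir, "table.arrow"}], data];

(* Arrow IPC streaming format *)
asStream = Export[FileNameJoin[{dir, "table.arrows"}], data, "Streamable" -> True];

(* Compare the sizes of the two files in bytes *)
FileByteCount /@ {asFile, asStream}
```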