Wolfram Language & System Documentation Center

ORC (.orc)

Import and Export support ORC Version 0.12.

Background & Context

- Efficient, general-purpose, column-oriented data format.
- Developed by the Apache Software Foundation.
- ORC is an acronym for Optimized Row Columnar.
- Binary file format.
- Supports multiple compression methods.

Import & Export

Import["file.orc"] imports an ORC file as a Tabular object.
Import["file.orc",elem] imports the specified elements.
Import["file.orc",{elem,subelem₁,…}] imports subelements subelem_i, useful for partial data import.
The import format can be specified with Import["file","ORC"] or Import["file",{"ORC",elem,…}].
Export["file.orc",expr] exports a Tabular object to ORC file format.
Supported expressions expr include:

	{v₁,v₂,…}	a single column of data
	{{v₁₁,v₁₂,…},{v₂₁,v₂₂,…},…}	lists of rows of data
	array	an array such as SparseArray, QuantityArray, etc.
	tseries	a TimeSeries, EventSeries or a TemporalData object
	dataset	a Dataset or a Tabular object

See the following reference pages for full general information:

	Import, Export	import from or export to a file
	CloudImport, CloudExport	import from or export to a cloud object
	ImportString, ExportString	import from or export to a string
	ImportByteArray, ExportByteArray	import from or export to a byte array
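
As a minimal sketch of the forms above, the following writes a small list of rows to an ORC file and reads it back while naming the format explicitly. The sample data and the file name rows.orc in the temporary directory are illustrative assumptions, not part of this reference.

```wolfram
(* illustrative sample data: three rows, two columns *)
rows = {{1, "a"}, {2, "b"}, {3, "c"}};

(* hypothetical output path in the temporary directory *)
file = FileNameJoin[{$TemporaryDirectory, "rows.orc"}];
Export[file, rows, "ORC"];

(* the default import returns a Tabular object *)
tab = Import[file, "ORC"];
TabularQ[tab]

(* request a single element while naming the format explicitly *)
Import[file, {"ORC", "Dimensions"}]
```

Naming the format is mainly useful when the file extension is missing or does not match the content.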

Import Elements

General Import elements:

	"Elements"	list of elements and options available in this file
	"Summary"	summary of the file
	"Rules"	list of rules for all available elements

Data representation elements:

	"Data"	two-dimensional array
	"Dataset"	table data as a Dataset
	"EventSeries"	table data as an EventSeries
	"Tabular"	a Tabular object
	"TimeSeries"	table data as a TimeSeries

Import by default uses the "Tabular" element.
Subelements for partial data import for the "Tabular" element can take row and column specifications in the form {"Tabular",rows,cols}, where rows and cols can be any of the following:

	n	n^th row or column
	-n	counts from the end
	n;;m	from n through m
	n;;m;;s	from n through m with steps of s
	{n₁,n₂,…}	specific rows or columns n_i

Column specifications can also be any of the following:

	"col"	single column "col"
	{col₁,col₂,…}	list of column names col_i

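The row and column specifications above can be combined as in the following sketch. It assumes the ExampleData/USstates.orc file and the column names "Name" and "Area" that appear in the examples on this page; the particular row ranges are arbitrary.

```wolfram
file = "ExampleData/USstates.orc";

(* every second row among the first ten *)
Import[file, {"Tabular", 1 ;; 10 ;; 2}]

(* the last row, counting from the end *)
Import[file, {"Tabular", -1}]

(* rows 1, 5 and 7 of the single column "Name" *)
Import[file, {"Tabular", {1, 5, 7}, "Name"}]

(* all rows, with columns selected by name *)
Import[file, {"Tabular", All, {"Name", "Area"}}]
```
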
Data descriptor elements:

	"ColumnLabels"	names of columns
	"ColumnTypes"	association with data type for each column
	"Schema"	TabularSchema object

Metadata elements:

	"ColumnCount"	number of columns stored in file
	"Dimensions"	data dimensions
	"RowCount"	number of rows stored in file
	"MetaInformation"	metadata

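The descriptor and metadata elements can be gathered without reading the table data, which is a cheap way to inspect a file before a partial import. The sketch below assumes the ExampleData/USstates.orc file used elsewhere on this page; the variable name info is illustrative.

```wolfram
file = "ExampleData/USstates.orc";

(* collect several descriptor and metadata elements in one association *)
info = AssociationMap[Import[file, #] &,
  {"RowCount", "ColumnCount", "Dimensions", "ColumnLabels"}]

(* consistency check, assuming "Dimensions" is reported as {rows, columns} *)
info["Dimensions"] === {info["RowCount"], info["ColumnCount"]}
```
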
Options

General Import options:

IncludeMetaInformation	All	metadata types to import
"Schema"	Automatic	schema used to construct Tabular object
"TimeColumn"	Automatic	column to use for times in "EventSeries" and "TimeSeries" elements

Possible settings for the "Schema" option include:

	schema	a complete TabularSchema specification
	propval	a schema property and value (see reference page for TabularSchema)
	<\|"prop₁"val₁,…\|>	an association of schema properties and values

General Export options:

	"Compression"	None	compression method
	"CompressionStrategy"	"Speed"	compression strategy

The following settings for "Compression" are supported:

	None	no compression
	"LZ4"	LZ4 compression
	"GZIP"	GZIP Hadoop compression
	"Snappy"	Snappy compression
	"ZSTD"	ZSTD compression

The following settings for "CompressionStrategy" are supported:

	"Size"	optimize size of file
	"Speed"	optimize the speed of export

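The two export options can be combined in a single call. The following sketch, which assumes the ExampleData/USstates.orc file used elsewhere on this page and a hypothetical output file compressed.orc in the temporary directory, exports with ZSTD compression optimized for size and then checks that the reimported file has the same dimensions as the source.

```wolfram
source = "ExampleData/USstates.orc";
tabular = Import[source];

(* hypothetical output path in the temporary directory *)
out = FileNameJoin[{$TemporaryDirectory, "compressed.orc"}];
Export[out, tabular, "Compression" -> "ZSTD", "CompressionStrategy" -> "Size"];

(* size of the compressed file on disk *)
FileSize[out]

(* compare the dimensions of the reimported file with the source *)
Import[out, "Dimensions"] === Import[source, "Dimensions"]
```
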
Examples

Basic Examples (3)

Import a Tabular object from an ORC file:

Wolfram Language code: Import["ExampleData/USstates.orc"]

Import the file summary:

Wolfram Language code: Import["ExampleData/USstates.orc", "Summary"]

Export a Tabular object to an ORC file:

Wolfram Language code: tabular = Import["ExampleData/USstates.orc"];

Wolfram Language code: Export["file.orc", tabular]

Scope (3)

Import (3)

Show all elements available in the file:

Wolfram Language code: Import["ExampleData/USstates.orc", "Elements"]

By default, a Tabular object is returned:

Wolfram Language code: Import["ExampleData/USstates.orc"]//TabularQ

Import column types:

Wolfram Language code: Import["ExampleData/USstates.orc", "ColumnTypes"]

Import Elements (19)

"ColumnCount" (1)

Get the number of columns:

Wolfram Language code: Import["ExampleData/USstates.orc", "ColumnCount"]

"ColumnLabels" (1)

Read column names:

Wolfram Language code: Import["ExampleData/USstates.orc", "ColumnLabels"]

"ColumnTypes" (1)

Import column types:

Wolfram Language code: Import["ExampleData/USstates.orc", "ColumnTypes"]

"Data" (3)

Get the data from a file:

Wolfram Language code: Import["ExampleData/USstates.orc", "Data"]//Short

Import only selected rows:

Wolfram Language code: Import["ExampleData/USstates.orc", {"Data", 1 ;; 3}]

Import only selected columns:

Wolfram Language code: Import["ExampleData/USstates.orc", {"Data", All, {1, 3}}]//Short

Import only selected columns using column names:

Wolfram Language code: Import["ExampleData/USstates.orc", {"Data", All, {"Name", "Area"}}]//Short

"Dataset" (3)

Get the data as a Dataset:

Wolfram Language code: Import["ExampleData/USstates.orc", "Dataset"]

Import only selected rows:

Wolfram Language code: Import["ExampleData/USstates.orc", {"Dataset", 1 ;; 3}]

Import only selected columns:

Wolfram Language code: Import["ExampleData/USstates.orc", {"Dataset", All, {1, 3}}]

Import only selected columns using column names:

Wolfram Language code: Import["ExampleData/USstates.orc", {"Dataset", All, {"Name", "Area"}}]

"Dimensions" (1)

Import data dimensions:

Wolfram Language code: Import["ExampleData/USstates.orc", "Dimensions"]

"EventSeries" (1)

Export a Tabular object to an ORC file:

Wolfram Language code: file = Export["file.orc", ResourceData["Sample Tabular Data: Sales Data"]]

Import an ORC file as an EventSeries:

Wolfram Language code: Import[file, "EventSeries"]

Import a single row from an ORC file:

Wolfram Language code: Import[file, {"EventSeries", 5}]

Import some specific rows from an ORC file:

Wolfram Language code: Import[file, {"EventSeries", {1, 5, 7}}]

Import the first 10 rows of an ORC file:

Wolfram Language code: Import[file, {"EventSeries", 1 ;; 10}]

Import only selected columns using column names:

Wolfram Language code: Import[file, {"EventSeries", All, {"Product", "Date", "Quantity"}}]

"MetaInformation" (1)

Import metadata:

Wolfram Language code: Import["ExampleData/USstates.orc", "MetaInformation"]

"RowCount" (1)

Get the number of rows:

Wolfram Language code: Import["ExampleData/USstates.orc", "RowCount"]

"Schema" (1)

Get the TabularSchema object:

Wolfram Language code: Import["ExampleData/USstates.orc", "Schema"]

"Summary" (1)

Get the file summary:

Wolfram Language code: Import["ExampleData/USstates.orc", "Summary"]

"Tabular" (3)

Get the data from a file as a Tabular object:

Wolfram Language code: Import["ExampleData/USstates.orc", "Tabular"]

Import only selected rows:

Wolfram Language code: Import["ExampleData/USstates.orc", {"Tabular", 1 ;; 5}]

Import only selected columns:

Wolfram Language code: Import["ExampleData/USstates.orc", {"Tabular", All, {1, 3}}]

Import only selected columns using column names:

Wolfram Language code: Import["ExampleData/USstates.orc", {"Tabular", All, {"Name", "Area"}}]

"TimeSeries" (1)

Export a Tabular object to an ORC file:

Wolfram Language code: file = Export["file.orc", ResourceData["Sample Tabular Data: Sales Data"]]

Import an ORC file as a TimeSeries:

Wolfram Language code: Import[file, "TimeSeries"]

Import a single row from an ORC file:

Wolfram Language code: Import[file, {"TimeSeries", 5}]

Import some specific rows from an ORC file:

Wolfram Language code: Import[file, {"TimeSeries", {1, 5, 7}}]

Import the first 10 rows of an ORC file:

Wolfram Language code: Import[file, {"TimeSeries", 1 ;; 10}]

Import only selected columns using column names:

Wolfram Language code: Import[file, {"TimeSeries", All, {"Product", "Date", "Quantity"}}]

Import Options (3)

IncludeMetaInformation (1)

By default, all metadata stored in a file is imported and embedded in the Tabular object:

Wolfram Language code:

tabular = Import["ExampleData/USstates.orc"];
tabular["Metadata"]

Do not import metadata:

Wolfram Language code:

tabular = Import["ExampleData/USstates.orc", IncludeMetaInformation -> None];
tabular["Metadata"]

"Schema" (1)

Export a Tabular object to an ORC file:

Wolfram Language code:

file = Export["out.orc", Tabular[Association["RawSchema" -> Association["ColumnProperties" -> 
     Association["A" -> Association["ElementType" -> "String"], 
      "B" -> Association["ElementType" -> "String"]], "KeyColumns" -> None, 
    "Backend" -> "WolframKernel"], "BackendData" -> 
   Association["ColumnData" -> DataStructure["ColumnTable", 
      {{TabularColumn[Association["Data" -> {{0, {0, 11, 22, 33, 44, 55}, 
             "Jan 03 2006Jan 04 2006Jan 05 2006Jan 06 2006Jan 09 2006"}, {}, None}, 
          "ElementType" -> "String"]], TabularColumn[Association[
          "Data" -> {{3, {0, 5, 10, 15, 20, 25}, "11.8212.0412.0911.8812.43"}, {}, None}, 
          "ElementType" -> "String"]]}}]]]]];

By default, column labels and their types stored in a file are used when Tabular or Dataset objects are imported:

Wolfram Language code:

tabular = Import[file];
tabular["ColumnTypes"]

Use the "Schema" option to specify column labels and types:

Wolfram Language code:

tabular = Import[file, "Schema" -> {"ColumnKeys" -> {"Date", "Value"}, "ElementType" -> {"Date" -> "Date", "Value" -> "Real32"}}];
tabular["ColumnTypes"]

"TimeColumn" (1)

Export a Tabular object to an ORC file:

Wolfram Language code:

file = Export["file.orc", Tabular[Association["RawSchema" -> Association["ColumnProperties" -> 
     Association["Date" -> Association["ElementType" -> TypeSpecifier["Date"]["Integer32", "Day", 
          "Gregorian", None]], "Value" -> Association["ElementType" -> "Real32"]], 
    "KeyColumns" -> None, "Backend" -> "WolframKernel"], "Options" -> {}, 
  "BackendData" -> Association["ColumnData" -> DataStructure["ColumnTable", 
      {{TabularColumn[Association["Data" -> {5, {{NumericArray[{13150, 13151, 13152, 13153, 13156}, 
               "Integer32"], {}, None}}, None}, "ElementType" -> "Date"["Integer32", "Day", 
            "Gregorian", None]]], TabularColumn[Association[
          "Data" -> {NumericArray[{11.819999694824219, 12.039999961853027, 12.09000015258789, 
             11.880000114440918, 12.430000305175781}, "Real32"], {}, None}, 
          "ElementType" -> "Real32"]]}}]]]]];

By default, the time column is selected automatically for "TimeSeries" and "EventSeries" elements:

Wolfram Language code: Import[file, "TimeSeries"]

Use the "TimeColumn" option to specify the time column:

Wolfram Language code: Import[file, "TimeSeries", "TimeColumn" -> "Value"]

Export Options (4)

"Compression" (2)

Compression is disabled by default:

Wolfram Language code:

tabular = Import["ExampleData/USstates.orc"];
Export["out.orc", tabular]//FileSize

Compare supported compression methods:

Wolfram Language code:

tabular = Import["ExampleData/USstates.orc"];
AssociationMap[(FileSize@Export["out.orc", tabular, "Compression" -> #])&, {"LZ4", "GZIP", "Snappy", "ZSTD"}]

"CompressionStrategy" (2)

By default, the "Speed" setting of "CompressionStrategy" is used:

Wolfram Language code:

tabular = Import["ExampleData/USstates.orc"];
AssociationMap[(FileSize@Export["out.orc", tabular, "CompressionStrategy" -> "Speed", "Compression" -> #])&, {"LZ4", "GZIP", "Snappy", "ZSTD"}]

Use the "Size" compression strategy:

Wolfram Language code:

tabular = Import["ExampleData/USstates.orc"];
AssociationMap[(FileSize@Export["out.orc", tabular, "CompressionStrategy" -> "Size", "Compression" -> #])&, {"LZ4", "GZIP", "Snappy", "ZSTD"}]
