CSV (.csv)

Background

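    • MIME types: text/comma-separated-values, text/csv
    • CSV tabular data format.
    • Stores records of numerical and textual information as lines, using commas to separate fields.
    • Commonly used as an exchange format for spreadsheet applications.
    • CSV is an acronym for Comma-Separated Values.
    • Plain text format.
    • Similar to TSV.
    • Supports RFC 4180.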

Import and Export

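  • Import["file.csv"] returns a list of lists containing strings and numbers, representing the rows and columns stored in the file.
  • Import["file.csv",elem] imports the specified element.
  • Import["file.csv",{elem,subelem1,…}] imports the subelements subelemi, which is useful for partial data import.
  • The import format can be specified with Import["file","CSV"] or Import["file",{"CSV",elem,…}].
  • Export["file.csv",expr] creates a CSV file from expr.
  • Supported expressions expr include:
        {v1,v2,…}                      a single column of data
        {{v11,v12,…},{v21,v22,…},…}    lists of rows of data
        array                          an array such as SparseArray, QuantityArray, etc.
        tseries                        a TimeSeries, EventSeries or TemporalData object
        Dataset[…]                     a dataset
        Tabular[…]                     a tabular object
  • See the following reference pages for full general information:
        Import, Export                      import from or export to a file
        CloudImport, CloudExport            import from or export to a cloud object
        ImportString, ExportString          import from or export to a string
        ImportByteArray, ExportByteArray    import from or export to a byte array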
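
As a rough illustration of the round trip described above, the following sketch exports a small list of rows and imports it back. The sample rows and the file name sample.csv are made up for this example, and writing to $TemporaryDirectory is only a convenient assumption.

```wolfram
(* Hypothetical sample data: a list of rows, each row a list of fields *)
rows = {{"a", 1}, {"b", 2}, {"c", 3}};

(* Assumed output location; any writable path ending in .csv would do *)
file = FileNameJoin[{$TemporaryDirectory, "sample.csv"}];

(* The .csv extension selects the CSV format *)
Export[file, rows];

(* Returns the rows and columns stored in the file *)
Import[file]

(* The format can also be named explicitly, together with an element *)
Import[file, {"CSV", "Data"}]

(* The string-based variants avoid the file system entirely *)
ImportString[ExportString[rows, "CSV"], "CSV"]
```

The string-based form is convenient for experimenting with options, since no temporary file has to be cleaned up afterward.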

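Import Elements

  • General Import elements:
        "Elements"    list of elements and options available in this file
        "Summary"     summary of the file
        "Rules"       list of rules for all available elements
  • Data representation elements:
        "Data"        two-dimensional array
        "Grid"        table data as a Grid object
        "RawData"     two-dimensional array of strings
        "Dataset"     table data as a Dataset
        "Tabular"     table data as a Tabular object
  • Data descriptor elements:
        "ColumnLabels"    names of columns
        "ColumnTypes"     association of column names and types
        "Schema"          TabularSchema object
  • Import and Export use the "Data" element by default.
  • Subelements for partial data import: any data representation element elem can take row and column specifications in the form {elem,rows,cols}, where rows and cols can be any of the following:
        n              nth row or column
        -n             counts from the end
        n;;m           from n through m
        n;;m;;s        from n through m with steps of s
        {n1,n2,…}      specific rows or columns ni
  • Metadata elements:
        "ColumnCount"    number of columns
        "Dimensions"     a list of the number of rows and the maximum number of columns
        "RowCount"       number of rows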

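Options

  • Import and Export options:
        "EmptyField"          ""      how to represent empty fields
        "QuotingCharacter"    "\""    character used to delimit non-numeric fields
  • Data fields containing commas or separator characters are typically wrapped in quoting characters. By default, Export uses the double-quote character as the delimiter. Use "QuotingCharacter" to specify a different character.
  • By default, the double-quote characters that delimit text fields are not imported.
  • Import options:
        CharacterEncoding             "UTF8" (fallback "ISOLatin1")    raw character encoding used in the file
        "ColumnTypeDetectionDepth"    Automatic    number of rows used for header detection
        "CurrencyTokens"              None         currency units to be skipped when importing numerical values
        "DateStringFormat"            None         date format, given as a DateString specification
        "FieldSeparator"              ","          string token taken to separate columns
        "FillRows"                    Automatic    whether to fill rows to the maximum column length
        "HeaderLines"                 Automatic    number of lines to assume as headers
        "IgnoreEmptyLines"            False        whether to ignore empty lines
        MissingValuePattern           Automatic    patterns used to specify missing elements
        "NumberPoint"                 "."          decimal point string
        "Numeric"                     Automatic    whether to import data fields as numbers if possible
        "Schema"                      Automatic    schema used to construct Tabular object
        "SkipInvalidLines"            False        whether to skip invalid lines
        "SkipLines"                   Automatic    number of lines to skip at the beginning of the file
  • By default, Import attempts to interpret the data as "UTF8" encoded text. If any sequence of bytes stored in the file cannot be represented in "UTF8", Import uses "ISOLatin1" instead.
  • With CharacterEncoding -> Automatic, Import attempts to infer the character encoding of the file.
  • Possible settings for "HeaderLines" and "SkipLines" are:
        Automatic      try to determine number of rows to skip or use as header
        n              n rows to skip or to use as Dataset headers
        {rows,cols}    rows and columns to skip or to use as headers
  • Import converts table entries formatted as specified by "DateStringFormat" to a DateObject.
  • Export options:
        Alignment                    None         how data is aligned within table columns
        CharacterEncoding            "UTF8"       raw character encoding used in the file
        "FillRows"                   False        whether to fill rows to the maximum column length
        "IncludeQuotingCharacter"    Automatic    whether to add quotations around exported values
        "TableHeadings"              Automatic    headings for table columns and rows
  • Possible settings for Alignment are None, Left, Center and Right.
  • "IncludeQuotingCharacter" can be set to the following values:
        None         do not enclose any values in quotes
        Automatic    only enclose values in quotes when needed
        All          enclose all valid values in quotes
  • "TableHeadings" can be set to the following values:
        None                 skip column labels
        Automatic            export column labels
        {"col1","col2",…}    list of column labels
        {rhead,chead}        specifies separate labels for the rows and columns
  • Export encodes line separator characters using the convention of the computer system on which the Wolfram Language is being run.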
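
The quoting and empty-field options listed above can be tried on inline data. This is a minimal sketch, not taken from the reference page itself: the sample rows are hypothetical, and the exact output may differ between versions, so no results are shown.

```wolfram
(* Hypothetical rows: one field contains a comma, one is an empty string *)
rows = {{"a,b", 1}, {"", 2}};

(* Default settings: "IncludeQuotingCharacter" -> Automatic *)
ExportString[rows, "CSV"]

(* Enclose all valid values in quotes *)
ExportString[rows, "CSV", "IncludeQuotingCharacter" -> All]

(* Use a single quote as the quoting character instead of a double quote *)
ExportString[rows, "CSV", "QuotingCharacter" -> "'"]

(* On import, substitute a placeholder for empty fields *)
ImportString["x,,z\n1,,3", "CSV", "EmptyField" -> "n/a"]
```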

Examples

Basic Examples  (3)

Import a CSV file:

Read and plot all data from the file:

Import a summary of a CSV file:

Export an array of expressions to CSV:

Scope  (8)

Import  (4)

Import metadata from a CSV file:

Import a CSV file as a Tabular object with automatic header detection:

Import without headers, while skipping the first line:

Import a sample row of a CSV:

Analyze a single column of a file; start by looking at column labels and their types:

Get all values for one column:

Compute the mean:

Export  (4)

Export a Tabular object:

Use the "TableHeadings" option to remove the header from a Tabular object:

Export a TimeSeries:

Export an EventSeries:

Export a QuantityArray:

Import Elements  (27)

"ColumnCount"  (1)

Get the number of columns from a CSV file:

"ColumnLabels"  (1)

Get the inferred column labels from a CSV file:

"ColumnTypes"  (1)

Get the inferred column types from a CSV file:

"Data"  (6)

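Import a CSV file as a 2D list of values:

This is also the default element:

Import a single row from a CSV file:

Import some specific rows from a CSV file:

Import the first 10 rows of a CSV file:

Import a single row and column from a CSV file: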

Import a single column from a CSV file:
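
The row and column specifications for the "Data" element can be sketched on an inline string instead of a file. The CSV text below is made up for illustration; each call uses the {elem,rows,cols} form described under Import Elements.

```wolfram
(* Hypothetical 4x3 table of integers, given as CSV text *)
csv = "1,2,3\n4,5,6\n7,8,9\n10,11,12";

ImportString[csv, {"CSV", "Data", 2}]           (* row 2 *)
ImportString[csv, {"CSV", "Data", -1}]          (* last row *)
ImportString[csv, {"CSV", "Data", 2 ;; 3}]      (* rows 2 through 3 *)
ImportString[csv, {"CSV", "Data", {1, 4}}]      (* rows 1 and 4 *)
ImportString[csv, {"CSV", "Data", 2, 3}]        (* row 2, column 3 *)
ImportString[csv, {"CSV", "Data", 1 ;; 4, 2}]   (* column 2 of rows 1 through 4 *)
```

The same specifications should apply to Import with a file path in place of ImportString with a string.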

"Dataset"  (2)

Import a CSV file as a Dataset:

Use "HeaderLines" and "SkipLines" options to only import the data of interest:

"Dimensions"  (2)

Import dimensions from a CSV file:

If all rows in the file do not have the same number of columns, the maximum number of columns is used:

"Grid"  (1)

Import CSV data as a Grid:

"RawData"  (3)

Import CSV data as raw strings:

Compare to "Data":

By default, "RawData" uses "Numeric"->False:

Use "Numeric"->True:

By default, "RawData" uses "FillRows"->True:

Use "FillRows"->False:
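
A small sketch of the difference between "Data" and "RawData", using a made-up inline string. It only covers the "Numeric" setting; results are not shown because number formatting can vary by version.

```wolfram
(* Hypothetical CSV text mixing integers, reals and plain text *)
csv = "1,2.5,abc\n4,5e3,def";

(* "Data": fields are interpreted as numbers where possible *)
ImportString[csv, {"CSV", "Data"}]

(* "RawData": fields are kept as strings *)
ImportString[csv, {"CSV", "RawData"}]

(* "RawData" with numeric interpretation switched on *)
ImportString[csv, {"CSV", "RawData"}, "Numeric" -> True]
```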

"RowCount"  (1)

Get the number of rows from a CSV file:

"Schema"  (1)

Get the TabularSchema object:

"Summary"  (1)

Summary of a CSV file:

"Tabular"  (7)

Import a CSV file as a Tabular object:

Use "HeaderLines" and "SkipLines" options to only import the data of interest:

Import a single row:

Import multiple rows:

Import the first 5 rows:

Import a single element at a given row and column:

Import a single column:

Import Options  (15)

CharacterEncoding  (1)

The character encoding can be set to any value from $CharacterEncodings:

"ColumnTypeDetectionDepth"  (1)

By default, several dozen rows from the beginning of the file are used to detect column types:

Use more rows to detect column types:

"CurrencyTokens"  (1)

Currency tokens are not automatically skipped:

Use the "CurrencyTokens" option to skip selected currency tokens:

"DateStringFormat"  (1)

Convert dates to a DateObject using the specified date format:

By default, no conversion is performed:

"EmptyField"  (1)

Specify a default value for empty fields in CSV data:

"FieldSeparator"  (1)

By default, "," is used as a field separator:

Use tab as a field separator:
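
A minimal sketch of these cases on inline strings rather than files; the sample text is made up, and the semicolon variant is an extra illustration of the same option.

```wolfram
(* Default separator: a comma *)
ImportString["a,b\n1,2", "CSV"]

(* Tab-separated fields *)
ImportString["a\tb\n1\t2", "CSV", "FieldSeparator" -> "\t"]

(* Semicolon-separated fields *)
ImportString["a;b\n1;2", "CSV", "FieldSeparator" -> ";"]
```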

"FillRows"  (1)

For the "Data" element, row lengths are automatically preserved:

Pad rows:

For the "RawData" element, rows are padded to full length by default:

"HeaderLines"  (1)

The header line is automatically detected by default:

Use "HeaderLines" option when automatic header detection is incorrect:

Specify row headers:

Specify row and column headers:

"IgnoreEmptyLines"  (1)

"IgnoreEmptyLines" removes lines with no data from the imported data:

MissingValuePattern  (1)

By default, an automatic set of values is considered missing:

Use MissingValuePattern->None to disable missing element detection:

Use string patterns to find missing elements:

"Numeric"  (1)

Use "Numeric"->True to interpret numbers:

By default, everything is imported as strings:

"NumberPoint"  (1)

By default, "." is used to specify decimal point character for floating-point data:

Use "NumberPoint" option to specify decimal point character for floating-point data:

"QuotingCharacter"  (1)

The default quoting character is a double quote:

A different quoting character can be specified:

"Schema"  (1)

Import automatically infers column labels and types from data stored in a CSV file:

Use "Schema" option to specify column labels and types:

"SkipLines"  (1)

CSV files may include a comment line:

Skip the comment line:

Skip the comment line and use the next line as a Tabular header:
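
The following sketch shows one way these steps might look on an inline string. The comment line and data are hypothetical, and the "Tabular" element assumes a version that provides Tabular (Version 14.2 or later).

```wolfram
(* Hypothetical CSV text: a comment line, a header line, then two data rows *)
csv = "# generated by some tool\nname,value\na,1\nb,2";

(* Skip the first line and import the rest as plain rows *)
ImportString[csv, {"CSV", "Data"}, "SkipLines" -> 1]

(* Skip the first line and treat the next line as the Tabular header *)
ImportString[csv, {"CSV", "Tabular"}, "SkipLines" -> 1, "HeaderLines" -> 1]
```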

Export Options  (7)

Alignment  (1)

By default, no additional characters are added for any alignment:

Left-align column values:

Center-align column values:

CharacterEncoding  (1)

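The character encoding can be set to any value from $CharacterEncodings:

"EmptyField"  (1)

By default, empty elements are exported as empty strings:

Specify a different value for empty elements:

"FillRows"  (1)

By default, row lengths are preserved:

Use "FillRows"->True to export full rows: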

"IncludeQuotingCharacter"  (1)

By default, Export only exports quotation characters for values that need them:

Use "IncludeQuotingCharacter"->All to enclose all values in quotes:

Use "IncludeQuotingCharacter"->None to export all values without quotes. Note that headers are always enclosed in quotes:

"QuotingCharacter"  (1)

The default quoting character used for non-numeric elements is a double quote:

Specify a different quoting character:

Use "QuotingCharacter"->"" to export all values without quotes. Note that headers are always enclosed in quotes:

"TableHeadings"  (1)

By default, column headers are exported:

Use "TableHeadings"->None to skip column headers:

Export data using custom column headers:

Export data using custom column and row headers:
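
The "TableHeadings" settings can be compared side by side with ExportString. This is a sketch under the settings listed in the Options section; the data and labels are made up, and the output is not shown since quoting of headers depends on other options.

```wolfram
(* Hypothetical 2x2 numeric table *)
data = {{1, 2}, {3, 4}};

(* No column headers *)
ExportString[data, "CSV", "TableHeadings" -> None]

(* Custom column headers *)
ExportString[data, "CSV", "TableHeadings" -> {"x", "y"}]

(* Separate row and column headers, given as {rhead, chead} *)
ExportString[data, "CSV", "TableHeadings" -> {{"r1", "r2"}, {"x", "y"}}]
```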

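Applications  (1)

Export a list of European cities and their populations to a CSV file:

Import the data back and convert to expressions:

Possible Issues  (13)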

If all rows in the file do not have the same number of columns, some rows may be considered as invalid:

Entries of the format "nnnEnnn" are interpreted as numbers with scientific notation:

Use the "Numeric" option to override this interpretation:

Numeric interpretation may result in a loss of precision:

Use the "Numeric" option to override this interpretation:

Starting from Version 14.2, currency tokens are not automatically skipped:

Use the "CurrencyTokens" option to skip such tokens:

Starting from Version 14.2, quoting characters are added when the column of integer values contains numbers greater than Developer`$MaxMachineInteger:

Use "IncludeQuotingCharacter"->None to get the previous result:

Starting from Version 14.2, some strings are automatically considered missing:

Use MissingValuePattern->None to override this interpretation:

Starting from Version 14.2, real numbers with 0 fractional part are exported as integers:

Use "Backend"->"Table" to get the previous result:

Starting in Version 14.2, there is an automatic column type identification:

Use "Backend"->"Table" if nonhomogeneous types in columns are expected:

Starting from Version 14.2, integers greater than Developer`$MaxMachineInteger are imported as real numbers:

Use "Backend"->"Table" to get the previous result:

Starting from Version 14.2, date and time columns of Tabular objects are exported using DateString:

Use "Backend"->"Table" to get the previous result:

Some CSV data generated from older versions of the Wolfram Language may have incorrectly delimited text fields and will not import as expected in Version 11.2 or higher:

Using "QuotingCharacter"->"" will give the previously expected result:

The top-left corner of data is lost when importing a Dataset with row and column headers:

Dataset may look different depending on the dimensions of the data: