TSV (.tsv)
背景

-
- MIME 类型:text/tab-separated-values
- TSV 表格数据格式.
- 按行存储数值和文本信息,使用制表符分隔字段.
- TSV 是 Tab-Separated Values 的缩写.
- 纯文本格式.
- 类似于 CSV.
Import 与 Export

- Import["file.tsv"] 返回字符串和数字的二维数组,表示存储在文件中的行列.
- Import["file.tsv",elem] 从一个 TSV 文件中导入指定的参数.
- Import["file.tsv",{elem,sub1,…}] 导入子参数,特别是有助于导入部分数据的参数.
- 导入格式可以用 Import["file","TSV"] 或 Import["file",{"TSV",elem,…}] 指定.
- Export["file.tsv",expr] 从 expr 创建 TSV 文件.
- 支持的 expr 表达式包括:
-
{v1,v2,…} 单列数据 {{v11,v12,…},{v21,v22,…},…} 数据行列表 array 数组包括 SparseArray、QuantityArray 等 tseries TimeSeries、EventSeries 或 TemporalData 对象 Dataset[…] 数据集 Tabular[…] a tabular object - 请到以下参考页面了解完整的基本信息:
-
Import, Export 从文件导入或导出到文件 CloudImport, CloudExport 从云对象导入或导出到云对象 ImportString, ExportString 从字符串导入或导出到字符串 ImportByteArray, ExportByteArray 从字节数组导入或导出到字节数组
Import 参数


- Import 通用参数:
-
"Elements" 该文件可用的参数和选项列表 "Rules" 所有可用参数的规则列表 "Summary" 文件摘要 - 表示数据的参数:
-
"Data" 二维数组 "Grid" 作为 Grid 对象的表格数据 "RawData" 二维数组字符串 "Dataset" 作为 Dataset 列表数据 "Tabular" table data as a Tabular object - Data descriptor elements:
-
"ColumnLabels" names of columns "ColumnTypes" association of column names and types "Schema" TabularSchema object - 默认情况下,Import 与 Export 使用 "Data" 参数.
- 对于部分数据导入,任何数据表示参数 elem 可用 {elem, rows, cols} 格式指定行和列,其中 rows 和 cols 可为以下任意:
-
n 第 n 行或列 -n 从末尾计数 n;;m 从 n 到 m n;;m;;s 从 n 到 m 步长为 s {n1,n2,…} 特定的行和列 ni - 元数据参数:
-
"ColumnCount" 列数 "Dimensions" 行数列表和最大列数 "RowCount" 行数
选项




- Import 与 Export 选项:
-
"EmptyField" "" 如何表示空白字段 "QuotingCharacter" "\"" character used to delimit non-numeric fields - 数据区域内包含逗号或行分隔符通常用双引号字符包围. 默认情况下,Export 使用创引号字符作为分隔符. 指定不同字符使用 "QuotingCharacter".
- 默认情况下,并不导入双引号字符分隔的文本字段.
- Import 选项:
-
CharacterEncoding "UTF8ISOLatin1" 文件中行字符串使用的编码 "ColumnTypeDetectionDepth" Automatic number of rows used for header detection "CurrencyTokens" None 当导入数值值时会跳过货币单位 "DateStringFormat" None 日期格式,按 DateString 规范给出 "FieldSeparator" "\t" string token taken to separate columns "FillRows" Automatic 是否将行按最大列长填满 "HeaderLines" Automatic 假设为开头的行数 "IgnoreEmptyLines" False 是否忽略空白行 MissingValuePattern Automatic patterns used to specify missing elements "NumberPoint" "." 小数点字符串 "Numeric" Automatic 如果可以的话是否将数据字段导入为数字 "Schema" Automatic schema used to construct Tabular object "SkipInvalidLines" False whether to skip invalid lines "SkipLines" Automatic 文件开头处跳过的行数 - 在默认情况下,Import 尝试将数据解释为 "UTF8" 编码文本. 如何文件中存储的任意位数序列不能用 "UTF8" 表示,则 Import 使用 "ISOLatin1" 替代.
- 用 CharacterEncoding -> Automatic, Import 将尝试推断文件的字符编码.
- 以下为 "HeaderLines" 和 "SkipLines" 的可能设定:
-
Automatic try to automatically determine the number of rows to skip or use as header n 逃过 n 行或作为 Dataset 标题使用 {rows,cols} 跳过行和列或作为标题使用 - Import 把由 "DateStringFormat" 选项指定格式化的表格项转换 DateObject.
- Export 选项:
-
Alignment None 数据如何在表格行中对应 CharacterEncoding "UTF8" 文件中行字符串使用的编码 "FillRows" False 是否将行按最大列长填满 "IncludeQuotingCharacter" Automatic whether to add quotations around exported values "TableHeadings" Automatic 表格行和列的标题 - Alignment 的可用设定为 None、Left、Center 和 Right.
- "IncludeQuotingCharacter" can be set to the following values:
-
None do not enclose any values in quotes Automatic only enclose values in quotes when needed All enclose all valid values in quotes - "TableHeadings" 可设置为以下值:
-
None skip column labels Automatic export column labels {"col1","col2",…} list of column labels {rhead,chead} specifies separate labels for the rows and columns - Export 使用运行 Wolfram 语言的计算机系统的常用规范编码行分隔字符.
范例
打开所有单元关闭所有单元范围 (8)
Import (4)
Import metadata from a CSV file:
Import a TSV file as a Tabular object with automatic header detection:
Import without headers, while skipping the first line:
Analyze a single column of a file; start by looking at column labels and their types:
Export (4)
Export a Tabular object:
Use "TableHeadings" option to remove header from a Tabular object:
导出一个 TimeSeries:
导出一个 EventSeries:
导出一个 QuantityArray:
导入参数 (26)
"Data" (6)
"Dataset" (3)
"Grid" (1)
导入 TSV 数据为 Grid:
"RawData" (3)
"Schema" (1)
Get the TabularSchema object:
"Tabular" (6)
Import a CSV file as a Tabular object:
Use "HeaderLines" and "SkipLines" options to only import the data of interest:
导入选项 (15)
CharacterEncoding (1)
字符串编码可用 $CharacterEncodings 设定为任意值:
"ColumnTypeDetectionDepth" (1)
"CurrencyTokens" (1)
使用 "CurrencyTokens"->None 来包含货币符号:
"DateStringFormat" (1)
用指定的数据格式将数据转换为 DateObject:
"HeaderLines" (1)
MissingValuePattern (1)
By default, an automatic set of values is considered missing:
Use MissingValuePatternNone to disable missing element detection:
"Numeric" (1)
使用 "Numeric"->True 解释数字:
"NumberPoint" (1)
"QuotingCharacter" (1)
"Schema" (1)
Import automatically infers column labels and types from data stored in a TSV file:
"SkipLines" (1)
导出选项 (7)
CharacterEncoding (1)
字符串编码可通过 $CharacterEncodings 设置任意值:
"FillRows" (1)
Row lengths are preserved by default:
用 "FillRows"->False 保持行的长度:
"IncludeQuotingCharacter" (1)
"QuotingCharacter" (1)
"TableHeadings" (1)
By default, column headers are exported:
Use "TableHeadings"None to skip column headers:
可能问题 (12)
If all rows in the file do not have the same number of columns, some rows may be considered as invalid:
Entries of the format "nnnDnnn" or "nnnEnnn" are interpreted as numbers with scientific notation:
Use the "Numeric" option to override this interpretation:
Numeric interpretation may result in a loss of precision:
Use the "Numeric" option to override this interpretation:
Starting from Version 14.2, currency tokens are not automatically skipped:
Use the "CurencyTokens" option to skip such tokens:
Starting from Version 14.2, quoting characters are added when the column of integer values contains numbers greater than Developer`$MaxMachineInteger:
Use "IncludeQuotingCharacter"->None to get the previous result:
Starting from Version 14.2, some strings are automatically considered missing:
Use MissingValuePatternNone to override this interpretation:
Starting from Version 14.2, real numbers with 0 fractional part are exported as integers:
Use "Backend"->"Table" to get the previous result:
Starting from Version 14.2, integers greater than Developer`$MaxMachineInteger are imported as real numbers:
Use "Backend"->"Table" to get the previous result:
Starting from Version 14.2, date and time columns of Tabular objects are exported using DateString:
Use "Backend"->"Table" to get the previous result:
由旧版本 Wolfram 语言生成的部分 TSV 数据可能含有不正确的文本分割区域,且在版本 11.2 中无法按预期导入:
使用 "QuotingCharacter""" 将会给出之前预测的结果:
The top-left corner of data is lost when importing a Dataset with row and column headers:
Dataset may look different depending on the dimensions of the data: