SFF (.sff)
- Import supports most common variants of the SFF file format, including those with and without an index.
Background & Context
- MIME type: chemical/seq-na-sff
- SFF molecular biology format.
- Standard Flowgram Format, used for storing and exchanging DNA sequences together with their base qualities.
- Commonly used by the 454 Life Sciences DNA pyrosequencing platform.
- Binary format.
- Stores nucleic acid sequences and base qualities as character strings and lists, respectively.
- Meta-information about the sequencing run is stored in the file.
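Because SFF is a binary format, its layout can be illustrated independently of the Wolfram Language. The following Python sketch decodes the common header at the start of a file. It assumes the big-endian field layout described in the published SFF specification. It is not how Import is implemented, and the dictionary keys are hypothetical names modeled on the specification's field names. In files written without an index, index_offset and index_length are typically both 0.

```python
import struct

# Fixed-size part of the SFF common header, big-endian (layout assumed from
# the published SFF specification):
#   magic_number (uint32), version (4 bytes), index_offset (uint64),
#   index_length (uint32), number_of_reads (uint32), header_length (uint16),
#   key_length (uint16), number_of_flows_per_read (uint16),
#   flowgram_format_code (uint8)
COMMON_HEADER = struct.Struct(">I4sQIIHHHB")  # 31 bytes, no alignment padding
SFF_MAGIC = 0x2E736666  # the ASCII bytes ".sff"


def parse_common_header(data: bytes) -> dict:
    """Decode the SFF common header at the start of `data`."""
    (magic, version, index_offset, index_length, n_reads,
     header_length, key_length, n_flows, fmt_code) = COMMON_HEADER.unpack_from(data, 0)
    if magic != SFF_MAGIC:
        raise ValueError("not an SFF file (bad magic number)")

    pos = COMMON_HEADER.size
    flow_chars = data[pos:pos + n_flows].decode("ascii")  # nucleotide per flow
    pos += n_flows
    key_sequence = data[pos:pos + key_length].decode("ascii")

    # header_length includes zero padding up to a multiple of 8 bytes,
    # so the first read record starts at offset header_length.
    return {
        "version": tuple(version),
        "index_offset": index_offset,
        "index_length": index_length,
        "number_of_reads": n_reads,
        "header_length": header_length,
        "key_length": key_length,
        "number_of_flows_per_read": n_flows,
        "flowgram_format_code": fmt_code,
        "flow_chars": flow_chars,
        "key_sequence": key_sequence,
    }
```

Fields such as number_of_flows_per_read are needed later to decode each read, which is why a reader typically parses this header first.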
Import
- Import["file.sff"] imports DNA sequencing data from an SFF file.
- Import["file.sff"] returns an array representing the sequencing data stored in the file.
- Import["file.sff",elem] imports the specified element from an SFF file.
- Import["file.sff",{{elem1,elem2,…}}] imports multiple elements.
- The import format can be specified with Import["file","SFF"] or Import["file",{"SFF",elem,…}].
- See the following reference pages for full general information:
    Import           import from a file
    CloudImport      import from a cloud object
    ImportString     import from a string
    ImportByteArray  import from a byte array
Import Elements
- General Import elements:
    "Elements"    list of elements and options available in this file
    "Summary"     summary of the file
    "Rules"       list of rules for all available elements
- File metadata:
    "Header"       file header given as a list of rules
    "XMLManifest"  XML manifest as an XML object
- Data representation elements for each sequencing read:
    "Sequence"          DNA sequences as a list of strings
    "Qualities"         base qualities as a list of lists
    "FlowgramValues"    flowgram values as a list of lists
    "FlowIndexPerBase"  flow index values as a list of lists
    "ClipQualities"     coordinates for quality-trimming the sequences as an array
    "ClipAdapter"       coordinates for adapter-trimming the sequences as an array
    "ReadName"          names of the reads as a list of strings
- Additional data elements:
    "Data"         all data representation elements combined in a list
    "LabeledData"  list of rules for each sequence stored in the file
- Import uses the "Data" element by default for the SFF format.
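The per-read elements correspond to fields stored in each read record of the file. The Python sketch below decodes one read record. It assumes the layout from the published SFF specification and flowgram format code 1, meaning one 16-bit flowgram value per flow. It then applies a common convention for combining the quality and adapter clip points. The function names and dictionary keys are hypothetical and do not describe what Import returns.

```python
import struct
from itertools import accumulate

# Fixed part of an SFF read header, big-endian (layout assumed from the spec):
#   read_header_length (uint16), name_length (uint16), number_of_bases (uint32),
#   clip_qual_left, clip_qual_right, clip_adapter_left, clip_adapter_right (uint16)
READ_HEADER = struct.Struct(">HHIHHHH")  # 16 bytes


def _pad8(n: int) -> int:
    """Round n up to the next multiple of 8 (SFF sections are 8-byte aligned)."""
    return (n + 7) // 8 * 8


def parse_read(data: bytes, offset: int, n_flows: int):
    """Decode one read starting at `offset`; return (record, offset of next read).

    Assumes flowgram format code 1, i.e. one uint16 flowgram value per flow.
    """
    (header_len, name_len, n_bases,
     cq_left, cq_right, ca_left, ca_right) = READ_HEADER.unpack_from(data, offset)
    name_start = offset + READ_HEADER.size
    name = data[name_start:name_start + name_len].decode("ascii")

    pos = offset + header_len  # read data follows the padded read header
    # Flowgram values are stored as integers (assumed: signal intensity x 100).
    flowgram = list(struct.unpack_from(f">{n_flows}H", data, pos))
    pos += 2 * n_flows
    flow_index = list(data[pos:pos + n_bases])  # uint8 offset from previous base
    pos += n_bases
    bases = data[pos:pos + n_bases].decode("ascii")
    pos += n_bases
    quals = list(data[pos:pos + n_bases])  # one integer quality per base

    record = {
        "read_name": name,
        "sequence": bases,
        "qualities": quals,
        "flowgram_values": flowgram,
        "flow_index_per_base": flow_index,
        "clip_qual": (cq_left, cq_right),
        "clip_adapter": (ca_left, ca_right),
    }
    next_offset = offset + header_len + _pad8(2 * n_flows + 3 * n_bases)
    return record, next_offset


def flow_positions(record) -> list:
    """Absolute 1-based flow number of each base (running sum of the offsets)."""
    return list(accumulate(record["flow_index_per_base"]))


def clipped_sequence(record) -> str:
    """Trim a read using both clip pairs (1-based positions, 0 meaning 'not set')."""
    seq = record["sequence"]
    left = max(record["clip_qual"][0], record["clip_adapter"][0])
    start = left - 1 if left else 0
    rights = [r for r in (record["clip_qual"][1], record["clip_adapter"][1]) if r]
    end = min(rights) if rights else len(seq)
    return seq[start:end]
```

Read records follow the common header one after another. Under the same assumptions, a reader would start at header_length and call parse_read once per read, passing number_of_flows_per_read from the header. This also assumes that any index does not sit between read records.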
- The Wolfram Language uses the standard IUB/IUPAC abbreviations for nucleic acids:
    A    adenine
    C    cytosine
    G    guanine
    T    thymine
    U    uracil
    R    purine (G or A)
    Y    pyrimidine (T or C)
    K    keto (G or T)
    M    amino (A or C)
    S    strong interaction (G or C)
    W    weak interaction (A or T)
    B    C or G or T
    D    A or G or T
    H    A or C or T
    V    A or C or G
    N    any nucleotide (A or C or G or T)
    -    gap of indeterminate length
- The Wolfram Language uses integers for the base qualities.
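Each ambiguity code in the table stands for a set of bases. The small Python sketch below uses that idea to test whether a base, or a whole sequence, is consistent with a pattern written in these codes. The names IUPAC_NA, code_matches, and pattern_matches are hypothetical. The gap symbol "-" is deliberately left out because it does not stand for a base.

```python
# IUB/IUPAC nucleic acid codes from the table, mapped to the bases each code
# can stand for. The gap symbol "-" is intentionally omitted.
IUPAC_NA = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def code_matches(code: str, base: str) -> bool:
    """True if the single base `base` is one of the bases `code` stands for."""
    if len(base) != 1:
        return False
    return base.upper() in IUPAC_NA.get(code.upper(), "")


def pattern_matches(pattern: str, seq: str) -> bool:
    """True if `seq` is consistent, position by position, with `pattern`."""
    return len(pattern) == len(seq) and all(
        code_matches(p, s) for p, s in zip(pattern, seq)
    )
```

For example, under this mapping the pattern "ACNT" is consistent with "ACGT", because N covers every base. The pattern "ACRT" is not consistent with "ACTT", because R covers only G and A.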