SFF (.sff)

  • Import supports most common variants of the SFF file format, including those with and without an index.

Background

    MIME type: chemical/seq-na-sff
    SFF molecular biology format.
    Standard Flowgram Format, used for storing and exchanging DNA sequences with base qualities.
    Commonly used by the 454 Life Sciences DNA pyrosequencing platform.
    Binary format.
    Stores nucleic acid sequences and base qualities as character strings and lists, respectively.
    Meta-information about the sequencing run is stored in the file.
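As a language-neutral illustration of the binary layout (not of how the Wolfram Language implements Import), the following Python sketch decodes the common header that opens every SFF file. It assumes the field order and big-endian encoding of the published SFF specification used by Roche 454 and NCBI. The function name parse_common_header and the dictionary keys, which mirror the specification's field names, are this example's own choices and are not the keys returned by Import["file.sff","Header"].

```python
import struct

# Fixed-size start of the SFF common header, all big-endian (per the SFF spec):
# magic_number, version, index_offset, index_length, number_of_reads,
# header_length, key_length, number_of_flows_per_read, flowgram_format_code
COMMON_FMT = ">I4sQIIHHHB"
COMMON_SIZE = struct.calcsize(COMMON_FMT)  # 31 bytes, no alignment padding
SFF_MAGIC = 0x2E736666  # the ASCII bytes ".sff"


def parse_common_header(data: bytes) -> dict:
    """Decode the SFF common header at the start of `data` (a sketch)."""
    (magic, version, index_offset, index_length, number_of_reads,
     header_length, key_length, flows, format_code) = struct.unpack_from(
        COMMON_FMT, data, 0)
    if magic != SFF_MAGIC:
        raise ValueError("not an SFF file: bad magic number")
    if version != b"\x00\x00\x00\x01":
        raise ValueError(f"unsupported SFF version {version!r}")
    pos = COMMON_SIZE
    flow_chars = data[pos:pos + flows].decode("ascii")  # nucleotide flowed at each step
    pos += flows
    key_sequence = data[pos:pos + key_length].decode("ascii")
    return {
        "index_offset": index_offset,    # 0 when the file has no index
        "index_length": index_length,
        "number_of_reads": number_of_reads,
        "header_length": header_length,  # padded to a multiple of 8 bytes
        "number_of_flows_per_read": flows,
        "flowgram_format_code": format_code,  # 1 = uint16 values of signal x 100
        "flow_chars": flow_chars,
        "key_sequence": key_sequence,
    }
```

The first read header begins at byte offset header_length; an index_offset of 0 corresponds to the index-free variant of the format mentioned at the top of this page.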

Import and Export

  • Import["file.sff"] imports DNA sequencing data from an SFF file.
  • Import["file.sff"] returns an array representing the sequencing data stored in the file.
  • Import["file.sff",elem] imports the specified element from an SFF file.
  • Import["file.sff",{{elem1,elem2,…}}] imports multiple elements.
  • The import format can be specified with Import["file","SFF"] or Import["file",{"SFF",elem,…}].
  • See the reference page for full general information on Import.
  • ImportString supports the SFF format.

Elements

  • General Import elements:
  • "Elements"    list of elements and options available in this file
    "Rules"       full list of rules for each element and option
    "Options"     list of rules for options, properties, and settings
  • File metadata:
  • "Header"       file header given as a list of rules
    "XMLManifest"  XML manifest as an XML object
  • Data representation elements for each sequencing read:
  • "Sequence"          DNA sequences as a list of strings
    "Qualities"         base qualities as a list of lists
    "FlowgramValues"    flowgram values as a list of lists
    "FlowIndexPerBase"  flow index values as a list of lists
    "ClipQualities"     coordinates for quality-trimming the sequences as an array
    "ClipAdapter"       coordinates for adapter-trimming the sequences as an array
    "ReadName"          names of the reads as a list of strings
  • Additional data elements:
  • "Data"         all data representation elements combined in a list
    "LabeledData"  list of rules for each sequence stored in the file
  • Import uses the element by default for the SFF format.
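Each read in an SFF file consists of a read header (clip coordinates and read name, padded to 8 bytes) followed by the flowgram values, flow indexes, bases, and quality scores, again padded to 8 bytes. The Python sketch below decodes one such read under the SFF specification's layout; it illustrates the format, not the Wolfram Language implementation. The function names and the dictionary keys (which borrow the element names listed above) are this example's own, and n_flows is assumed to be the number_of_flows_per_read value from the file's common header. The helper trimmed_sequence applies both clip pairs, which the specification defines as 1-based inclusive positions with 0 meaning "no clip".

```python
import struct

READ_FMT = ">HHIHHHH"  # read_header_length, name_length, number_of_bases, 4 clip fields
READ_SIZE = struct.calcsize(READ_FMT)  # 16 bytes


def pad8(n: int) -> int:
    """Round n up to a multiple of 8; SFF sections are 8-byte aligned."""
    return n + (-n) % 8


def parse_read(data: bytes, pos: int, n_flows: int):
    """Decode the read starting at byte `pos`; return (record, next_pos)."""
    (header_len, name_len, n_bases, cq_left, cq_right,
     ca_left, ca_right) = struct.unpack_from(READ_FMT, data, pos)
    name = data[pos + READ_SIZE:pos + READ_SIZE + name_len].decode("ascii")
    p = pos + header_len  # read data begins after the padded read header
    flowgram = list(struct.unpack_from(f">{n_flows}H", data, p))
    p += 2 * n_flows
    flow_index = list(data[p:p + n_bases])
    p += n_bases
    bases = data[p:p + n_bases].decode("ascii")
    p += n_bases
    quals = list(data[p:p + n_bases])
    record = {
        "ReadName": name,
        "Sequence": bases,
        "Qualities": quals,
        "FlowgramValues": flowgram,  # format code 1: signal x 100
        "FlowIndexPerBase": flow_index,
        "ClipQualities": (cq_left, cq_right),  # 1-based, inclusive; 0 = unset
        "ClipAdapter": (ca_left, ca_right),
    }
    next_pos = pos + header_len + pad8(2 * n_flows + 3 * n_bases)
    return record, next_pos


def trimmed_sequence(record: dict) -> str:
    """Apply quality and adapter clips, treating 0 as 'no clip on this side'."""
    seq = record["Sequence"]
    n = len(seq)
    (q_left, q_right), (a_left, a_right) = record["ClipQualities"], record["ClipAdapter"]
    left = max(1, q_left, a_left)
    right = min(q_right or n, a_right or n)
    return seq[left - 1:right] if right >= left else ""
```

Calling parse_read repeatedly, starting at header_length and feeding each next_pos back in, walks the reads in file order.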
  • The Wolfram Language uses the standard IUB/IUPAC abbreviations for nucleic acids:
  • A  adenine
    C  cytosine
    G  guanine
    T  thymine
    U  uracil
    R  purine (G or A)
    Y  pyrimidine (T or C)
    K  keto (G or T)
    M  amino (A or C)
    S  strong interaction (G or C)
    W  weak interaction (A or T)
    B  C or G or T
    D  A or G or T
    H  A or C or T
    V  A or C or G
    N  any base (A or C or G or T)
    -  gap of indeterminate length
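Each ambiguity code in the table stands for a set of allowed bases. The following Python sketch, a hypothetical helper rather than a Wolfram Language function, checks position by position whether a concrete read matches a pattern written with these codes. It leaves out the gap symbol -, whose length is indeterminate, and treats U as its own symbol rather than as a synonym for T.

```python
# Allowed bases for each IUPAC nucleotide code in the table above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(pattern: str, seq: str) -> bool:
    """True if every base in `seq` is allowed by the code at the same position.

    Codes outside the table (including the gap symbol "-") raise KeyError.
    """
    if len(pattern) != len(seq):
        return False
    return all(base.upper() in IUPAC[code.upper()]
               for code, base in zip(pattern, seq))
```

For example, matches("NCRG", "TCAG") is true because N allows T and R allows A.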
  • The Wolfram Language uses integers for the base qualities; in SFF files these are Phred quality scores.
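SFF base qualities are Phred scores: an integer Q corresponds to an estimated error probability p = 10^(-Q/10), so Q = 20 means roughly a 1 in 100 chance that the base call is wrong. A short Python sketch of that conversion, with hypothetical helper names, for interpreting quality integers, assuming they are the file's Phred values unchanged:

```python
import math


def error_probability(q: int) -> float:
    """Phred score -> estimated probability that the base call is wrong."""
    return 10.0 ** (-q / 10.0)


def expected_errors(quals) -> float:
    """Sum of per-base error probabilities: the expected number of wrong bases."""
    return math.fsum(error_probability(q) for q in quals)
```

Summing probabilities instead of averaging scores gives a per-read figure that grows with read length, which is often the more useful quantity when deciding where to trim.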

Examples

Basic Examples (5)

This reads the file header from a sample SFF file:

In[1]:=
Out[1]//Short=

Read the DNA sequences:

In[1]:=
Out[1]//Short=

Read the DNA sequences with qualities, flowgram values, etc.:

In[1]:=
Out[1]=

Import the names of the reads in the file:

In[1]:=
Out[1]//Short=

Retrieve a sequence entry by name:

In[2]:=
Out[2]//Short=
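Looking up a read by name amounts to regrouping the per-element lists (one entry per read) into one record per read, keyed by read name. The Python sketch below models that idea; it is not a description of the exact structure the "LabeledData" element returns. The function name and the input layout (a dict of equal-length lists, one per element name) are assumptions.

```python
def label_reads(elements: dict) -> dict:
    """Regroup column-wise element lists into {read name: {element: value}}.

    If a read name repeats, the later read replaces the earlier one.
    """
    names = elements["ReadName"]
    others = {k: v for k, v in elements.items() if k != "ReadName"}
    for key, column in others.items():
        if len(column) != len(names):
            raise ValueError(f"{key!r} has {len(column)} entries, expected {len(names)}")
    return {name: {key: column[i] for key, column in others.items()}
            for i, name in enumerate(names)}
```

Once regrouped, label_reads(cols)["r2"]["Sequence"] retrieves a single read's sequence without scanning the lists again.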

Retrieve the XML manifest of the sequencing run in the file and extract the analysis name:

In[1]:=
Out[1]=
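Outside the Wolfram Language, the manifest can be reached through the index_offset field of the SFF common header. The SFF specification leaves the index format open; the layout assumed in this Python sketch (a .mft marker, a four-character version such as 1.00, then big-endian 32-bit lengths of the XML manifest and of the index data, followed by the manifest itself) is the Roche 454 convention as handled by readers such as Biopython's Bio.SeqIO.SffIO module. The tag name analysisName and the sample manifest used with it are hypothetical; inspect a real manifest for the element that holds the analysis name.

```python
import struct
import xml.etree.ElementTree as ET

MFT_MAGIC = b".mft"  # Roche manifest-index marker (assumed layout, see text)
MFT_FMT = ">4s4sII"  # magic, version, xml_manifest_length, index_data_length
MFT_SIZE = struct.calcsize(MFT_FMT)  # 16 bytes


def read_xml_manifest(data: bytes, index_offset: int):
    """Return the manifest text from a Roche-style index, or None if absent."""
    if index_offset == 0 or index_offset + MFT_SIZE > len(data):
        return None  # no index at all, or truncated data
    magic, _version, xml_len, _index_len = struct.unpack_from(MFT_FMT, data, index_offset)
    if magic != MFT_MAGIC:
        return None  # e.g. a sorted-only index without a manifest
    start = index_offset + MFT_SIZE
    return data[start:start + xml_len].decode("utf-8")


def first_text(manifest_xml: str, tag: str):
    """Text of the first element named `tag` (root included), or None."""
    node = next(ET.fromstring(manifest_xml).iter(tag), None)
    return node.text.strip() if node is not None and node.text else None
```

If the manifest declares a default XML namespace, ElementTree expects the tag in {namespace-uri}tag form.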
Introduced in 2012 (9.0)