SFF (.sff)

  • Import supports most common variants of the SFF file format, including those with and without an index.
    Background

      MIME type: chemical/seq-na-sff
      SFF molecular biology format.
      Standard flowgram format for storing and exchanging DNA sequences with base qualities.
      Commonly used by the 454 Life Sciences DNA pyrosequencing platform.
      Binary format.
      Stores nucleic acid sequences and base qualities as character strings and lists, respectively.
      Meta-information about the sequencing run is stored in the file.

    Import and Export

    • Import["file.sff"] imports DNA sequencing data from an SFF file.
    • Import["file.sff"] returns an array representing the sequencing data stored in the file.
    • Import["file.sff",elem] imports the specified element from an SFF file.
    • Import["file.sff",{{elem1,elem2,…}}] imports multiple elements.
    • The import format can be specified with Import["file","SFF"] or Import["file",{"SFF",elem,…}].
    • See the Import reference page for full general information.
    • ImportString supports the SFF format.

    Elements

    • General Import elements:
      "Elements"    list of elements and options available in this file
      "Rules"       full list of rules for each element and option
      "Options"     list of rules for options, properties, and settings
    • File metadata:
      "Header"         file header given as a list of rules
      "XMLManifest"    XML manifest as an XML object
    • Data representation elements for each sequencing read:
      "Sequence"            DNA sequences as a list of strings
      "Qualities"           base qualities as a list of lists
      "FlowgramValues"      flowgram values as a list of lists
      "FlowIndexPerBase"    flow index values as a list of lists
      "ClipQualities"       coordinates for quality-trimming the sequences as an array
      "ClipAdapter"         coordinates for adapter-trimming the sequences as an array
      "ReadName"            names of the reads as a list of strings
    • Additional data elements:
      "Data"           all data representation elements combined in a list
      "LabeledData"    list of rules for each sequence stored in the file
    • Import uses the "Data" element by default for the SFF format.
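
    The "ClipQualities" and "ClipAdapter" elements give trimming coordinates for each read. The sketch below shows one way such coordinates might be applied. It uses literal stand-in data rather than a real file, and it assumes that each row is a {left, right} pair of 1-based, inclusive positions, with 0 meaning "no clip on that side" as in the SFF read header. The helper trimRead is hypothetical, and the layout actually returned by Import should be checked before relying on it.

```wolfram
(* Hypothetical stand-in data; in practice these would come from
   Import["file.sff", "Sequence"] and Import["file.sff", "ClipQualities"] *)
sequences = {"TCAGACGTACGTNN", "TCAGGGTTAACC"};
clipQualities = {{5, 12}, {0, 0}};  (* assumed {left, right} per read *)

(* Assumption: 1-based inclusive coordinates, 0 = no clip on that side *)
trimRead[seq_String, {left_Integer, right_Integer}] :=
  StringTake[seq, {
    If[left == 0, 1, left],
    If[right == 0, StringLength[seq], Min[right, StringLength[seq]]]
  }]

trimmed = MapThread[trimRead, {sequences, clipQualities}]
(* {"ACGTACGT", "TCAGGGTTAACC"} *)
```

    Keeping the trimming in a separate function leaves the imported sequences untouched, so quality clipping and adapter clipping can be applied independently.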
    • The Wolfram Language uses the standard IUB/IUPAC abbreviations for nucleic acids:
      A    adenosine
      C    cytidine
      G    guanine
      T    thymidine
      U    uracil
      R    purine (G or A)
      Y    pyrimidine (T or C)
      K    ketone (G or T)
      M    amino group (A or C)
      S    strong interaction (G or C)
      W    weak interaction (A or T)
      B    C or G or T
      D    A or G or T
      H    A or C or T
      V    A or C or G
      N    any nucleic acid (A or C or G or T)
      -    gap of indeterminate length
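
    As an illustration of how the ambiguity codes in this table can be used, the following sketch turns a sequence containing such codes into a regular expression and tests a concrete sequence against it. The names iupac and toRegex are hypothetical helpers, not part of the SFF importer, and the gap symbol is deliberately left unhandled.

```wolfram
(* Ambiguity codes from the table above, written as regex character classes *)
iupac = <|
   "A" -> "A", "C" -> "C", "G" -> "G", "T" -> "T", "U" -> "U",
   "R" -> "[GA]", "Y" -> "[TC]", "K" -> "[GT]", "M" -> "[AC]",
   "S" -> "[GC]", "W" -> "[AT]", "B" -> "[CGT]", "D" -> "[AGT]",
   "H" -> "[ACT]", "V" -> "[ACG]", "N" -> "[ACGT]"
   |>;

(* Hypothetical helper: build a regex string from an IUPAC sequence.
   Characters outside the table (such as "-") are not handled here. *)
toRegex[seq_String] := StringJoin[Lookup[iupac, Characters[seq]]]

pattern = toRegex["ACRYN"]
(* "AC[GA][TC][ACGT]" *)

StringMatchQ["ACGTA", RegularExpression[pattern]]
(* True *)
```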
    • The Wolfram Language uses integers for the base qualities.
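
    Integer base qualities are usually interpreted on the Phred scale, where a quality q corresponds to an error probability of 10^(-q/10). The sketch below assumes the imported qualities follow that convention, which this page does not state explicitly. The values and the helper errorProbability are hypothetical.

```wolfram
(* Hypothetical quality values standing in for one read's "Qualities" entry *)
qualities = {10, 20, 30, 40};

(* Assumption: Phred-scaled qualities, so p = 10^(-q/10) *)
errorProbability[q_Integer] := 10.^(-q/10)

probabilities = errorProbability /@ qualities
(* {0.1, 0.01, 0.001, 0.0001} *)
```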

    Examples


    Basic Examples  (5)

    This reads the file header from a sample SFF file:

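    A minimal sketch of such a call, assuming a hypothetical file named file.sff in the current directory (the sample file used on the original page is not shown here):

```wolfram
(* "file.sff" is a placeholder path, not the page's sample file *)
Import["file.sff", "Header"] // Short
```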

    Read the DNA sequences:

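    One way this might be written, with file.sff as a placeholder for any SFF file on disk:

```wolfram
(* "Sequence" returns the reads as a list of strings;
   Short abbreviates the displayed result *)
Import["file.sff", "Sequence"] // Short
```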

    Read the DNA sequences with qualities, flowgram values, etc.:

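    Because "Data" is the default element for SFF, a call with no element should be enough here. A sketch, again assuming a placeholder file named file.sff:

```wolfram
(* Placeholder path; with no element given, Import uses "Data" *)
data = Import["file.sff"];

(* Per the element list, this should be equivalent to the explicit form *)
data === Import["file.sff", "Data"]
```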

    Import names of the reads in the file:

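    A possible form of this call, using the documented "ReadName" element and a placeholder file name:

```wolfram
(* "file.sff" is a placeholder path *)
names = Import["file.sff", "ReadName"];
names // Short
```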

    Retrieve a sequence entry by name:

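    The original input is not available, and it may well have used "LabeledData" directly. As a substitute technique that relies only on the documented list shapes of "ReadName" and "Sequence", the sketch below pairs the two lists in an association and looks up one read. The file name and the variable names are placeholders.

```wolfram
(* Placeholder path; both elements are lists with one entry per read *)
names = Import["file.sff", "ReadName"];
seqs = Import["file.sff", "Sequence"];

(* Build a name -> sequence lookup, then fetch the first read by its name *)
byName = AssociationThread[names, seqs];
byName[First[names]] // Short
```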

    Retrieve the XML manifest of the sequencing run in the file and extract the analysis name:

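    A sketch of how this could be done, assuming a placeholder file file.sff and assuming the manifest stores the analysis name in an element called analysis_name, as 454 manifests typically do. The tag name is not confirmed by this page.

```wolfram
(* Placeholder path; "XMLManifest" returns a symbolic XML object *)
manifest = Import["file.sff", "XMLManifest"];

(* Assumed tag name: collect the text of every <analysis_name> element *)
Cases[manifest,
  XMLElement["analysis_name", _, {name_String}] :> name,
  Infinity]
```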

    Scope  (3)

    See Also

    "MOL"  "PDB"  "XYZ"  "FASTA"  "FASTQ"

    Introduced in 2012 (9.0)