SFF (.sff)

Background & Context

    • MIME type: chemical/seq-na-sff
    • SFF molecular biology format.
    • Standard flowgram format for storing and exchanging DNA sequences with base qualities.
    • Commonly used by the 454 Life Sciences DNA pyrosequencing platform.
    • Binary format.
    • Stores nucleic acid sequences and base qualities as character strings and lists, respectively.
    • Meta-information about the sequencing run is stored in the file.


Import

  • Import["file.sff"] imports DNA sequencing data from an SFF file.
  • Import["file.sff"] returns an array representing the sequencing data stored in the file.
  • Import["file.sff",elem] imports the specified element from an SFF file.
  • Import["file.sff",{{elem1,elem2,…}}] imports multiple elements.
  • The import format can be specified with Import["file","SFF"] or Import["file",{"SFF",elem,…}].
  • See the reference page for full general information on Import.
  • ImportString supports the SFF format.

Import Elements

  • General Import elements:
    "Elements"    list of elements and options available in this file
    "Rules"       full list of rules for each element and option
    "Options"     list of rules for options, properties, and settings
  • File metadata:
    "Header"         file header given as a list of rules
    "XMLManifest"    XML manifest as an XML object
  • Data representation elements for each sequencing read:
    "Sequence"            DNA sequences as a list of strings
    "Qualities"           base qualities as a list of lists
    "FlowgramValues"      flowgram values as a list of lists
    "FlowIndexPerBase"    flow index values as a list of lists
    "ClipQualities"       coordinates for quality-trimming the sequences as an array
    "ClipAdapter"         coordinates for adapter-trimming the sequences as an array
    "ReadName"            names of the reads as a list of strings
  • Additional data elements:
    "Data"           all data representation elements combined in a list
    "LabeledData"    list of rules for each sequence stored in the file
  • Import uses the "Data" element by default for the SFF format.
  • The Wolfram Language uses the standard IUB/IUPAC abbreviations for nucleic acids:
    A    adenosine
    R    purine (G or A)
    Y    pyrimidine (T or C)
    K    ketone (G or T)
    M    amino group (A or C)
    S    strong interaction (G or C)
    W    weak interaction (A or T)
    B    C or G or T
    D    A or G or T
    H    A or C or T
    V    A or C or G
    N    any nucleic acid (A or C or G or T)
    -    gap of indeterminate length
  • The Wolfram Language uses integers for the base qualities.


Basic Examples  (5)

This reads the file header from a sample SFF file:
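
A minimal sketch of this call. The path "file.sff" is a placeholder, since the sample file's actual location is not given here:

```wolfram
(* "file.sff" is a placeholder path; substitute a real SFF file *)
header = Import["file.sff", "Header"]
```

The result is a list of rules, so individual header fields can be looked up by name.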

Read the DNA sequences:
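
One way this could look, again with "file.sff" as a placeholder path and the variable name seqs chosen only for illustration:

```wolfram
(* "Sequence" returns one string per read *)
seqs = Import["file.sff", "Sequence"];

Length[seqs]          (* number of reads in the file *)
Take[seqs, UpTo[3]]   (* the first few sequences *)
```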

Read the DNA sequences with qualities, flowgram values, etc.:
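
The element used by the original example is not shown. A plausible sketch uses "LabeledData", which returns a list of rules for each read ("file.sff" is a placeholder path):

```wolfram
(* Each entry holds the labeled fields of one read *)
reads = Import["file.sff", "LabeledData"];
First[reads]
```

Omitting the element, as in Import["file.sff"], returns the same information as an unlabeled list, because "Data" is the default element.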

Import names of the reads in the file:
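
A short sketch, assuming a placeholder path "file.sff":

```wolfram
(* "ReadName" returns the read identifiers as a list of strings *)
names = Import["file.sff", "ReadName"]
```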

Retrieve a sequence entry by name:
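
The original code for this example is not shown. One approach that relies only on the documented "ReadName" and "Sequence" elements is to pair them in an association. The helper sequenceByName and the path "file.sff" are hypothetical names used for illustration:

```wolfram
(* Hypothetical helper: look up a read's sequence by its name *)
sequenceByName[names_List, seqs_List, name_String] :=
  Lookup[AssociationThread[names, seqs], name];

(* "file.sff" is a placeholder path *)
{names, seqs} = Import["file.sff", {{"ReadName", "Sequence"}}];

(* Retrieve the entry for the first read name in the file *)
sequenceByName[names, seqs, First[names]]
```

A name that is not present in the file yields Missing["KeyAbsent", name] rather than an error.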

Retrieve the XML manifest of the sequencing run in the file and extract the analysis name:
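
A sketch of the extraction step. It assumes the manifest stores the analysis name in an element tagged analysis_name, as 454 manifests typically do. The helper analysisName and the path "file.sff" are illustrative names:

```wolfram
(* Hypothetical helper: collect the text of every analysis_name element.
   Assumes the tag is literally "analysis_name" and holds a single string. *)
analysisName[xml_] :=
  Cases[xml,
    XMLElement["analysis_name", _, {name_String}] :> StringTrim[name],
    Infinity];

(* "file.sff" is a placeholder path *)
manifest = Import["file.sff", "XMLManifest"];
analysisName[manifest]
```

The result is a list, which is empty when the file has no manifest or the tag is absent.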

Scope  (3)

Trim the sequences according to the quality-trimming coordinates:
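
A possible implementation, under the assumption that each row of "ClipQualities" is a {left, right} pair of 1-based, inclusive positions, with 0 meaning that no clipping applies on that side (the convention of the SFF read header). The helper trim and the path "file.sff" are illustrative:

```wolfram
(* Hypothetical helper: keep positions left..right of a read.
   Assumption: 0 means "no clip" on that side; right is capped at the read length. *)
trim[seq_String, {left_Integer, right_Integer}] :=
  StringTake[seq,
    {Max[left, 1],
     If[right == 0, StringLength[seq], Min[right, StringLength[seq]]]}];

(* "file.sff" is a placeholder path *)
{seqs, clips} = Import["file.sff", {{"Sequence", "ClipQualities"}}];
trimmed = MapThread[trim, {seqs, clips}];
```

For instance, trim["ACGTACGT", {2, 5}] gives "CGTA". The same helper could be applied to the "ClipAdapter" coordinates if they follow the same layout.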

Convert the SFF file to a FASTQ file, adding 64 to the quality scores for the character encoding:
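
One way to sketch the conversion is to assemble four-line FASTQ records by hand and export them as plain text. The helper fastqRecord and the file names "file.sff" and "file.fastq" are placeholders introduced for illustration:

```wolfram
(* Hypothetical helper: build one four-line FASTQ record.
   Quality integers are shifted by 64 and mapped to characters. *)
fastqRecord[name_String, seq_String, quals_List] :=
  StringRiffle[
    {"@" <> name, seq, "+", FromCharacterCode[quals + 64]},
    "\n"];

(* "file.sff" and "file.fastq" are placeholder paths *)
{names, seqs, quals} =
  Import["file.sff", {{"ReadName", "Sequence", "Qualities"}}];

Export["file.fastq",
  StringRiffle[MapThread[fastqRecord, {names, seqs, quals}], "\n"],
  "Text"]
```

An offset of 64 matches the older Phred+64 encoding. Tools that expect Phred+33 would need an offset of 33 instead.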

Plot the flowgram intensity values:
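
A minimal sketch that plots the flowgram of the first read. It assumes the file contains at least one read, and "file.sff" is a placeholder path:

```wolfram
(* "FlowgramValues" returns one list of flow intensities per read *)
flows = Import["file.sff", "FlowgramValues"];

(* Plot the intensities of the first read against the flow number *)
ListLinePlot[First[flows],
  PlotRange -> All,
  AxesLabel -> {"flow", "intensity"}]
```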

Introduced in 2012