FASTQ (.fastq, .fq)

MIME type: chemical/seq-na-fastq
FASTQ molecular biology format.
Standard format for storing and exchanging DNA sequences with base qualities.
Plain text format.
Stores nucleic acid sequences and base qualities as character strings.
Various conventions are in use to represent meta-information.
  • Import and Export support all common variants of the FASTQ file format, including short-read sequencing data and long sequences.

Import and Export

  • Import["file.fastq"] imports DNA sequences from a FASTQ file.
  • Export["file.fastq", expr] exports a sequence or a list of sequences to the FASTQ format.
  • Import["file.fastq"] returns a list of strings representing the sequences stored in the file.
  • Export["file.fastq", {seq, qual}] exports a character string representing a DNA sequence with base qualities to FASTQ.
  • Export["file.fastq", {{seq1, seq2, ...}, {qual1, qual2, ...}}] exports multiple DNA sequences with base qualities.
  • Import["file.fastq", elem] imports the specified element from a FASTQ file.
  • Import["file.fastq", {{elem1, elem2, ...}}] imports multiple elements.
  • The import format can be specified with Import["file", "FASTQ"] or Import["file", {"FASTQ", elem, ...}].
  • Export["file.fastq", expr, elem] creates a FASTQ file by treating expr as specifying element elem.
  • Export["file.fastq", {expr1, expr2, ...}, {{elem1, elem2, ...}}] treats each expression expr1, expr2, ... as specifying the corresponding element elem1, elem2, ....
  • Export["file.fastq", expr, opt1->val1, ...] exports expr with the specified option elements taken to have the specified values.
  • Export["file.fastq", {elem1->expr1, elem2->expr2, ...}, "Rules"] uses rules to specify the elements to be exported.
  • See the reference pages for full general information on Import and Export.
  • ImportString and ExportString support the FASTQ format.
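The element-based calls listed above can be tried without a file by using ImportString. The following is a minimal sketch only: the two-record FASTQ text and the variable name fastq are invented for illustration, and no outputs are shown because they were not checked against a particular version.

```wolfram
(* Hypothetical two-record FASTQ text; read names and qualities are invented *)
fastq = "@read1\nACGTACGT\n+\nIIIIHHHH\n@read2\nTTGACCAA\n+\nFFFFGGGG\n";

(* Default import: per the notes above, the sequences as a list of strings *)
ImportString[fastq, "FASTQ"]

(* Import a single named element *)
ImportString[fastq, {"FASTQ", "Qualities"}]

(* List the elements available for this input *)
ImportString[fastq, {"FASTQ", "Elements"}]
```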

Elements

  • General Import elements:
  • "Elements"    list of elements and options available in this file
    "Rules"       full list of rules for each element and option
    "Options"     list of rules for options, properties, and settings
  • Data representation elements:
  • "Header"       raw header lines
    "Sequence"     DNA sequences as a list of strings
    "Qualities"    base qualities as a list of strings
  • Import uses the "Sequence" element by default for the FASTQ format.
  • Additional data elements:
  • "Data"           "Header", "Sequence", and "Qualities" elements combined in a list
    "LabeledData"    list of rules for each sequence stored in the file
  • Mathematica uses the standard IUB/IUPAC abbreviations for nucleic acids:
  • A    adenosine
    C    cytidine
    G    guanine
    T    thymidine
    U    uracil
    R    purine (G or A)
    Y    pyrimidine (T or C)
    K    ketone (G or T)
    M    amino group (A or C)
    S    strong interaction (G or C)
    W    weak interaction (A or T)
    B    C or G or T
    D    A or G or T
    H    A or C or T
    V    A or C or G
    N    any nucleic acid (A or C or G or T)
    -    gap of indeterminate length
  • Mathematica uses ASCII characters for the base qualities.
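Because qualities are returned as ASCII strings, turning them into numeric scores is left to the caller. The sketch below assumes the common Phred+33 (Sanger) convention, which this page does not state; phredScores is a hypothetical helper name, and the offset should be changed for files that use a different encoding.

```wolfram
(* Assumption: Phred+33 encoding; some older files use an offset of 64 *)
phredScores[qual_String, offset_Integer : 33] := ToCharacterCode[qual] - offset

phredScores["IIIIHHHH"]
(* {40, 40, 40, 40, 39, 39, 39, 39} *)

(* Per-base error probability for a Phred score q is 10^(-q/10) *)
N[10^(-phredScores["IIIIHHHH"]/10)]
```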

Options

  • Advanced Export option:
  • "LineWidth"    70    maximum number of characters in a line
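A rough sketch of how the option might be used follows. It assumes that ExportString accepts the same "LineWidth" option as Export; the 100-base sequence is invented only so that it is longer than one line, and the outputs are not shown.

```wolfram
(* Hypothetical 100-base sequence, built only to exceed the default line width *)
seq = StringJoin[ConstantArray["ACGT", 25]];

(* Default setting, "LineWidth" -> 70 *)
ExportString[seq, "FASTQ"]

(* Narrower lines; option name and default are taken from the table above *)
ExportString[seq, "FASTQ", "LineWidth" -> 50]
```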

Examples

Basic Examples (6)

This reads the raw header lines from a sample FASTQ file:
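The original input cell is not preserved in this copy. As a stand-in, the sketch below reads the "Header" element from a small in-memory record; the FASTQ text and the name fastq are invented for illustration, and ImportString is used so that no sample file is needed.

```wolfram
(* Hypothetical two-record FASTQ text; read names and qualities are invented *)
fastq = "@read1\nACGTACGT\n+\nIIIIHHHH\n@read2\nTTGACCAA\n+\nFFFFGGGG\n";

(* Raw header lines, abbreviated with Short as in the original example *)
ImportString[fastq, {"FASTQ", "Header"}] // Short
```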


Read the DNA sequence:
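The original input cell is not preserved in this copy. A minimal stand-in, assuming an invented in-memory record in place of the sample file, could look like this:

```wolfram
(* Hypothetical two-record FASTQ text; read names and qualities are invented *)
fastq = "@read1\nACGTACGT\n+\nIIIIHHHH\n@read2\nTTGACCAA\n+\nFFFFGGGG\n";

(* "Sequence" gives the DNA sequences as a list of strings *)
ImportString[fastq, {"FASTQ", "Sequence"}] // Short
```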


Read the DNA sequence with qualities:
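The original input cell is not preserved in this copy. One way to get sequences and qualities together is to request both elements in a single Import call. The sketch below writes an invented record to a temporary file first so that the file-based form can be used; the file name sketch.fastq is hypothetical, and the shape of the result (one list per element) is an assumption.

```wolfram
(* Hypothetical two-record FASTQ text; read names and qualities are invented *)
fastq = "@read1\nACGTACGT\n+\nIIIIHHHH\n@read2\nTTGACCAA\n+\nFFFFGGGG\n";

(* Write the text to a temporary file so that Import can infer the format
   from the .fastq extension *)
file = FileNameJoin[{$TemporaryDirectory, "sketch.fastq"}];
Export[file, fastq, "Text"];

(* Request two elements at once using the {{elem1, elem2, ...}} form;
   assumed to return one list per element *)
{seqs, quals} = Import[file, {{"Sequence", "Qualities"}}];

(* Pair each sequence with its quality string *)
Transpose[{seqs, quals}]
```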


This converts a short sequence to the FASTQ format, automatically adding default header information:
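The original input cell is not preserved in this copy. A plausible stand-in, using an invented seven-base sequence, is shown below; the exact default header and quality text written by ExportString is not reproduced here.

```wolfram
(* Hypothetical short sequence; no header or qualities are supplied *)
ExportString["GATTACA", "FASTQ"]
```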


This exports two sequences:
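The original input cell is not preserved in this copy. In the sketch below the "Sequence" element is named explicitly, so that a two-string list cannot be read as a sequence paired with its qualities; both sequences are invented, and whether the original example used this exact form is not known.

```wolfram
(* Two hypothetical sequences, exported as the "Sequence" element *)
ExportString[{"ACGTACGT", "TTGACCAA"}, {"FASTQ", "Sequence"}]
```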


This exports a pair of headers and sequences:
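The original input cell is not preserved in this copy. One way to supply headers alongside sequences is the "Rules" form described under Import and Export. The header and sequence values below are invented, and it is an assumption that headers are given without the leading "@".

```wolfram
(* Hypothetical headers and sequences, given as element rules *)
ExportString[
  {"Header" -> {"read1", "read2"},
   "Sequence" -> {"ACGTACGT", "TTGACCAA"}},
  {"FASTQ", "Rules"}]
```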


Importing the previous output using the "Data" element gives raw headers and sequences:
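The original input cell is not preserved in this copy, so the sketch below builds its own invented FASTQ text instead of reusing an earlier output. It assumes that "Data" is the combined element listed under Elements.

```wolfram
(* Hypothetical two-record FASTQ text; read names and qualities are invented *)
fastq = "@read1\nACGTACGT\n+\nIIIIHHHH\n@read2\nTTGACCAA\n+\nFFFFGGGG\n";

(* Combined data elements in one list *)
ImportString[fastq, {"FASTQ", "Data"}]
```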


Import as a list of rules:
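The original input cell is not preserved in this copy. A self-contained stand-in, using invented FASTQ text and the "LabeledData" element listed under Elements, might be:

```wolfram
(* Hypothetical two-record FASTQ text; read names and qualities are invented *)
fastq = "@read1\nACGTACGT\n+\nIIIIHHHH\n@read2\nTTGACCAA\n+\nFFFFGGGG\n";

(* One rule per sequence stored in the input *)
ImportString[fastq, {"FASTQ", "LabeledData"}]
```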

New in 9