FASTQ (.fastq, .fq)

Background & Context

    • MIME type: chemical/seq-na-fastq
    • FASTQ molecular biology format.
    • Standard format for storing and exchanging DNA sequences with base qualities.
    • Plain text format.
    • Stores nucleic acid sequences and base qualities as character strings.
    • Various conventions are in use to represent meta-information.

Import & Export

  • Import["file.fastq"] imports DNA sequences from a FASTQ file.
  • Export["file.fastq",expr] exports a sequence or a list of sequences to the FASTQ format.
  • Import["file.fastq"] returns a list of strings representing the sequences stored in the file.
  • Export["file.fastq",{seq,qual}] exports a character string representing a DNA sequence with base qualities to FASTQ.
  • Export["file.fastq",{{seq1,seq2,…},{qual1,qual2,…}}] exports multiple DNA sequences with base qualities.
  • Import["file.fastq",elem] imports the specified element from a FASTQ file.
  • Import["file.fastq",{{elem1,elem2,…}}] imports multiple elements.
  • The import format can be specified with Import["file","FASTQ"] or Import["file",{"FASTQ",elem,…}].
  • Export["file.fastq",expr,elem] creates a FASTQ file by treating expr as specifying element elem.
  • Export["file.fastq",{expr1,expr2,…},{{elem1,elem2,…}}] treats each expression in the list as specifying the corresponding element (expr1 as elem1, expr2 as elem2, and so on).
  • Export["file.fastq",expr,opt1->val1,…] exports expr with the specified option elements taken to have the specified values.
  • Export["file.fastq",{elem1->expr1,elem2->expr2,…},"Rules"] uses rules to specify the elements to be exported.
  • See the following reference pages for full general information:
  • Import, Export                      import from or export to a file
    CloudImport, CloudExport            import from or export to a cloud object
    ImportString, ExportString          import from or export to a string
    ImportByteArray, ExportByteArray    import from or export to a byte array

Import Elements

  • General Import elements:
  • "Elements"    list of elements and options available in this file
    "Summary"     summary of the file
    "Rules"       list of rules for all available elements
  • Data representation elements:
  • "Header"       raw header lines
    "Sequence"     DNA sequences as a list of strings
    "Qualities"    base qualities as a list of strings
  • Import uses the "Sequence" element by default for the FASTQ format.
  • Additional data elements:
  • "Data"           "Header", "Sequence", and "Qualities" elements combined in a list
    "LabeledData"    list of rules for each sequence stored in the file
  • The Wolfram Language uses the standard IUB/IUPAC abbreviations for nucleic acids:
  • A    adenosine
    C    cytidine
    G    guanine
    T    thymidine
    U    uracil
    R    purine (G or A)
    Y    pyrimidine (T or C)
    K    ketone (G or T)
    M    amino group (A or C)
    S    strong interaction (G or C)
    W    weak interaction (A or T)
    B    C or G or T
    D    A or G or T
    H    A or C or T
    V    A or C or G
    N    any nucleic acid (A or C or G or T)
    -    gap of indeterminate length
  • The Wolfram Language uses ASCII characters for the base qualities.
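To illustrate how ASCII quality characters can be turned into numeric scores, here is a minimal sketch. It assumes the common Phred+33 (Sanger) convention, which this page does not itself specify; files written with a different offset would need a different constant.

```wolfram
(* Assumption: Phred+33 encoding, i.e. score = ASCII code - 33 *)
qual = "I5!";

(* ToCharacterCode gives the ASCII codes {73, 53, 33} *)
ToCharacterCode[qual] - 33
(* {40, 20, 0} *)
```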

Options

  • Advanced Export option:
  • "LineWidth"    70    maximum number of characters in a line
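
The option can be passed like any other Export option. The sketch below is illustrative only: the sequence, the quality string, and the width of 20 are made-up values, and ExportString is used so that the result can be inspected without writing a file.

```wolfram
(* Hypothetical data: a 42-base sequence with uniform qualities *)
seq = "GATTACAGATTACAGATTACAGATTACAGATTACAGATTACA";
qual = StringRepeat["I", StringLength[seq]];

(* Limit output lines to at most 20 characters instead of the default 70 *)
ExportString[{seq, qual}, "FASTQ", "LineWidth" -> 20]
```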

Examples

Basic Examples

This reads the raw header lines from a sample FASTQ file:
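
A minimal sketch of such a call, assuming a hypothetical file path "file.fastq" in place of the sample file:

```wolfram
(* "file.fastq" is a placeholder; substitute the path of an actual FASTQ file *)
Import["file.fastq", "Header"]
```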

Read the DNA sequence:
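
A sketch under the same assumption of a placeholder path. Because "Sequence" is the default element for FASTQ, the element name could also be omitted:

```wolfram
(* "file.fastq" is a placeholder path *)
Import["file.fastq", "Sequence"]
```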

Read the DNA sequence with qualities:
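
One way to request both elements at once is the multiple-element form of Import. This is a sketch with a placeholder path, not necessarily the call used for the original example:

```wolfram
(* "file.fastq" is a placeholder path; the double braces request several elements together *)
Import["file.fastq", {{"Sequence", "Qualities"}}]
```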

This converts a short sequence to the FASTQ format, automatically adding default header information:
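
A possible form of this conversion, using ExportString so the result appears as a string. The sequence is a made-up value, and the exact default header and quality text produced may differ between versions:

```wolfram
(* Hypothetical short sequence; no header is supplied, so a default one is expected *)
ExportString["GATTACAGATTACA", "FASTQ"]
```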

This exports two sequences:
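
A sketch using the sequences-with-qualities form. All values are invented for illustration. The qualities are given explicitly here because a plain list of two strings could also be read as a single {seq,qual} pair:

```wolfram
(* Two hypothetical sequences, each with a quality string of matching length *)
seqs = {"GATTACA", "CATTAG"};
quals = {"IIIIIII", "IIIIII"};

ExportString[{seqs, quals}, "FASTQ"]
```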

This exports a pair of headers and sequences:
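
One way to supply headers alongside sequences is the "Rules" form of Export. The header names, sequences, qualities, and the output path "out.fastq" below are all hypothetical, and including "Qualities" is an assumption made so that the records are complete:

```wolfram
(* Hypothetical headers, sequences, and qualities written to a placeholder file *)
Export["out.fastq",
 {"Header" -> {"read1", "read2"},
  "Sequence" -> {"GATTACA", "CATTAG"},
  "Qualities" -> {"IIIIIII", "IIIIII"}},
 "Rules"]
```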

Importing the previous output using the "Data" element gives raw headers and sequences:
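
A sketch of such an import, assuming the exported FASTQ was written to a hypothetical file "out.fastq":

```wolfram
(* "out.fastq" is a placeholder for the file produced by the export step *)
Import["out.fastq", "Data"]
```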

Import as a list of rules:
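
A sketch with a placeholder path. It assumes the "LabeledData" element is meant, which gives rules for each sequence; the general "Rules" element would instead give rules for all available elements:

```wolfram
(* "file.fastq" is a placeholder path *)
Import["file.fastq", "LabeledData"]
```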