FASTQ (.fastq, .fq)
- Import and Export support all common variants of the FASTQ file format, including short-read sequencing data and long sequences.
Background & Context
- MIME type: chemical/seq-na-fastq
- FASTQ molecular biology format.
- Standard format for storing and exchanging DNA sequences with base qualities.
- Plain text format.
- Stores nucleic acid sequences and base qualities as character strings.
- Various conventions are in use to represent meta-information.
Import & Export
- Import["file.fastq"] imports DNA sequences from a FASTQ file.
- Export["file.fastq",expr] exports a sequence or a list of sequences to the FASTQ format.
- Import["file.fastq"] returns a list of strings representing the sequences stored in the file.
- Export["file.fastq",{seq,qual}] exports a character string representing a DNA sequence with base qualities to FASTQ.
- Export["file.fastq",{{seq1,seq2,…},{qual1,qual2,…}}] exports multiple DNA sequences with base qualities.
- Import["file.fastq",elem] imports the specified element from a FASTQ file.
- Import["file.fastq",{{elem1,elem2,…}}] imports multiple elements.
- The import format can be specified with Import["file","FASTQ"] or Import["file",{"FASTQ",elem,…}].
- Export["file.fastq",expr,elem] creates a FASTQ file by treating expr as specifying element elem.
- Export["file.fastq",{expr1,expr2,…},{{elem1,elem2,…}}] treats each expri as specifying the corresponding elemi.
- Export["file.fastq",expr,opt1->val1,…] exports expr with the specified option elements taken to have the specified values.
- Export["file.fastq",{elem1->expr1,elem2->expr2,…},"Rules"] uses rules to specify the elements to be exported.
- See the following reference pages for full general information:
    Import, Export                        import from or export to a file
    CloudImport, CloudExport              import from or export to a cloud object
    ImportString, ExportString            import from or export to a string
    ImportByteArray, ExportByteArray      import from or export to a byte array
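As a minimal sketch of the import and export forms listed above, the following writes one sequence with its base qualities to a temporary file and reads it back. The file name sketch.fastq and the sample strings are illustrative assumptions, not part of the reference material.

```wolfram
(* Hypothetical temporary file; the name "sketch.fastq" is only for illustration *)
file = FileNameJoin[{$TemporaryDirectory, "sketch.fastq"}];

(* Export one sequence with base qualities using the {seq, qual} form;
   both strings have the same length, one quality character per base *)
Export[file, {"GATTACA", "IIIIIII"}];

(* Default import: the "Sequence" element, returned as a list of strings *)
Import[file]

(* State the format explicitly and request a single element *)
Import[file, {"FASTQ", "Qualities"}]

(* Request several elements at once *)
Import[file, {{"Header", "Sequence"}}]

(* Remove the temporary file *)
DeleteFile[file];
```

Naming the format explicitly, as in the second Import call, is mainly useful when the file extension is missing or misleading.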
Import Elements
- General Import elements:
    "Elements"    list of elements and options available in this file
    "Summary"     summary of the file
    "Rules"       list of rules for all available elements
- Data representation elements:
    "Header"       raw header lines
    "Sequence"     DNA sequences as a list of strings
    "Qualities"    base qualities as a list of strings
- Import uses the "Sequence" element by default for the FASTQ format.
- Additional data elements:
    "Data"           "Header", "Sequence", and "Qualities" elements combined in a list
    "LabeledData"    list of rules for each sequence stored in the file
- The Wolfram Language uses the standard IUB/IUPAC abbreviations for nucleic acids:
    A    adenosine
    C    cytidine
    G    guanine
    T    thymidine
    U    uracil
    R    purine (G or A)
    Y    pyrimidine (T or C)
    K    ketone (G or T)
    M    amino group (A or C)
    S    strong interaction (G or C)
    W    weak interaction (A or T)
    B    C or G or T
    D    A or G or T
    H    A or C or T
    V    A or C or G
    N    any nucleic acid (A or C or G or T)
    -    gap of indeterminate length
- The Wolfram Language uses ASCII characters for the base qualities.
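The import elements above can be explored without a file by importing a FASTQ record from a string. In this sketch the record (header read1, sequence, and quality string) is made up for illustration. The last step assumes the common Phred+33 quality encoding, which is only one of the conventions in use and is not stated by this page.

```wolfram
(* A made-up single-record FASTQ string: header line, sequence, "+" separator, qualities *)
fastq = "@read1\nGATTACA\n+\nIIIIIII\n";

(* List the elements available for this data *)
ImportString[fastq, {"FASTQ", "Elements"}]

(* Individual data representation elements *)
ImportString[fastq, {"FASTQ", "Header"}]
ImportString[fastq, {"FASTQ", "Sequence"}]
qual = ImportString[fastq, {"FASTQ", "Qualities"}]

(* One list of rules per sequence *)
ImportString[fastq, {"FASTQ", "LabeledData"}]

(* ASSUMPTION: Phred+33 encoding. Subtracting 33 from each ASCII code
   turns the quality characters into numeric scores *)
Map[ToCharacterCode[#] - 33 &, qual]
```

If the data comes from a source that uses a different quality offset, the constant 33 in the last line would need to change accordingly.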
Examples
Basic Examples (6)
This reads the raw header lines from a sample FASTQ file:
Read the DNA sequence with qualities:
This converts a short sequence to the FASTQ format, automatically adding default header information:
This exports a pair of headers and sequences:
Importing the previous output using the "Data" element gives raw headers and sequences:
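A hedged sketch of the export and re-import steps described in the last three examples. The sequences, header strings, and temporary file name are hypothetical stand-ins chosen for illustration.

```wolfram
(* Convert a short sequence to FASTQ text; header information is filled in by default *)
ExportString["GATTACA", "FASTQ"]

(* Export a pair of headers and sequences, naming the elements explicitly.
   The file name "pair.fastq" and the read names are hypothetical *)
file = FileNameJoin[{$TemporaryDirectory, "pair.fastq"}];
Export[file, {{"read1", "read2"}, {"GATTACA", "CATTAG"}}, {{"Header", "Sequence"}}];

(* Import headers, sequences, and qualities together with the "Data" element *)
Import[file, "Data"]

(* Remove the temporary file *)
DeleteFile[file];
```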