FASTA (.fasta, .fa, .fna, .fsa, .mpfa)
Background & Context
- MIME type: chemical/seq-aa-fasta, chemical/seq-na-fasta
 - FASTA molecular biology format.
 - Standard format for storing and exchanging DNA and protein sequences.
 - Plain text format.
 
- Stores nucleic acid or protein sequences as character strings.
 - Various conventions are in use to represent meta-information.
 - Developed in 1988 by William Pearson and David Lipman as part of the FASTA sequence-alignment software.
 
 
Import & Export
   - Import["file.fasta"] imports DNA or protein sequences from a FASTA file.
 - Export["file.fasta",expr] exports a sequence or a list of sequences to the FASTA format.
 - Import["file.fasta"] returns a list of strings representing the sequences stored in the file.
 - Export["file.fasta",str] exports a character string representing a DNA sequence to FASTA.
 - Export["file.fasta",{str1,str2,…}] exports multiple DNA sequences.
 - Import["file.fasta",elem] imports the specified element from a FASTA file.
 - Import["file.fasta",{elem,suba,subb,…}] imports a subelement.
 - Import["file.fasta",{{elem1,elem2,…}}] imports multiple elements.
 - The import format can be specified with Import["file","FASTA"] or Import["file",{"FASTA",elem,…}].
 - Export["file.fasta",expr,elem] creates a FASTA file by treating expr as specifying element elem.
 - Export["file.fasta",{expr1,expr2,…},{{elem1,elem2,…}}] treats each expri as specifying the corresponding elemi.
 - Export["file.fasta",expr,opt1->val1,…] exports expr with the specified option elements taken to have the specified values.
 - Export["file.fasta",{elem1->expr1,elem2->expr2,…},"Rules"] uses rules to specify the elements to be exported.
 - See the following reference pages for full general information:

      Import, Export                    import from or export to a file
      CloudImport, CloudExport          import from or export to a cloud object
      ImportString, ExportString        import from or export to a string
      ImportByteArray, ExportByteArray  import from or export to a byte array

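As a minimal sketch of the import and export forms listed above, the following round-trips two sequences through a temporary FASTA file and then through a string. The sequences and the file name are hypothetical, chosen only for illustration; outputs are not shown.

```wl
(* Hypothetical sequences, for illustration only *)
seqs = {"ACGTACGTAC", "GGCCTTAAGG"};

(* Write to a temporary file; the file name is an arbitrary choice *)
file = FileNameJoin[{$TemporaryDirectory, "example.fasta"}];
Export[file, seqs];

(* Default import: the "Sequence" element, a list of strings *)
Import[file]

(* The same import with the format named explicitly *)
Import[file, "FASTA"]

(* In-memory variants that avoid the file system *)
str = ExportString[seqs, "FASTA"];
ImportString[str, "FASTA"]
```

The string-based forms are convenient for quick experiments, since they leave no files behind.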
Import Elements
    
    
    
 - General Import elements:

      "Elements"  list of elements and options available in this file
      "Summary"   summary of the file
      "Rules"     list of rules for all available elements

 - Data representation elements:

      "Header"     raw header lines
      "Sequence"   DNA or protein sequences as a list of strings
      "Plaintext"  sequences as formatted text

 - Import uses the "Sequence" element by default for the FASTA format.
 - Additional data elements:

      "Data"         "Header" and "Sequence" elements combined in a list
      "LabeledData"  list of rules for each sequence stored in the file

 - Header line meta-information:

      "Accession"    NCBI accession number for each sequence
      "Description"  locus description text for each sequence
      "GenBankID"    GenBank database identifier
      "Length"       list of integers giving the length of each sequence

 - The Wolfram Language uses the standard IUB/IUPAC abbreviations for nucleic acids:

      A  adenine
      C  cytosine
      G  guanine
      T  thymine
      U  uracil
      R  purine (G or A)
      Y  pyrimidine (T or C)
      K  keto (G or T)
      M  amino (A or C)
      S  strong interaction (G or C)
      W  weak interaction (A or T)
      B  C or G or T
      D  A or G or T
      H  A or C or T
      V  A or C or G
      N  any nucleotide (A or C or G or T)
      -  gap of indeterminate length

 - Codes representing amino acids:

      A  alanine (Ala)
      B  either aspartic acid or asparagine
      C  cysteine (Cys)
      D  aspartic acid (Asp)
      E  glutamic acid (Glu)
      F  phenylalanine (Phe)
      G  glycine (Gly)
      H  histidine (His)
      I  isoleucine (Ile)
      K  lysine (Lys)
      L  leucine (Leu)
      M  methionine (Met)
      N  asparagine (Asn)
      P  proline (Pro)
      Q  glutamine (Gln)
      R  arginine (Arg)
      S  serine (Ser)
      T  threonine (Thr)
      U  selenocysteine
      V  valine (Val)
      W  tryptophan (Trp)
      Y  tyrosine (Tyr)
      Z  either glutamic acid or glutamine
      X  any amino acid
      *  translation stop
      -  gap of indeterminate length

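The import elements described in this section can be requested by name. The sketch below builds a small FASTA record as a string and asks for several elements in turn. The record contents are hypothetical and deliberately include ambiguity codes and gap characters; outputs are not shown.

```wl
(* A small hypothetical FASTA record, built as a string for illustration *)
fasta = ">seq1 first hypothetical record\nACGTRYKM\n>seq2 second hypothetical record\nGGCCNN--\n";

(* List the elements available for this data *)
ImportString[fasta, {"FASTA", "Elements"}]

(* Raw header lines and sequences *)
ImportString[fasta, {"FASTA", "Header"}]
ImportString[fasta, {"FASTA", "Sequence"}]

(* Length of each sequence *)
ImportString[fasta, {"FASTA", "Length"}]

(* One rule list per sequence stored in the data *)
ImportString[fasta, {"FASTA", "LabeledData"}]
```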
Options
 - Import options:

      "HeaderFormat"  Automatic  specifies the format of the header
      "ToUpperCase"   True       whether or not to make sequences uppercase

 - Import uses a large built-in library of header format specifications found in common variants of the FASTA format.
 - By setting "HeaderFormat" to a list of literal strings and names of meta-information elements, any header line format can be specified on Import.
 - "HeaderFormat"->{"gi|","DatabaseIndex","|gb|","Accession","|","Description"} is a setting typical for NCBI FASTA files.
 - Advanced Export options:

      "LineWidth"    70    maximum number of characters in a line
      "ToUpperCase"  True  whether or not to make sequences uppercase

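A minimal sketch of how the options above might be used, assuming a hypothetical NCBI-style record. The identifiers in the header and the sequence itself are invented for illustration; the "HeaderFormat" list is the one quoted in this section. Outputs are not shown.

```wl
(* Hypothetical NCBI-style record in lowercase, for illustration only *)
fasta = ">gi|12345|gb|AB000001|hypothetical example sequence\nacgtacgtacgtacgtacgtacgt\n";

(* Describe the header layout explicitly, then request one meta-information element *)
ImportString[fasta, {"FASTA", "Accession"},
  "HeaderFormat" -> {"gi|", "DatabaseIndex", "|gb|", "Accession", "|", "Description"}]

(* Keep the original letter case instead of converting to uppercase *)
ImportString[fasta, {"FASTA", "Sequence"}, "ToUpperCase" -> False]

(* Limit exported lines to 10 characters instead of the default 70 *)
ExportString["ACGTACGTACGTACGTACGTACGT", "FASTA", "LineWidth" -> 10]
```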
Examples
Basic Examples (7)
This reads the raw header line from a sample FASTA file:
Parse the GenBank database key and the description string from the header line:
Read the first letters of the DNA sequence:
This converts a short sequence to the FASTA format, automatically adding default header information:
This exports a pair of headers and sequences:
Importing the previous output using the "Data" element gives raw headers and sequences:
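The export and re-import steps described above might be sketched as follows. The headers, sequences, and file name are hypothetical stand-ins rather than the sample data used on this page, and the "Rules" form follows the Export syntax given under Import & Export. Outputs are not shown.

```wl
(* Convert a short sequence to FASTA text; default header information is added *)
ExportString["GATTACA", "FASTA"]

(* Export a pair of headers and sequences to a temporary file *)
(* Headers, sequences and file name are hypothetical *)
file = FileNameJoin[{$TemporaryDirectory, "pairs.fasta"}];
Export[file,
  {"Header" -> {"sample one", "sample two"},
   "Sequence" -> {"ATTTGCA", "GGCATAC"}},
  "Rules"];

(* "Data" combines the "Header" and "Sequence" elements *)
Import[file, "Data"]
```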
History
Introduced in 2007 (6.0) | Updated in 2012 (9.0)