Wolfram Language & System Documentation Center

FASTA (.fasta, .fa, .fna, .fsa, .mpfa)

See Also
- Import
- Export
- CloudExport
- CloudImport
- Formats
- FASTQ
- MOL
- PDB
- XYZ
- SFF

Import and Export support all common variants of the FASTA file format.

Background & Context

- MIME types: chemical/seq-aa-fasta, chemical/seq-na-fasta
- FASTA molecular biology format.
- Standard format for storing and exchanging DNA and protein sequences.
- Plain text format.
- Stores nucleic acid or protein sequences as character strings.
- Various conventions are in use to represent meta-information.
- Developed in 1988 by William Pearson and David Lipman as part of the FASTA sequence-alignment software.

Import & Export

Import["file.fasta"] imports DNA or protein sequences from a FASTA file.
Export["file.fasta",expr] exports a sequence or a list of sequences to the FASTA format.
Import["file.fasta"] returns a list of strings representing the sequences stored in the file.
Export["file.fasta",str] exports a character string representing a DNA sequence to FASTA.
Export["file.fasta",{str₁,str₂,…}] exports multiple DNA sequences.
Import["file.fasta",elem] imports the specified element from a FASTA file.
Import["file.fasta",{elem,sub_a,sub_b,…}] imports a subelement.
Import["file.fasta",{{elem₁,elem₂,…}}] imports multiple elements.
The import format can be specified with Import["file","FASTA"] or Import["file",{"FASTA",elem,…}].
Export["file.fasta",expr,elem] creates a FASTA file by treating expr as specifying element elem.
Export["file.fasta",{expr₁,expr₂,…},{{elem₁,elem₂,…}}] treats each exprᵢ as specifying the corresponding elemᵢ.
Export["file.fasta",expr,opt₁->val₁,…] exports expr with the specified option elements taken to have the specified values.
Export["file.fasta",{elem₁->expr₁,elem₂->expr₂,…},"Rules"] uses rules to specify the elements to be exported.
See the following reference pages for full general information:

	Import, Export	import from or export to a file
	CloudImport, CloudExport	import from or export to a cloud object
	ImportString, ExportString	import from or export to a string
	ImportByteArray, ExportByteArray	import from or export to a byte array
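
The forms listed above can be tried with a round trip through a temporary file. This is a minimal sketch, not part of the original page: the file name sketch.fasta and the two sequences are made up for illustration.

```wolfram
(* Hypothetical temporary file; any writable path would do *)
file = FileNameJoin[{$TemporaryDirectory, "sketch.fasta"}];

(* Export a list of sequences; the format is inferred from the extension *)
Export[file, {"ACGTACGT", "TTGACCA"}];

(* Default import: the "Sequence" element *)
Import[file]

(* Several elements at once; note the doubled braces *)
Import[file, {{"Header", "Sequence"}}]

(* The format can also be given explicitly *)
Import[file, {"FASTA", "Sequence"}]

(* Clean up the temporary file *)
DeleteFile[file]
```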

Import Elements

General Import elements:

	"Elements"	list of elements and options available in this file
	"Summary"	summary of the file
	"Rules"	list of rules for all available elements

Data representation elements:

	"Header"	raw header lines
	"Sequence"	DNA or protein sequences as a list of strings
	"Plaintext"	sequences as formatted text

Import uses the "Sequence" element by default for the FASTA format.
Additional data elements:

	"Data"	"Header" and "Sequence" elements combined in a list
	"LabeledData"	list of rules for each sequence stored in the file

Header line meta-information:

	"Accession"	NCBI accession number for each sequence
	"Description"	locus description text for each sequence
	"GenBankID"	GenBank database identifier
	"Length"	list of integers, representing the length of each sequence

The Wolfram Language uses the standard IUB/IUPAC abbreviations for nucleic acids:

	A	adenine
	C	cytosine
	G	guanine
	T	thymine
	U	uracil
	R	purine (G or A)
	Y	pyrimidine (T or C)
	K	ketone (G or T)
	M	amino group (A or C)
	S	strong interaction (G or C)
	W	weak interaction (A or T)
	B	C or G or T
	D	A or G or T
	H	A or C or T
	V	A or C or G
	N	any nucleotide (A or C or G or T)
	-	gap of indeterminate length
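
The ambiguity codes in this table can be used to match a concrete sequence against a motif. The following is a minimal sketch under stated assumptions: iupac and motifRegex are hypothetical names introduced here, not built-in functions, and the gap character "-" is left out of the lookup, so a motif containing it is not handled.

```wolfram
(* Each code mapped to the bases it stands for, following the table above *)
iupac = <|"A" -> "A", "C" -> "C", "G" -> "G", "T" -> "T", "U" -> "U",
   "R" -> "GA", "Y" -> "TC", "K" -> "GT", "M" -> "AC",
   "S" -> "GC", "W" -> "AT", "B" -> "CGT", "D" -> "AGT",
   "H" -> "ACT", "V" -> "ACG", "N" -> "ACGT"|>;

(* Hypothetical helper: one regex character class per motif letter *)
motifRegex[motif_String] :=
  RegularExpression[
   StringJoin["[" <> iupac[#] <> "]" & /@ Characters[motif]]];

(* "GAYWRCA" becomes [G][A][TC][AT][GA][C][A] *)
StringMatchQ["GATTACA", motifRegex["GAYWRCA"]]
(* True *)
```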

Codes representing amino acids:

	A	alanine (Ala)
	B	either aspartic acid or asparagine (Asx)
	C	cysteine (Cys)
	D	aspartic acid (Asp)
	E	glutamic acid (Glu)
	F	phenylalanine (Phe)
	G	glycine (Gly)
	H	histidine (His)
	I	isoleucine (Ile)
	K	lysine (Lys)
	L	leucine (Leu)
	M	methionine (Met)
	N	asparagine (Asn)
	P	proline (Pro)
	Q	glutamine (Gln)
	R	arginine (Arg)
	S	serine (Ser)
	T	threonine (Thr)
	U	selenocysteine (Sec)
	V	valine (Val)
	W	tryptophan (Trp)
	Y	tyrosine (Tyr)
	Z	either glutamic acid or glutamine (Glx)
	X	any amino acid
	*	translation stop
	-	gap of indeterminate length

Options

Import options:

	"HeaderFormat"	Automatic	specifies the format of the header
	"ToUpperCase"	True	whether or not to make sequences uppercase

Import uses a large built-in library of header format specifications found in common variants of the FASTA format.
By setting "HeaderFormat" to a list of literal strings and names of meta-information elements, any header line format can be specified on Import.
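
A minimal sketch of an explicit header format, assuming an NCBI-style header line. The record below is made up for illustration (the identifier and accession are placeholders, not real database entries), and the result depends on how the importer splits the line.

```wolfram
(* Hypothetical one-record FASTA text; all values are placeholders *)
rec = ">gi|12345|gb|XX000000.1|hypothetical example protein\nMKVLAAGIVG\n";

(* Literal strings alternate with names of meta-information elements *)
fmt = {"gi|", "DatabaseIndex", "|gb|", "Accession", "|", "Description"};

(* Ask for two of the parsed header fields *)
ImportString[rec, {"FASTA", {"Accession", "Description"}},
 "HeaderFormat" -> fmt]
```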
"HeaderFormat"->{"gi|","DatabaseIndex","|gb|","Accession","|","Description"} is a setting typical for NCBI FASTA files.
Advanced Export options:

	"LineWidth"	70	maximum number of characters in a line
	"ToUpperCase"	True	whether or not to make sequences uppercase

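The two Export options can be compared on a short lowercase sequence. This is a sketch with a made-up sequence; ExportString is used so that the FASTA text is returned directly instead of being written to a file.

```wolfram
(* Hypothetical 28-letter lowercase sequence *)
seq = "acgtacgtacgtacgtacgtacgtacgt";

(* Wrap sequence lines at 10 characters instead of the default 70 *)
ExportString[seq, "FASTA", "LineWidth" -> 10]

(* Keep the original letter case *)
ExportString[seq, "FASTA", "ToUpperCase" -> False]
```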
Examples

Basic Examples (7)

This reads the raw header line from a sample FASTA file:
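The sample file is not identified here, so this sketch assumes a hypothetical one-record string in its place; the header values are placeholders.

```wolfram
(* Hypothetical stand-in for the sample file; all values are made up *)
sample = ">gi|12345|gb|XX000000.1| hypothetical example gene\nGATTACAGATTACAGATTACA\n";

(* The "Header" element gives the raw header lines *)
ImportString[sample, {"FASTA", "Header"}]
```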

Extract the accession string:
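A sketch on a made-up record, assuming the built-in header library (the default "HeaderFormat"->Automatic) recognizes this NCBI-style line; if it does not, an explicit "HeaderFormat" setting would be needed.

```wolfram
(* Hypothetical stand-in for the sample file; all values are made up *)
sample = ">gi|12345|gb|XX000000.1| hypothetical example gene\nGATTACAGATTACAGATTACA\n";

(* The "Accession" element is parsed from the header line *)
ImportString[sample, {"FASTA", "Accession"}]
```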

Parse the GenBank database key and the description string from the header line:
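A sketch on a made-up record, again assuming that the default header parsing applies to this line. Both elements are requested in a single call.

```wolfram
(* Hypothetical stand-in for the sample file; all values are made up *)
sample = ">gi|12345|gb|XX000000.1| hypothetical example gene\nGATTACAGATTACAGATTACA\n";

(* Request two header meta-information elements together *)
ImportString[sample, {"FASTA", {"GenBankID", "Description"}}]
```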

Read the first letters of the DNA sequence:
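A sketch on a made-up record; the count of 10 letters is an arbitrary choice for illustration.

```wolfram
(* Hypothetical stand-in for the sample file; all values are made up *)
sample = ">gi|12345|gb|XX000000.1| hypothetical example gene\nGATTACAGATTACAGATTACA\n";

(* Default import gives a list of sequences; take up to 10 letters of the first *)
StringTake[First[ImportString[sample, "FASTA"]], UpTo[10]]
```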

This converts a short sequence to the FASTA format, automatically adding default header information:
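A minimal sketch with a made-up sequence; ExportString returns the FASTA text, including whatever default header the exporter adds.

```wolfram
(* Hypothetical short DNA sequence *)
ExportString["ACGTTGCA", "FASTA"]
```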

This exports two sequences:
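A minimal sketch with two made-up sequences given as a list of strings:

```wolfram
(* Each string becomes one FASTA record *)
ExportString[{"ACGTTGCA", "TTGACCAG"}, "FASTA"]
```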

This exports a pair of headers and sequences:
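A sketch using the "Rules" form with made-up headers and sequences. The element specification {"FASTA","Rules"} is assumed here to be the ExportString counterpart of the Export[…,"Rules"] form described earlier on this page.

```wolfram
(* Hypothetical headers and sequences, paired by position *)
ExportString[
 {"Header" -> {"seq1", "seq2"}, "Sequence" -> {"ACGT", "TTGA"}},
 {"FASTA", "Rules"}]
```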

Importing the previous output using the "Data" element gives raw headers and sequences:
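A sketch that uses a literal two-record string as a hypothetical stand-in for the exported text:

```wolfram
(* Hypothetical FASTA text with two records *)
fasta = ">seq1\nACGT\n>seq2\nTTGA\n";

(* "Data" combines the "Header" and "Sequence" elements *)
ImportString[fasta, {"FASTA", "Data"}]
```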

Import as a list of rules:
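A sketch on a made-up two-record string, using the "LabeledData" element described under Import Elements:

```wolfram
(* Hypothetical FASTA text with two records *)
fasta = ">seq1\nACGT\n>seq2\nTTGA\n";

(* One rule per sequence stored in the text *)
ImportString[fasta, {"FASTA", "LabeledData"}]
```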
