| MIME types: chemical/seq-aa-fasta, chemical/seq-na-fasta |
| FASTA molecular biology format. |
| Standard format for storing and exchanging DNA and protein sequences. |
| Plain text format. |
| Stores nucleic acid or protein sequences as character strings. |
| Various conventions are in use to represent meta-information. |
| Developed in 1988 by William Pearson and David Lipman as part of the FASTA sequence-alignment software. |
| "Elements" | list of elements and options available in this file | |
| "Rules" | full list of rules for each element and option | |
| "Options" | list of rules for options, properties, and settings |
| "Header" | raw header lines | |
| "Sequence" | DNA or protein sequences as a list of strings | |
| "Plaintext" | sequences as formatted text |
| "Data" | "Header" and "Sequence" elements combined in a list | |
| "LabeledData" | list of rules for each sequence stored in the file |
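To make the relationship between headers and sequences concrete, here is a minimal Python sketch of reading FASTA text. It is not the importer described on this page; the function name `parse_fasta`, the sample records, and the `pairs` and `labeled` layouts are hypothetical choices made for illustration. It assumes each record starts with a `>` header line and that the sequence may continue over several lines.

```python
def parse_fasta(text):
    """Return (headers, sequences) from FASTA-formatted text."""
    headers, sequences = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            headers.append(line[1:])  # raw header without the leading '>'
            sequences.append("")      # start a new, empty sequence
        elif headers:
            sequences[-1] += line     # a sequence may span several lines


    return headers, sequences


# Hypothetical sample data: one DNA record split over two lines, one protein record.
sample = ">seq1 first sample\nGATTACA\nGATTACA\n>seq2 second sample\nMKVL*\n"

headers, sequences = parse_fasta(sample)
pairs = list(zip(headers, sequences))    # one (header, sequence) pair per record
labeled = dict(zip(headers, sequences))  # header -> sequence lookup
```

Keeping headers and sequences in parallel lists mirrors the separate "Header" and "Sequence" elements, while the paired and keyed forms are convenient when records need to be looked up by header.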
| "Accession" | NCBI accession number for each sequence | |
| "Description" | locus description text for each sequence | |
| "GenBankID" | GenBank database identifier | |
| "Length" | list of integers, representing the length of each sequence |
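The meta-information elements above are derived from the header line and the sequence itself. The following Python sketch shows one way such fields could be extracted. It assumes an NCBI-style header of the form `gi|ID|db|ACCESSION| description`, and it assumes that the GenBank identifier corresponds to the number after `gi`; both are assumptions, since header conventions vary. The function name `parse_ncbi_header` and the sample header are hypothetical.

```python
def parse_ncbi_header(header):
    """Split a header assumed to look like 'gi|ID|db|ACCESSION| description'.

    Returns None when the header does not follow that assumed layout.
    """
    parts = header.split("|", 4)
    if len(parts) != 5 or parts[0] != "gi":
        return None
    return {
        "GenBankID": parts[1],            # assumed to be the number after 'gi'
        "Accession": parts[3],
        "Description": parts[4].strip(),  # free text after the last '|'
    }


# Hypothetical header; the identifiers are placeholders, not real records.
header = "gi|0000000|gb|XX000000.0| hypothetical sample locus"
fields = parse_ncbi_header(header)

# The length of each sequence is simply its character count.
lengths = [len(seq) for seq in ["GATTACA", "ACGT"]]
```

Headers that do not match the assumed layout yield `None` rather than a partial result, which makes it easy to fall back to the raw header text.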
| A | adenosine | |
| C | cytidine | |
| G | guanosine | |
| T | thymidine | |
| U | uridine | |
| R | purine (G or A) | |
| Y | pyrimidine (T or C) | |
| K | ketone (G or T) | |
| M | amino group (A or C) | |
| S | strong interaction (G or C) | |
| W | weak interaction (A or T) | |
| B | C or G or T | |
| D | A or G or T | |
| H | A or C or T | |
| V | A or C or G | |
| N | any nucleotide (A or C or G or T) | |
| - | gap of indeterminate length |
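The nucleic acid codes in the table above can be captured in a small lookup table. The Python sketch below encodes exactly the rows listed, and uses them to check whether a string contains only recognized codes and to expand an ambiguity code into the bases it stands for. The names `IUPAC_NUCLEOTIDES`, `is_nucleotide_sequence`, and `possible_bases` are hypothetical, and accepting lowercase input is an assumption of this sketch.

```python
# Each code maps to the bases it can represent, as listed in the table.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC",
    "S": "GC", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def is_nucleotide_sequence(seq):
    """True if every character is a listed nucleotide code or a gap ('-')."""
    return all(ch in IUPAC_NUCLEOTIDES or ch == "-" for ch in seq.upper())


def possible_bases(code):
    """Return the set of bases that a single code can stand for."""
    return set(IUPAC_NUCLEOTIDES[code.upper()])


valid = is_nucleotide_sequence("GATTACA-NNRY")
invalid = is_nucleotide_sequence("GATTAEA")  # 'E' is not a nucleotide code
purines = possible_bases("R")
```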
| A | alanine (Ala) | |
| B | either aspartic acid or asparagine (Asx) | |
| C | cysteine (Cys) | |
| D | aspartic acid (Asp) | |
| E | glutamic acid (Glu) | |
| F | phenylalanine (Phe) | |
| G | glycine (Gly) | |
| H | histidine (His) | |
| I | isoleucine (Ile) | |
| K | lysine (Lys) | |
| L | leucine (Leu) | |
| M | methionine (Met) | |
| N | asparagine (Asn) | |
| P | proline (Pro) | |
| Q | glutamine (Gln) | |
| R | arginine (Arg) | |
| S | serine (Ser) | |
| T | threonine (Thr) | |
| U | selenocysteine (Sec) | |
| V | valine (Val) | |
| W | tryptophan (Trp) | |
| Y | tyrosine (Tyr) | |
| Z | either glutamic acid or glutamine (Glx) | |
| X | any amino acid | |
| * | translation stop | |
| - | gap of indeterminate length |
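The amino acid codes in the table above translate directly into a lookup from one-letter codes to three-letter abbreviations. The Python sketch below is one possible encoding. The abbreviations Asx, Glx, Sec, and Xaa for the B, Z, U, and X rows are the standard three-letter forms rather than text from the table, and the labels "stop" and "gap" as well as the names `AMINO_ACIDS` and `three_letter` are hypothetical choices.

```python
# One-letter code -> three-letter abbreviation for the rows in the table.
AMINO_ACIDS = {
    "A": "Ala", "B": "Asx", "C": "Cys", "D": "Asp", "E": "Glu",
    "F": "Phe", "G": "Gly", "H": "His", "I": "Ile", "K": "Lys",
    "L": "Leu", "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln",
    "R": "Arg", "S": "Ser", "T": "Thr", "U": "Sec", "V": "Val",
    "W": "Trp", "Y": "Tyr", "Z": "Glx", "X": "Xaa",
}

# Non-residue symbols from the table, with labels chosen for this sketch.
SPECIAL = {"*": "stop", "-": "gap"}


def three_letter(seq):
    """Translate a protein string into a list of three-letter names.

    Raises KeyError for characters that are not in the table.
    """
    names = {**AMINO_ACIDS, **SPECIAL}
    return [names[ch] for ch in seq.upper()]


peptide = three_letter("MKV*")
```

Raising an error on unknown characters, instead of silently skipping them, makes it obvious when a nucleotide sequence has been passed in by mistake.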
| "HeaderFormat" | Automatic | specifies the format of the header |
This reads the raw header line from a sample FASTA file:
Extract the accession string:
Parse the GenBank database key and the description string from the header line:
Read the first letters of the DNA sequence:
This converts a short sequence to the FASTA format, automatically adding default header information:
This exports two sequences:
This exports a pair of headers and sequences:
Importing the previous output using the "Data" element gives the raw headers and sequences:
Import as a list of rules:
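Independently of the import and export steps described above, the round trip from sequences to FASTA text and back can be sketched in a few lines of Python. This is an illustration of the file layout only, not the exporter documented here: the function names `format_fasta` and `read_fasta`, the default header text `sequence_1`, `sequence_2`, ..., and the 70-character line width are all assumptions of this sketch.

```python
def format_fasta(sequences, headers=None, width=70):
    """Return FASTA text for the given sequences.

    When no headers are given, placeholder headers are generated
    (the 'sequence_N' naming is a choice made for this sketch).
    """
    if headers is None:
        headers = [f"sequence_{i}" for i in range(1, len(sequences) + 1)]
    if len(headers) != len(sequences):
        raise ValueError("need exactly one header per sequence")
    lines = []
    for header, seq in zip(headers, sequences):
        lines.append(">" + header)
        # Wrap long sequences into fixed-width lines.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)


def read_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA text."""
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif line.strip() and records:
            records[-1][1] += line.strip()  # rejoin wrapped sequence lines
    return [tuple(record) for record in records]


single = format_fasta(["GATTACA"])                        # default header
text = format_fasta(["GATTACA", "ACGT"], ["first", "second"])
round_trip = read_fasta(text)                             # back to pairs
```

Because wrapped lines are rejoined on reading, the round trip preserves each sequence regardless of the line width used when writing.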
| © 2013 Wolfram Research, Inc. |