BioSequence["type","seq"] represents the biomolecular sequence of the given type corresponding to a string "seq".
BioSequence["seq"] infers the type (DNA, protein, etc.) from the sequence.
BioSequence[ent] gives the biomolecular sequence associated with the gene or protein entity ent.
BioSequence[{chem1,chem2,…}] gives the biomolecular sequence with type corresponding to the given list of chemicals.
BioSequence["type",n] gives a biomolecular sequence of the given type and length n with arbitrary letters.
Details and Options
- BioSequence[…] evaluates, if possible, to the form BioSequence[type,"seq"].
- BioSequence employs the following letters to represent molecules for each type:
  - "DNA": A, C, G, T
  - "CircularDNA": A, C, G, T
  - "RNA": A, C, G, U
  - "Peptide": A, C, D, E, F, G, H, I, K, L, M, N, O, P, Q, R, S, T, U, V, W, Y
- The "Peptide" type also allows a period or asterisk (. or *) to represent where a stop in biomolecular translation occurs.
- Additionally, the type can be None to represent generic sequences with no given chemical meaning.
- BioSequence also allows degenerate letters that represent a number of potential chemicals.
- Allowed degenerate letters for DNA and RNA include:
  - B: C, G or T/U
  - D: A, G or T/U
  - H: A, C or T/U
  - K: G or T/U
  - M: A or C
  - N: A, C, G or T/U
  - R: A or G
  - S: C or G
  - V: A, C or G
  - W: A or T/U
  - Y: C or T/U
- Allowed degenerate letters for peptides include:
  - B: D or N
  - J: I or L
  - X: A, C, D, E, F, G, H, I, K, L, M, N, O, P, Q, R, S, T, U, V, W, Y
  - Z: E or Q
- The following letter is used as the arbitrary letter when a type and length are provided:
  - "DNA" or "CircularDNA": N
  - "RNA": N
  - "Peptide": X
- Properties "prop" of a BioSequence obtained by BioSequence[…]["prop"] include:
  - "SequenceType": the type of sequence as a "BioSequenceType" entity
  - "SequenceString": a string representing the sequence
  - "SequenceLength": the length of the sequence
  - "SequencePattern": a string expression expanding degenerate letters
  - "ChemicalList": a list of the literal chemical entities
  - "ChemicalPatternList": a list of the chemical entities, allowing for degenerate letters
  - "MolecularMass": the molecular mass of the sequence
  - "Properties": a list of the properties
- Both "ChemicalList" and "ChemicalPatternList" give the particular chemicals for each term of the sequence. The former does not support degenerate letters, while the latter will represent them using Alternatives.
- If the sequence has degenerate terms, its molecular mass may be an Interval.
- The types available to BioSequence can also be extended by creating an EntityStore with "ExtendedBioSequenceType" entities and then registering it with EntityRegister.
- The following "ExtendedBioSequenceType" properties can be defined:
  - "Alphabet": a list of the letters permitted within this sequence
  - "AlphabetRules": an association from letters to specific chemicals
  - "BibliographicSource": an external identifier documenting the sequence type
  - "Caption": the caption above the sequence in formatted output
  - "ComplementLetterRules": two-way rules defining a complement operation
  - "Icon": the icon displayed in the formatted output of the sequence
  - "MolecularMassRules": an association from letters to molecular masses
- The "Icon" can be provided as either an image or the canonical name of an existing sequence type.
- The "MolecularMassRules" will override the molecular masses of the chemicals given via "AlphabetRules" and allow masses to be calculated when no chemicals are given.
- BioSequenceQ[bioseq] gives True only if bioseq corresponds to a valid bio sequence expression.
- Here is the corresponding nucleotide for each DNA (RNA) letter:
  - A: adenine
  - C: cytosine
  - G: guanine
  - T (U): thymine (uracil)
- Similarly, here is the corresponding amino acid for each peptide letter:
  - A: alanine
  - C: cysteine
  - D: aspartic acid
  - E: glutamic acid
  - F: phenylalanine
  - G: glycine
  - H: histidine
  - I: isoleucine
  - K: lysine
  - L: leucine
  - M: methionine
  - N: asparagine
  - O: pyrrolysine
  - P: proline
  - Q: glutamine
  - R: arginine
  - S: serine
  - T: threonine
  - U: selenocysteine
  - V: valine
  - W: tryptophan
  - Y: tyrosine
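
The notes above on degenerate letters can be illustrated with a short sketch. It assumes a DNA sequence containing the degenerate letter N; the variable name degenerate and the sequence string are illustrative choices only. The properties queried are the ones listed in the details above, and no particular output is claimed here.

```wolfram
(* N is the degenerate DNA letter standing for A, C, G or T *)
degenerate = BioSequence["DNA", "ACGTN"];

(* String expression in which degenerate letters are expanded *)
degenerate["SequencePattern"]

(* Chemical entities per position; degenerate terms appear as Alternatives *)
degenerate["ChemicalPatternList"]

(* For a sequence with degenerate terms, this may be an Interval *)
degenerate["MolecularMass"]
```

"ChemicalPatternList" is used rather than "ChemicalList" because, as noted above, the latter does not support degenerate letters.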
Examples
Basic Sequences (5)
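
As a minimal sketch of the usage forms listed at the top of this page, the following builds a sequence with an explicit type, lets BioSequence infer the type, and requests a placeholder sequence of a given length. The variable names and the string "GATTACA" are illustrative and are not taken from the original examples.

```wolfram
(* Explicit type and sequence string *)
dna = BioSequence["DNA", "GATTACA"]

(* Type omitted: BioSequence infers it from the letters *)
inferred = BioSequence["GATTACA"]

(* Type and length only: filled with the arbitrary letter for that type *)
placeholder = BioSequence["DNA", 5]

(* Check that an expression is a valid sequence *)
BioSequenceQ[dna]
```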
Sequences from Entities (5)
Properties & Relations (15)
BioSequence provides a number of properties:
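
A possible illustration, assuming a short DNA sequence; the name bs and the sequence string are hypothetical, and the property names are those listed under Details and Options:

```wolfram
bs = BioSequence["DNA", "ACGTACGT"];

bs["SequenceString"]   (* the sequence as a string *)
bs["SequenceLength"]   (* the number of letters *)
bs["SequenceType"]     (* a "BioSequenceType" entity *)
bs["MolecularMass"]    (* the molecular mass of the sequence *)
bs["Properties"]       (* the list of available properties *)
```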
The types of BioSequence are entities that contain many further properties describing the sequence:
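
One way to explore this, sketched under the assumption that the general entity function EntityProperties applies to the "BioSequenceType" entity type as it does to other entity types; the name rnaType and the sequence string are illustrative:

```wolfram
(* The type of a sequence is returned as a "BioSequenceType" entity *)
rnaType = BioSequence["RNA", "ACGU"]["SequenceType"]

(* List the properties defined for "BioSequenceType" entities *)
EntityProperties["BioSequenceType"]
```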
The basic letters for a given type correspond to the "Alphabet" property of "BioSequenceType" entities:
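
A short sketch of that lookup, taking the type entity from an existing sequence rather than writing the entity by hand; the names dnaType and peptideType and the sequence strings are illustrative:

```wolfram
(* Obtain the type entities from two sequences *)
dnaType = BioSequence["DNA", "ACGT"]["SequenceType"];
peptideType = BioSequence["Peptide", "MKV"]["SequenceType"];

(* The basic letters of each type *)
dnaType["Alphabet"]
peptideType["Alphabet"]
```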
Possible Issues (5)
Wolfram Research (2020), BioSequence, Wolfram Language function, https://reference.wolfram.com/language/ref/BioSequence.html.