BioSequence

BioSequence[type,"seq"]

represents the biomolecular sequence of the given type corresponding to a string "seq".

BioSequence["seq"]

infers the type (DNA, protein, etc.) from the sequence.

BioSequence[ent]

gives the biomolecular sequence associated with the gene or protein entity ent.

BioSequence[type,{chem1,chem2,…}]

gives the biomolecular sequence of the given type corresponding to the given list of chemicals.

BioSequence[type,n]

gives a biomolecular sequence of the given type and length n with arbitrary letters.

Details and Options

  • BioSequence[…] evaluates, if possible, to the form BioSequence[type,"seq"].
  • BioSequence employs the following letters to represent molecules for each type:
    "DNA"            A, C, G, T
    "CircularDNA"    A, C, G, T
    "RNA"            A, C, G, U
    "Peptide"        A, C, D, E, F, G, H, I, K, L, M, N, O, P, Q, R, S, T, U, V, W, Y
  • The "Peptide" type also allows a period or asterisk (. or *) to represent where a stop in biomolecular translation occurs.
  • Additionally, the type can be None to represent generic sequences with no given chemical meaning.
  • BioSequence also allows degenerate letters that represent a number of potential chemicals.
  • Allowed degenerate letters for DNA and RNA include:
    B    C, G or T/U
    D    A, G or T/U
    H    A, C or T/U
    K    G or T/U
    M    A or C
    N    A, C, G or T/U
    R    A or G
    S    C or G
    V    A, C or G
    W    A or T/U
    Y    C or T/U
  • Allowed degenerate letters for peptides include:
    B    D or N
    J    I or L
    X    A, C, D, E, F, G, H, I, K, L, M, N, O, P, Q, R, S, T, U, V, W, Y
    Z    E or Q
  • The following letter is used as the arbitrary letter when a type and length are provided:
    "DNA" or "CircularDNA"    N
    "RNA"                     N
    "Peptide"                 X
  • Properties "prop" of a BioSequence obtained by BioSequence[…]["prop"] include:
    "SequenceType"           the type of sequence as a "BioSequenceType" entity
    "SequenceString"         a string representing the sequence
    "SequenceLength"         the length of the sequence
    "SequencePattern"        a string expression expanding degenerate letters
    "ChemicalList"           a list of the literal chemical entities
    "ChemicalPatternList"    a list of the chemical entities, allowing for degenerate letters
    "MolecularMass"          the molecular mass of the sequence
    "Properties"             a list of the properties
  • Both "ChemicalList" and "ChemicalPatternList" give the particular chemicals for each term of the sequence. The former does not support degenerate letters, while the latter will represent them using Alternatives.
  • If the sequence has degenerate terms, its molecular mass may be an Interval.
  • The types available to BioSequence can also be extended by creating an EntityStore with "ExtendedBioSequenceType" entities and then registering it with EntityRegister.
  • The following "ExtendedBioSequenceType" properties can be defined:
    "Alphabet"                 a list of the letters permitted within this sequence
    "AlphabetRules"            an association from letters to specific chemicals
    "BibliographicSource"      an external identifier documenting the sequence type
    "Caption"                  the caption above the sequence in formatted output
    "ComplementLetterRules"    two-way rules defining a complement operation
    "Icon"                     the icon displayed in the formatted output of the sequence
    "MolecularMassRules"       an association from letters to molecular masses
  • The "Icon" can be provided as either an image or the canonical name of an existing sequence type.
  • The "MolecularMassRules" will override the molecular masses of the chemicals given via "AlphabetRules" and allow masses to be calculated when no chemicals are given.
  • BioSequenceQ[bioseq] gives True only if bioseq corresponds to a valid bio sequence expression.
  • Here is the corresponding nucleotide for each DNA (RNA) letter:
    A        adenine
    C        cytosine
    G        guanine
    T (U)    thymine (uracil)
  • Similarly, here is the corresponding amino acid for each peptide letter:
    A    alanine
    C    cysteine
    D    aspartic acid
    E    glutamic acid
    F    phenylalanine
    G    glycine
    H    histidine
    I    isoleucine
    K    lysine
    L    leucine
    M    methionine
    N    asparagine
    O    pyrrolysine
    P    proline
    Q    glutamine
    R    arginine
    S    serine
    T    threonine
    U    selenocysteine
    V    valine
    W    tryptophan
    Y    tyrosine

Examples


Basic Examples  (2)

Represent a DNA sequence:

Represent an RNA sequence:
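
A minimal sketch of the two basic forms, using the BioSequence[type,"seq"] syntax from the usage section. The sequence strings are made up for illustration and do not come from any real gene:

```wolfram
(* A DNA sequence built from the letters A, C, G, T *)
BioSequence["DNA", "ATGGCATTC"]

(* An RNA sequence uses U in place of T *)
BioSequence["RNA", "AUGGCAUUC"]
```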

Scope  (10)

Basic Sequences  (5)

Represent a peptide sequence:

Represent a circular DNA sequence:

Represent a sequence of arbitrary nucleic acids:

Represent a sequence of arbitrary peptides:

Infer the type from the sequence of letters:

Degenerate terms can be entered as alternatives in a string expression:
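
The forms listed under this heading can be sketched as follows. All sequence strings and lengths are illustrative assumptions, and the last input assumes that the "alternatives in a string expression" mentioned above means a StringExpression containing Alternatives:

```wolfram
(* Peptide and circular DNA sequences *)
BioSequence["Peptide", "MKTAYIAK"]
BioSequence["CircularDNA", "ACGTACGT"]

(* A type and a length give a sequence of arbitrary letters:
   N for nucleic acids, X for peptides *)
BioSequence["DNA", 10]
BioSequence["Peptide", 10]

(* Omit the type and let BioSequence infer it from the letters *)
BioSequence["ACGUACGU"]

(* A degenerate position entered as alternatives (G or T) *)
BioSequence["DNA", "AC" ~~ ("G" | "T") ~~ "A"]
```

Note that short strings can be ambiguous for type inference, since A, C, G, T and U are all valid peptide letters as well.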

Sequences from Entities  (5)

Represent a sequence through a list of corresponding chemicals:
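
One way to try the BioSequence[type,{chem1,chem2,…}] form without guessing which chemical entities are expected is to take the list from an existing sequence through its "ChemicalList" property. This is a sketch under the assumption that the property's output is accepted back as input; the sequence string is illustrative:

```wolfram
(* Start from a fully specified sequence *)
seq = BioSequence["DNA", "ACGT"];

(* Extract the chemical entity for each position *)
chems = seq["ChemicalList"];

(* Rebuild a sequence of the same type from the chemical list *)
BioSequence["DNA", chems]
```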

Degenerate letters can be specified by alternatives between chemicals:

Represent the DNA sequence of the BRCA1 gene:
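
A sketch of the BioSequence[ent] form for this gene. It assumes the human BRCA1 gene is identified by the "Gene" entity shown below and that the Wolfram Knowledgebase is reachable over the network:

```wolfram
(* Human BRCA1 as a "Gene" entity; the canonical name is an assumption *)
gene = Entity["Gene", {"BRCA1", {"Species" -> "HomoSapiens"}}];

(* Retrieve the associated sequence and check its length *)
brca1 = BioSequence[gene];
brca1["SequenceLength"]
```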

Represent the peptide sequence of the protein:

Extend the representation of bio sequences to include Hachimoji DNA:

"BioSequenceType" entities can be used as the type when constructing bio sequences:

Properties & Relations  (15)

BioSequence provides a number of properties:

The types of BioSequence are entities that contain many further properties describing the sequence:

Access the raw sequence string:

Find the length of the underlying sequence:

Resolve degenerate letters into patterns over specific bases:

Specific sequences can be resolved into lists of chemicals:

Degenerate letters can be resolved into chemical alternatives:

Get the molecular mass of the single-stranded oligonucleotide, which varies with the possible degenerate choices:
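
The property lookups described above can be sketched with the BioSequence[…]["prop"] syntax from the Details section. The sequence strings are illustrative; the first contains the degenerate letter N, so per the notes above its mass may come back as an Interval:

```wolfram
seq = BioSequence["DNA", "ACGTN"];

seq["Properties"]           (* list of available properties *)
seq["SequenceType"]         (* a "BioSequenceType" entity *)
seq["SequenceString"]       (* the raw string *)
seq["SequenceLength"]       (* number of letters *)
seq["SequencePattern"]      (* string expression with N expanded *)
seq["ChemicalPatternList"]  (* chemicals, with Alternatives for N *)
seq["MolecularMass"]        (* may be an Interval because of N *)

(* "ChemicalList" is meant for sequences without degenerate letters *)
BioSequence["DNA", "ACGT"]["ChemicalList"]
```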

Define a sequence type with molecular mass rules and a custom icon:

With defined mass rules, the molecular mass can be calculated:

The basic letters for a given type correspond to the "Alphabet" property of "BioSequenceType" entities:

BioSequence can be provided as an input to Molecule:

SequenceAlignment can find alignments between two instances of BioSequence:

RandomInstance can sample fully specified instances from a degenerate BioSequence:
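
A short sketch of these three relations, calling each function directly on BioSequence expressions as the captions describe. The sequences are made up, and no particular output is assumed:

```wolfram
(* Convert a short peptide to a Molecule *)
Molecule[BioSequence["Peptide", "GAG"]]

(* Align two DNA sequences that differ at one position *)
a = BioSequence["DNA", "ACGTTGCA"];
b = BioSequence["DNA", "ACGTAGCA"];
SequenceAlignment[a, b]

(* Sample a fully specified sequence from a degenerate one *)
RandomInstance[BioSequence["DNA", "ACNNGT"]]
```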

BioSequenceQ can validate that a BioSequence is of a given type or has other attributes:

BioSequenceComplement and BioSequenceReverseComplement find genetic complements of a BioSequence:
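
Validation and complements can be sketched as below. Only the single-argument form BioSequenceQ[bioseq] from the Details section is used; the sequence string is illustrative:

```wolfram
seq = BioSequence["DNA", "AACG"];

(* Check that the expression is a valid bio sequence *)
BioSequenceQ[seq]

(* Complement each base: A <-> T, C <-> G *)
BioSequenceComplement[seq]

(* Complement and reverse the order of the bases *)
BioSequenceReverseComplement[seq]
```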

Possible Issues  (5)

Sequences containing letters not defined for the given type will not format:

Subsequent operations with these sequences may not evaluate:

It may not be possible to infer a type of sequence appropriate for the given string:

Not all sequence letters correspond to chemicals contributing to physical properties:

User-defined types do not necessarily have all properties available to them:

Degenerate letters cannot be resolved to a particular list of chemicals:

Neat Examples  (2)

Compare two very similar genes:

Generate sequences containing all of the supported characters:

Wolfram Research (2020), BioSequence, Wolfram Language function, https://reference.wolfram.com/language/ref/BioSequence.html.
