Biomolecular Sequences
BioSequence is a string-based representation for biomolecules with a chained primary structure. This class of biomolecules includes DNA, RNA, peptides and other sequences, which play important biological roles in maintaining genetic information and undertaking the work of the cell. The representation is supported by functions for recognition, comparison, transliteration and other operations, and degenerate letters are handled throughout these operations. Interaction with the entity system allows for analyzing gene and protein sequences as well as customizing the underlying definitions of sequences and their behavior. BioSequence also integrates with existing String functionality, so standard string operations such as searching, counting and replacing apply directly to biomolecular sequences.
Bio Sequence Representation
BioSequence — a string-based representation for chained biomolecules such as DNA
Molecule — the molecular representation of a biomolecular sequence
BioSequenceQ — test for a valid biomolecular sequence
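As a rough illustration of what a validity test involves, the Python sketch below checks a string against a per-type alphabet. The alphabets (IUPAC nucleotide codes including degenerate letters, and one-letter amino acid codes with common ambiguity letters) and the function name `is_bio_sequence` are assumptions for this example, not the letter sets BioSequenceQ actually uses.

```python
# Hypothetical per-type alphabets; IUPAC letters are assumed, not taken from BioSequenceQ.
NUCLEOTIDE_DEGENERATE = "RYSWKMBDHVN"
ALPHABETS = {
    "DNA": set("ACGT" + NUCLEOTIDE_DEGENERATE),
    "RNA": set("ACGU" + NUCLEOTIDE_DEGENERATE),
    # 20 standard amino acids, U/O (selenocysteine/pyrrolysine), ambiguity codes B, Z, J, X
    "Peptide": set("ACDEFGHIKLMNPQRSTVWY" + "UO" + "BZJX"),
}


def is_bio_sequence(seq: str, seq_type: str) -> bool:
    """Return True if seq is non-empty and uses only letters allowed for seq_type."""
    alphabet = ALPHABETS.get(seq_type)
    if alphabet is None or not seq:
        return False
    return set(seq) <= alphabet


print(is_bio_sequence("ACGTN", "DNA"))  # True
print(is_bio_sequence("ACGU", "DNA"))   # False: U is not a DNA letter here
print(is_bio_sequence("ACGU", "RNA"))   # True
```

The check is purely alphabetic; a real validity test may also consider sequence type metadata and modifications.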
Bio Sequence Conversion
BioSequenceComplement — get the complement of a nucleotide sequence (for DNA, A↔T and C↔G)
BioSequenceReverseComplement — reverse and complement a nucleotide sequence
BioSequenceTranscribe — transcribe a DNA sequence to RNA, or an RNA sequence back to DNA
BioSequenceTranslate — translate a DNA/RNA sequence to peptides
BioSequenceBackTranslateList — back-translate a peptide to DNA sequences
BioSequenceInstances — generate a list of instances with wild cards (e.g. S, N) resolved
RandomInstance — generate a list of random instances from a sequence with wild cards
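The letter-level conversions in this group follow standard molecular-biology rules. The minimal Python sketch below assumes IUPAC degenerate codes for DNA and uses hypothetical function names (`complement`, `reverse_complement`, `transcribe`, `instances`, `random_instance`); it illustrates the rules only and is not how BioSequenceComplement or the related functions are implemented. It handles DNA letters only.

```python
import random
from itertools import product

# IUPAC DNA codes: each letter maps to the bases it can stand for (assumed table).
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement table: A<->T, C<->G, and each degenerate letter maps to the letter
# whose base set is complementary (e.g. R = A/G pairs with Y = C/T).
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def complement(dna: str) -> str:
    return dna.translate(_COMPLEMENT)


def reverse_complement(dna: str) -> str:
    return complement(dna)[::-1]


def transcribe(dna: str) -> str:
    """DNA -> RNA: thymine (T) becomes uracil (U); other letters are unchanged."""
    return dna.replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """RNA -> DNA: uracil (U) becomes thymine (T); other letters are unchanged."""
    return rna.replace("U", "T")


def instances(dna: str) -> list[str]:
    """All concrete sequences obtained by resolving every degenerate letter."""
    return ["".join(p) for p in product(*(IUPAC_DNA[c] for c in dna))]


def random_instance(dna: str, rng=None) -> str:
    """One concrete sequence, picking a base uniformly for each degenerate letter."""
    rng = rng or random.Random()
    return "".join(rng.choice(IUPAC_DNA[c]) for c in dna)


print(complement("ACGTN"))         # TGCAN
print(reverse_complement("AACG"))  # CGTT
print(transcribe("ATGC"))          # AUGC
print(instances("AS"))             # ['AC', 'AG']
```

Note that the number of instances grows multiplicatively: each N contributes a factor of 4, so a sequence with many degenerate letters is better sampled with a random instance than fully expanded.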
Bio Sequence Visualization
BioSequencePlot — 2D schematic diagram with automatic layout
Bio Sequence Comparison
SequenceAlignment — determine the best-scoring alignment between two sequences
Diff — compute the difference between two sequences
SmithWatermanSimilarity — count one-element matches in the best local alignment
NeedlemanWunschSimilarity — count one-element matches in the best global alignment
EditDistance ▪ DamerauLevenshteinDistance ▪ HammingDistance
SimilarityRules — specify how similarity should be scored for pairs of elements
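Two of the distance functions listed here have standard textbook definitions. As an illustration in Python with hypothetical function names, Hamming distance counts mismatched positions between equal-length sequences, while edit (Levenshtein) distance is the minimum number of single-letter insertions, deletions and substitutions, computed with the usual dynamic-programming recurrence. This sketch compares letters literally; it applies no degenerate-letter matching and no SimilarityRules-style scoring.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, x in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute (free if letters match)
            ))
        prev = cur
    return prev[-1]


print(hamming_distance("GATTACA", "GACTATA"))  # 2
print(edit_distance("ACGT", "AGT"))            # 1
```

Only two rows of the table are kept at a time, so memory grows with the length of the second sequence rather than with the product of both lengths.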
Bio Subsequence Computation
LongestCommonSequence — find the longest shared contiguous or disjoint sequence
LongestCommonSequencePositions — find positions of the longest common sequence
LongestCommonSubsequence — find the longest shared contiguous sequence
LongestCommonSubsequencePositions — find positions of the longest common subsequence
Subsequences — generate all subsequences of a given sequence
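In this guide's naming, LongestCommonSubsequence is the contiguous case and LongestCommonSequence allows gaps, which is the reverse of the common textbook terms "longest common substring" and "longest common subsequence". The Python sketch below, with hypothetical function names, computes both with standard dynamic programming so the difference is visible on one pair of sequences; ties are broken arbitrarily, and it is not the Wolfram implementation.

```python
def longest_common_contiguous(a: str, b: str) -> str:
    """Longest run of letters appearing contiguously in both a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # prev[j]: length of the common run ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def longest_common_disjoint(a: str, b: str) -> str:
    """Longest sequence of letters appearing in order (gaps allowed) in both a and b."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Walk back from the bottom-right corner to recover one optimal sequence.
    out, i, j = [], n, m
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(longest_common_contiguous("ACCGGT", "ACGT"))  # AC
print(longest_common_disjoint("ACCGGT", "ACGT"))    # ACGT
```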
Bio Sequences as Strings
StringLength — number of letters in the string for a bio sequence
StringPart ▪ StringTake ▪ StringDrop ▪ StringInsert
StringReverse — reverse the letters in the string for a bio sequence
StringRotateLeft ▪ StringRotateRight
StringPadLeft ▪ StringPadRight
StringPartition ▪ StringJoin ▪ StringSplit
StringPosition — positions of substrings, including those with wild cards, in a bio sequence
StringCases — all cases of string patterns in a bio sequence
StringCount — count occurrences of a string pattern in a bio sequence
StringContainsQ ▪ StringFreeQ ▪ StringMatchQ
StringReplace — make replacements for substrings or string patterns in a bio sequence
StringReplacePart — replace substrings at specified positions in a bio sequence
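Searching a sequence for a motif that contains wild cards amounts to treating each degenerate letter as a set of allowed bases. The Python sketch below assumes IUPAC DNA codes and uses hypothetical helper names: it rewrites a motif as a regular expression with character classes and reports every match as a 1-based inclusive (start, end) span, with overlapping matches included. Degenerate letters in the searched sequence itself are matched literally here, and the span convention is a choice made for this example.

```python
import re

# IUPAC nucleotide codes (assumed): each letter maps to the bases it stands for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def motif_to_regex(motif: str) -> str:
    """Turn a degenerate motif into a regex, e.g. "GAN" -> "GA[ACGT]"."""
    parts = []
    for letter in motif:
        bases = IUPAC_DNA[letter]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def motif_positions(seq: str, motif: str) -> list[tuple[int, int]]:
    """1-based inclusive (start, end) spans of every match, overlaps included."""
    # A zero-width lookahead lets the scan try a new match at every position.
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [(m.start() + 1, m.start() + len(m.group(1))) for m in pattern.finditer(seq)]


print(motif_to_regex("GAN"))                # GA[ACGT]
print(motif_positions("GATGACGAA", "GAN"))  # [(1, 3), (4, 6), (7, 9)]
print(motif_positions("GATGACGAA", "GAS"))  # [(4, 6)]
```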
Bio Sequence Modifications
BioSequenceModify — get a bio sequence modified in various ways
Bio Sequence Entities
Gene — known human and other genes
Protein — known human and other proteins
Sequence Types & Genetic Codes
BioSequenceType — types of bio sequences ("DNA", "RNA", "Peptide", ...)
GeneticTranslationTable — translation tables between nucleic acids and amino acids
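A translation table maps each three-letter codon to an amino acid or a stop signal, and translation and back-translation are lookups in that table. The Python sketch below builds the standard genetic code (NCBI translation table 1, assumed here as the default) and uses hypothetical function names `translate` and `back_translate_list`; it is an illustration of the idea, not the behavior of BioSequenceTranslate, BioSequenceBackTranslateList or GeneticTranslationTable. It reads frame 1 only and does not resolve degenerate letters.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in T, C, A, G order,
# third position varying fastest; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, table: dict = STANDARD_CODE) -> str:
    """Translate DNA or RNA in frame 1; a trailing partial codon is ignored."""
    dna = seq.replace("U", "T")
    return "".join(table[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def back_translate_list(peptide: str, table: dict = STANDARD_CODE) -> list[str]:
    """Every DNA sequence that encodes the peptide under the given table."""
    codons_for = {}
    for codon, aa in table.items():
        codons_for.setdefault(aa, []).append(codon)
    return ["".join(p) for p in product(*(codons_for[aa] for aa in peptide))]


print(translate("ATGGCCTAA"))     # MA*
print(translate("AUGUUU"))        # MF
print(back_translate_list("MW"))  # ['ATGTGG']
```

Because most amino acids have several codons (leucine and serine have six each), the back-translation list grows quickly with peptide length.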