Biomolecular Sequences

BioSequence is a string-based representation for biomolecules with chained primary structure. This class of biomolecules includes DNA, RNA, peptides and other sequences, which maintain genetic information and carry out the work of the cell. The representation is supported by functions for recognition, comparison, transliteration and further operations, and degenerate letters are handled throughout these operations. Interaction with the entity system allows for analyzing gene and protein sequences as well as customizing the underlying definitions of sequences and their behavior. BioSequence also integrates with existing String functionality, so standard string operations apply directly to biomolecular sequences.

Bio Sequence Representation

BioSequence a string-based representation for chained biomolecules such as DNA

Molecule the molecular representation of a biomolecular sequence

BioSequenceQ test for a valid biomolecular sequence

Bio Sequence Conversion

BioSequenceComplement get the complement of a DNA sequence (A↔T, C↔G)

BioSequenceReverseComplement reverse and complement a DNA sequence

BioSequenceTranscribe transcribe a DNA sequence to RNA or the reverse

BioSequenceTranslate translate a DNA/RNA sequence to peptides

BioSequenceBackTranslateList back-translate a peptide to DNA sequences

BioSequenceInstances generate a list of instances with wild cards (e.g. S, N) resolved

RandomInstance generate a random instance from a sequence with wild cards
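The conversions listed above can be illustrated outside the Wolfram Language. The following is a minimal plain-Python sketch of the underlying letter mappings, not the implementation of the functions above. The helper names (`complement`, `reverse_complement`, `transcribe`, `instances`) are hypothetical, the degenerate letters follow the IUPAC nucleotide codes, and transcription is assumed to use the coding-strand convention (T replaced by U).

```python
from itertools import product

# Illustrative plain Python, not Wolfram Language.
# Complement of each IUPAC nucleotide letter, degenerate letters included.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Concrete bases matched by each IUPAC letter.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def complement(dna):
    """Swap each letter for its complement (A<->T, C<->G, ...)."""
    return dna.translate(COMPLEMENT)

def reverse_complement(dna):
    """Complement the sequence, then reverse it."""
    return complement(dna)[::-1]

def transcribe(dna):
    """Assumed coding-strand convention: replace T with U."""
    return dna.replace("T", "U")

def instances(dna):
    """List every concrete sequence obtained by resolving wild cards."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in dna))]

print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
print(transcribe("ATGC"))          # AUGC
print(instances("AS"))             # ['AC', 'AG']
```

The number of instances grows multiplicatively with each wild card (a single N contributes a factor of four), which is why drawing one random instance can be preferable to enumerating all of them.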

Bio Sequence Comparison

SequenceAlignment determine the best-scoring alignment between two sequences

SmithWatermanSimilarity count one-element matches in the best local alignment

NeedlemanWunschSimilarity count one-element matches in the best global alignment

EditDistance  ▪  DamerauLevenshteinDistance  ▪  HammingDistance

SimilarityRules
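As a rough illustration of two of the distance measures named above, here is a minimal plain-Python sketch of Hamming distance and Levenshtein edit distance. It is not Wolfram Language code, the function names are hypothetical, and it compares letters literally, so a degenerate letter such as N is assumed to match only itself.

```python
# Illustrative plain Python, not Wolfram Language.

def hamming_distance(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))

def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, free if letters match
            ))
        prev = curr
    return prev[-1]

print(hamming_distance("GATTACA", "GACTATA"))  # 2
print(edit_distance("ACGT", "AGT"))            # 1
```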

Bio Subsequence Computation

LongestCommonSequence find the longest shared contiguous or disjoint sequence

LongestCommonSequencePositions find positions of the longest common sequence

LongestCommonSubsequence find the longest shared contiguous sequence

LongestCommonSubsequencePositions find positions of the longest common subsequence

Subsequences generate all subsequences of a given sequence
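The difference between a contiguous and a possibly disjoint common sequence can be seen in a small example. The following plain-Python sketch uses standard dynamic programming for both cases. The function names are hypothetical, this is not Wolfram Language code, and letters are compared literally with no wild-card handling.

```python
# Illustrative plain Python, not Wolfram Language.

def longest_common_contiguous(a, b):
    """Longest run of letters that appears unbroken in both sequences."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the run ending here
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]

def longest_common_disjoint(a, b):
    """Longest sequence of letters shared in order, gaps allowed."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Walk back through the table to recover one optimal sequence.
    out, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(longest_common_contiguous("ACGTTGCA", "ACGAAGCA"))  # ACG
print(longest_common_disjoint("ACGTTGCA", "ACGAAGCA"))    # ACGGCA
```

When several results tie in length, this sketch returns only one of them; a different tie-breaking rule would be equally valid.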

Bio Sequences as Strings

StringLength number of letters in the string for a bio sequence

StringPart  ▪  StringTake  ▪  StringDrop  ▪  StringInsert

StringReverse reverse the letters in the string for a bio sequence

StringRotateLeft  ▪  StringRotateRight

StringPadLeft  ▪  StringPadRight

StringPartition  ▪  StringJoin  ▪  StringSplit

StringPosition positions of substrings (including wild cards) in a bio sequence

StringCases all cases of string patterns in a bio sequence

StringCount count occurrences of a string pattern in a bio sequence

StringContainsQ  ▪  StringFreeQ  ▪  StringMatchQ

StringStartsQ  ▪  StringEndsQ

StringReplace make replacements for substrings or string patterns in a bio sequence

StringReplacePart replace substrings at specified positions in a bio sequence

StringRepeat  ▪  StringDelete
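Searching a sequence for a motif that contains wild cards can be sketched with regular expressions. The following plain-Python example is an illustration under stated assumptions, not Wolfram Language code: `find_positions` is a hypothetical helper, the searched sequence is assumed to contain only A, C, G and T, and positions are reported as 1-based inclusive start and end pairs.

```python
import re

# Illustrative plain Python, not Wolfram Language.
# Concrete bases matched by each IUPAC nucleotide letter.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def find_positions(dna, motif):
    """Return (start, end) pairs for every match, overlaps included."""
    body = "".join(f"[{IUPAC[letter]}]" for letter in motif)
    # A lookahead matches without consuming letters, so overlapping
    # occurrences are all reported.
    pattern = re.compile(f"(?=({body}))")
    return [(m.start() + 1, m.start() + len(motif))
            for m in pattern.finditer(dna)]

print(find_positions("ATGCATGA", "ATGN"))  # [(1, 4), (5, 8)]
print(find_positions("AAAA", "NN"))        # [(1, 2), (2, 3), (3, 4)]
```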

Bio Sequence Modifications

BioSequenceModify get a bio sequence modified in various ways

Bio Sequence Entities

Gene known human and other genes

Protein known human and other proteins

Sequence Types & Genetic Codes

BioSequenceType types of bio sequences ("DNA", "RNA", "Peptide", ...)

GeneticTranslationTable translation tables between nucleic acids and amino acids
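A translation table maps each three-letter codon to an amino acid or a stop signal. As a minimal sketch in plain Python (not Wolfram Language), the standard genetic code can be built from its usual compact form, with bases ordered T, C, A, G. The names `CODON_TABLE` and `translate` are hypothetical, and stopping at the first stop codon is a choice made for this example rather than a statement about how the functions above behave.

```python
from itertools import product

# Illustrative plain Python, not Wolfram Language.
# Standard genetic code in compact form: codons enumerated with bases in
# the order T, C, A, G, first position varying slowest. "*" marks a stop.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(seq, table=CODON_TABLE):
    """Translate DNA or RNA letters to one-letter amino acid codes."""
    seq = seq.upper().replace("U", "T")  # treat RNA like DNA
    peptide = []
    for i in range(0, len(seq) - 2, 3):  # ignore a trailing partial codon
        aa = table[seq[i:i + 3]]
        if aa == "*":                    # assumed: stop at first stop codon
            break
        peptide.append(aa)
    return "".join(peptide)

print(translate("ATGGCCTAA"))  # MA
print(translate("AUGUUUGGG"))  # MFG
```

Passing a different dictionary as `table` is one way to model alternative genetic codes, such as those used by mitochondria.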