Sequence Alignment & Comparison
The Wolfram Language includes state-of-the-art algorithms for sequence alignment and comparison, capable of handling strings and lists containing very large numbers of elements.
SequenceAlignment — find alignments between strings, allowing insertion and deletion
Diff — return a representation of diffs between two expressions
Diff3 — return a representation of the three-way diff between two expressions and their common ancestor
LongestCommonSubsequence — find the longest contiguous subsequence in common
LongestCommonSequence — find the longest sequence in common, perhaps disjoint
LongestCommonSubsequencePositions ▪ LongestCommonSequencePositions
StringPosition ▪ StringCases ▪ StringCount
SequencePosition ▪ SequenceCases ▪ SequenceCount
Subsequences — give all contiguous subsequences of a list
LongestOrderedSequence — find the longest ordered sequence in a list, perhaps disjoint
WarpingCorrespondence ▪ CanonicalWarpingCorrespondence
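As a rough illustration of how a few of the functions listed above are called, the following sketch uses small made-up inputs. The strings and lists are illustrative assumptions and do not come from this page; results are shown in comments only where they are easy to verify by hand.

```wolfram
(* Illustrative inputs only; the strings and lists are made up for this sketch *)

(* Align two strings; segments that differ are returned as pairs *)
SequenceAlignment["alignment", "assignment"]

(* Longest contiguous run shared by both strings *)
LongestCommonSubsequence["abcdefg", "xxcdexx"]   (* "cde" *)

(* Longest shared sequence, allowing gaps between matched elements *)
LongestCommonSequence["abcdefg", "axcxexg"]

(* Positions and count of a sublist within a list *)
SequencePosition[{a, b, c, a, b}, {a, b}]   (* {{1, 2}, {4, 5}} *)
SequenceCount[{a, b, c, a, b}, {a, b}]      (* 2 *)

(* The string analogs *)
StringPosition["abcabc", "bc"]   (* {{2, 3}, {5, 6}} *)
StringCount["abcabc", "bc"]      (* 2 *)
```

The contrast between LongestCommonSubsequence and LongestCommonSequence is the main point here: the former requires the shared elements to be adjacent, while the latter allows them to be separated.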
Similarity and Distance Measures
SmithWatermanSimilarity ▪ NeedlemanWunschSimilarity
EditDistance ▪ DamerauLevenshteinDistance ▪ HammingDistance ▪ ...
Nearest — find sequences nearest with respect to a distance measure
FindClusters — find clusters of sequences with respect to a distance measure
DistanceMatrix — construct the matrix of pairwise distances
BioSequence — a string-based representation of chained structures such as DNA
GenomeLookup — find exact matches with human and other genomes
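The distance measures and the functions that consume them can be combined as in the following sketch. The word list and query string are made-up assumptions, and EditDistance is passed explicitly through the DistanceFunction option so the sketch does not rely on any default choice of measure.

```wolfram
(* Illustrative inputs only; the words below are made up for this sketch *)

(* Pairwise distances between two strings *)
EditDistance["kitten", "sitting"]            (* 3 *)
HammingDistance["karolin", "kathrin"]        (* 3 *)
DamerauLevenshteinDistance["abcd", "acbd"]   (* 1, a single adjacent transposition *)

words = {"cat", "cart", "dog", "dot", "fog"};

(* Elements of words closest to "cot" under edit distance *)
Nearest[words, "cot", DistanceFunction -> EditDistance]

(* Matrix of all pairwise edit distances *)
DistanceMatrix[words, DistanceFunction -> EditDistance]

(* Group the words into clusters using the same measure *)
FindClusters[words, DistanceFunction -> EditDistance]
```

Using one distance function across Nearest, DistanceMatrix, and FindClusters keeps the results comparable; swapping in a different measure such as DamerauLevenshteinDistance would be done in the same option.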