SequenceAlignment
SequenceAlignment[s1,s2]
finds an optimal alignment of sequences of elements in the strings, lists or biomolecular sequences s1 and s2, and yields a list of successive matching and differing sequences.
Details and Options
- SequenceAlignment[s1,s2] gives a list of the form {seg1,seg2,…}, where each segment is either a single string or sequence of list elements u, representing a matching segment, or a pair {u1,u2}, representing segments that differ between s1 and s2.
- The following options can be given:
  GapPenalty        0          additional cost for each alignment gap
  IgnoreCase        False      whether to ignore case of letters in strings
  MergeDifferences  True       whether to combine adjacent differences
  Method            "Global"   alignment algorithm to be used
  SimilarityRules   Automatic  rules for similarities between elements
- SequenceAlignment attempts to find an alignment that maximizes the total similarity score.
- SequenceAlignment by default finds a global Needleman–Wunsch alignment of the complete strings or lists s1 and s2.
- With the option setting Method->"Local", it finds a local Smith–Waterman alignment.
- For sufficiently similar strings or lists, local and global alignment methods give the same result.
- SequenceAlignment also supports methods "AlignByLongestCommonSequence" and "AlignByLongestSubsequences", provided GapPenalty, MergeDifferences and SimilarityRules are all set to their respective defaults.
- Whereas the "Global" and "Local" methods both maximize a similarity score, "AlignByLongestCommonSequence" maximizes the number of characters or list elements common to both sequences.
- "AlignByLongestSubsequences" is effectively a divide-and-conquer heuristic approximation to aligning by the longest common (not necessarily contiguous) sequence, trading accuracy for speed. When the sequences are fairly close, the alignment quality is good and the method can be up to two orders of magnitude faster than the other methods.
- With the default setting SimilarityRules->Automatic, each match between two elements contributes 1 to the total similarity score, while each mismatch, insertion, or deletion contributes -1.
- Various named similarity matrices are supported, as specified in the notes for SimilarityRules.
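As a minimal sketch of the output form and default scoring described in the notes above, the following aligns two short strings chosen purely for illustration (they are not taken from the original page). The helper name defaultScore is hypothetical. It assumes string input and the default settings SimilarityRules->Automatic and GapPenalty->0, under which a differing pair {u1,u2} that contains no matches can score at best minus the longer of the two lengths.

```wolfram
(* Two illustrative strings that differ in a single character *)
align = SequenceAlignment["abcXdef", "abcYdef"]
(* expected form: {"abc", {"X", "Y"}, "def"} *)

(* Matching segments are plain strings; differences are pairs {u1, u2} *)
matches = Select[align, StringQ];
diffs = Select[align, ListQ];

(* Hypothetical helper: score under the default rules
   (+1 per match, -1 per mismatch, insertion or deletion) *)
defaultScore[a_List] :=
  Total[StringLength /@ Select[a, StringQ]] -
   Total[(Max[StringLength /@ #] &) /@ Select[a, ListQ]]

defaultScore[align]  (* 6 matched characters minus 1 mismatch *)
```

Substituting X for Y costs one mismatch, whereas introducing gaps would cost one deletion plus one insertion, so the substitution is the higher-scoring alignment under these defaults.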
Examples
Basic Examples (2)
Options (8)
GapPenalty (1)
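A hedged sketch of how GapPenalty might be explored, using two illustrative strings that are not from the original page. The second string contains two extra characters, so any alignment needs a gap; raising the penalty makes each gap more expensive relative to mismatches.

```wolfram
(* Illustrative strings: s2 has two extra characters in the middle *)
s1 = "abcdefgh";
s2 = "abcxxdefgh";

(* Default setting: no additional cost per gap *)
SequenceAlignment[s1, s2]

(* Each alignment gap now incurs an additional cost of 2 *)
SequenceAlignment[s1, s2, GapPenalty -> 2]
```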
IgnoreCase (1)
SequenceAlignment treats string input as case sensitive:
With IgnoreCase->True, SequenceAlignment will convert both strings to lowercase before aligning:
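The two settings can be compared with a sketch like the following. The input strings are assumptions made for illustration and differ only in letter case.

```wolfram
(* Case-sensitive by default: "H"/"h" and "W"/"w" count as differences *)
SequenceAlignment["Hello World", "hello world"]

(* Ignoring case, the two strings should align as a single matching segment *)
SequenceAlignment["Hello World", "hello world", IgnoreCase -> True]
```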
Method (3)
Default global alignment of two strings:
Local alignment of the same strings:
The "AlignByLongestCommonSequence" method maximizes the number of characters or list elements common to both sequences:
Take two texts, remove their diacritics and convert to lowercase:
The "AlignByLongestSubsequences" method can be significantly faster for similar sequences, but it can give a notably smaller set of matching characters:
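The Method settings described above can be compared side by side with a sketch along these lines. The two strings and the names normalize, methods and results are hypothetical and chosen only for illustration; the original example texts are not reproduced here.

```wolfram
(* Normalization step: strip diacritics, then lowercase *)
normalize = ToLowerCase[RemoveDiacritics[#]] &;

a = normalize["The quick brown fox jumps"];
b = normalize["A quick brown dog jumped high"];

methods = {"Global", "Local", "AlignByLongestCommonSequence",
   "AlignByLongestSubsequences"};

(* One alignment per method, keyed by method name *)
results = AssociationMap[SequenceAlignment[a, b, Method -> #] &, methods];

(* Total number of characters that fall in matching segments *)
Map[Total[StringLength /@ Select[#, StringQ]] &, results]
```

On inputs this short the methods may well agree; differences tend to show up on longer, less similar texts.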
Applications (4)
This gives the global alignment of two similar strings:
This shows the difference between global and local string alignment:
Obtain reference BRCA1 gene sequences for a human and a chimpanzee:
Check that their lengths are similar:
Align them using the default ("Global") method, using ByteCount to check the size of the result:
The "Local" method is slower, though it gives a more concise result:
Align using the longest sequence common to the pair:
Method "AlignByLongestSubsequences" is the fastest in this case and gives the smallest result:
Matching segments are close in total length, with the alignment using the longest common sequence having the largest matching part:
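A comparison of this kind can be sketched on synthetic data. The sequences below are random nucleotide strings with a few substitutions, not the BRCA1 sequences, and the helper name compare is hypothetical; timings will vary by machine and input.

```wolfram
SeedRandom[42];  (* make the synthetic data reproducible *)

(* A random 2000-letter nucleotide string *)
seq1 = StringJoin[RandomChoice[{"A", "C", "G", "T"}, 2000]];

(* A similar sequence: overwrite three positions with "N" *)
seq2 = StringReplacePart[seq1, "N", {#, #} & /@ {100, 700, 1500}];

(* Hypothetical helper: time one method and measure the result size *)
compare[m_String] := Module[{t, r},
  {t, r} = AbsoluteTiming[SequenceAlignment[seq1, seq2, Method -> m]];
  <|"Method" -> m, "Seconds" -> t, "ByteCount" -> ByteCount[r]|>]

compare /@ {"Global", "Local", "AlignByLongestCommonSequence",
  "AlignByLongestSubsequences"}
```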
Obtain two Scandinavian language versions of the UN Universal Declaration of Human Rights:
Align using both the default method and the "AlignByLongestSubsequences" method and compare by byte count:
The global method has around 60% of the characters in the matching sections:
The faster heuristic method also manages to get nearly 57% of the characters in the matching parts:
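One way to compute such a percentage is sketched below. The helper name matchedFraction is hypothetical, and measuring against the length of the first string is an assumption; the original page may normalize differently. The two phrases are short illustrative inputs, not the full declaration texts.

```wolfram
(* Hypothetical helper: share of s covered by matching segments *)
matchedFraction[align_List, s_String] :=
  N[Total[StringLength /@ Select[align, StringQ]]/StringLength[s]]

(* Short illustrative phrases in two related languages *)
t1 = "alle mennesker er født frie";
t2 = "alla människor är födda fria";

(* Default global alignment *)
matchedFraction[SequenceAlignment[t1, t2], t1]

(* Faster heuristic method *)
matchedFraction[
 SequenceAlignment[t1, t2, Method -> "AlignByLongestSubsequences"], t1]
```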
Possible Issues (1)
Neat Examples (1)
Compare two very similar genes:
Use Diff to see the difference graphically:
Text
Wolfram Research (2008), SequenceAlignment, Wolfram Language function, https://reference.wolfram.com/language/ref/SequenceAlignment.html (updated 2024).