SequenceAlignment

SequenceAlignment[s1,s2]

finds an optimal alignment of sequences of elements in the strings, lists or biomolecular sequences s1 and s2, and yields a list of successive matching and differing sequences.

Details and Options

  • SequenceAlignment[s1,s2] gives a list of the form {seg1,seg2,…}, where each seg_i is either a single string or sequence of list elements u, representing a matching segment, or a pair {u1,u2}, representing segments that differ between s1 and s2.
  • SequenceAlignment by default finds a global Needleman-Wunsch alignment of the complete strings or lists s1 and s2.
  • With the option setting Method->"Local", it finds a local Smith-Waterman alignment.
  • For sufficiently similar strings or lists, local and global alignment methods give the same result.
  • The following options can be given:
      GapPenalty          0           additional cost for each alignment gap
      IgnoreCase          False       whether to ignore case of letters in strings
      MergeDifferences    True        whether to combine adjacent differences
      Method              "Global"    alignment algorithm to be used
      SimilarityRules     Automatic   rules for similarities between elements
  • SequenceAlignment attempts to find an alignment that maximizes the total similarity score.
  • With the default setting SimilarityRules->Automatic, each match between two elements contributes 1 to the total similarity score, while each mismatch, insertion, or deletion contributes -1.
  • Various named similarity matrices are supported, as specified in the notes for SimilarityRules.
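
As a minimal sketch of how the options above are passed, the following call aligns two protein-like strings using a named similarity matrix together with a gap penalty. The input strings and the penalty value are illustrative assumptions, and "BLOSUM62" is assumed to be one of the named matrices listed in the notes for SimilarityRules.

```wolfram
(* Hypothetical protein fragments, chosen only for illustration *)
p1 = "PHEAEPGVV";
p2 = "PAWHEAE";

(* Score element pairs with a named matrix and add a cost for each gap *)
SequenceAlignment[p1, p2, SimilarityRules -> "BLOSUM62", GapPenalty -> 2]
```

The result has the {seg1,seg2,…} form described above: plain strings for matching segments and pairs for differing ones.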

Examples

Basic Examples  (3)

Globally align two similar strings:

Global alignment of two strings:
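
A minimal sketch of a global alignment, using hypothetical input strings rather than the original example input:

```wolfram
(* Hypothetical strings that share the words "kitten" and "sitting" in different order *)
s1 = "kitten sitting";
s2 = "sitting kitten";

(* Default Method -> "Global": aligns the complete strings *)
SequenceAlignment[s1, s2]
```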

Local alignment of the same strings:
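
A minimal sketch of a local alignment, again with hypothetical input strings rather than the original example input:

```wolfram
(* Hypothetical strings that share the words "kitten" and "sitting" in different order *)
s1 = "kitten sitting";
s2 = "sitting kitten";

(* Method -> "Local" looks for the best-scoring local match instead of aligning end to end *)
SequenceAlignment[s1, s2, Method -> "Local"]
```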

Global alignment of two instances of BioSequence:
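
A minimal sketch with two short DNA sequences. The sequences are made up for illustration, and the BioSequence["DNA", …] constructor form is assumed to be available in the version in use:

```wolfram
(* Hypothetical DNA sequences differing in a single base *)
b1 = BioSequence["DNA", "ACGTTGCA"];
b2 = BioSequence["DNA", "ACGTAGCA"];

(* Global alignment of the two biomolecular sequences *)
SequenceAlignment[b1, b2]
```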

Options  (4)

GapPenalty  (1)

By default, an alignment is found with two gaps:

Increasing the penalty for gaps forces another alignment with fewer gaps:
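
A minimal sketch comparing the default setting with a larger gap penalty. The strings and the penalty value are hypothetical, so the number of gaps in each result may differ from the counts mentioned above:

```wolfram
(* Hypothetical strings in which the second lacks two stretches of the first *)
s1 = "abcdefghij";
s2 = "abefij";

(* Default GapPenalty -> 0 *)
SequenceAlignment[s1, s2]

(* Each gap now costs an additional 2, which favors alignments with fewer gaps *)
SequenceAlignment[s1, s2, GapPenalty -> 2]
```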

MergeDifferences  (1)

This gives insertions, deletions, and replacements as separate differences:
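
A minimal sketch of the two settings side by side, using hypothetical input strings:

```wolfram
(* Hypothetical strings containing a replacement next to an insertion *)
s1 = "abcxyzdef";
s2 = "abcwdef";

(* Default MergeDifferences -> True: adjacent differences are combined *)
SequenceAlignment[s1, s2]

(* Keep insertions, deletions, and replacements as separate differences *)
SequenceAlignment[s1, s2, MergeDifferences -> False]
```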

SimilarityRules  (2)

Align two short protein sequences:

Assigning a negative score to the deletion of "V" gives a different alignment:
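
A hedged sketch of a custom rule set. The input strings and score values are hypothetical. The rule forms are assumptions based on the SimilarityRules notes: {"V", ""} is taken to denote deleting "V", {a_, a_} a match, and {_, _} any other pairing. Check them against the SimilarityRules reference before relying on them.

```wolfram
(* Hypothetical short protein sequences *)
p1 = "PHEAEPGVV";
p2 = "PAWHEAE";

(* Assumed rule forms: a specific penalty for deleting "V", then generic match and mismatch scores *)
rules = {{"V", ""} -> -2, {a_, a_} -> 1, {_, _} -> -1};

SequenceAlignment[p1, p2, SimilarityRules -> rules]
```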

Align with type-specific similarity rules that align degenerate letters:

Without the degenerate similarity rules, a perfect degenerate alignment is missed:

Applications  (2)

This gives the global alignment of two similar strings:

This shows the difference between global and local string alignment:

Possible Issues  (1)

When aligning nested lists, a list at level one can be a common element of the input lists:

Or a list at level one may denote a difference between the two input lists:

As the two outputs are identical, the output cannot be used to disambiguate the two cases:
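
The ambiguity described above can be sketched with symbolic inputs. The symbols x and y are placeholders assumed to have no assigned values; the inputs are illustrative and not necessarily those of the original example:

```wolfram
(* The lists {x} and {y} are common elements of both inputs *)
common = SequenceAlignment[{{x}, {y}}, {{x}, {y}}];

(* The single elements x and y differ between the inputs *)
different = SequenceAlignment[{x}, {y}];

(* Structural comparison of the two results *)
common === different
```

For inputs of this shape, the matching segment in the first result and the difference pair in the second are both expected to appear as {{x},{y}}, which is why the output alone cannot tell the two cases apart.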

Neat Examples  (1)

Compare two very similar genes:

Wolfram Research (2008), SequenceAlignment, Wolfram Language function, https://reference.wolfram.com/language/ref/SequenceAlignment.html (updated 2020).
