Wolfram Language & System Documentation Center

SimilarityRules

is an option for functions such as SequenceAlignment that gives a list of rules for similarity scores to assume between pairs of elements.

Details

The setting for SimilarityRules must be a list of rules of the form {e₁,e₂}->v, where e₁ and e₂ are the elements to compare and v is their similarity score.
Each eᵢ can be an explicit character or other element, or a pattern.
A rule for {e,""} gives the score for a deletion; a rule for {"",e} gives the score for an insertion.
SimilarityRules->Automatic is effectively equivalent to SimilarityRules -> {{a_, a_} -> 1, {a_, b_} -> -1}, giving a score of +1 for any pair of identical elements, and a score of -1 for any mismatch, deletion or insertion.
The following named settings for SimilarityRules implement various similarity matrices typically used for particular bioinformatics purposes:

	"BLAST"	alignment of nucleotide sequences
	"BLOSUM62"	local alignment of related amino acid sequences
	"BLOSUM80"	local alignment of similar sequences
	"PAM30"	global alignment of very similar amino acid sequences
	"PAM70"	global alignment of related sequences
	"PAM250"	global alignment of dissimilar sequences

When working on BioSequence arguments, the "SimilarDegenerateBases" setting allows any two letters that could stand for the same underlying chemical to match. For example, the degenerate DNA letter "N" can match a specific base such as "T".
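The rule forms described above can be sketched as follows. This is an illustration rather than part of the reference: the names autoLikeRules and gapRules and the score -5 are hypothetical. It assumes a fresh session in which a and b have no values, and, as the basic examples suggest, that pairs not covered by a custom rule keep their default scores. No outputs are shown because they have not been verified here.

```wolfram
(* Explicit rules that the Details describe as effectively equivalent to Automatic: *)
(* +1 for a pair of identical elements, -1 for any other pair. *)
autoLikeRules = {{a_, a_} -> 1, {a_, b_} -> -1};

(* If the stated equivalence holds, these two calls should return the same score. *)
NeedlemanWunschSimilarity["abc", "aXc"]
NeedlemanWunschSimilarity["abc", "aXc", SimilarityRules -> autoLikeRules]

(* Hypothetical gap rules: {"b", ""} scores a deletion of "b", *)
(* and {"", "b"} scores an insertion of "b". *)
gapRules = {{"b", ""} -> -5, {"", "b"} -> -5};
SequenceAlignment["abc", "ac", SimilarityRules -> gapRules]
```

Listing the more specific rules before general pattern rules, as in autoLikeRules, keeps the intent clear when several rules could apply to the same pair.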

Examples

Basic Examples (4)

Align two strings on the first "X":

Wolfram Language code: SequenceAlignment["XaXb", "Xa"]

Align instead on the second "X" by making the {"b","a"} replacement favorable:

Wolfram Language code: SequenceAlignment["XaXb", "Xa", SimilarityRules -> {{"b", "a"} -> 10}]

Get the global similarity of two strings using default similarity scores:

Wolfram Language code: NeedlemanWunschSimilarity["abc", "aXc"]

Change the scores so "b" and "X" are considered a match:

Wolfram Language code: NeedlemanWunschSimilarity["abc", "aXc", SimilarityRules -> {{"b", "X"} -> 1}]

Use the "PAM70" similarity matrix to globally align related protein sequences:

Wolfram Language code: NeedlemanWunschSimilarity[ProteinData["CCND2"], ProteinData["CCND3"], SimilarityRules -> "PAM70"]

See the difference in local alignment when using the "SimilarDegenerateBases" setting:

Wolfram Language code: SmithWatermanSimilarity[BioSequence["DNA", "AAATTCCAAANNTNCCAAAA"], BioSequence["DNA", "GGTTCC"]]

Wolfram Language code: SmithWatermanSimilarity[BioSequence["DNA", "AAATTCCAAANNTNCCAAAA"], BioSequence["DNA", "GGTTCC"], SimilarityRules -> "SimilarDegenerateBases"]

Use various named similarity matrices to globally align related protein sequences:

Wolfram Language code: NeedlemanWunschSimilarity[ProteinData["CCND2"], ProteinData["CCND3"], SimilarityRules -> "BLAST"]

Wolfram Language code: NeedlemanWunschSimilarity[ProteinData["CCND2"], ProteinData["CCND3"], SimilarityRules -> "BLOSUM62"]

Wolfram Language code: NeedlemanWunschSimilarity[ProteinData["CCND2"], ProteinData["CCND3"], SimilarityRules -> "BLOSUM80"]

Use various named similarity matrices to locally align related protein sequences:

Wolfram Language code: SmithWatermanSimilarity[ProteinData["CCND2"], ProteinData["CCND3"], SimilarityRules -> "PAM30"]

Wolfram Language code: SmithWatermanSimilarity[ProteinData["CCND2"], ProteinData["CCND3"], SimilarityRules -> "PAM70"]

Wolfram Language code: SmithWatermanSimilarity[ProteinData["CCND2"], ProteinData["CCND3"], SimilarityRules -> "PAM250"]
