---
title: "FindMoleculeSubstructure"
language: "en"
type: "Symbol"
summary: "FindMoleculeSubstructure[mol, patt] finds a mapping between the atom indices in mol and an occurrence of patt in mol. FindMoleculeSubstructure[mol, patt, All] finds all occurrences of patt in mol and returns all mappings. FindMoleculeSubstructure[mol, patt, n] finds at most n mappings."
keywords: 
- functional group
- molecule substructure
- cheminformatics
canonical_url: "https://reference.wolfram.com/language/ref/FindMoleculeSubstructure.html"
source: "Wolfram Language Documentation"
related_guides: 
  - 
    title: "Molecular Structure & Computation"
    link: "https://reference.wolfram.com/language/guide/MolecularStructureAndComputation.en.md"
  - 
    title: "Physics & Chemistry: Data and Computation"
    link: "https://reference.wolfram.com/language/guide/PhysicsAndChemistryDataAndComputation.en.md"
related_functions: 
  - 
    title: "Molecule"
    link: "https://reference.wolfram.com/language/ref/Molecule.en.md"
  - 
    title: "MoleculeContainsQ"
    link: "https://reference.wolfram.com/language/ref/MoleculeContainsQ.en.md"
  - 
    title: "MoleculePattern"
    link: "https://reference.wolfram.com/language/ref/MoleculePattern.en.md"
  - 
    title: "MoleculeMatchQ"
    link: "https://reference.wolfram.com/language/ref/MoleculeMatchQ.en.md"
  - 
    title: "MoleculeSubstructureCount"
    link: "https://reference.wolfram.com/language/ref/MoleculeSubstructureCount.en.md"
  - 
    title: "MoleculeFreeQ"
    link: "https://reference.wolfram.com/language/ref/MoleculeFreeQ.en.md"
  - 
    title: "MoleculePlot"
    link: "https://reference.wolfram.com/language/ref/MoleculePlot.en.md"
  - 
    title: "AtomList"
    link: "https://reference.wolfram.com/language/ref/AtomList.en.md"
  - 
    title: "BondList"
    link: "https://reference.wolfram.com/language/ref/BondList.en.md"
---
[EXPERIMENTAL]

# FindMoleculeSubstructure

FindMoleculeSubstructure[mol, patt] finds a mapping between the atom indices in mol and an occurrence of patt in mol.

FindMoleculeSubstructure[mol, patt, All] finds all occurrences of patt in mol and returns all mappings.

FindMoleculeSubstructure[mol, patt, n] finds at most n mappings.

## Details and Options

* ``FindMoleculeSubstructure`` returns a list of associations ``Association[p1 -> m1, p2 -> m2, …]`` where ``pi`` and ``mi`` are atom indices in ``patt`` and ``mol``, respectively.

* By default, substructure matches are pruned to remove multiple matches to the same set of atoms. Use the option ``Overlaps -> True`` to disable this behavior.

* ``FindMoleculeSubstructure`` takes the following options:

| Option                 | Default   | Description                             |
| ---------------------- | --------- | --------------------------------------- |
| IgnoreStereochemistry  | False     | whether to ignore stereochemistry       |
| IncludeHydrogens       | Automatic | whether to include hydrogen atoms       |
| Overlaps               | False     | whether to include matches that overlap |
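
As a minimal sketch of how the returned associations might be used, the following takes the water molecule from the Scope examples, collects the molecule-side atom indices of each match with ``Values``, and looks up the corresponding atoms with ``AtomList``. The variable names ``mol``, ``matches``, and ``matchedIndices`` are illustrative only.

```wl
(* Water: atom 1 is oxygen, atoms 2 and 3 are hydrogens *)
mol = Molecule[{"O", "H", "H"}, {Bond[{1, 2}, "Single"], Bond[{1, 3}, "Single"]}, {}];

(* Each match is an association: pattern atom index -> molecule atom index *)
matches = FindMoleculeSubstructure[mol, "H", All];

(* Keep only the molecule-side indices of every match *)
matchedIndices = Values /@ matches;

(* Extract the matched atoms from the molecule's full atom list *)
AtomList[mol][[#]] & /@ matchedIndices
```

Because the keys are pattern atom indices, ``Lookup`` can be used instead of ``Values`` when only some pattern atoms are of interest.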

---

## Examples (10)

### Basic Examples (2)

Find phenyl rings in a molecule:

```wl
In[1]:=
FindMoleculeSubstructure[Molecule[{"C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", 
  "C", "C", "C", "C"}, {Bond[{1, 2}, "Single"], Bond[{2, 3}, "Single"], Bond[{3, 4}, "Single"], 
  Bond[{4, 5}, "Aromatic"], Bond[{5, 6}, "Aromatic ... }, "Aromatic"], 
  Bond[{16, 17}, "Aromatic"], Bond[{17, 18}, "Aromatic"], Bond[{18, 19}, "Aromatic"], 
  Bond[{19, 20}, "Aromatic"], Bond[{14, 21}, "Single"], Bond[{21, 22}, "Single"], 
  Bond[{9, 4}, "Aromatic"], Bond[{20, 15}, "Aromatic"]}, {}], MoleculePattern["c1ccccc1"]]

Out[1]= {<|1 -> 4, 2 -> 5, 3 -> 6, 4 -> 7, 5 -> 8, 6 -> 9|>}
```

---

Find all alcohol functional groups:

```wl
In[1]:=
FindMoleculeSubstructure[Molecule[{"C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "O", "C", "C", "C", "C", "C", "C", 
  "C", "O", "C", "C", "C", "O", "H", "H", "H", "H", "H", "H", "H", "H", "H", "H", "H", "H", "H", 
  "H", "H", "H", "H", "H", "H", "H", "H", "H", "H ...     "Direction" -> "Counterclockwise"], Association["StereoType" -> "Tetrahedral", 
     "ChiralCenter" -> 16, "Direction" -> "Counterclockwise"], 
    Association["StereoType" -> "Tetrahedral", "ChiralCenter" -> 19, "Direction" -> "Clockwise"]}}], MoleculePattern[{"C", "O", "H"}, {Bond[{1, 2}], Bond[{2, 3}]}], All]

Out[1]= {<|1 -> 11, 2 -> 12, 3 -> 35|>, <|1 -> 19, 2 -> 20, 3 -> 43|>, <|1 -> 23, 2 -> 24, 3 -> 50|>}
```

### Scope (4)

Use an atomic symbol string as a pattern:

```wl
In[1]:= FindMoleculeSubstructure[Molecule[{"O", "H", "H"}, {Bond[{1, 2}, "Single"], Bond[{1, 3}, "Single"]}, {}], "O"]

Out[1]= {<|1 -> 1|>}

In[2]:= FindMoleculeSubstructure[Molecule[{"O", "H", "H"}, {Bond[{1, 2}, "Single"], Bond[{1, 3}, "Single"]}, {}], "H", All]

Out[2]= {<|1 -> 2|>, <|1 -> 3|>}
```
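
The third argument can also be an integer, as described in the usage above. A minimal sketch, assuming the same water molecule, that asks for at most one mapping to a hydrogen atom (the variable name ``water`` is illustrative):

```wl
water = Molecule[{"O", "H", "H"}, {Bond[{1, 2}, "Single"], Bond[{1, 3}, "Single"]}, {}];

(* Request at most one mapping; the result is still a list of associations *)
FindMoleculeSubstructure[water, "H", 1]
```

Which of the two hydrogen atoms is reported is not specified here.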

---

Match an atom with a specific formal charge or mass number:

```wl
In[1]:=
m = Molecule[{Atom["C"], Atom["C", "MassNumber" -> 14], Atom["O"], Atom["O", "FormalCharge" -> -1], 
  Atom["H"], Atom["H"], Atom["H"]}, {Bond[{1, 2}, "Single"], Bond[{2, 3}, "Double"], 
  Bond[{2, 4}, "Single"], Bond[{1, 5}, "Single"], Bond[{1, 6}, "Single"], Bond[{1, 7}, "Single"]}];

In[2]:= FindMoleculeSubstructure[m, Atom["O", "FormalCharge" -> -1]]

Out[2]= {<|1 -> 4|>}

In[3]:= FindMoleculeSubstructure[m, Atom["C", "MassNumber" -> 14]]

Out[3]= {<|1 -> 2|>}
```

---

Use ``Atom`` to make a more general pattern. Find all charged atoms:

```wl
In[1]:= m = Molecule[Entity["Chemical", "Methyl4Nitrobutyrate"]];

In[2]:= FindMoleculeSubstructure[m, Atom["FormalCharge" -> Except[0]], All]

Out[2]= {<|1 -> 3|>, <|1 -> 5|>}
```

Find positively charged atoms:

```wl
In[3]:= FindMoleculeSubstructure[m, Atom["FormalCharge" -> GreaterThan[0]], All]

Out[3]= {<|1 -> 5|>}
```

Find negatively charged atoms:

```wl
In[4]:= FindMoleculeSubstructure[m, Atom["FormalCharge" -> LessEqualThan[-1]], All]

Out[4]= {<|1 -> 3|>}
```

---

Use ``Bond`` to define a pattern for any double bond:

```wl
In[1]:= m = Molecule["2-[[(1E,4E)-1,5-bis(5-nitrofuran-2-yl)penta-1,4-dien-3-ylidene]amino]guanidine"]

Out[1]=
Molecule[{"N", "C", "N", "N", "N", "C", "C", "C", "C", "C", "C", "C", 
  Atom["N", "FormalCharge" -> 1], "O", Atom["O", "FormalCharge" -> -1], "O", "C", "C", "C", "C", 
  "C", "C", Atom["N", "FormalCharge" -> 1], "O", Atom["O", "FormalCharge" -> -1 ...  -> 
   {Association["StereoType" -> "DoubleBond", "StereoBond" -> {7, 8}, "Ligands" -> {6, 9}, 
     "Value" -> "Opposite"], Association["StereoType" -> "DoubleBond", "StereoBond" -> {17, 18}, 
     "Ligands" -> {6, 19}, "Value" -> "Opposite"]}}]

In[2]:= FindMoleculeSubstructure[m, Bond[_, "Double"], All]

Out[2]= {<|1 -> 2, 2 -> 4|>, <|1 -> 5, 2 -> 6|>, <|1 -> 7, 2 -> 8|>, <|1 -> 13, 2 -> 14|>, <|1 -> 17, 2 -> 18|>, <|1 -> 23, 2 -> 24|>}
```

Find only double bonds that involve a nitrogen atom:

```wl
In[3]:= FindMoleculeSubstructure[m, Bond[{"N", _}, "Double"], All]

Out[3]= {<|1 -> 4, 2 -> 2|>, <|1 -> 5, 2 -> 6|>, <|1 -> 13, 2 -> 14|>, <|1 -> 23, 2 -> 24|>}
```

Find double bonds that involve a charged atom:

```wl
In[4]:= FindMoleculeSubstructure[m, Bond[{Atom["FormalCharge" -> Except[0]], _}, "Double"], All]

Out[4]= {<|1 -> 13, 2 -> 14|>, <|1 -> 23, 2 -> 24|>}
```

### Options (3)

#### IgnoreStereochemistry (1)

By default, stereochemistry is taken into account, so a pattern with the opposite configuration does not match:

```wl
In[1]:=
FindMoleculeSubstructure[Molecule["l-alanine"], 
	MoleculePattern["C[C@@H](N)C(=O)O"]]

Out[1]= {}
```

Use ``IgnoreStereochemistry -> True`` to get a positive match:

```wl
In[2]:=
FindMoleculeSubstructure[Molecule["l-alanine"], 
	MoleculePattern["C[C@@H](N)C(=O)O"], IgnoreStereochemistry -> True]

Out[2]= {<|1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4, 5 -> 5, 6 -> 6|>}
```

#### IncludeHydrogens (1)

By default, substructure matching is performed using a molecule's hydrogen-suppressed graph, unless the pattern contains explicit hydrogen atoms:

```wl
In[1]:=
FindMoleculeSubstructure[Molecule[{"C", "C", "C", "Cl"}, {Bond[{1, 2}, "Single"], Bond[{2, 3}, "Single"], 
  Bond[{2, 4}, "Single"]}, {}], Bond[{"C", "H"}, "Single"], All]

Out[1]= {<|1 -> 1, 2 -> 5|>, <|1 -> 1, 2 -> 6|>, <|1 -> 1, 2 -> 7|>, <|1 -> 2, 2 -> 8|>, <|1 -> 3, 2 -> 9|>, <|1 -> 3, 2 -> 10|>, <|1 -> 3, 2 -> 11|>}
```

Matches to hydrogen can be missed when the pattern does not name hydrogen explicitly. The following pattern describes a carbon atom bonded to either a hydrogen or a chlorine atom, but only the C-Cl bond is found:

```wl
In[2]:=
FindMoleculeSubstructure[Molecule[{"C", "C", "C", "Cl"}, {Bond[{1, 2}, "Single"], Bond[{2, 3}, "Single"], 
  Bond[{2, 4}, "Single"]}, {}], Bond[{"C", Atom["AtomicNumber" -> (1 | 17)]}, "Single"], All]

Out[2]= {<|1 -> 2, 2 -> 4|>}
```

Use the option ``IncludeHydrogens -> True`` to make sure hydrogens are treated as explicit for the purposes of pattern matching:

```wl
In[3]:=
FindMoleculeSubstructure[Molecule[{"C", "C", "C", "Cl"}, {Bond[{1, 2}, "Single"], Bond[{2, 3}, "Single"], 
  Bond[{2, 4}, "Single"]}, {}], Bond[{"C", Atom["AtomicNumber" -> (1 | 17)]}, "Single"], All, IncludeHydrogens -> True]

Out[3]= {<|1 -> 1, 2 -> 5|>, <|1 -> 1, 2 -> 6|>, <|1 -> 1, 2 -> 7|>, <|1 -> 2, 2 -> 4|>, <|1 -> 2, 2 -> 8|>, <|1 -> 3, 2 -> 9|>, <|1 -> 3, 2 -> 10|>, <|1 -> 3, 2 -> 11|>}
```

#### Overlaps (1)

By default, substructure matches are pruned to remove multiple matches to the same set of atoms:

```wl
In[1]:= FindMoleculeSubstructure[Molecule["hexane"], MoleculePattern["CCCCC"], All]

Out[1]= {<|1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4, 5 -> 5|>, <|1 -> 2, 2 -> 3, 3 -> 4, 4 -> 5, 5 -> 6|>}
```

Use the option ``Overlaps -> True`` to find all possible matchings between the pattern and molecule:

```wl
In[2]:= FindMoleculeSubstructure[Molecule["hexane"], MoleculePattern["CCCCC"], All, Overlaps -> True]

Out[2]= {<|1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4, 5 -> 5|>, <|1 -> 2, 2 -> 3, 3 -> 4, 4 -> 5, 5 -> 6|>, <|1 -> 5, 2 -> 4, 3 -> 3, 4 -> 2, 5 -> 1|>, <|1 -> 6, 2 -> 5, 3 -> 4, 4 -> 3, 5 -> 2|>}
```
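
One way to see which of these mappings cover the same atoms is to group them by their sorted molecule atom indices. This is a sketch using the general-purpose functions ``GatherBy``, ``Sort``, and ``Values``; the name ``allMatches`` is illustrative:

```wl
allMatches = FindMoleculeSubstructure[Molecule["hexane"], MoleculePattern["CCCCC"], All, Overlaps -> True];

(* Mappings that differ only in direction cover the same atoms and land in the same group *)
GatherBy[allMatches, Sort[Values[#]] &]
```

With the four mappings shown above, this gives two groups, one for each distinct set of five atoms.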

### Applications (1)

Write a function to locate the nitrogen and carbonyl carbon at the N-terminus of a protein:

```wl
In[1]:=
nterminus[mol_] := Module[
	{patt, atomsOfInterest = {1, 3}},
	(* Amine nitrogen, sp3 carbon, and a carbonyl carbon bonded to O or N *)
	patt = MoleculePattern[
		{Atom["N", "HydrogenCount" -> 2], Atom["C", "OrbitalHybridization" -> "SP3"], 
		 Atom["C"], Atom["O"], Atom["O" | "N"]},
		{Bond[{1, 2}, "Single"], Bond[{2, 3}, "Single"], 
		 Bond[{3, 4}, "Double"], Bond[{3, 5}, "Single"]}];
	(* Keep pattern atoms 1 (amine N) and 3 (carbonyl C); return <||> if nothing matches *)
	DeleteMissing[AssociationThread[{"amine", "carbonyl"}, 
		Lookup[First[FindMoleculeSubstructure[mol, patt], {}], atomsOfInterest]]]
	]
```

Apply the function to a peptide:

```wl
In[2]:=
peptide = BioSequence["Peptide", "VGSA"];
nterminus[peptide]

Out[2]= <|"amine" -> 1, "carbonyl" -> 3|>
```

Highlight the atoms in a 2D plot:

```wl
In[3]:= MoleculePlot[peptide, %]

Out[3]= [image]
```

## See Also

* [`Molecule`](https://reference.wolfram.com/language/ref/Molecule.en.md)
* [`MoleculeContainsQ`](https://reference.wolfram.com/language/ref/MoleculeContainsQ.en.md)
* [`MoleculePattern`](https://reference.wolfram.com/language/ref/MoleculePattern.en.md)
* [`MoleculeMatchQ`](https://reference.wolfram.com/language/ref/MoleculeMatchQ.en.md)
* [`MoleculeSubstructureCount`](https://reference.wolfram.com/language/ref/MoleculeSubstructureCount.en.md)
* [`MoleculeFreeQ`](https://reference.wolfram.com/language/ref/MoleculeFreeQ.en.md)
* [`MoleculePlot`](https://reference.wolfram.com/language/ref/MoleculePlot.en.md)
* [`AtomList`](https://reference.wolfram.com/language/ref/AtomList.en.md)
* [`BondList`](https://reference.wolfram.com/language/ref/BondList.en.md)

## Related Guides

* [Molecular Structure & Computation](https://reference.wolfram.com/language/guide/MolecularStructureAndComputation.en.md)
* [Physics & Chemistry: Data and Computation](https://reference.wolfram.com/language/guide/PhysicsAndChemistryDataAndComputation.en.md)

## History

* [Introduced in 2019 (12.0)](https://reference.wolfram.com/language/guide/SummaryOfNewFeaturesIn120.en.md)