ProteinData

ProteinData["prot"]
gives the reference amino acid sequence for the protein prot.

ProteinData["prot","property"]
gives the value of the specified property for the protein prot.

Details

  • Proteins are specified by their standard names, given as strings.
  • ProteinData[] gives a list of all reference human proteins.
  • Protein sequences are represented as strings of standard single-letter amino acid codes.
  • Fundamental properties include:
    "MolecularWeight"    total molecular weight in daltons
  • Sequence properties for proteins include:
    "DNACodingSequence"          base pair sequence coding for the protein
    "DNACodingSequenceLength"    length of base pair sequence coding for the protein
    "Gene"                       gene that codes for the protein
    "Sequence"                   amino acid sequence for the protein
    "SequenceLength"             length of amino acid sequence for the protein
  • Protein structures may contain additional elements not explicitly encoded in the original DNA sequence.
  • Molecular structure properties based on residues include:
    "DihedralAngles"             list of dihedral angles ϕ, ψ, ω in radians
    "SecondaryStructureRules"    list of rules giving start and end positions of helix, sheet, etc. structures
  • Molecular structure properties based on individual atoms include:
    "AdditionalAtomPositions"    list of 3D coordinates of additional atoms
    "AdditionalAtomTypes"        list of element symbols for additional atoms
    "AtomPositions"              list of 3D coordinates of protein atoms
    "AtomRoles"                  list of structural roles for protein atoms
    "AtomTypes"                  list of element symbols for protein atoms
    "GyrationRadius"             radius of gyration
    "MoleculePlot"               3D molecular structure plot
  • Distances are measured in picometers.
  • ProteinData["prot","prop",grouping] gives molecular structure properties with various groupings:
    {}            no grouping
    "Chain"       group by chain
    "Residue"     group by residue
    {g1,g2,…}     list of grouping criteria
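
As a sketch of the grouping argument described above, the calls below request atom coordinates ungrouped, by chain, by residue, and with a list of criteria. The protein name "A2M" is an assumption used for illustration; substitute any name returned by ProteinData[] for which structure data is available.

```wolfram
(* "A2M" is an assumed example name, not taken from this page *)
prot = "A2M";

(* No grouping: atom coordinates as a single list *)
flat = ProteinData[prot, "AtomPositions", {}];

(* Group the same coordinates by chain, or by residue *)
byChain = ProteinData[prot, "AtomPositions", "Chain"];
byResidue = ProteinData[prot, "AtomPositions", "Residue"];

(* A list of grouping criteria, following the {g1,g2,…} form *)
byBoth = ProteinData[prot, "AtomPositions", {"Chain", "Residue"}];
```

Comparing Length[byChain] with the length of the "ChainLabels" property may be a quick way to check that the grouping behaves as expected for a given protein.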
  • Properties associated with chains within structures include:
    "ChainLabels"       list of identifiers for 3D structure chains
    "ChainSequences"    list of amino acid sequences for 3D structure chains
  • Protein common domain properties include:
    "DomainIDs"          NCBI CDD numbers of domains
    "DomainPositions"    positions of domains in the protein sequence
    "Domains"            names of domains in the protein
  • Functional properties include:
    "BiologicalProcesses"    biological processes associated with the protein
    "CellularComponents"     cellular components in which the protein is found
    "MolecularFunctions"     molecular functions of the protein
  • Protein identification properties include:
    "AlternateNames"    alternate traditional names
    "GeneID"            GeneID number string for the protein's gene
    "Name"              traditional name
    "NCBIAccessions"    NCBI accession strings
    "PDBIDList"         list of all PDB ID strings
    "PrimaryPDBID"      PDB ID chosen in the Wolfram Language for structure properties, etc.
    "StandardName"      standard Wolfram Language name
  • ProteinData["prot","prop","Units"] gives the units for a particular property value.
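
A minimal sketch of the "Units" form, assuming the illustrative protein name "A2M" (an assumption, not taken from this page). The returned unit specifications are not shown here because their exact form may differ between versions.

```wolfram
(* "A2M" is an assumed example name *)
prot = "A2M";

(* Units of the molecular weight (documented above as daltons) *)
ProteinData[prot, "MolecularWeight", "Units"]

(* Units of a distance-valued property (documented above as picometers) *)
ProteinData[prot, "GyrationRadius", "Units"]
```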

Examples

Basic Examples  (6)

Get a list of human proteins:

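
The original inputs for this example were not preserved. A plausible sketch: the first call abbreviates the full list with Shallow, as the surviving output label suggests. The second call, counting the entries with Length, is an assumption about what the lost input did.

```wolfram
(* Abbreviated view of the list of all reference human proteins *)
ProteinData[] // Shallow

(* Assumed follow-up: count how many proteins are listed *)
Length[ProteinData[]]
```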

Display the ribbon diagram:

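
A sketch of the call, using the "MoleculePlot" property listed under Details. The protein name "A2M" is an assumption chosen for illustration. Proteins without an associated 3D structure may not return a plot.

```wolfram
(* "A2M" is an assumed example name; "MoleculePlot" gives the 3D structure plot *)
ProteinData["A2M", "MoleculePlot"]
```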

Get the amino acid sequence of a protein:

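
A sketch using the single-argument form, which gives the reference sequence as a string of single-letter amino acid codes. The name "A2M" is an assumption used for illustration.

```wolfram
(* Single-argument form: reference amino acid sequence for the protein *)
seq = ProteinData["A2M"];

(* The explicit property form should give the same sequence *)
seq === ProteinData["A2M", "Sequence"]
```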

Get the molecular weight of a protein:

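
A sketch of the property lookup, with "A2M" as an assumed example name. Per the Details section, the value is in daltons.

```wolfram
(* Total molecular weight of the assumed example protein *)
ProteinData["A2M", "MolecularWeight"]
```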

Get the number of amino acids in a protein sequence:

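
A sketch of the property lookup, again assuming the example name "A2M". The second line is an optional cross-check against the sequence string itself.

```wolfram
(* Number of amino acids in the reference sequence *)
n = ProteinData["A2M", "SequenceLength"]

(* Cross-check: this should agree with the length of the sequence string *)
n == StringLength[ProteinData["A2M", "Sequence"]]
```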

Get the coordinates of atoms in a 3D protein structure:

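
A sketch of the call, abbreviated with Short as the surviving output label suggests. The name "A2M" is an assumption, and the lookup presumes that structure data exists for the chosen protein. Per the Details section, coordinates are in picometers.

```wolfram
(* 3D coordinates of the protein atoms, shown in abbreviated form *)
ProteinData["A2M", "AtomPositions"] // Short
```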
Introduced in 2008 (7.0) | Updated in 2014 (10.0)