PDB (.pdb)

MIME type: chemical/x-pdb
Protein Data Bank PDB files.
3D molecular model file.
Used in bioinformatics applications and on the web for storing and exchanging molecule models.
PDB is an acronym for Protein Data Bank.
Plain text format.
Stores structure information for large biological molecules such as proteins and nucleic acids.
Does not store chemical bond information.
Developed in 1971 at Brookhaven National Laboratory.
Maintained by the Research Collaboratory for Structural Bioinformatics (RCSB).
  • Import supports Version 2.3 and previous versions of the PDB format, as well as several common variants.
  • Export generates PDB 2.3 files.

Import and Export

  • Import["file.pdb"] reads a PDB file and returns a stylized rendering of the protein.
  • The Wolfram Language provides a variety of 3D rendering styles for macromolecules.
  • Export["file.pdb",expr] creates a PDB file from a 3D model of a molecule.
  • Import["file.pdb"] returns a Graphics3D object.
  • Import["file.pdb",elem] imports the specified element from a PDB file.
  • Import["file.pdb",{elem,suba,subb,…}] imports a subelement.
  • Import["file.pdb",{{elem1,elem2,…}}] imports multiple elements.
  • The import format can be specified with Import["file","PDB"] or Import["file",{"PDB",elem,…}].
  • Export["file.pdb",{elem1->expr1,elem2->expr2,…}] uses rules to specify the elements to be exported.
  • See the reference pages for full general information on Import and Export.
  • ImportString and ExportString support the PDB format.
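
The calling forms above can be sketched together as follows. This is a minimal illustration under stated assumptions: the path "ExampleData/1PPT.pdb" is an assumed sample file (substitute any PDB file you have), and the variable names are arbitrary.

```wolfram
(* Assumed sample path; replace with any local PDB file *)
file = "ExampleData/1PPT.pdb";

(* Default import: a Graphics3D rendering of the molecule *)
g = Import[file];

(* A single element *)
title = Import[file, "Title"];

(* Several elements at once; note the doubled braces *)
{labels, seq} = Import[file, {{"ResidueChainLabels", "Sequence"}}];

(* Name the format explicitly, e.g. when the extension is missing or wrong *)
elems = Import[file, {"PDB", "Elements"}];

(* The same kind of request on a string instead of a file *)
text = Import[file, "Text"];
ImportString[text, {"PDB", "Title"}]
```

Naming the format explicitly is mainly useful for files or strings whose content type cannot be inferred from a file name.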

Elements

  • General Import elements:
  • "Elements"    list of elements and options available in this file
    "Rules"       full list of rules for each element and option
    "Options"     list of rules for options, properties, and settings
  • Export uses the element by default.
  • Graphics element:
  • "Graphics3D"    PDB file rendered as a Graphics3D object
  • Import uses the "Graphics3D" element by default for the PDB format.
  • Data representation elements:
  • "AdditionalAtoms"          atoms that are not constituents of a chain
    "AdditionalCoordinates"    3D coordinates of additional atoms
    "AdditionalIndex"          index of additional atoms in "VertexCoordinates" and "VertexTypes"
    "AdditionalResidues"       additional residue sequences given as an array of three-letter abbreviations
    "ResidueAtoms"             list of residue atoms
    "ResidueChainLabels"       list of chain labels
    "ResidueCoordinates"       3D coordinates of residue atoms
    "ResidueIndex"             index of residue atoms in "VertexCoordinates" and "VertexTypes"
    "ResidueRoles"             functional roles of residue atoms
    "Residues"                 residue sequences given as an array of three-letter abbreviations
    "Resolution"               spatial resolution of the model coordinates in picometers
    "SecondaryStructure"       rules describing the large-scale structure of a chain
    "Sequence"                 residue sequences given as a list of strings
    "VertexCoordinates"        atomic coordinates, typically given in picometers
    "VertexTypes"              all atoms or groups constituting the molecule, typically given as a list of chemical element abbreviations
  • When a chain read from a PDB file is missing one or more residues, the Wolfram Language represents it as a sequence of individual subchains.
  • The Wolfram Language uses the standard IUB/IUPAC abbreviations for amino acid residues:
  • A    alanine (Ala)
    C    cysteine (Cys)
    D    aspartic acid (Asp)
    E    glutamic acid (Glu)
    F    phenylalanine (Phe)
    G    glycine (Gly)
    H    histidine (His)
    I    isoleucine (Ile)
    K    lysine (Lys)
    L    leucine (Leu)
    M    methionine (Met)
    N    asparagine (Asn)
    P    proline (Pro)
    Q    glutamine (Gln)
    R    arginine (Arg)
    S    serine (Ser)
    T    threonine (Thr)
    V    valine (Val)
    W    tryptophan (Trp)
    Y    tyrosine (Tyr)
    X    unspecified or unknown amino acid (Unk)
  • The following abbreviations are used to represent nucleic acids:
  • A    adenosine
    C    cytidine
    G    guanosine
    I    inosine
    T    thymidine
    U    uridine
    X    unspecified or unknown nucleic acid
  • When importing a PDB file that describes multiple 3D models of the same molecule, the following Import elements can be used to read the geometries of all models:
  • "ResidueCoordinatesList"       residue coordinates for each model
    "AdditionalCoordinatesList"    3D coordinates of additional atoms for each model
    "VertexCoordinatesList"        atomic coordinates for each model, typically given in picometers
  • Meta-information elements:
  • "Authors"              author information as referenced in the file
    "Comments"             comments and remarks stored in the file, given as a list of strings
    "DepositionDate"       when the file was added to the database
    "Organism"             organism in which the protein occurs
    "PDBClassification"    PDB classification from the file header
    "PDBID"                PDB structure identification string
    "References"           bibliographic references, given as rules
    "Title"                document title

Options

  • General rendering options:
  • ImageSize        Automatic    specifies the overall size of the graphics to display
    Background       White        specifies what background color to use
    ColorFunction    Automatic    function to apply to determine the coloring of secondary structure visualizations
    ViewPoint        Automatic    point in space from which the 3D model is to be viewed
  • With the default setting ViewPoint->Automatic, the Wolfram Language automatically calculates the optimal viewing angle for the imported molecule model.
  • Selecting a rendering style:
  • "Rendering"    "Structure"    specifies the visualization method
  • Possible settings for "Rendering" are:
  • "BallAndStick"    displays atoms and bonds as a ball-and-stick model
    "Structure"       stylized rendering of the protein backbone
    "Spacefilling"    atoms shown as overlapping spheres
    "Wireframe"       bonds rendered as lines

Examples

Basic Examples (5)

Import a large PDB file from the RCSB Protein Data Bank website:


Get the title of this PDB file:


Import the labels for each chain in the above molecule:

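
A sketch of the three inputs above. It assumes the current RCSB download URL pattern and uses entry 4HHB purely as an illustration; the entry used on the original page is not known here.

```wolfram
(* Illustrative URL and entry ID; any RCSB entry would do *)
url = "https://files.rcsb.org/download/4HHB.pdb";

Import[url]                          (* stylized 3D rendering *)
Import[url, "Title"]                 (* title of the PDB file *)
Import[url, "ResidueChainLabels"]    (* label of each chain *)
```

Each call fetches the file again, so for large entries it may be preferable to download the file once and import from the local copy.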

Show the Import elements available in a sample file:


Get the name of the organism referenced in this file:


Import the bibliographic references from this file:


Import the residue sequence:


This gives the same sequence as a string of single-character abbreviations:


Get structural information about this molecule:

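
The metadata and sequence inputs above map naturally onto the documented Import elements. A minimal sketch, assuming the sample file is "ExampleData/1PPT.pdb" (replace it with any PDB file) and that "SecondaryStructure" is the element meant by "structural information":

```wolfram
file = "ExampleData/1PPT.pdb";  (* assumed sample file *)

Import[file, "Elements"]            (* available Import elements *)
Import[file, "Organism"]            (* organism referenced in the file *)
Import[file, "References"]          (* bibliographic references, as rules *)
Import[file, "Residues"]            (* three-letter residue abbreviations *)
Import[file, "Sequence"]            (* single-character sequence strings *)
Import[file, "SecondaryStructure"]  (* large-scale structure of each chain *)
```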

Show the protein backbone in a stylized form:


Show the same protein, using standard colors for each residue:


This imports the sample file as a ball-and-stick graphic:


Show the same protein as a wireframe model:


Import residue data:

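
One way residue data might be imported and rescaled. This sketch assumes the sample file "ExampleData/1PPT.pdb" and assumes that residue coordinates use the same picometer scale documented for "VertexCoordinates"; the variable names are arbitrary.

```wolfram
file = "ExampleData/1PPT.pdb";  (* assumed sample file *)

(* Atom names and their coordinates, read in one call *)
{atoms, coords} = Import[file, {{"ResidueAtoms", "ResidueCoordinates"}}];

(* 1 angstrom = 100 pm; division threads over nested lists *)
coordsAngstrom = coords/100.;
```

Because arithmetic on lists is applied elementwise, the conversion does not depend on how the coordinates are nested by chain or residue.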

This imports the sample file, rendering atoms as space-filling spheres:

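
The rendering examples above differ only in the "Rendering" setting. A sketch that produces all four documented styles, assuming the sample file "ExampleData/1PPT.pdb"; the ImageSize, Background, and ViewPoint values are arbitrary choices for illustration:

```wolfram
file = "ExampleData/1PPT.pdb";  (* assumed sample file *)

(* One image per documented "Rendering" setting *)
styles = {"Structure", "BallAndStick", "Spacefilling", "Wireframe"};
Table[Import[file, "Rendering" -> s, ImageSize -> 200], {s, styles}]

(* General rendering options combine with a rendering style *)
Import[file, "Rendering" -> "BallAndStick",
  Background -> Black, ViewPoint -> {0, 0, 2}]
```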

Import a DNA model:


Import RNA and DNA sequences from this file:


Read all data from a PDB file, and export it back to PDB:

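
A round trip of this kind can be sketched with the "Rules" element, which returns the rule list that Export accepts. The input path and the output file name are assumptions made for illustration, and the sketch has not been checked against every PDB variant.

```wolfram
(* Read every element of the file as a list of rules *)
rules = Import["ExampleData/1PPT.pdb", "Rules"];  (* assumed sample file *)

(* Write the rules back out; the output name is hypothetical *)
out = FileNameJoin[{$TemporaryDirectory, "roundtrip.pdb"}];
Export[out, rules]

(* Read one element back to check the new file *)
Import[out, "Title"]
```

Since Export generates PDB 2.3 files, the exported text is not guaranteed to match the original file byte for byte.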

Import a simple 3D model from a MOL file, and export it to PDB:


Import the resulting PDB file as a 3D graphic:

Introduced in 2007 (6.0) | Updated in 2014 (10.0)