PDB (.pdb)

Background & Context

    • MIME type: chemical/x-pdb
    • Protein Data Bank PDB files.
    • 3D molecular model file.
    • Used in bioinformatics applications and on the web for storing and exchanging molecule models.
    • PDB is an acronym for Protein Data Bank.
    • Plain text format.
    • Stores structure information for large biological molecules such as proteins and nucleic acids.
    • Does not store chemical bond information.
    • Developed in 1971 at Brookhaven National Laboratory.
    • Maintained by the Research Collaboratory for Structural Bioinformatics (RCSB).

Import & Export

  • Import["file.pdb"] reads a PDB file and returns a stylized rendering of the protein.
  • The Wolfram Language provides a variety of 3D rendering styles for macromolecules.
  • Export["file.pdb",expr] creates a PDB file from a 3D model of a molecule.
  • Import["file.pdb"] returns a Graphics3D object.
  • Import["file.pdb",elem] imports the specified element from a PDB file.
  • Import["file.pdb",{elem,suba,subb,…}] imports a subelement.
  • Import["file.pdb",{{elem1,elem2,…}}] imports multiple elements.
  • The import format can be specified with Import["file","PDB"] or Import["file",{"PDB",elem,…}].
  • Export["file.pdb",{elem1->expr1,elem2->expr2,…}] uses rules to specify the elements to be exported.
  • See the following reference pages for full general information:
  • Import, Export                      import from or export to a file
    CloudImport, CloudExport            import from or export to a cloud object
    ImportString, ExportString          import from or export to a string
    ImportByteArray, ExportByteArray    import from or export to a byte array
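
The calling forms listed above can be sketched as follows. This is a minimal illustration, assuming the sample file ExampleData/1PPT.pdb is available in the installation (any local PDB path can be substituted); the variable names are illustrative only.

```wolfram
(* Assumed sample file; replace with the path to any PDB file *)
file = "ExampleData/1PPT.pdb";

(* Default import: the "Graphics3D" element *)
graphic = Import[file];

(* A single named element *)
title = Import[file, "Title"];

(* Several elements in one call; note the doubled braces *)
{residues, labels} = Import[file, {{"Residues", "ResidueChainLabels"}}];

(* State the format explicitly, for example when the file extension is missing *)
sameTitle = Import[file, {"PDB", "Title"}];
```

Requesting only the elements that are needed avoids building the 3D graphic when just the data or metadata is of interest.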

Import Elements

  • General Import elements:
  • "Elements"    list of elements and options available in this file
    "Summary"     summary of the file
    "Rules"       list of rules for all available elements
  • Export uses the "Rules" element by default.
  • Graphics element:
  • "Graphics3D"    PDB file rendered as a Graphics3D object
  • Import uses the "Graphics3D" element by default for the PDB format.
  • Data representation elements:
  • "AdditionalAtoms"          atoms that are not constituents of a chain
    "AdditionalCoordinates"    3D coordinates of additional atoms
    "AdditionalIndex"          index of additional atoms in VertexCoordinates and VertexTypes
    "AdditionalResidues"       additional residue sequences given as an array of three-letter abbreviations
    "Molecule"                 a symbolic representation of the molecule model
    "ResidueAtoms"             list of residue atoms
    "ResidueChainLabels"       list of chain labels
    "ResidueCoordinates"       3D coordinates of residue atoms
    "ResidueIndex"             index of residue atoms in VertexCoordinates and VertexTypes
    "ResidueRoles"             functional roles of residue atoms
    "Residues"                 residue sequences given as an array of three-letter abbreviations
    "Resolution"               spatial resolution of the model coordinates in picometers
    "SecondaryStructure"       rules describing the large-scale structure of a chain
    "Sequence"                 residue sequences given as a list of strings
    "VertexCoordinates"        atomic coordinates, typically given in picometers
    "VertexTypes"              all atoms or groups constituting the molecule, typically given as a list of chemical element abbreviations
  • When reading an incomplete chain that is missing one or more residues from PDB, the Wolfram Language will represent it as a sequence of individual subchains.
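
As a rough illustration of the data representation elements, the sketch below reads atom types and coordinates together. It assumes the sample file ExampleData/1PPT.pdb is available (substitute any PDB path) and that the coordinates are in picometers, as is typical; the variable names are illustrative.

```wolfram
(* Assumed sample file; replace with the path to any PDB file *)
file = "ExampleData/1PPT.pdb";

(* Atom types and their coordinates, read in a single call *)
{types, coords} = Import[file, {{"VertexTypes", "VertexCoordinates"}}];

(* Tally how often each atom type occurs *)
Counts[types]

(* Assuming picometers: 1 angstrom = 100 picometers *)
angstroms = coords/100.;

(* Positions of the chain atoms within the two vertex lists *)
residueIndex = Import[file, "ResidueIndex"];
```
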
  • The Wolfram Language uses the standard IUB/IUPAC abbreviations for amino acid residues:
  • A    alanine (Ala)
    C    cysteine (Cys)
    D    aspartic acid (Asp)
    E    glutamic acid (Glu)
    F    phenylalanine (Phe)
    G    glycine (Gly)
    H    histidine (His)
    I    isoleucine (Ile)
    K    lysine (Lys)
    L    leucine (Leu)
    M    methionine (Met)
    N    asparagine (Asn)
    P    proline (Pro)
    Q    glutamine (Gln)
    R    arginine (Arg)
    S    serine (Ser)
    T    threonine (Thr)
    V    valine (Val)
    W    tryptophan (Trp)
    Y    tyrosine (Tyr)
    X    unspecified or unknown amino acid (Unk)
  • The following abbreviations are used to represent nucleic acids:
  • A    adenosine
    C    cytidine
    G    guanosine
    I    inosine
    T    thymidine
    U    uridine
    X    unspecified or unknown nucleic acid
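
The relationship between the three-letter and one-letter amino acid codes can be sketched with a lookup table. The helper toOneLetter below is hypothetical and not part of the Wolfram Language; it only mirrors the amino acid table above and maps anything it does not recognize to "X".

```wolfram
(* Three-letter to one-letter codes, copied from the amino acid table *)
oneLetter = <|
   "ALA" -> "A", "CYS" -> "C", "ASP" -> "D", "GLU" -> "E", "PHE" -> "F",
   "GLY" -> "G", "HIS" -> "H", "ILE" -> "I", "LYS" -> "K", "LEU" -> "L",
   "MET" -> "M", "ASN" -> "N", "PRO" -> "P", "GLN" -> "Q", "ARG" -> "R",
   "SER" -> "S", "THR" -> "T", "VAL" -> "V", "TRP" -> "W", "TYR" -> "Y"|>;

(* Hypothetical helper: normalize case, then look up each residue;
   unknown residues become "X" *)
toOneLetter[residues_List] :=
  StringJoin[Lookup[oneLetter, ToUpperCase /@ residues, "X"]];

toOneLetter[{"Gly", "Pro", "Ser", "Gln", "Foo"}]
(* "GPSQX" *)
```
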
  • When importing a PDB file that describes multiple 3D models of the same molecule, the following Import elements can be used to read the geometries of all models:
  • "ResidueCoordinatesList"       residue coordinates for each model
    "AdditionalCoordinatesList"    3D coordinates of additional atoms for each model
    "VertexCoordinatesList"        atomic coordinates for each model, typically given in picometers
  • Meta-information elements:
  • "Authors"              author information as referenced in the file
    "Comments"             comments and remarks stored in the file, given as a list of strings
    "DepositionDate"       when the file was added to the database
    "Organism"             organism in which the protein occurs
    "PDBClassification"    PDB classification from the file header
    "PDBID"                PDB structure identification string
    "References"           bibliographic references, given as rules
    "Title"                document title

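For files that contain several models, the per-model coordinate lists can be compared directly. The sketch below uses synthetic data as a stand-in for the result of Import["models.pdb", "VertexCoordinatesList"], where models.pdb is a hypothetical multi-model file; perAtomShift is likewise a hypothetical helper, and it assumes every model lists the same atoms in the same order.

```wolfram
(* Hypothetical helper: distance each atom moves between models i and j *)
perAtomShift[models_List, i_Integer, j_Integer] :=
  Norm /@ (models[[i]] - models[[j]]);

(* Synthetic stand-in for Import["models.pdb", "VertexCoordinatesList"]:
   two models with two atoms each, coordinates in picometers *)
models = {{{0, 0, 0}, {100, 0, 0}}, {{0, 0, 0}, {103, 4, 0}}};

Length[models]               (* number of models: 2 *)
perAtomShift[models, 1, 2]   (* {0, 5} *)
```
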
Options

  • General rendering options:
  • ImageSize        Automatic    specifies the overall size of the graphics to display
    Background       White        specifies what background color to use
    ColorFunction    Automatic    function to apply to determine the coloring of secondary structure visualizations
    ViewPoint        Automatic    point in space from which the 3D model is to be viewed
  • With the default setting ViewPoint->Automatic, the Wolfram Language automatically calculates the optimal viewing angle for the imported molecule model.
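
A minimal sketch of overriding the general rendering options, assuming the sample file ExampleData/1PPT.pdb is available (substitute any PDB path). The particular option values are arbitrary choices for illustration.

```wolfram
(* Assumed sample file; replace with the path to any PDB file *)
file = "ExampleData/1PPT.pdb";

(* Fix the size, background and viewing position instead of using the defaults *)
view = Import[file, "Graphics3D",
   ImageSize -> Medium,
   Background -> Black,
   ViewPoint -> {0, -2, 1}]
```
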
  • Selecting a rendering style:
  • "Rendering"    "Structure"    specifies the visualization method
  • Possible settings for "Rendering" are:
  • "BallAndStick"    displays atoms and bonds as a ball-and-stick model
    "Structure"       stylized rendering of the protein backbone
    "Spacefilling"    atoms shown as overlapping spheres
    "Wireframe"       bonds rendered as lines

Examples

Basic Examples  (6)

Import a large PDB file from the RCSB Protein Data Bank website:

Get the title of this PDB file:
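
The original input for this step is not preserved here. A minimal sketch, assuming the sample file ExampleData/1PPT.pdb stands in for the downloaded file, reads the title together with a few other header fields; some files may leave individual fields empty.

```wolfram
(* Assumed sample file; replace with the path or URL of the file of interest *)
file = "ExampleData/1PPT.pdb";

title = Import[file, "Title"]

(* Collect several meta-information elements into one association *)
metaNames = {"PDBID", "Title", "Authors", "DepositionDate", "PDBClassification"};
meta = AssociationThread[metaNames -> Import[file, {metaNames}]];
meta["PDBID"]
```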

Import the labels for each chain in the above molecule:

Show the Import elements available in a sample file:
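
A minimal sketch of this step, assuming the sample file ExampleData/1PPT.pdb is available (substitute any PDB path):

```wolfram
(* Assumed sample file; replace with the path to any PDB file *)
file = "ExampleData/1PPT.pdb";

(* List every element that can be requested from this file *)
elements = Import[file, "Elements"]

(* Check for a specific element before asking for it *)
MemberQ[elements, "Sequence"]
```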

Get the name of the organism referenced in this file:

Import the bibliographic references from this file:

Import the structure as a Molecule object:

Get the molecular mass and convert to kilodaltons:
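
One way to do this, sketched under the assumption that the sample file ExampleData/1PPT.pdb is available, is to query the imported Molecule with MoleculeValue and convert the resulting quantity:

```wolfram
(* Assumed sample file; replace with the path to any PDB file *)
file = "ExampleData/1PPT.pdb";

mol = Import[file, "Molecule"];

(* The mass is returned as a Quantity *)
mass = MoleculeValue[mol, "MolecularMass"];
UnitConvert[mass, "Kilodaltons"]
```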

Import the residue sequence:

This gives the same sequence as a string of single-character abbreviations:
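
A minimal sketch of both forms, assuming the sample file ExampleData/1PPT.pdb is available (substitute any PDB path):

```wolfram
(* Assumed sample file; replace with the path to any PDB file *)
file = "ExampleData/1PPT.pdb";

(* Three-letter abbreviations, one list per chain *)
residues = Import[file, "Residues"]

(* One string of single-character abbreviations per chain *)
sequence = Import[file, "Sequence"]

(* Number of residues in each chain *)
StringLength /@ sequence
```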

Get structural information about this molecule:

Show the protein backbone in a stylized form:

Show the same protein, using standard colors for each residue:

This imports the sample file as a ball-and-stick graphic:

Show the same protein as a wireframe model:
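
The "Rendering" option selects the style. The sketch below, which assumes the sample file ExampleData/1PPT.pdb is available, imports the wireframe view and then lays out all four documented styles side by side for comparison.

```wolfram
(* Assumed sample file; replace with the path to any PDB file *)
file = "ExampleData/1PPT.pdb";

wire = Import[file, "Graphics3D", "Rendering" -> "Wireframe"]

(* Compare every documented rendering style *)
styles = {"Structure", "BallAndStick", "Spacefilling", "Wireframe"};
views = Table[
   Import[file, "Graphics3D", "Rendering" -> style, ImageSize -> Small],
   {style, styles}];
GraphicsRow[views]
```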

Import residue data:

This imports the sample file, rendering atoms as space-filling spheres:

Import a DNA model:

Import RNA and DNA sequences from this file:

Read all data from a PDB file and export it back to PDB:
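
A sketch of the round trip, assuming the sample file ExampleData/1PPT.pdb is available and using the rule-list form of Export described under Import & Export. The output file name is hypothetical, and whether every imported element is written back is not confirmed here.

```wolfram
(* Assumed sample file; replace with the path to any PDB file *)
file = "ExampleData/1PPT.pdb";

(* All available elements as a list of rules *)
rules = Import[file, "Rules"];

(* Hypothetical output location in the temporary directory *)
out = FileNameJoin[{$TemporaryDirectory, "roundtrip.pdb"}];
Export[out, rules]
```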

Import a simple 3D model from a MOL file and export it to PDB:

Import the resulting PDB file as a 3D graphic:

Export a Molecule object as a PDB string:
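
A minimal sketch, using ethanol built from its SMILES string as an arbitrary small example molecule:

```wolfram
(* Ethanol from SMILES; any Molecule object can be used instead *)
mol = Molecule["CCO"];

(* Write the molecule in PDB format to a string instead of a file *)
pdb = ExportString[mol, "PDB"];

(* Show the first few lines of the generated text *)
Take[StringSplit[pdb, "\n"], UpTo[5]]
```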