"RCSBProteinDataBank" (Service Connection)

Connect to the RCSB Protein Data Bank API using the Wolfram Language to query extensive data on biomolecule structures and their properties.

Connecting & Authenticating

ServiceConnect["RCSBProteinDataBank"] creates a connection to the RCSB Protein Data Bank API. If a previously saved connection can be found, it will be used; otherwise, a new authentication request will be launched.

Requests

ServiceExecute["RCSBProteinDataBank","request",params] sends a request to the RCSB Protein Data Bank API, using parameters params. The following are the possible requests.

BioMolecule Structures

Request:

"BioMolecule"    get a BioMolecule structure from the RCSB Protein Data Bank

Parameters:
  "PDBStructureID"    None    PDB structure ID
Data

Request:

"EntryData"    get the relevant data present in the structure file from the RCSB Protein Data Bank as a Dataset

Parameters:
  "PDBStructureID"    None    PDB structure ID

Request:

"ChemicalComponentData"    get the relevant information about residues present in the RCSB Protein Data Bank

Parameters:
  "ComponentID"    None    chemical ID of a residue
PDB IDs

Request:

"TextSearch"    get the relevant PDB IDs from the RCSB PDB by giving a simple text query

Parameters:
  "Query"         None    search query
  "StartIndex"    1       start index of the output structures
  MaxItems        10      total number of output structures

Request:

"SequenceSearch"    get the relevant PDB IDs from the RCSB PDB by providing a BioSequence

Parameters:
  "BioSequence"               None    sequence to search against
  "MinimumOverlapFraction"    None    minimum overlap fraction between two sequences
  "StartIndex"                1       start index of the output structures
  MaxItems                    10      total number of output structures

Request:

"SequenceMotifSearch"    get the relevant PDB IDs from the RCSB PDB by searching for a sequence motif

Parameters:
  "Motif"           None    motif to search against; can be a string or BioSequence object
  "PatternType"     None    type of the input motif
  "SequenceType"    None    type of the sequence motif
  "StartIndex"      1       start index of the output structures
  MaxItems          10      total number of output structures
Parameter Details

Possible values for "PatternType" in the request "SequenceMotifSearch" include:
  "Simple"     simple expression
  "Regex"      regular expression
  "Prosite"    Prosite expression

Possible values for "SequenceType" in the request "SequenceMotifSearch" include:
  "Protein"    protein sequence
  "DNA"        DNA sequence
  "RNA"        RNA sequence

When the value of "Motif" in the request "SequenceMotifSearch" is a string, the following conventions are used, depending on the "PatternType":
  "X"                          any single-letter code for protein, DNA or RNA
  "{P}"                        any amino acid except "P" ("Pro")
  "[ST]"                       either "S" ("Ser") or "T" ("Thr")
  "X(2)"                       same as "XX"
  "X(2,4)"                     "XX" or "XXX" or "XXXX"
  "C-{S}-C-X(2)-[LIVMYFWC]"    "Prosite" format example
  "C{S}CXX[LIVMYFWC]"          "Regex" format example
  "CXCXXL"                     "Simple" format example
Examples

    Basic Examples  (7)

    Create a new connection:
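A minimal sketch of this step. The variable name rcsb is only an illustrative choice and is not taken from the original page.

```wolfram
(* Open a connection, reusing a saved one if it exists. *)
(* rcsb is an illustrative variable name. *)
rcsb = ServiceConnect["RCSBProteinDataBank"]
```

The returned service object can be passed to ServiceExecute in place of the service name string.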

    Get a BioMolecule object by providing the PDB ID through an ExternalIdentifier:
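One way this call might look, assuming the ExternalIdentifier type is named "PDBStructureID" like the request parameter. The PDB ID "1CRN" (crambin) is chosen purely for illustration.

```wolfram
(* Assumption: the identifier type matches the parameter name "PDBStructureID". *)
(* "1CRN" is an illustrative PDB ID, not one taken from the original page. *)
bm = ServiceExecute["RCSBProteinDataBank", "BioMolecule",
  {"PDBStructureID" -> ExternalIdentifier["PDBStructureID", "1CRN"]}]
```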

    Visualize the BioMolecule:
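A possible sketch of the visualization step using BioMoleculePlot3D. The structure is fetched again here so the snippet stands alone; "1CRN" and the identifier type are the same illustrative assumptions as before.

```wolfram
(* Fetch an illustrative structure, then render it in 3D. *)
bm = ServiceExecute["RCSBProteinDataBank", "BioMolecule",
  {"PDBStructureID" -> ExternalIdentifier["PDBStructureID", "1CRN"]}];
BioMoleculePlot3D[bm]
```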

    Get the data in the structure file:
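A hedged sketch of the "EntryData" request, which returns a Dataset according to the request table. The PDB ID is again an illustrative assumption.

```wolfram
(* "EntryData" takes the same "PDBStructureID" parameter as "BioMolecule". *)
(* "1CRN" is an illustrative PDB ID. *)
entry = ServiceExecute["RCSBProteinDataBank", "EntryData",
  {"PDBStructureID" -> ExternalIdentifier["PDBStructureID", "1CRN"]}]
```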

    Get the relevant information about molecules present in RCSB PDB:
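A sketch of the "ChemicalComponentData" request. It assumes that the "ComponentID" parameter accepts a plain string; "HEM" (heme) is an illustrative chemical component ID that does not appear in the original page.

```wolfram
(* Assumption: "ComponentID" can be given as a plain string. *)
(* "HEM" is an illustrative component ID. *)
ServiceExecute["RCSBProteinDataBank", "ChemicalComponentData",
  {"ComponentID" -> "HEM"}]
```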

    Search for structures from RCSB PDB by giving a simple text query:
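A minimal sketch of a text search. The query string "hemoglobin" is only an example; with no other parameters, the defaults from the parameter table apply ("StartIndex" 1, MaxItems 10).

```wolfram
(* "hemoglobin" is an illustrative query string. *)
ids = ServiceExecute["RCSBProteinDataBank", "TextSearch",
  {"Query" -> "hemoglobin"}]
```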

    Visualize one of the structures:

    Get a specific number of structures by setting the "StartIndex" and MaxItems parameters:
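Paging through results could be sketched as follows. The query and the numeric values are illustrative; note that MaxItems is a symbol rather than a string, as in the parameter table.

```wolfram
(* Request 5 results beginning at position 11; the values are illustrative. *)
ServiceExecute["RCSBProteinDataBank", "TextSearch",
  {"Query" -> "hemoglobin", "StartIndex" -> 11, MaxItems -> 5}]
```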

    Search structures from RCSB PDB using BioSequence:
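A sketch of a sequence search. The peptide string below is only a placeholder input, and "MinimumOverlapFraction" is left at its default because the original page does not state its accepted range.

```wolfram
(* Build a peptide BioSequence; the residue string is a placeholder. *)
seq = BioSequence["Peptide",
  "TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN"];

(* Search for structures with similar sequences, limited to 5 results. *)
ServiceExecute["RCSBProteinDataBank", "SequenceSearch",
  {"BioSequence" -> seq, MaxItems -> 5}]
```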

    BioSequence of the first structure:

    Align this sequence with the reference sequence:

    Compare the input sequence with the similar structure using SmithWatermanSimilarity:

    The two sequences overlap by almost 99%.

    The "BioSequence" parameter also accepts "RNA" or "DNA" sequences:

    Visualize the structure:

    Search for a "DNA" sequence:

    Visualize the structure:

    Search for structures that contain a zinc finger sequence motif:
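A sketch of a motif search in "Prosite" format. The pattern below is an assumption: it is written to follow the conventions listed under Parameter Details and is believed to correspond to the classic C2H2 zinc finger signature, but it is not taken from the original page.

```wolfram
(* Assumed C2H2 zinc finger pattern in Prosite notation (illustrative). *)
motif = "C-X(2,4)-C-X(3)-[LIVMFYWC]-X(8)-H-X(3,5)-H";

ServiceExecute["RCSBProteinDataBank", "SequenceMotifSearch",
  {"Motif" -> motif, "PatternType" -> "Prosite",
   "SequenceType" -> "Protein", MaxItems -> 5}]
```

For DNA or RNA motifs, the same request would presumably be used with "SequenceType" set to "DNA" or "RNA".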

    Visualize the structure of the first element of the search, where zinc is in purple:

    "Motif" can also accept a BioSequence:

    You can also search for "DNA" or "RNA" motifs:

    Visualize the structure of the first element of the search:

    Search by RNA sequence motif:

    Visualize the first structure of the search: