"RCSBProteinDataBank" (Service Connection)

Connect to the RCSB Protein Data Bank API using the Wolfram Language to query extensive data on biomolecule structures and their properties.

Connecting & Authenticating

ServiceConnect["RCSBProteinDataBank"] creates a connection to the RCSB Protein Data Bank API. If a previously saved connection can be found, it will be used; otherwise, a new authentication request will be launched.
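A minimal sketch of this step. The variable name rcsb is only an illustrative choice. The ServiceObject it holds can be passed to ServiceExecute in place of the service name.

```wolfram
(* Reuse a saved connection if one exists; otherwise start a new authentication request *)
rcsb = ServiceConnect["RCSBProteinDataBank"]
```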

Requests

ServiceExecute["RCSBProteinDataBank","request",params] sends a request to the RCSB Protein Data Bank API, using parameters params. The following are the possible requests.

BioMolecule Structures

Request:

"BioMolecule": get a BioMolecule structure from the RCSB Protein Data Bank

Parameters:
  • "PDBStructureID" (default None): PDB structure ID
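A sketch of the "BioMolecule" request. It makes two assumptions. First, parameters are passed as a list of rules, which is the usual ServiceExecute convention. Second, a plain PDB ID string is accepted for "PDBStructureID"; the examples on this page wrap the ID in an ExternalIdentifier instead. The entry 1CRN (crambin) is only an illustrative choice.

```wolfram
rcsb = ServiceConnect["RCSBProteinDataBank"];

(* Assumption: a plain PDB ID string is accepted here; the page's own example uses an ExternalIdentifier *)
crambin = ServiceExecute[rcsb, "BioMolecule", {"PDBStructureID" -> "1CRN"}];

(* Render the returned BioMolecule in 3D *)
BioMoleculePlot3D[crambin]
```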
Data

Request:

"EntryData": get the relevant data present in the structure file from the RCSB Protein Data Bank as a Dataset

Parameters:
  • "PDBStructureID" (default None): PDB structure ID
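A sketch of the "EntryData" request. It assumes the same list-of-rules parameter form and a plain ID string. The entry 4HHB (human deoxyhemoglobin) is an illustrative choice. According to the request description, the result is a Dataset.

```wolfram
rcsb = ServiceConnect["RCSBProteinDataBank"];

(* Fetch the data in the structure file for one entry; returned as a Dataset *)
entry = ServiceExecute[rcsb, "EntryData", {"PDBStructureID" -> "4HHB"}]
```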
Request:

"ChemicalComponentData": get the relevant information about residues present in the RCSB Protein Data Bank

Parameters:
  • "ComponentID" (default None): chemical component ID of a residue
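A sketch of the "ChemicalComponentData" request. HEM, the PDB chemical component ID for heme, is used only for illustration. This page does not specify the form of the returned data, so the sketch does not assume one.

```wolfram
rcsb = ServiceConnect["RCSBProteinDataBank"];

(* Look up a residue by its chemical component ID *)
heme = ServiceExecute[rcsb, "ChemicalComponentData", {"ComponentID" -> "HEM"}]
```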
PDB IDs

Request:

"TextSearch": get the relevant PDB IDs from RCSB PDB for a simple text query

Parameters:
  • "Query" (default None): search query
  • "StartIndex" (default 1): index of the first structure to return
  • MaxItems (default 10): maximum number of structures to return
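A sketch of a text search, assuming the list-of-rules parameter form. The query "hemoglobin" is illustrative. MaxItems is the built-in Wolfram Language symbol, so it is written without quotes, which matches the parameter table.

```wolfram
rcsb = ServiceConnect["RCSBProteinDataBank"];

(* Ask for at most 5 PDB IDs matching a free-text query *)
ids = ServiceExecute[rcsb, "TextSearch", {"Query" -> "hemoglobin", MaxItems -> 5}]
```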
Request:

"SequenceSearch": get the relevant PDB IDs from RCSB PDB by providing a BioSequence

Parameters:
  • "BioSequence" (default None): sequence to search against
  • "MinimumOverlapFraction" (default None): minimum overlap fraction between the query sequence and a matching sequence
  • "StartIndex" (default 1): index of the first structure to return
  • MaxItems (default 10): maximum number of structures to return
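A sketch of a sequence search. The query is the first 30 residues of mature human hemoglobin alpha, chosen only for illustration. The sketch assumes "MinimumOverlapFraction" takes a number between 0 and 1, since this page does not give its range.

```wolfram
rcsb = ServiceConnect["RCSBProteinDataBank"];

(* Illustrative query: N-terminal fragment of human hemoglobin alpha *)
query = BioSequence["Peptide", "VLSPADKTNVKAAWGKVGAHAGEYGAEALE"];

(* Assumption: the overlap fraction is given as a number between 0 and 1 *)
hits = ServiceExecute[rcsb, "SequenceSearch",
  {"BioSequence" -> query, "MinimumOverlapFraction" -> 0.9, MaxItems -> 5}]
```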
Request:

"SequenceMotifSearch": get the relevant PDB IDs from RCSB PDB by searching for a sequence motif

Parameters:
  • "Motif" (default None): motif to search for; can be a string or a BioSequence object
  • "PatternType" (default None): pattern syntax used by the motif
  • "SequenceType" (default None): type of sequence the motif describes
  • "StartIndex" (default 1): index of the first structure to return
  • MaxItems (default 10): maximum number of structures to return
Parameter Details

Possible values for "PatternType" in the request "SequenceMotifSearch" include:
  • "Simple": simple expression
  • "Regex": regular expression
  • "Prosite": Prosite expression

Possible values for "SequenceType" in the request "SequenceMotifSearch" include:
  • "Protein": protein sequence
  • "DNA": DNA sequence
  • "RNA": RNA sequence

When the value of "Motif" in the request "SequenceMotifSearch" is a string, the following conventions are used, depending on the "PatternType":
  • "X": any single-letter code of a protein, DNA or RNA sequence
  • "{P}": any amino acid except "P" ("Pro")
  • "[ST]": either "S" ("Ser") or "T" ("Thr")
  • "X(2)": same as "XX"
  • "X(2,4)": "XX", "XXX" or "XXXX"
  • "C-{S}-C-X(2)-[LIVMYFWC]": "Prosite" format example
  • "C{S}CXX[LIVMYFWC]": "Regex" format example
  • "CXCXXL": "Simple" format example
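The conventions above can be checked locally. The following sketch translates a Prosite-style motif into a standard regular expression and tests it against two made-up peptide strings. The helper prositeToRegex and the test strings are hypothetical and are not part of the service. The helper is also not how the service processes motifs: it handles only "-", "X", "{...}", "[...]" and "(n)" or "(n,m)" repeats, and ignores Prosite terminal anchors. Note that a standard regular expression writes exclusion as [^S], where the "Regex" example above writes {S}.

```wolfram
(* Hypothetical helper: rewrite a simplified Prosite pattern as a standard regex string.
   StringReplace makes a single pass, so braces produced from parentheses are not rewritten again. *)
prositeToRegex[pattern_String] := StringReplace[pattern, {
   "-" -> "",          (* element separators are dropped *)
   "X" | "x" -> ".",   (* wildcard position *)
   "{" -> "[^",        (* excluded residues become a negated class *)
   "}" -> "]",
   "(" -> "{",         (* repeat counts use regex braces *)
   ")" -> "}"
  }]

regex = prositeToRegex["C-{S}-C-X(2)-[LIVMYFWC]"]
(* C[^S]C.{2}[LIVMYFWC] *)

StringContainsQ["MKCACAALQ", RegularExpression[regex]]  (* True: contains CACAAL *)
StringContainsQ["MKCSCAALQ", RegularExpression[regex]]  (* False: S follows the first C *)
```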
Examples

Basic Examples (7)

Create a new connection:

Get a BioMolecule object by providing the PDB ID through an ExternalIdentifier:

Visualize the BioMolecule:

Get the data in the structure file:

Get the relevant information about a chemical component (residue) in RCSB PDB:

Search for structures in RCSB PDB with a simple text query:

Visualize one of the structures:

Use the "StartIndex" and MaxItems parameters to get a specific number of structures, starting from a given position:
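A sketch of paging through text-search results. It assumes "StartIndex" counts from 1, as its default value of 1 suggests, so these settings would ask for results 11 through 15. The query is illustrative.

```wolfram
rcsb = ServiceConnect["RCSBProteinDataBank"];

(* Skip the first 10 matches and return the next 5 *)
page = ServiceExecute[rcsb, "TextSearch",
  {"Query" -> "hemoglobin", "StartIndex" -> 11, MaxItems -> 5}]
```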

Search for structures in RCSB PDB using a BioSequence:

Get the BioSequence of the first structure:

Align this sequence with the reference sequence:

In the alignment output, string elements are the segments common to both sequences, and list elements mark a mismatch, insertion or deletion. In each such list, the first element comes from the first sequence and the second element from the second sequence. Compute the fraction of the sequences that is common:
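A sketch of the fraction computation for plain strings. The helper commonFraction and the two 10-residue strings are hypothetical. Dividing by the length of the first sequence is an assumption, because this page does not say which length the fraction is taken over.

```wolfram
(* Hypothetical helper: fraction of s1 covered by the segments SequenceAlignment
   reports as common, i.e. the string elements of the alignment *)
commonFraction[s1_String, s2_String] := Module[{alignment},
  alignment = SequenceAlignment[s1, s2];
  Total[StringLength /@ Select[alignment, StringQ]]/StringLength[s1]
 ]

commonFraction["VLSPADKTNV", "VLSPGDKTNV"]  (* 9/10: one mismatch in ten positions *)
```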

So there is almost a 99% overlap between the two sequences.

The "BioSequence" parameter also accepts RNA and DNA sequences:

Visualize the structure:

Search with a DNA sequence:
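A sketch of a DNA sequence search. The query is the 12-base Drew-Dickerson dodecamer strand, chosen only for illustration. Whether such a short query returns hits depends on the service's own limits, which this page does not state.

```wolfram
rcsb = ServiceConnect["RCSBProteinDataBank"];

(* Illustrative DNA query *)
dna = BioSequence["DNA", "CGCGAATTCGCG"];
dnaHits = ServiceExecute[rcsb, "SequenceSearch", {"BioSequence" -> dna, MaxItems -> 3}]
```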

Visualize the structure:

Search for structures that contain a zinc finger sequence motif:
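A sketch of a zinc finger motif search in "Prosite" syntax. It uses the C2H2 zinc finger signature from PROSITE entry PS00028, written with the uppercase X wildcard shown in the conventions table. The list-of-rules parameter form is assumed, as elsewhere.

```wolfram
rcsb = ServiceConnect["RCSBProteinDataBank"];

(* C2H2 zinc finger signature (PROSITE PS00028) *)
zf = ServiceExecute[rcsb, "SequenceMotifSearch", {
   "Motif" -> "C-X(2,4)-C-X(3)-[LIVMFYWC]-X(8)-H-X(3,5)-H",
   "PatternType" -> "Prosite",
   "SequenceType" -> "Protein",
   MaxItems -> 5}]
```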

Visualize the structure of the first search result, with zinc shown in purple:

The "Motif" parameter also accepts a BioSequence:

You can also search for DNA or RNA motifs:
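A sketch of a DNA motif search in "Simple" syntax. The motif TATAAA, the TATA box consensus, is chosen only for illustration. Presumably an RNA motif would use "SequenceType" -> "RNA" with U in place of T.

```wolfram
rcsb = ServiceConnect["RCSBProteinDataBank"];

(* TATA box consensus as a Simple DNA motif *)
tata = ServiceExecute[rcsb, "SequenceMotifSearch", {
   "Motif" -> "TATAAA",
   "PatternType" -> "Simple",
   "SequenceType" -> "DNA",
   MaxItems -> 5}]
```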

Visualize the structure of the first search result:

Search by an RNA sequence motif:

Visualize the first structure from the search: