GenBank (.gb, .gbk)

Background & Context

    • MIME type: chemical/seq-na-genbank
    • GenBank molecular biology format.
    • Native format of the GenBank sequence database, maintained by the US National Center for Biotechnology Information (NCBI).
    • Standard format for storing and exchanging annotated DNA sequences.
    • Plain text format.
    • Developed in 1982 as part of the NIH GenBank project.

Import & Export

  • Import["file.gb"] imports a DNA sequence from a GenBank file.
  • Import["file.gb"] returns a string representing the sequence stored in the file.
  • Import["file.gb",elem] imports the specified element from a GenBank file.
  • Import["file.gb",{elem,suba,subb,…}] imports a subelement.
  • Import["file.gb",{{elem1,elem2,…}}] imports multiple elements.
  • The import format can be specified with Import["file","GenBank"] or Import["file",{"GenBank",elem,…}].
  • See the following reference pages for full general information:
      Import             import from a file
      CloudImport        import from a cloud object
      ImportString       import from a string
      ImportByteArray    import from a byte array
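The calling forms above can be sketched as follows. This is a minimal illustration under stated assumptions: "file.gb" and "record.txt" are hypothetical placeholder paths, and the element names are taken from the Import Elements list.

```wolfram
(* "file.gb" is a hypothetical GenBank file; substitute a real path *)
seq = Import["file.gb"];                                    (* default element: "Sequence" *)
title = Import["file.gb", "Definition"];                    (* one named element *)
{locus, keywords} = Import["file.gb", {{"Locus", "Keywords"}}];  (* several elements at once *)

(* hypothetical file whose extension does not identify it as GenBank:
   name the format explicitly together with the element *)
seq2 = Import["record.txt", {"GenBank", "Sequence"}];
```

Naming the format explicitly is mainly useful when the file extension is not .gb or .gbk.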

Import Elements

  • General Import elements:
      "Elements"    list of elements and options available in this file
      "Summary"     summary of the file
      "Rules"       list of rules for all available elements
  • Data representation elements:
      "Features"     all sequence annotations, given as a list of rules
      "Sequence"     DNA or protein sequence as a string
      "Plaintext"    sequences as formatted text
      "Comment"      miscellaneous comments on the sequence
  • Import uses the "Sequence" element by default for the GenBank format.
  • Meta-information elements:
      "Locus"                   locus description
      "Definition"              GenBank file title
      "NCBIAccession"           NCBI accession number
      "NCBIAccessionVersion"    versioned NCBI accession number
      "GenBankID"               GenBank database identifier
      "Project"                 name of the sequencing project
      "Keywords"                list of keywords
      "Organism"                source organism referenced in the file
      "Segment"                 sequence segment, if divided into multiple GenBank files
      "Source"                  source organism
      "Reference"               bibliographic reference, given as a list of rules
      "Comments"                comments stored in the file, given as a list of strings
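A rough sketch of inspecting the list-valued elements, assuming a hypothetical local file "file.gb". The exact shape of the returned rules depends on the annotations present in the file, so the sketch only displays them in abbreviated form.

```wolfram
(* "file.gb" is a hypothetical placeholder path *)
features = Import["file.gb", "Features"];   (* sequence annotations, returned as a list of rules *)
comments = Import["file.gb", "Comments"];   (* comments, returned as a list of strings *)

Short[features, 3]   (* abbreviated display of a possibly long annotation list *)
Length[comments]     (* number of comment strings found in the file *)
```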

Examples

Basic Examples  (6)

This returns the available elements for a sample GenBank file:
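A minimal call, assuming a hypothetical local GenBank file "file.gb" in place of the sample file:

```wolfram
(* "file.gb" stands in for a sample GenBank file (hypothetical path) *)
Import["file.gb", "Elements"]
```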

File title:
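The title is stored in the "Definition" element. A sketch, assuming a hypothetical file "file.gb":

```wolfram
(* "Definition" holds the GenBank file title; "file.gb" is hypothetical *)
Import["file.gb", "Definition"]
```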

Basic locus information:
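Locus information comes from the "Locus" element. A sketch, assuming a hypothetical file "file.gb":

```wolfram
(* "Locus" returns the locus description; "file.gb" is a hypothetical path *)
Import["file.gb", "Locus"]
```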

Import information about the source organism:
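Two elements describe the source organism, "Organism" and "Source". This sketch imports both in one call with the multiple-element form, assuming a hypothetical file "file.gb":

```wolfram
(* hypothetical file; values come back in the order the elements are requested *)
{organism, source} = Import["file.gb", {{"Organism", "Source"}}]
```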

Extract the accession number and GenBank identifier:
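Both identifiers can be requested together. A sketch, assuming a hypothetical file "file.gb":

```wolfram
(* hypothetical file "file.gb"; returns {accession number, GenBank identifier} *)
{accession, genbankID} = Import["file.gb", {{"NCBIAccession", "GenBankID"}}]
```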

Read the first letters of the DNA sequence:
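Because "Sequence" is the default element, a plain Import returns the sequence string, and StringTake can keep its beginning. A sketch, assuming a hypothetical file "file.gb" and an arbitrary cutoff of 60 characters:

```wolfram
(* Import["file.gb"] returns the "Sequence" element as a string (hypothetical file);
   UpTo[60] avoids an error when the sequence is shorter than 60 characters *)
StringTake[Import["file.gb"], UpTo[60]]
```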

Import a plaintext version of the sequence:
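A sketch, assuming a hypothetical file "file.gb":

```wolfram
(* "Plaintext" gives the sequences as formatted text; "file.gb" is hypothetical *)
Import["file.gb", "Plaintext"]
```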

Read a list of bibliographic references and extract the first one:
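A sketch, assuming a hypothetical file "file.gb" whose "Reference" element holds one entry per bibliographic reference:

```wolfram
(* hypothetical file; "Reference" is imported as a list of references *)
refs = Import["file.gb", "Reference"];
First[refs]   (* the first bibliographic reference *)
```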