This is documentation for Mathematica 8, which was
based on an earlier version of the Wolfram Language.

GenBank (.gb, .gbk)

MIME type: chemical/seq-na-genbank
GenBank molecular biology format.
Native format of the U.S. National Center for Biotechnology Information (NCBI) database.
Standard format for storing and exchanging annotated DNA sequences.
Plain text format.
Developed in 1982 as part of the NIH GenBank project.
  • Import supports all versions of the GenBank file format.
  • Import["file.gb"] imports a DNA sequence from a GenBank file.
  • Import["file.gb"] returns a string representing the sequence stored in the file.
  • Import["file.gb", elem] imports the specified element from a GenBank file.
  • Import["file.gb", {{elem1, elem2, ...}}] imports multiple elements.
  • See the reference pages for full general information on Import and Export.
  • General Import elements:
"Elements"    list of elements and options available in this file
"Rules"       full list of rules for each element and option
"Options"     list of rules for options, properties, and settings
  • Data representation elements:
"Features"     all sequence annotations, given as a list of rules
"Sequence"     DNA or protein sequence as a string
"Plaintext"    sequences as formatted text
"Comment"      miscellaneous comments on sequence
  • Import uses the "Sequence" element by default for the GenBank format.
  • Metainformation elements:
"Locus"                   locus description
"Definition"              GenBank file title
"NCBIAccession"           NCBI accession number
"NCBIAccessionVersion"    versioned NCBI accession number
"GenBankID"               GenBank database identifier
"Project"                 name of the sequencing project
"Keywords"                list of keywords
"Organism"                source organism referenced in the file
"Segment"                 sequence segment, if divided into multiple GenBank files
"Source"                  source organism
"Reference"               bibliographic reference, given as a list of rules
"Comments"                comments stored in the file, given as a list of strings
This returns the available elements for a sample GenBank file:
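A minimal sketch of such a call, assuming a GenBank file at the hypothetical path record.gb (the path is not from the original page; substitute a real file):

```wolfram
(* record.gb is a hypothetical path used for illustration only *)
file = "record.gb";

(* "Elements" lists the elements and options available in this file *)
Import[file, "Elements"]
```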
File title:
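The title is stored in the "Definition" element. A possible call, again assuming the hypothetical file record.gb:

```wolfram
(* record.gb is a hypothetical path used for illustration only *)
file = "record.gb";

(* "Definition" holds the GenBank file title *)
Import[file, "Definition"]
```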
Basic locus information:
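The "Locus" element from the metainformation table can be requested in the same way. The file name below is an assumption:

```wolfram
(* record.gb is a hypothetical path used for illustration only *)
file = "record.gb";

(* "Locus" holds the locus description *)
Import[file, "Locus"]
```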
Import information about the source organism:
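Two elements in the table describe the source organism, "Source" and "Organism". This sketch requests "Source"; which of the two the original example used is not recoverable, and the file name is hypothetical:

```wolfram
(* record.gb is a hypothetical path used for illustration only *)
file = "record.gb";

(* "Source" holds the source organism; "Organism" could be requested the same way *)
Import[file, "Source"]
```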
Extract the accession number and GenBank identifier:
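Several elements can be requested at once by wrapping the element names in a doubly nested list. A sketch under the assumption that record.gb (a hypothetical name) is a valid GenBank file:

```wolfram
(* record.gb is a hypothetical path used for illustration only *)
file = "record.gb";

(* the inner list names the elements to import together *)
Import[file, {{"NCBIAccession", "GenBankID"}}]
```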
Read the first letters of the DNA sequence:
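Because "Sequence" is returned as a string, ordinary string functions apply to it. The sketch below takes at most the first 50 characters; the cutoff of 50 and the file name are illustrative choices, not taken from the original example:

```wolfram
(* record.gb is a hypothetical path used for illustration only *)
file = "record.gb";

(* import the sequence as a string *)
seq = Import[file, "Sequence"];

(* take the first 50 letters, or the whole string if it is shorter *)
StringTake[seq, Min[50, StringLength[seq]]]
```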
Import a plaintext version of the sequence:
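The "Plaintext" element can be long, so it is convenient to abbreviate the result with Short. A sketch using a hypothetical file name:

```wolfram
(* record.gb is a hypothetical path used for illustration only *)
file = "record.gb";

(* Short abbreviates the displayed result to roughly one line *)
Import[file, "Plaintext"] // Short
```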
Read a list of bibliographic references and extract the first one:
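One way this might look, assuming the hypothetical file record.gb contains more than one reference, so that "Reference" yields a list with one entry per reference:

```wolfram
(* record.gb is a hypothetical path used for illustration only *)
file = "record.gb";

(* import all bibliographic references stored in the file *)
refs = Import[file, "Reference"];

(* extract the first entry; assumes refs is a non-empty list of references *)
First[refs]
```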