Sequence Databases
Brief descriptions of the relevant sequence databases are given below.
In addition, the first issue of each year of Nucleic Acids Research
contains status reports from the curators of the major databases.
dbEST
dbEST
is the division of GenBank that contains
"single-pass" cDNA sequences,
or Expressed Sequence Tags, from a number of organisms.
DDBJ
Entries from the DNA Data Bank of Japan (DDBJ)
are wholly incorporated into GenBank.
EMBL
The EMBL
Nucleotide Sequence Database is a comprehensive database of DNA
and RNA sequences collected from the scientific literature and
patent applications and submitted directly by researchers and
sequencing groups. Data collection is done in collaboration with
GenBank (USA) and the DNA Data Bank of Japan (DDBJ).
GenBank
GenBank
is the NIH genetic sequence database, an annotated collection
of all publicly available DNA sequences. There are approximately
1,622,000,000 bases in 2,356,000 sequence records as of June 1998.
The complete release
notes for the current version of GenBank are available by
FTP. A new release is made every two months. GenBank is part of
the International Nucleotide Sequence Database Collaboration,
which comprises the DNA Data Bank of Japan (DDBJ), the European
Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These
three organizations exchange data on a daily basis.
MSDB
MSDB
is a non-identical protein sequence database maintained by the Proteomics Department
at the Hammersmith Campus of Imperial College London. MSDB is designed specifically for
mass spectrometry applications.
NCBInr
NCBI maintains composite, non-identical protein and nucleic acid
databases for its search tools BLAST and Entrez.
The entries in the protein database, nr, have been compiled from
GenBank CDS translations, PIR, SWISS-PROT, PRF, and PDB. NCBI has made
considerable efforts to cross-reference the sequences in these
databases in order to avoid duplication.
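The idea of a non-identical composite can be illustrated with a minimal Python sketch, assuming each source database is available as a list of (accession, sequence) pairs: sequences that are identical across sources collapse into a single entry that keeps every source accession, so the search space contains each distinct sequence once. The function name `build_nonidentical`, the input format and the sample accessions are hypothetical; this is not NCBI's actual build procedure.

```python
def build_nonidentical(sources):
    """Collapse identical sequences from several source databases.

    sources: dict mapping a source name to a list of (accession, sequence)
    tuples (hypothetical input format).
    Returns a dict mapping each distinct sequence to the list of
    "source:accession" identifiers that share it, in input order.
    """
    merged = {}
    for source_name, entries in sources.items():
        for accession, sequence in entries:
            # Normalise: drop whitespace from wrapped lines, ignore case.
            key = "".join(sequence.split()).upper()
            merged.setdefault(key, []).append(f"{source_name}:{accession}")
    return merged


# Hypothetical sample data: the same sequence appears in two sources.
sources = {
    "SWISS-PROT": [("P1", "MKTAYIAK"), ("P2", "GAVLIL")],
    "PIR": [("A1", "mktay iak")],
    "PDB": [("1ABC", "GAVLIM")],
}
composite = build_nonidentical(sources)
for seq, ids in composite.items():
    print(seq, ids)
```

Here four input entries reduce to three distinct sequences, and the shared one is cross-referenced to both its SWISS-PROT and PIR accessions.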
OWL
OWL is a non-identical composite of four
publicly available protein databases: SWISS-PROT, PIR (1-3), GenBank
(translation) and NRL-3D. OWL has not been updated since May 1999,
and should be considered obsolete.
PDB
The Brookhaven Protein Data
Bank (PDB) is a database of three-dimensional structures.
Consequently, entries are invariably well characterised, with
reliable sequence data that can also be found in the other databases.
Entries that are unique to PDB tend to be variant proteins with
distorted structures that were used to refine a structure determination.
PIR
The PIR
(Protein Information Resource) database was initiated at the NBRF
in the early 1960s by the late Margaret O. Dayhoff as a collection
of sequences for the study of evolutionary relationships among
proteins. The database is now an international collaboration of
three data centers: the NBRF, the Munich Information Center for
Protein Sequences (MIPS), and the Japan International Protein
Information Database (JIPID). The three centers cooperate to produce
and distribute a single database of 'wild-type' protein sequences.
PRF
The Protein Research Foundation
of Japan database contains protein sequences abstracted from scientific
publications.
Swiss-Prot
Swiss-Prot is a curated
protein sequence database which strives to provide a high level
of annotations (such as the description of the function of a protein,
its domain structure, post-translational modifications, variants,
etc.), a minimal level of redundancy and a high level of integration
with other databases. It was established in 1986 and has been
maintained collaboratively, since 1987, by the Department of Medical
Biochemistry of the University of Geneva and the EMBL Data Library
(now the EMBL Outstation of The European Bioinformatics Institute
- EBI).