| Sequence Databases

Information on relevant sequence databases can be found
by following the links below. Additionally, the first issue of
Nucleic Acids Research each year contains status reports
from the curators of the major databases.

dbEST

dbEST is the division of GenBank that contains
"single-pass" cDNA sequences,
or Expressed Sequence Tags, from a number of organisms.

DDBJ

Entries from the DNA Data Bank of Japan (DDBJ)
are wholly incorporated into GenBank.

EMBL

The EMBL
Nucleotide Sequence Database is a comprehensive database of DNA
and RNA sequences collected from the scientific literature and
patent applications and submitted directly by researchers and
sequencing groups. Data collection is done in collaboration with
GenBank (USA) and the DNA Data Bank of Japan (DDBJ).

GenBank

GenBank is the NIH genetic sequence database, an annotated collection
of all publicly available DNA sequences. As of June 1998, it held
approximately 1,622,000,000 bases in 2,356,000 sequence records.
The complete release notes for the current version of GenBank are
available by FTP. A new release is made every two months. GenBank is part of
the International Nucleotide Sequence Database Collaboration,
which comprises the DNA Data Bank of Japan (DDBJ), the European
Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These
three organizations exchange data on a daily basis.

MSDB

MSDB is a non-identical protein sequence database maintained by the Proteomics Department
at the Hammersmith Campus of Imperial College London. MSDB is designed specifically for
mass spectrometry applications.
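
As an illustration of what "non-identical" means for a composite protein database, the sketch below merges records from several sources and keeps one entry per distinct sequence, remembering every source entry that carried it. This is only a minimal sketch of the general idea, not the procedure actually used to build any database on this page. The function name `build_non_identical`, the accessions, and the sequences are all invented for the example, and treating sequences as identical regardless of letter case is an assumption.

```python
def build_non_identical(sources):
    """Merge sequence collections, keeping one entry per distinct sequence.

    `sources` maps a database name to a list of (accession, sequence) pairs.
    Returns a dict mapping each distinct sequence to the list of
    "database:accession" labels under which it was found.
    """
    merged = {}
    for db_name, records in sources.items():
        for accession, sequence in records:
            # Assumption: sequences differing only in letter case are identical.
            key = sequence.upper()
            # Record a cross-reference instead of storing a duplicate entry.
            merged.setdefault(key, []).append(f"{db_name}:{accession}")
    return merged


# Toy input: hypothetical accessions and sequences, for illustration only.
sources = {
    "SWISS-PROT": [("SP0001", "MKTAYIAK"), ("SP0002", "MGSSHHHH")],
    "PIR": [("PIR0001", "MKTAYIAK")],
    "PDB": [("1ABC", "mgsshhhh"), ("2XYZ", "MVLSPADK")],
}

composite = build_non_identical(sources)
for sequence, labels in composite.items():
    print(sequence, ", ".join(labels))
```

In this toy input, five source records collapse to three distinct sequences, and each retained sequence lists every source entry in which it appeared.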

NCBInr

NCBI maintains composite, non-identical protein and nucleic acid
databases for its search tools, BLAST and Entrez.
The entries in the protein database, nr, have been compiled from GenBank CDS translations,
PIR, SWISS-PROT, PRF, and PDB. NCBI has made strong efforts to
cross-reference the sequences in these databases in order to avoid
duplication.

OWL

OWL is a non-identical composite of four
publicly available protein databases: SWISS-PROT, PIR (1-3), GenBank
(translation) and NRL-3D. OWL has not been updated since May 1999
and should be considered obsolete.

PDB

The Brookhaven Protein Data
Bank (PDB) is a database of three-dimensional structures.
This means that entries are invariably well characterised, with
reliable sequence data which can also be found in the other databases.
Entries which are unique to PDB tend to be variant proteins, with
distorted structures, which were used to refine a structural determination.

PIR

The PIR
(Protein Information Resource) database was initiated at the NBRF
in the early 1960s by the late Margaret O. Dayhoff as a collection
of sequences for the study of evolutionary relationships among
proteins. The database is now an international collaboration of
three data centers: the NBRF, the Munich Information Center for
Protein Sequences (MIPS), and the Japan International Protein
Information Database (JIPID). The three centers cooperate to produce
and distribute a single database of "wild-type" protein sequences.

PRF

The Protein Research Foundation
of Japan database contains protein sequences abstracted from scientific
publications.

Swiss-Prot

Swiss-Prot is a curated
protein sequence database which strives to provide a high level
of annotation (such as the description of the function of a protein,
its domain structure, post-translational modifications, variants,
etc.), a minimal level of redundancy, and a high level of integration
with other databases. It was established in 1986 and has been
maintained collaboratively, since 1987, by the Department of Medical
Biochemistry of the University of Geneva and the EMBL Data Library
(now the EMBL Outstation of the European Bioinformatics Institute,
EBI). |