Mascot Search Overview

Mascot is a powerful search engine which uses mass spectrometry data to identify proteins from primary sequence databases.

Although a number of similar programs are available, Mascot is unique in that it integrates all of the established search methods. These methods can be categorised as follows:

  • Peptide Mass Fingerprint, in which the only experimental data are peptide mass values,
  • Sequence Query, in which peptide mass data are combined with amino acid sequence and composition information; this is a superset of a sequence tag query,
  • MS/MS Ion Search, which uses uninterpreted MS/MS data from one or more peptides.

The general approach for all types of search is to take a small sample of the protein of interest and digest it with a proteolytic enzyme, such as trypsin. The resulting digest mixture is analysed by mass spectrometry.
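
As an illustration of the digestion step, the following sketch performs an in silico tryptic digest. It assumes the simplified textbook rule for trypsin (cleave on the C-terminal side of K or R, except when the next residue is P) plus an optional number of missed cleavages. The function name `tryptic_digest` is hypothetical, and this is not Mascot's own enzyme definition.

```python
def tryptic_digest(sequence: str, missed_cleavages: int = 0) -> list[str]:
    """In silico digest using a simplified trypsin rule (hypothetical helper)."""
    # Split after every K or R that is not followed by P.
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in "KR" and sequence[i + 1:i + 2] != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal fragment not ending in K/R
    # Re-join up to `missed_cleavages` adjacent fragments.
    peptides = []
    for i in range(len(fragments)):
        for j in range(i, min(i + missed_cleavages + 1, len(fragments))):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


print(tryptic_digest("AKPLRGK"))     # ['AKPLR', 'GK']
print(tryptic_digest("AKPLRGK", 1))  # ['AKPLR', 'AKPLRGK', 'GK']
```

The K in `AKPLR` is not treated as a cleavage site because it is followed by P.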

Different types of mass spectrometer have different capabilities. A simple instrument will measure a set of molecular weights for the intact mixture of peptides. An instrument with MS/MS capability can additionally provide structural information by recording the fragment ion spectrum of a peptide. Usually, the digest mixture will be separated by chromatography prior to MS/MS analysis, so that MS/MS spectra from individual peptides can be measured.
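
To make "fragment ion spectrum" more concrete, the sketch below computes the singly charged b and y ion m/z values expected when a peptide breaks along its backbone. These are the kind of calculated fragment masses an MS/MS search compares against. It assumes unmodified residues, monoisotopic masses and only the b and y ion series; real spectra also contain other ion types, and the names `RESIDUE_MASS` and `fragment_ions` are hypothetical.

```python
# Monoisotopic residue masses in Da (unmodified residues only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728  # mass of a proton, Da
WATER = 18.01056  # monoisotopic mass of H2O, Da


def fragment_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Return singly charged b and y ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        # b_i: the first i residues plus a proton
        b_ions.append(sum(masses[:i]) + PROTON)
        # y_i: the last i residues plus water and a proton
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


b, y = fragment_ions("PEPTIDE")
for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
    print(f"b{i} {b_mz:9.4f}    y{i} {y_mz:9.4f}")
```

A quick consistency check: for a peptide of n residues, b(i) and y(n-i) always sum to the neutral peptide mass plus two protons.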

The experimental mass values are then compared with calculated peptide mass or fragment ion mass values, obtained by applying cleavage rules to the entries in a comprehensive primary sequence database. By using an appropriate scoring algorithm, the closest match or matches can be identified. If the "unknown" protein is present in the sequence database, then the aim is to pull out that precise entry. If the sequence database does not contain the unknown protein, then the aim is to pull out those entries which exhibit the closest homology, often equivalent proteins from related species.
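
The comparison step for a peptide mass fingerprint can be sketched as follows: each database entry is digested in silico, the [M+H]+ value of every peptide is calculated, and the entry is scored by how many experimental masses fall within a mass tolerance of some calculated value. This simple match count is a stand-in for illustration only; it is not Mascot's probability-based scoring. The toy database, the 0.05 Da tolerance and all function names are hypothetical, and the "observed" masses are generated from the toy sequence itself rather than taken from real data.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def tryptic_peptides(sequence: str) -> list[str]:
    """Simplified trypsin rule: cleave after K/R unless followed by P; no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in "KR" and sequence[i + 1:i + 2] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_mass(peptide: str) -> float:
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed: list[float], sequence: str, tol: float = 0.05) -> int:
    """Number of observed masses within +/- tol Da of any calculated peptide mass."""
    calculated = [mh_mass(p) for p in tryptic_peptides(sequence)]
    return sum(any(abs(m - c) <= tol for c in calculated) for m in observed)


def rank_entries(observed: list[float], database: dict[str, str], tol: float = 0.05):
    """Rank database entries by match count, best first."""
    scores = {name: count_matches(observed, seq, tol) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy database and masses derived from toy_1 (not experimental data).
database = {"toy_1": "MKWVTFISLLLLFSSAYSRGVFRRDTHK", "toy_2": "GASPVTCLINDQKEMHFRYW"}
observed = [mh_mass(p) for p in ("WVTFISLLLLFSSAYSR", "GVFR", "DTHK")]
print(rank_entries(observed, database))  # [('toy_1', 3), ('toy_2', 0)]
```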

The sequence databases that can be searched on this server are:

  • MSDB is a comprehensive, non-identical protein sequence database maintained by the Proteomics Department at the Hammersmith Campus of Imperial College London. MSDB is designed specifically for mass spectrometry applications.
  • NCBInr is a comprehensive, non-identical protein database maintained by NCBI for use with its search tools BLAST and Entrez. The entries have been compiled from GenBank CDS translations, PIR, SWISS-PROT, PRF, and PDB.
  • SwissProt is a high quality, curated protein database. Sequences are non-redundant, rather than non-identical, so you may get fewer matches for an MS/MS search than you would from a comprehensive database, such as MSDB or NCBInr. SwissProt is ideal for peptide mass fingerprint searches.
  • dbEST is the division of GenBank that contains "single-pass" cDNA sequences, or Expressed Sequence Tags, from a number of organisms. During a Mascot search, the nucleic acid sequences are translated in all six reading frames. dbEST is a very large database, and is divided into three sections: EST_human, EST_mouse, and EST_others. Even so, searches of these databases take far longer than a search of one of the non-redundant protein databases. You should only search an EST database if a search of a protein database has failed to find a match.
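
The six-frame translation used for dbEST searches can be sketched as follows: the nucleic acid sequence and its reverse complement are each read from three starting offsets and translated with the standard genetic code. The sketch assumes the standard code only, writes stop codons as `*` and codons containing ambiguous bases as `X`; how Mascot itself handles stop codons or alternative genetic codes is not covered here, and the function names are hypothetical.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons; '*' is a stop, 'X' a codon with an ambiguous base."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> list[str]:
    """Frames +1, +2, +3 on the given strand, then frames on the reverse complement."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:])
            for strand in (dna, reverse_complement)
            for offset in range(3)]


print(six_frame_translation("ATGGCC"))  # ['MA', 'W', 'G', 'GH', 'A', 'P']
```

Because every frame is translated, a single EST yields six candidate protein sequences, which is one reason these searches take longer than searches of protein databases.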
Copyright © 2007 Matrix Science Ltd. All Rights Reserved.