Peptide Mass Fingerprint

A mass spectrum of the peptide mixture resulting from the digestion of a protein by an enzyme provides a fingerprint of great specificity, so specific that it is often possible to identify the protein from this information alone.

This method of identification is much more reliable than using fingerprints based on PAGE migration patterns or HPLC retention times. However, peptide mass fingerprinting is limited to the identification of proteins for which sequences are already known; it is not a method of structural elucidation.

Choice of Enzyme

An enzyme of low specificity, which digests proteins to a mixture of free amino acids and di- and tripeptides, is not a good choice. A complex mixture containing large numbers of components of similar mass will result in many overlapping peaks. A further consideration for MALDI analysis is that the low mass region, below ~500 Da, is obscured by the presence of matrix peaks.

In general, it is best to use enzymes of specificity equal to or greater than trypsin.

Missed Cleavages

Setting the number of allowed missed cleavage sites to zero simulates a limit digest. If you are confident that your digest is perfect, with no partial fragments present, this will give maximum discrimination and the highest score.

If experience shows that your digest mixtures usually include some partials, that is, peptides with missed cleavage sites, you should choose a setting of 1, or maybe 2 missed cleavage sites. Don't specify a higher number without good reason, because each additional level of missed cleavages increases the number of calculated peptide masses to be matched against the experimental data. If the actual digest does not contain extended partials, this simply increases the number of random matches, and so reduces discrimination.
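The growth in calculated peptides with each level of missed cleavages can be illustrated with a small in-silico digest. This is only a sketch: it assumes the simplified textbook trypsin rule (cleave after K or R, but not before P), and the function names and the test sequence are invented for illustration. It is not how Mascot itself generates peptides.

```python
def tryptic_sites(sequence):
    """Return cut positions using a simplified trypsin rule (assumption):
    cleave after K or R, except when the next residue is P."""
    sites = []
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    return sites


def digest(sequence, missed_cleavages=0):
    """List every peptide containing at most `missed_cleavages` uncut sites."""
    bounds = [0] + tryptic_sites(sequence) + [len(sequence)]
    n_limit = len(bounds) - 1  # number of limit-digest peptides
    peptides = []
    for start in range(n_limit):
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end > n_limit:
                break
            peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides


# Hypothetical sequence, chosen only to show the counting.
seq = "AGKLLRPEKDDRSS"
for mc in (0, 1, 2):
    print(mc, len(digest(seq, mc)))
```

For this short sequence the number of candidate peptides rises from 4 to 7 to 9 as the setting goes from 0 to 1 to 2. Every extra candidate is another chance for a random match if the real digest contains no such partials.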

Choice of Search Masses

Select experimental mass values which are large enough to offer good discrimination, yet not so large as to be likely to be extended partials. A good mass range for trypsin is 1000 to 3500 Da.
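As a minimal sketch, restricting a peak list to the suggested range might look like the following. The function name and the peak values are hypothetical; the default limits are simply the 1000 to 3500 Da range quoted above for trypsin.

```python
def select_search_masses(peaks, low_da=1000.0, high_da=3500.0):
    """Keep only masses inside the inclusive range [low_da, high_da]."""
    return [mass for mass in peaks if low_da <= mass <= high_da]


# Invented peak list, for illustration only.
peaks = [450.2, 999.9, 1000.0, 1833.9, 3500.0, 4200.5]
print(select_search_masses(peaks))
```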

If you have misgivings about an experimental mass value, then it is best to leave it out. An example would be a peak which is broader than the others, indicating that it may be an unresolved doublet.

Be generous in setting the peptide mass tolerance. If an experimental mass falls just outside the allowed window, then it contributes nothing towards the score. However, remember that the number of spurious matches, and the search time, increase with the size of the error window.
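The trade-off can be seen in a toy matching routine. This sketch assumes an absolute tolerance in Da and a simple "any calculated mass within the window" test; the function name and all mass values are invented, and Mascot's own matching and scoring are more sophisticated than this.

```python
def count_matches(experimental, calculated, tolerance_da):
    """Count experimental masses lying within +/- tolerance_da
    of at least one calculated peptide mass."""
    matched = 0
    for exp in experimental:
        if any(abs(exp - calc) <= tolerance_da for calc in calculated):
            matched += 1
    return matched


# Invented values, for illustration only.
experimental = [1045.56, 1479.80, 2211.10]
calculated = [1045.50, 1479.30, 1800.00, 2212.00]

for tol in (0.1, 0.6, 1.0):
    print(tol, count_matches(experimental, calculated, tol))
```

With the narrowest window only one mass matches. Widening the window picks up the others, but against a full database the same widening would also admit more spurious matches.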

With Mascot 2.2 and later, if intensity information is supplied, Mascot will attempt to use this to discriminate against noise peaks. However, this is not a substitute for having a high quality peak list.

Constraining the Protein Molecular Weight

Supplying a protein molecular weight to some search engines can be risky, because many of the sequence database entries are for the least processed form of a protein. For example, the SwissProt entry for bovine insulin, INS_BOVIN, is actually the sequence of the precursor protein including signal and connecting peptides. This adds up to a molecular weight of 11,394 Da, so that a search based too tightly around an experimental measurement of the molecular weight of this protein (5734 Da) would fail to find a correct match.

This is not a problem with Mascot, because the protein molecular weight is applied as a sliding window. That is, for each database entry, Mascot looks for the highest scoring set of peptide matches which are within a contiguous stretch of sequence less than or equal to the specified protein molecular weight.

This will often be less than the mass of the entire sequence entry (unless the data set happens to include both the N-terminal and C-terminal peptides). Consequently, if you specify a value for the protein molecular weight, this acts only as a ceiling. Not only will you see smaller proteins on the hit list, you will also see larger ones, but all of the reported matches will be within a stretch of sequence less than or equal to the specified mass.
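The sliding window idea can be sketched as follows. This is an illustrative brute-force version, not Mascot's actual implementation: the function name, the (start, end, score) representation of a peptide match, the additive scores, and the uniform 110 Da residue mass used in the example are all assumptions made for the sketch.

```python
def best_window(matches, residue_masses, max_protein_mass, water=18.015):
    """Find the highest-scoring set of peptide matches that fits inside a
    contiguous stretch of sequence whose mass is <= max_protein_mass.

    matches: list of (start, end, score), residue indices with end exclusive.
    residue_masses: one mass per residue of the database entry.
    """
    # prefix[i] holds the summed mass of residues 0..i-1.
    prefix = [0.0]
    for mass in residue_masses:
        prefix.append(prefix[-1] + mass)

    best_score, best_span = 0.0, None
    starts = sorted({s for s, _, _ in matches})
    ends = sorted({e for _, e, _ in matches})
    for ws in starts:
        for we in ends:
            if we <= ws:
                continue
            span_mass = prefix[we] - prefix[ws] + water
            if span_mass > max_protein_mass:
                continue  # stretch is heavier than the specified ceiling
            score = sum(sc for s, e, sc in matches if s >= ws and e <= we)
            if score > best_score:
                best_score, best_span = score, (ws, we)
    return best_score, best_span


# Hypothetical 100-residue entry at a placeholder 110 Da per residue.
entry = [110.0] * 100
matches = [(5, 15, 8.0), (60, 70, 10.0), (75, 90, 12.0)]
print(best_window(matches, entry, 6000.0))
print(best_window(matches, entry, 12000.0))
```

With a 6000 Da ceiling, the entry (about 11 kDa in total) is still reported, but only the two matches that fit in the stretch from residue 60 to 90 count. Raising the ceiling to 12000 Da lets all three matches contribute.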

Too Many Peptide Masses

"Too many notes, your excellency"

The optimum data set for a peptide mass fingerprint is, of course, all of the correct peptides and none of the wrong ones. By correct, we mean that the textbook enzyme cleavage rules were followed, and only specified modifications are present.

Sadly, real-life data are generally far from ideal, and it is almost unknown to get every single experimental mass value matching and 100% sequence coverage. However, it is not always recognised that having too many peptide mass values can create similar difficulties to having too few.

Imagine a tryptic digest of a 20 kDa protein. We would expect something around 20 perfect cleavage peptides. If the digest was incomplete, or there was a non-quantitative modification, we might expect to double the number of peptides observed.

If 100 peaks are taken from the mass spectrum of this digest and submitted to Mascot, then either 60 to 80 peaks are noise or there are extensive non-quantitative modifications. Either possibility is bad news for search specificity. The peaks which cannot be matched correctly will still contribute to the population of random matches. The scoring algorithm in Mascot will do the best it can to bring the statistically less probable (i.e. correct) matches to the top, but there will be occasions when proteins with multiple random matches overwhelm the hit list.
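The arithmetic behind the 60 to 80 figure is simple enough to write down. In this sketch the function name is hypothetical, and the factor of two for partials or non-quantitative modifications is taken from the rough estimate above rather than from any rule.

```python
def unexplained_peaks(submitted, expected_limit_peptides, partial_factor=2):
    """Return (low, high) estimates of peaks that cannot be correct matches.

    low:  assumes partials or modifications multiply the number of real
          peptides by `partial_factor`.
    high: assumes only the limit-digest peptides are real.
    """
    low = submitted - expected_limit_peptides * partial_factor
    high = submitted - expected_limit_peptides
    return max(low, 0), max(high, 0)


# 100 submitted peaks, about 20 expected limit peptides for a 20 kDa protein.
print(unexplained_peaks(100, 20))
```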

Autolytic Peptide Masses

For low level digests, it can be useful to screen the experimental data for enzyme autolysis fragments.
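One way such a screen might be done before submitting a peak list is sketched below. The function name, the tolerance, and in particular the autolysis masses are placeholders invented for the example; real values depend on the enzyme and its source and must be looked up.

```python
def screen_autolysis(peaks, autolysis_masses, tolerance_da):
    """Split a peak list into (kept, flagged), where flagged peaks lie
    within +/- tolerance_da of a known autolysis fragment mass."""
    kept, flagged = [], []
    for mass in peaks:
        if any(abs(mass - a) <= tolerance_da for a in autolysis_masses):
            flagged.append(mass)
        else:
            kept.append(mass)
    return kept, flagged


# Placeholder values only; these are not real autolysis masses.
autolysis = [1000.0, 2000.0]
peaks = [1000.2, 1500.7, 2000.1, 2750.4]
kept, flagged = screen_autolysis(peaks, autolysis, tolerance_da=0.3)
print(kept, flagged)
```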