Peptide Mass Fingerprint
A mass spectrum of the peptide mixture resulting from the digestion
of a protein by an enzyme provides a fingerprint of great specificity.
It is so specific that it is often possible to identify the protein
from this information alone.
This method of identification is much more reliable than using
fingerprints based on PAGE migration patterns or HPLC retention
times. However, peptide mass fingerprinting is limited to the
identification of proteins for which sequences are already known;
it is not a method of structural elucidation.
Choice of Enzyme
An enzyme of low specificity, which digests proteins to a mixture
of free amino acids and di- and tripeptides, is not a good choice.
A complex mixture containing large numbers of components of similar
mass will result in a lot of overlapping peaks. A further consideration
for MALDI analysis is that the low mass region, below ~500 Da,
is obscured by the presence of matrix peaks.
In general, it is best to use enzymes of specificity equal
to or greater than trypsin.
Missed Cleavages
Setting the number of allowed missed cleavage sites to zero simulates a limit digest.
If you are confident that your digest is perfect, with no partial fragments present,
this will give maximum discrimination and the highest score.
If experience shows that your digest mixtures usually include some partials,
that is, peptides with missed cleavage sites, you should choose a setting of 1, or
maybe 2 missed cleavage sites. Don't specify a higher number without good reason,
because each additional level of missed cleavages increases the number
of calculated peptide masses to be matched against the experimental data.
If the actual digest does not contain extended
partials, this simply increases the number of random matches, and so reduces discrimination.
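
The growth in calculated peptides with each level of missed cleavages can be illustrated with a minimal sketch. It assumes the textbook trypsin rule (cleave after K or R, but not before P); the function names and the toy sequence are hypothetical, and this is not a description of how Mascot itself performs the digest.

```python
def limit_fragments(sequence):
    """Split a sequence into limit-digest fragments.

    Assumed rule: cleave after K or R unless the next residue is P.
    """
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal fragment when it does not end in K or R.
        fragments.append(sequence[start:])
    return fragments


def tryptic_peptides(sequence, max_missed=0):
    """Return all peptides containing up to max_missed missed cleavage sites."""
    fragments = limit_fragments(sequence)
    peptides = []
    for start in range(len(fragments)):
        # Join 1 to max_missed + 1 adjacent limit fragments.
        for extra in range(max_missed + 1):
            end = start + extra + 1
            if end > len(fragments):
                break
            peptides.append("".join(fragments[start:end]))
    return peptides


toy_sequence = "GASKTVLRPEDKMNQRFYWK"  # hypothetical sequence, for illustration only
for missed in (0, 1, 2):
    print(missed, len(tryptic_peptides(toy_sequence, missed)))
```

For this toy sequence the number of calculated peptides rises from 4 to 7 to 9 as the setting goes from 0 to 1 to 2. Every extra peptide is another candidate for a random match.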
Choice of Search Masses
Select experimental mass values which are large enough to offer
good discrimination, yet not so large as to be likely to be extended
partials. A good mass range for trypsin is 1000 to 3500 Da.
If you have misgivings about an experimental mass value, then
it is best to leave it out. An example would be a peak which is
broader than the others, indicating that it may be an unresolved
doublet.
Be generous in setting the peptide
mass tolerance. If an experimental mass falls just outside
the allowed window, then it contributes nothing towards the score.
However, remember that the number of spurious matches, and the
search time, increase with the size of the error window.
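
The effect of the tolerance window can be sketched as follows. This is a simplified illustration, not Mascot's scoring: it only counts how many experimental masses fall within the window of any calculated mass. The function names and the mass values are hypothetical.

```python
def within_tolerance(experimental, calculated, tolerance, unit="Da"):
    """True if the experimental mass lies inside the window around the calculated mass."""
    if unit == "ppm":
        # A ppm tolerance scales with the calculated mass.
        window = calculated * tolerance * 1e-6
    else:
        window = tolerance
    return abs(experimental - calculated) <= window


def count_matches(experimental_masses, calculated_masses, tolerance, unit="Da"):
    """Count experimental masses that match at least one calculated mass."""
    return sum(
        any(within_tolerance(e, c, tolerance, unit) for c in calculated_masses)
        for e in experimental_masses
    )


experimental = [1000.52, 1500.80, 2000.10]  # hypothetical peak list
calculated = [1000.50, 1500.70, 2500.00]    # hypothetical digest masses

print(count_matches(experimental, calculated, 0.05))  # narrow window
print(count_matches(experimental, calculated, 0.2))   # wider window
```

With the narrow window only the first peak matches; the second peak is 0.10 Da out and contributes nothing until the window is widened. Against a whole database, the same widening also admits more random matches.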
With Mascot 2.2 and later, if intensity information is supplied,
Mascot will attempt to use this to discriminate against noise peaks.
However, this is not a substitute for having a high
quality peak list.
Supplying a protein molecular weight to some search engines can be risky, because many of
the sequence database entries are for the least processed form of a protein.
For example, the SwissProt entry for bovine insulin, INS_BOVIN, is actually
the sequence of the precursor protein including signal and connecting
peptides. This adds up to a molecular weight of 11,394 Da, so
that a search based too tightly around an experimental measurement
of the molecular weight of this protein (5734 Da) would fail to
find a correct match.
This is not a problem with Mascot, because the protein molecular
weight is applied as a sliding window. That is, for each database
entry, Mascot looks for the highest scoring set of peptide matches
which are within a contiguous stretch of sequence less than or
equal to the specified protein molecular weight.
This will often be less than
the mass of the entire sequence entry (unless the data set happens to include
both the N-terminal and C-terminal peptides).
Consequently, if you specify a value for the protein molecular weight,
this acts only as a ceiling. The hit list may include database entries both smaller
and larger than the specified mass, but all of the reported matches for any entry will
lie within a stretch of sequence less than or equal to the specified mass.
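
The sliding window idea can be sketched in a few lines. This is a simplified model under stated assumptions, not Mascot's implementation: each database entry is represented as its limit peptides in sequence order, the "score" is just the count of matched peptides, and the mass of a stretch is approximated by summing peptide masses (which slightly overestimates, because each peptide mass includes a water). The function name and the values are hypothetical.

```python
def best_window(peptides, max_protein_mass):
    """Largest number of matched peptides in any contiguous stretch
    whose summed mass does not exceed max_protein_mass.

    peptides: list of (mass, matched) tuples in sequence order.
    """
    best = matched = 0
    total = 0.0
    left = 0
    for right, (mass, is_matched) in enumerate(peptides):
        total += mass
        matched += is_matched
        # Shrink the window from the left until it fits under the ceiling.
        while total > max_protein_mass and left <= right:
            total -= peptides[left][0]
            matched -= peptides[left][1]
            left += 1
        best = max(best, matched)
    return best


# Hypothetical entry of about 12 kDa, with four matched peptides.
entry = [
    (2500.0, False),
    (1500.0, True),
    (2000.0, True),
    (1800.0, True),
    (3000.0, False),
    (1200.0, True),
]

print(best_window(entry, 6000.0))   # ceiling well below the entry mass
print(best_window(entry, 12000.0))  # ceiling equal to the entry mass
```

With a 6000 Da ceiling, the entry still reports three of its four matches, because they sit within one contiguous stretch that fits under the ceiling. The entry is not rejected simply because its full-length mass is larger.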
Too Many Peptide Masses
"Too many notes, your excellency"
The optimum data set for a peptide mass fingerprint is, of course,
all of the correct peptides and none of the wrong ones.
By correct, we mean that the textbook enzyme cleavage rules were followed, and
only specified modifications are present.
Sadly, real life data are generally far from ideal, and it is almost unknown to get
every single experimental mass value matching and 100% sequence coverage. However, it
is not always recognised that having too many peptide mass values can create
similar difficulties to having too few.
Imagine a tryptic digest of a 20 kDa protein. We would expect something around
20 perfect cleavage peptides. If the digest was incomplete,
or there was a non-quantitative modification, we might expect to double the number of
peptides observed.
If 100 peaks are taken from the mass spectrum of this digest and submitted to Mascot then
either 60 to 80 peaks are noise or there are extensive non-quantitative modifications.
Either possibility is bad news for search specificity. The peaks which cannot be matched
correctly will still contribute to the population of random matches. The scoring
algorithm in Mascot will do the best it can to bring the statistically less probable (i.e.
correct) matches to the top, but there will be occasions when proteins with multiple
random matches overwhelm the hit list.
Autolytic Peptide Masses
For low level digests, it can be useful to screen the experimental
data for enzyme autolysis fragments.
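
A minimal sketch of such a screen is shown below. The list of autolysis masses is an assumption: the two values are the monoisotopic [M+H]+ masses commonly quoted for the porcine trypsin autolysis peptides VATVSLPR and LGEHNIDVLEGNEQFINAAK, and should be replaced with values appropriate to your own enzyme and to the form (Mr or MH+) of your peak list. The function name, tolerance, and peak list are hypothetical.

```python
# Assumed autolysis masses (monoisotopic [M+H]+); substitute your own list.
AUTOLYSIS_MH = [842.5094, 2211.1040]


def screen_autolysis(peaks, autolysis_masses=AUTOLYSIS_MH, tolerance=0.1):
    """Split a peak list into (kept, removed) by proximity to autolysis masses.

    tolerance is an absolute window in Da.
    """
    kept, removed = [], []
    for peak in peaks:
        if any(abs(peak - a) <= tolerance for a in autolysis_masses):
            removed.append(peak)
        else:
            kept.append(peak)
    return kept, removed


peaks = [842.51, 1045.56, 2211.10, 1479.80]  # hypothetical peak list
kept, removed = screen_autolysis(peaks)
print("kept:", kept)
print("removed:", removed)
```

Reviewing the removed peaks before discarding them is sensible, since a genuine peptide from the sample could fall within the same window.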