Quantitation: emPAI protocol

The Exponentially Modified Protein Abundance Index (emPAI) offers approximate, label-free, relative quantitation of the proteins in a mixture, based on protein coverage by the peptide matches in a database search result. The method was developed by Ishihama and colleagues. The key publication is:

Ishihama, Y., et al., Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein, Molecular & Cellular Proteomics 4, 1265-1272 (2005).

Unlike the other quantitation protocols, emPAI needs no parameter settings, because the information it requires is always present in a search result. emPAI is therefore "always on", as long as the MS/MS search contains at least 100 spectra.
  
The formula is very simple:

$$\mathrm{emPAI} = 10^{N_{\mathrm{observed}}/N_{\mathrm{observable}}} - 1$$

where Nobserved is the number of experimentally observed peptides and Nobservable is the calculated number of observable peptides for each protein. The tricky bit is deciding what to include and what to exclude in these two counts.
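The formula translates directly into code. The sketch below is illustrative only. The function name `empai` is hypothetical and is not part of any Mascot interface.

```python
def empai(n_observed: int, n_observable: float) -> float:
    """emPAI = 10**(Nobserved / Nobservable) - 1 (hypothetical helper)."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    # The ratio Nobserved / Nobservable is the protein abundance index (PAI).
    pai = n_observed / n_observable
    return 10.0 ** pai - 1.0
```

For example, a protein with 5 observed and 10 observable peptides has emPAI = 10**0.5 - 1 ≈ 2.16. A protein with no observed peptides has emPAI = 0.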
The number of observed peptides

The count of observed peptides includes only peptide matches with scores at or above the homology threshold. If there is no homology threshold, the identity threshold is used instead. Ishihama et al. obtained the best proportionality for a standard protein mixture by counting unique parent ions, including different charge states of the same peptide sequence. We have followed the same rule.

The number of observable peptides

To estimate the number of observable peptides, Ishihama et al. performed explicit in silico digests of the protein sequences. They then filtered the peptide list to exclude peptides outside the mass spectrometer scan range and the observed nano-LC retention time range.

For reasons of speed, we prefer to calculate an estimate of the number of observable peptides from the protein mass, the average amino acid composition of the database, and the enzyme specificity. The error introduced by this approximation is negligible compared with other sources of uncertainty:
 
- It isn't practical to filter by retention time, because this information is usually unavailable.
- The mass range of the instrument has to be estimated from the range of precursors found in the data set.
- Mass range filtering is by Mr, rather than m/z.
- The digest is assumed to be a limit digest.
- There is no obvious way to extend the calculation to semi-specific or non-specific digests.

The supplementary material for Ishihama et al. includes a worked example for human serum albumin. That example gave a count of 34 observable peptides in the Mr range 700 to 2800 and the retention time range 40 to 150 minutes. The enzyme was strict trypsin, and no missed cleavages were allowed. The number of peptides estimated by the routine used here (&emPAI_digest) is 35.
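The exact calculation inside &emPAI_digest is not described here. As a rough illustration of how such an estimate can be built from protein mass, composition, and enzyme specificity, the sketch below makes several assumptions:

- It assumes a strict tryptic limit digest, with cleavage after K or R but not before P.
- It assumes that residues are drawn independently from the database composition.
- It uses an average residue mass of about 110 Da.

The function name and the default K, R and P frequencies are illustrative assumptions, not values taken from Mascot. The sketch should not be expected to reproduce the count of 35.

```python
import math

# Average mass of the water added to each peptide (Da).
WATER_MASS = 18.02


def estimate_observable_peptides(
    protein_mass: float,
    mr_min: float,
    mr_max: float,
    freq_k: float = 0.058,  # illustrative K frequency (assumption)
    freq_r: float = 0.055,  # illustrative R frequency (assumption)
    freq_p: float = 0.047,  # illustrative P frequency (assumption)
    avg_residue_mass: float = 110.0,  # rough average residue mass, Da (assumption)
) -> float:
    """Rough Nobservable for a strict tryptic limit digest (hypothetical helper).

    Treats every residue as an independent draw from the database composition,
    so each position is a cleavage site (K or R not followed by P) with a fixed
    probability q, and peptide lengths follow a geometric distribution.
    Ignores terminal effects, missed cleavages and retention time.
    """
    n_residues = protein_mass / avg_residue_mass
    q = (freq_k + freq_r) * (1.0 - freq_p)  # P(cleavage after a given residue)

    # Convert the Mr window into a window on peptide length.
    l_min = max(1, math.ceil((mr_min - WATER_MASS) / avg_residue_mass))
    l_max = math.floor((mr_max - WATER_MASS) / avg_residue_mass)
    if l_max < l_min:
        return 0.0

    # For L ~ Geometric(q) on {1, 2, ...}, P(L >= l) = (1 - q)**(l - 1),
    # so P(l_min <= L <= l_max) = (1 - q)**(l_min - 1) - (1 - q)**l_max.
    frac_in_range = (1.0 - q) ** (l_min - 1) - (1.0 - q) ** l_max

    # Expected number of peptides in the limit digest: cleavage sites + 1.
    n_peptides = n_residues * q + 1.0
    return n_peptides * frac_in_range
```

For example, `estimate_observable_peptides(66_000, 700, 2800)` gives a rough count for a 66 kDa protein in the same Mr window as the albumin example. A closed-form estimate like this avoids digesting every protein sequence explicitly, which reflects the speed argument made above.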
As an example of emPAI, consider a small LC-MS/MS data set from a human cell lysate, acquired using a Waters QTof. We are grateful to Dr Jyoti Choudhary of the Sanger Institute for this data set. In the search result, hits 14 and 15 have the same number of matches. However, KPYM_HUMAN has the smaller emPAI value, because its Nobservable is about three times larger than that of BASI_HUMAN.
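The sketch below illustrates two points together: the counting rule for observed peptides, and the reason KPYM_HUMAN scores lower than BASI_HUMAN. `PeptideMatch` is a hypothetical record type, not a Mascot data structure. The match list and the Nobservable values are invented for illustration and are not taken from the example search. Parent ions are keyed on (sequence, charge), and modifications are ignored because the description above does not say how they are handled.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PeptideMatch:
    """Hypothetical peptide match record (not a Mascot data structure)."""
    sequence: str
    charge: int
    score: float
    identity_threshold: float
    homology_threshold: Optional[float] = None


def count_observed(matches: Iterable[PeptideMatch]) -> int:
    """Count unique parent ions scoring at or above the threshold.

    Uses the homology threshold when present, otherwise the identity
    threshold. Different charge states of one sequence count separately.
    """
    parent_ions = set()
    for m in matches:
        threshold = (m.homology_threshold
                     if m.homology_threshold is not None
                     else m.identity_threshold)
        if m.score >= threshold:
            parent_ions.add((m.sequence, m.charge))
    return len(parent_ions)


def empai(n_observed: int, n_observable: float) -> float:
    """emPAI = 10**(Nobserved / Nobservable) - 1."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    return 10.0 ** (n_observed / n_observable) - 1.0


# Invented matches. The 2+ and 3+ ions of PEPTIDEK count as two parent ions,
# and the repeated 2+ ion is not counted again. The 3+ ion passes on the
# homology threshold, SAMPLER passes on the identity threshold, and AVLGDEK
# scores below threshold.
matches = [
    PeptideMatch("PEPTIDEK", 2, 45.0, 40.0, 30.0),
    PeptideMatch("PEPTIDEK", 2, 50.0, 40.0, 30.0),
    PeptideMatch("PEPTIDEK", 3, 38.0, 40.0, 30.0),
    PeptideMatch("SAMPLER", 2, 42.0, 40.0),
    PeptideMatch("AVLGDEK", 2, 25.0, 40.0, 30.0),
]
n_obs = count_observed(matches)

# Same Nobserved, but one protein has three times as many observable peptides.
smaller_protein = empai(n_obs, 10)
larger_protein = empai(n_obs, 30)
```

With the same number of observed parent ions, the protein with three times as many observable peptides gets the smaller emPAI. This matches the behavior described for hits 14 and 15.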