Quantitation: emPAI protocol

The Exponentially Modified Protein Abundance Index (emPAI) offers approximate, label-free, relative quantitation of the proteins in a mixture based on protein coverage by the peptide matches in a database search result. The method was developed by Ishihama and colleagues; the key publication is Ishihama, Y., et al., Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein, Molecular & Cellular Proteomics 4 1265-1272 (2005).

Unlike the other quantitation protocols, the information required for emPAI is always present in a search result, and there are no parameter settings, so emPAI is "always on", as long as the MS/MS search contains at least 100 spectra.

The formula is very simple:

$$\mathrm{emPAI} = 10^{N_{\mathrm{observed}} / N_{\mathrm{observable}}} - 1$$

Where N_observed is the number of experimentally observed peptides and N_observable is the calculated number of observable peptides for each protein. The tricky bit is deciding what to include and what to exclude in these two counts.
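
As a minimal sketch, the published definition, emPAI = 10^(N_observed / N_observable) - 1, can be written as a short Python function. The function name and the input checks are illustrative assumptions, not part of any search engine code, and the counts in the usage lines are invented for illustration.

```python
def empai(n_observed: int, n_observable: float) -> float:
    """Return emPAI = 10 ** (N_observed / N_observable) - 1.

    n_observed   : number of experimentally observed peptides
    n_observable : calculated number of observable peptides (must be > 0)
    """
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    if n_observed < 0:
        raise ValueError("n_observed must not be negative")
    return 10.0 ** (n_observed / n_observable) - 1.0


# Two hypothetical proteins with the same number of observed peptides:
# the one with more observable peptides receives the smaller emPAI.
print(empai(6, 12))
print(empai(6, 36))
```

Because the ratio sits in an exponent, a modest error in either count can change the result noticeably, which is why the rules for the two counts matter.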

The number of observed peptides

The count of observed peptides only includes peptide matches with scores at or above the homology threshold, or the identity threshold if there is no homology threshold. Ishihama et al. obtained the best proportionality for a standard protein mixture by counting unique parent ions, including different charge states from the same peptide sequence. We have followed this same rule.
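
The counting rule can be sketched as follows. The `PeptideMatch` record and its field names are hypothetical, and the sketch assumes that a parent ion is identified by its peptide sequence and charge state; it is an illustration of the rule, not the actual implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PeptideMatch:
    """Hypothetical record for one peptide match assigned to a protein."""
    sequence: str
    charge: int
    score: float
    identity_threshold: float
    homology_threshold: Optional[float] = None  # None if there is no homology threshold


def count_observed(matches: Iterable[PeptideMatch]) -> int:
    """Count unique parent ions whose match score reaches the threshold."""
    parent_ions = set()
    for m in matches:
        # Use the homology threshold when present, else the identity threshold.
        if m.homology_threshold is not None:
            threshold = m.homology_threshold
        else:
            threshold = m.identity_threshold
        if m.score >= threshold:
            # Different charge states of one sequence count separately;
            # repeat matches to the same (sequence, charge) count once.
            parent_ions.add((m.sequence, m.charge))
    return len(parent_ions)
```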

The number of observable peptides

To estimate the number of observable peptides, Ishihama et al. performed explicit in silico digests of the protein sequences. The peptide list was then filtered to exclude peptides outside the mass spectrometer scan range and the observed nano-LC retention time range.
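
An explicit digest of this kind might look like the sketch below. It assumes strict trypsin (cleavage C-terminal to K or R, but not before P), no missed cleavages, unmodified residues, and standard monoisotopic residue masses. Filtering is by Mr only, since retention time filtering would need a prediction model that is not covered here. All function names are illustrative.

```python
from typing import List

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the terminal H and OH


def tryptic_digest(sequence: str) -> List[str]:
    """Limit digest: cleave after K or R unless the next residue is P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mr(peptide: str) -> float:
    """Monoisotopic relative molecular mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_observable(sequence: str, mr_min: float, mr_max: float) -> int:
    """Count limit-digest peptides whose Mr lies within [mr_min, mr_max]."""
    return sum(
        1 for p in tryptic_digest(sequence) if mr_min <= peptide_mr(p) <= mr_max
    )
```

A call such as `count_observable(protein_sequence, 700.0, 2800.0)` would apply an Mr window like the one quoted for the serum albumin example; the result depends on the exact sequence used and on whether a retention time filter is also applied.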

For reasons of speed, we prefer to make a calculated estimate of the number of observable peptides based on the protein mass, the average amino acid composition of the database, and the enzyme specificity. The error introduced by doing this is negligible compared with other sources of uncertainty:

It isn't practical to filter by retention time, because this information is usually unavailable
The mass range of the instrument has to be estimated from the range of precursors found in the data set
Mass range filtering is by Mr, rather than m/z
The digest is assumed to be a limit digest
There is no obvious way to extend the calculation to semi-specific or non-specific digests

In the supplementary material for Ishihama et al., there is a worked example for human serum albumin which resulted in a count of 34 for the observable peptides in the Mr range 700 to 2800 and the retention time range 40 to 150 minutes. The enzyme was strict trypsin and no missed cleavages were allowed. The number of peptides estimated by the routine used here (&emPAI_digest) is 35.

Example

Click here for an example of emPAI. We are grateful to Dr Jyoti Choudhary of the Sanger Institute for this small LC-MS/MS data set from a human cell lysate, acquired using a Waters QTof. Notice how hits 14 and 15 both have the same number of matches, but KPYM_HUMAN has the smaller emPAI value because its N_observable is some 3 times larger than that of BASI_HUMAN.