Quantitation: Overview

Many different approaches to protein quantitation using mass spectrometry data have been described in the literature. For a short, recent review, see Ong, S. E. and Mann, M., "Mass spectrometry-based proteomics turns quantitative", Nature Chemical Biology 1, 252-262 (2005). In terms of the "mechanics" of their implementation, most of the popular approaches can be classified into a relatively small number of protocols:

Reporter: Quantitation based on the relative intensities of fragment peaks at fixed m/z values within an MS/MS spectrum. For example, iTRAQ and Tandem Mass Tags.

Precursor: Quantitation based on the relative intensities of extracted ion chromatograms (XICs) for precursors within a single data set. This is by far the most widely used approach, and it can be used with any chemistry that creates a precursor mass shift. For example, ¹⁸O, AQUA, ICAT, ICPL, metabolic labelling, SILAC, and so on.

Multiplex: Quantitation based on the relative intensities of sequence ion fragment peaks within an MS/MS spectrum. This is a novel approach, which can be used with any chemistry that labels one peptide terminus, creating a small mass shift, such as SILAC or ¹⁸O.

Replicate: Label-free quantitation based on the relative intensities of extracted ion chromatograms (XICs) for precursors in multiple data sets aligned using mass and elution time.

emPAI: Label-free quantitation for the proteins in a mixture based on protein coverage by the peptide matches in a database search result.

Average: Label-free quantitation for the proteins in a mixture based on the application of a rule to the intensities of extracted ion chromatograms (XICs) for the peptide matches in a database search result.
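
To make the Reporter protocol concrete, here is a minimal Python sketch of how reporter ion intensities might be read from an MS/MS peak list and turned into ratios. It assumes the peak list is a list of (m/z, intensity) pairs, uses approximate iTRAQ 4-plex reporter m/z values, and takes the most intense peak within a fixed tolerance of each reporter. The function names, the tolerance, and the choice of reference reporter are illustrative assumptions, not Mascot's implementation; a real method would also apply isotope impurity corrections.

```python
from typing import Dict, List, Tuple

# Approximate m/z values of the iTRAQ 4-plex reporter ions.
# Illustrative only: a real method takes these from its configuration.
ITRAQ4_REPORTERS: Dict[str, float] = {
    "114": 114.1112,
    "115": 115.1083,
    "116": 116.1116,
    "117": 117.1150,
}


def reporter_intensities(
    peaks: List[Tuple[float, float]],
    reporters: Dict[str, float],
    tolerance: float = 0.05,
) -> Dict[str, float]:
    """For each reporter, return the most intense peak within +/- tolerance (0.0 if none)."""
    result: Dict[str, float] = {}
    for name, target in reporters.items():
        matches = [inten for mz, inten in peaks if abs(mz - target) <= tolerance]
        result[name] = max(matches, default=0.0)
    return result


def reporter_ratios(intensities: Dict[str, float], reference: str) -> Dict[str, float]:
    """Express every reporter intensity relative to the chosen reference reporter."""
    denom = intensities[reference]
    if denom == 0:
        raise ValueError(f"reference reporter {reference} not found in spectrum")
    return {name: value / denom for name, value in intensities.items()}


# Hypothetical peak list: four reporter peaks plus an unrelated fragment.
peaks = [(114.11, 1000.0), (115.11, 2000.0), (116.112, 500.0), (117.115, 1000.0), (300.2, 50.0)]
print(reporter_ratios(reporter_intensities(peaks, ITRAQ4_REPORTERS), "114"))
```

Because all the information is in the MS/MS peak list, nothing from the raw file is needed, which is why this protocol can be handled within a result report.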

Some protocols can be fully implemented within a Mascot result report because all the necessary information is present in the peak list. These protocols are Reporter, Multiplex, and emPAI. In fact, emPAI is "always on", and will be reported whenever an MS/MS search contains at least 100 spectra.
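
For reference, the published emPAI definition is emPAI = 10^PAI - 1, where PAI is the number of observed peptides for a protein divided by the number of observable peptides, and the molar content of each protein can be estimated as its emPAI divided by the sum of all emPAI values. The Python sketch below simply evaluates those formulas. How the observed and observable counts are obtained (for example, which theoretical peptides are considered observable) is not specified here, and Mascot's own counting may differ; the function names are hypothetical.

```python
from typing import Dict


def empai(observed: int, observable: int) -> float:
    """emPAI = 10 ** (observed / observable) - 1."""
    if observable <= 0:
        raise ValueError("observable peptide count must be positive")
    return 10 ** (observed / observable) - 1


def molar_percent(empai_values: Dict[str, float]) -> Dict[str, float]:
    """Estimated protein content in mol %: each emPAI as a share of the total."""
    total = sum(empai_values.values())
    return {protein: 100 * value / total for protein, value in empai_values.items()}


# Hypothetical counts: protein A has every observable peptide matched,
# protein B has half of them matched.
values = {"A": empai(10, 10), "B": empai(5, 10)}
print(values)
print(molar_percent(values))
```

The exponential form means that emPAI grows rapidly as coverage approaches completeness, which is why it is used as an estimate of abundance rather than raw coverage.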

The other three protocols require additional information from the raw data file, either because it is necessary to integrate the elution profile of each precursor peptide or because information is required for precursor peptides that were not used to trigger MS/MS scans, and so are missing from the peak list. So, for Precursor, Replicate, and Average, the quantitation report is generated in Mascot Distiller, which has access to both the Mascot search results and the raw data.
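
The reason raw data is needed is easiest to see in code. The Python sketch below extracts an ion chromatogram for a target m/z from a series of MS scans, integrates it with the trapezoidal rule, and forms a heavy/light ratio for a SILAC-style pair. It assumes each scan is a (retention time, peak list) pair and uses a simple ppm tolerance; the names and data layout are hypothetical, and it ignores isotope clusters, multiple charge states, smoothing, and background subtraction, all of which a real integration (such as that controlled by the integration settings of a quantitation method) would have to address.

```python
from typing import List, Sequence, Tuple

# A scan is (retention_time_seconds, [(mz, intensity), ...]).
Scan = Tuple[float, Sequence[Tuple[float, float]]]


def extract_xic(scans: List[Scan], target_mz: float, tol_ppm: float = 10.0) -> List[Tuple[float, float]]:
    """Sum the intensity within +/- tol_ppm of target_mz in each scan."""
    tol = target_mz * tol_ppm * 1e-6
    xic = []
    for rt, peaks in scans:
        total = sum(inten for mz, inten in peaks if abs(mz - target_mz) <= tol)
        xic.append((rt, total))
    return xic


def integrate(xic: List[Tuple[float, float]]) -> float:
    """Area under the chromatogram by the trapezoidal rule."""
    area = 0.0
    for (t0, y0), (t1, y1) in zip(xic, xic[1:]):
        area += 0.5 * (y0 + y1) * (t1 - t0)
    return area


# Hypothetical 2+ precursor pair: heavy carries Label:13C(6) (+6.020129 Da).
light_mz = 500.0
heavy_mz = light_mz + 6.020129 / 2
scans = [
    (10.0, [(light_mz, 0.0), (heavy_mz, 0.0)]),
    (11.0, [(light_mz, 100.0), (heavy_mz, 200.0)]),
    (12.0, [(light_mz, 0.0), (heavy_mz, 0.0)]),
]
ratio = integrate(extract_xic(scans, heavy_mz)) / integrate(extract_xic(scans, light_mz))
print(ratio)
```

Only the scan that triggered MS/MS appears in a peak list, so the elution profile, and any partner precursor that was never selected for fragmentation, can only be recovered from the raw data.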

Besides the choice of protocol, there are a large number of other choices and parameters associated with searching, processing, and reporting of quantitation data. These choices and parameters are necessary to provide sufficient flexibility and control, yet it would be undesirable to expose all of them in the search form. The solution is to encapsulate all the settings for a quantitation experiment into a named quantitation method. This means that quantitation support requires just a single control in the search form.

The set of quantitation methods is defined by an XML configuration file, called quantitation.xml. As with other configuration files, this file lives on the Mascot Server and is downloaded by Mascot Distiller and other clients as required. Brave souls may choose to edit the XML file directly, but a friendlier interface is provided by a browser-based configuration editor.

Modifications

The introduction of quantitation has required changes in the way that modifications are handled within Mascot. Mascot now takes its modification definitions directly from an XML representation of the Unimod database. To update the local definitions, simply download the latest XML file from the Unimod help page.

In Unimod, both amino acid residues and modifications are defined in terms of their elemental composition. This is important for metabolic labelling, in which the isotopic label is present throughout the peptide backbone.
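
As an illustration of why elemental composition matters for metabolic labelling, the Python sketch below computes the mass shift of a fully ¹⁵N-labelled peptide. Because every nitrogen atom is replaced, the shift depends on the number of nitrogen atoms in the sequence, which differs from residue to residue (arginine has four, lysine two, and so on). It assumes 100% incorporation and uses the monoisotopic masses of ¹⁴N and ¹⁵N; the function name is hypothetical.

```python
# Number of nitrogen atoms in each standard amino acid residue.
N_ATOMS = {
    "G": 1, "A": 1, "S": 1, "P": 1, "V": 1, "T": 1, "C": 1, "L": 1,
    "I": 1, "N": 2, "D": 1, "Q": 2, "K": 2, "E": 1, "M": 1, "H": 3,
    "F": 1, "R": 4, "Y": 1, "W": 2,
}

# Monoisotopic masses of the nitrogen isotopes (u).
MASS_14N = 14.0030740
MASS_15N = 15.0001089


def n15_shift(sequence: str) -> float:
    """Mass shift (Da) of a peptide when every nitrogen is 15N (assumes full incorporation).

    The free N-terminal amine is the backbone nitrogen of the first residue and the
    C-terminal OH contains no nitrogen, so the residue counts give the total.
    """
    n_atoms = sum(N_ATOMS[aa] for aa in sequence.upper())
    return n_atoms * (MASS_15N - MASS_14N)


print(n15_shift("PEPTIDE"))  # 7 nitrogen atoms
print(n15_shift("SAMPLER"))  # 10 nitrogen atoms (arginine contributes 4)
```

A single fixed delta per modification cannot describe this kind of label, whereas a composition-based definition handles it naturally.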

Unimod also provides a framework for including local definitions of modifications within a quantitation method. For example, the multiplex method may require that a modification has two neutral losses: one of 0 Da and one corresponding to the complete modification moiety, so that a mixed spectrum, containing both labelled and unlabelled peptides, can be matched with a good score. It would be confusing to have such an artificial modification appearing in Unimod, so the preference is to define it within the quantitation method.
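
The effect of the two neutral losses can be sketched in a few lines of Python. A fragment ion that contains the labelled terminus is predicted at two m/z values: with a loss of 0 Da it keeps the label, and with a loss equal to the whole moiety it looks unlabelled. In a mixed spectrum both peaks are present, so both can be matched. The example assumes a hypothetical fragment neutral mass of 800.0 Da carrying Label:18O(2) (Unimod monoisotopic delta 4.008491 Da); the function names are illustrative and this is not Mascot's scoring algorithm.

```python
from typing import List, Sequence, Tuple

PROTON = 1.007276  # mass of a proton (u)


def fragment_mz_with_losses(neutral_mass: float, charge: int, losses: Sequence[float]) -> List[float]:
    """m/z of a labelled fragment after each neutral loss.

    A loss of 0 keeps the label; a loss equal to the label delta removes it.
    """
    return [(neutral_mass - loss + charge * PROTON) / charge for loss in losses]


def forms_present(peaks: Sequence[Tuple[float, float]], targets: Sequence[float], tol: float = 0.02) -> List[bool]:
    """For each predicted m/z, report whether any peak lies within +/- tol."""
    return [any(abs(mz - t) <= tol for mz, _ in peaks) for t in targets]


LABEL_18O2 = 4.008491  # Unimod monoisotopic delta for Label:18O(2)

targets = fragment_mz_with_losses(800.0, 1, [0.0, LABEL_18O2])
mixed_spectrum = [(801.007, 500.0), (796.999, 400.0)]  # hypothetical peaks
print(targets)
print(forms_present(mixed_spectrum, targets))
```

Without the 0 Da loss, only one member of each pair could be matched, and the score for a mixed spectrum would suffer accordingly.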

In the context of a quantitation method, Mascot now supports exclusive modifications. A group of exclusive modifications can be thought of as a choice of fixed modifications. In many quantitation experiments, separate samples are derivatised and then pooled. Thus, a given peptide may carry one or the other set of modifications, but never a mixture of both. Some people use the term "binary" for this type of specificity. We prefer exclusive because binary implies only two possibilities, whereas a SILAC experiment might have three or four different labels that will never be mixed on a single peptide.

One of the reasons for introducing this new type of modification is that variable modifications greatly increase the size of the search space, because all the possible permutations and combinations of modified and unmodified residues have to be explored. As the search space becomes larger, the search takes longer and the score threshold increases, making it more difficult to get significant matches from marginal spectra. By treating labels as a choice of fixed modifications, we avoid this combinatorial explosion.
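
A simplified count shows the size of the difference. If a peptide has n labellable sites and there are k label states (counting "unlabelled" as one of them), treating the labels as variable modifications means every site can independently take any state, giving k^n arrangements. Treating them as exclusive means all sites carry the same label, giving just k. The Python sketch below assumes this simple model, which ignores any other variable modifications in the search; the function names are hypothetical.

```python
def variable_variants(n_sites: int, n_states: int) -> int:
    """Arrangements when each site independently takes any of n_states label states."""
    return n_states ** n_sites


def exclusive_variants(n_states: int) -> int:
    """With exclusive modifications, the whole peptide carries one label state."""
    return n_states


# Three SILAC states (light, medium, heavy) on a peptide with four K/R sites.
print(variable_variants(4, 3), exclusive_variants(3))
```

Only one arrangement per state is physically meaningful when samples are labelled separately and then pooled, so the other arrangements only inflate the search space.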

Key concepts

Two key concepts have already been introduced. A quantitation method encapsulates all the settings for searching, processing, and reporting of quantitation data. The "mechanics" of the method are specified by the protocol. These keywords represent concepts, and also structures in the configuration file. You will see these words used consistently (we hope) in this help, in the browser-based configuration editor, and in the XML configuration file.

Another important concept and keyword is a component. This is the characteristic property of a peptide that identifies its origin in the sample mixture. For example, in a SILAC experiment, one component might be unmodified peptides while another component is peptides modified with Label:13C(6) on arginine or lysine. If the protocol were reporter, then a component would be identified by a reporter ion m/z value. If it were a metabolic labelling experiment, then one component might be ¹⁴N peptides and the other ¹⁵N peptides. In a label-free experiment, using the replicate protocol, each data file would be a component.

If one component was called light and the second heavy, we might want a report to list the ratio of heavy over light, or maybe light over heavy. This is specified as a report ratio and is not limited to two components. In an ¹⁸O experiment, for example, you might want to report the ratio of (¹⁸O₁ + ¹⁸O₂) / ¹⁸O₀. Both the numerator and denominator of a report ratio can be linear combinations of components. For example, (0.5 * A + 0.5 * B - C) / (D + E), not that we can think of a practical use for something so complicated.
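
Since numerator and denominator are both linear combinations of components, a report ratio can be represented as two maps from component name to coefficient. The Python sketch below evaluates such a ratio from per-component intensities. The representation, component names, and intensities are hypothetical illustrations, not the quantitation.xml format.

```python
from typing import Dict

Combination = Dict[str, float]  # component name -> coefficient


def linear_combination(coeffs: Combination, values: Dict[str, float]) -> float:
    """Sum of coefficient * component value over the listed components."""
    return sum(c * values[name] for name, c in coeffs.items())


def report_ratio(numerator: Combination, denominator: Combination, values: Dict[str, float]) -> float:
    """Evaluate (sum of numerator terms) / (sum of denominator terms)."""
    denom = linear_combination(denominator, values)
    if denom == 0:
        raise ZeroDivisionError("denominator of report ratio is zero")
    return linear_combination(numerator, values) / denom


# (18O1 + 18O2) / 18O0 with hypothetical intensities.
o18 = {"18O0": 200.0, "18O1": 100.0, "18O2": 300.0}
print(report_ratio({"18O1": 1.0, "18O2": 1.0}, {"18O0": 1.0}, o18))

# The deliberately contrived (0.5 * A + 0.5 * B - C) / (D + E).
abcde = {"A": 4.0, "B": 2.0, "C": 1.0, "D": 1.0, "E": 1.0}
print(report_ratio({"A": 0.5, "B": 0.5, "C": -1.0}, {"D": 1.0, "E": 1.0}, abcde))
```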

In a quantitation method, modifications are organised into groups, classified as fixed, variable, or exclusive. Modification groups can be defined as variable or exclusive at the component level, where they usually characterise the component. They can also be defined at the method level, but only as fixed or variable. Defining modifications at the method level is a convenience, for modifications that are important to the method, and saves having to choose them in the search form.
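
The relationships described above (a method has a protocol, components, and modification groups, with different group types allowed at each level) can be summarised as a small data model. The Python sketch below encodes the rule that component-level groups are variable or exclusive while method-level groups are fixed or variable. The class and field names, and the example SILAC method, are hypothetical and do not reflect the actual quantitation.xml schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ModType(Enum):
    FIXED = "fixed"
    VARIABLE = "variable"
    EXCLUSIVE = "exclusive"


@dataclass
class ModGroup:
    mod_type: ModType
    mods: List[str]


@dataclass
class Component:
    name: str
    mod_groups: List[ModGroup] = field(default_factory=list)


@dataclass
class QuantMethod:
    name: str
    protocol: str
    components: List[Component] = field(default_factory=list)
    mod_groups: List[ModGroup] = field(default_factory=list)  # method-level groups

    def validate(self) -> None:
        """Check which group types are allowed at each level; raise ValueError if not."""
        for group in self.mod_groups:
            if group.mod_type is ModType.EXCLUSIVE:
                raise ValueError("method-level groups must be fixed or variable")
        for comp in self.components:
            for group in comp.mod_groups:
                if group.mod_type is ModType.FIXED:
                    raise ValueError(f"component {comp.name}: groups must be variable or exclusive")


# Hypothetical two-component SILAC method.
silac = QuantMethod(
    name="SILAC K+6 R+6",
    protocol="precursor",
    components=[
        Component("light"),
        Component("heavy", [ModGroup(ModType.EXCLUSIVE, ["Label:13C(6) (K)", "Label:13C(6) (R)"])]),
    ],
    mod_groups=[ModGroup(ModType.FIXED, ["Carbamidomethyl (C)"])],
)
silac.validate()  # passes: no disallowed group types
```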

The other method level tabs in the configuration editor, which correspond to child elements of the method element in the XML file, are:

integration: Choices and parameters to control the way in which extracted ion chromatograms are integrated in Mascot Distiller. Also controls how precursors are aligned in the replicate protocol.
quality: Miscellaneous quality criteria that peptide matches must meet before they can be used for quantitation. The most important is the strength of the peptide match, defined in terms of either a minimum score, a maximum expect value, or the score being at or above either the identity threshold or the homology threshold.
outliers: When ratios for individual peptide matches are combined into ratios for protein hits, a variety of procedures are available for detecting and rejecting outliers.
normalisation: Whatever the quantitation method, it can be difficult to ensure that each component is treated identically. If it is reasonable to expect that only a minority of the proteins in the sample will be up or down regulated, a common solution is to normalise the data so that the average ratio across the entire data set is unity.
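
One way to implement the normalisation described above is sketched in Python below. It assumes that "average" means the geometric mean, so that up- and down-regulation by the same factor carry equal weight, divides every peptide ratio by the geometric mean of all peptide ratios in the data set, and then forms each protein ratio as the geometric mean of its peptide ratios. Both choices, and the function names, are illustrative assumptions; Mascot's own normalisation and outlier options are configured in the method and may differ.

```python
from statistics import geometric_mean
from typing import Dict, List


def normalise_dataset(peptide_ratios: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Scale all peptide ratios so that their geometric mean over the data set is 1."""
    all_ratios = [r for ratios in peptide_ratios.values() for r in ratios]
    factor = geometric_mean(all_ratios)
    return {protein: [r / factor for r in ratios] for protein, ratios in peptide_ratios.items()}


def protein_ratios(peptide_ratios: Dict[str, List[float]]) -> Dict[str, float]:
    """Combine each protein's peptide ratios into one ratio (geometric mean)."""
    return {protein: geometric_mean(ratios) for protein, ratios in peptide_ratios.items()}


# Hypothetical data: one component was loaded at roughly twice the amount of the other.
raw = {"A": [2.0, 2.0], "B": [1.0], "C": [4.0]}
print(protein_ratios(normalise_dataset(raw)))  # approximately A: 1.0, B: 0.5, C: 2.0
```

The geometric mean is used here because ratios are asymmetric: averaging 0.5 and 2.0 arithmetically gives 1.25, whereas their geometric mean is exactly 1.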