Proline User Guide
Release 1.6
Proline is a suite of software and components dedicated to mass spectrometry proteomics. Proline lets you extract data from raw files, import results from MS/MS identification engines, organize and store your data in a relational database, process and analyze these data, and finally visualize and extract knowledge from MS-based proteomics results.
The current version supports the following features:
The software suite is based on two main components: a server, which handles processing tasks and relies on a relational database management system to store the data generated by the software, and two different graphical user interfaces, both allowing users to start tasks and visualize the data: Proline Studio, a rich client interface, and Proline Web, the web client interface. An additional component, ProlineAdmin, is dedicated to system administrators to set up and manage Proline.
Read the Concepts & Principles documentation to understand the main concepts and algorithms implemented in Proline.
Find quick answers to your questions in the How to section.
This procedure is detailed in the mzDB Documentation section.
Proline Concepts & Principles
Proline considers different types of identification data: Result Files, Search Results and Identification Summaries which will be defined in the following sections. All these data are connected according to this chart:
A Result File is the file created by a search engine when a search is submitted. The OMSSA (.omx files), Mascot (.dat files) and X!Tandem (.xml files) search engines are currently supported by Proline. Generic mzIdentML files can also be imported. A beta version of MaxQuant support has been implemented; currently, only search results are imported from MaxQuant files.
A first step when using Proline is to import Result Files through Proline Studio or Proline Web.
Search engines may provide different types of searches for MS and MS/MS data. It is important to highlight that the Result File content depends on the search type. Proline only supports MS/MS ion searches at this point.
A Search Result is the raw interpretation of a given set of MS/MS spectra produced by a search engine or a de novo interpretation process. It contains one or many peptides matching the submitted MS/MS spectra (PSMs, i.e. Peptide Spectrum Matches), and the protein sequences these peptides belong to. The Search Result also contains additional information such as the search parameters, the protein sequence databank used, etc.
A Search Result is created when a Result File (Mascot .dat file, OMSSA .omx file or X!Tandem .xml file) is imported in Proline. In the case of a target-decoy search, two Search Results are created: one for the target PSMs, one for the decoy PSMs.
Importing a Result File creates a new Search Result in the database which contains the following information:
The PSM score corresponds to Mascot ion score.
The PSM score corresponds to the negative common logarithm of the E-value: score = -log10(e-value).
Note: Proline only supports OMSSA Result Files generated with the 2.1.9 release.
The X!Tandem standard hyperscore is used as a PSM score.
Note: Proline supports X!Tandem Result Files generated with the Sledgehammer release (or later).
Proline handles decoy searches performed with two different strategies:
Decoy and Target Search Result
See Search Result to view which information is saved.
An Identification Summary is a set of identified proteins inferred from a subset of the PSMs contained in the Search Result. The PSMs taken into account are those that have been validated by a filtering process (for example, PSMs fulfilling specified criteria such as a score greater than a threshold value).
All peptides identifying a protein are grouped in a Peptide Set. The same Peptide Set can identify many proteins, represented by one Protein Set. In this case, one protein of this Protein Set is chosen to represent the set: the Typical Protein. If only a subset of the peptides identifies one or more proteins, a new Peptide Set is created. This Peptide Set is a subset of the first one, and the proteins it identifies are Subset Proteins.
All Peptide Sets and associated Protein Sets are represented, even if there are no specific peptides. In both cases above, no choice is made on which protein set / peptide set to keep. These protein sets can be filtered after inference (see Protein sets filtering).
Multiple algorithms can be used to calculate the Protein and Protein Set scores. Protein scores are computed during the import phase while Protein Set scores are computed during the validation phase.
Each individual protein match is scored according to all peptide matches associated with this protein, independently of any validation of these peptide matches. The sum of the peptide match scores is used as the protein score (called standard scoring for Mascot result files).
Each individual protein set is scored according to the validated peptide matches belonging to this protein set (see inference).
The score associated with each identified protein (or protein set) is the sum of the scores of all peptide matches identifying this protein (or protein set). In case of duplicate peptide matches (peptide matched by multiple queries), only the match with the best score is considered.
This scoring scheme is also based on the sum of all non-duplicate peptide match scores. However, the score for each peptide match is not its absolute value, but the amount by which it exceeds the threshold: the score offset. Therefore, peptide matches with a score below the threshold do not contribute to the protein score. Finally, the average of the subtracted thresholds is added to the score. For each peptide match, the “threshold” is the homology threshold if it exists, otherwise it is the identity threshold. The pseudocode below illustrates the MudPIT score computation procedure:
Protein score = 0
For each peptide match {
If there is a homology threshold and ions score > homology threshold {
Protein score += peptide score - homology threshold
} else if ions score > identity threshold {
Protein score += peptide score - identity threshold
}
}
Protein score += average of all the subtracted thresholds
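For illustration, here is a minimal Python sketch of this procedure; the PeptideMatch structure is hypothetical and only meant to mirror the pseudocode above:

from dataclasses import dataclass
from typing import Optional

@dataclass
class PeptideMatch:
    peptide: str                           # peptide sequence (plus modifications)
    score: float                           # Mascot ion score
    identity_threshold: float
    homology_threshold: Optional[float] = None

def mudpit_protein_score(matches):
    # Keep only the best-scoring match per peptide (duplicates removed)
    best = {}
    for m in matches:
        if m.peptide not in best or m.score > best[m.peptide].score:
            best[m.peptide] = m
    score = 0.0
    subtracted = []
    for m in best.values():
        # Use the homology threshold when it exists, otherwise the identity one
        if m.homology_threshold is not None and m.score > m.homology_threshold:
            score += m.score - m.homology_threshold
            subtracted.append(m.homology_threshold)
        elif m.score > m.identity_threshold:
            score += m.score - m.identity_threshold
            subtracted.append(m.identity_threshold)
    # Add the average of the subtracted thresholds back
    if subtracted:
        score += sum(subtracted) / len(subtracted)
    return score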
The benefit of the MudPIT score over the standard score is that it removes many of the junk protein sets, which have a high standard score but no high scoring peptide matches. Indeed, protein sets with a large number of weak peptide matches do not have a good MudPIT score.
This scoring scheme, introduced by Proline, is a modified version of the Mascot MudPIT one. The difference with the latter is that it does not take into account the average of the subtracted thresholds. This leads to the following scoring procedure:
Protein score = 0
For each peptide match {
If there is a homology threshold and ions score > homology threshold {
Protein score += peptide score - homology threshold
} else if ions score > identity threshold {
Protein score += peptide score - identity threshold
}
}
This score has the same benefits as the MudPIT one. The main difference is that the minimum value of this modified version will always be close to zero, while the genuine MudPIT score has a minimum value which is not constant across datasets and proteins (i.e. the average of all the subtracted thresholds).
There are several ways to calculate FDR depending on the database search type. In Proline the FDR is calculated at peptide and protein levels using the following rules:
Note: when computing the PSM FDR, peptide sequences matching both a Target Protein and a Decoy Protein are taken into account in both cases.
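For reference, here is a minimal sketch of the classical target-decoy FDR estimators; treating these as the exact formulas used by Proline is an assumption, since the applicable formula depends on the decoy strategy of the search:

def fdr_concatenated(target_count, decoy_count):
    # Concatenated target-decoy search: each decoy hit is assumed to
    # have a matching false target hit, hence the factor of 2
    return 100.0 * 2 * decoy_count / (target_count + decoy_count)

def fdr_separate(target_count, decoy_count):
    # Separate target and decoy searches
    return 100.0 * decoy_count / target_count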
Once a result file has been imported and a search result created, the validation is performed in four main steps:
Finally, the Identification Summary issued from these steps is stored in the identification database. Several validations of a Search Result can be performed, and a new Identification Summary of this Search Result is created for each validation.
When validating a merged Search Result, it is possible to propagate the same validation parameters to all child Search Results. In this case, Peptide Matches filtering and validation are applied to the children, as well as Protein Sets filtering. Note: currently, Protein Sets validation is not propagated to child Search Results.
Peptide Matches identified in a Search Result can be filtered using one or multiple predefined filters (described hereafter). Only validated peptide matches are considered for further steps.
All PSMs whose score is lower than a given threshold are invalidated.
This filtering is performed after having temporarily joined target and decoy PSMs corresponding to the same query (only really needed for separated forward/reverse database searches). Then, for each query, target and decoy PSMs are sorted by score. A rank (Mascot pretty rank) is computed for each PSM depending on its score position: PSMs with almost equal scores (difference < 0.1) are assigned the same rank. All PSMs with a rank greater than the specified one are invalidated.
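A minimal sketch of this rank computation; the dict-based PSM representation is hypothetical, and grouping scores relative to the best score of the current rank group is an assumption:

def assign_pretty_ranks(psms, score_tol=0.1):
    # psms: target and decoy PSMs of a single query, as dicts with a "score" key
    psms = sorted(psms, key=lambda p: p["score"], reverse=True)
    rank, group_top = 0, None
    for psm in psms:
        # Open a new rank group when the score drops by score_tol or more
        if group_top is None or group_top - psm["score"] >= score_tol:
            rank += 1
            group_top = psm["score"]
        psm["rank"] = rank
    return psms

def rank_filter(psms, max_rank):
    # Invalidate PSMs whose pretty rank exceeds the requested rank
    return [p for p in assign_pretty_ranks(psms) if p["rank"] <= max_rank]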
PSMs corresponding to short peptide sequences (length lower than the provided one) can be invalidated using this parameter.
Allows filtering PSMs using the Mascot expectation value (e-value), which reflects the difference between the PSM score and the Mascot identity threshold (p=0.05). PSMs having an e-value greater than the specified one are invalidated.
Proline is able to compute an adjusted e-value. It first selects the lowest threshold between the identity and homology ones (p=0.05). Then it computes the e-value using this selected threshold. PSMs having an adjusted e-value greater than the specified one are invalidated.
Given a specific p-value, the Mascot identity threshold is calculated for each query, and all peptide matches associated with the query whose score is lower than the calculated identity threshold are invalidated.
When parsing a Mascot result file, the number of candidate PSMs for each spectrum is saved and can be used to recalculate the identity threshold for any p-value.
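For illustration, the classical Mascot identity threshold can be recomputed from the number of candidates and a p-value as follows (a sketch of the standard formula, not necessarily Proline's exact code):

import math

def mascot_identity_threshold(candidate_count, p_value=0.05):
    # Score above which a match is significant at the given p-value,
    # for a query with candidate_count candidate peptides
    return -10.0 * math.log10(p_value / candidate_count)

# Example: 1000 candidates at p = 0.05 give a threshold of about 43
print(round(mascot_identity_threshold(1000), 1))  # 43.0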
Given a specific p-value, the Mascot homology threshold is inferred for each query, and all peptide matches associated with the query whose score is lower than the calculated homology threshold are invalidated.
This filter validates only one PSM per query. To select a PSM, the following rules are applied:
For each query:
For testing purposes, it is possible to request that this filter be executed after Peptide Matches validation (see below). In this case, the requested FDR of the validation step is modified by this filter. This option only exists to assess whether this filter is needed and to validate the way it is applied.
This filter validates only one PSM per pretty rank. Combining this filter with a pretty rank filter should give the same behaviour as the “Single PSM per Query Filter”.
In order to choose the PSM, following rules are applied:
For Pretty Rank of each query:
For testing purposes, it is possible to request that this filter be executed after Peptide Matches validation (see below). In this case, the requested FDR of the validation step is modified by this filter. This option only exists to assess whether this filter is needed and to validate the way it is applied.
Specify an expected FDR and tune a specified filter in order to obtain this FDR. See how FDR is calculated.
Once the previously described prefilters have been applied, a validation algorithm can be run to control the FDR: given a criterion, the system estimates the best threshold value in order to reach a specific FDR.
Filtering applied during validation is the same as Protein Sets Filtering
Once prefilters (see above) have been applied, a validation algorithm can be run to control the FDR. See how FDR is calculated.
At the moment, it is only possible to control the FDR by changing the Protein Set Score threshold. Three different protein set scoring functions are available.
Given an expected FDR, the system tries to estimate the best score threshold to reach this FDR. Two validation rules (R1 and R2), corresponding to two different groups of protein sets (see the detailed procedure below), are optimized by the algorithm. Each rule defines the optimum score threshold giving the FDR closest to the expected one for the corresponding group of protein sets.
Here is the procedure used for FDR optimization:
Separating protein sets into two groups increases the power of discrimination between target and decoy hits. Indeed, the score threshold of the G1 group is often much higher than that of G2. If a single average threshold were used, fewer G2 proteins would be validated, leading to a decrease in sensitivity for the same FDR value.
Any Identification Summary generated by validation or merging can be filtered.
Filtering consists in invalidating the Protein Sets which do not fulfill the specified criteria. Invalidated Protein Sets are not taken into account for further algorithms or display.
Available filtering criteria are defined below.
This filter invalidates protein sets that don't have at least x peptides identifying only that protein set. The specificity is considered at the DataSet level.
This filter goes through all Protein Sets from worst score to best score. Each time a protein set is invalidated, the properties of its associated peptides are updated before moving to the next protein set. The peptide property in question is the number of protein sets identified by the peptide.
This filter invalidates protein sets that don't have at least x peptides identifying that protein set, independently of the number of protein sets identified by the same peptide.
This filter goes through all Protein Sets. Each time a protein set is invalidated, the properties of its associated peptides (the number of protein sets they identify) are updated before moving to the next protein set.
This filter invalidates protein sets that don't have at least x different peptide sequences (independently of PTMs) identifying that protein set.
This filter goes through all Protein Sets from worst score to best score. Each time a protein set is invalidated, the properties of its associated peptides (the number of protein sets they identify) are updated before moving to the next protein set.
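The three filters above share the same mechanics; here is a minimal sketch of the specific-peptide variant, with hypothetical data structures:

from dataclasses import dataclass

@dataclass
class Peptide:
    protein_set_count: int        # number of valid protein sets identified by this peptide

@dataclass
class ProteinSet:
    score: float
    peptides: list
    is_valid: bool = True

def specific_peptide_filter(protein_sets, min_specific=1):
    # Walk protein sets from worst to best score
    for ps in sorted(protein_sets, key=lambda p: p.score):
        # A peptide is specific if it currently identifies only this protein set
        specific = [p for p in ps.peptides if p.protein_set_count == 1]
        if len(specific) < min_specific:
            ps.is_valid = False
            # Update the peptide property before moving to the next protein set
            for p in ps.peptides:
                p.protein_set_count -= 1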
This filter invalidates protein sets whose score is below a given value.
Two kinds of merge are available in Proline.
Merging several Search Results consists in creating a parent Search Result which contains the best child PSM for each peptide. The best PSM is the PSM with the highest score.
Merging several Search Results consists in creating a parent Search Result which contains all child PSMs from each child Search Result. Depending on the size of the child Search Results, this operation may take a long time and the size of the project databases may increase quickly.
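A minimal sketch of the two merge modes described above, with a hypothetical dict-based PSM representation:

def merge_search_results(children, mode="aggregation"):
    # children: lists of PSMs, each PSM a dict with "peptide" and "score" keys
    if mode == "union":
        # Union mode: keep every child PSM
        return [psm for child in children for psm in child]
    # Aggregation mode: keep only the best-scoring PSM of each peptide
    best = {}
    for child in children:
        for psm in child:
            key = psm["peptide"]
            if key not in best or psm["score"] > best[key]["score"]:
                best[key] = psm
    return list(best.values())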
Parent or child Search Results can be validated the same way.
Another merge algorithm can be used: see Merge Identification Summaries
Merging several Identification Summaries consists in creating a parent Identification Summary which contains, depending on the merge mode used (aggregation or union), either the best child PSM for each peptide (the PSM with the highest score) or all validated PSMs from the child Identification Summaries.
A Search Result corresponding to this parent Identification Summary is generated and Protein Inference is applied.
Even in union mode, this operation should be less time- and space-consuming than merging Search Results, as only validated PSMs are taken into account.
Example calculation of spectral count
The peptide specificity and the spectral count weight can be defined in the context of the Identification Summary where the spectral count is calculated, as shown in the previous schema. This can also be done using another Identification Summary as reference, such as the common parent Identification Summary. This allows considering only proteins identified and validated in the merge context.
Consider the following case, where the Sample1 Identification Summary is the merge of Replicat1 and Replicat2.
If the spectral count calculation is done at each child level, aligning protein sets identified in parent to protein sets in child, we get the following result:
Sample1 Protein Sets | Replicat1 Ref Prot. | BSC | SSC | WSC | Replicat2 Ref Prot. | BSC | SSC | WSC
P2 | P2 | 5 | 2 | 4 | P3 | 7 | 7 | 7
P3 | P3 | 4 | 1 | 2 | P3 | 7 | 7 | 7
We can see that when different parent protein sets are seen as one protein set in a child, the spectral count is biased. This calculation was not retained!
Now, if we align on child protein rather than protein sets, we get the following result:
Sample1 Protein Sets | Replicat1 Ref Prot. | BSC | SSC | WSC | Replicat2 Ref Prot. | BSC | SSC | WSC
P2 | P2 | 5 | 2 | 4 | P2 | 2 | 0 | 0
P3 | P3 | 4 | 1 | 2 | P3 | 7 | 7 | 7
Again, when considering specificity at the child level, the spectral count result in Replicat2 is not representative, as it has a null SSC and WSC. This calculation was not retained!
A way to correct this is to define the specificity of the peptides and their weight at the parent level, and to apply them at the child level. Therefore, the specific peptide for P2 is pe8, and for P3 they are pe6 and pe7. Regarding the peptide weights, if we consider pe4 for example, it will be defined as follows:
The spectral count result will thus be:
Sample1 Protein Sets | Replicat1 Ref Prot. | BSC | SSC | WSC | Replicat2 Ref Prot. | BSC | SSC | WSC
P2 | P2 | 5 | 2 | 2.75 | P2 | 2 | 0 | 0.5
P3 | P3 | 4 | 1 | 3.25 | P3 | 7 | 5 | 6.5
NOTE:
In case of multiple level hierarchy (Sample → Condition1 vs Condition2 → 3 replicates by conditions), it could make sense to calculate the spectral count weight at “Condition1” and “Condition2” levels rather than “Sample” level to keep the difference involved by the experiment condition.
Currently, spectral count is calculated for a set of hierarchy-related Identification Summaries; in other words, the Identification Summaries must have a common parent. The list of proteins to compare or to consider is created at the parent level, as is the peptide specificity. The user can select the dataset where the shared peptides spectral count weight is calculated (see previous chapter).
First, the peptide spectral count is calculated using the following rules:
Once the peptide spectral count is known for each peptide, the protein spectral count is calculated using the following rules:
When running SC even on a simple hierarchy (1 parent, 2 children), in some cases the BSC obtained is lower than the peptide count. This occurs only for invalid Protein Sets, i.e. Protein Sets present at the parent level but filtered out at the child level (filtered on specific peptides, for example).
Indeed, the peptide count value is read in the child Protein Sets. On the other hand, the BSC is calculated by getting the spectral count information at the child level for each peptide identified at the parent level. If a Protein Set is invalidated, its peptides are not taken into account during the merge, so some of them may be missing at the parent level if they were not identified in the other child.
This case is illustrated here
This section describes in detail the quantitation principles and concepts.
Although 2D-gel analysis has been a pioneer method in this field, it has gradually been replaced by LC-MS/MS analysis, which nowadays allows quantifying a larger number of proteins while also identifying them. Quantification is performed on thousands of species and requires new, adapted algorithms for the processing of complex data. Two major strategies are available to perform LC-MS/MS relative quantification: strategies based on isotopic labeling of peptides or proteins in one of the compared conditions, and label-free strategies that can be analyzed in different ways. There are usually three types of LC-MS/MS data analyses (cf. figure 1):
Figure 1: Main view of different approaches of LC-MS/MS quantitative analysis. (Mueller, Brusniak et al. 2008)
At first, LC-MS/MS quantitative analyses were performed using isotopic-labeling strategies. Labeling molecules facilitates the relative quantification of two conditions in the same LC-MS/MS run. According to the theory of stable isotope dilution, an isotopically-labeled peptide is chemically identical to its unlabeled counterpart. Therefore both peptides behave identically during chromatographic separation as well as mass spectrometric analysis (from ionization to detection). As it is possible to measure the mass difference between the labeled and unlabeled peptides with mass spectrometry, the quantification can be done by integrating and comparing their corresponding signal intensities (cf. figure below).
Figure 2: Extraction of quantitative data from a mass spectrum. On the left, the visualization of the isotopic profile for each peptide, labeled (red) and unlabeled (black). On the right, the chromatographic peak reconstruction by extracting the signal of the peptide throughout the duration of the analysis. The integration of this peak gives a value proportional to the abundance of the peptide. Here, the measurement of the areas shows that the abundance of the labeled peptide is 85% that of the unlabeled one.
Isotopic labeling strategies are very efficient but limited by the maximum number of samples that can be compared (eight samples at most for an iTRAQ 8plex labeling), the cost or the constraint due to the introduction of the label. The development of high-resolution instruments, such as the LTQ-Orbitrap, has enabled the development of label-free quantification methods. This methodology is easy to implement as it is no longer necessary to modify the samples, it allows an accurate quantification of the proteins within a complex mixture, and it considerably reduces the cost of the analysis. An LC-MS/MS acquisition can be seen as a map made of all the MS spectra generated by the instrument. This LC-MS map corresponds to a three-dimensional space: elution time (x), m/z (y) and measured intensity (z).
Figure 3: Image generated using MsInspect representing an LC-MS map. The dashed square at the top right is a zoomed view of the map and gives an idea of the data complexity. The blue points correspond to the monoisotopic masses of the peptide ions.
Analysing MS data can be done in several ways:
Figure 4: Extraction of the MS signal of a peptide previously identified using a search engine
The first approach is more exhaustive than the second as it can find quantitative information on peptides that may not have been fragmented by the mass spectrometer. Regarding the second approach, we can only assume that knowing the exact peptide monoisotopic mass should reduce the probability of quantification mistakes, but no study to our knowledge has proved it so far. In a comparative quantitation analysis, both approaches require the matching of the extracted signals (cf. figure 5). To do this, the LC-MS maps have to be aligned beforehand in order to correct the variability of peptide chromatographic elution. Indeed, the difference in elution time for a given peptide in two LC-MS analyses may reach tens of seconds. Even if a peptide mass can be precisely measured, peptides with very close m/z may still elute in the same time frame. Figure 3 shows how important the density of the measures is. Therefore, comparing LC-MS maps without aligning their time scale would generate many matching errors.
Figure 5: Matching of the detected peptides on several LC-MS maps
Different algorithms have been developed to correct the time scale and are usually optimized for a given approach. The supervised method benefits from the knowledge of the peptide identifications and is thus able to align maps with a low error rate. More data processing will be needed to obtain quality quantification results. Read the “LC-MS quantitation workflows” documentation to get more information about LC-MS quantification algorithms in Proline.
Analyzing Label-free LC-MS data requires a series of algorithms presented below.
Figure 1: Overview of the different stages of label-free LC-MS data processing
LC-MS maps are created directly through Proline by using its own algorithms.
These algorithms take advantage of the mzDB file format to process ions not scan by scan but in the m/z dimension by processing elution peaks in LC-MS regions of 5 Da width, also called run slices.
Proline provides multiple algorithms.
The preferred algorithm in Proline is the one performing an unsupervised detection of chromatographic peaks (also called peakels).
The detection of all the MS signals is made in a single iteration on all the run slices of the mzDB file. Thus all peptide signals whose mass is contained in a specific run slice are detected simultaneously.
In a given run slice, m/z peaks are first sorted in descending intensity order. Starting from the most intense m/z peak, the algorithm searches for the peak of the same m/z value in the preceding and following MS1 scans, according to a user-defined m/z tolerance depending on the mass analyzer. The lookup procedure stops as soon as the m/z peak is absent in more than a predefined number of consecutive scans. Once collected, this m/z peak list, which is comparable to a chromatogram, is smoothed using a Savitzky-Golay filter and split in the time dimension to form elution peaks, by applying a peak picking procedure that searches for significant minima and maxima of signal intensity. At this stage, the process returns a list of elution peaks defined by an m/z value, an apex elution time and an elution time range.
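For illustration, here is a simplified sketch of the smoothing and peak-picking part of this procedure; the trace is assumed to be longer than the smoothing window, and the real implementation additionally checks the significance of the minima and maxima:

import numpy as np
from scipy.signal import savgol_filter, argrelextrema

def split_into_peakels(times, intensities, window_length=9, polyorder=3):
    # times, intensities: the extracted m/z peak list of one trace,
    # ordered by scan time (comparable to a chromatogram)
    times = np.asarray(times, dtype=float)
    intensities = np.asarray(intensities, dtype=float)
    # Smooth the trace with a Savitzky-Golay filter
    smoothed = savgol_filter(intensities, window_length, polyorder)
    # Split in the time dimension at local minima of the smoothed signal
    minima = argrelextrema(smoothed, np.less)[0]
    bounds = [0, *minima.tolist(), len(smoothed) - 1]
    peakels = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        apex = lo + int(np.argmax(smoothed[lo:hi + 1]))
        peakels.append({"first_time": times[lo],
                        "apex_time": times[apex],
                        "last_time": times[hi]})
    return peakels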
The deisotoping of the obtained peaks is performed separately using two possible algorithms:
This alternative algorithm performs a targeted signal extraction rather than an unsupervised detection. It is now deprecated, but it is worth understanding its differences with the unsupervised approach.
In this second algorithm, we consider each MS/MS event triggered by the spectrometer as an evidence for the presence of a peptide ion. Each of these events provides a set of information about the targeted precursor ion: the m/z ratio (assuming it is monoisotopic), the moment when the MS/MS has been triggered (usually not the maximum of the elution peak) and the charge state of the ion. The first and second information can be considered as close coordinates for the peptide signal on the LC-MS map. The charge state (z) can provide additional information to simplify the extraction of different isotopes of the features which are approximately separated by 1/z. For each MS/MS event:
Maps generated with peak picking algorithms cannot be 100% reliable and often contain redundant signals, corresponding to the same compound. Furthermore, modified peptides having the same sequence can have different PTM polymorphisms that can give different MS signals with the same m/z ratio but having slightly different retention times. Comparing LC-MS maps with such cases is a problem as it may lead to an inversion of feature matches between maps. Creating feature clusters is a way to avoid this issue. This operation is called “Clustering” (cf. figure 2).
Figure 2: Grouping features into clusters. All features with the same charge state, close m/z ratios and retention times are grouped in a single cluster. The other features are stored without clustering.
Feature clustering consists in grouping, in a given LC-MS map, the features with the same charge state, close in retention time and m/z ratio (default tolerances are respectively 15 seconds and 10 ppm). Some metrics are calculated for each cluster (equivalent to those used for the features):
The resulting maps are “cleaner”, thus reducing ambiguities for map alignment and comparison. Quantitative data extracted from these maps will be processed in the following steps. It is necessary to eliminate the ambiguities found by the clustering step. To do so, it is possible to rely on the information given by the search engine on each identified peptide. If some ambiguities remain, the end user must be aware of them and be able to either manually handle them or exclude them from the analysis.
Note: do not mix up clustering and deconvolution which consists in grouping all the charge states detected for a single molecule.
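As an illustration of the clustering step, here is a naive sketch using the default tolerances (15 s, 10 ppm); the feature representation and the single-pass grouping strategy are assumptions:

def cluster_features(features, rt_tol=15.0, mz_tol_ppm=10.0):
    # features: dicts with "mz", "rt" (in seconds) and "charge" keys
    clusters = []
    for f in features:
        for cluster in clusters:
            ref = cluster[0]
            if (f["charge"] == ref["charge"]
                    and abs(f["mz"] - ref["mz"]) <= ref["mz"] * mz_tol_ppm / 1e6
                    and abs(f["rt"] - ref["rt"]) <= rt_tol):
                cluster.append(f)
                break
        else:
            # No compatible cluster found: the feature starts its own cluster
            clusters.append([f])
    return clusters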
Because chromatographic separation is not completely reproducible, LC-MS maps must be aligned before being compared. The first step of the alignment algorithm is to randomly pick a reference map and then compare every other map to it. On each comparison, the algorithm will determine all possible matches between detected features, considering time and mass windows (the default values are respectively 600 seconds and 10 ppm). Only landmarks involving unambiguous links between the maps (only one feature on each map) are kept (cf. figure 3).
Figure 3: Matching features with the reference map respecting a mass (10 ppm) and time tolerance (600 s)
The result of this alignment algorithm can be represented with a scatter plot (cf. figure 5).
The algorithm completes this alignment process several times with randomly chosen reference maps. Then it sums the absolute values of the distance between each map to an average map (cf. figure 4). The map with the lowest sum is the closest to the other maps and will be considered as the final reference map from this point.
Figure 4: Selection of the reference map. The chart on the left shows the time distances between each map and the average map obtained by multiple alignments. The chart on the right summarizes the integration of each curve in the chart on the left. The map closest to the average map is selected as the reference map.
Two algorithms have been implemented to make this selection.
This algorithm considers every possible pair of maps:
The last thing to do is to find the path going through the regions with the highest density of points in the scatter plot. This step was implemented using a moving median smoothing (cf. figure 5).
Figure 5: Alignment smoothing of two maps using a moving median calculation. The scatter plot represents the time variation (in seconds) of multiple landmarks (between the compared map and the reference map) against the observed time (in seconds) in the reference map. A user-defined window is moved along the plot, computing on each step a median time difference (left plot). The smoothed alignment curve is constituted of all the median values (right plot).
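A minimal sketch of this moving median smoothing; the window size and the landmark representation are hypothetical:

import statistics

def moving_median_alignment(landmarks, window=120.0):
    # landmarks: (reference_time, time_delta) pairs from unambiguous matches
    landmarks = sorted(landmarks)
    curve = []
    for t, _ in landmarks:
        # Median of the deltas falling in the window centered on t
        deltas = [d for rt, d in landmarks if abs(rt - t) <= window / 2.0]
        curve.append((t, statistics.median(deltas)))
    return curve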
Once the maps have been corrected and aligned, the final step consists in creating a consensus map or master map. It is produced by searching the best match for each feature detected on different maps. The master map can be seen as a representation of all the features detected on the maps, without redundancy. (cf. figure 6).
Figure 6: Creation of the master map by matching the features detected on two LC-MS maps. The elution times used here are the ones corrected by the alignment step. The intensity of a feature can vary from one map to another, it can also happen that a feature appears in only one map.
During the creation of the master map, the algorithm first considers matches for the most intense features (higher than a given threshold), and then considers the other features only if they match a feature with a high intensity in another map. This is done in order to avoid including background noise in the master map (cf. figure 7).
Figure 7: Distribution of the intensities of the maps considered to build the master map. The construction is done in 3 steps: 1) removing features with a normalized intensity lower than a given threshold 2) matching the most intense features 3) features without matches in at least one map are compared again with the low intensity features put aside in the first step.
“Predicted time extractor” algorithm
This algorithm is used for cross-assignment, when a peptide signal is detected in a file but does not have an equivalent signal in another one (frequent in DDA). In this case, the algorithm tries to extract the signal from the file where it has not been found. The aim of this algorithm is to reduce the number of missing values.
It has been seen that ambiguous features with close m/z and retention times can be grouped into clusters. Other conflicts are also generated during the creation of the master map due to wrong matches. Adding the peptide sequence is the key to solving these conflicts by identifying a feature without ambiguity. Proline has access to the list of all identified and validated PSMs as well as the identifier (id) of each MS/MS spectrum related to an identification. This means that the link between the scan id and the peptide id is known. On the other hand, the list of MS/MS events simultaneous to the elution window of each feature is known. For each of these events the corresponding peptide sequences can be retrieved. If only one peptide sequence is found for the master feature, it is kept as is. Otherwise the master feature is cloned in order to have one feature per peptide sequence. During this duplication step the daughter features are distributed on the new master features according to the identified peptide sequences.
When the master map is created, some intensity values may be missing. Proline reads the mzDB files to reduce the number of missing values, using the expected coordinates (m/z, RT) of each missing feature to extract new features. These new extractions are added to copies of the daughter maps and of the master map. This gives a new master map with a limited number of missing values.
The comparison of LC-MS maps is confronted with another problem: the variability of the MS signals measured by the instrument. This variability can be technical or biological. Technical variations between MS signals in two analyses can depend on the injected quantity of material, the reproducibility of the instrument configuration and also the software used for the signal processing. The systematic biases observed on the intensity measurements between two successive and similar analyses are mainly due to errors in the total amount of material injected in each case, or to LC-MS system instabilities that can cause variable performances during a series of analyses and thus a different MS signal response for peptides having the same abundance. Data may not be usable if the difference is too important, so it is always recommended to perform a quality control of the acquisition before considering any computational analysis. However, there are always biases in any analytical measurement, but they can usually be corrected by normalizing the signals. Numerous normalization methods have been developed, each of them using a different mathematical approach (Christin, Bischoff et al. 2011). Methods are usually split in two categories, linear and non-linear calculation methods, and it has been demonstrated that linear methods can fix most of the biases (Callister, Barry et al. 2006). Three different linear methods have been implemented in Proline, calculating the normalization factor as the ratio of the sums of the intensities, as the ratio of the medians of the intensities, or as the median of the intensity ratios.
How to calculate this factor:
How to calculate this factor:
This last strategy was published in 2006 (Dieterle, Ross et al. 2006) and gives the best results. It consists in calculating the intensity ratios between the two maps to be compared, then setting the normalization factor to the inverse of the median of these ratios (cf. figure 8). The procedure is the following:
Figure 8: Distribution of the ratios transformed in log2 and calculated with the intensities of features observed in two LC-MS maps. The red line representing the median is slightly off-centered. The normalization factor is equal to the inverse of this median value. The normalization process will refocus the ratio distribution on 0, which is represented by the black arrow.
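A minimal sketch of this median-of-ratios computation, assuming paired intensities of features matched between a map and the reference map:

import statistics

def normalization_factor(map_intensities, ref_intensities):
    # Paired intensities of the same features in a map and in the reference map
    ratios = [m / r for m, r in zip(map_intensities, ref_intensities)
              if m > 0 and r > 0]
    # The factor is the inverse of the median ratio: once intensities are
    # multiplied by it, the log2-ratio distribution is re-centered on 0
    return 1.0 / statistics.median(ratios)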
Proline applies this normalization for each match with the reference map and obtains a normalization factor for each map, independently of the choice of the algorithm. The normalization factor of the reference map is equal to 1.
Once the master map is normalized, it is stored in the Proline LCMS database and used to create a “QuantResultSummary”. This object links the quantitative data to the identification summaries validated in Proline. This “QuantResultSummary” is then stored in the Proline MSI database (cf. figure 9).
Figure 9: From raw files to the “QuantResultSummary” object.
The first quantitation step, as well as the advanced quantitation (see Quantitation: principles), has some parameters that can be modified by the user.
Here is the description of the parameters that could be modified by the user.
Defines the algorithms and methods to use for signal extraction and deisotoping.
These parameters are used by the signal extraction algorithms.
Extraction m/z Tolerance: In supervised algorithms this corresponds to the error tolerance between the precursor ion m/z and peaks extracted in the mzDB file.
In unsupervised algorithms this corresponds to the error tolerance between each peak apex and other extracted peaks.
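Assuming these tolerances are expressed in ppm, as elsewhere in this guide, the corresponding absolute m/z window is obtained as follows:

def mz_window(mz, tol_ppm):
    # Convert a ppm tolerance into an absolute m/z interval around mz
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta

# Example: a 10 ppm tolerance around m/z 800 is about +/- 0.008
print(mz_window(800.0, 10.0))  # (799.992, 800.008)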
Clustering must be applied to the imported LC-MS maps to group features that are close in time and m/z. This step reduces ambiguities and errors that could occur during the feature mapping phase.
This is an important step in the LCMS process. It consists in aligning maps of the map set to correct the RT values. RT shifts of shared features between the compared maps follow a curve reflecting the fluctuations of the LC separation. The time deviation trend is obtained by computing a moving median using a smoothing algorithm. This trend is then used as a model of the alignment of the compared LC-MS maps. This model provides a basis for the correction of RT values.
Then all other maps are aligned to this computed reference map and their retention times are corrected.
When alignment is done, a trend can be extracted with a smoothing method permitting the correction of the aligned map retention time.
If the selected smoothing method is set to time window, the retention times of the aligned map are corrected using a median computed in a time window. You have to provide the time interval, which corresponds to the window size in which the time median will be computed.
This step consists in creating the “master map” (also called consensus map), this map resulting from the superimposition of all compared maps.
Two methods are available to filter features: the filter can be applied directly on intensity values (Intensity method) or it can be a proportion of the map median intensity (Relative intensity method).
If you choose Relative intensity for the master feature filter type, the only available method is percent: features whose intensities are below the relative intensity threshold, expressed as a percentage of the median intensity, are removed. If you choose Intensity for the master feature filter type, the only available method at the moment is basic: features whose intensities are below the intensity threshold are removed and not considered for the master map building process.
This procedure is used to compute ratios of peptide and protein abundances. Several filters can also be set to increase the quality of quantitative results.
Here is the description of the parameters that could be modified by the user.
Peptide abundances can be summarized into protein abundances using several mathematical methods:
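The available methods are listed in the quantitation wizard; as an illustration, typical summarization operators look like this (the exact operator set in Proline is an assumption):

import statistics

def summarize_abundances(peptide_abundances, method="sum"):
    # Summarize the abundances of the peptides belonging to one protein set
    values = [a for a in peptide_abundances if a is not None]
    if method == "sum":
        return sum(values)
    if method == "mean":
        return statistics.mean(values)
    if method == "median":
        return statistics.median(values)
    raise ValueError("unknown method: " + method)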
When exporting a whole Identification Summary to an Excel file, the following sheets may be generated:
Note: Read the Concepts & Principles documentation to understand main concepts and algorithms used in Proline.
Proline Studio
Calc. Mass: Calculated Mass
Delta MoZ: Delta Mass to Charge Ratio
Exp. MoZ: Experimental Mass to Charge Ratio
Ion Parent Int.: Ion Parent Intensity
Missed Cl.: Missed Cleavage
Modification D. Mass: Modification Delta Mass
Modification Loc.: Modification Location
Next AA: Next Amino-Acid
Prev. AA: Previous Amino-Acid
Protein Loc.: Protein Location of the Modification
Protein S. Matches: Protein Set Matches
PSM: Peptide Spectrum Match
PTM: Post Translational Modification
PTM D. Mass: PTM Delta Mass
RT: Retention Time
When you start Proline Studio for the first time, the Server Connection Dialog is automatically displayed.
You must fill the following fields:
- Server Host: ask your IT Administrator for this information. It corresponds to the Proline server name.
- User: your username (an account must have been previously created by the IT Administrator).
- Password: password corresponding to your account (username).
If the “Remember Password” field is checked, the password is saved for future use. The Server Connection Dialog still opens when Proline Studio starts, but the user does not need to fill in the password again, unless it has been changed since the last login.
To create a Project, click on “+“ button at the right of the Project Combobox. The Add Project Dialog opens.
Fill the following fields:
- Name: name of your project
- Description: description of your project
You can specify other people to share this new project with. Then click on the OK button.
Creation of a Project can take a few seconds. During its creation, the Project is displayed grayed out with a small hourglass over it.
In the Identification tree, you can create a Dataset to group your data
To create a Dataset:
- right click on Identifications or on a Dataset to display the popup.
- click on the menu “Add Dataset…”
On the dialog opened:
- fill the name of the Dataset
- choose the type of the Dataset
- optional: click on “Create Multiple Datasets” and select the number of datasets you want to create
Let's see the result of the creation of 3 datasets named “Replicate”:
In both Identification and Quantitation tree, you can create Folders to organize your data
To create a Folder:
- right click on Identifications, Quantitations or on a Folder to display the popup.
- click on the menu “Add Identification Folder…” or “Add Quantitation Folder…”
There are two possibilities to import Search Results:
- import multiple Search Results in “All Imported” and put them later in different datasets.
- import directly a Search Result in a dataset.
To import in “All Imported”:
- right click on “All Imported” to show the popup
- click on the menu “Import Search Result…”
In the Import Search Results Dialog:
- select the file(s) you want to import thanks to the file button (the Parser will be automatically selected according to the type of file selected)
- select the different parameters (see description below)
- click on OK button
Note 1: You can only browse the files accessible from the server according to the configuration done by your IT Administrator. Ask him if your files are not reachable. (Look for Setting up Mount-points paragraph in Installation & Setup page)
Note 2: Proline is able to import OMSSA files compressed with BZip2.
Parameters description:
Proteins whose score is lower than Master_protein_score * (1 - subset threshold) won't be imported.
Acetyl peptide N-term=H(-6) C(-7) O(-1)
Importing a Search Result can take some time. While the import is not finished, the “All Imported” is shown grayed with an hourglass and you can follow the imports in the Tasks Log Window (Menu Window > Tasks Log to show it).
To show all the Search Results imported, double click on “All Imported”, or right click to popup the contextual menu and select “Display List”
From the All Imported window, you can drag and drop one or multiple Search Result to an existing dataset.
It is possible to import a Search Result directly in a Dataset. In this case, the Search Result is available in “All Imported” too.
To import a Search Result in a Dataset, right click on a dataset and then click on “Import Search Result…” menu.
You can delete Search Results, Identification Summaries and Datasets in the data tree. You can also delete XIC or Spectral Counts in the quantitation tree.
Deleting removes Datasets (identification or quantitation…) from the tree view; Search Results remain accessible from the “All Imported” view.
There are two ways to delete data: use the contextual popup or drag and drop data to the Trash.
Select the data you want to delete, right-click to open the contextual menu and click on delete menu.
The selected data is put in the Trash, so it can be restored as long as the Trash has not been emptied.
Select the data you want to delete and drag it to the Trash. Data can be restored as long as the Trash has not been emptied.
To empty the Trash, you have to Right click on it and select the “Empty Trash” menu.
A confirmation dialog is displayed and, if accepted, the Datasets will be removed from the Trash.
Search Results are not completely removed, you can retrieve them from the “All Imported” window.
It is not possible to delete a Project by yourself. If you need to do so, ask your IT Administrator.
Once user is connected (see Server Connection), it is possible to:
To display data of a Search Result:
- right click on a Search Result
- click on the menu “Display Search Result >” and on the sub-menu “MSQueries” or “PSM” or “Proteins”
If you click on MSQueries sub-menu, you obtain this window:
Upper View: list of MSQueries.
Bottom Window: list of all Peptides linked to the current selected MSQuery.
Note: Abbreviations used are listed here
If you click on PSM sub-menu, you obtain this window:
Upper View: list of all PSM/Peptides.
Middle View: Spectrum, Spectrum Error and Fragmentation Table of the selected PSM. If no annotation is displayed, you can generate Spectrum Matches by clicking on the corresponding button.
Bottom Window: list of all Proteins containing the currently selected Peptide.
Note: Abbreviations used are listed here
If you click on Proteins sub-menu, you obtain this window:
Upper View: list of all Proteins
Bottom View: list of all Peptides for the selected Protein.
Note: Abbreviations used are listed here
To display data of an Identification Summary:
- right click on an Identification Summary
- click on the menu “Display Identification Summary >” and on the sub-menu “MSQueries”, “PSM”, “Peptides”, “Protein Sets”, “PTM Protein Sites” or “Adjacency Matrix”
If you click on MSQueries sub-menu, you obtain this window:
Upper View: list of MSQueries.
Bottom Window: list of all Peptides linked to the current selected MSQuery.
Note: Abbreviations used are listed here
If you click on PSM sub-menu, you obtain this window:
Note: Abbreviations used are listed here
If you click on Peptides sub-menu, you obtain this window:
Upper View: list of all Peptides
Middle View: list of all Protein Sets containing the selected peptide.
Bottom Left View: list of all Proteins of the selected Protein Set
Bottom Right View: list of all Peptides of the selected Protein
Note: Abbreviations used are listed here
If you click on Protein Sets sub-menu, you obtain this window:
View 1 (at the top): list of all Protein Sets
Note: In the column Proteins, 8 (2, 6) means that there are 8 proteins in the protein set: 2 in the sameset, 6 in the subset.
View 2: list of all Proteins of the selected Protein Set.
View 3: list of all Peptides of the selected Protein
View 4: Protein Sequence of the previously selected Protein and Spectrum of the selected Peptide. Other tabs display Spectrum, Spectrum Error and Fragmentation Table.
Note: Abbreviations used are listed here
If you click on PTM Protein Sites sub-menu, you obtain this window:
WARNING: This window will be filled only if you have first run “Identify PTM Sites” from the dataset menu.
View 1 (at the top): This view lists all PTM Protein Sites (PTM on a Protein at a specific location), displaying the best PSM for each site.
View 2: For the selected PTM Protein Site, lists all peptides which match it. The best PSM for each peptide is displayed.
View 3: For selected peptide in View 2, display all associated PSMs.
View 4: For selected PSM in View 3, display all PSMs of same MSQuery.
The user can filter on PTM Modification (by choosing Modification in filters list).
In the following example, the user keeps only Oxidation on residue M. It is possible to specify no residue to accept all residues, or to specify a list of residues.
View 4: Protein Sequence, Spectrum, Spectrum Error, Fragmentation Table and Spectrum Values
If you click on Adjacency Matrix sub-menu, you obtain this window:
View 1: All the matrices. Each matrix corresponds to a cluster composed of linked Proteins/Peptides.
Note: use the Search tool to display an Adjacency Matrix for a particular Protein or Peptide
View 2: The currently selected matrix.
In the example, you can see two different protein sets which share only two peptides.
Thanks to the settings you can hide proteins with exactly the same peptides.
To display properties of a Search Result or Identification Summary:
Note: it is possible to select multiple Search Results/Identification Summaries to compare the values.
Property window opened:
General Information: Various information on the analysis (instrument name, peaklist software…)
Search Properties: Information extracted from the Result File (date, software version, search settings...)
Search Result Information: Amount of Queries, PSM and Proteins in the Search Result.
Identification Summary Information: Information obtained after validation process
Note 1: Validation parameters are tagged with “validation_properties / params”
Note 2: Validation results are tagged with “validation_properties / results”
Sql Ids: Database ids related to this item
Property window opened with multiple Identification summaries selected:
The color of the type column indicates whether the values are the same (white) or different (yellow).
You can display a generated Spectral Count by using the right mouse popup.
To have more details about the results, see spectral_count_result
The overview is based by default on the weighted spectral count values. (Note: if you sort on the overview column, the sort is based on max(value - mean(values)) / mean(values), so you will obtain the most homogeneous and confident rows first.)
For each dataset, are displayed:
- status (typical, sameset, /)
- peptide numbers
- the basic spectral count
- the specific spectral count
- the weighted spectral count (by default)
- the selection level
By clicking on the “Column Display Button”, you can choose the information you want to display or change the overview.
To display an XIC, right click on the selected XIC node in the Quantitation tree, select “Display Abundances”, and then the level you want to display:
By clicking on “Display Abundances” / “Protein Sets”, you can see all quantified protein sets. For each quantified protein set, you can see below all peptides linked to the selected protein set and peptides Ions linked to the selected peptide. For each peptide Ion, you can see the different features and the graph of the peakels in each quantitation channel.
The overview is based by default on the abundances values.
Note: if you sort on the overview column, the sort is based on max(value - mean(values)) / mean(values), so you obtain the most homogeneous and confident rows first.
For each quantitation channel, are displayed:
- the raw abundance
- the peptide match count (by default)
- the abundance (by default)
- the selection level
By clicking on the “Column Display Button”, you can choose the information you want to display or change the overview.
To display the identification protein set view, right click on the selected protein set and select the “Display Identification Protein Sets” menu in the popup.
You can also display the identification summary result from the popup menu in the quantitation tree:
A graph allows you to see the variations of the abundance (or raw abundance) of a peptide in the different quantitation channels:
You can see the different features in the different quantitation channels and the graph of the peakels:
By clicking on the dedicated button, you can display either:
- the peaks of isotope 0 in all quantitation channels
- all isotopes for the selected quantitation channel:
By clicking on the dedicated button, you can see the chromatograms of the features and their first and last time scans in mzScope. For more details see the mzScope section.
By clicking on “Display Abundances“ / “Peptides”, you can see:
- identified and quantified Peptides
- non identified but quantified peptides
- identified but not quantified peptides (linked to a quantified protein)
By clicking on “Display Abundances” / “Peptides Ions”, you can see:
- all identified and quantified Peptides Ions
- non identified but quantified peptides Ions
By clicking on “Exp. Design > Parameters”, you can see the experimental design and the parameters of the selected XIC.
If you have launched the refinement of the protein sets abundances on the XIC, you can also display the refinement parameters.
By clicking on “Exp. Design > Map Alignment”, you can see the map of the variation of the alignment of the maps compared to the map alignment of the selected XIC. You can also calculate the predicted time in a map from an elution time in another map.
You can lay out your own user window with the desired views.
You can do it from an already displayed window, or by using the right click mouse popup on a dataset like in the following example (Use menu “Search Result>New User Window…” or “Identification Summary>New User Window…”)
In the example, the user has clicked on “Identification Summary>New User Window…” and selects the Peptides View as the first view of his window.
You can add other views by using the '+' button.
In this example, the user has added a Spectrum View and he saves his window by clicking on the “Disk” Button.
The user selects 'Peptides Spectrum' as his user window name
Now, the user can use his new 'Peptides Spectrum' on a different Identification Summary.
A: Display Decoy Data.
B: Search in the Table (using * and ? wildcards)
C: Filter data displayed in the Table
D: Export data displayed in the Table
E: Send to Data Analyzer to compare data from different views
F: Create a Graphic: histogram or scatter plot
G: Right click on the marker bar to display Line Numbers or add Annotations/Bookmarks
H: Expands the frame to its maximum (other frames are hidden). Click again to undo.
I: Gather the frame with the previous one as a tab.
J: Split the last tab as a frame underneath
K: Remove the last Tab or Frame
L: Open a dialog to let the user add a View (as a Frame, a Tab or a split Frame)
M: Save the window as a user window, to display the same window with different data later
N: Export view as an image
O: Generate Spectrum Matches
You can filter data displayed in the different tables thanks to the filter button at the top right corner of a table.
When you have clicked on the filter button, a dialog is opened. In this dialog you can select the columns of the table you want to filter thanks to the “+” button.
In the following example, we have added two filters:
- one on the Protein Name column (available wildcards are * to replace multiple characters and ? to replace one character)
- one on the Score Column (Score must be at least 100 and there is no maximum specified).
The result is all the proteins starting with GLPK (corresponding to GLPK*) and with a score greater than or equal to 100.
Note: for String filters, you can use the following wildcards: * matches zero or more characters, ? matches one character.
In some tables, a Search Functionality is available thanks to the search button at the top right corner.
When you have clicked on the search button, a floating panel is opened. In this panel you can select the column searched and fill in the searched expression, or the value range.
For searched expressions, two wild cards are available:
In the following example, the user searches for a protein set whose name contains “PGK”.
You can do an incremental search by clicking again on the search button of the floating panel, or by pressing the Enter key.
There are two ways to obtain a graphic from data:
If you have clicked on the '+' button, the Add View Dialog is opened and you must select the Graphic View
A: Display/Remove Grid toggle button
B: Modify color of the graphic
C: Lock/Unlock incoming data. If it is unlocked, the graphic is updated when the user applies a new filter to the previous view (for instance Peptide Score >= 50). If it is locked, changing the filtering on the previous view does not modify the graphic.
D: Select Data in the graphic according to data selected in table in the previous view.
E: Select data in the table of the previous view according to data selected in the graphic.
F: Export graphic to image
G: Select the graphic type: Scatter Plot / Histogram
H/I: Select data used for X / Y axis.
It is possible to select linear or log axis by right clicking on an axis.
Zoom in: press the right mouse button and drag toward the bottom right. A red box is displayed. Release the mouse button when you have selected the area to zoom in.
Zoom out: press the right mouse button and drag toward the top left. When you release the mouse button, the zoom is reset to view all the data.
Select: press the left mouse button and drag the mouse to surround the data you want to select. The selection is done when you release the button. You can also left-click on the data you want to select, and use the Ctrl key to add to the previous selection.
Unselect: left-click on an empty area to clear the selection.
You can run a Quality Control on any Search Result. It provides a transversal view of the imported data: rather than visualising the results per PSM or protein, results are grouped according to score, charge state, etc.
Choose the menu option:
Configure some settings before launching the process
The report appears within seconds (depending on the amount of data to be processed). You will get the following tabs:
- Assigned and unassigned spectra: pie chart presenting the ratio of assigned spectra
- Score repartition for Target/Decoy data: histogram presenting the number of PSMs per score group, separating target and decoy data
- PSM per charge and score: histogram presenting the number of PSMs per score group and charge state
- Experimental M/z per charge and score: box plot presenting m/z information for each score and charge state category
- Number of matches per minute of RT and score: histogram presenting the number of PSMs per score and retention time; this view is only computed when retention time is available
Each graph is also available as a table view.
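As an illustration of what these charts aggregate, the following sketch (hypothetical input, not Proline code) bins PSM scores per target/decoy status, as the score repartition histogram does:

# Illustrative only: bin PSM scores into score groups, separating target and decoy
from collections import Counter

psms = [("target", 78.2), ("decoy", 12.4), ("target", 33.0), ("decoy", 31.8)]
bin_width = 10
counts = Counter()
for status, score in psms:
    bin_start = int(score // bin_width) * bin_width
    counts[(status, bin_start)] += 1
# counts[("target", 70)] == 1, counts[("decoy", 30)] == 1, etc.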
To facilitate actions on MS files, Proline Studio contains a tab of the same name, providing the end user with a view over the local and a remote file system, called Local File System and Proline Server File System respectively. Through a dedicated popup menu, a series of actions can be applied to the .mzdb and .raw files encountered there, including among others:
Apart from the popup menu, since Proline Studio 1.5 conversions and uploads can also be triggered by drag and drop: dragging an .mzdb file to the remote side triggers an upload, while dragging a .raw file from the local side to the remote one triggers a conversion followed by the upload of the resulting file.
As mentioned earlier, after selecting a number of files, the user can either drag and drop them onto the remote side, or use the popup menu as shown in the following screenshot. Note that neither approach supports a selection mixing different file types.
As we can see, clicking on Upload opens a dedicated dialog gathering a series of upload options:
The dialog also permits changing the composition of the selected group by adding or removing .mzdb files. While the last option is significant as it defines the final destination, the first two options, the creation of the parent folder and the deletion of local files after a successful upload, mainly help keep the destination folder well organized. Clicking OK closes the dialog and dispatches an upload batch consisting of one upload task per .mzdb file. Upload tasks, like all tasks dispatched within Proline Studio, can be monitored in the Logs tab.
Similarly, when the user wants to convert and upload a raw file, he or she can either drag and drop it on the Remote Site view, or use the dedicated dialog from the popup menu.
This dialog follows a format and logic very similar to the upload dialog. Apart from the settings regarding the upload itself, a few more purely concern the conversion process. The two most important options are the selected converter and the output path; the latter is the directory where the .mzdb file will be created once raw2mzDB.exe has finished. Note that if no conversion has ever taken place, the converter option is most likely blank, unless a default converter has been defined in the general settings dialog. This value, among others, is saved or updated every time the user clicks OK, so that the fields do not have to be filled in again for the next conversion.
Dragging and dropping a .raw file on the remote view follows a slightly different approach. Since a raw file cannot be uploaded to the server as such, the drag and drop action corresponds to a .raw file conversion followed by the respective upload. Since no dialog appears when dragging a file, the conversion uses the default converter set in the general settings window. If no default converter has been set yet, a popup dialog will ask for the converter to use; the selection also updates the corresponding field in the general settings.
By default, the TIC chromatogram is displayed. You can click on “BPI” to see the base peak intensity graph.
By clicking in the graph, you can see below the scan at the selected time.
You can choose to display two or more chromatograms on the same graph, by selecting two files and clicking on “View Data”.
You can extract a chromatogram at a given mass by entering the specified value in the panel above.
You can navigate through the scans:
- by increasing or decreasing the scan IDs
- by entering a retention time
- by using the arrow keys on the keyboard (Ctrl+Arrows to keep the same MS level)
By double-clicking on the scan, the corresponding chromatogram is displayed above (the Alt key or the “XIC overlay” checkbox allows you to overlay the chromatograms in the same graph).
By selecting a file, you can click on “Detect Peakels” in the popup menu.
A dialog allows you to choose the parameters of the peakel detection: the tolerance, and optionally an m/z range or an m/z value:
The results are displayed in a table:
You can double-click on a row (or use the popup menu) to display the peakel in the corresponding raw file:
There are many ways to do an export:
- Export a Table using the export button (supported formats: {xlsx, xls, csv})
- Export data using Copy/Paste from the selected rows of a Table to an application like Excel.
- Export all data corresponding to an Identification Summary, XIC or Spectral Count
- Export an image of a view
- Export Identification Summary data into Pride (ProteomeXchange) format.
- Export Identification Summary spectra list.
To export a table, click on the Export button at the top left of the table.
An Export Dialog opens; you can select the file path and the format of the export (supported formats: {xlsx, xls, csv}).
If the selected format is .xls or .xlsx, you can choose to keep in the exported Excel document any rich text formatting (color, font weight, etc.) visible in the original table in Proline Studio. The choice is made using the checkbox shown on the following screenshot.
To perform the export, click on the Export button. The task can take a few seconds if the table has many rows, in which case a progress bar is displayed.
Note: the following feature regarding the export will be functional in the next release of Proline Studio.
To copy/paste a table:
- Select rows you want to copy
- Press Ctrl and C keys at the same time
- Open your spreadsheet editor and press Ctrl and V keys at the same time to paste the copied rows.
To Export all data of an Identification Summary, a XIC or a Spectral Count, right-click on the dataset to open the contextual menu and select the “Export” menu and then “Excel...” sub-menu.
You can also export multiple datasets simultaneously, provided they have the same type (Identification Summary, XIC or Spectral Count).
An Export Dialog opens; you can select the file path and the type of the export: Excel (.xlsx) or Tab separated values (.tsv).
You can export with the default parameters or perform a custom export. To enable custom export, click on the tickbox located on the right of the dialog:
Custom export allows a number of parameters in addition to the file format to be chosen:
You can enable/disable individual sheets, rename them, rename individual fields, move them up and down, or disable them. You can also save your own configuration to load it later on, and even share it with colleagues (the configuration file is stored locally).
Description of exported file is available here.
To export a graphic, click on the Export Image button at the top left of the image.
An Export Dialog opens; you can select the file path and the type of the export. You can export any image in PNG or SVG format. SVG produces a vector image that can be edited and resized afterwards.
Note: Before exporting data to Pride format, all spectrum matches should have been generated. To do so, right click on the dataset and select “Generate Spectrum Matches”.
To export all data of an Identification Summary into Pride (ProteomeXchange compatible) format, you must right click on the dataset to open the contextual popup and select the “Export > Pride…” Menu.
Additional information can be specified in the displayed dialogs (fields marked with a * are required).
a. Experimental Details: Specify Project and Experiment name and contact
b. Protocol Description: Specify a name for the Protocol Description. Add at least one Step for the description by clicking on “Add Step” button.
An ontology search dialog will be opened to help getting protocol step description
c. Sample Description: specify a sample name and description. Species, Tissue and Cell Type are specified using a controlled vocabulary. If the desired data is not listed, you can click on the “other…” button to search through the complete ontology.
Valid PSM spectra can be exported from an Identification Summary or from a XIC dataset. The exported tsv file is compatible with Peakview.
Note: all Spectrum Matches must be generated first.
When a Search Result is imported in Proline, the user can view PSMs with their associated spectrum, but by default no annotation is defined. The user can generate (and save) this information.
See description of Validation Algorithm.
It is possible to validate identification Search Results or merged ones. In the latter case, the filters and validation thresholds can be propagated to the child Search Results.
To validate a Search Result:
- Select one or multiple Search Results to validate
- Right Click to display the popup
- Click on “Validate…” menu
In the Validation Dialog, fill the different Parameters (see Validation description):
- you can add multiple PSM prefilter parameters (Rank, Length, Score, e-Value, Identity p-Value, Homology p-Value) by selecting them in the combobox and clicking on the Add button '+'
- you can ensure a FDR on PSMs, which will be reached by adjusting the selected variable (Score, e-Value, Identity p-Value, Homology p-Value…)
- you can add a Protein Set prefilter on Specific Peptides
- you can ensure a FDR on Protein Sets.
Note: FDR can be used only for Search Results with Decoy Data.
If you choose to propagate to child Search Results, the specified prefilters are applied as defined. For the FDR filter, the threshold found by the validation algorithm on the parent is applied to the children as a prefilter.
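To make the FDR mechanism concrete, here is a minimal sketch of how a score threshold reaching a requested PSM FDR can be searched in a target-decoy setting. It only illustrates the general principle (FDR estimated from decoy/target counts), not the exact Proline algorithm:

# Illustrative only: find the lowest score threshold achieving the requested FDR
def fdr_threshold(target_scores, decoy_scores, fdr_max):
    # candidate thresholds are the observed target scores, tested from low to high
    for t in sorted(target_scores):
        n_target = sum(1 for s in target_scores if s >= t)
        n_decoy = sum(1 for s in decoy_scores if s >= t)
        if n_target > 0 and n_decoy / float(n_target) <= fdr_max:
            return t
    return None

print(fdr_threshold([10, 25, 40, 60], [12, 15], 0.25))  # 25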
In the second tab, you can define rules for choosing the Typical Protein of a Protein Set, using a match string with wildcards (* or ?) on the Protein Accession or Protein Description (see Change Typical Protein of Protein Sets).
Validating a Search Result can take some time. Until it finishes, the Search Results are shown greyed out with an hourglass over them. The tasks are displayed as running in the “Tasks Log Dialog”.
When the validation is finished, the icon becomes orange and blue: the orange part corresponds to the Identification Summary, the blue part to the Search Result.
See description of Protein Sets Filtering.
The Protein Sets windows are not updated after filtering Protein Sets; you should close and reopen the window.
To filter Protein sets of Identification Summaries:
- Select one or multiple Identification Summaries to filter
- Right Click to display the popup
- Click on “Filter ProteinSets…” menu
You can add multiple filters (Specific Peptides, Peptide count, Peptide sequence count, Protein Set Score) by selecting them in the combobox and clicking on the Add button '+'.
Once the filtering is done, you will have to open a new Protein Sets window in order to see the modifications.
The Protein Sets windows are not updated after changing the Typical Protein; you should close and reopen the window.
To change the Typical Protein of the Protein Sets of an Identification Summary:
- Select one or multiple Identification Summaries
- Right Click to display the popup
- Click on “Change Typical Protein…” menu
You can set the choice for the Typical Protein of Protein Sets by using a match string with wildcards (* or ?) on Protein Accession or Protein Description.
For advanced users, a full regular expression can be specified; in this case, check the corresponding option.
Three rules can be specified. They are applied in priority order, i.e. if no protein of a protein set satisfies the first rule, the second one is tested, and so on.
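As an illustration of this priority logic, here is a minimal sketch (hypothetical protein records; the real implementation also matches on the description and supports full regular expressions):

# Illustrative only: apply typical-protein rules in priority order
import fnmatch

def pick_typical(proteins, rules):
    # proteins: list of (accession, description); rules: wildcard patterns on accession
    for rule in rules:                      # first rule has highest priority
        for accession, description in proteins:
            if fnmatch.fnmatchcase(accession, rule):
                return accession
    return proteins[0][0]                   # fallback: keep the current typical protein

proteins = [("REV_P12345", "decoy entry"), ("P67890", "glycerol kinase")]
print(pick_typical(proteins, ["sp|*", "P*"]))  # rule 1 matches nothing, rule 2 picks 'P67890'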
The modification of Typical Proteins can take some time. During the processing, Identification Summaries are displayed greyed out with an hourglass, and the tasks are displayed in the Tasks Log Window.
Merging can be done on Search Results or on Identification Summaries. In both cases, you can specify whether aggregation mode or union mode should be used. See the descriptions of Search Results merging and Identification Summaries merging.
To merge a dataset with multiple Search Results:
- Select the parent dataset
- Right Click to display the popup
- Click on “Merge” menu
When the merge is finished, the dataset is displayed with an M in the blue part of the icon, indicating that the Merge has been done at a Search Result level.
If you merge a dataset containing Identification Summaries, the merge is done at the Identification Summary level, and the dataset is therefore displayed with an M in the orange part of the icon.
The purpose of the Data Analyzer is to easily perform calculations and comparisons on data.
To open the data analyzer, you have two possibilities:
- you can use the dedicated button found in the toolbar of all views. If you use this button, the corresponding data is directly sent to the Data Analyzer.
- you can use the menu “Window > Data Analyzer”
In the Data Analyzer view, you can access all data views, as well as a set of functions and graphics. In the following example, we create a graph by adding, by drag & drop, the Spectral Count data and the beta-binomial (BBinomial) function, then linking them together.
You have to specify the parameters of the BBinomial function: right-click on the function and select the “Settings” menu.
In the settings dialog, select the two groups of columns on which you want to perform the BBinomial function. Once the parameters are set, the calculation starts immediately and an hourglass icon is shown.
When the calculation is finished: the hourglass icon becomes a green tick, and the user can right click and select the “Display” menu to see the result.
This function is used by ProStar Macro to compute the FDR.
More information: http://bioconductor.org/packages/release/bioc/vignettes/Prostar/inst/doc/Prostar_UserManual.pdf
Joins the data of two tables according to the selected key.
Performs a difference between two tables joined on a selected key. When a key value is not found in one of the source tables, the line is displayed as empty. For numerical values a difference is computed, and for string values the '<>' symbol is displayed when the values differ.
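A sketch of this join-and-difference behaviour on two small hypothetical tables (numeric columns subtracted, equal strings left empty, differing strings rendered as '<>', missing keys giving empty lines):

# Illustrative only: difference of two key-joined tables
t1 = {"P1": {"score": 120.0, "status": "valid"}, "P2": {"score": 80.0, "status": "valid"}}
t2 = {"P1": {"score": 100.0, "status": "valid"}, "P3": {"score": 50.0, "status": "dubious"}}

for key in sorted(set(t1) | set(t2)):
    if key not in t1 or key not in t2:
        print(key, "")                      # key missing in one table: empty line
        continue
    row1, row2 = t1[key], t2[key]
    diff = {}
    for col in row1:
        if isinstance(row1[col], (int, float)):
            diff[col] = row1[col] - row2[col]                  # numerical difference
        else:
            diff[col] = "" if row1[col] == row2[col] else "<>"
    print(key, diff)                        # P1 {'score': 20.0, 'status': ''}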
Calibration Plot for Proteomics is described here: https://cran.r-project.org/web/packages/cp4p/index.html
The BBinomial function is useful for Spectral Count quantitations.
This function is used by ProStar Macro. Two tests are available: Welch t-test and Limma t-test.
More information: http://bioconductor.org/packages/release/bioc/vignettes/Prostar/inst/doc/Prostar_UserManual.pdf
The Columns Filter lets the user remove unnecessary columns in a matrix. A combobox with the prefixes and suffixes of the columns lets you quickly select multiple similar columns to filter them out.
The Rows Filter function lets the user filter rows of a matrix according to settings on columns.
This module lets you import data from a CSV or TSV file. Then you can do calculations and display these data directly in Proline Studio.
The separator is automatically detected from the csv file, but you can modify it.
The preview zone displays the first lines of the file as it will be loaded.
The expression builder lets you create an expression with built-in functions or comparators and variables (columns from the linked matrix). In the example, we calculate the mean of a column in the matrix.
This function is used by ProStar Macro to remove rows with too many missing quantitative values.
The available missing values algorithms are:
This function is used by ProStar Macro to impute missing values.
More information: http://bioconductor.org/packages/release/bioc/vignettes/Prostar/inst/doc/Prostar_UserManual.pdf
This function is used by ProStar Macro to normalize quantitative values.
More information on algorithms: http://bioconductor.org/packages/release/bioc/vignettes/Prostar/inst/doc/Prostar_UserManual.pdf
pvalue and ttd functions are useful for XIC Quantitations
1 or 2: Add XIC Data to Data Analyzer from the Protein Set View or by importing data from a csv file.
3: Add the Prostar Macro by drag and drop and link the XIC data to the macro. Then run the calculation by clicking on the Process Graph button.
During the process, the Data Analyzer will ask you settings for each function.
4: Filter unnecessary columns from your data if needed. The settings can be validated with no parameters if you don't need this step.
5: The row filter is needed only if you want to remove contaminants. The settings can be validated with no parameters if you don't need it.
6: Log is needed to log-transform abundances (data from Proline). Data coming from MaxQuant is already logged.
7 to 13: follow the requested settings (you can find help in the Prostar documentation, or details in the corresponding functions).
During the process, results will be automatically displayed:
14: FDR Result
15: Calibration Plots
16: Result table with the differential proteins and the corresponding scatter plot. You can select differential proteins in the table to import them into the scatter plot and create a colored group with them.
If you want to look at other results, right click on a function and select “Display in New Window”
Prostar User Manual:
http://bioconductor.org/packages/release/bioc/vignettes/Prostar/inst/doc/Prostar_UserManual.pdf
Prostar Tutorial :
http://bioconductor.org/packages/release/bioc/vignettes/Prostar/inst/doc/Prostar_Tutorial.pdf
Calculator lets you write Python scripts to freely manipulate the viewed data.
1) To open the Calculator, click on the calculator icon (not yet available on all views)
On the left part of the Calculator, you can access all viewed data; double-click to add a table or a column to the script.
2) Write your python script on the text area
3) Execute it by clicking on the green Arrow.
4) When the script has been executed, the results of the calculations (variables, new columns) are available in the “Results” tab. Double-click on a new column to add it to the table, or, as in the example, add the column to the table programmatically.
#### Algorithm to calculate the logarithm of a column ####
# math module imported in case it is not already available in the calculator environment
import math

# get the Table 3 which corresponds to the table newSC Quanti Protein Set
t = Table.get(3)
# get the constant column 10 of the table t (Specific SC column)
# mutable() is called to be able to modify the data
specificSCCol = t[10].mutable()
# number of rows of the column
nb = len(specificSCCol)
# loop on the data of the column
for i in range(0, nb):
    # calculate the log (NaN values for errors)
    v = specificSCCol[i]
    if v <= 0:
        specificSCCol[i] = float('NaN')
    else:
        specificSCCol[i] = math.log(v)
# set the column name which will be shown to the user
specificSCCol.setColumnName("log(specificSC)")
# add the created column to the table t
t.addColumn(specificSCCol)
#### Algorithm to perform a difference and a mean between two columns ####
t = Table.get(9)
colAbundance1 = t[3]
colAbundance2 = t[5]
# difference between two columns
colDiff = colAbundance1-colAbundance2
# set the name of the column
colDiff.setColumnName("diff")
# mean between two columns
colMean = (colAbundance1+colAbundance2)/2
# set the name of the column
colMean.setColumnName("mean")
# add columns to the table
t.addColumn(colDiff)
t.addColumn(colMean)
#### Algorithm to perform a pvalue and a ttd on abundances column of a XIC quantitation ####
t = Table.get(1)
pvalueCol = Stats.pvalue( (t[2], t[3]), (t[4],t[5]) )
ttdCol = Stats.ttd( (t[2], t[3]), (t[4],t[5]) )
pvalueCol.setColumnName("pvalue")
ttdCol.setColumnName("ttd")
t.addColumn(pvalueCol)
t.addColumn(ttdCol)
When importing a search result, the software used to create the peaklist has to be specified. This parameter is mandatory for XIC quantitation, as it is used to find the scan number or RT in the spectrum title; this information is then used to extract abundances from the raw files.
If an invalid software has been specified when importing, it is possible to change the peaklist software afterwards. This option is only valid for Identification DataSets.
Right-click on the identification DataSet and select “Update Spectrum using Peaklist software”.
The following dialog is displayed, allowing the user to select the peaklist software to use.
See description of Compare Identification Summaries with Spectral Count.
To obtain a spectral count, right-click on a dataset with merged Identification Summaries and select the “Compare with SC” menu in the popup. This dataset is used as the reference dataset: the Protein Set list as well as the specific peptides are defined there.
In the Spectral Count window, fill the name and description of your Spectral Count and press Next.
Then select the Identification Summaries on which you want to perform the Spectral Count and press Next.
Finally select the DataSet where shared peptides spectral count weight should be calculated and press OK.
A Spectral Count is created and added to the Quantitations Panel.
You can then display a Spectral Count, see Display a Spectral Count
For description on LC-MS Quantitation you can first read the principles in this page: Quantitation: principles
There are two ways to generate a XIC: either by clicking on the “Extract Abundances” action or by selecting the “Clone & Extract Abundances” option, both located in the respective pop-up menu as shown in the following screenshots. The difference between the two approaches is that in the second case, the new XIC is generated for an existing Experimental Design.
To create the design of your XIC, drag and drop the desired identifications from the right panel to the left one. If you drop an identification on the XIC node, parent Group and Sample nodes are automatically created; you can also drop directly on a Group or Sample node. Regarding the name of these parent nodes, the general rule is that, when possible, a parent node is named after the children nodes' earliest common ancestor. Furthermore, since Proline Studio 1.5, the user can further refine the XIC design by creating his own Samples and Groups, which can in turn be populated by dragging elements from the identifications tree into them. Going to the next step requires that no empty Group or Sample node exists within the design.
The user can also rename any node using the pop-up menu shown in the screenshot below. For Group, Sample or root nodes, renaming can also be triggered by pressing F2 while the node is selected, or by triple-clicking on it. Triple-clicking can sometimes be a little confusing, as it may also expand or collapse a parent node. Finally, it is recommended to rename at least the XIC node.
To perform any XIC design, all participating Sample Analysis nodes must be associated with a corresponding raw file. If this is not the case, the user must establish those associations. This can be done at the second step of the XIC design using two different interface features. The first way to establish such a “connection” is to explicitly drag and drop an .mzdb file stored on the server file system. Note that although it is possible to overwrite an existing association, there is no mechanism ensuring compatibility; it is thus advised either to avoid such an action or to proceed with caution.
The second way to establish a missing connection is to use a feature introduced in Proline Studio 1.5, the Drop Zone. It can be quite handy when multiple associations are missing, or when the number of uploaded .mzdb files makes searching for them one by one impractical. The feature is very easy to use: simply drag a set of files, or folders containing .mzdb files, into the Drop Zone. As soon as a drop takes place, all missing connections are automatically created, as long as a matching .mzdb file has been dropped. Note also that since version 1.5, users have indications about the source of each association, and that existing in-database associations cannot be overwritten, to protect them from possible corruption.
When the XIC Design is finished, click on the next button and select the parameters. See Label-free LC-MS quantitation configuration to have more details about the different parameters.
Not all the XIC parameters are displayed at first. You can display the complete set of parameters by clicking on “Advanced Parameters”.
Note: all the parameters are already set with default values.
Newly generated XIC designs are immediately added to the Quantitation tree. From this tree, via a popup menu, the end user can either view a design's properties, as seen in the following screenshot, or apply a series of actions on it, including among others:
Right click on the selected XIC node in the Quantitation tree, and select “Refine Proteins Sets Abundances…”
In the dialog, you can:
- specify peptides to consider for quantitation
- configure parameters used for peptides quantitation
- configure parameters used for proteins quantitation
For more details, see Post-processing of LC-MS quantitative results
You can see the results by displaying the XIC (see Display a XIC) or export them (see Export Data).
Since version 1.4, Proline Studio includes a general settings dialog, accessible from the top menu bar by clicking on “General Settings” as shown in the screenshot below.
It consists of a steadily growing number of user preferences regarding various aspects of Proline Studio. Based on their context, preferences are currently organized into the following four tabs:
The JMS Settings tab contains parameters concerning the exchange of messages between your local machine and the JMS server. Unlike the other preferences, those in this tab should be treated with caution: mishandling a communication preference can lead to communication/connection problems, or leave users unsure whether they are connected to the correct server version.
This parameter can be seen as a name representing a server address. It exists because multiple server versions may run on the same machine, making it necessary to route Studio's messages to the right one.
The Table Parameters tab encapsulates a short list of preferences that apply to all tables generated throughout Proline Studio. More specifically, these preferences control the arrangement of columns and their respective widths.
This field dictates the spatial arrangement of table columns. Three arrangements are possible:
When “Automatic Columns Size” is used, all columns are resized width-wise so that they all fit their container. Since this is a “fit-to-screen” approach, it avoids scrollbars but does not guarantee the readability of the presented data, especially when the number of columns is high.
On the other hand, a simpler approach that guarantees readability is “Fixed Column Size”. In this case all table columns have a fixed width, explicitly dictated by the “Column Width” parameter.
“Smart Column Size” is a trade-off between the two previous modes. It addresses the cases where either there are too many columns to visualize with the “Automatic” approach, or so few that the globally applied fixed width imposes unneeded scrollbars and hides some columns. The mode is a simple rule based on the ratio between the mean column width that “Automatic Column Size” would need for a specific table and the globally selected fixed width, with a threshold set at 0.7 (70%): if the ratio is smaller than 0.7, the table is presented in “Fixed Column Size” mode; if the ratio is equal to or greater than 0.7, “Automatic Column Size” is used, as it balances a possibly slightly smaller width against the risk of hiding columns behind scrollbars.
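In code terms, the rule described above boils down to something like this sketch (names are illustrative):

# Illustrative only: the 0.7 ratio rule behind "Smart Column Size"
def smart_mode(mean_auto_width, fixed_width, threshold=0.7):
    ratio = mean_auto_width / float(fixed_width)
    # narrow tables fall back to the fixed width, wide ones to fit-to-screen
    return "fixed" if ratio < threshold else "automatic"

print(smart_mode(60, 100))   # ratio 0.6 -> 'fixed'
print(smart_mode(90, 100))   # ratio 0.9 -> 'automatic'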
The second preference is self-explanatory: it corresponds to the desired global column width when the fixed mode is applied, either directly or as a result of the smart mode.
In this tab we can find a diverse set of preferences regarding various tasks encountered in Proline Studio. For the time being those preferences are:
Unlike the last three, “Hide Getting Started Dialog” is self-explanatory. The second preference, “Default Search Name Source”, affects the way identification datasets are named on import. It has three possible options:
This parameter affects the .xls and .xlsx files produced during export (client side). It preserves any existing rich text formatting in a table (colors, font weight, etc.).
Corresponds to the default raw2mzDB converter. While the choice of version is left to the user's discretion, note that different versions tend to work better with specific types of raw files. For a conversion to succeed within Proline Studio, all system requirements set by the chosen raw2mzDB version must be met.
Since version 1.5, Proline Studio has been enhanced with basic administration functionality. This infrastructure permits an advanced user holding the Admin status to carry out a limited range of tasks on user accounts and on the peaklist software registered in the system.
Regarding user accounts, the administration dialog offers two possibilities: adding a new user account or modifying an existing one. Both actions are quite simple, relying on minimalistic popup dialogs as seen in the following two screenshots.
Both actions share the same interface, except that the login field is predefined and not editable when modifying a user account. Note that at Studio's installation, a default admin user is created, from which the rest of the users can be created.
As seen in the preceding screenshot, the Peaklist Softwares tab follows an approach similar to the Accounts tab: it permits either adding a new element or modifying an existing one, via popup dialogs triggered by the two respective buttons. To add a new peaklist software, simply click on the button of the same name, which triggers the appropriate dialog.
Similarly, as seen in the following screenshot, clicking on the modify button permits the end user to alter a small subset of the attributes of an existing peaklist software.
Proline Web
Prerequisite: You must have an account to login to the server. Ask your administrator to create one if you don't have any. After the installation, the default account is “admin” with password “admin”
- Open your Google Chrome web browser and connect to the address of the server (ask your administrator)
- Enter your username and password and click “OK”.
- To create a project, please follow the instructions detailed on this page.
In order to create and run Quantitation analyses, you must register your MzDB files into Proline databases. To do so, click on the “Settings” button in the bottom bar of the Dataset Explorer application, and go to the “Raw File Registerer” tab.
Note that this operation will override existing information if some of the files are already registered.
Close the Settings Window.
Open the DataSet Explorer app from the left panel (if closed, open it by clicking on the “apps” button in the upper-left corner of the page)
When the DataSet Explorer is opened, you can see your Project Tree on the left.
To create a New Project, just click on the New project button at the bottom of this panel or in the context menu of the user node.
You must be logged in as an administrator. Click the gear-shaped button in the top task-bar (circled in blue on the right) to open the administrator settings menu.
On the first tab, “User Administration”, you can create a new user by setting up its name, its password, and define whether or not the user will have the “administration” permissions (including applications, users and service management).
Submit the form to create the user. You can manage existing users from the “users” tab.
Please note that the Proline Web Desktop has its own database and its own users collection. However, if you configured the Proline Service running inside the Proline Web Desktop, it will synchronise the Proline Web Desktop and the Proline Core users each time a user logs in the Proline Web Desktop:
The first thing to do in your brand new project is to import Result Files.
There are many ways to open the dedicated tab:
A new tab will open and display the following panel.
Click on Select result files to browse and select them. The left side of the browsing window is meant for the directories. When you click on one of them, its content is shown on the main panel. Choose one or multiple files then click Ok.
Then configure the following parameters:
In particular, the subset threshold: subset proteins whose score is below
Master_protein_score * (1 - subset_threshold)
won't be imported.
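A quick worked illustration of this cutoff (numbers chosen arbitrarily):

# Illustrative only: the subset-threshold import cutoff
master_protein_score = 200.0
subset_threshold = 0.25
cutoff = master_protein_score * (1 - subset_threshold)   # 150.0
# subset proteins scoring below 150.0 won't be imported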
Acetyl peptide N-term=H(-6) C(-7) O(-1)
You can now click on “Start Import” to launch the check and the import tasks.
The server checks your files first, then the import itself is launched automatically.
You can follow the current state of your tasks by clicking on the Tasks button at the bottom of the Project Tree panel.
It opens the Tasks Panel, listing all the tasks that have been run in Proline.
When a task is done, you are notified by a small message in the top of the screen, and you can see its status in the tasks window:
All the Result Files you have imported are listed in the Imported Search Results panel you can access by double-clicking the Imported Search Results node in your project tree.
Double click on the Imported Search Results node to open the corresponding grid.
The grid itself allows you to:
The grid toolbar allows you to:
From the Search Results grid, click on Edit sample names in batch to open the edition window.
In this window, you can copy and paste a grid from, let's say, an Excel file. Each row corresponds to a result file and must contain two columns: the sample name, and either the peaklist file name, the raw file identifier or the result file name.
Let’s say you have the following grid:
Peaklist File | Condition
OVEMB150205_21.raw.mgf | UPS1 50 fmol
OVEMB150205_23.raw.mgf | UPS1 50 fmol
OVEMB150205_25.raw.mgf | UPS1 50 fmol
Copy it, then paste it into the edition window as shown below. Note that in this example we are using the peaklist file name as the reference, so we must select it in the box at the top of the window to update the column name.
You can also double click on a cell to edit its text directly in the edition window.
The result set table will be updated with the new sample names:
As the warning says, if several imported result files share the same raw file identifier, all of them will be impacted by this action.
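Conceptually, the batch edit is a simple lookup from the chosen reference column to the new sample name; a minimal sketch of the idea (file names reused from the example above, logic illustrative, not Proline's code):

# Illustrative only: map pasted rows (reference -> sample name) onto result files
pasted_rows = [
    ("OVEMB150205_21.raw.mgf", "UPS1 50 fmol"),
    ("OVEMB150205_23.raw.mgf", "UPS1 50 fmol"),
    ("OVEMB150205_25.raw.mgf", "UPS1 50 fmol"),
]
sample_by_peaklist = dict(pasted_rows)

result_files = ["OVEMB150205_21.raw.mgf", "OVEMB150205_23.raw.mgf"]
for peaklist in result_files:
    # files whose reference is not in the pasted grid keep their current name
    print(peaklist, "->", sample_by_peaklist.get(peaklist, "(unchanged)"))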
Once your Result Files have been imported, you can use them to create a new Identification Dataset. To create it from result files, you can:
Note that you can also create an empty dataset to further assemble complex structures using drag and drop in the project tree. However, this way is not favoured.
You should now see a window asking to choose a source of data for your new Dataset.
The only option available yet is Result Set: it allows you to build a new dataset from the Result Files you have imported. Click OK to continue.
In the right panel, two text fields allow you to enter a name and description (optional) for your dataset.
By default, the result sets will be named in the tree using the result file name. You can change that with the Name child results using box. The possibilities are: peaklist file name, raw file identifier, result file name, sample, search title.
To add one or many files to your selection, select them in the grid (you can use the Ctrl and the Shift keys to make a multiple selection), then click on Add to dataset (top-right of the panel). You can also double click on one file to quickly add it to the selection.
To remove any file from the selection, just select them and click on Remove selected Items.
The creation of your identification dataset happens as follows:
Once your Identification Dataset has been created, you can see it on the tree, in the left side of the window. You may need to collapse and expand again the Identifications node to see it appear.
The white icon lets you know that it is not yet validated (it becomes green when validated).
Double-click on the aggregation node to open the identification summary. This panel shows a list of your identification fractions (corresponding to each imported file) and, after the validation process, it will display the Merged Result Summary information.
Refer to the previous paragraph to know how to do these steps.
Once you have selected your result files, click on Add annotations. A window shows up, with as many empty lines as selected files.
Let's say you compare 2 conditions, and you have the following result table (Excel file for instance). The idea is to copy/paste this table to the annotation editor.
Result file | Peaklist File | Condition
F078594.dat | OVEMB150205_21.raw.mgf | UPS1 50 fmol
F078596.dat | OVEMB150205_23.raw.mgf | UPS1 50 fmol
F078592.dat | OVEMB150205_25.raw.mgf | UPS1 50 fmol
F078590.dat | OVEMB150205_27.raw.mgf | UPS1 50 fmol
F078591.dat | OVEMB150205_12.raw.mgf | UPS1 5 fmol
F078595.dat | OVEMB150205_14.raw.mgf | UPS1 5 fmol
F078597.dat | OVEMB150205_16.raw.mgf | UPS1 5 fmol
F078593.dat | OVEMB150205_18.raw.mgf | UPS1 5 fmol
Click OK to register these annotations. The window closes and the file selection is annotated as shown below (on the left). Click Create Dataset. The generated dataset (below, on the right) has two levels of aggregation: biological group (with the copy/pasted names) and top level (the chosen dataset name).
Let's take a slightly more complex design, introducing biological replicates for each condition:
Result file | Peaklist File | Biological replicate | Condition
F078594.dat | OVEMB150205_21.raw.mgf | BRep 1 | UPS1 50 fmol
F078596.dat | OVEMB150205_23.raw.mgf | BRep 1 | UPS1 50 fmol
F078592.dat | OVEMB150205_25.raw.mgf | BRep 2 | UPS1 50 fmol
F078590.dat | OVEMB150205_27.raw.mgf | BRep 2 | UPS1 50 fmol
F078591.dat | OVEMB150205_12.raw.mgf | BRep 1 | UPS1 5 fmol
F078595.dat | OVEMB150205_14.raw.mgf | BRep 1 | UPS1 5 fmol
F078597.dat | OVEMB150205_16.raw.mgf | BRep 2 | UPS1 5 fmol
F078593.dat | OVEMB150205_18.raw.mgf | BRep 2 | UPS1 5 fmol
Note: in most cases, the definitions go as follows:
- biological group = biological condition
- biological sample = biological replicate
- sample analysis = technical replicate
However, these definitions are not strict and you can use and adapt the hierarchy to fit your needs.
Click OK to register the annotations and see them in the creation panel.
Then click Create Dataset.
The generated dataset has 3 levels of aggregation:
To launch a validation on a dataset:
The following form appears:
The validation form handles several settings, especially:
When ready, click on Validate to launch the validation task. You can see it in the tasks panel. When the validation finishes, a notification is displayed and the node icon is colored in green.
See more details about the validation process.
You can either create a quantitation from scratch, meaning you enter all information; or clone an existing quantitation to pre-fill the form with previous parameters.
Right-click on a quantitation, then on Clone (new quantitation).
This will open the quantitation launcher, with pre-filled parameters.
The new quantitation name will be, by default, the same as the cloned quantitation, suffixed by “(copy)”. You can change it.
When cloning, a popup window appears, offering to Use the previous peakels detections. This means the software will not look for signal again, but will reuse the peaks found in the first quantitation, which dramatically reduces the computation time.
You can change this parameter afterwards in the LFQ parameters section of the form (see description below).
There are 2 ways to open the Quantitation Creation form:
The first tab of the creation panel, entitled Information, lets you define a name and an optional description for your quantitation. Type and method are disabled, since label-free quantitation based on abundance extraction is the only combination available yet.
Click on Next to go to the next step.
The Experimental design tab is where you define your Groups and Samples.
The simplest way to create biological groups (e.g. representing biological conditions) is to drag the validated nodes from the project tree and drop them in the dedicated zone. Drag/drop the nodes one by one.
In the example below, two groups are created this way:
For more complex experimental design, click on Add New Sample to add a sample (e.g. biological replicate) to your group. Enter the name of the sample, then drag/drop the corresponding node to the right panel to add it.
In most cases, the definitions go as follows:
However, these definitions are not strict and you can adapt the hierarchy to fit your needs. Tooltips are available to remind you of them (place your mouse over the terms to see them).
Once you have prepared all your groups and samples, click on the Next button.
This tab lets you set up your abundance extraction parameters.
Depending on your choices, some parameters will become available/unavailable (e.g. clustering parameters are visible only when extracting from MS/MS events).
Tooltips are available to help you understand each parameter (keep the mouse on a parameter name to see it appear).
See more details on these parameters.
See more details on the quantitation process.
Tip: If you are creating a quantitation that you have already done in the past (i.e. using the same files), select the Use previous peakel detection to go much faster. It will skip the long step of peakel detection and use what was previously found instead.
Click on Next to go to the last tab.
The purpose of this tab is to define the ratios between the groups your quantitation will rely on.
You can define the ratios:
Click the “double arrow” icon to invert numerator and denominator in a ratio.
When you're done, just press the Launch Quantitation button. You will be notified when the task is finished.
You can delete an Identification dataset by:
All deleted datasets are visible in the Trash node in the project tree.
Double-click on an identification or an aggregation of identifications to open its summary and results panel.
The Summary panel sums up the validation parameters and results.
If you clicked on an Aggregate node, this panel shows the information of the Merged Result Summary and a grid listing all of the identifications of this aggregate.
If you clicked on a result set (a leaf in the tree, corresponding to a single file), then the Summary panel will display its validation parameters; including the parameters chosen when launching the validation and the ones that were automatically computed.
In order to browse the protein set data of a validated identification, click on the validated identification node in the project tree, on the left side panel of the dataset explorer, and then open the Proteins tab.
IMPORTANT: please note that the Protein table only displays the validated proteins. Furthermore, they are actually the representative match of each protein set.
Note that the identified proteins can be marked as Favorite. See the dedicated documentation here.
When clicking a protein, the other tables will be filled with related information:
Each table of the Identification data viewer provides a set of filters for numerical, text and boolean data. They are placed on the left of the grid and circled in red in the first screenshot. Click on the double arrow to fold/unfold these panels. Use them as follows:
The Peptides tab of a validated Identification Dataset (or Merged Dataset) allows you to browse the peptides of the related Result Summary.
Note that the identified peptides can be marked as Favorite. See the dedicated documentation here.
When clicking a peptide, the Peptide Matches and Protein Matches tables are filled with the entities related to this peptide. Double-click on one of these entities to open the dedicated tab and focus on it.
Each table of the Result Summary data viewer provides a set of Filters for Numerical, Text and Boolean data, placed on the left of the grid.
The MS Queries table displays the MS queries of the Result Summary and offers the same filter options as the other tables.
Double-click on a quantitation node in the Project Tree in order to open its tabbed-panel.
There are 3 tabs under the Summary: Experimental design, LFQ Params and Stats Params.
It also provides the “Post processing” tool you need to launch in order to compute statistics on your data (to get ratio information). Note that the quantified proteins and peptides tables must be reloaded after the post processing (use the Refresh icon at the top right of the table).
This panel is a summary of your experimental design, listing your biological groups, their samples and their technical replicates.
It has exactly the same layout as the quantitation launcher form, but is not editable. It allows you to see with which parameters the label-free quantitation was run.
It has exactly the same layout as the parametrization window for the post processing, but is not editable. It allows you to see the parameters used for the last post processing of the quantitation.
This tab won’t be shown if no post processing has been run on this quantitation.
The graphics are related to the files (quantitation channels) in which the protein has been quantified or not. Tooltips are available by keeping the mouse over a bar or a dot of the graphics.
The quantified proteins can be marked as Favorite. See the dedicated documentation here.
As for the identification, the tables Protein Set Matches and Master Quant. Peptide will be filled when a protein is clicked.
This grid also provides a filter tool that you can use as follows:
Double-clicking a Protein will open the Protein details panel:
On this panel, you get an insight of all quantified peptides for this protein (double click on the peptide in the 'Quantified Peptides' table or expand directly the peptide panel).
You can click on the eye icon (circled in green) to access the Speclight viewer of this Peptide.
This is the list of all quantified peptides.
The quantified peptides can be marked as Favorite. See the dedicated documentation here.
Note that the peptide grid has got a filter tool too.
This panel offers two tabs to give you information about the LC-Ms maps of the experiment.
IMPORTANT: it is crucial to inspect these graphics to manually control the processed map alignments. A bad alignment (showing many curve trends, or too large time deltas) will lead to quantitation errors. If you feel that the alignment needs improvement, re-run the quantitation, tuning the Alignment Computation part of the LFQ Params tab (see how to create a quantitation).
On this chart, you can see the time deltas of each LC-MS map against the reference LC-MS map. The reference map is the one specified in the title of the chart; it corresponds to the x axis.
Click on one or many raw files to display their LC-Ms map features.
This panel lists all the features detected and processed by Proline.
Click a master feature to see:
- the corresponding XIC on the right-side panel
- the corresponding features in each file in the bottom grid (first tab)
- the corresponding master quant. peptide ion in the bottom grid (second tab).
You can double click an entry in the bottom grid to open a dedicated tab (feature/peptide ion details).
All the peaks are listed here, independently from the features they have been gathered to.
In order to see the statistics on the ratios, you have to launch the Post Processing (previously called Profile Analysis). Launch it from the Summary tab toolbar, or from the quantitation node (in project tree) context menu.
You can see the Post Processing configuration window below.
See more details on the post-processing parameters.
Launch the post processing by clicking Run. You can watch the task progression in the Tasks window. You will also be notified when the task is done. You can then watch the results in the Quantitation Stats tab.
This tab offers several panels to help you visualize statistics about your quantitation.
The first panel is a volcano chart of your proteins, based on their -Log10 T-Test P-Value over Log2 Ratio.
The left grid shows a list of all the proteins, and whether they fall within the current boundaries of the selection/exclusion tool.
Hover a dot to see information on the protein.
Click on it to open its protein details panel.
The AC Filters let you select a color (by clicking on it) and enter an accession number (or a part of it) to highlight the matching proteins on the chart. In this example, all proteins containing “HUMAN” in their accession number are colored in orange. By default, dots are light blue.
The Log2 Fold Change (of the ratios) and P-Value fields let you define new boundaries for the protein selection/exclusion. The boundaries are shown on the chart with black lines, and excluded proteins are shown as gray points.
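The chart coordinates and these boundaries follow directly from the definitions above; here is a minimal sketch (hypothetical values, not Proline code) computing both:

# Illustrative only: volcano-plot coordinates and selection boundaries
import math

proteins = [("P1", 0.001, 4.0), ("P2", 0.20, 1.1)]   # (accession, p-value, ratio)
log2fc_min, pvalue_max = 1.0, 0.01                   # user-defined boundaries

for accession, pvalue, ratio in proteins:
    x = math.log2(ratio)              # Log2 Ratio
    y = -math.log10(pvalue)           # -Log10 T-Test P-Value
    selected = abs(x) >= log2fc_min and pvalue <= pvalue_max
    print(accession, round(x, 2), round(y, 2), "selected" if selected else "excluded")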
Select a zone with the mouse cursor to zoom in on it. Use the icon at the top right of the volcano plot to export it in several picture formats.
The second tab offers the same volcano chart and options for peptides. Instead of the accession number, you can color the dots according to their sequence or PTM definition.
In the tree, you can organize your quantitations into folders. To create a new folder, right click on the Quantitations node, then on Create folder. Enter a name for the folder. You can now drag and drop quantitations or other folders into the newly created one. You can also drag and drop quantitations/folders out of a folder.
Four kind of entities can be labeled as “favorite”: the identified proteins, the identified peptides, the quantified proteins and the quantified peptides.
The column entitled Favorite in each table corresponding to those entities allows you to:
There are 3 levels of favorites:
Click on the star icon of the protein or peptide you want to update. A contextual menu will open, with 3 options:
You can set as favorite any entity that is not yet marked so (blue and grey stars).
Unsetting an entity as favorite will apply this change only in the current dataset (if it was labeled as favorite in another dataset, the latter won’t be affected).
You can also use the Favorites menu of the grid to set/unset all the grid items as favorites. You may want to filter the grid beforehand.
In the example above, all proteins that were quantified in at least 7 files and have a protein set score greater than 150 will be set/unset as favorites in one click.
Since the favorite status of an entity is coded with an integer, you can sort the Favorite column (use the header menu) and/or filter the grid items on their “favorite status”. To do so, use the Button Data tab of the filter.
When you marked an entity as favorite, you may want to know if it was also set as favorite in other datasets.
To do so, click on the star icon, then select See references in project.
The panel below will open.
In this panel, you will see every reference of the entity that has been marked as a favorite. You can click a green arrow to open the corresponding panel, focused on the entity.
All grids are copyable, but not all in the same way.
The grids with a “Save” icon at the top right can be copied entirely (the whole content) with this icon.
The others can be copied line by line: select several lines by holding Shift while clicking on the lines, or select all lines with Ctrl+A, then copy with Ctrl+C.
The “Export” button of any identification or quantitation node is accessible by right-clicking on it, or in its summary tab when you double-click on it. All your exported data will appear under the “Exported Files” node of the project you chose during export (the dataset's current project, by default).