Proline User Guide

Release 2.0


Proline Software Suite

Proline is a suite of software and components dedicated to mass spectrometry proteomics. Proline lets you extract data from raw files, import results from MS/MS identification engines, organize and store your data in a relational database, process and analyse this data to finally visualize and extract knowledge from MS based proteomics results.

 

The current version support the following features:

The software suite is based on two main components: a server handling processing tasks and based on relational database management system storing the data generated by the software and two different graphical user interfaces, both allowing users to start tasks and visualize the data: Proline Studio which is a rich client interface and Proline Web the web client interface. An additional component called ProlineAdmin dedicated to system  administrators to setup and manage Proline.

Concepts

Read the Concepts & Principles documentation to understand the main concepts and algorithms implemented in Proline.

How-to

Find quick answer to your questions in this How to section.

Raw file conversion to mzDB

This procedure is detailed in the mzDB Documentation section. 


Proline Concepts & Principles

Introduction

Proline considers different types of identification data: Result Files, Search Results and Identification Summaries which will be defined in the following sections. All these data are connected according to this chart:

Result File

A Result File is the file created by a search engine when a search is submitted. OMSSA (.omx files), Mascot (.dat files) and X!Tandem (.xml files) search engines are currently supported by Proline. Generic mzIdentML files could also be imported.

A first version for MaxQuant support has been implemented. It is possible to import only search result or to import search result as well as quantitation (beta version) values from MaxQuant files.

A first step when using Proline is to import Result Files through Proline Studio or Proline Web.

Search engines may provide different types of searches for MS and MS/MS data. It is important to highlight that the Result File content depends on the search type. Proline only supports MS/MS ions searches at this point.

Search Result

A Search Result is the raw interpretation of a given set of MS/MS spectra given by a search engine or a de novo interpretation process. It contains one or many peptides matching the submitted MS/MS spectra (PSM, i.e. Peptide Spectrum Match), and the protein sequences these peptides belong to. The Search Result also contains additional information such as search parameters, used protein sequence databank, etc.

A Search Result is created when a Result File (Mascot .dat file, OMSSA .omx or a X! Tandem .xml) is imported in Proline. In the case of a target-decoy search, two Search Results are created: one for the target PSMs, one for decoy PSMs.

Content of a Search Result

Importing a Result File creates a new Search Result in the database which contains the following information:

Mascot result importation

The PSM score corresponds to Mascot ion score.

OMSSA result importation

The PSM score corresponds to the negative common logarithm of the E-value:

Note: Proline only supports OMSSA Result Files generated with the 2.1.9 release.

X!Tandem result importation

The X!Tandem standard hyperscore is used as a PSM score.

Note: Proline supports X!Tandem Result Files generated with the Sledgehammer release (or later).

Decoy Searches

Proline handles decoy searches performed from two different strategies:

Decoy and Target Search Result

See Search Result to view which information is saved.

 

Identification Summary

An Identification Summary is a set of identified proteins inferred from a subset of the PSM contained in the Search Result. The subset of PSM taken into account are the PSM that have been validated using a filtering process (example: PSM fulfilling some specified criteria such as score greater than a threshold value).

Content of an Identification Summary

Peptide Set

Protein Set

Protein Inference

All peptides identifying a protein are grouped in a Peptides Set. A same Peptides Set can identify many proteins, represented by one Proteins Set. In this case, one protein of this Protein Set is chosen to represent the set, it is the Typical Protein. If only a subset of peptides identify a (or some) protein(s), a new Peptide Set is created. This Peptide Set is a subset of the first one, and identified Proteins are Subset Proteins.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\pepprotset1.png

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\pepprotset2.png

All peptides sets and associated protein sets are represented, even if there are no specific peptides. In both cases above, no choice is done on which protein set / peptide set to keep. These protein sets could be filtered after inference (see Protein sets filtering).

Proteins and Proteins sets scoring

There are multiple algorithms that could be used to calculate the Proteins and Protein Sets scores. Proteins scores are computed during the importation phase while Protein Sets scores are computed during the validation phase.

Protein

Each individual protein match is scored according to all peptide matches associated with this protein, independently of any validation of these peptide matches. The sum of the peptide matches scores is used as protein score (called standard scoring for Mascot result files).

Protein Set

Each individual protein set is scored according to the validated peptide matches belonging to this protein set (see inference).

Scoring schemes

Mascot Standard Scoring

The score associated to each identified protein (or protein set) is the sum of the score of all peptide matches identifying this protein (or protein set). In case of duplicate peptide matches (peptide matched by multiple queries) only the match with the best score is considered.

Mascot MudPIT Scoring

This scoring scheme is also based on the sum of all non-duplicate peptide matches score. However the score for each peptide match is not its absolute value, but the amount that it is above the threshold: the score offset. Therefore, peptide matches with a score below the threshold do not contribute to the protein score. Finally, the average of the thresholds used is added to the score. For each peptide match, the “threshold” is the homology threshold if it exists, otherwise it is the identity threshold. The algorithm below illustrates the MudPIT score computation procedure:

Protein score = 0
For each peptide match {
 If there is a homology threshold and ions score > homology threshold {
         Protein score += peptide score - homology threshold
 } else if ions score > identity threshold {
         Protein score += peptide score - identity threshold
 }
}
Protein score += 1 * average of all the subtracted thresholds

The benefit of the MudPIT score over the standard score is that it removes many of the junk protein sets, which have a high standard score but no high scoring peptide matches. Indeed, protein sets with a large number of weak peptide matches do not have a good MudPIT score.

Mascot Modified MudPIT Scoring

This scoring scheme, introduced by Proline, is a modified version of the Mascot MudPIT one. The difference with the latter is that it does not take into account the average of the substracted thresholds. This leads to the following scoring procedure:

Protein score = 0
For each peptide match {
 If there is a homology threshold and ions score > homology threshold {
         Protein score += peptide score - homology threshold
 } else if ions score > identity threshold {
         Protein score += peptide score - identity threshold
 }
}

This score has the same benefits than the MudPIT one. The main difference is that the minimum value of this modified version will be always close to zero while the genuine MudPIT score defines a minimum value which is not constant between the datasets and the proteins (i.e. the average of all the subtracted thresholds).

FDR Estimation

There are several ways to calculate FDR depending on the database search type. In Proline the FDR is calculated at peptide and protein levels using the following rules:

Note: when computing PSM FDR, peptide sequences matching a Target Protein and a Decoy Protein is taken into account in both cases.

Validation Algorithm

Once a result file has been imported and a search result created, the validation is performed in four main steps:

  1. Peptide Matches filtering and Validation
  2. Protein Inference (peptides and proteins grouping)
  3. Protein and Proteins Sets scoring
  4. Protein Sets Filtering and Validation

Finally, the Identification Summary issued from these steps is stored in the identification database. Different validation of a Search Result can be performed and a new Identification Summary of this Search Result is created for each validation.

When validating a merged Search Result, it is possible to propagate the same validation parameters to all childs Search Results. In this case Peptide Matches filtering and Validation will be applied on childs as well as Protein Sets filtering. Note: actually, Protein Sets validation is not propagated to childs Search Results.

Peptide Matches Filtering

Peptide Matches identified in search result can be filtered using one or multiple predefined filters (described hereafter). Only validated peptide matches will be considered for further steps.

Basic Score Filter

All PSMs which score is lower than a given threshold are invalidated.

Pretty Rank Filter

This filtering is performed after having temporarily joined target and decoy PSMs corresponding to the same query (only really needed for separated forward/reverse database searches). Then for each query, PSMs from target and decoy are sorted by score. A rank (Mascot pretty rank) is computed for each PSM depending on their score position: PSM with almost equal score (difference < 0.1) are assigned the same rank. All PSMs with rank greater than specified one are invalidated.

Minimum Sequence Length Filter (Length)

PSMs corresponding to short peptide sequences (length lower than the provided one) can be invalidated using this parameter.

Mascot e-Value Filter (e-Value)

Allows to filter PSMs by using the Mascot expectation value (e-value) which reflects the difference between the PSM score and the Mascot identity threshold (p=0.05). PSMs having an e-value greater than the specified one are invalidated.

Mascot adjusted e-Value Filter (Adjusted e-Value)

Proline is able to compute an adjusted e-value. It first selects the lowest threshold between the identity and homology ones (p=0.05). Then it computes the e-value using this selected threshold. PSMs having an adjusted e-value greater than the specified one are invalidated.

Mascot p-Value on Identity Filter (Identity p-Value)

Given a specific p-value, the Mascot identity threshold is calculated for each query and all peptide matches associated to the query with a score lower than calculated identity threshold are invalidated.
When parsing Mascot result file, the number of PSM candidate for a spectra is saved and could be used to recalculate identity threshold for any p-value.

Mascot p-Value on homology Filter (Homology p-Value)

Given a specific p-value, the Mascot homology threshold is inferred for each query and all peptide matches associated to the query with a score lower than calculated homology threshold are invalidated.

Single PSM per MS Query Filter

This filter validates only one PSM per Query. To select a PSM, following rules are applied:

For each query:

:!: For testing purpose, it is possible to request for this filter to be executed after Peptide Matches Validation (see below). In this case, the requested FDR in validation step is modified by this filter. This is just to confirm the need or not of this filter and to validate the way we apply it!

Single PSM per Rank Filter

This filter validates only one PSM per Pretty Rank. If you choose this filter + a pretty rank filter you should have the same behaviour than the “Single PSM per Query Filter”.

In order to choose the PSM, following rules are applied:

For Pretty Rank of each query:

:!: For testing purpose, it is possible to request for this filter to be executed after Peptide Matches Validation (see below). In this case, the requested FDR in validation step is modified by this filter. This is just to confirm the need or not of this filter and to validate the way we apply it!  

Peptide Matches Validation

Specify an expected FDR and tune a specified filter in order to obtain this FDR. See how FDR is calculated.

Once previously described prefilters have been applied, a validation algorithm can be run to control the FDR: given a criteria, the system estimates the better threshold value in order to reach a specific FDR.

 

Protein Sets Filtering

Filtering applied during validation is the same as Protein Sets Filtering 

Protein Sets Validation

Once prefilters (see above) have been applied, a validation algorithm can be run to control the FDR. See how FDR is calculated.

At the moment, it is only possible to control the FDR by changing the Protein Set Score threshold. Three different protein set scoring functions are available.

Given an expected FDR, the system tries to estimate the best score threshold to reach this FDR. Two validation rules (R1 and R2) corresponding to two different groups of protein sets (see below the detailed procedure) are optimized by the algorithm. Each rule defines the optimum score threshold allowing to obtain the closest FDR to the expected one for the corresponding group of protein sets.

Here is the procedure used for FDR optimization:

The separation of proteins sets in two groups allows to increase the power of discrimination between target and decoy hits. Indeed, the score threshold of the G1 group is often much higher than the G2 one. If we were using a single average threshold, this would reduce the number of G2 validated proteins, leading to a decrease in sensitivity for a same value of FDR.

Identification Summary Filtering

Any Identification Summary, generated by validation or merging could be filtered.

Filtering consists in invalidating Protein Sets which doesn't follow specified criteria. Invalidated Protein Sets are not been taken into account for further algorithms or display.

Available filtering criteria are defined below.

Specific peptides Filter

This filter invalidates protein sets that don't have at least x peptides identifying only that protein set. The specificity is considered at the DataSet level.

This filtering go through all Protein Sets from worse score to best score. For each, if the protein set is invalidated, associated peptides properties are updated before going to next protein set. Peptide property is the number of identified protein sets.

Peptides count Filter

This filter invalidates protein sets that don't have at least x peptides identifying that protein set, independently of the number of protein sets identified by the same peptide.

This filtering go through all Protein Sets. For each, if the protein set is invalidated, associated peptides properties are updated before going to next protein set. Peptide property is the number of identified protein sets.

Peptide sequence count Filter

This filter invalidates protein sets that don't have at least x different peptide sequences (independently of PTMs) identifying that protein set.

This filtering go through all Protein Sets from worse score to best score. For each, if the protein set is invalidated, associated peptides properties are updated before going to next protein set. Peptide property is the number of identified protein sets.

Protein set score Filter

This filter invalidates protein sets which score is below the a given value.

Search Results Merging

Two king of merge is allowed in Proline.

Merge using aggregation 

Merging several Search Results consists in creating a parent Search Result which contains the best child PSM for each peptide. The best PSM is the PSM with the highest score.

Merge using union

Merging several Search Results consists in creating a parent Search Result which contains all child PSMs from each child Search Result. Warning: Depending on the size of the child Search Result, this operation may be long and size of project databases may increase quickly.

Once Search Results have been merged, only Parent Search Result can be directly validated. However, this validation could be propagated to the child. In this case Peptide Matches filtering and Validation will be applied on childs as well as Protein Sets filtering. Note: actually, Protein Sets validation is not propagated to childs Search Results.

Another merge algorithm could be used: see Merge Identification Summaries 

Identification Summaries Merging

Merge using aggregation

Merging several Identification Summaries consists in creating a parent Identification Summary which contains the best child PSM for each peptide ( The best PSM is the PSM with the highest score).

A Search Result corresponding to this parent Identification Summary is generated and  Protein Inference is applied.

Merge using union

Merging several Identification Summaries consists in creating a parent Identification Summary which contains all validated PSMs from child Identification Summaries.

A Search Result corresponding to this parent Identification Summary is generated and  Protein Inference is applied.

Even in union mode this operation should less time and size consuming as only validated PSMs are taken into account.

Compare Identification Summaries with Spectral Count

Definition

Example calculation of spectral count

Specificity and weight reference

The peptide specificity and the spectral count weight could be defined in the context of the Identification Summary where the spectral count is calculated as shown in previous schema. It could also could be done using another Identification Summary as reference, like using the common parent Identification Summary. This allow to consider only identified and validated protein in the merge context.

If we consider the following case, where Sample1 Identification Summary is the merge of Replicat1 and Replicat2.

If the spectral count calculation is done at each child level, aligning protein sets identified in parent to protein sets in child, we get the following result:

 

Sample1 Protein Sets

Replicat1

Replicat2

Ref Prot.

BSC

SSC

WSC

Ref Prot.

BSC

SSC

WSC

P2

P2

5

2

4

P3

7

7

7

P3

P3

4

1

2

P3

7

7

7

We can see that when different parent protein sets are seen as one protein set in a child, the spectral count is biased. This calculation was not retain! 

Now, if we align on child protein rather than protein sets, we get the following result:

Sample1 Protein Sets

Replicat1

Replicat2

Ref Prot.

BSC

SSC

WSC

Ref Prot.

BSC

SSC

WSC

P2

P2

5

2

4

P2

2

0

0

P3

P3

4

1

2

P3

7

7

7

Again, when considering specificity at child level, the result of spectral count in Replicat2 is not representative, as it has a null SSC and WSC. This calculation was not retain! 

A way to make some correction is to define the specificity of the peptide and their weight at the parent level, and apply it at the child level. Therefore, specific peptides for P2 is pe8 and for P3 it is pe6 and pe7. For peptide weight, if we consider pe4 for example, it will be define as follow:

The spectral count result will thus be:

Sample1 Protein Sets

Replicat1

Replicat2

Ref Prot.

BSC

SSC

WSC

Ref Prot.

BSC

SSC

WSC

P2

P2

5

2

2.75

P2

2

0

0.5

P3

P3

4

1

3.25

P3

7

5

6.5

NOTE: 

In case of multiple level hierarchy (Sample → Condition1 vs Condition2 → 3 replicates by conditions), it could make sense to calculate the spectral count weight at “Condition1” and “Condition2” levels rather than “Sample” level to keep the difference involved by the experiment condition.

In Proline

Actually, spectral count is calculated for a set of hierarchy related Identification Summaries. In other words, this means that Identification Summaries should have a common parent. The list of protein to compare or to consider is created at the parent level as the peptide specificity. User can select the dataset where the shared peptides spectral count weight is calculated. (See previous chapter)

Firstly, the peptide spectral count is calculated using following rules

Once, peptide spectral count is known for each peptide, protein spectral count is calculated using following rules

FAQ

Why the BSC is less than Peptide Count ?

When running SC even on a simple hierarchy (1 parent, 2 childs) in some case we obtain a BSC less than peptide count. This occurs only for Invalid Protein Sets. Invalid Protein Sets are the one that are present at the parent level but was filtered at child level (filter on specific peptide for example).

Indeed, the peptide count value is read in the child Protein Sets. On the other hand, the BSC is calculated by getting the spectral count information at child level for each peptide identified at parent level. If a Protein Set is invalidated, its peptide are not taken into account while merging so some of them could be missing at parent level if there were not identified in the other child.

This case is illustrated here

PepCountAndSC.png

LC-MS Quantitation: Principles

This section will describes in details the quantitation principles and concepts implemented in Proline:

 

Label-free LC-MS quantitation workflow

Analyzing Label-free LC-MS data requires a series of algorithms presented below.

Figure 1 : overview of the different stages of label-free LC-MS data processing

Generation of the LC-MS maps

LC-MS maps are created directly through Proline by using its own algorithms.

These algorithms take advantage of the mzDB file format to process ions not scan by scan but in the m/z dimension by processing elution peaks in LC-MS regions of 5 Da width, also called run slices.

Proline provides multiple algorithms

Unsupervised detection algorithm

The preferred algorithm is Proline is the one performing an unsupervised detection of chromatographic peaks (also called peakels).

The detection of all the MS signals is made in a single iteration on all the run slices of the mzDB file. Thus all peptide signals whose mass is contained in a specific run slice are detected simultaneously.

In a given run slice, m/z peaks are first sorted in descending intensity order. Starting from the most intense m/z peak, the algorithm searches for the peak of the same m/z value in the preceding and following MS1 scans according to a user-defined m/z tolerance depending on the mass analyzer. The lookup procedure stops as soon as the m/z peak is absent in more than a predefined number of consecutive scans. Once collected this m/z peak list, which is comparable to a chromatogram, is then smoothed using a Savitzky-Golay filter and split in the time dimension to form elution peaks by applying a peak picking procedure that searches for significant minima and maxima of signal intensity. At this stage, the process returns a list of elution peaks defined by an m/z value, an apex elution time and an elution time range.

The deisotoping of the obtained peaks is performed separately using two possible algorithms:

MS2 driven algorithm

This alternative algorithm performs a target signal extraction rather than an unspervised detection. It has been now deprecated but I think it is important to understand the differences with the unsupervised approach.

In this second algorithm, we consider each MS/MS event triggered by the spectrometer as an evidence for the presence of a peptide ion. Each of these events provides a set of information about the targeted precursor ion: the m/z ratio (assuming it is monoisotopic), the moment when the MS/MS has been triggered (usually not the maximum of the elution peak) and the charge state of the ion. The first and second information can be considered as close coordinates for the peptide signal on the LC-MS map. The charge state (z) can provide additional information to simplify the extraction of different isotopes of the features which are approximately separated by 1/z. For each MS/MS event:

  1. The run slice containing the precursor m/z of the MS/MS event is retrieved (default window is 5 Da, more details in the mzDB documentation), as well as the following run slice, in order to load into memory everything about the peptidic signal including the isotopes. The XIC for the MS/MS precursor mass can be then easily accessed with a user defined mass precision (default is 5 ppm).

 

  1. The apex of the monoisotopic mass elution peak does not exactly fit to the moment the MS/MS was triggered. Knowing that, the signal on the XIC is integrated on both sides of the moment the MS/MS was triggered (default value is 10 scans) to determine the ascendant slope in order to find the apex. The integration of the signal is done by summing the intensities of n isotopes, n being a user-defined value (default value is 3, including the monoisotopic peak).

  1. For each isotopic profile, the intensities are extracted allowing gaps (default value is 1) until a minimal intensity is reached. This minimal intensity is defined as a ratio of the detected apex intensity (default value is 0.001%). Only one extraction is done per spectrum, hence reducing the extraction time (theoretically).

 

  1. The peak is detected on the extracted signal corresponding to the isotope signal with the highest relative intensity predicted by the averagine (most of the time it corresponds to the monoisotopic peak, in conventional conditions such as tryptic digestion). The limits of this peak are used to tune all the limits of the isotopes (elution peaks). To do so, two different algorithms are being tested:
  1. “Basic” algorithm: applying a Savitzky-Golay smoothing then looking for the local highest point.
  1. Wavelet-based algorithm: using multiple wavelet transformed curves to determine the position of the peaks
  1. The last step consists in extracting the peptide signals containing a strong overlap with the previously extracted signal (especially with the first two isotopes).

Feature clustering

Maps generated with peak picking algorithms cannot be 100% reliable and often contain redundant signals, corresponding to the same compound. Furthermore, modified peptides having the same sequence can have different PTM polymorphisms that can give different MS signals with the same m/z ratio but having slightly different retention times. Comparing LC-MS maps with such cases is a problem as it may lead to an inversion of feature matches between maps. Creating feature clusters is a way to avoid this issue. This operation is called “Clustering” (cf. figure 2).

Figure 2: grouping features into cluster. All features with the same charge state, close m/z ratio and retention times are grouped in a single cluster. The other features are stored without clustering.

Feature clustering consists in grouping, in a given LC-MS map, the features with the same charge state, close in retention time and m/z ratio (default tolerances are respectively 15 seconds and 10 ppm). Some metrics are calculated for each cluster (equivalent to those used for the features):

The resulting maps are “cleaner”, thus reducing ambiguities for map alignment and comparison. Quantitative data extracted from these maps will be processes in the following steps. It is necessary to eliminate the ambiguities found by the clustering step. To do so, it is possible to rely on the information given by the search engine on each identified peptide. If some ambiguities remain, the end user must be aware of them and be able to either manually handle them or either exclude them from the analysis.

Note: do not mix up clustering and deconvolution which consists in grouping all the charge states detected for a single molecule.

LC-MS map alignment

Feature matching

Because chromatographic separation is not completely reproducible, LC-MS maps must be aligned before being compared. The first step of the alignment algorithm is to randomly pick a reference map and then compare every other map to it. On each comparison, the algorithm will determine all possible matches between detected features, considering time and mass windows (the default values are respectively 600 seconds and 10 ppm). Only landmarks involving unambiguous links between the maps (only one feature on each map) are kept (cf. figure 3).

Figure 3 : Matching features with the reference map respecting a mass (10 ppm) and time tolerance (600 s)

The result of this alignment algorithm can be represented with a scatter plot (cf. figure 5).

Selection of the reference map

The algorithm completes this alignment process several times with randomly chosen reference maps. Then it sums the absolute values of the distance between each map to an average map (cf. figure 4). The map with the lowest sum is the closest to the other maps and will be considered as the final reference map from this point.

Figure 4: Selection of the reference map. The chart on the left shows the time distances between each map and the average map obtained by multiple alignments. The chart on the right summarizes the integration of each curve in the chart on the left. The map closest to the average map is selected as the reference map.

Two algorithms have been implemented to make this selection.

Exhaustive algorithm

This algorithm considers every possible couple between maps:

  1. For each map, compute the distance in time to all the other maps (Sum of the distances in seconds)
  1. The reference map is the one with the lowest distance

Iterative algorithm

  1. Randomly select a reference map
  1. Align this map with all the other maps
  1. Compute the distance in time to all the other maps
  1. The new reference map is the one with the lowest distance
  1. Steps 2 to 4 are repeated unless
  1. the reference map remains the same for two consecutive iteration
  1. the maximum number of iteration is reached (default value is 3)

Alignment smoothing

The last thing to do is to find the path going through the regions with the highest density of points in the scatter plot. This step was implemented using a moving median smoothing (cf. figure 5).

Figure 5: Alignment smoothing of two maps using a moving median calculation. The scatter plot represents the time variation (in seconds) of multiple landmarks (between the compared map and the reference map) against the observed time (in seconds) in the reference map. A user-defined window is moved along the plot, computing on each step a median time difference (left plot). The smoothed alignment curve is constituted of all the median values (right plot).

Creation of the master map

Once the maps have been corrected and aligned, the final step consists in creating a consensus map or master map. It is produced by searching the best match for each feature detected on different maps. The master map can be seen as a representation of all the features detected on the maps, without redundancy. (cf. figure 6).

Figure 6: Creation of the master map by matching the features detected on two LC-MS maps. The elution times used here are the ones corrected by the alignment step. The intensity of a feature can vary from one map to another, it can also happen that a feature appears in only one map.

During the creation of the master map, the algorithm first considers matches for the most intense features (higher than a given threshold), and then consider the other features only if they match a feature with a high intensity in another map. This is done in order to avoid to include background noise to the master map (cf. figure 7).

Figure 7: Distribution of the intensities of the maps considered to build the master map. The construction is done in 3 steps: 1) removing features with a normalized intensity lower than a given threshold 2) matching the most intense features 3) features without matches in at least one map are compared again with the low intensity features, put aside in first step.

“Predicted time extractor” algorithm

This algorithm is used for cross-assignment, when a peptidic signal is detected in a file but does not have an equivalent signal in another one (frequently in DDA). In this case, the algorithm tries to extract some signal from the file where the signal has not been found. The aim of this algorithm is to reduce the number of missing values.

  1. Extracting a 4-minutes XIC (user-defined value) around:
  1. the time predicted by the alignment
  1. the ratio m/z of the isotope with the highest intensity predicted by the averagine (which is estimated from the mean value of the m/z of the observed signals in other conditions)
  1. The isotopic patterns are extracted for each spectrum. Many peptide signals can be detected and need to be filtered in order to find the best candidate.
  1. To do so, we verify beforehand that :
  1. The chromatographic elution peaks of the monoisotopic mass are really corresponding to monoisotopic masses: i.e., if no elution peak P is present before the considered monoisotopic mass M that has a difference of mass equal to 1.0027/z (z being the charge of M), having a distance apex-to-apex (P vs. M) lower than a user-defined threshold of number of cycles (default value is 5), a Pearson correlation higher than a user-defined threshold (default value is 0.7) and finally a P/M area ratio agreeing with the predicted value for P using averagine.
  1. If needed a filter of the duration of a peptide signal (which is usually peptide-specific).
  2. Considering the signals close to each other in time (elution time at the apex vs. predicted time).
  3. Considering the signals close to each other in m/z ratio.

Solving conflicts

It has been seen that ambiguous features with close m/z and retention times can be grouped into clusters. Other conflicts are also generated during the creation of the master map due to wrong matches. Adding the peptide sequence is the key to solve these conflicts by identifying without ambiguity a feature. Proline has access to the list of all identified and validated PSMs as well as the identifier (id) of each MS/MS spectrum related to an identification. This means that the link between the scan id and the peptide id is known. On the other hand the list of MS/MS events simultaneous to the elution window of each feature is known. For each of these events the corresponding peptide sequences can be retrieved. If only one peptide sequence is found for the master feature, it is be kept as it is. Otherwise the master feature is cloned in order to have one feature per peptide sequence. During this duplication step the daughter features are distributed on the new master features according to the identified peptide sequences.

Cross assignment

When the master map is created some intensity values could be missing. Proline reads the mzDB files to reduce the number of missing values, using the expected coordinates (m/z – RT) for each missing feature to extract new features. These new extractions are added to copies of the daughters and the master maps. This gives a new master map with a limited number of missing values.

Normalizing LC-MS maps

The comparison of LC-MS maps is confronted to another problem which is the variability of the MS signals measured by the instrument. This variability can be technical or biological. Technical variations between MS signals in two analyses can depend on the injected quantity of material, the reproducibility of the instrument configuration and also the software used for the signal processing. The observed systematic biases on the intensity measurements between two successive and similar analysis are mainly due to errors in the total amount of injected material in each case, or the LC-MS system instabilities that can cause variable performances during a series of analysis and thus a different response in MS signal for peptides having the same abundance. Data may not be used if the difference is too important. It is always recommended to do a quality control of the acquisition before considering any computational analysis. However, there are always biases in any analytic measurement but they can usually be fixed by normalizing the signals. Numerous normalization methods have been developed, each of them using a different mathematical approach (Christin, Bischoff et al. 2011). Methods are usually split in two categories, linear and non-linear calculation methods, and it has been demonstrated that linear methods can fix most of the biases (Callister, Barry et al. 2006). Three different linear methods have been implemented in Proline by calculating normalization factors as the ratio of the sum of the intensities, as the ratio of the median of the intensities, or as the ratio of the median of the intensities.

Sum of the intensities

How to calculate this factor:

  1. For each map, sum the intensities of the features
  1. The reference map is the median map
  1. The normalization factor of a map = sum of the intensities of the reference map / sum of the intensities of the map

Median of the intensities

How to calculate this factor:

  1. For each map, calculate the median of the intensities in the map
  1. The reference map is the median map
  1. The normalization factor of a map = median of the intensities of the reference map / median of the intensities of the map

Median of ratios

This last strategy has been published in 2006 (Dieterle, Ross et al. 2006) and gives the best results. It consists in calculating the intensity ratios between two maps to be compared then set the normalization factor as the inverse value of the median of these ratios (cf. figure 8). The procedure is the following:

  1. For each map in a “map set”, sum the intensities of the features
  1. The reference map is the median map
  1. For each feature of the master map, ratio = intensity of the feature in the reference map / intensity of the feature for this map
  1. Normalization factor = median of these ratios

Figure 8 : Distribution of the ratios transformed in log2 and calculated with the intensities of features observed in two LC-MS maps. The red line representing the median is slightly off-centered. The normalization factor is equal to the inverse of this median value. The normalization process will refocus the ratio distribution on 0 which is represented by the black arrow.

Proline makes this normalization process for each match with the reference map and has a normalization factor for each map, independently of the choice of the algorithm. The normalization factor for the reference map is equal to 1.

Building a "QuantResultSummary"

Once the master map is normalized, it is stored in the Proline LCMS database and used to create a “QuantResultSummary”. This object links the quantitative data to the identification summaries validated in Proline. This “QuantResultSummary” is then stored in the Proline MSI database (cf. figure 9).

Figure 9: From raw files to the « QuantResultSummary » object.

Quantitation: configuration

The first quantitation step as well as the advanced quantitation (see Quantitation: principles) has some parameters that could be modified by the user.

Label-free LC-MS quantitation configuration

Here is the description of the parameters that could be modified by the user.

Feature extraction Strategy

Detection parameters

These parameters are used by signal extraction algorithms .

Clustering parameters

Clustering must be applied to the imported LC-MS maps to group features that are close in time and m/z. This step reduces ambiguities and errors that could occur during the feature mapping phase.

Alignment Computation (Optional)

This is an important step in the LCMS process. It consists in aligning maps of the map set to correct the RT values. RT shifts of shared features between the compared maps follow a curve reflecting the fluctuations of the LC separation. The time deviation trend is obtained by computing a moving median using a smoothing algorithm. This trend is then used as a model of the alignment of the compared LC-MS maps. This model provides a basis for the correction of RT values.

Then all other maps are aligned to this computed reference map and their retention times are corrected.

Feature mapping

Alignment smoothing

When alignment is done, a trend can be extracted with a smoothing method permitting the correction of the aligned map retention time.

Master map creation/Cross Assignment (optional)

This step consists in creating the “master map” (also called consensus map), this map resulting from the superimposition of all compared maps.

Feature mapping

Filtering/Correction

  1. Median ratio normalization method algorithm: first, compute sum of feature intensities for each map of the map set and sort maps by computed intensities. The map ranking nearest from the median is taken as the reference map. Then for each master map feature, compute ratio as reference map feature intensity / feature intensity for the considered map. The normalization factor corresponds to the median of the computed ratios.
  1. Median normalization method algorithm: first compute median intensity for each map, set the reference map to median map, normalization factor for map M = reference map median intensity / map M median intensity.
  1. Sum normalization method algorithm: first, compute feature intensities sums for each map, set the reference map to the median map, normalization factor for map M = intensities sum of reference map / intensities sum of map M.

If you choose Relative intensity for master feature filter type, the only possibility you have is percent, so features which intensities are beyond the relative intensity threshold in percentage of the median intensity are removed. If you choose Intensity for master feature filter type, you also have only one possibility at the moment of the intensity method: basic. Features which intensities are beyond the intensity threshold are removed and not considered for the master map building process.

Post-processing of LC-MS quantitative results

This procedure is used to compute ratios of peptide and protein abundances. Several filters can also be set to increase the quality of quantitative results.

Here is the description of the parameters that could be modified by the user.

Peptide filters

Peptide and protein common parameters

Aggregation of peptides in proteins

Peptide abundances can be summarized into protein abundances using several mathematical methods:

Identification Summary Export

When exporting a whole Identification Summary in an excel file, the following sheets may be generated:


How to

Note: Read the Concepts & Principles documentation to understand main concepts and algorithms used in Proline.

Proline Studio

Creation/Deletion

Display

Save, import and export

Algorithm and other operation

Quantitation

Proline Web

Workflow

Users management

Display

Save, import and export

Algorithm and other operation

Proline Studio

List of Abbreviations

Calc. Mass: Calculated Mass

Delta MoZ: Delta Mass to Charge Ratio

Exp. MoZ: Experimental Mass to Charge Ratio

Ion Parent Int.: Ion Parent Intensity

Missed Cl.: Missed Cleavage

Modification D. Mass: Modification Delta Mass

Modification Loc.: Modification Location

Next AA: Next Amino-Acid

Prev. AA: Previous Amino-Acid

Protein Loc.: Protein Location of the Modification

Protein S. Matches: Protein Set Matches

PSM: Peptide Spectrum Match

PTM: Post Translational Modification

PTM D. Mass: PTM Delta Mass

RT: Retention Time

Server Connection

When you start Proline Studio for the first time, the Server Connection Dialog is automatically displayed.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\login_dialog.png

You must fill the following fields:

- Server Host: this information must be asked to your IT Administrator. It corresponds to the Proline server name

- User: your username (an account must have been previously created by the IT Administrator).

- Password: password corresponding to your account (username).

If the field “Remember Password” is checked, the password is saved for future use. Server connection dialog continues to open with Proline Studio, the user though does not need to fill in his password, unless the last one is changed after his last login.

Create a New Project

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\addprojectpopupprolinev3.png                D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\addprojectdialogprolinev3.png

To create a Project, click on “+“ button at the right of the Project Combobox. The Add Project Dialog opens.

Fill the following fields:

- Name: name of your project

- Description: description of your project

You can specify other people to share this new project with them. Then click on OK Button

Creation of a Project can takes a few seconds. During its creation, the Project is displayed grayed with a small hourglass over it.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\newprojectcreatedprolinev3.png

Create a Dataset

In the Identification tree, you can create a Dataset to group your data

To create a Dataset:

- right click on Identifications or on a Dataset to display the popup.

- click on the menu “Add Dataset…”

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\adddatasetpopup.png

On the dialog opened:

- fill the name of the Dataset

- choose the type of the Dataset

- optional: click on “Create Multiple Datasets” and select the number of datasets you want to create

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\addaggregatewindowprolinev3.png

Let's see the result of the creation of 3 datasets named “Replicate”:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\addaggregateresultprolinev4.png

Create a Folder

In both Identification and Quantitation tree, you can create Folders to organize your data

To create a Folder :

- right click on Identifications, Quantitations or on a Folder to display the popup.

- click on the menu “Add Identification Folder…” or “Add Quantitation Folder…”

Import a Search Result

There are two possibilities to import Search Results:

- import multiple Search Results in “All Imported” and put them later in different datasets.

- import directly a Search Result in a dataset.

Import in "All Imported"

Mascot/X!Tandem/OMSSA/MzIdentML

To import in “All Imported”:

- right click on “All Imported” to show the popup

- click on the menu “Import Search Result…”

In the Import Search Results Dialog:

- select the file(s) you want to import thanks to the file button (the Parser will be automatically selected according to the type of file selected)

- select the different parameters (see description below)

- click on OK button

Note 1: You can only browse the files accessible from the server according to the configuration done by your IT Administrator. Ask him if your files are not reachable. (Look for Setting up Mount-points paragraph in Installation & Setup page).

Note 2: Proline is able to import OMSSA files compressed with BZip2.

Note 3: MaxQuant import will generate a dataset hierarchy with the result from the different acquisition.

Parameters description: 

Importing a Search Result can take some time. While the import is not finished, the “All Imported” is shown grayed with an hourglass and you can follow the imports in the Tasks Log Window (Menu Window > Tasks Log to show it).         

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\importidentificationlogv3.png

To show all the Search Results imported, double click on “All Imported”, or right click to popup the contextual menu and select “Display List”

From the All Imported window, you can drag and drop one or multiple Search Result to an existing dataset.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\draganddropsearchresultprolinev3.png

Import directly in a Dataset

It is possible to import a Search Result directly in a Dataset. In this case, the Search Result is available in “All Imported” too.

Mascot/X!Tandem/OMSSA/MzIdentML

To import a Search Result in a Dataset, right click on a dataset and then click on “Import Search Result…” menu. Same dialog and parameters as in “Import in “All Imported”” above will be displayed.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\importsearchresultpopup.png

 

MaxQuant

To import a MaxQuant Search Result, right click on a dataset and then select “Import MaxQuant”

The following dialog will be displayed

- select the directory containing the files generated by MaxQuant.  This folder should look like:
        <root_folder>\mqpar.xml
        <root_folder>\combined\txt\summary.txt
        <root_folder>\combined\txt\proteinGroups.txt
        <root_folder>\combined\txt\parameters.txt
        <root_folder>\combined\txt\msmsScans.txt
        <root_folder>\combined\txt\msms.txt

- select the Instrument: mass-spectrometer used for sample analysis different parameters

- specify, if needed, the regular expression to extract protein accessions from  MaxQuant protein ids.

- you can choose to import also quantitation data

- click on OK button

Delete Data

You can delete Search Results, Identification Summaries and Datasets in the data tree. You can also delete XIC or Spectral Counts in the quantitation tree.

Delete the Datasets (identification or quantitation…) from the tree view (Search Result always accessible from “All Imported” view…).

There are two ways to delete data: use the contextual popup or drag and drop data to the Trash.

Delete Data from the contextual popup

Select the data you want to delete, right-click to open the contextual menu and click on delete menu.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\deletedatasetpopup.png

The selected data is put in the Trash. So it is possible to restore it while the Trash has not been emptied.

Delete Data by Drag and Drop

Select the data you want to delete and drag it to the Trash. It is possible to restore data while the Trash has not been emptied

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\deletedatasetprolinedndprolinev3.png

Empty the Trash

To empty the Trash, you have to Right click on it and select the “Empty Trash” menu.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\emptytrashprolinev3.png

A confirmation dialog is displayed and if accepted Dataset will be removed from the Trash.

Search Results are not completely removed, you can retrieve them from the “All Imported” window.

Delete a Project

It is not possible to delete a Project by yourself. If you need to do it, ask to your IT Administrator.

Connection Management

Once user is connected (see Server Connection), it is possible to:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\reconnect_dialog.png

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\changepassword.png

Display MS Queries, Peptides/PSM or Proteins of a Search Result

Functionality Access

To display data of a Search Result:

- right click on a Search Result

- click on the menu “Display Search Result >” and on the sub-menu “MSQueries” or “PSM” or “Proteins”

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\displaysearchresultpopup.png

MSQueries Window

If you click on MSQueries sub-menu, you obtain this window:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\msquerysearchresult.png

Upper View: list of MSQueries.

Bottom Window: list of all Peptides linked to the current selected MSQuery.

Note: Abbreviations used are listed here 

Peptides/PSM Window

If you click on PSM sub-menu, you obtain this window:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\displaysearchresultpsmdialogv3.png

Upper View: list of all PSM/Peptides.

Middle View: Spectrum, Spectrum Error and Fragmentation Table of the selected PSM. If no annotation is displayed, you can generate Spectrum Matches by clicking on the according button

Bottom Window: list of all Proteins containing the currently selected Peptide.

Note: Abbreviations used are listed here 

Proteins Window

If you click on Proteins sub-menu, you obtain this window:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\displaysearchresultproteindialogv3.png

Upper View: list of all Proteins

Bottom View: list of all Peptides for the selected Protein.

Note: Abbreviations used are listed here 

Display MS Queries, PSM, Peptides, Protein Sets, PTM Protein Sites or Adjacency Matrices of an Identification Summary

Functionality Access

To display data of an Identification Summary:

- right click on an Identification Summary

- click on the menu “Display Identification Summary >” and on the sub-menu “MSQueries”, “PSM”, “Peptides”, “Protein Sets”, “PTM Protein Sites” or “Adjacency Matrix

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\popupdisplayidentification.png

MSQueries Window

If you click on MSQueries sub-menu, you obtain this window:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\msqueryrsm.png

Upper View: list of MSQueries.

Bottom Window: list of all Peptides linked to the current selected MSQuery.

Note: Abbreviations used are listed here 

PSM Window

If you click on PSM sub-menu, you obtain this window:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\displayidspsmdialogv3.png

Note: Abbreviations used are listed here 

Peptides Window

If you click on Peptides sub-menu, you obtain this window:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\displayidspetidesdialogv3.png

Upper View: list of all Peptides

Middle View: list of all Protein Sets containing the selected peptide.

Bottom Left View: list of all Proteins of the selected Protein Set

Bottom Right View: list of all Peptides of the selected Protein

Note: Abbreviations used are listed here 

Protein Sets Window

If you click on Protein Sets sub-menu, you obtain this window:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\displayidsproteinsetspopupv3.png

View 1 (at the top): list of all Protein Sets

Note: In the column Proteins, 8 (2, 6) means that there are 8 proteins in the protein set : 2 in the sameset, 6 in the subset.

View 2: list of all Proteins of the selected Protein Set.

View 3: list of all Peptides of the selected Protein

View 4: Protein Sequence of the previously selected Protein and Spectrum of the selected Peptide. Other tabs display Spectrum, Spectrum Error and Fragmentation Table.

Note: Abbreviations used are listed here 

PTM Protein Sites Window

If you click on PTM Protein Sites sub-menu, you obtain this window:

WARNING: This windows will be filled in only if you have first run “Identify PTM Sites” from the dataset menu.

View 1 (at the top): This view lists all PTM Protein Sites (PTM on Protein at specific location) by displaying best PSM for current site.  

View 2: For selected PTM Protein Site, list all peptides ( which matches. Best PSM for each peptide is displayed.

View 3: For selected peptide in View 2, display all associated PSMs.

View 4: For selected PSM in View 3, display all PSMs of same MSQuery.

The user can filter on PTM Modification (by choosing Modification in filters list).

In the following example, the user keeps only Oxidation on residue M. I it possible to specify no residue to accept all, or to specify a list of residues.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\ptmmodificationfilter.png

View 4: Protein Sequence, Spectrum, Spectrum Error, Fragmentation Table and Spectrum Values

Adjacency Matrix Window

If you click on Adjacency Matrix sub-menu, you obtain this window:

View 1: All the matrices. Each matrix correspond of a cluster composed of linked Proteins/Peptides.

Note: use the Search tool to display an Adjacency Matrix for a particular Protein or Peptide

View 2: The currently selected matrix.

In the example, you can see two different protein sets which share only two peptides.

Thanks to the settings you can hide proteins with exactly the same peptides. 

Display Additional Information on Search Result/Identification Summary

Functionality AccessD:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\propertiespopup.png

To display properties of a Search Result or Identification Summary:

Note: it is possible to select multiple Search Results/Identification Summaries to compare the values.

Properties Window

Property window opened:

General Information: Various information on the analysis (instrument name, peaklist software…)

Search Properties: Information extracted from the Result File (date, software version, search settings...)

Search Result Information: Amount of Queries, PSM and Proteins in the Search Result.

Identification Summary Information: Information obtained after validation process

Note 1: Validation parameters are tagged with “validation_properties / params”

Note 1: Validation results are tagged with “validation_properties / results”

Sql Ids: Database ids related to this item

Property window opened with multiple Identification summaries selected:

The color of the type column indicate if the values are the same (white) or different (yellow).

Display a Spectral Count

You can display a generated Spectral Count by using the right mouse popup.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\displayscpopup.png

To have more details about the results, see spectral_count_result 

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\sc.png

The overview is based by default on the weighted spectral count values. (Note: if you sort on the overview column, the sort is based on max (value-mean (values))/mean (values). So, you will obtain the most homogenous and confident rows first)

For each dataset, are displayed:

- status ( typical, sameset, / )

- peptide numbers

- the basic spectral count

- the specific spectral count

- the weighted spectral count (by default)

- the selection level

By clicking on the “Column Display Button” D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\columnsvisibility11x11.png, you can choose the information you want to display or change the overview.

Display a XIC

To display a XIC, right click on the selected XIC node in the Quantitation tree, and select “Display Abundances”, and then the level you want to display:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\displayxicpopup.png

Display Protein Sets

Protein Sets

By clicking on “Display Abundances” / “Protein Sets”, you can see all quantified protein sets. For each quantified protein set, you can see below all peptides linked to the selected protein set and peptides Ions linked to the selected peptide. For each peptide Ion, you can see the different features and the graph of the peakels in each quantitation channel.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\xic_display_proteinset.png

The overview is based by default on the abundances values.

Note: if you sort on the overview column, the sort is based on max (value-mean (values))/mean (values). So, you obtain the most homogenous and confident rows first. 

For each quantitation channel, are displayed:

- the raw abundance

- the peptide match count (by default)

- the abundance (by default)

- the selection level

By clicking on the “Column Display Button” D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\columnsvisibility11x11.png, you can choose the information you want to display or change the overview.

To display the identification protein Set view, click right on the selected protein Set and select “Display Identification Protein Sets” menu in the popup.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\xic_display_proteinset_ident.png

You can also display the identification summary result from the popup menu in the quantitation tree: D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\xic_display_popup_ident.png

Peptides

A graph allows you to see the variations of the abundance (or raw abundance) of a peptide in the different quantitation channels: D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\xic_display_peptide_graph.png

Features and Peakels

You can see the different features in the different quantitation channels and the graph of the peakels: D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\xic_features.png

By clicking on D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\chart-arrow.pngyou can display either:

- the peaks of isotope 0 in all quantitation channels

- all isotopes for the selected quantitation channel:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\xic_isotopes.png

By clicking on D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\wave.pngyou can see the chromatograms of the features and their first time scan and last time scan in mzScope. For more details see the mzScope section.

Display Peptides

By clicking on “Display Abundances“ / “Peptides”, you can see:

- identified and quantified Peptides

- non identified but quantified peptides

- identified but not quantified peptides (linked to a quantified protein)

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\xic_display_peptides.png

Display Peptides Ions

By clicking on “Display Abundances” / “Peptides Ions”, you can see:

- all identified and quantified Peptides Ions

- non identified but quantified peptides Ions

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\xic_display_peptideions.png

Display Experimental Design and Parameters

By clicking on “Exp. Design > Parameters”, you can see the experimental design and the parameters of the selected XIC.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\displayxicdesign.png

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\displayxicparam.png

If you have launched the refinement of the protein sets abundances on the XIC, you can also display the refinement parameters.

Display Map Alignment

By clicking on “Exp. Design > Map Alignment”, you can see the map of the variation of the alignment of the maps compared to the map alignment of the selected XIC. You can also calculate the predicted time in a map from an elution time in another map.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\displaymapalignment.png

Create a custom User Window

You can lay out your own user window with the desired views.

You can do it from an already displayed window, or by using the right click mouse popup on a dataset like in the following example (Use menu “Search Result>New User Window…” or “Identification Summary>New User Window…”)

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\newuserwindowpopup.png

In the example, the user has clicked on “Identification Summary>New User Window…” and selects the Peptides View as the first view of his window.

You can add other views by using the '+' button.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\newuserwindowpeptides.png

In this example, the user has added a Spectrum View and he saves his window by clicking on the “Disk” Button.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\newuserwindowpeptidesandspectrum.png

The user selects 'Peptides Spectrum' as his user window name

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\newuserwindowsave.png

Now, the user can use his new 'Peptides Spectrum' on a different Identification Summary.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\newuserwindowpopupwithsavedwindow.png

Frame Toolbars Functionalities

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\viewtoolbarsprolinev3.png

A: Display Decoy Data.

B: Search in the Table. (Using * and ? wild cards)

C: Filter data displayed in the Table

D: Export data displayed in the Table

E: Send to Data Analyzer to compare data from different views

F: Create a Graphic : histogram or scatter plot

G: Right click on the marker bar to display Line Numbers or add Annotations/Bookmarks

H: Expands the frame to its maximum (other frames are hidden). Click again to undo.

I: Gather the frame with the previous one as a tab.

J: Split the last tab as a frame underneath

K: Remove the last Tab or Frame

L: Open a dialog to let the user add a View (as a Frame, a Tab or a splitted Frame)

M: Save the window as a user window, to display the same window with different data later

N: Export view as an image

O: Generate Spectrum Matches

Filter Tables

You can filter data displayed in the different tables thanks to the filter button at the top right corner of a table.

When you have clicked on the filter button, a dialog is opened. In this dialog you can select the columns of the table you want to filter thanks to the “+” button.

In the following example, we have added two filters:

- one on the Protein Name column (available wildcards are * to replace multiple characters and ? to replace one character)

- one on the Score Column (Score must be at least 100 and there is no maximum specified).

The result is all the proteins starting with GLPK (correspond to GLPK*) and with a score greater or equal than 100.

Note: for String filters, you can use the following wildcards: * matches zero or more characters, ? matches one character.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\filterresultprolinev2.png

Search Tables

In some tables, a Search Functionality is available thanks to the search button at the top right corner.

When you have clicked on the search button, a floating panel is opened. In this panel you can select the column searched and fill in the searched expression, or the value range.

For searched expressions, two wild cards are available:

In the following example, the user search for a protein set whose name contains “PGK”.

You can do an incremental search by clicking again on the search button of the floating panel, or by pressing the Enter key.

Graphics

Create a Graphic

There are two ways to obtain a graphic from data:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\histogrambutton.png

If you have clicked on the '+' button, the Add View Dialog is opened and you must select the Graphic View

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\addaview.png

Graphic options

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\histogram.png

A: Display/Remove Grid toggle button

B: Modify color of the graphic

C: Lock/Unlock incoming data. If it is unlocked, the graphic is updated when the user apply a new filter to the previous view (for instance Peptide Score >= 50) If it is locked, changing filtering on the previous view does not modify the graphic.

D: Select Data in the graphic according to data selected in table in the previous view.

E: Select data in the table of the previous view according to data selected in the graphic.

F: Export graphic to image

G: Select the graphic type: Scatter Plot / Histogram

H/I: Select data used for X / Y axis.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\logaxis.png

It is possible to select linear or log axis by right clicking on an axis.

Zooming / Selection

Zoom in: Press the right mouse button and drag to the right bottom direction. A red box is displayed. Release the mouse button when you have selected the area to zoom in.

Zoom out: Press the right mouse button and drag to the left top direction. When you release the mouse button, the zooming is reset to view all.

Select: Press the left mouse button and drag the mouse to surround the data you want to select. When you release the button, the selection is done. Or left click on the data you want to select. It is possible to use Ctrl key to add to the previous selection.

Unselect: Left click on an empty area to clear the selection.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\graphicselection.png


Quality Control

Search Result QC

You can run a Quality Control on any Search Result. It consists in a transversal view of the imported data: rather than visualising the results per PSM or Proteins, results are sorted according to the score, charge state...

 

Choose the menu option:

Settings

Configure some settings before launching the process

 

QC results

The report will appear in a matter of seconds (depending of the amount of data to be processed). You will get the following tabs:

Assigned and unassigned spectra: Pie chart presenting the ratio of assigned spectra

Score repartition for Target/Decoy data: Histogram presenting the amount of PSM per group of score, separating target and decoy data

PSM per charge and score: Histogram presenting the amount of PSM per group of score and charge state

Experimental M/z per charge and score: Box plot presenting M/z information for each category of score and charge state

Number of matches per minute of RT and score: histogram presenting the amount of PSM per score and retention time. This view is only calculated when retention time is available.

Each graph is also available in a table view

 


Ms Files Tab

In order to facilitate different actions on Ms Files, Proline Studio contains an homonym tab providing the end user with a view over his local and a remote file system, called Local File System and Proline Server File System respectively. Furthermore, through an appropriate popup menu, a series of actions can take place on the encountered .mzdb and .raw files, including among others the:

Apart from the popup menu supported functionality, since Proline Studio 1.5, conversions and uploads can be triggered via drag and drop mechanism. In this context, an .mzdb file’s drag and drop triggers a file upload while dragging and dropping a .raw file from the local site to the remote one provokes a conversion proceeded by its respective upload.

mzDB File Upload

As mentioned earlier, after selecting a number of files, the user can either drag and drop them inside the remote site, or use the popup menu as shown in the following screenshot. It is important to precise that both approaches are not compatible with a selected group consisting of different file types.

As we can see, clicking on upload opens a dedicated dialog packing a series of uploading options:

Furthermore, the dialog permits us to change the synthesis of the selected group by adding or removing supplementary .mzdb files. While the last option is significant as it defines the final destination, the first two, the creation of the parent file and the deletion after the successful upload are considered to be an aid to keep the destination folder well organized. Clicking OK on the dialog will make it disappear dispatching at the same time an uploading batch, consisting of an uploading task for each .mzdb file. Uploading tasks, as all tasks that are dispatched within Proline Studio, can be monitored at the Logs tab.

upload_dialog.png


Raw File Conversion

In the same way, when the user desires to convert and upload a raw file, he or she, can either drag and drop it on the Remote Site view, or use the respective dialog through the popup menu.

convert_dialog.png

As we can see, this dialog follows a very similar format and logic compared to the one used by the upload dialog. Apart from the already seen settings that regard the upload itself, we can distinguish a few more, purely regarding the process of conversion. The two most important options are the selected converter and the output path. The latter one corresponds to the directory where the .mzdb file will be created when raw2mzDB.exe will finish its cycle. It must be noted that if a conversion has never taken place before, there is a very high probability that the converter’s option is blank, unless a default converter has been defined through the general settings dialog. Furthermore, this value among others, is saved or updated everytime the user clicks on OK, so that the latter one does not have to fill in all the fields when he or she wants to perform a conversion.

Dragging and dropping a .raw file at the remote view follows a little bit different approach. Since we cannot upload a raw file to the server, the drag and drop action basically correspond to a .raw file conversion followed by the respective upload. Given the fact that when dragging a file no dialog appears, the conversion is based on the default converter set in the general settings window. If such setting has not yet taken place, a popup dialog will appear demanding the selection of a converter to be used. The latter one also updates the aforementioned field in the general settings.

no_converter_dialog.png

TIC or BPI chromatogram

By default, the TIC chromatogram is displayed. You can click on “BPI” to see the best peak intensity graph.

By clicking in the graph, you can see below the scan at the selected time.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\mzscope.png

You can choose to display 2 or more chromatograms on the same graph, by selecting 2 files and clicking on “View Data

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\mzscopemultifile.pngYou can extract a chromatogram at a given mass by entering the specified value in the panel above.

Scan

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\mzscopescan.png

You can navigate through the scans

- by increasing or decreasing the scan Ids

- by entering a retention time

- by clicking the keys arrows on the keyboard (Ctrl+Arrows to keep the same ms level)

By double clicking on the scan, the corresponding chromatogram is displayed above (The Alt key or the check box “XIC overlay” allows you to overlay the chromatograms in the same graph).

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\mzscopechromato.png


Peakels

By selecting a file, you can click on “Detect Peakels” in the popup menu.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\mzscope_detectpeakels_menu.png

A dialog allows you to choose the parameters of the peakels detection: the tolerance and eventually a range of m/z, or a m/z value:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\detectpeakelsparam.png

The results are displayed in a table:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\mzscopepeakels.png

You can double-click (or through the popup menu) on a row to display the peakel in the corresponding raw file:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\mzscopepeakelview.png


Export Data / Image

There are many ways to do an export:

- Export a Table using the export button (supported formats: {xlsx, xls, csv})

- Export data using Copy/Paste from the selected rows of a Table to an application like Excel.

- Export all data corresponding to an Identification Summary, XIC or Spectral Count

- Export an image of a view

- Export Identification Summary data into Pride (ProteomeXchange) format.

- Export Identification Summary spectra list.

1. Export a Table

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\before_export.png

To export a table, click on the Export Button at the left top of a table.

An Export Dialog is opened, you can select the file path for the export and the format of the export (supported formats: {xlsx, xls, csv}).

In case that the selected format is either .xls or .xlsx, the user has now the ability to maintain in his exported excel document any rich text format elements (color, font weight etc.) apparent on the original table in Proline Studio. Choice is done using the checkbox shown on the following screenshot.

To perform the export, click on the Export Button. The task can take a few seconds if the table has a lot of rows and so a progress bar is displayed.

Note: the following feature regarding the export will be functional in the next release of Proline Studio.

2. Copy/Paste a Table

To copy/Paste a Table:

- Select rows you want to copy

- Press Ctrl and C keys at the same time

- Open your spreadsheet editor and press Ctrl and V keys at the same time to paste the copied rows.

3. Export an Identification Summary, a XIC or a Spectral Count

To Export all data of an Identification Summary, a XIC or a Spectral Count, right-click on the dataset to open the contextual menu and select the “Export” menu and then “Excel...” sub-menu.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\export_rsm_popup.png

You can also export multiple dataset simultaneously, if they have the same type (Identification Summary or XIC or Spectral Count).

An Export Dialog is opened, you can select the file path and the type of the export : Excel (.xlsx) or Tabulation separated values (.tsv).

You can export with the default parameters or perform a custom export. To enable custom export, click on the tickbox located on the right of the dialog:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\custom_export_a.png

Custom export allows a number of parameters in addition to the file format to be chosen:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\custom_export_b.png

You can enable/disable individual sheets, rename them, rename individual fields and move then up and down, disable them. You can also save your own configuration and load it later on, and even share it with colleagues. (configuration file stored locally).

Description of exported file is available here.

4. Export an Image

To export a graphics, click on the Export Image Button at the left top of the image.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\export_image_button.png

An Export Dialog is opened, you can select the file path for the export and the type of the export. You can export any images in PNG format or SVG format. SVG format produces a vector image that can be edited and resized afterwards.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\image_export_dialog.png

5. Export Identification Summary into Pride format

Note: Before exporting data to Pride format, all spectrum matches should have been generated. To do so, right click on the dataset and select “Generate Spectrum Matches”.

To export all data of an Identification Summary into Pride (ProteomeXchange compatible) format, you must right click on the dataset to open the contextual popup and select the “Export > Pride…” Menu.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\export_pride.png

Some additional information could be specified (some of them are required, those with a *) in the displayed dialogs.

a. Experimental Details: Specify Project and Experiment name and contact

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\exportpride_1.png

b. Protocol Description: Specify a name for the Protocol Description. Add at least one Step for the description by clicking on “Add Step” button.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\exportpride_protocol1.png

An ontology search dialog will be opened to help getting protocol step description

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\exportpride_protocol2.png

c. Sample Description: Specify a sample name and description. Species, Tissue and Cell Type is specified using Controlled Vocabulary. If desired data is not listed, you can click on the “other…” button to search through complete ontology.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\exportpride_3.png

6. Export Identification Summary spectra list

To export valid PSM Spectra from an Identification Summary or from a XIC Dataset. The exported tsv file is compatible with Peakview.

Note: all Spectrum Matches must be generated first.

Generate Spectrum Matches

When importing a Search Result in Proline, user can view PSM with their associated Spectrum but by default no annotation is defined. User can generate (and save) this information

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\howtogsm.png

How to validate a Search Result

See description of Validation Algorithm.

It is possible to validate identification Search Result or merged ones. In latest case, the filters and validation threshold can be propagated to child Search Results.

Starting Validation

 

To validate a Search Result:

- Select one or multiple Search Results to validate

- Right Click to display the popup

- Click on “Validate…” menu

Validation Dialog

In the Validation Dialog, fill the different Parameters (see Validation description):

- you can add multiple PSM Prefilter Parameters ( Rank, Length, Score, e-Value, Identity p-Value, Homology p-Value) by selecting them in the combobox and clicking on Add Button '+'

- you can ensure a FDR on PSM which will be reached according to the variable selected ( Score, e-Value, Identity p-Value, Homology p-Value,… )

- you can add a Protein Set Prefilter on Specific Peptides.

- you can ensure a FDR on protein Sets.

Note: FDR can be used only for Search Results with Decoy Data.

If you choose to propagate to child Search Result, specified prefilters will be used as defined. For the FDR Filter, it is the threshold found by the validation algorithm which will be used for childs, as a prefilter.

In the second tab, you can define rules for choosing the Typical Protein of a Protein Set by using a match string with wildcards ( * or ? ) on Protein Accession or Protein Description. (see Change Typical Protein of Protein Sets).

Validation Processing

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\validationprocessingprolinev2.png

Validating a Search Result can take some time. While it is not finished, the Search Results are shown greyed with an hourglass over them. The tasks are displayed as running in the “Tasks Log Dialog”.

Validation Done

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\validationdoneprolinev2.png

When the validation is finished, the icon becomes orange and blue. Orange part correspond to the Identification Summary. Blue is for the Search Result part.

How to filter Protein Sets

See description of Protein Sets Filtering.

:!:The protein sets windows are not updated after filtering Protein Set. You should close and reopen the window :!:

Starting filtering

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\filterprotset.png

To filter Protein sets of Identification Summaries:

- Select one or multiple Identification Summaries to filter

- Right Click to display the popup

- Click on “Filter ProteinSets…” menu

Filtering Dialog

you can add multiple filters (Specific Peptides, Peptide count, Peptide sequence count, Protein Set Score) by selecting them in the combobox and clicking on Add Button '+'

Once the filtering is done, you will have to open new protein sets window in order to see modification.

Change Typical Protein of Protein Sets

 The protein sets windows are not updated after changing Typical Protein. You should close and reopen the window :!:

Open the Dialog

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\changetypicalproteinpopupv2.png

To change the Typical Protein of the Protein Sets of an Identification Summary:

- Select one or multiple Identification Summaries

- Right Click to display the popup

- Click on “Change Typical Protein…” menu

Dialog Parameters

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\changetypicalproteindialogv2.png

You can set the choice for the Typical Protein of Protein Sets by using a match string with wildcards (* or ?) on Protein Accession or Protein Description.

For Advanced user, a fully regular expression could be specified. In this case, check the corresponding option.

Three rules could be specified. They are applied in priority order, i.e. if no protein of a protein set satisfy the first rule, the second one is tested and so on.

Processing

The modification of Typical Proteins can take some time. During the processing, Identification Summaries are displayed grayed with an hourglass and the tasks are displayed in the Tasks Log Window

Merge

Merge can be done on Search Results or on Identification Summaries. In both case, you can specify if aggregation mode or union mode should be used. See description for Search Results merging and Identification Summaries merging 

Merge on Search Results

To merge a dataset with multiple Search Results:

- Select the parent dataset

- Right Click to display the popup

- Click on “Merge” menu

When the merge is finished, the dataset is displayed with an M in the blue part of the icon, indicating that the Merge has been done at a Search Result level.

Merge on Identification Summaries

If you merge a dataset containing Identification Summaries. The merge is done on an Identification Summary level. Therefore the dataset is displayed with an M in the orange part of the icon.

Data Analyzer

The purpose of the Data Analyzer is to easily do calculations/comparisons on data.

To open the data analyzer, you have two possibilities:

- you can use the dedicated button that you can find in the toolbar of all views. If you use this button, the corresponding data is directly sent to the data analyzer.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\startdatamixer.png

- you can use the menu “Window > Data Analyzer”

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\menudataanalyzer.png

In the Data Analyzer view, you can access to all data views, to some functions and graphics. In the following example, we create a graph by adding by Drag & Drop the Spectral Count Data and the beta-binomial (BBinomial) function. Then we link them together.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\datamixerv2.png

You have to specify the parameters of the BBinomial Function: right click on the function and select the “settings” menu

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\datamixersettingsmenu.png

In the settings menu, select the two groups of columns on which you want to perform the BBinomial function. When the parameters are set, the calculation is started immediately and an hourglass icon is shown.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\datamixersettingsdialog.png

When the calculation is finished: the hourglass icon becomes a green tick, and the user can right click and select the “Display” menu to see the result.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\datamixerdisplaymenu.png

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\datamixerresult.png

Available Functions

Compute FDR Function

This function is used by ProStar Macro to compute the FDR.

More information: http://bioconductor.org/packages/release/bioc/vignettes/Prostar/inst/doc/Prostar_UserManual.pdf 

Join Function

Join data of two tables according to selected key.

Diff Function

Perform a difference between two joined table data according to a selected key. When a key value is not found in one of the data source table, the line is displayed as empty. For numerical values a difference is done and for string values, the '<>' symbol is displayed when values are different.

Adjustp Function / calibration Plot

Calibration Plot for Proteomics is described here: https://cran.r-project.org/web/packages/cp4p/index.html 

BBinomial Function

bbinomial function is useful for Spectral Count Quantitations

Diff analysis Function

This function is used by ProStar Macro. Two tests are available: Welch t-test and Limma t-test.

More information: http://bioconductor.org/packages/release/bioc/vignettes/Prostar/inst/doc/Prostar_UserManual.pdf 

Columns Filter Function

Columns filter, let the user remove unnecessary columns in a matrix. A combobox, with prefix and suffix of the columns, let you select multiple similar columns to filter then rapidly.

Rows Filter Function

Rows filter function let the user filter some rows of a matrix according to settings on columns.

Import CSV/TSV

This module lets you import data from a CSV or TSV file. Then you can do calculations and display these data directly in Proline Studio.

 

The separator is automatically selected according to the csv file. But you can modify it.

The preview zone display the first lines of the file as it will be loaded.

Expression Builder

The expression builder let you create an expression with built-in functions or comparator and variables (columns from the linked matrix). In the example, we calculate the mean of a column in the matrix.S

Missing values filter Function

This function is used by ProStar Macro to remove rows with too many missing quantitative values.

The available missing values algorithm are:

Missing values imputation Function

This function is used by ProStar Macro to impute missing values.

More information: http://bioconductor.org/packages/release/bioc/vignettes/Prostar/inst/doc/Prostar_UserManual.pdf 

Normalize Function

This function is used by ProStar Macro to normalize quantitative values.

More information on algorithms: http://bioconductor.org/packages/release/bioc/vignettes/Prostar/inst/doc/Prostar_UserManual.pdf 

pvalue and ttd Functions

pvalue and ttd functions are useful for XIC Quantitations

Prostar Macro

1 or 2: Add XIC Data to Data Analyzer from the Protein Set View or by importing data from a csv file.

3: Add Prostar Macro by a drag and drop and link XIC Data to the Macro. And do the calculation the by clicking on the button Process Graph.

During the process, the Data Analyzer will ask you settings for each function.

4: Filter unnecessary columns from your data if. Settings can be validated with no parameters if you don't need it.

5: Filter is needed only if you want to remove contaminants. Settings can be validated with no parameters if you don't need it.

6: Log is needed to log abundances (Data from Proline). For Data coming from MaxQuant, data is already logged.

7 to 13: follow the settings asked ( you can find some help in Prostar documentation, or information in corresponding functions.)

During the process, results will be automatically displayed:

14: FDR Result

15: Calibration Plots

16: Result Table with differential Proteins Table and the corresponding scatter plot. You can select differential proteins in the table, to import them in the scatter plot and create a colored group with them.

If you want to look at other results, right click on a function and select “Display in New Window”

Prostar User Manual:

http://bioconductor.org/packages/release/bioc/vignettes/Prostar/inst/doc/Prostar_UserManual.pdf 

Prostar Tutorial :

http://bioconductor.org/packages/release/bioc/vignettes/Prostar/inst/doc/Prostar_Tutorial.pdf 

Calculator

Calculator lets you write python scripts to manipulate freely viewed data.

1) To open the calculator, click on the calculator icon (not available on all views for the moment)

On the left part of the calculator, you can access to all viewed data, double click to add a table or a column to the script.

2) Write your python script on the text area

3) Execute it by clicking on the green Arrow.

4) When the script has been executed, the results of the calculations (variables, new columns) are available in the “Results” tab. Double click on a new column to add it to the table. Or like in the example, directly add the column to the table programmatically.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\calculatorv2.png

Examples

Script to calculate a log column

#### Algorithm to calculate the logarithm of a column ####

# get the Table 3 which corresponds to table newSC Quanti Protein Set
t = Table.get(3)

# get the constant column 10 of the table t ( Specific SC column)
# mutable() is called to be able to modify data
specificSCCol = t[10].mutable()

# number of rows of the column
 nb = len(specificSCCol)

# loop on the data of the column
for i in range (0,nb):
 # calculate the log (NaN values for errors)
 v = specificSCCol[i]
 if v <= 0:
  specificSCCol[i] = float('NaN')
 else:
  specificSCCol[i] = math.log(v)

# set the column name which will be used to the user
specificSCCol.setColumnName("log(specificSC)")

# add the created column to the table t
t.addColumn(specificSCCol)

Script to perform a difference and a mean between two columns

#### Algorithm to perform a difference and a mean between two columns ####

t = Table.get(9)
colAbundance1 = t[3]
colAbundance2 = t[5]

# difference between two columns
colDiff = colAbundance1-colAbundance2

# set the name of the column
colDiff.setColumnName("diff")

# mean between two columns
colMean = (colAbundance1+colAbundance2)/2

# set the name of the column
colMean.setColumnName("mean")

# add columns to the table
t.addColumn(colDiff)
t.addColumn(colMean)

Script to perform a perform a pvalue and a ttd on a XIC quantitation table

#### Algorithm to perform a pvalue and a ttd on abundances column of a XIC quantitation ####
t = Table.get(1)

pvalueCol = Stats.pvalue( (t[2], t[3]), (t[4],t[5]) )
ttdCol = Stats.ttd( (t[2], t[3]), (t[4],t[5]) )

pvalueCol.setColumnName("pvalue")
ttdCol.setColumnName("ttd")

t.addColumn(pvalueCol)
t.addColumn(ttdCol)

Update Spectrum using Peaklist software

When importing search result, the software used for the peaklist creation has to be specified. This parameter is mandatory for the XIC quantitation as it is used to find scan number or RT in the spectrum title. Indeed, this information is then used to extract abundances in the raw files.

If an invalid software has been specified when importing, it is possible to change the peaklist software afterwards. This option is only valid for Identification DataSets.

Right click on the identification DataSet, and select “Update Spectrum using Peaklist software”

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\howtoupdatepeaklistsoftware.png

The following dialog will be displayed allowing user to select the peaklist software to used.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\updatespectrummatches.png

Spectral Count

See description of Compare Identification Summaries with Spectral Count.

Generate a Spectral Count

To obtain a spectral count, right click on a Dataset with merged Identification Summaries and select the “Compare with SC” menu in the popup. This Dataset is used as the reference Dataset and Protein Set list as well as specifics peptides are defined there.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\howtosc.png

In the Spectral Count window, fill the name and description of your Spectral Count and press Next.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\sc_step1.png

Then select the Identification Summaries on which you want to perform the Spectral Count and press Next.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\sc_step2.png

Finally select the DataSet where shared peptides spectral count weight should be calculated and press OK.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\sc_step3.png

A Spectral Count is created and added to the Quantitations Panel.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\sc_dataset.png

Display a Spectral Count

You can then display a Spectral Count, see Display a Spectral Count 

XIC Quantitation

For description on LC-MS Quantitation you can first read the principles in this page: Quantitation: principles 

Generate a XIC

There are two ways to generate a XIC, either by clicking on the “Extract Abundances” action or by selecting the “Clone & Extract Abundances” option, both being located on the respective pop-up menu as shown in the following screenshots. The difference among the two approaches lies on the fact that in  the second case, the new XIC is generated for an existing Experimental Design.

extract_abundances.png

clone_extract_abundances.png

Create Design

To create the Design of your XIC, drag and drop the desired identifications from the right panel to the left one. If you drop an identification on the XIC Node, Group and Sample parents' nodes will be automatically created. You can also directly drop on a Group or Sample Node. Regarding the name of the aforementioned parent nodes, the general rule that is followed dictates that if possible, a parent node name should be based on the children nodes’ earliest common ancestor, if one should exist. Furthermore, since Proline Studio 1.5, the user can further enhance his XiC Design by creating his own Samples and Groups which in turn can populate by dragging inside them elements from the identifications tree. Once more, going to the next step requires that no empty Group and Sample nodes exist within the design.

xic_step1_create_group_sample.png

Rename Design

Furthermore the user has the capacity to rename any node he wishes using a pop-up menu as shown in the screenshot below. Additionally, for Group, Sample or root nodes, renaming can be triggered either by pressing F2 while having the in-question node selected, or by consequently clicking three times on the aforementioned node. Using triple-click somes times can be a little confusing for the user as it can also lead to possible expansion/retraction in the case of a parent node. Finally, it is recommended that at least the XIC node is renamed.

xic_step1_rename.png

Link to Raw Files

In order to be able to perform any XiC design, all participating Sample Analysis nodes must be associated with a corresponding raw file. If this is not the case, corrective actions must be taken by the users in order to establish those associations. The association can be created at the second step of the XIC design using two different interface features. The first way to establish such a “connection” is by explicitly drag n drop an .mzdb file that is stored on the server file system. It must be noted that although it is possible to overwrite an existing association, the is no mechanism responsible for ensuring compatibility. So it is thus advised, either to avoid such action or to proceed with caution considering the encountered risks.

The second way to establish a missing connection is to use a newly introduced feature in Proline Studio, the Drop Zone. The latter one can be quite handy in cases where multiple associations are missing or when the plethora of uploaded .mzdb files intimidates the user from manually searching the files one by one. The feature itself is extremely easy to use as it just requires dragging a set of files or folders containing .mzdb files from the user part. As soon as a drop takes place, all missing connections will be automatically created as long as a matching .mzdb file has been dropped in the Drop zone. It must also be noted that since version 1.5, users have now at their disposal indices about the association source. Furthermore, in order to protect existing in database association from possible a possible corruption, the latter ones cannot be overwritten.

xic_drop_zone.png

XIC Parameters

When the XIC Design is finished, click on the next button and select the parameters. See Label-free LC-MS quantitation configuration to have more details about the different parameters.

The XIC parameters are not all displayed. You can display a complete set of parameters by clicking on the “Advances Parameters”.

Note: all the parameters are already set with default values.

XIC Results

Newly generated XIC designs are immediately added to the Quantitation Tree. Through the latter one, and via a popup menu, the end user has the capacity either to view a design’s properties as seen to the following screenshot, or to apply a series of actions on it, including among others:

xic_properties.png

Refine Protein Sets Abundances

Advanced Protein Sets abundances

Right click on the selected XIC node in the Quantitation tree, and select “Refine Proteins Sets Abundances…”

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\refinexicpopup.png

Configuration

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\xic_profilizer.png

In the dialog, you can:

- specify peptides to consider for quantitation

- configure parameters used for peptides quantitation

- configure parameters used for proteins quantitation

For more details, see Post-processing of LC-MS quantitative results 

Advanced XIC results

You can see the results by displaying the XIC (Display a XIC) or export them Export Data 


General Settings

Since version 1.4 Proline Studio includes a general settings dialog which can be accessed from the top menu bar clicking on “General Settings” as shown in the screenshot below.

access_general_settings.png

The latter one consists of a constantly but slowly growing number of user preferences regarding various aspects of the utilization of Proline Studio. Based on their context, for the time being, preferences are organized into the following four tabs:

JMS Settings

JMS Settings tab contains parameters that concern the exchange of messages between your local machine and the JMS Server. It should be made clear that compared to other preferences, preferences that are included in this tab should be treated with caution. Mistreating a communication preference can lead either to communication/connection problems or to users’ confusion to whether they are connected to the correct server version.

jms_settings.png

Service Request Queue Name

Parameter can be seen as a name which represents a server address. The parameter’s existence is justified by the fact that multiple server versions might run on the same server machine imposing the need to be able to canalize Studio’s messages appropriately.

Table Parameters

Table parameters’ tab encapsulates a short list of preferences regarding all tables generated throughout Proline Studio. More specifically those preferences control the arrangement of the participating columns as well as their respective width.

table_parameters.png

Columns Arrangement

This field dictates the spatial arrangement of table columns. Three arrangements are possible:

When “Automatic Columns Size” is used, all columns are width-wise readjusted in a way that they all fit to their container. Given that it is a “fit-to-screen” approach, it lacks scrollbars and does not guarantee the readability of the presented date, especially when the number of columns is high.

automatic.png

On the other hand a simpler approach that guarantees readability is to select “Fixed Column Size”. In this case all table columns have a fixed width, explicitly dictated by the user using the parameter “Column Width”.

fixed.png

The less clear option is “Smart Column Size” which serves as a trade off between the aforementioned ones. It tackles with the cases that either we have too many columns to visualize using the “Automatic” approach or too few and the selected default and globally applied width imposes unneeded scrollbars ending in hiding some columns at the same time. In this context, “Smart Column Size” can be seen as a simple rule based on the ratio between the mean column width needed in case of “Automatic Column Size” for a specific table and the globally selected width. For the sake of simplicity we have set a threshold of 0.7 or 70% which on its turn determines which one of the two first modes will be used given a table. If the ratio is smaller than 0.7 then the table in question will be presented in “Fixed Column Size” mode. On contraire if ratio is equal or greater than 0.7, then we consider that using “Automatic Column Size” mode is more appropriate as it balances between a possible slightly smaller than desired width and the possibility of hiding a column using scrollbars.

Column Width

The second preference is more or less self explanatory and corresponds to the globally desired columns’ width when fixed rune is applied, either directly or as a result of the smart mode.

General Application Settings

general_application_settings.png

In this tab we can find a diverse set of preferences regarding various tasks encountered in Proline Studio. For the time being those preferences are:

Default Search Name Source

Unlike the last three, Hide Getting Started Dialog is pretty much self explanatory. The second preference on the other hand, “Default Search Name Source” affects the way identification datasets are named on importation. For this preference we have three possible options:

Export Decorated

This parameter affects the .xls and .xlsx files that are produced in the process of export (client side). It could be easily described as a preservation of any existing Rich Text Feature in a table. (Colors, Font Weight etc.)

Use dataset type to create a XiC design by DnD

Conversion/Upload Settings

conversion_upload_settings.png

Converter (.exe)

Corresponds to the default raw2mzDB converter. While it is left at the discretion of the user, which version to choose, it must be noted that different versions do tend to work better with specific type raw files. Said that, it is also important to understand that in order for a conversion to be successful within Proline Studio, all system requirements set by the specific raw2mzDB version must be met.


Administration

Compared to the previous versions, 1.5 of Proline Studio has been enhanced with basic administration functionality. At this moment, this basic infrastructure permits an advanced user, who at the same time holds the status of Admin, to carry a limited range of tasks on accounts and registered in the system Peaklist softwares.

User Accounts

user_accounts.png

As we can see, regarding user accounts, the respective administration dialog offers us two possibilities, adding a new user account or modifying an existing one. At this time, both actions are quite simplistic, relying on minimalistic popup dialogs as seen in the following two screenshots.

create_new_user.png

modify_existing_user.png

As we can see, both actions share the same interface with the exception of the predefined and not editable login field in the case of user account modification. Finally, it is important to note that at Studio’s Installation, a default admin user is created permitting us to create the rest of the users.

Peaklist Softwares

peaklist_softwares.png

As seen in the preceding screenshot, the Peaklist Softwares tab follows a similar approach with the accounts one. Again, tab permits us to either add one or modify an existing element via popup dialogs triggered by the two respective buttons. In order to add a new Peaklist Software it suffices to click on the button with the homonym reading, action which will trigger the appropriate dialog.


new_peaklist_software.png

Similarly, as seen in the proceeding screenshot  clicking on the modifying button permits the end user to alter a small attributes subset of an existing Peaklist Software.

modify_existing_peaklist_software.png


Proline Web

Server Connection

Prerequisite: You must have an account to login to the server. Ask your administrator to create one if you don't have any. After the installation, the default account is “admin” with password “proline 

- Open your Google Chrome web browser and connect to the address of the server (ask your administrator)

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\dse_login_form.png

- Enter your username and password and click “OK”.

- To create a project, please follow the instructions detailed on this page.

Register Raw & MzDB Files

In order to create and run Quantitation analyses, you must register your MzDB files into Proline databases. To do so, click on the “Settings” button in the bottom bar of the Dataset Explorer application, and go to the “Raw File Registerer” tab.

dse_register_mzdb_files.PNG

 Note that this operation will override information if some of the files is already registered.

Close the Settings Window.


Create a Project

Open the DataSet Explorer app from the left panel (if closed, open it by clicking on the “apps” button in the upper-left corner of the page)

When the DataSet Explorer is opened, you can see your Project Tree on the left.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\dse_new_project.png

To create a New Project, just click on the New project button at the bottom of this panel or in the context menu of the user node.


Create a User

dse_admin_settings.pngYou must be logged in as an administrator. Click the gear-shaped button in the top task-bar (circled in blue on the right) to open the administrator settings menu.

On the first tab, “User Administration”, you can create a new user by setting up its name, its password, and define whether or not the user will have the “administration” permissions (including applications, users and service management).

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\dse_new_user.png

Submit the form to create the user. You can manage existing users from the “users” tab.

Please note that the Proline Web Desktop has its own database and its own users collection. However, if you configured the Proline Service running inside the Proline Web Desktop, it will synchronise the Proline Web Desktop and the Proline Core users each time a user logs in the Proline Web Desktop:


Import Result Files

The first thing to do in your brand new project is to import Result Files.

There are many ways to open the dedicated tab:

dse_import_result_files.png

A new tab will open and display the following panel.
Click on
Select result files to browse and select them. The left side of the browsing window is meant for the directories. When you click on one of them, its content is shown on the main panel. Choose one or multiple files then click Ok.

dse_import_result_files_add_files.png

Then configure the following parameters:

Master_protein_score * (1-subset threshold)

won't be imported.

Acetyl peptide N-term=H(-6) C(-7) O(-1)



You can now click on “Start Import” to launch the check and the import tasks.
D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\dse_project_tree_tasks.png

The server checks your files first, then the import itself is launched automatically.

You can follow the current state of your tasks by clicking on the Tasks button at the bottom of the Project Tree panel.

It opens the Tasks Panel, listing all the tasks that have been run in Proline.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\task_panel_import.png

When a task is done, you are notified by a small message in the top of the screen, and you can see its status in the tasks window:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\dse_import_finished.png

All the Result Files you have imported are listed in the Imported Search Results panel you can access by double-clicking the Imported Search Results node in your project tree.


 

  Imported Search Results

Double click on the Imported Search Results node to open the corresponding grid.

dse_imported_search_results.PNG

The grid itself allows to:

The grid toolbar allows to:

Update sample names in batch

From the Search Results grid, click on Edit sample names in batch to open the edition window.

In this window, you can copy and paste a grid from, let’s say, an Excel file. Each row corresponds to a result file. It must contain 2 columns: the sample name and either the peaklist file name/raw file identifier/result file name.

Let’s say you have the following grid:

Peaklist File

Condition

OVEMB150205_21.raw.mgf

UPS1 50 fmol

OVEMB150205_23.raw.mgf

UPS1 50 fmol

OVEMB150205_25.raw.mgf

UPS1 50 fmol

Copy it then paste it into the edition window as shown below. Notice that in this example, we are using the peaklist file name as a reference, so we must specify it in the selectable box on top of the window to update the column name.

You can also double click on a cell to edit its text directly in the edition window.

dse_sample_names_batch_editor.PNG

The result set table will be updated with the new sample names:

dse_sample_names_same_if_same_rfi.PNG

As says the warning, if many imported result files share the same raw file identifier, all of them will be impacted by this action.


Identification Tree and Dataset creation

Once your Result Files have been imported, you can use them to create a new Identification Dataset. To create it from result files, you can:

dse_new_dataset.png

Note that you can also create an empty dataset to further assemble complex structures using drag and drop in the project tree. However, this way is not favoured.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\dse_new_dataset_based_on_sresults.png

You should now see a window asking to choose a source of data for your new Dataset.
The only option available yet is
Result Set: it allows you to build a new dataset from the Result Files you have imported. Click OK to continue.

dse_new_dataset_creation_panel_empty.png

 

Create a simple dataset (single group)

Dataset properties

In the right panel, two text fields allow you to enter a name and description (optional) for your dataset.

By default, the result sets will be named in the tree using the result file name. You can change that with the Name child results using box. The possibilities are: peaklist file name, raw file identifier, result file name, sample, search title.

Files selection

To add one or many files to your selection, select them in the grid (you can use the Ctrl and the Shift keys to make a multiple selection), then click on Add to dataset (top-right of the panel). You can also double click on one file to quickly add it to the selection.

To remove any file from the selection, just select them and click on Remove selected Items.

Created dataset

The creation of your identification dataset happens as follows:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\dse_newly_created_dataset.png

Once your Identification Dataset has been created, you can see it on the tree, in the left side of the window. You may need to collapse and expand again the Identifications node to see it appear.
The white icon let you known that it is not yet validated (becomes green when validated).

Double click on the aggregation node to open the identification summary. This panel shows a list of your Identification fractions (corresponding to each imported file) and, after the validation process, it will display the Merged Result Summary infos.

Create a complex dataset (automatic grouping)

Dataset properties and files selection

Refer to the previous paragraph to know how to do these steps.

Add annotations to files

Once you have selected your result files, click on Add annotations. A window shows up, with as many empty lines as selected files.

dse_edit_annotations.png

Example 1: multiple biological groups

Let's say you compare 2 conditions, and you have the following result table (Excel file for instance). The idea is to copy/paste this table to the annotation editor.

Result file

Peaklist File

Condition

F078594.dat

OVEMB150205_21.raw.mgf

UPS1 50 fmol

F078596.dat

OVEMB150205_23.raw.mgf

UPS1 50 fmol

F078592.dat

OVEMB150205_25.raw.mgf

UPS1 50 fmol

F078590.dat

OVEMB150205_27.raw.mgf

UPS1 50 fmol

F078591.dat

OVEMB150205_12.raw.mgf

UPS1 5 fmol

F078595.dat

OVEMB150205_14.raw.mgf

UPS1 5 fmol

F078597.dat

OVEMB150205_16.raw.mgf

UPS1 5 fmol

F078593.dat

OVEMB150205_18.raw.mgf

UPS1 5 fmol

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\dse_edit_annotations_bio_groups.png

Click OK to register these annotations. The window closes and the files selection has been annotated as shown below (on the left). Click Create Dataset. The generated dataset (below, on the right) has 2 levels of aggregation: biological group (with the copy/pasted names) and top-level (chosen name for dataset).

dse_edit_annotations_bio_groups_created.png

 

Example 2: multiple biological groups and biological samples

Let's take a slightly more complex design, introducing biological replicates for each condition:

Result file

Peaklist File

Biological replicate

Condition

F078594.dat

OVEMB150205_21.raw.mgf

BRep 1

UPS1 50 fmol

F078596.dat

OVEMB150205_23.raw.mgf

BRep 1

UPS1 50 fmol

F078592.dat

OVEMB150205_25.raw.mgf

BRep 2

UPS1 50 fmol

F078590.dat

OVEMB150205_27.raw.mgf

BRep 2

UPS1 50 fmol

F078591.dat

OVEMB150205_12.raw.mgf

BRep 1

UPS1 5 fmol

F078595.dat

OVEMB150205_14.raw.mgf

BRep 1

UPS1 5 fmol

F078597.dat

OVEMB150205_16.raw.mgf

BRep 2

UPS1 5 fmol

F078593.dat

OVEMB150205_18.raw.mgf

BRep 2

UPS1 5 fmol

Note In most of the cases, the definitions go as following:
- biological group = biological condition
- biological sample = biological replicate
- sample analysis = technical replicate
However, these definitions are not strict and you can use and adapt the hierarchy to fit your needs.

dse_edit_annotations_bio_groups_bio_rep.png

 

Click OK to register the annotations and see them in the creation panel.
Then click
Create Dataset.


The generated dataset has 3 levels of aggregation:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\dse_edit_annotations_bio_groups_bio_rep_created.png


Validate a Search Result

To launch a validation on a dataset:

dse_validate_tree_button.png

The following form appears:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\dse_validate_tree_params.png

The validation form handles several settings, especially:

When ready, click on Validate to launch the validation task. You can see it in the tasks panel. When the validation finishes, a notification is displayed and the node icon is colored in green.

See more details about the validation process.


Create a Quantitation

You can either create a quantitation from scratch, meaning you enter all information; or clone an existing quantitation to pre-fill the form with previous parameters.

Clone an existing quantitation

dse_clone_quantitation.png

Right-click on a quantitation, then on Clone (new quantitation).

This will open the quantitation launcher, with pre-filled parameters.

The new quantitation name will be, by default, the same as the cloned quantitation, suffixed by “(copy)”. You can change it.

When cloning, a popup window will appear, offering to Use the previous peakels detections. This mean the software will not look for signal again, but use the peaks that were found in the first quantitation. This will dramatically reduce the computation time.

You can change this parameter afterwards in the LFQ parameters section of the form (see description below).

Create a new quantitation D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\dse_new_lfquant_rclick.png

There are 2 ways to open the Quantitation Creation form:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\dse_new_lfquant_table.png

 Title, Type and Method

dse_new_LFQ_info.png

The first tab of the creation panel, entitled Information, let you define a name and an optional description for your quantitation. Type and method are disabled since the label free base on abundance extraction is the only combination available yet.

Click on Next to go to the next step.


Experimental Design

The Experimental design tab is where you define your Groups and Samples.

The simplest way to create biological groups (e.g. representing biological conditions) is to drag the validated nodes from the project tree and drop them in the dedicated zone. Drag/drop the nodes one by one.

dse_new_LFQ_exp_desing_ddrop.png

In the example below, two groups are created this way:

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\dse_new_lfq_exp_desing_complete.png

For more complex experimental design, click on Add New Sample to add a sample (e.g. biological replicate) to your group. Enter the name of the sample, then drag/drop the corresponding node to the right panel to add it.

dse_new_LFQ_exp_desing_bioRep.png

In most of the cases, the definitions go as following:

However, these definitions are not strict and you can use and adapt the hierarchy to fit your needs. Tooltips are available to help you remind it (place your mouse over the terms to see them).

Once you have prepared all your groups and samples, click on the Next button.


Abundance Extraction parameters (LFQ parameters)

This tab let you set up you abundance extraction parameters.
Depending on your choices, some parameters will become available/unavailable (e.g. clustering parameters are visible only when extracting from MS/MS events).

dse_lfq_params.PNG

Tooltips are available to help you understand each parameter (keep he mouse on a parameter name to see it appear).
See
more details on these parameters.
See
more details on the quantitation process.

Tip: If you are creating a quantitation that you have already done in the past (i.e. using the same files), select the Use previous peakel detection to go much faster. It will skip the long step of peakel detection and use what was previously found instead.

Click on Next to go to the last tab.


Ratios

The purpose of this tab is to define the ratios between the groups your quantitation will rely on.

dse_new_LFQuant_ratios.PNG

You can define the ratios:

Click the “double arrow” icon to invert numerator and denominator in a ratio.

Launch the Analysis

When you're done, just press the Launch Quantitation button. You will be noticed when the task is finished


Delete Data

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\dse_move_ident_ds_to_trash_rclick.png

You can delete an Identification dataset by:

All deleted datasets are visible in the Trash node in the project tree.

 

dse_move_ident_ds_to_trash_from_table.PNG

Display Identification information

Double-click on an identification or an aggregation of identifications to open its summary and results panel.


Display Identification Summary

The Summary panel sums up the validation parameters and results.

Summary of an aggregation dataset

If you clicked on an Aggregate node, this panel shows the information of the Merged Result Summary and a grid listing all of the identifications of this aggregate.

dse_ident_summary_aggregation.png

Summary of a result set

If you clicked on a result set (a leaf in the tree, corresponding to a single file), then the Summary panel will display its validation parameters; including the parameters chosen when launching the validation and the ones that were automatically computed.

dse_ident_summary_result_set.png


Display proteins sets

In order to browse the protein set data of a validated identification, click on the validated identification node in the project tree, on the left side panel of the dataset explorer, and then open the Proteins tab.

dse_ident_proteins.png

IMPORTANT: Please note that the Protein Table only displays the validated proteins. Furthermore, they actually are the representative math of a protein set.

Note that the identified proteins can be marked as Favorite. See the dedicated documentation here.

Related entities

When clicking a protein, the other tables will be filled with related information:

 Filters

Each table of the Identification data viewer provides a set of Filters for numerical, text and boolean data. It is placed on the left of the grid. They are circled in red in the first screenshot. Click on the double arrow to fold/unfold these panels. Use them as follows:

dse_ident_protein_filter.PNG


Display Peptides Table

The Peptides tab of a validated Identification Dataset (or Merged Dataset) allows you to browse the peptides of the related Result Summary.

Note that the identified  peptides can be marked as Favorite. See the dedicated documentation here.

When clicking a peptides, the Peptide Matches and Protein Matches tables are filled with entities related to this peptide. Double-click on one of these entities to open the dedicated tab and focus on it.

dse_ident_peptide.png

Filters

Each table of the Result Summary data viewer provides a set of Filters for Numerical, Text and Boolean data, placed on the left of the grid.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\dse_ident_peptides_filters.png

 



Display MS Queries Table

The MS Queries table displays the MS Queries of the Result Summary and offers the same filters options as the others tables.

dse_ident_msqueries_tab.PNG

MS Query Click Actions


Display Quantitation information

Double-click on a quantitation node in the Project Tree in order to open its tabbed-panel.

Display Quantitation Summary

There are 3 tabs under the Summary: Experimental design, LFQ Params and Stats Params.

It also provides the “Post processing” tool you need to launch in order to compute statistics on your data (get ratios information). Note that the quantified proteins and peptides tables must be reloaded after the post processing (use the Refresh icon on the table top right).

Experimental design

This panel is a summary of your experimental design, listing your biological groups, their samples and their technical replicates.

dse_quant_summary.png

LFQ params

It has exactly the same layout as the quantitation launcher form, but is not editable. It allows you to see with which parameters the label-free quantitation was run.

Stats params

It has exactly the same layout as the parametrization window for the post processing, but is not editable. It allows you to see the parameters used for the last post processing of the quantitation.

This tab won’t be shown if no post processing has been run on this quantitation.


Proteins quantitation

dse_quant_proteins.png

The graphics are related to the files (quantitation channels) in which the protein has been -or not- quantified. Tooltips are available by keeping the mouse over a bar or a dot of the graphics.

The quantified proteins can be marked as Favorite. See the dedicated documentation here.

As for the identification, the tables Protein Set Matches and Master Quant. Peptide will be filled when a protein is clicked.

This grid is also provided a filter tool that you can use as follows:

dse_quant_results_proteins_filters.PNG

Double-clicking a Protein will open the Protein details panel:

Protein details panel

dse_quant_protein_details.PNG

On this panel, you get an insight of all quantified peptides for this protein (double click on the peptide in the 'Quantified Peptides' table or expand directly the peptide panel).

dse_quant_protein_details_peptide_expanded.png

You can click on the eye icon (circled in green) to access the Speclight viewer of this Peptide.

TODO: insert link to Speclight doc


Peptides quantitation

dse_quant_peptides.png

This is the list of all quantified peptides.

The quantified peptides can be marked as Favorite. See the dedicated documentation here.

Note that the peptide grid has got a filter tool too.


LC-MS Maps

This panel offers two tabs to give you information about the LC-Ms maps of the experiment.

Map alignments

IMPORTANT: it is crucial to visit this graphics to manually control the processed map alignments. A bad alignment (showing many curve trends, or too large time deltas) will lead to quantitation errors. If you have the feeling that the alignment needs improvement, re-run the quantitation, tuning the Alignment Computation part of the LFQ Params tab (see how to create a quantitation).

dse_quant_results_maps_alignment.png

On this chart, you can see all the delta time of each LC-MS map against the reference LC-MS map. The reference map is the one specified in the title of the chart. It corresponds to the x axis.

LC-MS maps 2D viewer

Click on one or many raw files to display their LC-Ms map features.

dse_quant_results_maps_2D.png


LC-MS Features

This panel lists all the features detected and processed by Proline.


Click a
master feature to see: - the corresponding XIC on the right-side panel - the corresponding features in each file in the bottom grid (first tab) - the corresponding master quant. peptide ion in the bottom gris (second tab).

You can double click an entry in the bottom grid to open a dedicated tab (feature/peptide ion details).

dse_quant_results_features.png


LC-MS Peaks

All the peaks are listed here, independently from the features they have been gathered to.


Quantitation Stats

Post Processing

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\dse_launch_post_processing.png

In order to see the statistics on the ratios, you have to launch the Post Processing (previously called Profile Analysis). Launch it from the Summary tab toolbar, or from the quantitation node (in project tree) context menu.

You can see the Post Processing configuration window below.
See
more details on the post-processing parameters.
Launch the post processing by clicking
Run. You can watch the task progression in the Tasks window. You will also be notified when the task is done. You can then watch the results in the Quantitation Stats tab.

D:\Proline_Data\Documentation\1.3\ProlineSuite_V1.3_MERGED_ODF\dse_post_processing.png

Quantitation Statistics

This tab offers many panel to help you visualize statistics about your quantitation.

Proteins volcano plot

The first panel is a volcano chart of your Proteins based on their -Log10 T-Test P-Value over Log2 Ratio.

dse_quant_results_protein_volcano.png

The Left grid shows a list of all the proteins, and if they are in the actual boundaries of the selection/exclusion tool.

Hover a dot to see information on the protein.
Click on it to open its
protein details panel.

The AC Filters let you select a color (by clicking on it) and enter an Accession Number (or a part of it) to highlight matching proteins on the chart). In this example, all proteins containing “HUMAN” in their accession number are colored in orange. By defaults, dots are light blue.

The Log2 Fold Change (of the ratios) and P-Value fields let you define new boundaries for the protein selection/exclusion. It is shown on the chart with black lines and gray points to show excluded proteins.

dse_quant_results_protein_volcano_filtered.PNG

Select a zone with the mouse cursor to zoom on it. Use the icon at the top-right of the volcano plot to export it at several picture formats.

Peptides volcano plot

The second tab offers the same volcano chart and options for peptides. Instead of accession number, you can color the dots according to their sequence or PTM definition.

dse_quant_results_peptide_volcano.png


Quantitation folders

In the tree, you can organize your quantitations into folders. To create a new folder, right click on the Quantitations node, then on Create folder. Enter a name for the folder. You can now drag and drop quantitations or other folders into the newly created one. You can also drag and drop quantitations/folders out of a folder.

Favorite proteins and peptides

Four kind of entities can be labeled as “favorite”: the identified proteins, the identified peptides, the quantified proteins and the quantified peptides.

The column entitled Favorite in each table corresponding to those entities allows to:

The favorite level

There a 3 levels of favorites:

Set/Unset a favorite

Update a single entity

dse_favorites.png

Click on the star icon of the protein or peptide you want to update. A contextual menu will open, with 3 options:

You can set as favorite any entity that is not yet marked so (blue and grey stars).

Unsetting an entity as favorite will apply this change only in the current dataset (if it was labeled as favorite in another dataset, the latter won’t be affected).

Update many entities at a time

dse_favorites_in_batch.png

You can also use the Favorites menu of the grid to set/unset all the grid items as favorites. You may want to filter the grid before.

In the example above, all proteins that were quantified in at least 7 files and have a protein set score superior to 150 will be set/unset as favorites in one click.

Sort and filter favorites

Since the favorite status of an entity is coded with an integer, you can sort the Favorite column (use the header menu) and/or filter the grid items on their “favorite status”. To do so, use the Button Data tab of the filter.

dse_favorites_filter.PNG


See a favorite’s references in the project

dse_favorites_see_references.png

When you marked an entity as favorite, you may want to know if it was also set as favorite in other datasets.

To do so, click on the star icon the select See references in project.

The panel below will open.

dse_favorites_references.PNG

In this panel, you will see every reference of the entity that has been marked as a favorite. You can click a green arrow to open the corresponding panel, focused on the entity.


Copy grids content

All grids are copyable, but not all the same way.

The grids with a “Save” icon on the top right can be copied entirely (the whole content) with this icon.

The others can be copied line by line (select several lines by maintaining Shift actioned while clicking on the lines, or select all lines with Ctrl+A) with the keys Ctrl+C.

dse_grid_save_icon.png


Export Data

The “Export” button of any identification or quantitation node is accessible by right-click on it, or in its summary tab when you double-click on it. All your exported data will appear under the “Exported Files” node of the project you chose during export (the dataset current project, by default).