## Results

In the Results step you can explore your results with dedicated tools and statistical tests. The default screen displays the results for the statistical analysis that is best adapted to your experimental design. But you can create and manage additional analyses.

# How to

## Discover the Results screen

### Analysis tabs

By default, you will see two tabs at the top of the screen:

- A tab for the current analysis.
- An **Analysis summary** tab, where you can create and manage additional analyses.

There will be an additional tab for each newly opened analysis. Clicking a tab will show the results of the corresponding analysis, organized into the following screen areas.

### Experimental design view

The experimental design view shows the *Experimental design matrix* used for your analysis. Its color coding is applied to all other elements on the screen. Note that this view is not available for a Gel analysis.

### Plot area

The plot area contains one or several plots. When several plots are available, you can click the corresponding tabs to switch between them. Available plots are:

- Expression profile
- Interaction plots (one for each factor in a 2-factor analysis)

### Display area

The default layout of the images in the *Display area* corresponds to the layout in the *Experimental design matrix*. The colors let you easily see what treatment conditions you are looking at. At the right of the *Display area*, you will find the standard *Toolbar* to manipulate the image views and set the visualization options.

### Results table

The **Results table** can be found at the bottom of the screen. It summarizes the results for your specific analysis.

Very importantly, the *Results table* allows you to:

#### Filter by values

Click the **Filter by values** icon to apply various selection criteria to your spots. You could, for instance, select all spots with an ANOVA probability < 0.05 and a maximum Fold change > 2.
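The same kind of combined criterion can be reproduced outside the software, for instance on an exported results table. The following is only a minimal sketch: the field names `anova_p` and `max_fold` and all values are hypothetical and do not reflect Melanie's actual export format.

```python
# Hypothetical spot records; field names and values are illustrative only.
spots = [
    {"spot_id": 94, "anova_p": 0.0004, "max_fold": 3.1},
    {"spot_id": 215, "anova_p": 0.0300, "max_fold": 1.4},
    {"spot_id": 425, "anova_p": 0.0120, "max_fold": 2.6},
    {"spot_id": 700, "anova_p": 0.2100, "max_fold": 4.0},
]


def filter_spots(records, max_p=0.05, min_fold=2.0):
    """Keep spots with ANOVA p below max_p AND fold change above min_fold."""
    return [r for r in records if r["anova_p"] < max_p and r["max_fold"] > min_fold]


selected = filter_spots(spots)
selected_ids = [r["spot_id"] for r in selected]
print(selected_ids)
```

Both conditions must hold at the same time, so a spot with a large fold change but a non-significant p-value is excluded.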

#### Annotate

Create and combine spot sets, or annotate spots.

#### Validate spots

When you systematically review spots (for instance spots sorted in a column, or belonging to a spot set) you can validate the spots. This means that you confirm that a spot is, or is not, of interest. A validation column has a tick box that can be in three states:

- – The spot is confirmed as being of interest.
- – The spot is confirmed as not being of interest.
- – The spot has not yet been reviewed and validated.

Spots can be validated at two levels:

- You can **validate spots for the current analysis**. Do this by using the **Analysis-ID** column in the results table. Note that although you can display Analysis-ID columns from other analyses in the current results table, you will only be able to edit the Analysis-ID column for the current analysis.
- You can **validate spots for the total experiment**. Do this by using the **Validation** column. Note that you can display the Analysis-ID columns from all analyses in one results table during final review.

Click the *Settings* icon in the *Results table* to show or hide validation columns.

## Create and manage analyses

The **Statistics** tab shows the list of your different analyses. By default, there will only be one analysis in the list. Each analysis has its own **Analysis-ID**, which can be found in the Analysis list, in the name of its tab, and as the validation column (Analysis-ID) of the corresponding *Results table*. You can show or hide an analysis by ticking or clearing its **Open** box. This opens or closes the corresponding tab.

Click **New** to create an additional analysis. The available analysis types are described in the sections below.

To delete an analysis, select it in the analysis list and click **Delete**.

Click **Properties** to view and edit the **Name** and **Comment** of a selected analysis.

## Perform a Gel analysis

This type of analysis lets you investigate protein expression changes within a set of gels, without taking treatments into consideration.

In the **Group table**, you can sort and filter spots based on descriptive statistics measures such as Mean, Standard Deviation (SD), Coefficient of Variation (CV) and Range ratio.
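To make these measures concrete, here is a small sketch that computes them for one spot across a set of gels. It rests on assumptions not stated in this guide: the SD is taken as the sample standard deviation (n - 1), the CV is expressed in percent, and the Range ratio is assumed to be the maximum divided by the minimum. The function name `describe` and the values are hypothetical.

```python
import statistics


def describe(values):
    """Descriptive statistics for one spot across a set of gels."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # sample SD (n - 1); an assumption
    cv = 100.0 * sd / mean                   # coefficient of variation in percent
    range_ratio = max(values) / min(values)  # assumed definition: max / min
    return {"mean": mean, "sd": sd, "cv": cv, "range_ratio": range_ratio}


# Made-up spot quantities in three gels.
stats = describe([2.0, 4.0, 6.0])
print(stats)
```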

You can click the **Spot table** icon in the *Group table* toolbar to display the *Spot table*.

## Perform a One-factor analysis

This type of analysis lets you find significant protein expression changes between different levels of a single factor. Use the ANOVA probability to evaluate if there is an effect of the factor on the expression of a particular protein spot.

In the **Anova table**, you can sort and filter spots, for instance based on the p-value for the ANOVA test – indicated **Anova (p)** – and the maximum **Fold** change.
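As an illustration of what lies behind these two columns, the sketch below computes a one-way ANOVA F statistic and a maximum fold change for a single spot. It is not Melanie's implementation: the fold change is assumed here to be the ratio of the highest to the lowest group mean, and the data are made up. Converting F into a p-value requires the F distribution (for example `scipy.stats.f.sf(F, df_between, df_within)`), which is left out to keep the example dependency-free.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the replicates around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def max_fold(groups):
    """Assumed definition: highest group mean divided by lowest group mean."""
    means = [mean(g) for g in groups]
    return max(means) / min(means)


# Made-up quantities of one spot in three factor levels, three replicates each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
f_stat, df_b, df_w = one_way_anova_f(groups)
fold = max_fold(groups)
print(f_stat, df_b, df_w, fold)
```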

You can click the **Tables** icon to display a number of related tables and views:

- **Spot table** – Displays a table with the different spot quantities for each spot in each image.
- **Expression ratio table** – Displays a table with the expression ratios for all treatments, relative to a selected reference treatment.
- **Values summary** – Displays the quantification value for the selected spot in all images, in the same arrangement as the *Experimental design matrix*.
- **Anova summary** – Displays a detailed output of the ANOVA results.
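The idea of an expression ratio relative to a reference treatment can be sketched as follows. This assumes the ratio is the mean quantity in a treatment divided by the mean quantity in the reference treatment; the guide does not state the exact formula, and the treatment names and values are hypothetical.

```python
from statistics import mean


def expression_ratios(treatments, reference):
    """Ratio of each treatment's mean quantity to the reference treatment's mean."""
    ref_mean = mean(treatments[reference])
    return {name: mean(values) / ref_mean for name, values in treatments.items()}


# Made-up quantities of one spot in two treatments.
treatments = {"Control": [2.0, 2.0, 2.0], "Treated": [5.0, 6.0, 7.0]}
ratios = expression_ratios(treatments, "Control")
print(ratios)
```

By construction, the reference treatment itself always has a ratio of 1.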

## Perform a Two-factor analysis

This type of analysis lets you find significant protein expression changes between different treatments of a two-factor analysis. Use the ANOVA probabilities (3 columns) to evaluate if there is an effect of one of the factors on the expression of a particular protein spot, or if there is a significant interaction between the two factors.

In the **Anova table**, you can sort and filter spots, for instance based on the different p-values for the ANOVA test – indicated **Interaction (p)**, **FactorA (p)**, **FactorB (p)** – and the maximum **Fold** change.
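The three p-values come from three separate F statistics. The sketch below shows how these are obtained for one spot in a balanced design (the same number of replicates in every cell). It is a textbook two-way ANOVA under that assumption, not a description of Melanie's internals; the function name and data are hypothetical, and the conversion of each F into a p-value is omitted.

```python
from statistics import mean


def two_way_anova_f(cells):
    """cells[i][j] lists the replicates for level i of factor A and level j
    of factor B. A balanced design is assumed."""
    a, b = len(cells), len(cells[0])
    n = len(cells[0][0])
    grand = mean(v for row in cells for cell in row for v in cell)
    mean_a = [mean(v for cell in row for v in cell) for row in cells]
    mean_b = [mean(v for row in cells for v in row[j]) for j in range(b)]
    cell_mean = [[mean(cell) for cell in row] for row in cells]
    ss_a = b * n * sum((m - grand) ** 2 for m in mean_a)
    ss_b = a * n * sum((m - grand) ** 2 for m in mean_b)
    ss_ab = n * sum(
        (cell_mean[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_err = sum(
        (v - cell_mean[i][j]) ** 2
        for i in range(a) for j in range(b) for v in cells[i][j]
    )
    ms_err = ss_err / (a * b * (n - 1))
    return {
        "FactorA": (ss_a / (a - 1)) / ms_err,
        "FactorB": (ss_b / (b - 1)) / ms_err,
        "Interaction": (ss_ab / ((a - 1) * (b - 1))) / ms_err,
    }


# Made-up 2 x 2 design with two replicates per cell.
cells = [[[1.0, 3.0], [2.0, 4.0]], [[3.0, 5.0], [8.0, 10.0]]]
f_values = two_way_anova_f(cells)
print(f_values)
```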

You can click the **Tables** icon to display a number of related tables and views:

- **Spot table** – Displays a table with the different spot quantities for each spot in each image.
- **Expression ratio table** – Displays a table with the expression ratios for all treatments, relative to a selected reference treatment.
- **Values summary** – Displays the quantification value for the selected spot in all images, in the same arrangement as the *Experimental design matrix*.
- **Anova summary** – Displays a detailed output of the ANOVA results.

## Perform a Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a data reduction technique. It is a way of capturing the variance in many variables in a smaller number of principal components, or dimensions, that explain most of the variance observed. It allows a simplified graphical representation of your multidimensional data.

You can use PCA to identify outliers in your data and check whether your gel images cluster according to your experimental design groupings. It is even possible to get rough estimations on which proteins are up- or down-regulated in the different images.
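For readers who want to see the principle in code, the following is a generic sketch of PCA using NumPy's singular value decomposition. It is not Melanie's implementation, and the data matrix (four images as rows, three spots as columns) is made up for illustration.

```python
import numpy as np


def pca(data):
    """PCA of an (observations x variables) array via SVD of the centered data."""
    centered = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u * s                     # coordinates of the observations on the PCs
    explained = s**2 / np.sum(s**2)    # fraction of variance per component
    return scores, vt, explained


# Illustrative data: 4 images (rows) x 3 spots (columns); values are made up.
data = np.array([
    [1.0, 2.0, 0.5],
    [1.2, 2.1, 0.4],
    [3.0, 0.5, 0.6],
    [3.1, 0.4, 0.5],
])
scores, components, explained = pca(data)
print(np.round(100 * explained, 1))
```

In this toy data set the first two images and the last two images differ mainly along a single direction, so the first component captures almost all of the variance and separates the two pairs.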

### References

Principal component analysis is an advanced statistical technique, whose comprehensive description is beyond the scope of this user guide. For more information, we recommend the following articles:

Dallas, G. Principal Component Analysis 4 Dummies: Eigenvectors, Eigenvalues and Dimension Reduction October 30, 2013 – Explanation of the principles of PCA, without a single equation.

Lebart, L., Morineau, A., Piron, M. “Section 1.2 – Analyse en Composantes Principales” Statistique exploratoire multidimensionnelle, Paris: Dunod, 1995. 32-66

PennState Eberly College of Science, STAT 505 – Applied Multivariate Statistical Analysis online course, Lesson 11: Principal Component Analysis (PCA)

### Variables and observations

PCA aims to find the principal components that best discriminate the variables based on the observations. When creating a new PCA analysis, you can choose which of the gel images or spots you want to consider as the observations:

- **Images as observations (Spots as variables)** – Use this analysis to find image outliers.
- **Spots as observations (Images as variables)** – Use this analysis to find spot outliers.

### PCA plot

The created PCA analysis appears in a new tab with the *PCA plot* at the top left. It always displays spots as tiny red squares and gel images as colored dots. To identify each spot or image, you can hover over the item to display a tooltip with the spot ID or image name, and additional information.

#### Display spots and/or images

If you prefer to show only images or only spots in the plot, click the **Settings** icon in the *PCA plot* toolbar and choose the desired option in **Display** (**Spots and Images**, **Spots**, **Images**).

#### Choose image colors

The colors of the image dots correspond to those of your main experimental design. You can change those colors by clicking the **Design** drop down arrow at the top of the *PCA plot*. Select **Choose design color** and pick the desired experimental design. The images in the *Display area* will be grouped according to this choice.

#### Principal component selection

By default, Melanie displays the plot for the two **principal components** (**PC**) that explain most of the variance in the data set. You can change a principal component by clicking the black drop down arrow next to its name and see the percent variance for which the component accounts. The proportion of the variation explained by each calculated principal component is also provided in the Principal component table. To display this table, click the **Settings** icon in the *PCA plot* toolbar and choose **Principal component table**.

#### Choice of spots used

PCA is initially performed on all spots in the data set. But you can construct the *PCA plot* on a subset of spots, such as the significantly differentially expressed proteins from your one- or two-factor analysis. Click the **Spots** drop down arrow at the top of the *PCA plot* and choose the desired option (**Use all spots**, **Use spot set**, **Use validated spots**).

#### Zoom in and out

You can zoom in and out of the plot using your mouse scroll wheel. Hold the scroll wheel and drag to translate the plot. You can center again on particular elements by clicking the **Settings** icon in the *PCA plot* toolbar and choosing one of the **Zoom** options (**Zoom on spots and images**, **Zoom on spots**, **Zoom on images**, **Zoom on ellipse**, **Zoom to view all**).

#### Select spots

Click a red square to select the corresponding spot on the images. To select several spots, hold the Ctrl or Shift key during the selection, or draw an area over several spots. Spot selection is synchronized across all analysis tabs and workflow steps. You can therefore switch to another analysis for more in-depth investigation of particular spots, or to the alignment step to fix possible alignment issues.

#### Standardization

When using raw data, PCA tends to give more emphasis to variables with high variances than to variables with very low variances. In the case of spots whose protein expression spans several orders of magnitude, this is usually not desired. You probably wish each protein spot to receive equal weight in the analysis. In Melanie, you can therefore standardize the variables before principal component analysis is carried out. For each variable, this is done by subtracting the variable's mean from each of its values and dividing the result by the variable's standard deviation.

By default, Melanie carries out standardization. But you can change this option by choosing **Settings** in the *PCA plot* toolbar and clicking **Use standardized values**.
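The effect of standardization can be illustrated with a short sketch. It assumes the sample standard deviation (n - 1) is used, which this guide does not specify; the function name and values are hypothetical.

```python
import statistics


def standardize(values):
    """Return z-scores: (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1); an assumption
    return [(v - mean) / sd for v in values]


# Two made-up spots with the same profile but very different abundance.
low_abundance = [1.0, 2.0, 3.0]
high_abundance = [1000.0, 2000.0, 3000.0]
z_low = standardize(low_abundance)
z_high = standardize(high_abundance)
print(z_low, z_high)
```

After standardization both spots have identical values, so the highly abundant spot no longer dominates the analysis simply because of its scale.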

### PCA table

The position of the observations on the *PCA plot* corresponds to their contribution to the construction of the principal components displayed. The further away an observation is from the origin of a principal component axis, the more the observation contributed to this particular component. The *PCA table* displays the contribution (**Contrib.**) of each observation to the two principal components displayed in the *PCA plot*.

The **Quality** parameter in that table measures whether the projection of the observation is well represented by the two principal components. It tells you how close the distance of the projection is to reality. Observations with very similar behavior (such as images with similar protein expression profiles) are close in space. However, when projected onto a two-dimensional subspace, observations that are actually far apart may appear close together. It is therefore important to look at the Quality to judge whether observations are effectively close. If Quality is high for both observations, the chance is great that they are indeed nearby and have a similar behavior. If one of the observations has a low Quality value, any interpretation becomes tentative.
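The sketch below shows one common way to compute such measures, following the conventions of the French school of multivariate analysis: the contribution is the observation's share of the squared coordinates on a component, and the quality is the squared cosine on the displayed components. Whether Melanie uses exactly these formulas is an assumption. The coordinates are made-up numbers chosen for easy arithmetic, not the output of a real PCA.

```python
def contributions(scores, pc):
    """Percent contribution of each observation to principal component `pc`."""
    total = sum(row[pc] ** 2 for row in scores)
    return [100.0 * row[pc] ** 2 / total for row in scores]


def quality(scores, displayed=(0, 1)):
    """Squared cosine of each observation on the displayed components."""
    result = []
    for row in scores:
        full = sum(c ** 2 for c in row)              # squared distance in full space
        shown = sum(row[k] ** 2 for k in displayed)  # squared distance in the plot
        result.append(shown / full)
    return result


# Hypothetical coordinates of 3 observations on 3 principal components.
scores = [
    [3.0, 4.0, 0.0],
    [-3.0, 0.0, 4.0],
    [0.0, -4.0, -4.0],
]
contrib_pc1 = contributions(scores, 0)
qual = quality(scores)
print(contrib_pc1, qual)
```

The first observation lies entirely in the displayed plane and has a quality of 1, whereas the second has most of its distance along the third component, so its position in the plot should be interpreted with caution.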

### Principal component table

The *Principal component table* summarizes, for each principal component (**PC**), the proportion of variation (**%Variance**) in the data set that it accounts for. It also gives the coefficients that describe the contribution of each variable to the component.

Normally, the number of principal components equals the number of variables being analyzed (images or spots, depending on the option chosen). But Melanie only calculates a limited number of components (together accounting for more than 99% of variance), since much of the variance is covered by the first principal components.
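The cut-off logic can be sketched as a cumulative sum over the %Variance values. Apart from the 82.7% figure, which is borrowed from the example later in this section, the values below are hypothetical, and the function is an illustration rather than Melanie's actual procedure.

```python
def components_needed(variance_percent, threshold=99.0):
    """Number of leading components whose cumulative %variance exceeds threshold."""
    cumulative = 0.0
    for count, pct in enumerate(variance_percent, start=1):
        cumulative += pct
        if cumulative > threshold:
            return count
    return len(variance_percent)


# %Variance per component, sorted in decreasing order (mostly made-up values).
pct = [82.7, 9.1, 5.0, 2.5, 0.4, 0.2, 0.1]
n_kept = components_needed(pct)
print(n_kept)
```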

### Interpretation of a PCA plot

The PCA plots in Melanie represent the observations and variables in the same biplot. However, the scales used in each of the two representations are not the same. You therefore cannot draw any conclusions from the proximity of spots and images in the plot. You should only consider relative direction and distance from the axis origin.

To clarify how this works, we can look at an example from a two-factor analysis. Bacteria grown on two different substrates (A and B) underwent two different treatments (Treatment 1 and Treatment 2). Three replicate samples were run for each of the four conditions. A two-way ANOVA was carried out to identify the significantly differentially expressed protein spots (called “significant spots” below), either for the Treatment factor, the Substrate factor or the Interaction between these two factors (p-values < 0.001).

We then create two PCA analyses:

#### Images as observations

The following plot is obtained when images are taken as observations and spots as variables. Only significant spots are considered in the analysis.

We can see that the images from the same condition cluster together, in line with the experimental design groupings. If this had not been the case, further investigation would be needed to explain any image outliers. Note that PCA is more likely to cluster images according to their groups when you only use significant spots. This makes sense, as PCA captures variation and significant spots show high variation.

We would also hope to see the expected groupings when considering all spots in the analysis, as an even stronger indication of the presence or absence of image outliers. When considering all spots in the analysis, this is the plot we get:

The images of the same group still cluster together, though with a little more dispersion. This is not surprising, as we not only consider the significant spots (in green) but also the spots with little variation (in red).

We see that the significant spots are furthest away from the origin of the plot, i.e. the (0,0) position where the two axes cross. The further away a spot is from the origin, the more likely it differentiates between the images.

Spots clustering together on the PCA plot have similar protein expression profiles. This can be seen in the illustration below. It shows the expression profiles for five distinct spot clusters. For instance, the spots 94, 425, 432, 335, 301, 215, 972 and 384 are all highly expressed in the group *Treatment 2 – B*, but have much lower abundance in the three other treatment groups.

We can take a closer look at spot 425 in that cluster. We can draw a “spot axis” through the spot and the origin (0,0). Images on the positive side of this axis, i.e. in the upper part of the plot, will have high abundance values for this spot. This is the case for the images of *Treatment 2 – B*. On the other hand, images on the negative side of this axis, i.e. in the lower part of the plot, will have low abundance values for this spot. This is what we see for the three other treatment groups, when looking at the spot’s protein expression profile.
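Geometrically, reading an image against a “spot axis” amounts to projecting the image's position onto the line through the origin and the spot. The sketch below illustrates this with a dot product. All coordinates are hypothetical, as is the image name `A_T1_Gel1`; they are not taken from the actual plot.

```python
import math


def projection_on_axis(point, axis_point):
    """Signed length of the projection of `point` on the axis through the
    origin and `axis_point` (positive on the side of `axis_point`)."""
    norm = math.hypot(axis_point[0], axis_point[1])
    return (point[0] * axis_point[0] + point[1] * axis_point[1]) / norm


# Hypothetical plot coordinates (PC1, PC2) for one spot and two images.
spot_425 = (0.0, 2.0)
images = {"B_T2_Gel1": (0.5, 3.0), "A_T1_Gel1": (-1.0, -2.0)}
proj = {name: projection_on_axis(xy, spot_425) for name, xy in images.items()}
print(proj)
```

A positive projection places the image on the same side of the origin as the spot, suggesting a high abundance of that spot in the image. A negative projection suggests a low abundance.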

It is important to understand that the position of an image in the *PCA plot* is determined by all spots considered in the analysis. But the closer an image is to a particular “spot axis”, the higher the influence of the spot on the position of the image. So spot 425 and the other spots in the same cluster highly contribute to the positions of the images B_T2_Gel1, B_T2_Gel2 and B_T2_Gel3. Similarly, spot 793, and the spots 367, 727, 469, 195, 641 and 398 highly contribute to the positions of the images A_T2_Gel1, A_T2_Gel2 and A_T2_Gel3. As shown in their expression profiles, these protein spots are more abundant in *Treatment 2 – A* than in the other groups.

#### Spots as observations

In a second analysis, shown below, we consider the significant spots as observations, and the images as variables. We can first note that although PC1 accounts for 82.7% of the variation, it does not account well for the differences between experimental groups. In fact, PC1 often correlates with protein abundance. Indeed, of the 161 significant spots considered in the analysis, the 20 most abundant protein spots (mean volume) are shown in green.

We can therefore look at the second and third principal components, as shown below. Again, when we draw an “image axis” through A_T2_Gel3 and the origin (0,0), we note that spots close to this axis are highly abundant in A_T2_Gel3 and the other two images in the same group, exactly as observed in the Images as observations analysis.

This PCA plot shows some spot outliers, such as spots 700, 284, 114, 410 and 411. Such outliers can either be very strongly differentially expressed proteins, or mismatched spots. When spots appear mismatched, you can jump back to the Alignment step to correct the alignment, before continuing further analysis.