easy16S

Usage

Load data

Users can load a phyloseq object directly when launching the app using the following syntax: easy16S::run_app(physeq = phyloseq.extended::food).

Alternatively, there are three ways to load data when the app is launched:

Use one of the demo datasets provided with the application.
Upload flat files to build a phyloseq object:
- a BIOM file (Standard format or FROGS format) [mandatory].
- a metadata table with variables (in columns) and samples (in rows). Ensure that sample names (1st column) are spelled exactly as in the BIOM file. The delimiter and format of columns can be specified.
- a phylogenetic tree in Newick format.
- a FASTA file with representative sequences.
Upload a phyloseq object as:
- an RDS file.
- an RData file containing a phyloseq object named data.

Additionally, an RDS object can be provided directly from a URL: https://shiny.migale.inrae.fr/app/easy16S/?rds=https://mywebsite.com/path/to/my/data.rds
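Because sample names in the metadata table must match those in the BIOM file exactly, it can help to compare the two sets of names before uploading. The following is a minimal base R sketch of such a check; the sample names and the `metadata` data frame are made up for illustration and are not part of easy16S.

```r
# Hypothetical sample names, as they would appear in the BIOM file
biom_samples <- c("S1", "S2", "S3", "S4")

# Hypothetical metadata table: sample names in the 1st column
metadata <- data.frame(
  sample = c("S1", "S2", "s3", "S4"),  # "s3" is misspelled (lower case)
  group  = c("A", "A", "B", "B")
)

# Names present on one side only point to spelling problems
missing_in_metadata <- setdiff(biom_samples, metadata$sample)
missing_in_biom     <- setdiff(metadata$sample, biom_samples)

print(missing_in_metadata)
print(missing_in_biom)
```

Both vectors should be empty when the two files agree.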

Preprocess data

Before doing any analysis, it is customary to preprocess the data to refine and clean the raw data. The following operations are available and can be applied iteratively to achieve rich selections:

Select samples based on their name.
Filter samples based on the sample variables available in the metadata table.
Prune samples whose sum (depth) does not satisfy a given threshold.
Aggregate taxa at a specified taxonomic rank (e.g. Genus, Family).
Spread taxonomy to remove unknown and multi-affiliations by spreading the last known rank to further ranks (e.g. “Bacillus;multi-affiliation” would become “Bacillus; unknown Bacillus species”).
Rarefy samples (resample the abundance table so that all samples have the same depth, set to the minimum depth among samples).
Transform the abundances in the abundance table, using one of the following: prop (change abundances to proportions / relative abundances), sqrt (square root), sqrt_prop (square root of relative abundances), clr (centered log-ratio, after adding a pseudo-count of 1).

Once the desired operations are selected, users can seamlessly switch between the raw and preprocessed data to assess the impact of the applied transformations.
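To make the transformations listed above concrete, here is a minimal base R sketch of prop, sqrt_prop and clr on a toy table. It assumes taxa in rows and samples in columns; the function names are hypothetical and this is not the code used by the app.

```r
# Toy abundance table: taxa in rows, samples in columns (assumed orientation)
counts <- matrix(
  c(10,  0, 30,
     5,  5, 10,
     0, 20, 60),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxon1", "taxon2", "taxon3"), c("S1", "S2", "S3"))
)

# prop: divide each sample (column) by its depth
to_prop <- function(x) sweep(x, 2, colSums(x), "/")

# sqrt_prop: square root of the relative abundances
to_sqrt_prop <- function(x) sqrt(to_prop(x))

# clr: log of (count + pseudo-count), centered on the per-sample mean of the logs
to_clr <- function(x, pseudo_count = 1) {
  log_x <- log(x + pseudo_count)
  sweep(log_x, 2, colMeans(log_x), "-")
}

prop      <- to_prop(counts)
sqrt_prop <- to_sqrt_prop(counts)
clr       <- to_clr(counts)
```

After prop, each sample sums to 1; after clr, each sample sums to 0, which is a quick way to check the orientation of the table.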

A few words about rarefaction

For many analyses (notably all those based on presence / absence data and more generally diversity analyses), it is recommended to normalize the samples by rarefying to account for variations in sequencing effort and ensure that the detection probability is comparable across samples. Rarefying involves subsampling each sample to the same depth, ensuring a more equitable comparison of microbial diversity across samples. It is, however, not advised for differential abundance analyses, as discarding reads decreases statistical power.
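As an illustration of what rarefying does, the sketch below subsamples each sample without replacement down to the minimum depth, using base R only. The toy table, the taxa-in-rows orientation and the helper `rarefy_sample()` are assumptions made for the example, not the app's implementation.

```r
set.seed(42)  # rarefying is random; fix the seed for reproducibility

# Toy abundance table: taxa in rows, samples in columns (assumed orientation)
counts <- matrix(
  c(50, 10,  5,
    30, 20,  0,
    20, 70, 15),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxon1", "taxon2", "taxon3"), c("S1", "S2", "S3"))
)

rarefy_sample <- function(x, depth) {
  # One entry per read, labelled by the index of its taxon
  reads <- rep(seq_along(x), times = x)
  # Draw `depth` reads without replacement and count them per taxon
  kept <- reads[sample.int(length(reads), depth)]
  tabulate(kept, nbins = length(x))
}

depth <- min(colSums(counts))  # minimum depth among samples
rarefied <- apply(counts, 2, rarefy_sample, depth = depth)
rownames(rarefied) <- rownames(counts)

print(colSums(rarefied))
```

The shallowest sample is left unchanged, while reads are discarded from the deeper ones, which is the loss of information mentioned above.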

Explore and Analyse Data

Tables

Users can visualize and explore key tables constituting the phyloseq object under study:

OTU/ASV Table: Abundance of each OTU/ASV in all samples.
Taxonomy Table: Taxonomic affiliation of each OTU/ASV at different taxonomic ranks (e.g. Phylum to Species).
Agglomerate OTU/ASV Table: Same as OTU/ASV Table but after merging all OTUs/ASVs sharing the same taxonomic affiliation up to a user-specified rank.
Sample Data Table: Metadata associated with each sample, as provided by the user during the import process (metadata table).

For a deeper understanding of how phyloseq objects function, refer to the phyloseq documentation on data import.
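The agglomerated table can be thought of as the abundance table summed over OTUs/ASVs that share an affiliation. A minimal base R sketch, assuming a single hypothetical `genus` label per OTU (real agglomeration takes all ranks up to the chosen one into account):

```r
# Toy abundance table: OTUs in rows, samples in columns (assumed orientation)
counts <- matrix(
  c(10,  0,
     5,  5,
     0, 20,
     1,  2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:4), c("S1", "S2"))
)

# Hypothetical genus-level affiliation of each OTU
genus <- c(OTU1 = "Bacillus", OTU2 = "Bacillus",
           OTU3 = "Lactobacillus", OTU4 = "Bacillus")

# Sum the rows that share the same genus
agglomerated <- rowsum(counts, group = genus[rownames(counts)])

print(agglomerated)
```

Sample depths are unchanged by this operation; only the number of rows decreases.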

Metadata

This section provides access to the sample data table for use with the esquisse addin. It is useful to explore and assess associations between sample variables (but not metabarcoding data).

This addin allows you to interactively explore your data by visualizing it with the ggplot2 package. It allows you to draw bar plots, curves, scatter plots, histograms, boxplots and sf objects, then export the graph or retrieve the code to reproduce the graph.

Barplot

Used to create composition graphs (stacked barplots of relative abundances), based on the phyloseq.extended::plot_composition() function. This feature provides users with the option to:

Specify the taxonomic rank used for aggregation and coloring.
Filter and display results for a specific taxon.
Group samples based on metadata.

Composition barplots show the relative abundance of all or part of the sample diversity.

Rarefaction

Used to create rarefaction curves, based on the phyloseq.extended::ggrare() function. These settings provide users with the option to:

Color, annotate and group samples based on metadata.
Display a minimum sample threshold.

Rarefaction curves are used to evaluate the relationship between richness and sampling effort (number of reads, or sequencing depth) in each sample. This curve shows the expected number of OTUs/ASVs observed in each sample as a function of the sequencing depth. Rarefaction curves generally grow rapidly at first, as the most common OTUs/ASVs are found, then plateau as diversity saturates and only the rarest ones remain to be observed.
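The expected richness at a given depth can be computed exactly from the counts of a sample with the classical hypergeometric formula, E[S_n] = sum_i (1 - choose(N - N_i, n) / choose(N, n)), where N_i is the count of taxon i and N the sample depth. The sketch below applies it in base R to a made-up sample; it illustrates the principle and is not taken from ggrare().

```r
# Expected number of taxa observed in a random subsample of n reads
expected_richness <- function(x, n) {
  x <- x[x > 0]
  N <- sum(x)
  # Probability that each taxon is absent from the subsample (log scale for stability)
  p_absent <- exp(lchoose(N - x, n) - lchoose(N, n))
  sum(1 - p_absent)
}

# Hypothetical sample: two dominant taxa and three rarer ones (100 reads)
x <- c(50, 30, 15, 4, 1)
depths <- c(1, 5, 10, 25, 50, 100)
curve <- sapply(depths, function(n) expected_richness(x, n))

print(data.frame(depth = depths, expected_richness = curve))
```

The curve starts at 1 (a single read always yields one taxon) and reaches the observed richness only at full depth, because the rarest taxon is represented by a single read.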

Heatmap

Used to create ecologically organized heatmaps, based on the phyloseq::plot_heatmap() function. These settings provide users with the option to:

Select only the n most abundant taxa for display.
Agglomerate taxa at a user-specified taxonomic rank.
Group, annotate and order samples based on metadata.
Display the affiliation of each OTU/ASV at a user-specified taxonomic rank.

Heatmaps can be used to investigate the structuring of sample communities, ordered using a “NMDS” ordination (samples ordered by increasing angle between the x-axis and their projection). They can also be used to observe core and condition-specific microbiota.

\(\alpha\)-Diversity

\(\alpha\)-diversity measures the diversity (richness and, depending on the metric, evenness) within a sample. Detailed information on this concept and the different metrics available in easy16S can be found in the alpha diversity section of the phyloseq documentation.

Table

Compute the main alpha diversity estimators using the phyloseq::estimate_richness() function. If a sample data table is available, it is included in the table for further analyses (e.g. ANOVA, regression, etc.).
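For reference, a few of the usual estimators can be written in a handful of base R lines. The sketch below uses a made-up sample and the standard definitions (Shannon with natural logarithm, Simpson as 1 minus the sum of squared proportions); it is meant to illustrate the metrics, not to reproduce the output of estimate_richness().

```r
# Hypothetical counts for one sample
counts <- c(taxon1 = 40, taxon2 = 30, taxon3 = 20, taxon4 = 10, taxon5 = 0)

# Relative abundances of the taxa that are present
p <- counts[counts > 0] / sum(counts)

observed    <- sum(counts > 0)    # richness: number of taxa present
shannon     <- -sum(p * log(p))   # Shannon index
simpson     <- 1 - sum(p^2)       # Simpson index
inv_simpson <- 1 / sum(p^2)       # inverse Simpson index

print(c(Observed = observed, Shannon = shannon,
        Simpson = simpson, InvSimpson = inv_simpson))
```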

Plot

Visualize the previously calculated metrics with the phyloseq::plot_richness() function. Users can customize the arrangement of samples along the x-axis (X), color and shape of samples based on metadata. Additionally, diversity data can be displayed as boxplots instead of points.

ANOVA

This section performs ANOVA on the diversity table enriched with the metadata to assess the impact of a covariate of interest on the alpha-diversity. For categorical variables, a post-hoc pairwise comparison table is also provided to identify levels of the variable with significantly different diversities.
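The same kind of analysis can be reproduced in R on an exported diversity table. The sketch below uses simulated Shannon values and Tukey's HSD as one common choice of post-hoc test; the data are invented and the post-hoc method used by the app may differ.

```r
set.seed(1)

# Simulated diversity table: 3 groups of 5 samples with different mean diversity
alpha <- data.frame(
  shannon = c(rnorm(5, mean = 2.0, sd = 0.2),
              rnorm(5, mean = 2.5, sd = 0.2),
              rnorm(5, mean = 3.5, sd = 0.2)),
  group = factor(rep(c("A", "B", "C"), each = 5))
)

# One-way ANOVA: does mean diversity differ between groups?
fit <- aov(shannon ~ group, data = alpha)
print(summary(fit))

# Post-hoc pairwise comparisons between the levels of `group`
posthoc <- TukeyHSD(fit)
print(posthoc$group)
```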

\(\beta\)-diversity

\(\beta\)-diversity measures the dissimilarity between samples, capturing differences in community composition. The selection of a distance metric is crucial, and detailed information is available in the phyloseq documentation or on the GUSTA ME website. These distances can be compositional or qualitative, phylogenetic or not, and the choice depends on the features of interest.

Different distances capture different features of the samples. There is no “one size fits all.” However, choosing an appropriate measure is essential as it will strongly affect how your data is treated during analysis and what kind of interpretations are meaningful.

Table

Compute distances between each pair of samples using the phyloseq::distance() function and the chosen distance metric.
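As an example of what such a distance looks like, the sketch below computes the Bray-Curtis dissimilarity, sum(|x - y|) / sum(x + y), between all pairs of samples of a toy table in base R. The table and the taxa-in-rows orientation are assumptions for the example.

```r
# Toy abundance table: taxa in rows, samples in columns (assumed orientation)
counts <- matrix(
  c(10,  0, 30,
     5,  5, 10,
     0, 20, 60),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxon1", "taxon2", "taxon3"), c("S1", "S2", "S3"))
)

# Bray-Curtis dissimilarity between two abundance vectors
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Fill the full pairwise matrix, then convert it to a `dist` object
samples <- colnames(counts)
d <- matrix(0, length(samples), length(samples),
            dimnames = list(samples, samples))
for (i in samples) {
  for (j in samples) {
    d[i, j] <- bray_curtis(counts[, i], counts[, j])
  }
}
d <- as.dist(d)

print(d)
```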

Samples heatmap

Plot matrix of pairwise distances using the phyloseq.extended::plot_dist_as_heatmap() function. Users can customize sample order based on metadata to highlight patterns (e.g. lower within-group than between-group distances).

Samples clustering

Use the distance matrix and a user-specified linkage method (e.g. Ward, complete, average, etc.) to compute and plot a hierarchical clustering tree of the samples with the phyloseq.extended::plot_clust() function. Users can color leaves of the tree (i.e. samples) according to a categorical metadata variable to identify the variables along which the samples separate.
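The underlying computation can be sketched with stats::hclust(). The example below uses simulated counts and a Euclidean distance as a stand-in for the chosen metric; "ward.D2" is the base R name of Ward's criterion, and how the app maps its "Ward" option onto hclust() methods is not stated here.

```r
set.seed(7)

# Simulated data: 6 samples (rows) and 4 taxa, with two contrasted groups
abund <- rbind(
  matrix(rpois(12, lambda = 5),  nrow = 3),  # low-abundance group
  matrix(rpois(12, lambda = 50), nrow = 3)   # high-abundance group
)
rownames(abund) <- paste0("S", 1:6)

d <- dist(abund)                       # Euclidean distance between samples
tree <- hclust(d, method = "ward.D2")  # hierarchical clustering, Ward linkage
clusters <- cutree(tree, k = 2)        # cut the tree into 2 clusters

print(clusters)
# plot(tree) would draw the corresponding dendrogram
```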

MultiDimensional Scaling

Use the distance matrix to ordinate the samples (i.e. project them while preserving at best their pairwise distances) in a low-dimensional space with the phyloseq::ordinate() function, and visualize this ordination with the phyloseq::plot_ordination() function. In addition to selecting the ordination method (MDS/PCoA, NMDS, etc.), users can customize color, shape and labels of samples based on metadata. Additionally, ellipses can be added to group samples in the same category of a variable (e.g. healthy versus diseased individuals). By default, the ordination represents the principal plane (axes 1 and 2) of the projection but further axes can be used for plotting.

These graphs serve as powerful tools for exploring and interpreting the factors that structure microbial communities.

For more examples and details, refer to ordination plots on phyloseq documentation or GUSTA ME.
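For readers who want to see the mechanics of MDS/PCoA, the sketch below runs classical scaling with stats::cmdscale() on simulated data, using a Euclidean distance as a stand-in for the chosen metric. It illustrates the method only and is not how the app calls phyloseq.

```r
set.seed(3)

# Simulated abundance table: 8 samples (rows) and 5 taxa (columns)
abund <- matrix(rpois(40, lambda = 20), nrow = 8,
                dimnames = list(paste0("S", 1:8), paste0("taxon", 1:5)))

d <- dist(abund)                        # pairwise distances between samples
pcoa <- cmdscale(d, k = 2, eig = TRUE)  # classical MDS on 2 axes

coords <- pcoa$points                   # sample coordinates on axes 1 and 2
# Share of the (positive) eigenvalues carried by each of the two axes
explained <- pcoa$eig[1:2] / sum(pcoa$eig[pcoa$eig > 0])

print(round(explained, 3))
```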

Multivariate ANOVA

Use Permutational Multivariate ANOVA to assess the impact of one or several covariates on community structure with vegan::adonis2(by = 'terms', perm = 9999). The test compares the structure given by sample data with 9999 randomly generated structures. Permutational Multivariate ANOVA (also called non-parametric multivariate ANOVA or npmanova) accommodates complex designs, but it tests only location effects (e.g. are the typical communities similar in groups A and B?) and assumes equal dispersions (i.e. same biological variability in both groups).

Users can specify up to 3 covariates and their potential interactions to be included in the model.
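The principle of the permutation test can be illustrated for a single categorical covariate with a few lines of base R: compute a pseudo-F statistic from the distance matrix, then recompute it after shuffling the group labels. This is a simplified sketch on simulated data with a hypothetical `pseudo_f()` helper and 999 permutations to keep it fast; it is not vegan's implementation and does not handle several terms or interactions.

```r
# Pseudo-F statistic for one grouping factor, computed from a distance matrix
pseudo_f <- function(d, group) {
  d2 <- as.matrix(d)^2
  n <- nrow(d2)
  a <- length(unique(group))
  # Total sum of squares: squared distances over all pairs, divided by n
  ss_total <- sum(d2[upper.tri(d2)]) / n
  # Within-group sum of squares: same quantity computed inside each group
  ss_within <- sum(vapply(unique(group), function(g) {
    idx <- which(group == g)
    sub <- d2[idx, idx, drop = FALSE]
    sum(sub[upper.tri(sub)]) / length(idx)
  }, numeric(1)))
  ss_between <- ss_total - ss_within
  (ss_between / (a - 1)) / (ss_within / (n - a))
}

set.seed(11)
# Simulated data: 10 samples (rows), 4 taxa, two clearly different groups
abund <- rbind(matrix(rpois(20, lambda = 5),  nrow = 5),
               matrix(rpois(20, lambda = 30), nrow = 5))
group <- rep(c("A", "B"), each = 5)
d <- dist(abund)

f_obs <- pseudo_f(d, group)
n_perm <- 999
# Null distribution: pseudo-F after randomly shuffling the group labels
f_perm <- replicate(n_perm, pseudo_f(d, sample(group)))
p_value <- (sum(f_perm >= f_obs) + 1) / (n_perm + 1)

print(c(pseudo_F = f_obs, p_value = p_value))
```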

PCA

Perform PCA using stats::prcomp() on the abundance matrix. While MultiDimensional Scaling (MDS) is often recommended for microbiome analysis, Principal Component Analysis (PCA) after appropriate data transformation can be an alternative. The transformed abundances can be centered and/or scaled during the analysis. Users can customize color, shape and labels of samples based on metadata, add ellipses to group samples from the same category, and select the axes of the projection like in Multidimensional Scaling. Loadings (OTU/ASV) of the principal axes can also be incorporated to understand the individual contributions of taxa to each axis.
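A minimal sketch of this workflow in base R, assuming simulated counts with samples in rows and a clr transformation (pseudo-count of 1) applied beforehand; the choice of centering without scaling is only one of the options mentioned above.

```r
set.seed(5)

# Simulated abundance table: 10 samples (rows) and 6 taxa (columns)
counts <- matrix(rpois(60, lambda = 20), nrow = 10,
                 dimnames = list(paste0("S", 1:10), paste0("taxon", 1:6)))

# clr transformation per sample: log(count + 1) minus the sample mean of the logs
log_counts <- log(counts + 1)
clr <- log_counts - rowMeans(log_counts)

# PCA on the transformed abundances, centered but not scaled
pca <- prcomp(clr, center = TRUE, scale. = FALSE)

scores    <- pca$x[, 1:2]         # sample coordinates on axes 1 and 2
loadings  <- pca$rotation[, 1:2]  # contribution of each taxon to axes 1 and 2
explained <- pca$sdev^2 / sum(pca$sdev^2)

print(round(explained, 3))
```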

Differential abundance

This section is dedicated to the identification of over- or under-abundant OTUs/ASVs based on an experimental variable (categorical or numeric). The main tool for this analysis is the DESeq2 package (with sfType = "poscounts" used by default to ignore null values when computing scale factors), utilized through the phyloseq::phyloseq_to_deseq2() function (refer to the accompanying vignette).

However, note that while DESeq2 was developed for transcriptomics data using negative binomial models, amplicon metagenomics data are typically very sparse, and how well these models handle such sparsity, even with sfType = "poscounts", is not clear.
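To give an intuition of how zeros can be ignored when computing scale factors, the sketch below implements one formulation of the idea in base R: a per-taxon geometric mean that skips zeros, followed by a median of ratios per sample. The toy table, the helper names and the exact formulation are assumptions made for illustration; the DESeq2 documentation remains the reference for what sfType = "poscounts" actually does.

```r
# Toy sparse abundance table: OTUs in rows, samples in columns (assumed orientation)
counts <- matrix(
  c( 10, 20,  0, 40,
      0,  0,  5, 10,
    100, 50, 80,  0,
      3,  6,  9, 12),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:4), paste0("S", 1:4))
)

# Geometric mean per taxon: zeros are skipped in the sum of logs,
# but the sum is still divided by the total number of samples
geo_mean_nz <- function(x) {
  if (all(x == 0)) 0 else exp(sum(log(x[x > 0])) / length(x))
}
geo_means <- apply(counts, 1, geo_mean_nz)

# Scale factor of a sample: median ratio between its positive counts
# and the corresponding taxon geometric means
size_factor <- function(cnts) {
  keep <- geo_means > 0 & cnts > 0
  exp(median(log(cnts[keep]) - log(geo_means[keep])))
}
sf <- apply(counts, 2, size_factor)
sf <- sf / exp(mean(log(sf)))  # rescale so that the geometric mean is 1

print(round(sf, 3))
```

With the standard geometric mean, any taxon with a zero in at least one sample would be excluded from the computation, which leaves very few usable taxa in sparse amplicon data.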

To proceed with differential abundance analysis, users need to:

Select an experimental design model.
Select a contrast between two levels of the covariate (for categorical variables).

An interactive volcano plot representing the differentially abundant OTUs/ASVs is then shown (clicking on any OTU/ASV displays a barplot representing its relative abundance across the samples) alongside an interactive table with detailed information on the differential abundance statistics (p-value, effect size, etc.) and the taxonomy of each OTU/ASV.

This analysis allows the user to identify and visualize the taxa that exhibit significant differences in abundance between two conditions, providing valuable insights into the impact of experimental variables on individual microbes.

Export data, plot, and results

Users can export their (potentially preprocessed) data with the “download” icons. The export options include:

Exporting data in .biom format. Note that if a phylogenetic tree is present, it will not be included in the exported biom file. This format facilitates compatibility with other tools.
Exporting the constructed phyloseq object in .rds format. This enables further analysis within R or reuse in Easy16S.

For results tables, users can easily export them using the CSV, Copy (to clipboard) or Excel buttons.

To export a plot, click on the camera button located at the top right of each plot. Global export parameters, such as height, width, scale, and format, can be configured through the menu at the top right of the header. This allows users to resize plots as needed before export.

These export features enhance the usability and accessibility of both data and results, allowing users to seamlessly integrate Easy16S with their preferred analysis tools and workflows.