Introduction


Alternative link on Bilibili: https://www.bilibili.com/video/BV1mS4y1g7uB/

Licence

3D RNA-seq is currently under a dual-licensing model.
  • Open source under GPLV3.0. For academic and non-commercial use, it is free.
  • For commercial use, please get in touch to obtain commercial licenses. Contact us

Citation

  • Guo,W. et al. (2020) 3D RNA-seq: a powerful and flexible tool for rapid and accurate differential expression and alternative splicing analysis of RNA-seq data for biologists. RNA Biol., DOI: 10.1080/15476286.2020.1858253.
  • Calixto,C.P.G., Guo,W., James,A.B., Tzioutziou,N.A., Entizne,J.C., Panter,P.E., Knight,H., Nimmo,H.G., Zhang,R., and Brown,J.W.S. (2018) Rapid and Dynamic Alternative Splicing Impacts the Arabidopsis Cold Response Transcriptome. Plant Cell, 30, 1424-1444.

Report issues & feedback

If you have questions to raise or are experiencing difficulties using the 3D RNA-seq, please use the 3D RNA-seq user group.

Description

The ThreeDRNAseq (3D RNA-seq) R package provides an interactive graphical user interface (GUI) for RNA-seq data analysis using accurate quantification of RNA-seq reads. It allows users to perform differential expression (DE), differential alternative splicing (DAS) and differential transcript usage (DTU) (3D) analyses based on limma (Ritchie et al., 2015). The 3D RNA-seq GUI is based on R shiny App and enables a complete RNA-seq analysis to be done within only 3 Days (3D).

Prior to 3D RNA-seq

RNA-seq raw read data is organised by sample. Different samples are either generated by different experimental conditions (multiple genotypes, tissue types or treatments, time-of day circadian experiments) or by biological replication (sometimes also sequencing replication) and each of them is used for the construction of individual libraries. Next, quality control and trimming of the RNA-seq reads should be performed to ensure that high quality RNA-seq reads are used for downstream analysis. The read data are then used for RNA-seq quantification. This step can be performed using many different pipelines, and the type of pipeline determines whether you can use 3D RNA-seq for your downstream expression analyses or not. 3D RNA-seq is only compatible with transcript quantification data derived from Salmon (Patro et al., 2017) or Kallisto (Bray et al., 2016) with the use of a reference transcriptome or Reference Transcript Dataset which contains a list of the known genes and transcripts for the organism under study. The Salmon/Kallisto output file contains the TPM values for each transcript organised by biological repeat and treatment(s). Depending on the size of the dataset, the transcript quantification procedure might take up to 1-2 days.

Inputs to 3D RNA-seq

Three input files are required for downstream analysis:

  1. Gather the meta-data of the experimental design in a csv spreadsheet, the columns of which must include the following information (Figure A):

    • A column of the factor or multiple columns of the factors of the experimental design.
    • A column of the biological replicate labels.
    • A column of the sequencing replicate labels if they exist.
    • A column of the file names of transcript quantifications.
  2. A folder that contains the transcript quantification files. Each file contains transcript quantification data of a single sample. Read counts and TPMs for 3D analysis will be generated from the "quant.sf" objects if these files are generated by Salmon (Patro et al., 2017) and the "abundance.h5"/"abundance.tsv" objects if these files are generated by Kallisto (Bray et al., 2016) (Figure B).

  3. Transcript-gene mapping file in one of the following formats:

    • "csv" spreadsheet with first column of transcript IDs and second column of gene IDs (Figure C).
    • Or a "fa" file of the transcriptome for quantification. Transcript names and gene IDs will be extracted from the "transcript" (if it exists) or ">" tags and "gene" tags in the description lines of sequences, respectively (Figure D).
    • Or a "gtf" file of the transcriptome for quantification. Transcript names and gene IDs will be extracted from the "transcript_id" and "gene_id" tags in the last column, respectively (Figure E).

Note:

  1. Transcript-gene mapping in "csv" file is recommended. Depending on the size, it may take a while to generate the table from a "fa" or a "gtf" file and any missing tags for transcript name and gene ID extraction in these files may lead to errors.

  2. The 3D analysis is executable in a computer with normal memory and CPU size. If the App is running on docker image, it is recommended to reduce the data size to upload to our server. Users can exclude all the files in sub-folders of transcript quantifications, except the files of "quant.sf" from Salmon. If the transcript quantifications are generated using Kallisto, users can keep either "abundance.h5" or "abundance.tsv" in the sub-folders with smaller size (Figure B).

Figure 1: Input files of 3D RNA-seq App. The example is from a RNA-seq study of Arabidopsis in respond to cold (Calixto et al., 2018). (A) Meta-data table of sample information in csv file. (B) The folder contains transcript quantifications. The input of transcript-gene mapping information can be a "csv" spreadsheet with first column of transcript names and second column of gene IDs (C), a ".fa" file (D) or a ".gtf" file (E) of the transcriptome. If a ".fa" or a ".gtf" file is provided, the App will extract transcript-gene association information with specific tags.

Capability of 3D RNA-seq

The 3D RNA-seq analysis pipeline starts with a number of steps to pre-process the data and reduce noise (e.g. low expression filters based on expression mean-variance trend, visualise data variation, batch effect estimation, data normalisation, etc.). The optimal parameters for each step are determined by quality control plots (e.g. mean-variance trend plots, PCA plots, data distribution plots, etc.). Stringent filters are applied to statistical testing of 3D genes/transcripts to control false positives. After the analysis, publication quality plots (e.g. expression profile plots, heatmap, GO annotation plots, etc.) and reports can be generated. The entire 3D RNA-seq analysis takes only 1 Day or less and all actions are performed by simple mouse clicks on the App.

How to get help

3D RNA-seq App "easy-to-use" manual:

https://github.com/wyguo/ThreeDRNAseq/blob/master/vignettes/user_manuals/3D_RNA-seq_App_manual.md

Tooltips:

In the GUI, users can click tooltips in specific steps for help information.

Contact us:

3D RNA-seq App is developed and maintained by Dr. Wenbin Guo from Plant Sciences Division, School of Life Sciences, University of Dundee. If you have any questions and suggestions, please contact:

Pipeline of 3D RNA-seq App

Figure 2: 3D RNA-seq analysis workflow on this App.

References

  • Guo,W. et al. (2020) 3D RNA-seq: a powerful and flexible tool for rapid and accurate differential expression and alternative splicing analysis of RNA-seq data for biologists. RNA Biol., DOI: 10.1080/15476286.2020.1858253.
  • Bray,N.L., Pimentel,H., Melsted,P., and Pachter,L. (2016) Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol., 34, 525-527.
  • Calixto,C.P.G., Guo,W., James,A.B., Tzioutziou,N.A., Entizne,J.C., Panter,P.E., Knight,H., Nimmo,H.G., Zhang,R., and Brown,J.W.S. (2018) Rapid and Dynamic Alternative Splicing Impacts the Arabidopsis Cold Response Transcriptome. Plant Cell, 30, 1424-1444.
  • Guo,W. et al. (2020) 3D RNA-seq: a powerful and flexible tool for rapid and accurate differential expression and alternative splicing analysis of RNA-seq data for biologists. RNA Biol., 00, 1–14.
  • Patro,R., Duggal,G., Love,M.I., Irizarry,R.A., and Kingsford,C. (2017) Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods, 14, 417-419.
  • Ritchie,M.E., Phipson,B., Wu,D., Hu,Y., Law,C.W., Shi,W., and Smyth,G.K. (2015) limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res, 43, e47.
  • Robinson,M.D., McCarthy,D.J., and Smyth,G.K. (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26, 139-40.
Data generation

Workflow

Step 1: Input data of 3D analysis

It may take a while for the App to respond for big dataset. Please wait until one process done before go to next step.


Note: Transcript-gene association mapping in "csv" format is recommended. Otherwise it may take a while to generate the information from a "gtf" or "fa" file.



                      Note: If the App is not running locally, but on our server, this folder is not accessible.
                    

Step 2: Select factors of experimental design

Step 3: Generate read counts and TPMs

  • lengthScaledTPM: scaled using the average transcript length over samples and then the library size (recommended).
  • scaledTPM: scaled up to library size.
  • no: no adjustment.
More details can be found in the tximport user manual.
Introduction
Data pre-processing

Workflow

Step 1: Filter low expressed transcripts and genes

 

Note:
  • In many publications, the CPM cut-off is set to 1 CPM and then varing the number of samples to cut.
  • An expressed transcript must have ≥ n samples ≥ m CPM (Count Per Million reads) expression.
  • An expressed gene must have at least one expressed transcript.
RNA-seq data information

 

Number of transcripts per gene




Step 2: Principal component analysis (PCA)



  1. Select PCs (X- and Y-axis) to visualise
  2. Select "Plot type", either to show all samples or average expression of conditions
  3. Select a factor/facotrs to colour samples into groups (multi-select allowed)
  4. Select "ellipse" or "polygon" to highlight the groups


Step 3: Estimate batch effects

This step is only executed when there are distinct batch effects in the data:
  • The biological replicates of the same conditions stay in separate clusters in the PCA plot.
  • There is experimental explanation of this separation (e.g. bio-reps in different time/labs).
Otherwise skip to Step 4 to avoid data over-modification.







Step 4: Data normalization



Three normalisation methods, TMM (weighted trimmed mean of M-values), RLE (relative log expression) and upperquartile (upper-quartile), have comparable performance (Maza, 2016). TMM is more widely used in differential expression studies of RNA-seq data.




Data generation
3D analysis

Workflow

Step 1: Experimental design

If the information is not incorrect, please go to "Data generation" page to check the selections of experimental design in the sample table.

Step 2: Set contrast groups

Contrast group-1

VS

Note: Select a single group label in each side of "VS" to make pair-wise contrast groups, e.g. "B VS A" for "B-A", "C VS A" for "C-A", etc.
Contrast group-1

VS

Note: Select multiple group labels in each side of "VS" to make contrast groups to compare group means, e.g. "A,B VS C,D" for "(A+B)/2-(C+D)/2", "A,B VS C,D,E" for "(A+B)/2-(C+D+E)/3", etc.
Contrast group-1

VS

Note: Select two group labels in each side of "VS" to make contrast groups to compare pair-wise group differences, e.g. "A,B VS C,D" for "(A-B)-(C-D)", "A,B VS C,E" for "(A-B)-(C-E)", etc.

 

 

Factors of experimental design for contrast groups

Visualise contrast groups

Step 3: 3D analysis

Choose a pipeline



In addition to batch effects that occur for all the samples in a certain biological replicate, RNA-seq data may have variations in sample quality due to, for example, degradation or contamination of specific samples. These problematic samples are often shown as outliers in the PCA plot. In this case, the limma-voomWeights pipeline can be used to balance the outliners. Note: limma-voom is more stringent than limma-voomWeights. User can select a proper pipeline based on the data details.


Set thresholds

PS (percent of splice) is defined as the ratio of transcript average TPMs of conditions divided by the average gene abundance. \(\Delta\)PS is the PS differences of conditions based on the contrast groups.


DE/DAS/DTU testing statistics


Limma pipeline is recommended for 3D analysis. We compared different expression comparision pipelines, such as Sleuth (Pimentel et al., 2017), DESeq2 (Love et al., 2014) and EdgeR (Robinson et al., 2011), Limma is more capable for complex experimental design and has more robust predictions without losing stringency (Smyth et al. 2013). At differential gene expression level, limma and edgeR have comparable performance. Sleuth and DESeq2 have no function of differential alternative splicing analysis, and using edgeR pipeline will give greatly reduced predictions of DAS genes/DTU transcripts compared with Limma.

Step 4: 3D testing statistics

Visulise statistics


Up to top 100 ranked targets can be visulise in this App. The statistics of full gene/transcript lists can be saved to local folder in the "Generate report" step.

Top 100 3D targets

Step 5: Profile plot

Visualise plot of a single gene

E.g. AT1G01060 in Arabidopsis (gene name must match to provided transcript-gene mapping).


Make plots of multiple genes

Note: First line of the csv file is treated as header.



                      Note: If the App is not running locally, these plots can be downloaded in the final step "Generate report".
                    

 

Step 6: Significant 3D numbers

Number of 3D genes/transcripts in each contrast group

Venn diagrams of 3D lists

 

 

Volcano plot

Top n labels of the points will be calculatd according to their distance to coordinate (0,0)

 

Step 7: Transcriptional vs alternative splicing regulation

DE vs DAS genes

DE vs DTU transcripts

 

Transcriptional vs AS regulation

Data pre-processing
Time-series trend

Workflow

Description

Time-series trend analysis is aim to study an experiment with many time-points in each group. When there are many time-points, it is recommended to analyse on smoothed expression changes over time instead of discrete comparisons of individual time-points in each group. This page is for DE, DAS and DTU trend analysis of group-wise (the same set experiments are performed in different groups/blocks):
  • Time-series
  • Developmental series
  • Consecutive conditions with a sequential order

Note: If not applicable, please skip this page to the next analysis; Otherwise, please click "Yes" to expand analysis panels.

Step 1: Time-series experimental design

Note: (1) Development series or consecutive conditions with sequential order can be treated as "time-points" for trend analysis; (2) Spline method requires 3 or more time-points in each group; (3) If "No" is selected, trend will be compared in the way of discrete jumping from one to another time-point in each group.

Step 2: Set groups to compare

Time series groups-1

Step 3: Time-series 3D trend analysis

Choose a pipeline

Note: "Simes" method is more stringent than "Fisher" method. Each contrast group involves several time groups for trend comparisons, e.g. "Null hypothesis: Day1=Day2=Day3". These methods are used to generate overall p-values across these time groups.

Testing statistics

TS trend 3D number

Step 4: Profile plot

Visualise plot of a single gene

E.g. AT1G01060 in Arabidopsis (gene name must match to provided transcript-gene mapping).


Make plots of multiple genes

Note: First line of the csv file is treated as header.



                        Note: If the App is not running locally, these plots can be downloaded in the final step "Generate report".
                      
3D analysis
Functional plot

Workflow

Step 1: Heatmap

Select targets


Clustering

See the R functions for hierarchical clustering: hclust and dist

Options

Select colours of values

Time-series trend heatmap

Select targets

Note: If select "All", union set of all contrast groups is used.

Clustering

See the R functions for hierarchical clustering: hclust and dist

Options

Select colours of values
 

Step 2: GO annotation at gene level

  Download gene list


Gene lists of DE genes, DAS genes, DE transcripts and DTU transcripts will be downloaded and saved in scv file. Users can use these lists and GO analysis tools/websites to create GO annotation table for plot.

Loaded GO annotation table

GO annotation plot

Time-series trend
TSIS

Workflow

Isoform switch analysis

Transcript isoform switches (ISs) occur when a pair of alternatively spliced isoforms reverse the order of their relative expression levels as shown in Figure 1. IsokTSP is a method to detect transcript ISs between pair-wise conditions (Sebestyen et al., 2015) (Figure 1A) while TSIS (time-series IS) is used to identify ISs in time-series data (Figure 1B). To have robust results, isokTSP and TSIS characterize the transcript switch by 1) defining the isoform switch conditions/time-points for any pair of transcript isoforms within a gene, 2) describing the switch using five different features or metrics, 3) filtering the results with user provided specifications and 4) visualizing the results using different plots for the user to examine further details of the switches (see details in our publication Guo et al., 2017).

IsokTSP: In 3D RNA-seq analysis, isokTSP is used to identify ISs between pair-wise conditions that make up contrast groups of expression comparisons. To find IS point, the average TPMs of isoforms are treated as y-axis coordinates and 1 and 2 are used as x-axis coordinates at pair-wise conditions (Figure 1A). Then two lines, which represent two isoforms, can be fitted based on these coordinates. The interaction (if exists) of two lines is defined as the IS point.

TSIS: To find all IS points in time-series expression, average TPMs or spline fitted values of TPMs are used as y-axis coordinates and integers, 1, 2, 3, ..., are x-axis coordinates. Lines of isoforms are fitted between each pair of consecutive time-points. The IS points can be identified from interactions of lines.

Significant IS: Five metrics are used to evaluate the significance of ISs. (1) the probability of switch (i.e., the frequency of samples reversing their relative abundance at the switches); (2) the sum of the average TPM differences of the two isoforms in both intervals before and after the switch point; (3) adjusted p-value (i.e. FDR), which is the significance of the differences between the switched isoform abundances before and after the switch; (4) the time-points at both intervals before and after switch (set to 1 in isokTSP); and (5) Pearson correlation of two isoforms. Users can apply filters based on these metrics to generate significant ISs (Guo et al., 2017).

Figure 1: Isoform switch analysis methods. Expression data with 3 replicates for each condition/time-point is simulated for isoforms \(iso_{i}\) and \(iso_j\) (blue and red circles). The points in the plots represent samples and the black lines connect the average of samples. (A) is the isokTSP method for comparisons of two conditions \(c_1\) and \(c_2\). The Time-Series Isoform Switch (TSIS) tool is designed for detection and characterization of isoform switches for time series data shown in (B). The time-series with 6 time-points is divided into 4 intervals by the intersection points of average expression.
If you use TSIS in your work, please cite:
Wenbin Guo, Cristiane P. G. Calixto, John W.S. Brown, Runxuan Zhang, "TSIS: an R package to infer alternative splicing isoform switches for time-series data", Bioinformatics, https://doi.org/10.1093/bioinformatics/btx411, 2017.
The paper for TSIS application in RNA-seq data from study of Arabidopsis in response to cold:
Calixto,C.P.G., Guo,W., James,A.B., Tzioutziou,N.A., Entizne,J.C., Panter,P.E., Knight,H., Nimmo,H., Zhang,R., and Brown,J.W.S. (2018) Rapid and dynamic alternative splicing impacts the Arabidopsis cold response transcriptome. Plant Cell.

Step 1: Set parameters of IS analysis

Select TSIS or isokTSP

TSIS: Time-Series Isoform Switch across sequencial time-points; isokTSP: Pair-Wise Isoform Switch between conditions of contrast groups.

Scoring parameters

Press Scoring button to implement the scoring of isoform switches. The details of parameters:
  • Method for intersections: Using either mean values or natural spline fitted smooth curves of time-series expression to determine the intersection points of isoforms.
  • Degree of spline: If use spline method, the higher degree leads to more break points of the time-series.

Filtering parameters


Press Filtering button to filter the scores. The details of parameters:
  • Probability cutoff: The isoform switch probability/frequency cut-off for the column "prob" in the output table.
  • Difference cutoff: The isoform switch difference cut-off for the column "diff" in the output table.
  • P-value cutoff: The p-value cut-off of both columns "before.pval" and "after.pval" in the output table.
  • Min time in interval: The minimum time points for both columns "before.t.points" and "after.t.points" in the output table.
  • Correlation cutoff: The cut-off for Pearson correlation of isoform pairs.

TSIS results

The following table shows the feature scores of isoform switches. The columns in the output table:
  • iso1, iso2: the isoform pairs.
  • iso1.mean.ratio, iso2.mean.ratio: The mean ratios of isoforms to their gene.
  • before.interval, after.interval: The intervals before and after switch points.
  • x.value, y.value: The values of x axis (time) and y axis (expression) coordinates of the switch points.
  • prob: The probability/frequency of switch.
  • diff: The sum of average sample differences before and after switch.
  • before.pval, after.pval: The paired t-test p-values of the samples in the intervals before and after switch points.
  • before.t.points, after.t.points: The number of time points in intervals before and after the switch points.
  • cor: The Pearson correlation of iso1 and iso2.

Step 2: Number of significant switches


Step 3: Plot significant switches

Input isoform names:



Plot all significant isoform switches


                      


Note: (1) The plot width, height and png plot resolution are taken from the above isoform switch visualisation plot. (2) It may take quite a while to save all the plots.
Functional plot
Generate report

Save data and results


Results are saved to 3D App working directory:
  • Intermediate data are saved in "data" folder.
  • Significant 3D lists and statistics in .csv (comma delimited) are saved in "result" folder.
  • Figures are saved in "figure" folder.

Reports in word, pdf and html formats are saved in "report" folder in 3D App working directory
Click to download results


If the 3D RNA-seq App is running on a local PC, all the results (figures, tables, reports and intermediate .RData objects) are already saved in the working directory. If the App is running on our server, all the results can be zipped and downloaded by clicking above button.
TSIS