Introduction
Alternative link on Bilibili: https://www.bilibili.com/video/BV1mS4y1g7uB/
Licence
3D RNA-seq is currently under a dual-licensing model.- Open source under GPLV3.0. For academic and non-commercial use, it is free.
- For commercial use, please get in touch to obtain commercial licenses. Contact us
Citation
- Guo,W. et al. (2020) 3D RNA-seq: a powerful and flexible tool for rapid and accurate differential expression and alternative splicing analysis of RNA-seq data for biologists. RNA Biol., DOI: 10.1080/15476286.2020.1858253.
- Calixto,C.P.G., Guo,W., James,A.B., Tzioutziou,N.A., Entizne,J.C., Panter,P.E., Knight,H., Nimmo,H.G., Zhang,R., and Brown,J.W.S. (2018) Rapid and Dynamic Alternative Splicing Impacts the Arabidopsis Cold Response Transcriptome. Plant Cell, 30, 1424-1444.
Report issues & feedback
If you have questions to raise or are experiencing difficulties using the 3D RNA-seq, please use the 3D RNA-seq user group.Table of contents
Description
The ThreeDRNAseq (3D RNA-seq) R package provides an interactive graphical user interface (GUI) for RNA-seq data analysis using accurate quantification of RNA-seq reads. It allows users to perform differential expression (DE), differential alternative splicing (DAS) and differential transcript usage (DTU) (3D) analyses based on limma (Ritchie et al., 2015). The 3D RNA-seq GUI is based on R shiny App and enables a complete RNA-seq analysis to be done within only 3 Days (3D).Prior to 3D RNA-seq
RNA-seq raw read data is organised by sample. Different samples are either generated by different experimental conditions (multiple genotypes, tissue types or treatments, time-of day circadian experiments) or by biological replication (sometimes also sequencing replication) and each of them is used for the construction of individual libraries. Next, quality control and trimming of the RNA-seq reads should be performed to ensure that high quality RNA-seq reads are used for downstream analysis. The read data are then used for RNA-seq quantification. This step can be performed using many different pipelines, and the type of pipeline determines whether you can use 3D RNA-seq for your downstream expression analyses or not. 3D RNA-seq is only compatible with transcript quantification data derived from Salmon (Patro et al., 2017) or Kallisto (Bray et al., 2016) with the use of a reference transcriptome or Reference Transcript Dataset which contains a list of the known genes and transcripts for the organism under study. The Salmon/Kallisto output file contains the TPM values for each transcript organised by biological repeat and treatment(s). Depending on the size of the dataset, the transcript quantification procedure might take up to 1-2 days.
Inputs to 3D RNA-seq
Three input files are required for downstream analysis:
Gather the meta-data of the experimental design in a csv spreadsheet, the columns of which must include the following information (Figure A):
- A column of the factor or multiple columns of the factors of the experimental design.
- A column of the biological replicate labels.
- A column of the sequencing replicate labels if they exist.
- A column of the file names of transcript quantifications.
A folder that contains the transcript quantification files. Each file contains transcript quantification data of a single sample. Read counts and TPMs for 3D analysis will be generated from the "quant.sf" objects if these files are generated by Salmon (Patro et al., 2017) and the "abundance.h5"/"abundance.tsv" objects if these files are generated by Kallisto (Bray et al., 2016) (Figure B).
Transcript-gene mapping file in one of the following formats:
- "csv" spreadsheet with first column of transcript IDs and second column of gene IDs (Figure C).
- Or a "fa" file of the transcriptome for quantification. Transcript names and gene IDs will be extracted from the "transcript" (if it exists) or ">" tags and "gene" tags in the description lines of sequences, respectively (Figure D).
- Or a "gtf" file of the transcriptome for quantification. Transcript names and gene IDs will be extracted from the "transcript_id" and "gene_id" tags in the last column, respectively (Figure E).
Note:
Transcript-gene mapping in "csv" file is recommended. Depending on the size, it may take a while to generate the table from a "fa" or a "gtf" file and any missing tags for transcript name and gene ID extraction in these files may lead to errors.
The 3D analysis is executable in a computer with normal memory and CPU size. If the App is running on docker image, it is recommended to reduce the data size to upload to our server. Users can exclude all the files in sub-folders of transcript quantifications, except the files of "quant.sf" from Salmon. If the transcript quantifications are generated using Kallisto, users can keep either "abundance.h5" or "abundance.tsv" in the sub-folders with smaller size (Figure B).
Figure 1: Input files of 3D RNA-seq App. The example is from a RNA-seq study of Arabidopsis in respond to cold (Calixto et al., 2018). (A) Meta-data table of sample information in csv file. (B) The folder contains transcript quantifications. The input of transcript-gene mapping information can be a "csv" spreadsheet with first column of transcript names and second column of gene IDs (C), a ".fa" file (D) or a ".gtf" file (E) of the transcriptome. If a ".fa" or a ".gtf" file is provided, the App will extract transcript-gene association information with specific tags.
Capability of 3D RNA-seq
The 3D RNA-seq analysis pipeline starts with a number of steps to pre-process the data and reduce noise (e.g. low expression filters based on expression mean-variance trend, visualise data variation, batch effect estimation, data normalisation, etc.). The optimal parameters for each step are determined by quality control plots (e.g. mean-variance trend plots, PCA plots, data distribution plots, etc.). Stringent filters are applied to statistical testing of 3D genes/transcripts to control false positives. After the analysis, publication quality plots (e.g. expression profile plots, heatmap, GO annotation plots, etc.) and reports can be generated. The entire 3D RNA-seq analysis takes only 1 Day or less and all actions are performed by simple mouse clicks on the App.
How to get help
3D RNA-seq App "easy-to-use" manual:
https://github.com/wyguo/ThreeDRNAseq/blob/master/vignettes/user_manuals/3D_RNA-seq_App_manual.md
Tooltips:
In the GUI, users can click tooltips in specific steps for help information.
Contact us:
3D RNA-seq App is developed and maintained by Dr. Wenbin Guo from Plant Sciences Division, School of Life Sciences, University of Dundee. If you have any questions and suggestions, please contact:
- Dr. Wenbin Guo: wenbin.guo@hutton.ac.uk
- Dr. Runxuan Zhang: runxuan.zhang@hutton.ac.uk
Pipeline of 3D RNA-seq App
Figure 2: 3D RNA-seq analysis workflow on this App.
References
- Guo,W. et al. (2020) 3D RNA-seq: a powerful and flexible tool for rapid and accurate differential expression and alternative splicing analysis of RNA-seq data for biologists. RNA Biol., DOI: 10.1080/15476286.2020.1858253.
- Bray,N.L., Pimentel,H., Melsted,P., and Pachter,L. (2016) Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol., 34, 525-527.
- Calixto,C.P.G., Guo,W., James,A.B., Tzioutziou,N.A., Entizne,J.C., Panter,P.E., Knight,H., Nimmo,H.G., Zhang,R., and Brown,J.W.S. (2018) Rapid and Dynamic Alternative Splicing Impacts the Arabidopsis Cold Response Transcriptome. Plant Cell, 30, 1424-1444.
- Guo,W. et al. (2020) 3D RNA-seq: a powerful and flexible tool for rapid and accurate differential expression and alternative splicing analysis of RNA-seq data for biologists. RNA Biol., 00, 1–14.
- Patro,R., Duggal,G., Love,M.I., Irizarry,R.A., and Kingsford,C. (2017) Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods, 14, 417-419.
- Ritchie,M.E., Phipson,B., Wu,D., Hu,Y., Law,C.W., Shi,W., and Smyth,G.K. (2015) limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res, 43, e47.
- Robinson,M.D., McCarthy,D.J., and Smyth,G.K. (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26, 139-40.
Workflow
Step 1: Input data of 3D analysis
It may take a while for the App to respond for big dataset. Please wait until one process done before go to next step.
Note: Transcript-gene association mapping in "csv" format is recommended. Otherwise it may take a while to generate the information from a "gtf" or "fa" file.
Note: If the App is not running locally, but on our server, this folder is not accessible.
Step 2: Select factors of experimental design
Step 3: Generate read counts and TPMs
- lengthScaledTPM: scaled using the average transcript length over samples and then the library size (recommended).
- scaledTPM: scaled up to library size.
- no: no adjustment.
Workflow
Step 1: Filter low expressed transcripts and genes
Note:
- In many publications, the CPM cut-off is set to 1 CPM and then varing the number of samples to cut.
- An expressed transcript must have ≥ n samples ≥ m CPM (Count Per Million reads) expression.
- An expressed gene must have at least one expressed transcript.
Number of transcripts per gene
Step 2: Principal component analysis (PCA)
- Select PCs (X- and Y-axis) to visualise
- Select "Plot type", either to show all samples or average expression of conditions
- Select a factor/facotrs to colour samples into groups (multi-select allowed)
- Select "ellipse" or "polygon" to highlight the groups
Step 3: Estimate batch effects
This step is only executed when there are distinct
batch effects in the data:
- The biological replicates of the same conditions stay in separate clusters in the PCA plot.
- There is experimental explanation of this separation (e.g. bio-reps in different time/labs).
Otherwise skip to Step 4 to avoid data over-modification.
Step 4: Data normalization
Workflow
Step 1: Experimental design
Step 2: Set contrast groups
Factors of experimental design for contrast groups
Visualise contrast groups
Step 3: 3D analysis
Choose a pipeline
In addition to batch effects that occur for all the samples in a certain biological replicate, RNA-seq data may have variations in sample quality due to, for example, degradation or contamination of specific samples. These problematic samples are often shown as outliers in the PCA plot. In this case, the limma-voomWeights pipeline can be used to balance the outliners. Note: limma-voom is more stringent than limma-voomWeights. User can select a proper pipeline based on the data details.
Set thresholds
PS (percent of splice) is defined as the ratio of transcript average TPMs of conditions divided by the average gene abundance. \(\Delta\)PS is the PS differences of conditions based on the contrast groups.
DE/DAS/DTU testing statistics
Limma pipeline is recommended for 3D analysis. We compared different expression comparision pipelines, such as Sleuth (Pimentel et al., 2017), DESeq2 (Love et al., 2014) and EdgeR (Robinson et al., 2011), Limma is more capable for complex experimental design and has more robust predictions without losing stringency (Smyth et al. 2013). At differential gene expression level, limma and edgeR have comparable performance. Sleuth and DESeq2 have no function of differential alternative splicing analysis, and using edgeR pipeline will give greatly reduced predictions of DAS genes/DTU transcripts compared with Limma.
Step 4: 3D testing statistics
Visulise statistics
Up to top 100 ranked targets can be visulise in this App. The statistics of full gene/transcript lists can be saved to local folder in the "Generate report" step.
Top 100 3D targets
Step 5: Profile plot
Visualise plot of a single gene
Make plots of multiple genes
Step 6: Significant 3D numbers
Number of 3D genes/transcripts in each contrast group
Venn diagrams of 3D lists
Volcano plot
Step 7: Transcriptional vs alternative splicing regulation
DE vs DAS genes
DE vs DTU transcripts
Transcriptional vs AS regulation
Workflow
Description
- Time-series
- Developmental series
- Consecutive conditions with a sequential order
Step 1: Time-series experimental design
Step 2: Set groups to compare
Step 3: Time-series 3D trend analysis
Testing statistics
TS trend 3D number
Step 4: Profile plot
Visualise plot of a single gene
Make plots of multiple genes
Workflow
Step 2: GO annotation at gene level
Loaded GO annotation table
GO annotation plot
Workflow
Isoform switch analysis
IsokTSP: In 3D RNA-seq analysis, isokTSP is used to identify ISs between pair-wise conditions that make up contrast groups of expression comparisons. To find IS point, the average TPMs of isoforms are treated as y-axis coordinates and 1 and 2 are used as x-axis coordinates at pair-wise conditions (Figure 1A). Then two lines, which represent two isoforms, can be fitted based on these coordinates. The interaction (if exists) of two lines is defined as the IS point.
TSIS: To find all IS points in time-series expression, average TPMs or spline fitted values of TPMs are used as y-axis coordinates and integers, 1, 2, 3, ..., are x-axis coordinates. Lines of isoforms are fitted between each pair of consecutive time-points. The IS points can be identified from interactions of lines.
Significant IS: Five metrics are used to evaluate the significance of ISs. (1) the probability of switch (i.e., the frequency of samples reversing their relative abundance at the switches); (2) the sum of the average TPM differences of the two isoforms in both intervals before and after the switch point; (3) adjusted p-value (i.e. FDR), which is the significance of the differences between the switched isoform abundances before and after the switch; (4) the time-points at both intervals before and after switch (set to 1 in isokTSP); and (5) Pearson correlation of two isoforms. Users can apply filters based on these metrics to generate significant ISs (Guo et al., 2017).
If you use TSIS in your work, please cite:
Wenbin Guo, Cristiane P. G. Calixto, John W.S. Brown, Runxuan Zhang, "TSIS: an R package to infer alternative splicing isoform switches for time-series data", Bioinformatics, https://doi.org/10.1093/bioinformatics/btx411, 2017.
The paper for TSIS application in RNA-seq data from study of Arabidopsis in response to cold:
Calixto,C.P.G., Guo,W., James,A.B., Tzioutziou,N.A., Entizne,J.C., Panter,P.E., Knight,H., Nimmo,H., Zhang,R., and Brown,J.W.S. (2018) Rapid and dynamic alternative splicing impacts the Arabidopsis cold response transcriptome. Plant Cell.
Step 1: Set parameters of IS analysis
Select TSIS or isokTSP
Scoring parameters
- Method for intersections: Using either mean values or natural spline fitted smooth curves of time-series expression to determine the intersection points of isoforms.
- Degree of spline: If use spline method, the higher degree leads to more break points of the time-series.
Filtering parameters
- Probability cutoff: The isoform switch probability/frequency cut-off for the column "prob" in the output table.
- Difference cutoff: The isoform switch difference cut-off for the column "diff" in the output table.
- P-value cutoff: The p-value cut-off of both columns "before.pval" and "after.pval" in the output table.
- Min time in interval: The minimum time points for both columns "before.t.points" and "after.t.points" in the output table.
- Correlation cutoff: The cut-off for Pearson correlation of isoform pairs.
TSIS results
- iso1, iso2: the isoform pairs.
- iso1.mean.ratio, iso2.mean.ratio: The mean ratios of isoforms to their gene.
- before.interval, after.interval: The intervals before and after switch points.
- x.value, y.value: The values of x axis (time) and y axis (expression) coordinates of the switch points.
- prob: The probability/frequency of switch.
- diff: The sum of average sample differences before and after switch.
- before.pval, after.pval: The paired t-test p-values of the samples in the intervals before and after switch points.
- before.t.points, after.t.points: The number of time points in intervals before and after the switch points.
- cor: The Pearson correlation of iso1 and iso2.
Step 2: Number of significant switches
Step 3: Plot significant switches
Input isoform names:
Plot all significant isoform switches
Note: (1) The plot width, height and png plot resolution are taken from the above isoform switch visualisation plot. (2) It may take quite a while to save all the plots.
Save data and results
Results are saved to 3D App working directory:
- Intermediate data are saved in "data" folder.
- Significant 3D lists and statistics in .csv (comma delimited) are saved in "result" folder.
- Figures are saved in "figure" folder.
Reports in word, pdf and html formats are saved in "report" folder in 3D App working directory