DriverDBv3: A database for human cancer driver gene research



1. What is DriverDBv3?

DriverDBv3 is a cancer omics database that integrates somatic mutation, RNA expression, miRNA expression, methylation, copy number variation and clinical data with annotation databases and published bioinformatics algorithms. Earlier versions of DriverDB were published in 2014 and 2016 and applied published bioinformatics algorithms dedicated to driver gene/mutation identification. In this updated version of DriverDB, we aim to interpret the sophisticated information of cancer omics through concise data visualization.

The database provides three functions, ‘Cancer’, ‘Gene’, and ‘Customized-Analysis’, to help researchers visualize the relationships between cancers and driver genes.
The ‘Cancer’ function summarizes the calculated results of driver genes in different molecular features by using published bioinformatics algorithms/tools for a specific cancer type/dataset.
The ‘Gene’ function visualizes different features of a user-selected gene, such as Differential Expression (DE), CNV, methylation, survival and miRNA.
The ‘Customized-Analysis’ function offers different insights into survival analysis. The function includes two approaches to stratify patients - by expression or by mutation. This allows researchers to select a sub-group of well-defined cancer samples based on one or multiple clinical parameters for driver gene identification.

Our database newly incorporates the cancer driver genes defined in CGC and NCG6.0, together with updated TCGA RNA sequencing and exome sequencing data collected from the GDC data portal (https://portal.gdc.cancer.gov/). The data pre-processing approach is described in the previous publications (1, 2). Level 3 copy number variation data were downloaded with the TCGA2BED tool (3). Level 3 methylation data were collected from Firehose (https://gdac.broadinstitute.org/). TCGA clinical data were downloaded with the R package ‘TCGAbiolinks’ (4). We also incorporate ICGC whole genome sequencing and exome sequencing data from the ICGC data portal (https://dcc.icgc.org/). In addition, cancer-related genes are defined according to CGC, which was downloaded from COSMIC (https://cancer.sanger.ac.uk/census) (5), and the NCG6.0 database (6).


2. Section One - Cancer

The ‘Cancer’ section summarizes the calculated results of driver genes in different molecular features, obtained by using published bioinformatics algorithms/tools, for a specific cancer type/dataset.

Dataset selection panel
Browse by tissue type
In this module, DriverDBv3 offers 33 cancer types, which can be queried through the following drop-down panels:
(A). users can search for a specific dataset by browsing tissue types;
(B). users can select a dataset released by different cancer types.
Press Submit to view driver gene information of multiple features in a specific cancer type.

2.1. Cancer Summary
The Cancer Summary section provides a Summary network, which integrates cancer dysfunction and dysregulation events at a multi-omics level, and a Functional Annotation analysis of these cancer driver genes.

2.1.1. Cancer summary network
The Cancer Summary network presents the relationships between driver genes and miRNA drivers in a specific cancer type. The gene sets are drawn from the Cancer Gene Census (CGC) and the Network of Cancer Genes (NCG6.0). Driver genes possess various features and are denoted by nodes gridded with colors (as shown in the legend figure); miRNA drivers are denoted by yellow nodes. These nodes are connected by lines that show protein-protein interactions (PPIs) recorded in the STRING database and synergistic effects, where a synergistic effect means that the combined hazard ratio (HR) of two genes is greater than 1.5-fold the HR of each gene alone. All of these factors can be adjusted by toggling the options in the control panel on the right-hand side.

2.1.2. Driver summary table
In this table, the relevant genes of the user-selected cancer type are listed with detailed information, for example whether the gene is recorded in the CGC database or the NCG6.0 database. The numbers of mutation, CNV, methylation and miRNA records are given for each gene.

2.1.3. Functional Annotation
The following sections provide three levels of functional analysis of driver genes: Gene Ontology (by biological process, cellular component and molecular function), Pathways (by KEGG, Reactome, PID, Biocarta, Curated_pathway, Motif, Computational_gene_set, Oncogenic_signatures, miRTar, miRWalk, position and immunologic_signatures), and Protein/Genetics interaction (by BioGRID, IntAct, and STRING).

2.1.3.1. Gene Ontology analysis
The topology of significantly altered GO categories is computed with the topGO package of Bioconductor. GO categories and genes are divided into three groups: biological process, cellular component and molecular function.
GO Plot and Significant Term are both available for download based on the chosen category. On the right-hand side is a table of all significantly altered GO categories.

2.1.3.2. Pathway
Gene sets from public databases are classified into 12 collections: KEGG, Reactome, PID, Biocarta, Curated_pathway, Motif, Computational_gene_set, Oncogenic_signatures, miRTar, miRWalk, position and immunologic_signatures. The Network Plot and Significant Term are provided below.
A: A tab panel of 12 pathway/gene set collections.
B: A network layout displays pathway/gene set categories of driver genes
C: A table of pathway/gene set categories.
*Only the top 30 pathways and their corresponding genes will be shown.

The following table shows information about these collections ("-" marks an empty cell).
Collection in "Pathway Analysis" | Source | Note | Reference
KEGG | KEGG (Kyoto Encyclopedia of Genes and Genomes) | - | http://www.genome.jp/kegg/
Reactome | Reactome | - | https://reactome.org/
PID | Pathway Interaction Database of NCI | - | http://www.ndexbio.org
BioCarta | BioCarta | - | http://cgap.nci.nih.gov/Pathways/BioCarta_Pathways
Curated pathway | MSigDB (The Molecular Signatures Database) | Matrisome Project, SigmaAldrich, Signaling Gateway, and SuperArray SABiosciences from C2: curated gene sets | http://software.broadinstitute.org/gsea/msigdb/collections.jsp
Motif | MSigDB (The Molecular Signatures Database) | C3: motif gene sets | http://software.broadinstitute.org/gsea/msigdb/collections.jsp
Computational gene set | MSigDB (The Molecular Signatures Database) | C4: computational gene sets | http://software.broadinstitute.org/gsea/msigdb/collections.jsp
Oncogenic signatures | MSigDB (The Molecular Signatures Database) | C6: oncogenic gene sets | http://software.broadinstitute.org/gsea/msigdb/collections.jsp
miRTar | miRTar | - | http://mirtar.mbc.nctu.edu.tw/human/
miRWalk | miRWalk | - | http://mirwalk.umm.uni-heidelberg.de/
Position | MSigDB (The Molecular Signatures Database) | C1: positional gene sets | http://software.broadinstitute.org/gsea/msigdb/collections.jsp

2.1.3.3. Protein/Genetics Interaction
Protein/genetic interaction network layouts are drawn from three databases: BioGRID, IntAct and STRING.

2.2. Cancer Mutation
The Cancer Mutation function provides visualizations to illustrate the mutation drivers identified by bioinformatics tools in a specific cancer type.

2.2.1. Visualization of Top 30 genes
A plot is provided on the left to illustrate the relations between the top 30 mutation driver genes and cancer patients. The x-axis represents samples of cancer patients; the y-axis lists the top 30 genes. The percentages shown on the left of the plot represent the percentage of samples carrying a mutation in each mutation driver (A). The bar charts on the top (B) and right (C) show the total number of mutation occurrences by column (each sample) and by row (each mutation driver), respectively. The impact is categorized as ‘High’, ‘Moderate’, ‘Low’ or ‘Modifier’. These are pre-defined categories that help users find more significant variants; they do not predict which variant produces a phenotype of interest.

2.2.2. Mutation driver defined by tools
The left part shows the driver genes that were identified by tools. Mutation drivers identified by more tools have higher confidence. Users set the number of genes by adjusting the top bar. A table of genes and the detailed information of the tools is also provided below.
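The percentages and marginal bar charts described in 2.2.1 can be illustrated with a minimal Python sketch. It is not the DriverDBv3 implementation: the function name `summarize_mutations`, the input layout (gene name mapped to one impact label or None per sample) and the toy values are assumptions made only for illustration.

```python
from typing import Dict, List, Optional, Tuple


def summarize_mutations(
    matrix: Dict[str, List[Optional[str]]]
) -> Tuple[Dict[str, float], Dict[str, int], List[int]]:
    """Summarize a gene-by-sample mutation matrix (hypothetical layout).

    Each gene maps to one entry per sample: an impact label such as
    "HIGH" or "MODERATE", or None when the sample has no mutation.
    """
    genes = list(matrix)
    n_samples = len(matrix[genes[0]])

    # (C) right bar chart: mutation occurrences per row (gene)
    row_totals = {g: sum(cell is not None for cell in matrix[g]) for g in genes}

    # (A) percentage of samples carrying a mutation in each gene
    gene_percent = {g: 100.0 * row_totals[g] / n_samples for g in genes}

    # (B) top bar chart: mutation occurrences per column (sample)
    col_totals = [
        sum(matrix[g][i] is not None for g in genes) for i in range(n_samples)
    ]
    return gene_percent, row_totals, col_totals


# Toy input with two genes and four samples (illustrative values only)
toy = {
    "TP53": ["HIGH", None, "MODERATE", None],
    "KRAS": [None, None, "LOW", None],
}
percent, by_gene, by_sample = summarize_mutations(toy)
print(percent, by_gene, by_sample)
```

The sketch counts at most one mutation per gene and sample; a real matrix may hold several variants per cell, which would change the totals.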

2.3. Cancer CNV
This section provides visualized information about the copy number variation (CNV) in a selected cancer type. Using multiple bioinformatics tools, the graphs display the top 30 genes and present the CNV gain or loss status as well as the details of locus enrichment. All of the following analyses can be adjusted by selecting one or two defining tools.

2.3.1. Visualization of Top 30 genes
This bar chart provides an overview of the CNV percentages for the top 30 genes. Moving the mouse over the bar of a gene shows the detailed CNV percentages in a tooltip. Green areas represent CNV loss; pink areas represent CNV gain; blue areas represent no CNV. The following plot illustrates the relations between the top 30 genes and their CNV in cancer patients for a certain cancer type. The x-axis represents samples of cancer patients; the y-axis lists the top 30 genes. The bar charts on the top and right show the total CNV occurrences by column (each sample) and by row (each gene), respectively. Green areas represent CNV loss; pink areas represent CNV gain.

2.3.2. Locus enrichment
The graph shows the loci of all genes within all chromosomes. Each red dot represents a gene and its position on the chromosome; move your mouse over a gene to see its chromosome, position, value and name. The value represents the correlation between RNA expression and CNV. A positive value indicates that copy number gain/loss is present and further influences the expression of miRNA.
The details of the locus enrichment results are also provided on the right.

2.3.3. CNV table
The table comprises comprehensive CNV information, including GENE name, ensg, iGC_gain_p_value, iGC_gain_fdr, iGC_loss_p_value, iGC_loss_fdr, iGC_gain_sample_prop, iGC_normal_sample_prop, iGC_loss_sample_prop, iGC_gain_log2FC, iGC_loss_log2FC, diggit_spearman.pval, spearman_cor, spearman_pv.
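The spearman_cor column reports a Spearman correlation between CNV and RNA expression. As a reminder of what that statistic measures, here is a small self-contained Python sketch; it is an illustration only, and the helper names (`average_ranks`, `pearson`, `spearman`) and the toy numbers are assumptions, not part of DriverDBv3.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy per-sample values for one gene (illustrative only)
copy_number_signal = [-0.8, -0.1, 0.0, 0.4, 1.1]
expression = [2.0, 3.5, 3.1, 4.2, 6.0]
print(round(spearman(copy_number_signal, expression), 3))
```

Because the statistic works on ranks, it captures any monotonic relationship between copy number and expression, not only a linear one.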

2.4. Cancer Methylation
This section provides visualized information about the degree of hyper/hypomethylation of different genes in a selected cancer type. Using multiple bioinformatics tools, the graphs display the top 30 genes and present the methylation status as well as the details of locus enrichment. All of the following analyses can be adjusted by selecting one or two defining tools.

2.4.1. Visualization of Top 30 genes
This bar chart provides an overview of the methylation percentages for each of the top 30 genes. Green areas represent hypomethylation; pink areas represent hypermethylation; blue areas represent non-methylation. As always, moving the mouse over the bar of a gene shows the detailed methylation percentages in a tooltip. The following plot illustrates the relations between the top 30 genes and their methylation type in cancer patients for a certain cancer type. The x-axis represents samples of cancer patients; the y-axis lists the top 30 genes. The bar charts on the top and right show the total methylation occurrences by column (each sample) and by row (each gene), respectively. Green areas represent hypomethylation; pink areas represent hypermethylation.

2.4.2. Locus enrichment
This graph shows the loci of each gene within each chromosome. Each red dot represents a gene and its position on the chromosome; move your mouse over a gene to see its chromosome, position, value and name. The value represents the correlation between RNA expression and methylation status. A negative value indicates that hyper/hypomethylation is present and further influences the expression of miRNA. A table of pathways with detailed information on p-values and involved genes is also provided on the right.

2.4.3. Methylation table
Methylation-relevant factors are listed in the table, comprising gene_symbol, ensg, methylmix_hyper_percent, methylmix_none_percent, methylmix_hypo_percent, Probe, Distance, Pe, MET_type_by_ELMER, spearman_cor, spearman_pv.

2.5. Cancer Survival
The Cancer Survival function provides visualizations to illustrate the Survival network and the Survival of synergistic effect identified by bioinformatics tools in a specific cancer type.

2.5.1. Survival network
The Cancer Survival network illustrates the synergistic effect between significant survival-relevant genes. First, genes can be chosen from the CGC database, the NCG6.0 database, or both. Next, if the HRs of two individual genes are both >1 or both <1, the system calculates their synergistic effect. Conversely, if the HRs of two genes point in opposite directions, the pair is not included in the plot. Lastly, the synergistic effect is defined by the combined hazard ratio (HR) of two genes and has 2 levels: 1.5-fold and 2-fold. These levels mean that the combined hazard ratio is 1.5 times or twice the hazard ratio of a single gene.

2.5.2. Survival of Synergistic effect
A table of cancer types with gene pairs is provided for each direction of HR. Toggling the desired gene pairs in the table generates the corresponding survival plots on the right. On the right-hand side, two figures display the survival probability of the paired genes. The Log-Rank P-value and Hazard Ratio indicate the significance of the difference. The abbreviations are explained below.
All.high = patients with high expression in both gene 1 and gene 2
Other = patients with other combinations
All.low = patients with low expression in both gene 1 and gene 2
Low.high = patients with low expression in gene 1 but high expression in gene 2
High.low = patients with high expression in gene 1 but low expression in gene 2
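
The group labels above can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: patients are split at the median expression of each gene, values equal to the median count as low, and the function names `stratify_pairs` and `versus_other` are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def stratify_pairs(expr1: Sequence[float], expr2: Sequence[float]) -> List[str]:
    """Label each patient by the expression of gene 1 and gene 2.

    Assumption: "high" means strictly above the cohort median of that gene.
    """
    m1, m2 = median(expr1), median(expr2)
    labels = []
    for a, b in zip(expr1, expr2):
        high1, high2 = a > m1, b > m2
        if high1 and high2:
            labels.append("All.high")
        elif not high1 and not high2:
            labels.append("All.low")
        elif high1:
            labels.append("High.low")
        else:
            labels.append("Low.high")
    return labels


def versus_other(labels: Sequence[str], target: str) -> List[str]:
    """Collapse every label except `target` into "Other"."""
    return [lab if lab == target else "Other" for lab in labels]


# Toy expression values for four patients (illustrative only)
gene1 = [1.0, 5.0, 2.0, 6.0]
gene2 = [1.0, 6.0, 7.0, 2.0]
labels = stratify_pairs(gene1, gene2)
print(labels)
print(versus_other(labels, "All.high"))
```

The second helper mirrors the "Other" group: a plot that compares All.high against everyone else only needs two labels.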

2.6. Cancer miRNA
The Cancer miRNA function provides visualizations to illustrate the Gene-miRNA network and the differentially expressed genes and miRNAs identified by bioinformatics tools in a specific cancer type.

2.6.1. Cancer miRNA network
The Cancer miRNA network illustrates the relations between the genes and miRNAs of a user-selected cancer type. Validated relations are represented by solid lines, whereas predicted relations are represented by dotted lines. Green nodes represent genes; yellow nodes represent miRNAs. Predicted relations can be further filtered by a minimum of 6, 8, or 10 tools.

  • ‘Validated relations’ are validated miRNA-target interactions recorded in miRTarBase (1).
  • ‘Predicted relations’ are the interactions defined by 12 prediction tools, including DIANA-microT (2), MicroT4 (3), miRBridge (4), miRDB (5), miRMap (6), PITA (7), RNAhybrid (8), TargetScan (9), PICTAR2 (10), RNA22 (11), miRWalk (12) and miRanda (13).
Reference:
(1) Chou C.H., Chang N.W., Shrestha S., Hsu S.D., Lin Y.L., Lee W.H., Yang C.D., Hong H.C., Wei T.Y., Tu S.J., et al. miRTarBase 2016: updates to the experimentally validated miRNA-target interactions database. Nucleic Acids Res. 2016;44:D239–D247.
(2) Paraskevopoulou M.D., Georgakilas G., Kostoulas N., Vlachos I.S., Vergoulis T., Reczko M., Filippidis C., Dalamagas T., Hatzigeorgiou A.G. DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows. Nucleic Acids Res. 2013;41:W169–W173.
(3) Maragkakis M., Vergoulis T., Alexiou P., Reczko M., Plomaritou K., Gousis M., Kourtis K., Koziris N., Dalamagas T., Hatzigeorgiou A.G. DIANA-microT Web server upgrade supports Fly and Worm miRNA target prediction and bibliographic miRNA to disease association. Nucleic Acids Res. 2011;39:W145–W148.
(4) Tsang J.S., Ebert M.S., van Oudenaarden A. Genome-wide dissection of microRNA functions and cotargeting networks using gene set signatures. Mol. Cell. 2010;38:140–153.
(5) Wong N., Wang X. miRDB: an online resource for microRNA target prediction and functional annotations. Nucleic Acids Res. 2014;43:D146–D152.
(6) Vejnar C.E., Zdobnov E.M. miRmap: Comprehensive prediction of microRNA target repression strength. Nucleic Acids Res. 2012;40:11673–11683.
(7) Kertesz M., Iovino N., Unnerstall U., Gaul U., Segal E. The role of site accessibility in microRNA target recognition. Nat. Genet. 2007;39:1278–1284.
(8) Kruger J., Rehmsmeier M. RNAhybrid: microRNA target prediction easy, fast and flexible. Nucleic Acids Res. 2006;34:W451–W454.
(9) Shin C., Nam J.W., Farh K.K., Chiang H.R., Shkumatava A., Bartel D.P. Expanding the microRNA targeting code: functional sites with centered pairing. Mol. Cell. 2010;38:789–802.
(10) Shin C., Nam J.W., Farh K.K., Chiang H.R., Shkumatava A., Bartel D.P. Expanding the microRNA targeting code: functional sites with centered pairing. Mol. Cell. 2010;38:789–802.
(11) Miranda K.C., Huynh T., Tay Y., Ang Y.S., Tam W.L., Thomson A.M., Lim B., Rigoutsos I. A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell. 2006;126:1203–1217.
(12) Miranda K.C., Huynh T., Tay Y., Ang Y.S., Tam W.L., Thomson A.M., Lim B., Rigoutsos I. A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell. 2006;126:1203–1217.
(13) Enright A.J., John B., Gaul U., Tuschl T., Sander C., Marks D.S. MicroRNA targets in Drosophila. Genome Biol. 2003;5:R1.
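
The filter on predicted relations in the Cancer miRNA network (a minimum of 6, 8, or 10 tools) can be sketched as follows. This is a hypothetical illustration: the `Relation` record, its field names, the placeholder identifiers and the choice to always keep validated relations are assumptions, not the database's actual schema or logic.

```python
from typing import List, NamedTuple


class Relation(NamedTuple):
    mirna: str        # placeholder miRNA identifier
    gene: str         # placeholder gene symbol
    validated: bool   # True if the interaction is experimentally validated
    n_tools: int      # number of prediction tools (out of 12) that report it


def filter_relations(relations: List[Relation], min_tools: int = 6) -> List[Relation]:
    """Keep validated relations and predictions supported by >= min_tools tools.

    Assumption: validated relations are never removed by the tool-count filter.
    """
    return [r for r in relations if r.validated or r.n_tools >= min_tools]


# Placeholder edges (illustrative only, not real interactions)
edges = [
    Relation("miR-A", "GENE1", True, 3),
    Relation("miR-A", "GENE2", False, 7),
    Relation("miR-B", "GENE1", False, 9),
    Relation("miR-B", "GENE3", False, 5),
]
for threshold in (6, 8, 10):
    kept = filter_relations(edges, threshold)
    print(threshold, [(r.mirna, r.gene) for r in kept])
```

Raising the threshold trades coverage for confidence: fewer predicted edges remain, but each is supported by more tools.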

2.6.2. Visualization of differentially expressed gene and miRNA
This section visualizes the association between differentially expressed (DE) genes or miRNAs and the individual samples from patients and healthy individuals. The heatmap in the right column can show DE genes on a per-sample basis, DE miRNAs on a per-sample basis, or both. This can be toggled in the ‘Visualization by’ column. Each position on the X-axis of the heatmap represents a different individual. The intensity of the red/blue colors represents the expression level. The top and right dendrograms indicate the similarity of symbols or patients. The sample bar below the dendrogram carries the sample ID. “TP” means tumor sample, shown in dark blue. “NT” means normal sample, shown in light blue.

2.6.3. Gene - miRNA table
More analysis results are shown in the table, including Cancer type abbreviation, miRNA, Gene symbol, ENSG, Validated, Number of tools, Pearson correlation coefficient, Pearson p-value, Spearman correlation coefficient, Spearman p-value, Kendall correlation coefficient, Kendall p-value.

3. Section Two - Gene

In this section, researchers can visualize a specific gene, including the mutation data for the protein it encodes, in seven aspects: Summary, Expression, Mutation (including Hotspot), CNV, Methylation, Survival and miRNA.

Gene Search
Search for a gene by entering its HGNC gene symbol or Ensembl ID.

3.1. Gene Summary
The Gene Summary provides a visualization of different features of a user-selected gene in relation to the cancer types. The features are Differential expression (DE), Mutation, Copy Number Variation (CNV), Methylation, Survival and miRNA.

The Differential expression (DE) squares indicate whether a gene is significantly differentially expressed (p-value < 0.05). Red represents up-regulated genes with log2(fold change) > 1 and green represents down-regulated genes with log2(fold change) < -1.

The Mutation squares indicate the number of mutation tools which identify this gene as a mutation driver. As the number of tools goes from low to high, the blue color goes from light to deep, correspondingly.

The Copy Number Variation (CNV) squares indicate CNV gain or loss of a gene. Red indicates gains with respect to the reference genome (1) and the green represents a loss (-1).

The Methylation squares indicate whether the gene is hypermethylated (1) or hypomethylated (-1). Red represents hyper and green represents hypo.

The Survival squares indicate whether the gene is survival-relevant with log-rank p-value < 0.05. Red represents an oncogene with log2(hazard ratio) > 0 and green represents a tumor suppressor gene with log2(hazard ratio) < 0.

The miRNA squares indicate the number of miRNAs that interact with this gene. As the number of miRNAs goes from low to high, the orange color goes from light to deep, correspondingly.

Squares for results that are not significant in the analysis are colored in grey; squares for cancer types with no data are colored in white.
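
The color rules for the DE and Survival squares can be written down as a small Python sketch. It is illustrative only: the function names are hypothetical, missing data is represented by None, and treating a significant gene with |log2(fold change)| <= 1 as grey is an assumption, since the text does not state that case.

```python
from typing import Optional


def de_square_color(p_value: Optional[float], log2_fc: Optional[float]) -> str:
    """Color of a Differential expression square."""
    if p_value is None or log2_fc is None:
        return "white"          # no data for this cancer type
    if p_value < 0.05 and log2_fc > 1:
        return "red"            # up-regulated
    if p_value < 0.05 and log2_fc < -1:
        return "green"          # down-regulated
    return "grey"               # not significant (assumed for small fold changes too)


def survival_square_color(logrank_p: Optional[float], log2_hr: Optional[float]) -> str:
    """Color of a Survival square."""
    if logrank_p is None or log2_hr is None:
        return "white"
    if logrank_p >= 0.05:
        return "grey"
    if log2_hr > 0:
        return "red"            # higher hazard, oncogene-like
    if log2_hr < 0:
        return "green"          # lower hazard, tumor-suppressor-like
    return "grey"


print(de_square_color(0.01, 2.3))         # red
print(de_square_color(0.20, 2.3))         # grey
print(survival_square_color(0.03, -0.7))  # green
print(survival_square_color(None, None))  # white
```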

3.2. Gene Expression
The expression profiles of the gene across cancer types, by sample type and by mutation class, are illustrated by boxplots.

3.2.1. Visualization of all cancer types
The first boxplot shows the distribution of expression in all cancer types for a user-selected gene by sample type. Legends of the sample types are shown on the right-hand side of the boxplots. The boxplots present all sample types by default. To hide certain types from the boxplots, click on the legends of the sample types to unselect the types. Hover the mouse over the quantile boxes to view gene names and the IQR (interquartile range) of the expression level.
The second boxplot shows the expression for each mutation class (such as truncating and in-frame mutations) in various cancer types. Legends of the mutation classes are shown on the right-hand side of the boxplots. The boxplots present all mutation classes by default. To hide certain classes from the boxplots, click on the legends to unselect the classes. Hover the mouse over the quantile boxes to view gene names and the IQR (interquartile range) of the expression level.

3.2.2. Visualization of each cancer type
This plot provides a more detailed view of the distribution of expression within a single cancer type. Select the desired cancer type in the table on the left (D) to generate the corresponding expression boxplots (B) on the right. The expression can be grouped in 2 ways, by sample type or by mutation class, based on user selection (A), and a table of the corresponding p-values is generated below (C). The p-value for each pair of groups (group1 versus group2) shows whether there is a significant difference between the mutation classes or sample types. With ‘By sample type’, depending on the cancer type, the expression is grouped by one or more sample types, including Additional Metastatic, Additional New Primary, Metastatic, Primary Blood-Derived Cancer, Primary Solid Tumor, Recurrent Solid Tumor, and Solid Tissue Normal. With ‘By mutation class’, depending on the cancer type, the expression is grouped by one or more mutation classes, including HIGH, LOW, MODERATE, MODIFIER, Normal Tissue, and Tumors without Mutation. Hover the mouse over the boxplot to view the detailed expression level within each sample type or mutation class.

3.2.3. Gene Mutation
The Gene Mutation section provides a visualization of the Mutation Rate, Mutation Percentage, Exon, and Hotspot within a user-selected gene in relation to multiple cancer types.
Users can obtain separate results by switching the data source and visualization method.

3.2.3.1. Mutation rate
3.2.3.1.1. Visualization of all cancer types
This heat map indicates the mutation rate of a given protein at different positions in several cancer types (mutation rate = mutation count / sample count). Exon and protein domain information, along with protein coordinates, is shown at the bottom of the heat map. Two bar charts located at the top and the left of the heat map indicate the sum of the mutation rate according to protein position and cancer type, respectively.
A: The protein coordinate: the length of the protein is divided into 20 equal regions.
B: The ranks of exons of a specific protein are denoted by distinct colors
C: The region of protein domains in the protein coordinate.
D: The heatmap indicates the mutation rates/percentage of 20 regions across various cancer types
E: The bar chart of mutation rates/percentage across protein regions.
F: The bar chart of mutation rates/percentage across cancer types.
* Colors in E and F indicate the functional impact of mutations, such as non-synonymous and frameshift.
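The mutation rate per protein region (mutation count / sample count, with the protein divided into 20 equal regions) can be sketched in Python as below. The function names, the 1-based amino-acid positions and the integer binning rule are assumptions made for illustration; the database may bin positions differently.

```python
from typing import List, Sequence


def region_index(position: int, protein_length: int, n_regions: int = 20) -> int:
    """Map a 1-based protein position to one of n_regions equal-width bins."""
    idx = (position - 1) * n_regions // protein_length
    return min(idx, n_regions - 1)


def mutation_rates(
    positions: Sequence[int],
    protein_length: int,
    sample_count: int,
    n_regions: int = 20,
) -> List[float]:
    """Mutation rate per region = mutation count in the region / sample count."""
    counts = [0] * n_regions
    for pos in positions:
        counts[region_index(pos, protein_length, n_regions)] += 1
    return [c / sample_count for c in counts]


# Toy example: a 400-residue protein, 10 samples, four observed mutations
rates = mutation_rates([1, 20, 21, 400], protein_length=400, sample_count=10)
print(rates[0], rates[1], rates[19])
```

Summing such a vector across cancer types, or across regions, corresponds to the two marginal bar charts (E and F) in the rate view.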
3.2.3.1.2. Visualization of a single cancer type
This bar chart (B) represents the mutation rate of the gene in a single cancer type selected by the user from the left panel (A). The colors of the bars indicate the impact of the mutation, which has three levels: red for high, blue for moderate, and green for low (D). The Y-axis presents the rate of mutation. The X-axis shows the protein position of the mutation. More details are shown in the tooltips when moving the mouse over the bars (C).

3.2.3.2. Mutation_percent
This heat map indicates the mutation percentage of a given protein at different positions in several cancer types (percentage of mutation = mutation count / total mutation count). The heights of the two bar charts at the left and the top of the heat map are normalized to the mutation count of a cancer type or a protein region, respectively. Other operations are the same as for Mutation_rate; please see section 3.2.3.1.1 above.

3.2.3.3. Exon
3.2.3.3.1. Visualization of all cancer types
The top of the plot shows the domains (C) and the exon positions (D) of the user-selected protein. The bar charts below show the mutation count (A) and mutation percentage (B) across exons for all cancer types. Blue indicates mutations of moderate impact, green indicates low impact, and red indicates high impact.

3.2.3.3.2. Visualization of each cancer type
This bar chart (B) represents the mutation statistics across exons within the cancer type selected by the user from the left panel (A). Select the desired cancer type in the table to generate the corresponding bar chart on the right. The statistics can be visualized in 2 ways, by mutation count or by mutation percentage, based on user selection. The colors of the bars indicate the impact of the mutation, which has three levels: red for high, blue for moderate, and green for low (D). The Y-axis presents the rate of mutation. The X-axis shows the protein position of the mutation. More details are shown in the tooltips when moving the mouse over the bars (C).

3.2.3.4. Hotspot
3.2.3.4.1. Visualization of all cancer types
This heatmap shows the regions of the protein that are identified as hotspot mutation regions (HMRs) across several cancer types. The X-axis of the heatmap represents the positions along the chosen protein. The intensity of the yellow/purple colors represents the frequency of mutation. The top graph indicates the cumulative count of mutations at each position. More details, including the number of bioinformatic tools that identify the mutation, are shown in the tooltips when moving the mouse over the heatmap.
A: The cumulative counts for the regions identified as HMRs.
B: The regions of the protein identified as HMRs across different cancer types.
C: Domain information with protein coordinates
D: Each color represents an exon and the coordinate represents the location of the exons
3.2.3.4.2. Visualization of each cancer type
This bar chart provides a more detailed view of the HMRs identified by four methods in a single, user-chosen cancer type. Select the desired cancer type in the table to generate the corresponding bar chart on the right. The bioinformatic tools used to identify the HMRs are oncodriveCLUST, iPAC, MSEA, and E-driver. As always, moving the mouse over a square shows more information about the HMRs.

3.3. Gene CNV
The Gene CNV section provides a visualization of the copy number variation within a user-selected gene in relation to multiple cancer types.

3.3.1. Copy number variation in all cancer
This graph shows the copy number variation (CNV) of a user-selected gene in several cancer types. Each gene is analyzed by two CNV tools: iGC (1) and DIGGIT (2). The sample proportion is shown in the plot when the gene is identified in that cancer by one or both of the tools. On top of the bar chart, CNV Driver (A) presents the number of tools that identified the CNV in the specific cancer types and whether it is significant. If the CNV is identified only by iGC, the CNV driver is shown in light grey, while dark grey means that it is identified by both tools. The bottom part of the chart indicates the copy number loss/gain status (B). Green represents copy number loss; red represents copy number gain; blue represents no copy number change. More details related to analyses and tools can be found in the tooltips when moving the mouse over the bars (C).
Reference:
(1) Lai, Y.P., Wang, L.B., Wang, W.A., Lai, L.C., Tsai, M.H., Lu, T.P. and Chuang, E.Y. (2017) iGC-an integrated analysis package of gene expression and copy number alteration. BMC Bioinformatics, 18, 35.
(2) Alvarez, M.J., Chen, J.C. and Califano, A. (2015) DIGGIT: a Bioconductor package to infer genetic variants driving cellular phenotypes. Bioinformatics, 31, 4032-4034.
3.3.2. Visualization of each cancer type
This graph combines a scatter plot and boxplots to give a more detailed view of the CNV distribution and correlation in each cancer type. Select the desired cancer type (A) in the table to generate the corresponding graph on the right (B). The visualization shows whether the expression differs when there is a CNV gain or loss (C). The copy number values are transformed into segment mean values (D), which are equal to log2(copy-number / 2). Diploid regions have a segment mean close to zero (between -0.3 and 0.3), amplified regions have positive values (> 0.3), and deletions have negative values (< -0.3).

3.3.3. CNV Table
The gene is analyzed by two CNV tools, iGC and DIGGIT, for different cancer types/datasets. The CNV table provides summarized information, including Cancer type abbreviation, Gene symbol, ENSG, and statistical data: (1) iGC - Gain p-value, Gain FDR, Loss p-value, Loss FDR, Gain sample proportion, Normal sample proportion, Loss sample proportion, Gain log2FC, Loss log2FC, (2) DIGGIT - p-value, (3) Segment mean vs. expression - Spearman correlation coefficient, Spearman p-value.
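
The segment mean transformation and the ±0.3 thresholds described in 3.3.2 can be checked with a few lines of Python. The function names and the labels "gain", "loss" and "diploid" are illustrative assumptions; only the formula log2(copy-number / 2) and the cutoffs come from the text.

```python
import math


def segment_mean(copy_number: float) -> float:
    """Segment mean = log2(copy_number / 2); copy_number must be > 0."""
    return math.log2(copy_number / 2)


def cnv_status(seg_mean: float, threshold: float = 0.3) -> str:
    """Classify a segment mean with the +/-0.3 cutoffs described in 3.3.2."""
    if seg_mean > threshold:
        return "gain"
    if seg_mean < -threshold:
        return "loss"
    return "diploid"


for cn in (1, 2, 2.2, 4):
    sm = segment_mean(cn)
    print(cn, round(sm, 3), cnv_status(sm))
```

A diploid copy number of 2 maps to exactly 0, one copy maps to -1 and four copies map to 1, so the ±0.3 band tolerates small measurement noise around the diploid state.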

3.4. Gene Methylation
3.4.1. Methylation status in all cancer
This graph shows the methylation status of a user-selected gene in several cancer types. Each gene is analyzed by two methylation tools: MethylMix (1) and ELMER (2). The sample proportion is shown in the plot when the gene is identified in that cancer by one or both of the tools. On top of the bar chart, Methylation Driver presents the number of tools that identified the hyper/hypomethylation in the specific cancer types and whether it is significant. Dark grey means that the methylation driver is identified by both tools; light grey means that it is identified only by MethylMix. The bottom of the chart indicates the methylation status. Green represents hypomethylation; red represents hypermethylation; blue represents no methylation. More details related to analyses and tools can be found in the tooltips when moving the mouse over the bars.
Reference:
(1) Cedoz, P.L., Prunello, M., Brennan, K. and Gevaert, O. (2018) MethylMix 2.0: an R package for identifying DNA methylation genes. Bioinformatics, 34, 3044-3046.
(2) Silva, T.C., Coetzee, S.G., Gull, N., Yao, L., Hazelett, D.J., Noushmehr, H., Lin, D.C. and Berman, B.P. (2019) ELMER v.2: an R/Bioconductor package to reconstruct gene regulatory networks from DNA methylation and transcriptome profiles. Bioinformatics, 35, 1974-1977.
3.4.2. Visualization of each cancer type
This graph combines a scatter plot and boxplots to give a more detailed view of the methylation distribution and correlation in each cancer type. Select the desired cancer type in the table (A) to generate the corresponding graph on the right (B). Beta values (β) are the ratio of intensities between methylated and unmethylated alleles (D). β ranges from zero to one, with 0 being unmethylated and 1 fully methylated.
As always, the tooltips provide information such as the maximum, minimum, median, q1 and q3 when moving the mouse over the plot (C).

3.4.3. Methylation Table
The gene is analyzed by two methylation tools, MethylMix and ELMER, for different cancer types/datasets. The table summarizes the results, including Cancer type abbreviation, Gene symbol, ENSG, and statistical data: (1) MethylMix - Hyper percent, Hypo percent, None percent, (2) ELMER - Probe, Distance, Adjusted p-value, Methylation type, (3) Beta-value vs. expression - Spearman correlation coefficient, Spearman p-value.

3.5. Gene Survival
The Gene Survival function provides visualizations to illustrate how a user-selected gene relates to survival probability across multiple cancer types.

3.5.1. Single Gene Survival
This table provides detailed survival information for a single gene and also serves as a control panel to produce the Kaplan-Meier plot. Select the desired cancer type in the table to generate the corresponding Kaplan-Meier plot below. Cancer types are abbreviated according to TCGA, as shown below. We also provide 4 types of survival endpoint: OS (overall survival), DFI (disease-free interval), PFI (progression-free interval), and DSS (disease-specific survival). The cutoff value is the stratification threshold, which can be either the mean or the median, for analyses and plots.

3.5.2. Kaplan-Meier plot
After a cancer type is selected in the table above, the patient samples are stratified into 2 groups: samples with high expression of the gene (red) and samples with low expression of the gene (green). The Y-axis is the survival probability. The X-axis is the survival period in months. The Log-Rank P-value and Hazard Ratio are provided at the top of the plots.

3.5.3. Synergistic effect
The Gene Survival network illustrates the synergistic effect of a user-selected gene and its related genes. The synergistic effect is defined by the HR of the user-selected gene and its related genes. If the HR of the gene pair is more than 1.5-fold the HR of each single gene, the two genes are identified as having a synergistic effect. Synergistic pairs are further divided into 2 levels, 1.5-fold and 2-fold, each with a positive and a negative direction of HR. Toggle the options within Gene databases, Synergistic effect, and Direction of HR to filter the network based on the selected conditions. To search for the network of a specific gene, type the gene name in the search box in the top left corner.

3.5.4. Survival of Synergistic effect
This section lists all gene pairs (a user-selected gene and its related genes) whose HR fold change is greater than that of each single gene, in both directions, although only pairs with a fold change above 1.5 are considered to have a synergistic effect. For each gene pair, the patients are stratified into two groups based on the median expression. The groups of patients are then compared on their 5-year overall survival (OS) probability over months, as shown in the plots.
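To make the median stratification and the 5-year survival comparison concrete, here is a minimal sketch of a Kaplan-Meier estimate written from the textbook definition. It is not DriverDBv3's implementation; the function names, the handling of ties at the cutoff (assigned to the low group), and the toy data are assumptions.

```python
from statistics import median


def kaplan_meier(times, events, horizon=60.0):
    """Return the [(time, survival)] steps of the Kaplan-Meier estimate.

    times  : follow-up in months
    events : 1 if the event (e.g. death) was observed, 0 if censored
    horizon: follow-up is truncated here (60 months = 5 years)
    """
    # Patients still under observation at the horizon are censored there.
    data = sorted(
        (min(t, horizon), e if t <= horizon else 0) for t, e in zip(times, events)
    )
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        leaving = sum(1 for tt, _ in data if tt == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
        i += leaving
    return curve


def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the median expression."""
    cutoff = median(expression)
    return ["high" if x > cutoff else "low" for x in expression]


# Hypothetical cohort of six patients.
expression = [2.1, 8.4, 3.3, 9.0, 1.7, 7.5]
months = [10, 50, 14, 70, 8, 30]
died = [1, 0, 1, 1, 1, 1]
groups = split_by_median(expression)
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    print(label, kaplan_meier([months[i] for i in idx], [died[i] for i in idx]))
```

Each returned step is the estimated probability of surviving past that month; plotting the steps of both groups gives the two curves of a Kaplan-Meier plot.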
A table of cancer types with gene pairs is provided for each direction of HR (>1 or <1). Select the desired gene pair in the table to generate the corresponding survival plots on the right. Patients are stratified into groups for comparison:
All.high (purple) = patients with high expression in both gene 1 and gene 2
All.low (red) = patients with low expression in both gene 1 and gene 2
Low.high (green) = patients with low expression in gene 1 but high expression in gene 2
High.low (blue) = patients with high expression in gene 1 but low expression in gene 2
Other = patients with any combination other than All.high
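The group labels listed above can be derived from two per-gene cutoffs. The following sketch assumes the median as cutoff and assigns values equal to the cutoff to the low side; the function name and the toy expression values are hypothetical.

```python
from statistics import median


def pair_group(expr1: float, expr2: float, cut1: float, cut2: float) -> str:
    """Name the group of one patient from the expression of two genes."""
    high1, high2 = expr1 > cut1, expr2 > cut2
    if high1 and high2:
        return "All.high"
    if not high1 and not high2:
        return "All.low"
    return "High.low" if high1 else "Low.high"


# Hypothetical expression of gene 1 and gene 2 in four patients.
gene1 = [5.0, 1.0, 6.0, 2.0]
gene2 = [7.0, 2.0, 1.5, 8.0]
cut1, cut2 = median(gene1), median(gene2)
groups = [pair_group(a, b, cut1, cut2) for a, b in zip(gene1, gene2)]
print(groups)  # ['All.high', 'All.low', 'High.low', 'Low.high']

# The two-group comparison keeps All.high and merges everything else.
print(["All.high" if g == "All.high" else "Other" for g in groups])
```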
As always, moving the mouse over a survival line shows the months and survival probabilities in tooltips. The hazard ratio and Log-Rank P value are also provided at the top of the graphs.
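The synergy rule of section 3.5.3 compares the HR of the gene pair with the HR of each single gene. One possible reading of that rule is sketched below. How the fold change is measured in the negative direction (HR < 1) is not spelled out in this guide, so the inversion used here is an assumption, as are the function name and the example values.

```python
def synergy_level(hr_pair: float, hr_gene1: float, hr_gene2: float):
    """Return (direction, level) or None when the pair shows no synergy.

    direction: "positive" if the pair HR is above 1, "negative" if below 1
    level    : 2.0 or 1.5, the largest fold threshold the pair exceeds
    """
    if min(hr_pair, hr_gene1, hr_gene2) <= 0:
        raise ValueError("hazard ratios must be positive")
    if hr_pair > 1:
        direction = "positive"
        fold = min(hr_pair / hr_gene1, hr_pair / hr_gene2)
    elif hr_pair < 1:
        # Assumption: for protective pairs the fold change is inverted.
        direction = "negative"
        fold = min(hr_gene1 / hr_pair, hr_gene2 / hr_pair)
    else:
        return None
    for level in (2.0, 1.5):
        if fold > level:
            return direction, level
    return None


print(synergy_level(3.2, 1.4, 1.5))  # ('positive', 2.0)
print(synergy_level(2.4, 1.4, 1.5))  # ('positive', 1.5)
print(synergy_level(1.6, 1.4, 1.5))  # None
```

Taking the minimum over both single-gene comparisons enforces that the pair exceeds the threshold relative to each gene, not just one of them.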

3.6. Gene miRNA
The Gene miRNA function provides visualizations to illustrate the relations between a user-selected gene and miRNAs across multiple cancer types.

3.6.1. miRNA-gene network
The Gene-miRNA network illustrates the relations between a user-selected gene and miRNAs across multiple cancer types. The interactions are defined either by 12 prediction tools, namely DIANA-microT (2), MicroT4 (3), miRBridge (4), miRDB (5), miRMap (6), PITA (7), RNAhybrid (8), TargetScan (9), PICTAR2 (10), RNA22 (11), miRWalk (12) and miRanda (13), or by experimental validations recorded in miRTarBase (http://mirtarbase.mbc.nctu.edu.tw/php/index.php). We incorporated the data from our previous database, YM500v3 (9), into DriverDBv3 to establish the relations with negative correlation coefficients between driver genes and miRNAs. When the validated option is selected, the experimentally validated network is shown in the plot. Validated relations are represented by solid lines; predicted relations are represented by dotted lines. The width of a line indicates the number of cancers supporting the relation between the user-selected gene and the miRNA. Predicted relations can be further filtered by a minimum of 6, 8, or 10 tools. A gene name can be entered directly in the search bar on the left. Clicking on a node shows the network related to the selected gene individually. Green nodes represent genes; yellow nodes represent miRNAs.

3.6.2. Gene-miRNA table
The Gene-miRNA table provides detailed analytic information comprising Cancer type abbreviation, miRNA, Gene symbol, ENSG, Validated, Number of tools, Pearson correlation coefficient, Pearson p-value, Spearman correlation coefficient, Spearman p-value, Kendall correlation coefficient, and Kendall p-value.
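The network filters described in 3.6.1 can be pictured as a simple selection over rows of the Gene-miRNA table. The sketch below is only an illustration: the record fields, the placeholder miRNA and gene names, and the choice to keep validated relations regardless of the tool count are assumptions, not the behaviour of the website.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    mirna: str
    gene: str
    validated: bool     # experimentally validated relation
    n_tools: int        # number of prediction tools supporting the pair
    spearman_r: float   # miRNA vs. gene expression correlation


def filter_network(edges, min_tools=6, validated_only=False):
    """Keep negatively correlated edges that pass the selected filter."""
    kept = []
    for e in edges:
        if e.spearman_r >= 0:
            continue  # only negative correlations are retained
        if validated_only:
            if e.validated:
                kept.append(e)
        elif e.validated or e.n_tools >= min_tools:
            # Assumption: validated edges are kept whatever the tool count.
            kept.append(e)
    return kept


def line_style(edge):
    """Solid lines for validated relations, dotted lines for predicted ones."""
    return "solid" if edge.validated else "dotted"


# Placeholder records; names and numbers are made up.
edges = [
    Interaction("miR-A", "GENE1", True, 3, -0.42),
    Interaction("miR-B", "GENE1", False, 9, -0.31),
    Interaction("miR-C", "GENE1", False, 6, -0.12),
    Interaction("miR-D", "GENE1", False, 11, 0.20),
]
for e in filter_network(edges, min_tools=8):
    print(e.mirna, line_style(e))
```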

4. Section Three - Customized Analysis

This function helps researchers investigate and select well-defined cancer samples based on one or multiple clinical criteria. The selection panel for user-defined samples allows filtering by genes, dataset and clinical criteria. The specific group of patients can then be analyzed by ‘Driver Gene Identification’ or by ‘Survival Analysis’.

4.1. Survival Analysis
Survival Analysis allows researchers to investigate co-occurring events affecting patient survival by entering one or more targets and defining a subgroup of specific patients. We offer two analysis methods: Survival analysis according to MUTATION and Survival analysis according to EXPRESSION. In addition, we provide three stratification methods (All.high vs. Others; High vs. Low; By num. of high) in Survival by EXPRESSION and two (Mutation vs. Wild type; By num. of mutant gene) in Survival by MUTATION. Users can also select among four survival endpoints, overall survival (OS), progression-free interval (PFI), disease-free interval (DFI) and disease-specific survival (DSS), and other critical factors related to survival analysis.

4.1.1. Survival analysis according to MUTATION
In this section, a series of customized factors can be chosen to produce survival analysis results according to MUTATION of single/multiple gene(s) in user-defined samples.
First, type up to 5 genes in the left column. Pressing Check removes invalid genes, so that only valid genes are analyzed. Then click NEXT to select a dataset from the drop-down menu. Click NEXT again to select clinical criteria from the drop-down menu. After the criteria are chosen, advanced options appear in the right columns. Users may then click on the options to filter samples (the selection must include at least 20 patients for a successful calculation). More criteria can be added by clicking More Criteria. After the samples are selected, click NEXT to enter the result page. After submitting the final request, users will receive a notification email with a Result ID. This allows users to explore the results of ‘Customized-analysis’ in the ‘Result and Download’ section when the calculation is completed.

4.1.1.1. Results Page Navigation
This is the result page of Survival analysis by MUTATION. Based on the input and the selected criteria, we offer a series of survival analyses, such as Mutation OncoPrint, Survival analysis, Survival table, and Kaplan-Meier plot.
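The limits mentioned in the submission steps (at most 5 genes, at least 20 selected patients) can be illustrated with a small sketch of sample selection by clinical criteria. The field names, function names and patient records are hypothetical and do not describe the server-side code.

```python
MAX_GENES = 5
MIN_PATIENTS = 20


def select_patients(patients, criteria):
    """Return patients whose clinical fields match every selected criterion.

    patients: list of dicts, e.g. {"id": 1, "gender": "female", "stage": "II"}
    criteria: dict mapping a clinical field to the set of accepted values
    """
    return [
        p for p in patients
        if all(p.get(field) in accepted for field, accepted in criteria.items())
    ]


def check_request(genes, selected):
    """Raise ValueError if the request breaks the limits stated in the guide."""
    if not 1 <= len(genes) <= MAX_GENES:
        raise ValueError(f"enter between 1 and {MAX_GENES} genes")
    if len(selected) < MIN_PATIENTS:
        raise ValueError(
            f"only {len(selected)} patients selected; need {MIN_PATIENTS}"
        )


# Hypothetical cohort: 50 patients, alternating gender, all stage II.
patients = [
    {"id": i, "gender": "female" if i % 2 else "male", "stage": "II"}
    for i in range(50)
]
selected = select_patients(patients, {"gender": {"female"}})
check_request(["GENE1", "GENE2"], selected)  # passes: 25 patients selected
print(len(selected))  # 25
```

Adding a further criterion narrows the selection, which is why a combination of filters can drop below the 20-patient minimum.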
The Mutation OncoPrint shows the co-occurrence status of the user-assigned genes. High-impact mutations are colored in red; moderate-impact mutations are colored in blue. Area A, which represents samples with mutations in both genes, appears leftmost in the bar. Areas B and C display the proportion of samples that have only one of the mutations. Area D gives the summary of co-occurrence. Area E shows the percentage of each impact level for each gene.
The Survival analysis section provides two stratification methods, Mutation vs. Wild type and By num. of mutant gene, as well as two intervals, 5 years and All. In the final two columns of the report, four categories of clinical endpoints (OS, PFI, DFI and DSS) are also presented in the Survival table and Kaplan-Meier plot. Note that the Kaplan-Meier plot cannot be generated if not enough samples/events were selected for the analysis (either ‘sample’ or ‘events’ does not exceed 0).
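The two mutation-based stratification methods and the co-occurrence counts behind the OncoPrint areas can be sketched as follows. This is an illustration under stated assumptions: with several genes, a patient is treated as "Mutation" when any selected gene is mutated, and the function names and sample records are hypothetical.

```python
def stratify_by_mutation(mutated, method="mutation_vs_wildtype"):
    """Assign one patient to a survival group.

    mutated: dict gene -> bool (True if the patient carries a mutation)
    """
    n_mutant = sum(mutated.values())
    if method == "mutation_vs_wildtype":
        # Assumption: one mutated gene is enough to count as "Mutation".
        return "Mutation" if n_mutant > 0 else "Wild type"
    if method == "by_num_of_mutant_gene":
        return f"{n_mutant} mutant gene(s)"
    raise ValueError(f"unknown method: {method}")


def co_occurrence(samples, gene1, gene2):
    """Count samples mutated in both genes, in only one, or in neither."""
    counts = {"both": 0, f"only {gene1}": 0, f"only {gene2}": 0, "neither": 0}
    for s in samples:
        if s[gene1] and s[gene2]:
            counts["both"] += 1
        elif s[gene1]:
            counts[f"only {gene1}"] += 1
        elif s[gene2]:
            counts[f"only {gene2}"] += 1
        else:
            counts["neither"] += 1
    return counts


# Hypothetical mutation status of two genes in five samples.
samples = [
    {"GENE1": True, "GENE2": True},
    {"GENE1": True, "GENE2": False},
    {"GENE1": False, "GENE2": False},
    {"GENE1": False, "GENE2": True},
    {"GENE1": True, "GENE2": False},
]
print(co_occurrence(samples, "GENE1", "GENE2"))
print([stratify_by_mutation(s, "by_num_of_mutant_gene") for s in samples])
```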
4.1.2. Survival analysis according to EXPRESSION
With Survival analysis by expression, users may investigate co-expression events affecting patient survival by entering more than one target and defining a subgroup of specific patients. When more than one gene is selected, three stratification methods are available (All high vs others, High vs low, and Num. of high). For details of the operation, please refer to section 4.1.1 Survival analysis according to MUTATION.

4.1.2.1. Results Page Navigation
This is the result page of Survival analysis by expression. In this section, we offer several selectable criteria for survival analysis: cutoff (mean, median), stratification method (All high vs others, High vs low, and Num. of high), and interval (5 years, All). The analytical report consists of four categories of clinical endpoints (OS, PFI, DFI and DSS) for the Survival table and Kaplan-Meier plot. Note that the Kaplan-Meier plot cannot be generated if not enough samples/events were selected for the analysis (either ‘sample’ or ‘events’ does not exceed 0). The Expression boxplot shows the genes that users entered on the first page. The Y-axis is the expression level of the selected genes. Hover the mouse over the dots on the plot to view the exact expression values.
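For several genes, the cutoff and the three stratification methods could be combined as in the sketch below. The guide does not define how "High vs low" treats patients with mixed expression, so excluding them (returning None) is an assumption of this example, as are the function and method names.

```python
from statistics import mean, median


def cutoffs(expression_by_gene, how="median"):
    """Per-gene cutoff; `how` is 'mean' or 'median'."""
    pick = {"mean": mean, "median": median}[how]
    return {gene: pick(values) for gene, values in expression_by_gene.items()}


def stratify(sample_expr, cuts, method):
    """Group one patient by one of the three stratification methods.

    sample_expr: dict gene -> expression of this patient
    cuts       : dict gene -> cutoff returned by `cutoffs`
    """
    n_high = sum(sample_expr[g] > cuts[g] for g in cuts)
    if method == "all_high_vs_others":
        return "All.high" if n_high == len(cuts) else "Others"
    if method == "high_vs_low":
        # Assumption: patients with mixed expression are left out.
        if n_high == len(cuts):
            return "High"
        return "Low" if n_high == 0 else None
    if method == "by_num_of_high":
        return f"{n_high} high"
    raise ValueError(f"unknown method: {method}")


# Hypothetical expression of two genes across four patients.
cohort = {"G1": [1, 2, 3, 4], "G2": [10, 20, 30, 40]}
cuts = cutoffs(cohort, how="median")
patient = {"G1": 4, "G2": 10}
for method in ("all_high_vs_others", "high_vs_low", "by_num_of_high"):
    print(method, stratify(patient, cuts, method))
```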

4.2. DRIVER GENE identification
Driver gene identification, which was the “Meta-analysis” function in the previous DriverDBv2 (2), provides visualizations similar to those in ‘The Cancer Mutation’. This function offers visualizations to illustrate the mutation drivers identified by bioinformatics tools in user-defined samples.
First, choose a dataset of a cancer type. Then enter the preferred clinical criteria, such as pathologic stage, gender, stopped smoking year, and residual tumor. Lastly, press ‘Next’ to enter the result page. As with the other customized analysis functions, after submitting the final request, users will receive a notification email with a Result ID. This allows users to explore the results of ‘Customized-analysis’ in the ‘Result and Download’ tab when the calculation is completed.

4.2.1.1. Results Page Navigation
This is the result page of Driver gene identification by the mutation profile of a user-defined sample. At the top of the result page, a summary of the user-selected samples is shown. Within the report, we provide ‘DRIVER GENE by mutation tools’ and ‘Annotation’, which includes Gene Ontology, Pathway, and ‘Protein/Genetics Interaction’. DRIVER GENE by mutation tools provides ‘Visualization of TOP30 mutation drivers’, ‘mutation driver defined by tools’, and a summary of driver gene tools. In the Top 30 visualization on the left-hand side (A), the x-axis represents samples of cancer patients and the y-axis lists the top 30 genes. The percentages shown on the left of the plot represent the percentage of samples carrying a mutation in each mutation driver (B). The bar charts on the top (C) and right (D) give the total number of mutation occurrences by column (each sample) and by row (each mutation driver), respectively. The impact is categorized as ‘High’, ‘Moderate’, ‘Low’, or ‘Modifier’. These are pre-defined categories meant to help users find more significant variants, not to predict which variant produces a phenotype of interest.
The right part shows the driver genes identified by tools. Mutation drivers identified by more tools have higher confidence. The Mutation summary table lists the driver genes and the tools that identified them. Clicking a column title (E) sorts the table in descending or ascending order. In the example, only TP53 is identified by 14 tools, and the names of the tools are listed in the rightmost column (F). For the interpretation of the ‘Annotation’ section (Gene Ontology, Pathway, and ‘Protein/Genetics Interaction’), please see section 2.1.3. Functional Annotation.
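The quantities shown in the Top 30 visualization and in the Mutation summary table can be derived from a list of mutated sample/gene pairs and from per-tool driver lists. The sketch below shows one way to compute them; the function names, tool names and input records are placeholders, and the real pipeline may count mutations differently.

```python
from collections import Counter


def oncoprint_summary(mutations, n_samples, top_n=30):
    """Summarise a mutation profile for an OncoPrint-style figure.

    mutations: iterable of (sample_id, gene) pairs
    Returns (top_genes, gene_percent, per_sample_count).
    """
    pairs = set(mutations)  # count each sample/gene combination once
    per_gene = Counter(gene for _, gene in pairs)
    per_sample = Counter(sample for sample, _ in pairs)
    # Most frequently mutated genes first; ties broken alphabetically.
    top_genes = sorted(per_gene, key=lambda g: (-per_gene[g], g))[:top_n]
    gene_percent = {g: 100.0 * per_gene[g] / n_samples for g in top_genes}
    return top_genes, gene_percent, dict(per_sample)


def rank_by_tools(tool_calls):
    """Sort genes by how many tools identified them (more tools first).

    tool_calls: dict tool name -> set of genes reported as drivers
    """
    support = {}
    for tool, genes in tool_calls.items():
        for gene in genes:
            support.setdefault(gene, []).append(tool)
    return sorted(support.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Placeholder input: four samples, three genes, three tools.
mutations = [("S1", "G1"), ("S2", "G1"), ("S3", "G1"), ("S1", "G2"),
             ("S1", "G2"), ("S4", "G3"), ("S2", "G3")]
top, percent, per_sample = oncoprint_summary(mutations, n_samples=4, top_n=2)
print(top, percent, per_sample)
print(rank_by_tools({"toolA": {"G1", "G2"}, "toolB": {"G1"}, "toolC": {"G1", "G3"}}))
```

Sorting by the number of supporting tools mirrors the idea that drivers identified by more tools carry higher confidence.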

5. Section Four - Download

DriverDBv3 provides three summaries of driver genes for users to download: mutation drivers defined by 15 mutation tools in various cancers, CNV drivers defined by both of the 2 CNV tools in various cancers, and methylation drivers defined by both of the 2 methylation tools in various cancers.

6. Section Five - Reference


  1. Cheng W-C, Chung I, Chen C-Y, Sun H-J, Fen J-J, Tang W-C, et al. DriverDB: an exome sequencing database for cancer driver gene identification. Nucleic acids research. 2014;42(D1):D1048-D54.
  2. Chung I-F, Chen C-Y, Su S-C, Li C-Y, Wu K-J, Wang H-W, et al. DriverDBv2: a database for human cancer driver gene research. Nucleic acids research. 2015;44(D1):D975-D9.
  3. Cumbo F, Fiscon G, Ceri S, Masseroli M, Weitschek E. TCGA2BED: extracting, extending, integrating, and querying The Cancer Genome Atlas. BMC bioinformatics. 2017;18(1):6.
  4. Colaprico A, Silva TC, Olsen C, Garofano L, Cava C, Garolini D, et al. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic acids research. 2015;44(8):e71-e.
  5. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, et al. A census of human cancer genes. Nature reviews cancer. 2004;4(3):177.
  6. Repana D, Nulsen J, Dressler L, Bortolomeazzi M, Venkata SK, Tourna A, et al. The Network of Cancer Genes (NCG): a comprehensive catalogue of known and candidate cancer genes from cancer sequencing screens. Genome biology. 2019;20(1):1.
  7. Lai Y-P, Wang L-B, Wang W-A, Lai L-C, Tsai M-H, Lu T-P, et al. iGC—an integrated analysis package of gene expression and copy number alteration. BMC bioinformatics. 2017;18(1):35.
  8. Alvarez MJ, Chen JC, Califano A. DIGGIT: a Bioconductor package to infer genetic variants driving cellular phenotypes. Bioinformatics. 2015;31(24):4032-4.
  9. Chung I-F, Chang S-J, Chen C-Y, Liu S-H, Li C-Y, Chan C-H, et al. YM500v3: a database for small RNA sequencing in human cancer research. Nucleic acids research. 2016;45(D1):D925-D31.
  10. Cheng W-C, Chung I-F, Huang T-S, Chang S-T, Sun H-J, Tsai C-F, et al. YM500: a small RNA sequencing (smRNA-seq) database for microRNA research. Nucleic acids research. 2012;41(D1):D285-D94.
  11. Cheng W-C, Chung I-F, Tsai C-F, Huang T-S, Chen C-Y, Wang S-C, et al. YM500v2: a small RNA sequencing (smRNA-seq) database for human cancer miRNome research. Nucleic acids research. 2014;43(D1):D862-D7.

Cancer Type: