Gene Expression Analysis of Project TCGA-CHOL

Qiong Liu

April 7th, 2022

Cholangiocarcinoma (CCA) is an aggressive cancer that arises in the bile ducts, the slender tubes that carry bile, a digestive fluid, from the liver to the small intestine. The Cancer Genome Atlas (TCGA) program molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types. In this notebook, we demonstrate how to retrieve RNA expression data for project TCGA-CHOL from the Genomic Data Commons (GDC) Data Portal, and how to analyze and visualize the data using the pipeline provided by the R package GDCRNATools.

This pipeline is adapted from the GDCRNATools manual.

Data Preparation

Import R packages

Download GDC data transfer tool

GDCRNATools uses the GDC Data Transfer Tool (gdc-client) to download data files. Run the command below to download and unzip gdc-client.
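
For readers working outside R, the unzip step can be sketched in Python. This is a minimal sketch, not the notebook's own command: it assumes the gdc-client archive has already been downloaded from the GDC website, and the helper name extract_gdc_client and the expectation that the archive holds a file named gdc-client at its top level are assumptions for illustration.

```python
import os
import stat
import zipfile


def extract_gdc_client(zip_path, dest_dir):
    """Hypothetical helper: unzip a downloaded gdc-client archive.

    Assumes the archive contains a top-level member named 'gdc-client'.
    Returns the path to the extracted, executable binary.
    """
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    binary = os.path.join(dest_dir, "gdc-client")
    if not os.path.isfile(binary):
        raise FileNotFoundError("no 'gdc-client' binary found in " + zip_path)
    # zipfile does not restore Unix permissions, so add execute bits (like chmod +x).
    mode = os.stat(binary).st_mode
    os.chmod(binary, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return binary
```

The explicit chmod is there because extractall writes files with default permissions, so the binary would otherwise not be runnable.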

Data download

Data cleanup
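
TCGA samples are identified by barcodes whose fourth field starts with a two-digit sample-type code (for example, 01 for Primary Solid Tumor and 11 for Solid Tissue Normal). The following Python sketch illustrates the kind of cleanup done at this step: keep only tumor and normal samples and drop duplicate aliquots. The helper names and the keep-the-first-aliquot rule are hypothetical and are not taken from GDCRNATools.

```python
# Hypothetical cleanup rule: keep only primary tumors ("01") and solid tissue
# normals ("11"), and keep the first aliquot seen for each participant/sample type.
SAMPLE_TYPE_CODES = {"01": "PrimaryTumor", "11": "SolidTissueNormal"}


def sample_group(barcode):
    """Map a TCGA aliquot barcode to a sample group, or None if not kept."""
    fields = barcode.split("-")
    if len(fields) < 4:
        return None
    # The first two digits of the 4th field encode the sample type.
    return SAMPLE_TYPE_CODES.get(fields[3][:2])


def clean_samples(barcodes):
    """Return (barcode, group) pairs after sample-type and duplicate filtering."""
    kept, seen = [], set()
    for barcode in barcodes:
        group = sample_group(barcode)
        if group is None:
            continue  # e.g. metastatic or blood-derived normal samples
        fields = barcode.split("-")
        # Participant (first three fields) plus sample-type code identifies a sample here.
        key = "-".join(fields[:3]) + "-" + fields[3][:2]
        if key in seen:
            continue  # duplicate aliquot of an already-kept sample
        seen.add(key)
        kept.append((barcode, group))
    return kept
```

Keeping the first aliquot is an arbitrary choice made for brevity. A real pipeline may prefer a specific aliquot, for example by analyte or plate.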

Differential expression analysis

Here, we use RNA-seq quantification data as an example to perform differential gene expression (DE) analysis with the GDCRNATools package. The method used here is DESeq2, which takes raw counts as input and accounts for library-size normalization inside its negative binomial generalized linear model (GLM). Other DE methods available to users include edgeR and limma.
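
DESeq2 estimates one size factor per sample with the median-of-ratios method. For each gene it computes the geometric mean across samples, and each sample's size factor is the median of that sample's count-to-geometric-mean ratios. The Python sketch below reproduces this calculation for illustration only. The input layout, a dict that maps each gene to its raw counts, is an assumption.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (DESeq2-style), computed on the log scale.

    counts: dict mapping gene -> list of raw counts, one per sample (same order).
    Returns one size factor per sample.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c == 0 for c in row):
            continue  # geometric mean would be 0, so such genes are skipped
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Median of log ratios, transformed back to the original scale.
    return [math.exp(median(r)) for r in log_ratios]
```

Dividing a sample's counts by its size factor gives normalized counts. In the GLM itself, DESeq2 keeps the raw counts and uses the size factors as offsets. This is what it means for the normalization to be modeled inside the GLM.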

DE analysis visualization
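
A common way to visualize DE results is a volcano plot, where each gene is placed at (log2 fold change, -log10 FDR) and colored by whether it passes both cutoffs. The Python sketch below computes those coordinates and labels. The cutoffs (|log2FC| >= 1, FDR < 0.01) and the input tuple layout are hypothetical defaults and are not the values used by GDCRNATools.

```python
import math
import sys


def volcano_points(results, fc_cutoff=1.0, fdr_cutoff=0.01):
    """Compute volcano-plot coordinates and up/down/ns labels.

    results: list of (gene, log2_fold_change, fdr) tuples (hypothetical layout).
    Returns a list of (gene, x, y, label) tuples.
    """
    points = []
    for gene, lfc, fdr in results:
        # Clamp FDR so that a reported 0 does not make log10 fail.
        y = -math.log10(max(fdr, sys.float_info.min))
        if fdr < fdr_cutoff and lfc >= fc_cutoff:
            label = "up"
        elif fdr < fdr_cutoff and lfc <= -fc_cutoff:
            label = "down"
        else:
            label = "ns"
        points.append((gene, lfc, y, label))
    return points
```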

Functional enrichment analysis

The function gdcEnrichAnalysis takes the differentially expressed genes reported by gdcDEReport as input and performs functional enrichment analysis on them.
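
Over-representation analysis commonly asks whether a term (for example, a GO category or pathway) contains more DE genes than expected by chance. A one-sided hypergeometric test, equivalent to a one-sided Fisher's exact test, answers that question. The Python sketch below computes that p-value for a single term. It illustrates the general technique and does not claim to reproduce how gdcEnrichAnalysis computes its results.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """Upper-tail hypergeometric p-value P(X >= k) for one annotation term.

    N: genes in the background universe
    K: background genes annotated to the term
    n: DE genes tested
    k: DE genes annotated to the term
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    # Exact integer arithmetic until the final division.
    return hits / comb(N, n)
```

When many terms are tested, these p-values would normally be adjusted for multiple testing (for example with Benjamini-Hochberg) before any are reported as enriched.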

Univariate survival analysis
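
A univariate survival analysis often splits patients into high- and low-expression groups for one gene and compares Kaplan-Meier survival curves between the groups. The Python sketch below implements the Kaplan-Meier estimator, along with a median split used as a hypothetical grouping rule. Neither function is taken from GDCRNATools, and a full analysis would also compare the groups with a statistical test such as the log-rank test or a Cox model.

```python
from statistics import median


def split_by_median(expr):
    """Hypothetical grouping: 'High' if expression exceeds the median, else 'Low'."""
    cutoff = median(expr)
    return ["High" if x > cutoff else "Low" for x in expr]


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up times; events: 1 if death was observed, 0 if censored.
    Returns (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve
```

Running kaplan_meier separately on the times and events of the "High" and "Low" groups gives the two curves that would be plotted.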