Predicting recurrence in clear cell Renal Cell Carcinoma: Analysis of TCGA data using outlier analysis and generalized matrix LVQ

Gargi Mukherjee; Gyan Bhanot; Kevin Raines; Srikanth Sastry; Sebastian Doniach; Michael Biehl

doi:10.1109/CEC.2016.7743855

Conference proceeding

2016 IEEE Congress on Evolutionary Computation (CEC), pp.656-661

07/2016

DOI: https://doi.org/10.1109/CEC.2016.7743855

Keywords: Biology; Biomarkers; Cancer; classification; Drugs; gene expression; learning vector quantization; mRNA-Seq; outlier analysis; Prototypes; recurrence risk; supervised learning; Training

Abstract

Using mRNA-Seq and clinical data for 469 clear cell Renal Cell Carcinoma (ccRCC) samples from The Cancer Genome Atlas (TCGA), we develop a protocol to identify patients likely to have early recurrence of their disease. We first split the data into two sets, with 380 samples in the training set and 89 samples in the test set. Using the training set, we identify genes whose outlier status (high or low mRNA expression) is predictive of recurrence, based on the Kaplan-Meier (KM) recurrence-free survival log-rank p-value. We find a significant overlap among genes identified as predictive biomarkers in Reads Per Kilobase per Million mapped reads (RPKM) normalized data and Raw Reads mRNA-Seq data. Using 80 consensus genes predictive in both RPKM and Raw Reads data, we define an outlier-based risk score R to stratify patients into two groups, a high-risk (early recurrence) group (R < 2) and a low-risk (late recurrence) group (R > 2). The KM recurrence curves using this stratification show excellent separation in both the training and test sets. Restricting the analysis to patients who had recurrence within two years (109 cases) and those who had no recurrence in five years (107 cases), we find that the risk predictor achieves approximately 80 percent sensitivity and specificity. The 80 genes identified by the outlier analysis were then used to develop a more intuitive classifier based on Generalized Matrix Learning Vector Quantization (GMLVQ). This method stratifies samples into risk classes by defining prototypes in feature space together with an appropriate distance metric. GMLVQ identified a subset of 12 genes that predict recurrence with high accuracy, which suggests that an assay with a small number of genes might be able to predict recurrence in ccRCC.
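
The prototype-and-metric idea behind GMLVQ can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard GMLVQ form of the adaptive squared distance, d(x, w) = (x - w)^T Omega^T Omega (x - w), and shows only classification with an already-trained model, not the training procedure. The prototypes, labels, matrix `omega`, and sample values are hypothetical toy numbers, not data from the paper.

```python
import numpy as np

def gmlvq_distance(x, w, omega):
    """Adaptive squared distance (x - w)^T Omega^T Omega (x - w)."""
    diff = omega @ (x - w)
    return float(diff @ diff)

def classify(x, prototypes, labels, omega):
    """Assign x the label of the closest prototype under the adaptive metric."""
    dists = [gmlvq_distance(x, w, omega) for w in prototypes]
    return labels[int(np.argmin(dists))]

def relevances(omega):
    """Diagonal of Lambda = Omega^T Omega, normalised to sum to 1.

    Large entries mark features that dominate the distance.
    """
    lam = omega.T @ omega
    return np.diag(lam) / np.trace(lam)

# Hypothetical toy model: two prototypes in a 3-feature space.
prototypes = np.array([[0.0, 0.0, 0.0],
                       [2.0, 2.0, 0.0]])
labels = ["low-risk", "high-risk"]
# Hypothetical metric that ignores the third feature entirely.
omega = np.diag([1.0, 1.0, 0.0])

sample = np.array([1.8, 1.5, 5.0])
print(classify(sample, prototypes, labels, omega))
print(relevances(omega))
```

In this toy setting the third feature has zero relevance, so its large value does not affect the assignment. Ranking features by such relevance values is one plausible way a small gene subset could be read off a trained metric; whether the 12 genes were selected exactly this way is not stated in the abstract.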

Details

Title: Predicting recurrence in clear cell Renal Cell Carcinoma: Analysis of TCGA data using outlier analysis and generalized matrix LVQ
Creators: Gargi Mukherjee - Dept. of Mol. Biol. & Biochem., Rutgers Univ., Piscataway, NJ, USA
Gyan Bhanot - Dept. of Mol. Biol. & Biochem., Rutgers Univ., Piscataway, NJ, USA
Kevin Raines - Dept. of Phys., Stanford Univ., Stanford, CA, USA
Srikanth Sastry - Jawaharlal Nehru Centre for Adv. Sci. Res., Bangalore, India
Sebastian Doniach - Dept. of Phys., Stanford Univ., Stanford, CA, USA
Michael Biehl - Johann Bernoulli Inst. for Math. & Comput. Sci., Univ. of Groningen, Groningen, Netherlands
Publication Details: 2016 IEEE Congress on Evolutionary Computation (CEC), pp.656-661
Date published: 07/2016
Publisher: IEEE
Academic Unit: Molecular Biology and Biochemistry (SAS)
Language: English
Resource Type: Conference proceeding
Identifiers: 991031665006904646