DeepType: Deep Learning Approach to Identifying Disease Subtypes from High-Dimensional Genomic Data

Cancer subtype classification has great potential value for disease diagnosis and individualized patient management. Current approaches for deriving molecular subtypes are limited in two ways: misleading, irrelevant factors influence the result, producing ambiguous and overlapping subtypes, and the methods have limited capacity to handle extremely high-dimensional data. To address these issues, we propose a novel approach that leverages deep learning to disentangle and eliminate irrelevant factors. Specifically, we design a deep-learning framework, referred to as DeepType, that jointly performs supervised classification, unsupervised clustering, and dimensionality reduction to learn a cancer-relevant data representation with cluster structure. We apply DeepType to the METABRIC breast cancer dataset and compare its performance with that of state-of-the-art cancer subtyping methods. DeepType significantly outperforms the existing methods, identifying more robust subtype information while using fewer genes. The new approach provides a framework for deriving more accurate and robust molecular cancer subtypes from increasingly complex, multi-source data.
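
To make the idea of a jointly trained objective concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single hidden layer as the learned representation, a cross-entropy term for supervised classification, a k-means-style distance-to-nearest-center term for clustering, and a row-wise group-sparsity penalty on the input weights as a stand-in for selecting fewer genes. The function name joint_loss, the weights alpha and beta, and the specific form of each term are assumptions made for illustration only.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def joint_loss(X, y, W1, W2, centers, alpha=0.01, beta=0.1):
    """Hypothetical joint objective: classification + sparsity + clustering.

    X: (n_samples, n_genes) expression matrix
    y: (n_samples,) integer class labels
    W1: (n_genes, n_hidden) input weights; W2: (n_hidden, n_classes)
    centers: (n_clusters, n_hidden) cluster centers in representation space
    """
    n = X.shape[0]
    H = np.tanh(X @ W1)                      # low-dimensional representation
    P = softmax(H @ W2)                      # class probabilities
    # Supervised term: mean cross-entropy against the known labels.
    ce = -np.log(P[np.arange(n), y] + 1e-12).mean()
    # Sparsity term: sum of row norms of W1 (one row per gene), which
    # pushes entire genes toward zero weight.
    sparsity = np.linalg.norm(W1, axis=1).sum()
    # Clustering term: mean squared distance to the nearest center.
    d2 = ((H[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    cluster = d2.min(axis=1).mean()
    total = ce + alpha * sparsity + beta * cluster
    return total, (ce, sparsity, cluster)

# Usage on random data (shapes are arbitrary, chosen only for the demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
y = rng.integers(0, 3, size=8)
W1 = rng.normal(size=(5, 4))
W2 = rng.normal(size=(4, 3))
centers = rng.normal(size=(2, 4))
total, parts = joint_loss(X, y, W1, W2, centers)
```

In a sketch like this, alpha trades classification accuracy against the number of genes retained, and beta controls how strongly the representation is pulled toward compact clusters; in practice the weights and centers would be optimized rather than drawn at random.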

Source code and documentation