mantis-ml Gene Prioritisation Atlas

About

mantis-ml follows an Automated Machine Learning (AutoML) approach for feature extraction (relevant to the disease of interest), feature compilation and pre-processing. The processed feature table is then fed to the main algorithm at the core of mantis-ml: a stochastic semi-supervised learning approach that ranks genes based on their average performance in out-of-bag sets across random balanced datasets drawn from the entire gene pool.
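
The ranking idea described above can be sketched in a few lines of Python. This is a minimal, dependency-free illustration under stated assumptions, not the mantis-ml implementation: a nearest-centroid scorer stands in for the real classifiers, held-out folds stand in for the out-of-bag sets, and the function names (`centroid`, `score`, `rank_genes`), their parameters and the synthetic gene names are all hypothetical.

```python
import random
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]


def score(x, pos_c, neg_c):
    """Stand-in classifier: value in [0, 1], higher when x is nearer the positive centroid."""
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_c)) ** 0.5
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_c)) ** 0.5
    total = d_pos + d_neg
    return 0.5 if total == 0 else d_neg / total


def rank_genes(features, seed_genes, n_iterations=50, n_folds=5, rng_seed=0):
    """Average held-out scores over many random balanced datasets (hypothetical helper)."""
    rng = random.Random(rng_seed)
    positives = sorted(seed_genes)
    unlabelled = sorted(g for g in features if g not in seed_genes)
    scores = {g: [] for g in features}

    for _ in range(n_iterations):
        # Balanced dataset: all seed genes plus an equally sized random draw
        # from the rest of the gene pool, provisionally labelled negative.
        sampled = rng.sample(unlabelled, len(positives))
        labelled = [(g, 1) for g in positives] + [(g, 0) for g in sampled]
        rng.shuffle(labelled)

        for k in range(n_folds):
            held_out = labelled[k::n_folds]
            train = [item for i, item in enumerate(labelled) if i % n_folds != k]
            pos_rows = [features[g] for g, y in train if y == 1]
            neg_rows = [features[g] for g, y in train if y == 0]
            if not pos_rows or not neg_rows:
                continue  # skip degenerate folds that lack one of the classes
            pos_c, neg_c = centroid(pos_rows), centroid(neg_rows)
            # Each gene is scored only by a model that never saw it in training.
            for g, _ in held_out:
                scores[g].append(score(features[g], pos_c, neg_c))

    averaged = {g: mean(s) for g, s in scores.items() if s}
    return sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic two-feature gene pool: seed genes, unlabelled genes that resemble
# them ("NEAR"), and unlabelled genes that do not ("FAR").
demo_rng = random.Random(1)
features = {}
for i in range(10):
    features[f"SEED{i}"] = [demo_rng.gauss(2.0, 0.5), demo_rng.gauss(2.0, 0.5)]
for i in range(5):
    features[f"NEAR{i}"] = [demo_rng.gauss(2.0, 0.5), demo_rng.gauss(2.0, 0.5)]
for i in range(40):
    features[f"FAR{i}"] = [demo_rng.gauss(-2.0, 0.5), demo_rng.gauss(-2.0, 0.5)]
seed_genes = {f"SEED{i}" for i in range(10)}

ranking = rank_genes(features, seed_genes)
for gene, avg_score in ranking[:5]:
    print(f"{gene}\t{avg_score:.3f}")
```

Because every score comes from a model that did not train on that gene, unlabelled genes that resemble the seed genes can rise in the ranking even in iterations where they were sampled as negatives.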

A set of standard classifiers is used as part of the semi-supervised learning task:

Random Forests (RF)
Extremely Randomised Trees (Extra Trees; ET)
Gradient Boosting (GB)
Extreme Gradient Boosting (XGBoost)
Support Vector Classifier (SVC)
Deep Neural Net (DNN)
Stacking (1st layer: RF, ET, GB, SVC; 2nd layer: DNN)
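
As an illustration of the two-layer stacking layout listed above, the following sketch uses scikit-learn's `StackingClassifier`. It is an assumption-laden example rather than mantis-ml's own code: `MLPClassifier` stands in for the DNN, the hyperparameters are arbitrary, and the data come from `make_classification` rather than a real gene feature table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for a processed, balanced gene feature table.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 1st layer: RF, ET, GB and SVC (hyperparameters chosen only for illustration).
first_layer = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("svc", SVC()),
]

# 2nd layer: a small multi-layer perceptron standing in for the DNN.
second_layer = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)

# The second layer is trained on cross-validated first-layer predictions
# (cv=5), so it never sees predictions made on a base learner's own training rows.
stack = StackingClassifier(estimators=first_layer, final_estimator=second_layer, cv=5)
stack.fit(X_train, y_train)

proba = stack.predict_proba(X_test)[:, 1]  # probability of the positive class
print(f"held-out accuracy: {stack.score(X_test, y_test):.3f}")
```

Training the second layer on cross-validated predictions is the usual way to keep it from simply learning how well the first layer memorised its training data.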

Following the semi-supervised learning step of mantis-ml, predictions may be overlapped with results from cohort-level rare-variant association studies (see Validation with WES).

The final consensus results can then be visualised using a set of dimensionality reduction techniques:

PCA (Principal Component Analysis)
t-SNE (t-distributed Stochastic Neighbor Embedding)
UMAP (Uniform Manifold Approximation and Projection)
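
To make the projection step concrete, here is a minimal PCA sketch using NumPy's singular value decomposition. The feature matrix is synthetic and the helper name `pca_project` is hypothetical; it shows the technique in general, not how mantis-ml computes or plots its embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature table: 20 top-ranked genes and 80 other genes, 6 features each.
top_genes = rng.normal(loc=1.5, scale=1.0, size=(20, 6))
other_genes = rng.normal(loc=0.0, scale=1.0, size=(80, 6))
X = np.vstack([top_genes, other_genes])


def pca_project(X, n_components=2):
    """Project rows of X onto their first principal components via SVD."""
    X_centred = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, Vt = np.linalg.svd(X_centred, full_matrices=False)
    coords = X_centred @ Vt[:n_components].T
    explained = singular_values**2 / np.sum(singular_values**2)
    return coords, explained[:n_components]


coords, explained = pca_project(X)
print("2-D coordinates shape:", coords.shape)
print("explained variance ratio:", np.round(explained, 3))
```

The resulting two-column `coords` array can be passed to any scatter-plot routine. For the non-linear methods, `sklearn.manifold.TSNE` and `umap.UMAP` (from the umap-learn package) offer a comparable `fit_transform` interface.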