mantis-ml follows an Automated Machine Learning (AutoML) approach for feature extraction (relevant to the disease of interest), feature compilation and pre-processing. The processed feature table is then fed to the core algorithm of mantis-ml: a stochastic semi-supervised learning approach that ranks genes by their average performance in out-of-bag sets across random balanced datasets drawn from the entire gene pool.
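
The ranking idea described above can be illustrated with a minimal sketch. It is not the mantis-ml implementation: the function `rank_genes`, its parameters, the synthetic gene names and the nearest-centroid scorer (a toy stand-in for the real classifiers) are all assumptions made for illustration. Each iteration builds a balanced dataset from all known disease genes plus an equally sized random draw of unlabelled genes, scores every gene only when it is held out of training, and the final rank is the average of those held-out scores.

```python
import random
from collections import defaultdict
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def score(x, pos_centroid, neg_centroid):
    """Toy stand-in classifier (assumption): score in [0, 1],
    higher means closer to the positive centroid."""
    d_pos, d_neg = sq_dist(x, pos_centroid), sq_dist(x, neg_centroid)
    total = d_pos + d_neg
    return 0.5 if total == 0 else d_neg / total


def rank_genes(features, positives, n_iterations=50, n_folds=5, seed=0):
    """features: dict gene -> feature vector; positives: set of known genes.
    Returns (genes sorted by average held-out score, dict of average scores)."""
    rng = random.Random(seed)
    pos = sorted(positives)
    unlabelled = sorted(g for g in features if g not in positives)
    held_out_scores = defaultdict(list)

    for _ in range(n_iterations):
        # Balanced dataset: all positives plus as many random unlabelled genes.
        neg = rng.sample(unlabelled, len(pos))
        labelled = [(g, 1) for g in pos] + [(g, 0) for g in neg]
        rng.shuffle(labelled)

        for fold in range(n_folds):
            test_part = labelled[fold::n_folds]
            train = [item for i, item in enumerate(labelled) if i % n_folds != fold]
            pos_c = centroid([features[g] for g, y in train if y == 1])
            neg_c = centroid([features[g] for g, y in train if y == 0])
            # Genes are scored only by a model that did not see them in training.
            for gene, _label in test_part:
                held_out_scores[gene].append(score(features[gene], pos_c, neg_c))

    avg = {g: mean(s) for g, s in held_out_scores.items()}
    return sorted(avg, key=avg.get, reverse=True), avg


# Synthetic demo: 10 "known" genes clustered away from 90 unlabelled genes.
demo_rng = random.Random(1)
features = {f"POS{i}": [demo_rng.gauss(3, 0.3), demo_rng.gauss(3, 0.3)] for i in range(10)}
features.update({f"UNL{i}": [demo_rng.gauss(0, 0.3), demo_rng.gauss(0, 0.3)] for i in range(90)})
positives = {g for g in features if g.startswith("POS")}

ranking, avg_score = rank_genes(features, positives)
print(ranking[:5])
```

Because unlabelled genes are drawn at random, an unlabelled gene is only scored in iterations where it is sampled, so the number of iterations has to be large enough for the whole gene pool to be covered.
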
A set of standard classifiers is used as part of the semi-supervised learning task:
- Random Forests (RF)
- Extremely Randomised Trees (Extra Trees; ET)
- Gradient Boosting (GB)
- Extreme Gradient Boosting (XGBoost)
- Support Vector Classifier (SVC)
- Deep Neural Net (DNN)
- Stacking (1st layer: RF, ET, GB, SVC; 2nd layer: DNN)
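
The two-layer stacking layout listed above can be sketched with scikit-learn. This is an illustration only, not the mantis-ml code: the synthetic data, the hyperparameters and the use of `MLPClassifier` as a small stand-in for the second-layer neural net are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for a balanced gene-by-feature table (assumption).
X, y = make_classification(n_samples=300, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# 1st layer: RF, ET, GB and SVC, as in the list above.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("svc", SVC(probability=True, random_state=0)),
]

# 2nd layer: a small neural net trained on cross-validated 1st-layer predictions.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    cv=5,
)
stack.fit(X_train, y_train)

# Probability of the positive class for each held-out sample.
proba = stack.predict_proba(X_test)[:, 1]
print(proba[:5])
```

`StackingClassifier` fits the second layer on out-of-fold predictions of the first layer (`cv=5` here), which keeps the meta-learner from simply memorising the base learners' training-set outputs.
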
Following the semi-supervised learning step of mantis-ml, predictions may be overlapped with results from cohort-level rare-variant association studies (see Validation with WES).
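
One common way to judge whether such an overlap is larger than expected by chance is a hypergeometric enrichment test. Whether mantis-ml uses exactly this test is not stated here, so treat the sketch below as an assumption; the function `hypergeom_enrichment` and the gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_enrichment(predicted, associated, universe_size):
    """Return (overlap, p) where p is the probability of seeing at least the
    observed overlap if len(predicted) genes were drawn at random from a
    universe of universe_size genes containing len(associated) hits."""
    overlap = len(predicted & associated)
    n_pred, n_assoc = len(predicted), len(associated)
    total = comb(universe_size, n_pred)
    upper = min(n_pred, n_assoc)
    # Upper tail of the hypergeometric distribution, computed exactly.
    tail = sum(
        comb(n_assoc, k) * comb(universe_size - n_assoc, n_pred - k)
        for k in range(overlap, upper + 1)
    )
    return overlap, tail / total


# Hypothetical example: 50 top-ranked genes vs. 30 association-study hits
# in a universe of 1000 genes, sharing 10 genes.
top_predicted = {f"gene_{i}" for i in range(0, 50)}
study_hits = {f"gene_{i}" for i in range(40, 70)}

overlap, p_value = hypergeom_enrichment(top_predicted, study_hits, universe_size=1000)
print(overlap, p_value)
```

The universe size should be the number of genes that could have appeared in both lists; using a larger universe than that makes the overlap look more significant than it is.
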
The final consensus results can then be visualised using a set of dimensionality reduction techniques:
- PCA (Principal Component Analysis)
- t-SNE (t-distributed Stochastic Neighbor Embedding)
- UMAP (Uniform Manifold Approximation and Projection)
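
As a rough illustration of the first technique in the list, PCA can be computed directly from a singular value decomposition with NumPy. The matrix below is synthetic and its shape and shifted cluster are assumptions; t-SNE and UMAP need dedicated libraries and are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical gene-by-feature matrix: 200 genes, 10 features.
X = rng.normal(size=(200, 10))
X[:20] += 3.0  # shift the first 20 rows to mimic a cluster of top-ranked genes

# PCA via SVD of the mean-centred matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

coords = Xc @ Vt[:2].T                 # projection onto the first two components
explained = (S ** 2) / (S ** 2).sum()  # fraction of variance per component

print(coords[:3])
print(explained[:2])
```

The two columns of `coords` can be passed to any scatter-plot routine, colouring points by predicted rank or by known-gene status.
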