About

mantis-ml follows an Automated Machine Learning (AutoML) approach for feature extraction (relevant to the disease of interest), feature compilation and pre-processing. The processed feature table is then fed to the main algorithm at the core of mantis-ml: a stochastic semi-supervised learning approach that ranks genes based on their average performance in out-of-bag sets across random balanced datasets drawn from the entire gene pool.
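The ranking idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not mantis-ml's implementation: the function names (`rank_genes`, `toy_score`), the nearest-centroid scorer standing in for the real classifiers, the toy gene names, and the way the unlabelled pool is partitioned into balanced chunks are all hypothetical choices made for the example.

```python
import random
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]


def toy_score(x, pos_centroid, neg_centroid):
    """Stand-in classifier (assumption): score in [0, 1], higher = closer to positives."""
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_centroid))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_centroid))
    return 0.5 if d_pos + d_neg == 0 else d_neg / (d_pos + d_neg)


def rank_genes(features, seed_genes, n_iterations=10, n_folds=3, rng_seed=0):
    """Rank genes by their mean held-out score across random balanced datasets."""
    rng = random.Random(rng_seed)
    positives = sorted(g for g in features if g in seed_genes)
    unlabelled = sorted(g for g in features if g not in seed_genes)
    if not positives:
        raise ValueError("at least one seed (positive) gene is required")
    held_out_scores = {g: [] for g in features}

    for _ in range(n_iterations):
        rng.shuffle(unlabelled)
        # Split the unlabelled pool into chunks the size of the positive set
        # (the last chunk may be smaller), so each dataset is roughly balanced.
        for start in range(0, len(unlabelled), len(positives)):
            chunk = unlabelled[start:start + len(positives)]
            labelled = [(g, 1) for g in positives] + [(g, 0) for g in chunk]
            rng.shuffle(labelled)
            for fold in range(n_folds):
                test = labelled[fold::n_folds]
                train = [item for i, item in enumerate(labelled) if i % n_folds != fold]
                pos_rows = [features[g] for g, y in train if y == 1]
                neg_rows = [features[g] for g, y in train if y == 0]
                if not pos_rows or not neg_rows:
                    continue  # cannot fit the toy scorer without both classes
                pos_c, neg_c = centroid(pos_rows), centroid(neg_rows)
                # Score only genes that were held out of training in this fold.
                for g, _ in test:
                    held_out_scores[g].append(toy_score(features[g], pos_c, neg_c))

    averaged = {g: mean(s) for g, s in held_out_scores.items() if s}
    return sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: four seed genes, one unlabelled gene that resembles them, seven that do not.
features = {
    "P1": [1.0, 0.9], "P2": [0.9, 1.0], "P3": [1.1, 1.0], "P4": [1.0, 1.1],
    "HIDDEN": [1.0, 1.0],
    "U1": [0.0, 0.1], "U2": [0.1, 0.0], "U3": [-0.1, 0.0], "U4": [0.0, -0.1],
    "U5": [0.1, 0.1], "U6": [-0.1, -0.1], "U7": [0.05, 0.0],
}
ranking = rank_genes(features, seed_genes={"P1", "P2", "P3", "P4"})
for gene, score in ranking[:5]:
    print(gene, round(score, 3))
```

Because every score comes from a fold in which the gene was held out, an unlabelled gene that resembles the seed genes can rise in the ranking even though it was treated as a negative during training.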

A set of standard classifiers is used as part of the semi-supervised learning task:

  • Random Forests (RF)
  • Extremely Randomised Trees (Extra Trees; ET)
  • Gradient Boosting (GB)
  • Extreme Gradient Boosting (XGBoost)
  • Support Vector Classifier (SVC)
  • Deep Neural Net (DNN)
  • Stacking (1st layer: RF, ET, GB, SVC; 2nd layer: DNN)
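
As an illustration of the stacking layout listed above, the sketch below wires the same two layers together with scikit-learn's `StackingClassifier`. It assumes scikit-learn is installed and uses `MLPClassifier` as a small stand-in for the DNN; the synthetic data and all hyperparameters are placeholders chosen for the example, not mantis-ml's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary data standing in for a processed gene feature table (assumption).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 1st layer: RF, ET, GB and SVC (the SVC is scaled first, as SVMs are scale-sensitive).
first_layer = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("svc", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
]

# 2nd layer: a small neural net trained on cross-validated 1st-layer predictions.
stack = StackingClassifier(
    estimators=first_layer,
    final_estimator=MLPClassifier(
        hidden_layer_sizes=(16,), max_iter=2000, random_state=0
    ),
    cv=5,
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The `cv=5` argument means the second layer is fitted on out-of-fold predictions from the first layer, which keeps the meta-learner from simply memorising first-layer outputs on the training data.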

Following the semi-supervised learning step of mantis-ml, predictions may be overlapped with results from cohort-level rare-variant association studies (see Validation with WES).
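
One way to quantify such an overlap is a hypergeometric enrichment test, which asks how likely it is to see at least the observed number of shared genes by chance. The sketch below is an assumption about how the comparison could be done, not a description of mantis-ml's own code; the function names and the toy gene sets are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def overlap_enrichment(top_predicted, association_hits, all_genes):
    """Return the shared genes and the p-value of seeing at least that many by chance."""
    universe = set(all_genes)
    predicted = set(top_predicted) & universe
    hits = set(association_hits) & universe
    shared = predicted & hits
    p_value = hypergeom_sf(len(shared), len(universe), len(hits), len(predicted))
    return shared, p_value


# Toy example: 10 genes in total, 3 top predictions, 3 association hits.
all_genes = [f"G{i}" for i in range(1, 11)]
top_predicted = {"G1", "G2", "G3"}
association_hits = {"G2", "G3", "G9"}

shared, p_value = overlap_enrichment(top_predicted, association_hits, all_genes)
print(sorted(shared), p_value)
```

In practice the size of the "top predicted" set is a free choice, so it is worth checking how the p-value behaves across several cut-offs rather than relying on a single one.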

The final consensus results can then be visualised using a set of dimensionality reduction techniques:
  • PCA (Principal Component Analysis)
  • t-SNE (t-distributed Stochastic Neighbor Embedding)
  • UMAP (Uniform Manifold Approximation and Projection)
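
Of the three techniques, PCA is simple enough to sketch directly. The example below projects a feature matrix onto its first two principal components using NumPy's SVD. It assumes NumPy is available, and `pca_2d` and the random input are illustrative names and data rather than part of mantis-ml. t-SNE and UMAP need dedicated libraries and are not shown.

```python
import numpy as np


def pca_2d(X):
    """Project rows of X onto the first two principal components.

    Returns the 2-D coordinates and the fraction of variance each
    of the two components explains.
    """
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    coords = centred @ vt[:2].T
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return coords, explained[:2]


# Random stand-in for a gene-by-feature table: 50 "genes", 5 features
# with deliberately unequal variance (assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])

coords, explained = pca_2d(X)
print(coords.shape, explained)
```

The returned coordinates can be passed to any plotting library as x and y values, coloured for example by predicted score or by known versus unlabelled status.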