.. _related_projects:
=====================================
Related Projects
=====================================
Below is a list of sister projects, extensions and domain-specific packages.
Interoperability and framework enhancements
-------------------------------------------
These tools adapt scikit-learn for use with other technologies or otherwise
enhance the functionality of scikit-learn's estimators.
- `sklearn_pandas `_ A bridge between
scikit-learn pipelines and pandas data frames, with dedicated transformers.
- `Scikit-Learn Laboratory
`_ A command-line
wrapper around scikit-learn that makes it easy to run machine learning
experiments with multiple learners and large feature sets.
- `auto-sklearn `_
An automated machine learning toolkit and a drop-in replacement for a
scikit-learn estimator.
- `sklearn-pmml `_
Serialization of (some) scikit-learn estimators into PMML.
- `sklearn2pmml `_
Serialization of a wide variety of scikit-learn estimators and transformers
into PMML with the help of the `JPMML-SkLearn `_
library.
Other estimators and tasks
--------------------------
Not everything belongs in, or is mature enough for, the central scikit-learn
project. The following projects provide interfaces similar to
scikit-learn for additional learning algorithms, infrastructures
and tasks.
- `pylearn2 `_ A deep learning and
neural network library built on Theano with a scikit-learn-like interface.
- `sklearn_theano `_ scikit-learn-compatible
estimators, transformers, and datasets which use Theano internally.
- `lightning `_ Fast state-of-the-art
linear model solvers (SDCA, AdaGrad, SVRG, SAG, etc.).
- `Seqlearn `_ Sequence classification
using HMMs or structured perceptron.
- `HMMLearn `_ Implementation of hidden
Markov models that was previously part of scikit-learn.
- `PyStruct `_ General conditional random fields
and structured prediction.
- `pomegranate `_ Probabilistic modelling
for Python, with an emphasis on hidden Markov models.
- `py-earth `_ Multivariate adaptive
regression splines.
- `sklearn-compiledtrees `_
Generate a C++ implementation of the predict function for decision trees (and
ensembles) trained by scikit-learn. Useful for latency-sensitive production
environments.
- `lda `_ Fast implementation of Latent
Dirichlet Allocation in Cython.
- `Sparse Filtering `_
Unsupervised feature learning based on sparse filtering.
- `Kernel Regression `_
Implementation of Nadaraya-Watson kernel regression with automatic bandwidth
selection.
- `gplearn `_ Genetic Programming
for symbolic regression tasks.
- `nolearn `_ A number of wrappers and
abstractions around existing neural network libraries.
- `sparkit-learn `_ Scikit-learn functionality and API on PySpark.
- `keras `_ Theano-based deep learning library.
- `mlxtend `_ Includes a number of additional
estimators as well as model visualization utilities.
- `kmodes `_ k-modes clustering algorithm for categorical data, and
several of its variations.
- `hdbscan `_ HDBSCAN and Robust Single Linkage clustering algorithms
for robust variable density clustering.
- `lasagne `_ A lightweight library to build and train neural networks in Theano.
- `multiisotonic `_ Isotonic regression on multidimensional features.
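As a rough illustration of what "interfaces similar to scikit-learn" means for
the projects above, the sketch below shows the usual estimator convention:
hyperparameters set in ``__init__``, a ``fit`` method that returns ``self`` and
stores learned state in attributes ending with an underscore, and a ``predict``
method. The class ``MajorityClassifier`` is hypothetical and written in plain
Python for illustration only. It is not taken from scikit-learn or from any
project listed here, and a real compatible estimator would normally also
inherit from scikit-learn's base classes.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical toy estimator that always predicts the most common label."""

    def __init__(self, default=None):
        # Hyperparameters are stored unchanged under their own names.
        self.default = default

    def get_params(self, deep=True):
        # Expose hyperparameters so that tools can inspect or clone the object.
        return {"default": self.default}

    def set_params(self, **params):
        # Update hyperparameters in place and return self to allow chaining.
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # Learned state goes into attributes with a trailing underscore.
        counts = Counter(y)
        self.classes_ = sorted(counts)
        self.majority_ = counts.most_common(1)[0][0] if counts else self.default
        return self

    def predict(self, X):
        # One prediction per input sample; the features themselves are ignored.
        return [self.majority_ for _ in X]


clf = MajorityClassifier().fit([[0], [1], [2]], ["a", "b", "a"])
predictions = clf.predict([[5], [6]])
```

Because ``fit`` returns the estimator itself, construction, fitting and
prediction can be chained in a single expression, which is the pattern the
listed projects generally aim to preserve.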
Statistical learning with Python
--------------------------------
Other packages useful for data analysis and machine learning.
- `Pandas `_ Tools for working with heterogeneous and
columnar data, relational queries, time series and basic statistics.
- `theano `_ A CPU/GPU array
processing framework geared towards deep learning research.
- `statsmodels `_ Estimating and analysing
statistical models. More focused on statistical tests and less on prediction
than scikit-learn.
- `PyMC `_ Bayesian statistical models and
fitting algorithms.
- `REP `_ Environment for conducting data-driven
research in a consistent and reproducible way.
- `Sacred `_ Tool to help you configure,
organize, log and reproduce experiments.
- `gensim `_ A library for topic modelling,
document indexing and similarity retrieval.
- `Seaborn `_ Visualization library based on
matplotlib. It provides a high-level interface for drawing attractive statistical graphics.
- `Deep Learning `_ A curated list of deep learning
software libraries.
Domain-specific packages
~~~~~~~~~~~~~~~~~~~~~~~~
- `scikit-image `_ Image processing and computer
vision in Python.
- `Natural language toolkit (nltk) `_ Natural language
processing and some machine learning.
- `NiLearn `_ Machine learning for neuro-imaging.
- `AstroML `_ Machine learning for astronomy.
- `MSMBuilder `_ Machine learning for protein
conformational dynamics time series.
Snippets and tidbits
---------------------
The `wiki `_ has more!