User Guide
- 1. Supervised learning
- 1.1. Generalized Linear Models
- 1.1.1. Ordinary Least Squares
- 1.1.2. Ridge Regression
- 1.1.3. Lasso
- 1.1.4. Multi-task Lasso
- 1.1.5. Elastic Net
- 1.1.6. Multi-task Elastic Net
- 1.1.7. Least Angle Regression
- 1.1.8. LARS Lasso
- 1.1.9. Orthogonal Matching Pursuit (OMP)
- 1.1.10. Bayesian Regression
- 1.1.11. Logistic regression
- 1.1.12. Stochastic Gradient Descent - SGD
- 1.1.13. Perceptron
- 1.1.14. Passive Aggressive Algorithms
- 1.1.15. Robustness regression: outliers and modeling errors
- 1.1.16. Polynomial regression: extending linear models with basis functions
- 1.2. Linear and Quadratic Discriminant Analysis
- 1.3. Kernel ridge regression
- 1.4. Support Vector Machines
- 1.5. Stochastic Gradient Descent
- 1.6. Nearest Neighbors
- 1.7. Gaussian Processes
- 1.7.1. Gaussian Process Regression (GPR)
- 1.7.2. GPR examples
- 1.7.3. Gaussian Process Classification (GPC)
- 1.7.4. GPC examples
- 1.7.5. Kernels for Gaussian Processes
- 1.7.6. Legacy Gaussian Processes
- 1.8. Cross decomposition
- 1.9. Naive Bayes
- 1.10. Decision Trees
- 1.11. Ensemble methods
- 1.11.1. Bagging meta-estimator
- 1.11.2. Forests of randomized trees
- 1.11.3. AdaBoost
- 1.11.4. Gradient Tree Boosting
- 1.11.5. VotingClassifier
- 1.12. Multiclass and multilabel algorithms
- 1.13. Feature selection
- 1.14. Semi-Supervised
- 1.15. Isotonic regression
- 1.16. Probability calibration
- 1.17. Neural network models (supervised)
- 2. Unsupervised learning
- 2.1. Gaussian mixture models
- 2.2. Manifold learning
- 2.2.1. Introduction
- 2.2.2. Isomap
- 2.2.3. Locally Linear Embedding
- 2.2.4. Modified Locally Linear Embedding
- 2.2.5. Hessian Eigenmapping
- 2.2.6. Spectral Embedding
- 2.2.7. Local Tangent Space Alignment
- 2.2.8. Multi-dimensional Scaling (MDS)
- 2.2.9. t-distributed Stochastic Neighbor Embedding (t-SNE)
- 2.2.10. Tips on practical use
- 2.3. Clustering
- 2.3.1. Overview of clustering methods
- 2.3.2. K-means
- 2.3.3. Affinity Propagation
- 2.3.4. Mean Shift
- 2.3.5. Spectral clustering
- 2.3.6. Hierarchical clustering
- 2.3.7. DBSCAN
- 2.3.8. Birch
- 2.3.9. Clustering performance evaluation
- 2.4. Biclustering
- 2.5. Decomposing signals in components (matrix factorization problems)
- 2.5.1. Principal component analysis (PCA)
- 2.5.2. Truncated singular value decomposition and latent semantic analysis
- 2.5.3. Dictionary Learning
- 2.5.4. Factor Analysis
- 2.5.5. Independent component analysis (ICA)
- 2.5.6. Non-negative matrix factorization (NMF or NNMF)
- 2.5.7. Latent Dirichlet Allocation (LDA)
- 2.6. Covariance estimation
- 2.7. Novelty and Outlier Detection
- 2.8. Density Estimation
- 2.9. Neural network models (unsupervised)
- 3. Model selection and evaluation
- 3.1. Cross-validation: evaluating estimator performance
- 3.1.1. Computing cross-validated metrics
- 3.1.2. Cross validation iterators
- 3.1.2.1. K-fold
- 3.1.2.2. Stratified k-fold
- 3.1.2.3. Label k-fold
- 3.1.2.4. Leave-One-Out - LOO
- 3.1.2.5. Leave-P-Out - LPO
- 3.1.2.6. Leave-One-Label-Out - LOLO
- 3.1.2.7. Leave-P-Label-Out
- 3.1.2.8. Random permutations cross-validation a.k.a. Shuffle & Split
- 3.1.2.9. Label-Shuffle-Split
- 3.1.2.10. Predefined Fold-Splits / Validation-Sets
- 3.1.2.11. See also
- 3.1.3. A note on shuffling
- 3.1.4. Cross validation and model selection
- 3.2. Tuning the hyper-parameters of an estimator
- 3.2.1. Exhaustive Grid Search
- 3.2.2. Randomized Parameter Optimization
- 3.2.3. Tips for parameter search
- 3.2.4. Alternatives to brute force parameter search
- 3.2.4.1. Model specific cross-validation
- 3.2.4.1.1. sklearn.linear_model.ElasticNetCV
- 3.2.4.1.2. sklearn.linear_model.LarsCV
- 3.2.4.1.3. sklearn.linear_model.LassoCV
- 3.2.4.1.4. sklearn.linear_model.LassoLarsCV
- 3.2.4.1.5. sklearn.linear_model.LogisticRegressionCV
- 3.2.4.1.6. sklearn.linear_model.MultiTaskElasticNetCV
- 3.2.4.1.7. sklearn.linear_model.MultiTaskLassoCV
- 3.2.4.1.8. sklearn.linear_model.OrthogonalMatchingPursuitCV
- 3.2.4.1.9. sklearn.linear_model.RidgeCV
- 3.2.4.1.10. sklearn.linear_model.RidgeClassifierCV
- 3.2.4.2. Information Criterion
- 3.2.4.3. Out of Bag Estimates
- 3.2.4.3.1. sklearn.ensemble.RandomForestClassifier
- 3.2.4.3.2. sklearn.ensemble.RandomForestRegressor
- 3.2.4.3.3. sklearn.ensemble.ExtraTreesClassifier
- 3.2.4.3.4. sklearn.ensemble.ExtraTreesRegressor
- 3.2.4.3.5. sklearn.ensemble.GradientBoostingClassifier
- 3.2.4.3.6. sklearn.ensemble.GradientBoostingRegressor
- 3.3. Model evaluation: quantifying the quality of predictions
- 3.3.1. The scoring parameter: defining model evaluation rules
- 3.3.2. Classification metrics
- 3.3.2.1. From binary to multiclass and multilabel
- 3.3.2.2. Accuracy score
- 3.3.2.3. Cohen’s kappa
- 3.3.2.4. Confusion matrix
- 3.3.2.5. Classification report
- 3.3.2.6. Hamming loss
- 3.3.2.7. Jaccard similarity coefficient score
- 3.3.2.8. Precision, recall and F-measures
- 3.3.2.9. Hinge loss
- 3.3.2.10. Log loss
- 3.3.2.11. Matthews correlation coefficient
- 3.3.2.12. Receiver operating characteristic (ROC)
- 3.3.2.13. Zero one loss
- 3.3.3. Multilabel ranking metrics
- 3.3.4. Regression metrics
- 3.3.5. Clustering metrics
- 3.3.6. Dummy estimators
- 3.4. Model persistence
- 3.5. Validation curves: plotting scores to evaluate models
- 4. Dataset transformations
- 4.1. Pipeline and FeatureUnion: combining estimators
- 4.2. Feature extraction
- 4.2.1. Loading features from dicts
- 4.2.2. Feature hashing
- 4.2.3. Text feature extraction
- 4.2.3.1. The Bag of Words representation
- 4.2.3.2. Sparsity
- 4.2.3.3. Common Vectorizer usage
- 4.2.3.4. Tf–idf term weighting
- 4.2.3.5. Decoding text files
- 4.2.3.6. Applications and examples
- 4.2.3.7. Limitations of the Bag of Words representation
- 4.2.3.8. Vectorizing a large text corpus with the hashing trick
- 4.2.3.9. Performing out-of-core scaling with HashingVectorizer
- 4.2.3.10. Customizing the vectorizer classes
- 4.2.4. Image feature extraction
- 4.3. Preprocessing data
- 4.4. Unsupervised dimensionality reduction
- 4.5. Random Projection
- 4.6. Kernel Approximation
- 4.7. Pairwise metrics, Affinities and Kernels
- 4.8. Transforming the prediction target (y)
- 5. Dataset loading utilities
- 5.1. General dataset API
- 5.2. Toy datasets
- 5.3. Sample images
- 5.4. Sample generators
- 5.5. Datasets in svmlight / libsvm format
- 5.6. The Olivetti faces dataset
- 5.7. The 20 newsgroups text dataset
- 5.8. Downloading datasets from the mldata.org repository
- 5.9. The Labeled Faces in the Wild face recognition dataset
- 5.10. Forest covertypes
- 5.11. RCV1 dataset
- 5.12. Boston House Prices dataset
- 5.13. Breast Cancer Wisconsin (Diagnostic) Database
- 5.14. Diabetes dataset
- 5.15. Optical Recognition of Handwritten Digits Data Set
- 5.16. Iris Plants Database
- 5.17. Linnerud dataset
- 6. Strategies to scale computationally: bigger data
- 7. Computational Performance