sklearn.model_selection.StratifiedKFold

class sklearn.model_selection.StratifiedKFold(n_folds=3, shuffle=False, random_state=None)
Stratified K-Folds cross-validator.

Provides train/test indices to split data into train/test sets.

This cross-validation object is a variation of KFold that returns stratified folds. The folds are made by preserving the percentage of samples for each class.

Read more in the User Guide.

Parameters:
    n_folds : int, default=3
        Number of folds. Must be at least 2.
    shuffle : boolean, optional
        Whether to shuffle each stratification of the data before splitting into batches.
    random_state : None, int or RandomState
        When shuffle=True, pseudo-random number generator state used for shuffling. If None, use the default numpy RNG for shuffling.

Notes
    All folds have size trunc(n_samples / n_folds), except the last one, which takes the remaining samples.

Examples
    >>> import numpy as np
    >>> from sklearn.model_selection import StratifiedKFold
    >>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
    >>> y = np.array([0, 0, 1, 1])
    >>> skf = StratifiedKFold(n_folds=2)
    >>> skf.get_n_splits(X, y)
    2
    >>> print(skf)
    StratifiedKFold(n_folds=2, random_state=None, shuffle=False)
    >>> for train_index, test_index in skf.split(X, y):
    ...     print("TRAIN:", train_index, "TEST:", test_index)
    ...     X_train, X_test = X[train_index], X[test_index]
    ...     y_train, y_test = y[train_index], y[test_index]
    TRAIN: [1 3] TEST: [0 2]
    TRAIN: [0 2] TEST: [1 3]

Methods
    get_n_splits([X, y, labels])
        Returns the number of splitting iterations in the cross-validator.
    split(X[, y, labels])
        Generate indices to split data into training and test sets.

get_n_splits(X=None, y=None, labels=None)
    Returns the number of splitting iterations in the cross-validator.

    Parameters:
        X : object
            Always ignored, exists for compatibility.
        y : object
            Always ignored, exists for compatibility.
        labels : object
            Always ignored, exists for compatibility.

    Returns:
        n_splits : int
            Returns the number of splitting iterations in the cross-validator.

split(X, y=None, labels=None)
    Generate indices to split data into training and test sets.

    Parameters:
        X : array-like, shape (n_samples, n_features)
            Training data, where n_samples is the number of samples and n_features is the number of features.
        y : array-like, shape (n_samples,), optional
            The target variable for supervised learning problems.
        labels : array-like, shape (n_samples,), optional
            Group labels for the samples, used while splitting the dataset into train/test sets.

    Returns:
        train : ndarray
            The training set indices for that split.
        test : ndarray
            The testing set indices for that split.
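
To make the idea of "preserving the percentage of samples for each class" concrete, here is a minimal pure-Python sketch of stratified fold assignment. It is an illustration under stated assumptions, not scikit-learn's implementation: the helper name stratified_fold_indices is hypothetical, and it simply deals each class's sample indices round-robin into the folds, without shuffling.

```python
from collections import defaultdict


def stratified_fold_indices(y, n_folds=3):
    """Yield (train, test) index lists with class proportions roughly preserved.

    Hypothetical helper for illustration only; this is NOT how
    scikit-learn implements StratifiedKFold.
    """
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2")

    # Group sample indices by class label, keeping the original order.
    by_class = defaultdict(list)
    for index, label in enumerate(y):
        by_class[label].append(index)

    # Deal each class's indices across the folds in turn, so every
    # fold receives a near-equal share of every class.
    test_folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            test_folds[position % n_folds].append(index)

    # Each fold serves once as the test set; the rest is the training set.
    all_indices = set(range(len(y)))
    for test in test_folds:
        train = sorted(all_indices - set(test))
        yield train, sorted(test)


y = [0, 0, 0, 0, 1, 1, 1, 1]
splits = list(stratified_fold_indices(y, n_folds=2))
for train, test in splits:
    print("TRAIN:", train, "TEST:", test)
```

With four samples per class and two folds, each test fold in this sketch contains two samples of class 0 and two of class 1, mirroring the 50/50 class balance of y. The exact indices chosen by StratifiedKFold may differ from those produced here.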
 
 
         



