What is Skorch?
Skorch is a user-friendly library designed to bridge the gap between two popular tools: PyTorch and Scikit-learn. PyTorch is a leading framework for building deep learning models, while Scikit-learn is a versatile library used for classical machine learning models. Skorch provides a simple interface that makes PyTorch models compatible with Scikit-learn's tools, such as pipelines and model selection, thereby making it easier for developers to integrate deep learning into their existing workflows.
Key Features of Skorch
- Scikit-learn Compatibility: By using Skorch, PyTorch models can be seamlessly incorporated into Scikit-learn's ecosystem. This compatibility allows users to apply techniques like grid search and pipelines, which are common in machine learning practices, to their neural networks.
- Easy to Use: Skorch is designed to be as intuitive as Scikit-learn, allowing users to build, train, and evaluate models with ease. The library provides a high-level interface that abstracts the complexity of working directly with PyTorch.
- Extensive Callback System: Skorch offers a variety of callbacks that enable advanced functionality, such as learning rate scheduling, early stopping to prevent overfitting, and checkpointing to save model states.
- Integration with Popular Libraries: Skorch integrates with other widely-used libraries including Hugging Face for NLP tasks and GPyTorch for Gaussian Processes, expanding its relevance across various machine learning domains.
- Support for Common Best Practices: Through features like automatic parameter freezing/unfreezing and an optional progress bar, Skorch supports many best practices in model training and development.
How to Use Skorch
Basic Example
With Skorch, users can create a neural network class using PyTorch and then use Skorch's NeuralNetClassifier to train the model:
import numpy as np
from sklearn.datasets import make_classification
from torch import nn
from skorch import NeuralNetClassifier

# Synthetic data, cast to the dtypes PyTorch expects:
# float32 features and int64 class labels.
X, y = make_classification(1000, 20, n_informative=10, random_state=0)
X = X.astype(np.float32)
y = y.astype(np.int64)

class MyModule(nn.Module):
    def __init__(self, num_units=10, nonlin=nn.ReLU()):
        super().__init__()
        self.dense0 = nn.Linear(20, num_units)
        self.nonlin = nonlin
        self.dropout = nn.Dropout(0.5)
        self.dense1 = nn.Linear(num_units, num_units)
        self.output = nn.Linear(num_units, 2)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, X, **kwargs):
        X = self.nonlin(self.dense0(X))
        X = self.dropout(X)
        X = self.nonlin(self.dense1(X))
        # Output class probabilities for the two classes.
        X = self.softmax(self.output(X))
        return X

net = NeuralNetClassifier(
    MyModule,
    max_epochs=10,
    lr=0.1,
    # Shuffle the training data on each epoch.
    iterator_train__shuffle=True,
)

net.fit(X, y)
y_proba = net.predict_proba(X)
In a Pipeline
Skorch can also fit into a Scikit-learn Pipeline, allowing preprocessing steps to be applied before model fitting:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('net', net),
])

pipe.fit(X, y)
y_proba = pipe.predict_proba(X)
Using Grid Search
Skorch supports hyperparameter tuning via Scikit-learn's GridSearchCV:
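from sklearn.model_selection import GridSearchCV

# Disable skorch's internal train/validation split and per-epoch logging;
# GridSearchCV performs its own cross-validation.
net.set_params(train_split=False, verbose=0)

params = {
    'lr': [0.01, 0.02],
    'max_epochs': [10, 20],
    # The 'module__' prefix routes num_units to MyModule's constructor.
    'module__num_units': [10, 20],
}

gs = GridSearchCV(net, params, refit=False, cv=3, scoring='accuracy', verbose=2)
gs.fit(X, y)
print("best score: {:.3f}, best params: {}".format(gs.best_score_, gs.best_params_))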
Installation
Using Conda
To install Skorch using conda with the conda-forge channel:
conda install -c conda-forge skorch
Using Pip
To install Skorch using pip:
python -m pip install -U skorch
From Source
For developers interested in the latest features or contributing to the project, installing from source offers this flexibility:
git clone https://github.com/skorch-dev/skorch.git
cd skorch
conda create -n skorch-env python=3.10
conda activate skorch-env
conda install -c pytorch pytorch
python -m pip install -r requirements.txt
python -m pip install .
Community and Resources
Skorch is supported by its maintainers and community through several channels:
- The Skorch team manages resources like documentation, examples, and source code, which can be accessed on their GitHub repository.
- For community discussions and questions, users can participate in Skorch's GitHub discussions or join the #skorch channel on the PyTorch Slack server.
In summary, Skorch simplifies the process of integrating PyTorch into Scikit-learn workflows, making it a valuable tool for both machine learning practitioners and deep learning enthusiasts.