COR Brief
Data & Analytics

Skrub

Skrub is an open-source Python library for preprocessing dataframes in machine learning pipelines. It works on top of pandas and polars rather than replacing them, adding high-level tools for data exploration, cleaning, and feature engineering. Its main components are TableReport, which generates data exploration reports; Cleaner, which sanitizes data; and TableVectorizer, which handles feature engineering. For complex multi-table scenarios, skrub's Data Ops support building and validating pipelines across multiple dataframes, including hyperparameter tuning. The library targets data scientists and machine learning practitioners who work with Python dataframes and need the preprocessing building blocks common in ML workflows. Transformations can be tailored to a dataset through parameters and column selectors. Skrub is free, installs via pip, and fits into existing pandas or polars workflows.

Updated Jan 15, 2026 · open-source

Skrub is a free, open-source Python library that enhances dataframe-based machine learning preprocessing with tools for exploration, cleaning, feature engineering, and multi-table pipeline validation.

Pricing
open-source
Category
Data & Analytics
01. Generates comprehensive data exploration reports from dataframes to assist in understanding dataset characteristics.
02. Performs data sanitization tasks to prepare dataframes for machine learning pipelines.
03. Handles feature engineering by transforming dataframe columns into machine learning-ready features.
04. Builds and validates preprocessing pipelines that operate across multiple dataframes, including support for hyperparameter tuning.
05. Allows users to control which columns are transformed and to tweak processing steps through configurable parameters.

Data Exploration

A data scientist needs to quickly generate a report summarizing the characteristics of a new dataset.

Multi-Table Pipeline Validation

A machine learning practitioner works with multiple related dataframes and requires a validated preprocessing pipeline with hyperparameter tuning.

1. Install Skrub
Run pip install skrub to install the library.
2. Import Modules
Import the components you need, for example: from skrub import TableReport, Cleaner, TableVectorizer.
3. Generate Data Exploration Report
Create a report with report = TableReport(df), where df is your dataframe, then view it with report.open() or save it with report.write_html("report.html").
4. Build Preprocessing Pipeline
Chain Cleaner and TableVectorizer to clean and engineer features from your dataframe.
5. Validate Multi-Table Pipelines
Use skrub's Data Ops to build and validate pipelines involving multiple dataframes, with hyperparameter tuning.
Pricing
Model: open-source

Skrub is free to use and open-source.

Assessment
Strengths
  • Integrates with pandas and polars without replacing them, fitting into existing workflows.
  • Provides end-to-end tools covering data exploration, cleaning, and feature engineering.
  • Supports complex multi-table preprocessing pipelines with hyperparameter tuning.
  • Offers high customization through column selectors and parameters.
Limitations
  • Limited to dataframe-based machine learning preprocessing; does not support low-level array operations.
  • Requires user familiarity with pandas or polars to use effectively.