Data & Analytics

Skrub

Skrub is an open-source Python library for preprocessing dataframes in machine learning pipelines. It works on top of pandas and polars rather than replacing them, adding high-level tools for data exploration, cleaning, and feature engineering. Its main components are TableReport, which generates data exploration reports; Cleaner, which sanitizes data; and TableVectorizer, which handles feature engineering. For multi-table scenarios, skrub's DataOps let users build, validate, and tune (including hyperparameter search) pipelines that span several dataframes. The library targets data scientists and machine learning practitioners who work with Python dataframes and need the preprocessing building blocks common in ML workflows. Transformations can be tailored to a dataset through parameters and column selectors. Skrub is free, installs via pip, and fits into existing pandas or polars workflows.

Updated Jan 15, 2026 · open-source

Overview

Skrub is a free, open-source Python library that enhances dataframe-based machine learning preprocessing with tools for exploration, cleaning, feature engineering, and multi-table pipeline validation.

Pricing

open-source

Data Exploration

A data scientist needs to quickly generate a report summarizing the characteristics of a new dataset.

Multi-Table Pipeline Validation

A machine learning practitioner works with multiple related dataframes and requires a validated preprocessing pipeline with hyperparameter tuning.

Quick Start

Install Skrub

Run pip install skrub to install the library.

Import Modules

Import needed components, for example: from skrub import TableReport, Cleaner.

Generate Data Exploration Report

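Create a report with report = TableReport(df), where df is your dataframe. In a Jupyter notebook the report displays when it is the last expression in a cell; elsewhere, call report.open() to view it in a web browser.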

Build Preprocessing Pipeline

Chain Cleaner and TableVectorizer to clean and engineer features from your dataframe.

Validate Multi-Table Pipelines

Use skrub's DataOps (for example, skrub.var together with the .skb accessor) to build, validate, and tune pipelines that span multiple dataframes.

Assessment

Strengths

Integrates with pandas and polars without replacing them, fitting into existing workflows.
Provides end-to-end tools covering data exploration, cleaning, and feature engineering.
Supports complex multi-table preprocessing pipelines with hyperparameter tuning.
Offers high customization through column selectors and parameters.

Limitations

Limited to dataframe-based machine learning preprocessing; does not support low-level array operations.
Requires user familiarity with pandas or polars to use effectively.