Scanpy
Scanpy is a Python-based toolkit designed for scalable analysis of single-cell gene expression data. It supports datasets exceeding one million cells and integrates tightly with the anndata data structure for efficient data handling. The toolkit offers a comprehensive suite of functionalities including preprocessing, visualization, clustering, trajectory inference, and differential expression testing. Visualization options include embeddings such as PCA, t-SNE, UMAP, force-directed graph drawing, and diffusion maps. Clustering methods include Leiden and hierarchical clustering, while trajectory inference is performed via geodesic distances along graphs. Scanpy also supports marker gene analysis, gene scoring, cell cycle scoring, and simulation of dynamic gene expression data. The project is actively maintained with 94 releases to date, the latest being version 1.11.5 released in October 2025. It is open-source under the BSD-3-Clause license and supported by a community of 157 contributors. Scanpy can be installed via pip or conda, with some features requiring additional dependencies such as leidenalg and python-igraph. The toolkit is part of a broader ecosystem including related tools like Squidpy for spatial data and Muon for multimodal single-cell data.
Scanpy is an open-source Python toolkit for scalable single-cell gene expression data analysis supporting datasets over one million cells.
Single-Cell Transcriptomics Analysis
Researchers analyzing large-scale single-cell RNA sequencing data to identify cell populations and gene expression patterns.
Trajectory and Developmental Pathway Inference
Studying cellular differentiation and lineage trajectories using graph-based trajectory inference methods.
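1. Install Scanpy with pip install 'scanpy[leiden]', or via conda with conda install -c conda-forge scanpy python-igraph leidenalg.
2. Import the package in Python with import scanpy as sc.
3. Normalize the data (sc.pp.normalize_total) and apply a log transformation (sc.pp.log1p).
4. Cluster the cells (e.g., sc.tl.leiden) and visualize the results (e.g., sc.pl.umap).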