Key Features

What you can do

Multi-Cloud and On-Premises Support

Runs AI workloads on more than 20 cloud providers and on Kubernetes clusters, including on-premises ones, without rewriting job configurations.

Jobs as Code

Users define environments and jobs in YAML or via the CLI, so the same definition runs portably and reproducibly across infrastructures.
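
As a rough illustration of a job defined in YAML, the sketch below shows a minimal SkyPilot-style task file. The file name `task.yaml`, the task name, the accelerator request, and the `requirements.txt` and `train.py` files are hypothetical placeholders, not values taken from this document.

```yaml
# task.yaml (hypothetical file name and contents, for illustration only)
name: train-example        # assumed task name

resources:
  accelerators: A100:1     # assumed accelerator request

workdir: .                 # sync the current directory to the remote machine

# Runs once when the environment is first set up.
setup: |
  pip install -r requirements.txt

# Runs every time the job is executed.
run: |
  python train.py
```

A file like this would typically be submitted with `sky launch task.yaml`. Because no specific provider is named in it, the same file can in principle be reused across infrastructures.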

Automated Resource Management

Selects, provisions, and manages compute automatically, including the use of spot (preemptible) instances for cost efficiency.
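
A minimal sketch of how spot usage might be requested in a task's `resources` section is shown below. The accelerator type is an assumed example. Leaving out a specific cloud is meant to illustrate letting the tool choose where to provision.

```yaml
# Illustrative resources section; the values are assumptions.
resources:
  accelerators: A100:1   # assumed accelerator request
  use_spot: true         # ask for spot/preemptible instances
  # No `cloud:` key is set here, so the choice of provider is left
  # to automated selection among the providers that are enabled.
```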

Job Queuing and Auto-Recovery

Queues and runs multiple jobs, and recovers automatically from failures without manual intervention.
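
The fragment below sketches how a recovery policy might be attached to a job submitted as a managed job, for example with `sky jobs launch`. The `job_recovery` and `max_restarts_on_errors` fields and the value `3` are assumptions for illustration. Check them against the documentation for the installed version before relying on them.

```yaml
# Illustrative only; field names and values are assumptions to verify.
resources:
  use_spot: true
  job_recovery:
    max_restarts_on_errors: 3   # assumed cap on restarts after job errors
```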

Modular Installation

Installable via pip with cloud-specific extras, so that only the providers you need are included, e.g., `pip install "skypilot[kubernetes,aws,gcp]"`.