Serverless

Bring your code, we’ll handle the hardware.

Skip the infra headaches. Our auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating.

Bring your container.

Deploy any container with full control and flexibility.

Network storage.

Persistent, high-speed storage that scales with your workloads.

Global regions.

Deploy closer to your users with low-latency regions worldwide.
Features

Effortlessly scale AI inference.

When every element clicks, deploying, scaling, and optimizing becomes pure magic.

Flexible runtimes.

Run AI/ML workloads with support for a wide range of languages, frameworks, and custom configurations.

Zero cold starts.

Pre-warmed functions guarantee an immediate response, eliminating initial latency.

Create an endpoint fast.

Deploy pre-built AI templates and spin up your own custom endpoint instantly.

Deploy with GitHub.

Push to GitHub, auto-release to your endpoint. Roll back anytime with ease.
Serverless Pricing

Cost-effective for every inference workload.

Save 15% over other serverless cloud providers on flex workers alone.

Prices are shown per second or per hour, with separate Flex and Active rates.

80GB H100: Extreme throughput for big models. Flex $0 / Active $0
80GB A100: High-throughput GPU, yet still very cost-effective. Flex $0 / Active $0
48GB L40, L40S, 6000 Ada: Extreme inference throughput on LLMs like Llama 3 8B. Flex $0 / Active $0
48GB A6000, A40: A cost-effective option for running big models. Flex $0 / Active $0
24GB 4090: Extreme throughput for small-to-medium models. Flex $0 / Active $0
24GB L4, A5000, 3090: Great for small-to-medium inference workloads. Flex $0 / Active $0
16GB A4000, A4500, RTX 4000: The most cost-effective option for small models. Flex $0 / Active $0
FAQs

Questions? Answers.

Serverless, simplified. Clear answers on running your code without the fuss.
What sets RunPod’s serverless apart from other platforms?
RunPod’s serverless GPUs eliminate cold starts with always-on, pre-warmed instances, ensuring low-latency execution. Unlike traditional serverless solutions, RunPod offers full control over runtimes, persistent storage options, and direct access to powerful GPUs, making it ideal for AI/ML workloads.
What programming languages and runtimes are supported?
RunPod supports Python, Node.js, Go, Rust, and C++, along with popular AI/ML frameworks like PyTorch, TensorFlow, JAX, and ONNX. You can also bring your own custom runtime via Docker containers, giving you full flexibility over your environment.
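For illustration, here is a minimal Python worker sketched in the style of RunPod's serverless SDK: a handler receives a job payload and returns a JSON-serializable result. The payload field used here ("prompt") is a made-up example, and the exact SDK calls should be verified against the current runpod package documentation.

    import runpod  # RunPod's Python SDK for serverless workers

    def handler(job):
        # Job input arrives under job["input"]; "prompt" is a hypothetical field for this sketch.
        prompt = job["input"].get("prompt", "")
        # Run your model or any custom logic here, then return a JSON-serializable result.
        return {"output": prompt.upper()}

    # Register the handler so the worker begins processing queued requests.
    runpod.serverless.start({"handler": handler})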
How does RunPod reduce cold-start delays?
RunPod uses active worker pools and pre-warmed GPUs to minimize initialization time. Serverless instances remain ready to handle requests immediately, preventing the typical delays seen in traditional cloud function environments.
How are deployments and rollbacks managed?
RunPod allows deployments directly from GitHub, with one-click launches for pre-configured templates. For rollback management, you can revert to previous container versions instantly, ensuring a seamless and controlled deployment process.
How does RunPod handle event-driven workflows?
RunPod integrates with webhooks, APIs, and custom event triggers, enabling seamless execution of AI/ML workloads in response to external events. You can set up GPU-powered functions that automatically run on demand, scaling dynamically without persistent instance management.
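As a sketch of such a trigger, assuming an HTTPS endpoint of the form https://api.runpod.ai/v2/<ENDPOINT_ID>/run and an API key passed as a Bearer token (verify both against current RunPod docs), a webhook receiver could queue a job like this:

    import os
    import requests

    ENDPOINT_ID = os.environ["RUNPOD_ENDPOINT_ID"]  # ID of your serverless endpoint
    API_KEY = os.environ["RUNPOD_API_KEY"]           # account API key

    # POST to the /run route to queue a job asynchronously; the response
    # includes a job ID that can later be polled for status and output.
    resp = requests.post(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"input": {"prompt": "Hello from a webhook"}},
        timeout=30,
    )
    print(resp.json())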
What tools are available for monitoring and debugging?
RunPod offers a comprehensive monitoring dashboard with real-time logging and distributed tracing for your serverless functions. Additionally, you can integrate with popular APM tools for deeper performance insights and efficient debugging.
Clients

Trusted by today's leaders, built for tomorrow's pioneers.

Engineered for teams building the future.

7,035,265,000

Requests since launch & 300k developers worldwide

Build what’s next.

The most cost-effective platform for building, training, and scaling machine learning models—ready when you are.