Distributed GPU Compute for AI — Powered by QuanticCompute

Real RTX 5090/4090 capacity. Lower cost than hyperscalers. Simple API. Run inference or custom containers on demand with predictable performance.

API‑first · Encrypted Containers · SLA Options

Why LocalGPU

  • Transparent hourly pricing
  • BYO Docker image
  • Autoscale across nodes
  • Low-latency routing

Features

On‑Demand 5090/4090

Top‑tier consumer GPUs available in minutes for inference and fine‑tuning.

Distributed Inference

Workloads sharded or batched across nodes for steady, predictable throughput.

API‑First

Simple REST endpoints. Queue jobs, fetch logs, and retrieve results.

Encrypted Containers

Customer images pulled securely with short‑lived, scoped tokens; zero‑trust by default.

Usage Transparency

Per‑job metrics: latency, tokens/s, VRAM, GPU util, and cost.

Vast.ai Compatible

Optionally burst to external capacity pools while keeping cost controls in place.

Pricing

Starter

From $1.49/hr

  • RTX 4090: $1.49/hr
  • RTX 5090: $2.49/hr
  • Per‑second billing (1‑min minimum)
  • Email support
Get Started

Enterprise

Custom

  • Volume & SLA pricing
  • Hybrid on‑prem + cloud
  • Private region & VPC peering
Talk to Sales

Indicative public rates. Final pricing varies by region, commitment, and availability. Billed per‑second with a 1‑minute minimum.

Docs

REST quick start

Replace YOUR_API_KEY and values as needed.

# Submit a job
curl -X POST \
  https://api.localgpu.ai/v1/jobs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "image": "nvcr.io/nvidia/pytorch:24.05-py3",
    "command": ["python","serve.py"],
    "resources": {"gpu": 1, "vram_gb": 24},
    "env": {"MODEL": "llama3-8b"}
  }'

# Poll job status
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://api.localgpu.ai/v1/jobs/JOB_ID

# Stream logs (if enabled)
curl -N -H "Authorization: Bearer YOUR_API_KEY" \
  https://api.localgpu.ai/v1/jobs/JOB_ID/logs

Python

import os, requests

API = "https://api.localgpu.ai/v1"
KEY = os.getenv("LOCALGPU_API_KEY", "YOUR_API_KEY")
headers = {"Authorization": f"Bearer {KEY}", "Content-Type": "application/json"}

# Job spec: container image, entry command, GPU resources, and environment.
job = {
  "image": "nvcr.io/nvidia/pytorch:24.05-py3",
  "command": ["python", "serve.py"],
  "resources": {"gpu": 1, "vram_gb": 24},
  "env": {"MODEL": "llama3-8b"}
}

# Submit the job and read back its ID.
r = requests.post(f"{API}/jobs", json=job, headers=headers)
r.raise_for_status()
job_id = r.json()["id"]

# Fetch the job's current status.
status = requests.get(f"{API}/jobs/{job_id}", headers=headers).json()
print(status)
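
To wait for completion, you can poll the same status endpoint until the job reaches a terminal state. A minimal sketch, assuming the job object exposes a "status" field with values like "succeeded" or "failed" (your account's schema may differ):

import time

# Poll every few seconds until the job finishes (field name and values are illustrative).
while True:
    status = requests.get(f"{API}/jobs/{job_id}", headers=headers).json()
    if status.get("status") in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(5)

print("final state:", status.get("status"))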

Endpoints are illustrative. Your account may use different base URLs or versions.

Tech Stack

How it works

  1. Submit a job (REST or dashboard)
  2. Scheduler assigns nodes, weighing latency and cost (see the sketch after this list)
  3. Container runs with isolated volumes
  4. Metrics & results stream back

Security

JWT‑scoped tokens, private registries, per‑job secrets, and optional customer‑managed keys.
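
As a sketch of how per-job secrets might be attached, assuming a top-level "secrets" field in the job spec (the field name and image path are assumptions; check your account's schema):

# Per-job secrets are injected into the container at runtime, not baked into the image.
# The "secrets" field name and registry path are illustrative.
job = {
    "image": "registry.example.com/acme/serve:latest",
    "command": ["python", "serve.py"],
    "resources": {"gpu": 1, "vram_gb": 24},
    "secrets": {"HF_TOKEN": "hf_xxx"}   # hypothetical per-job secret
}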

Observability

Structured logs, per‑job metrics, and export to your stack.
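
A minimal sketch of pulling per-job metrics for export, assuming a /metrics route under the job resource (the path and metric keys are assumptions):

import os, requests

API = "https://api.localgpu.ai/v1"
headers = {"Authorization": f"Bearer {os.getenv('LOCALGPU_API_KEY', 'YOUR_API_KEY')}"}

# Hypothetical metrics endpoint; keys like tokens_per_s and cost_usd are illustrative.
metrics = requests.get(f"{API}/jobs/JOB_ID/metrics", headers=headers).json()
print(metrics.get("tokens_per_s"), metrics.get("gpu_util"), metrics.get("cost_usd"))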

FAQ

Which GPUs are available?

RTX 4090 and 5090 are our primary tiers. Contact us for A100/H100 availability in select regions.

Can I run my own Docker image?

Yes. Provide a registry URL and we will pull with short‑lived, scoped credentials. Private registries are supported.
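
For example, a job spec pointing at a private registry might look like the sketch below; the "registry_auth" field is a hypothetical stand-in for however your account references pull credentials:

# Pull from a private registry with a short-lived credential.
# "registry_auth" and the image path are illustrative, not a confirmed schema.
job = {
    "image": "registry.example.com/acme/inference:latest",
    "registry_auth": {"username": "acme-bot", "password": "SHORT_LIVED_TOKEN"},
    "command": ["python", "serve.py"],
    "resources": {"gpu": 1, "vram_gb": 24}
}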

How is usage billed?

Per‑second metering with a 1‑minute minimum. Detailed per‑job metrics and cost breakdowns are available via API.
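
As a worked example using the Starter rates above, a 90-second job on an RTX 4090 bills 90 seconds (the 1-minute minimum only rounds up shorter jobs):

# Per-second cost at $1.49/hr with a 60-second minimum.
rate_per_hr = 1.49
billed_seconds = max(90, 60)            # jobs shorter than 60 s bill as 60 s
cost = rate_per_hr / 3600 * billed_seconds
print(f"${cost:.4f}")                   # ≈ $0.0373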

Do you offer SLAs?

Business and Enterprise plans include availability SLAs and priority scheduling. Reach out to tailor an agreement.

Request Access

Prefer email? Write us at hello@localgpu.ai.

Fast track

Tell us which GPU, how many hours, and your container image. We’ll spin up capacity ASAP.

  • GPU model & quantity
  • Container image (optional)
  • Latency/cost preferences
