Distributed GPU Compute for AI — Powered by QuanticCompute
Real RTX 5090/4090 capacity. Lower cost than hyperscalers. Simple API. Run inference or custom containers on demand with predictable performance.
Why LocalGPU
- Transparent hourly pricing
- BYO Docker image
- Autoscale across nodes
- Low-latency routing
Features
On‑Demand 5090/4090
Top‑tier consumer GPUs available in minutes for inference and fine‑tuning.
Distributed Inference
Shard or batch workloads across nodes for steady, predictable throughput.
API‑First
Simple REST endpoints. Queue jobs, fetch logs, and retrieve results.
Encrypted Containers
Customer images are pulled securely with scoped tokens; zero-trust by default.
Usage Transparency
Per‑job metrics: latency, tokens/s, VRAM, GPU util, and cost.
Vast.ai Compatible
Optionally burst to external capacity pools while keeping your cost controls in place.
Pricing
Starter
From $1.49/hr
- RTX 4090: $1.49/hr
- RTX 5090: $2.49/hr
- Per‑second billing (1‑min minimum)
- Email support
Pro
From $1.29/hr
- 4090 reserved: $1.29/hr
- 5090 reserved: $2.19/hr
- Dedicated 1–4 GPUs, priority queueing
- 99.9% availability
Enterprise
Custom
- Volume & SLA pricing
- Hybrid on‑prem + cloud
- Private region & VPC peering
Indicative public rates. Final pricing varies by region, commitment, and availability. Billed per‑second with a 1‑minute minimum.
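As a rough illustration of per-second billing with the 1-minute minimum, here is a minimal sketch using the indicative Starter rates above (actual rounding rules and regional rates may differ):

# Cost sketch using the indicative Starter rates above.
# Assumes simple per-second proration with a 60-second minimum;
# actual rounding and regional pricing may differ.
RATES_PER_HOUR = {"rtx4090": 1.49, "rtx5090": 2.49}

def estimate_cost(gpu: str, seconds: float) -> float:
    billable = max(seconds, 60)  # 1-minute minimum
    return RATES_PER_HOUR[gpu] / 3600 * billable

# Example: a 37-minute run on a 4090
print(f"${estimate_cost('rtx4090', 37 * 60):.2f}")  # ≈ $0.92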
Docs
REST quick start
Replace YOUR_API_KEY and values as needed.
curl -X POST https://api.localgpu.ai/v1/jobs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "image": "nvcr.io/nvidia/pytorch:24.05-py3",
    "command": ["python", "serve.py"],
    "resources": {"gpu": 1, "vram_gb": 24},
    "env": {"MODEL": "llama3-8b"}
  }'
# Poll job status
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://api.localgpu.ai/v1/jobs/JOB_ID

# Stream logs (if enabled)
curl -N -H "Authorization: Bearer YOUR_API_KEY" \
  https://api.localgpu.ai/v1/jobs/JOB_ID/logs
Python
import os
import requests

API = "https://api.localgpu.ai/v1"
KEY = os.getenv("LOCALGPU_API_KEY", "YOUR_API_KEY")
headers = {"Authorization": f"Bearer {KEY}", "Content-Type": "application/json"}

# Describe the job: image, entrypoint, GPU resources, and environment.
job = {
    "image": "nvcr.io/nvidia/pytorch:24.05-py3",
    "command": ["python", "serve.py"],
    "resources": {"gpu": 1, "vram_gb": 24},
    "env": {"MODEL": "llama3-8b"},
}

# Submit the job and read back its ID.
r = requests.post(f"{API}/jobs", json=job, headers=headers)
r.raise_for_status()
job_id = r.json()["id"]

# Fetch the current status.
status = requests.get(f"{API}/jobs/{job_id}", headers=headers).json()
print(status)
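Continuing the example above, you can poll the same status endpoint until the job finishes. This is a minimal sketch; the "status" field and its values ("succeeded", "failed", "cancelled") are assumptions, so check your account's API reference for the exact schema.

import time

# Poll until the job reaches a terminal state.
# Field names below are illustrative, not a documented schema.
while True:
    status = requests.get(f"{API}/jobs/{job_id}", headers=headers).json()
    if status.get("status") in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(5)

print("final state:", status.get("status"))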
Endpoints are illustrative. Your account may use different base URLs or versions.
Tech Stack
How it works
- Submit a job (REST or dashboard)
- Scheduler assigns nodes (latency- and cost-aware)
- Container runs with isolated volumes
- Metrics & results stream back
Security
JWT‑scoped tokens, private registries, per‑job secrets, and optional customer‑managed keys.
Observability
Structured logs, per‑job metrics, and export to your stack.
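For example, per-job metrics can be pulled and shipped to your own stack as JSON lines. The sketch below assumes a /jobs/{id}/metrics endpoint; that path and the field names are assumptions, and your account's API reference is authoritative.

import json
import os
import requests

API = "https://api.localgpu.ai/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['LOCALGPU_API_KEY']}"}

def export_metrics(job_id: str, path: str = "metrics.jsonl") -> None:
    # Hypothetical metrics endpoint; adjust to your account's API.
    metrics = requests.get(f"{API}/jobs/{job_id}/metrics", headers=HEADERS).json()
    with open(path, "a") as f:
        # One JSON line per job, ready to ingest into your log/metrics pipeline.
        f.write(json.dumps({"job_id": job_id, **metrics}) + "\n")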
FAQ
Which GPUs are available?
RTX 4090 and 5090 are our primary tiers. Contact us for A100/H100 availability in select regions.
Can I run my own Docker image?
Yes. Provide a registry URL and we will pull with short‑lived, scoped credentials. Private registries are supported.
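For example, a job payload referencing a private image might look like the sketch below. The registry host is a placeholder, and how credentials are supplied (dashboard, scoped token, etc.) depends on your account setup; only fields already shown in the quick start are used here.

# Sketch of a job payload that points at a private registry image.
job = {
    "image": "registry.example.com/acme/llm-server:latest",
    "command": ["python", "serve.py"],
    "resources": {"gpu": 1, "vram_gb": 24},
}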
How is usage billed?
Per‑second metering with a 1‑minute minimum. Detailed per‑job metrics and cost breakdowns are available via API.
Do you offer SLAs?
Pro and Enterprise plans include availability SLAs and priority scheduling. Reach out to tailor an agreement.
Request Access
Fast track
Tell us which GPU, how many hours, and your container image. We’ll spin up capacity ASAP.
- GPU model & quantity
- Container image (optional)
- Latency/cost preferences