Design partner program
Design partners for real inference workloads.
We're working with a small number of teams running inference on Kubernetes to validate P95 Labs in read-only mode against real telemetry, real queue behaviour, and real operational workflows.
Who this is for
- Teams running vLLM, Triton, TGI, KServe, Ray Serve, or custom runtimes
- Platform, MLOps, AI infrastructure, and SRE teams
- Teams dealing with p95/p99 latency, queue pressure, GPU utilisation issues, autoscaling lag, or manual runtime tuning
What the alpha does
- Discovers Kubernetes workloads
- Reads Prometheus telemetry
- Stores time-series telemetry in TimescaleDB
- Correlates signals using rule-based recommendations
- Surfaces ranked recommendations through API and CLI
- Runs read-only by default
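To make "rule-based" and "ranked" concrete, here is a minimal sketch of how such a correlation step could look. It is illustrative only and is not the P95 Labs implementation: the `Snapshot` fields, both rules, every threshold, and the scoring formulas are assumptions invented for this example. Each rule reads a telemetry summary and either returns a scored recommendation or nothing; nothing in the sketch writes back to a cluster.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical per-workload telemetry summary (field names are assumptions)."""
    workload: str
    p95_latency_ms: float
    queue_depth: float
    gpu_utilisation: float  # fraction between 0.0 and 1.0


@dataclass(frozen=True)
class Recommendation:
    workload: str
    message: str
    score: float  # higher means more urgent


Rule = Callable[[Snapshot], Optional[Recommendation]]


def queue_pressure_rule(s: Snapshot) -> Optional[Recommendation]:
    # Assumed heuristic: a deep queue alongside an idle GPU hints at a
    # batching or concurrency limit rather than a capacity shortage.
    if s.queue_depth > 10 and s.gpu_utilisation < 0.5:
        score = s.queue_depth * (1 - s.gpu_utilisation)
        return Recommendation(s.workload, "Review batching and concurrency limits", score)
    return None


def latency_saturation_rule(s: Snapshot) -> Optional[Recommendation]:
    # Assumed heuristic: high p95 latency with a saturated GPU hints that
    # the workload needs more replicas.
    if s.p95_latency_ms > 1000 and s.gpu_utilisation >= 0.9:
        score = s.p95_latency_ms / 100
        return Recommendation(s.workload, "Consider adding replicas", score)
    return None


RULES: List[Rule] = [queue_pressure_rule, latency_saturation_rule]


def recommend(snapshots: Iterable[Snapshot], rules: List[Rule] = RULES) -> List[Recommendation]:
    """Apply every rule to every snapshot and return matches, highest score first."""
    found = [
        rec
        for s in snapshots
        for rule in rules
        if (rec := rule(s)) is not None
    ]
    return sorted(found, key=lambda r: r.score, reverse=True)
```

Keeping each rule as a small pure function means a recommendation can be traced back to the exact signals that triggered it, which makes it straightforward to check against existing dashboards.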
What the alpha does not require by default
- No prompts
- No request bodies
- No model outputs
- No training data
- No secrets
- No request-path proxying
- No autonomous production mutation
What the first call looks like
The first conversation is usually 20 to 30 minutes. We discuss your inference stack, runtime, Kubernetes setup, observability tools, latency issues, GPU utilisation patterns, and how your team diagnoses incidents today. There is no expectation that you share sensitive data in the first conversation.
What a safe pilot looks like
1. Start with a development, staging, synthetic, or scoped production-like environment
2. Review RBAC and telemetry boundaries
3. Deploy through Helm
4. Validate recommendations against existing dashboards
5. Decide together whether deeper validation makes sense