Design partner program

Design partners for real inference workloads.

We're working with a small number of teams running inference on Kubernetes to validate P95 Labs in read-only mode against real telemetry, real queue behaviour, and real operational workflows.

Who this is for

Teams running vLLM, Triton, TGI, KServe, Ray Serve, or custom runtimes
Platform, MLOps, AI infrastructure, and SRE teams
Teams dealing with p95/p99 latency, queue pressure, GPU utilisation issues, autoscaling lag, or manual runtime tuning

What the alpha does

Discovers Kubernetes workloads

Reads Prometheus telemetry

Stores time-series telemetry in TimescaleDB

Correlates signals and generates rule-based recommendations

Surfaces ranked recommendations through API and CLI

Runs read-only by default
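
To make "rule-based recommendations" concrete, here is a minimal illustrative sketch in Python. It is not P95 Labs' implementation: the `Snapshot` fields, both rules, every threshold, and the scoring formulas are assumptions invented for this example. It only shows the general shape of the idea, which is to evaluate simple rules over a telemetry snapshot per workload and return the matches ranked by score, without changing anything in the cluster.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical per-workload telemetry summary (field names are assumptions)."""
    workload: str
    p95_latency_ms: float
    queue_depth: float
    gpu_utilisation: float  # fraction between 0.0 and 1.0


@dataclass(frozen=True)
class Recommendation:
    workload: str
    message: str
    score: float  # higher means more urgent


Rule = Callable[[Snapshot], Optional[Recommendation]]


def queue_pressure_with_idle_gpu(s: Snapshot) -> Optional[Recommendation]:
    # Assumed thresholds: a deep queue while the GPU is under half utilised.
    if s.queue_depth > 20 and s.gpu_utilisation < 0.5:
        return Recommendation(
            s.workload,
            "Requests are queuing while GPUs are underused; review batching or concurrency limits.",
            score=s.queue_depth * (1.0 - s.gpu_utilisation),
        )
    return None


def high_latency_with_saturated_gpu(s: Snapshot) -> Optional[Recommendation]:
    # Assumed thresholds: p95 above one second while the GPU is near saturation.
    if s.p95_latency_ms > 1000 and s.gpu_utilisation > 0.9:
        return Recommendation(
            s.workload,
            "p95 latency is high and GPUs are saturated; consider adding replicas.",
            score=s.p95_latency_ms / 100.0,
        )
    return None


RULES: List[Rule] = [queue_pressure_with_idle_gpu, high_latency_with_saturated_gpu]


def recommend(snapshots: List[Snapshot], rules: List[Rule] = RULES) -> List[Recommendation]:
    """Apply every rule to every snapshot and return matches, highest score first."""
    matches = []
    for snapshot in snapshots:
        for rule in rules:
            rec = rule(snapshot)
            if rec is not None:
                matches.append(rec)
    return sorted(matches, key=lambda r: r.score, reverse=True)
```

The function only reads its input and returns a list, which mirrors the read-only posture described above: recommendations are surfaced for a person to review, and nothing is applied automatically.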

What the alpha does not require by default

No prompts
No request bodies
No model outputs
No training data
No secrets
No request-path proxying
No autonomous production mutation

What the first call looks like

The first conversation is usually 20 to 30 minutes. We discuss your inference stack, runtime, Kubernetes setup, observability tools, latency issues, GPU utilisation patterns, and how your team diagnoses incidents today. There is no expectation that you share sensitive data in the first conversation.

What a safe pilot looks like

1. Start with a development, staging, synthetic, or scoped production-like environment
2. Review RBAC and telemetry boundaries
3. Deploy through Helm
4. Validate recommendations against existing dashboards
5. Decide together whether deeper validation makes sense

Interested in design partnership?

Reach out directly and tell us what you're running.

prajwal@p95labs.com hello@p95labs.com
