P95-QUEUE-001
severity: high
Queue depth rising while GPU utilisation remains low
query: queue_depth > 0 and gpu_util < 0.5

Requests are backing up before they reach the GPU. Compute is available, so the limit is in batching, concurrency, or admission, not capacity.
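One way to test the rule offline is to evaluate the query against scraped metric samples. The sketch below assumes each scrape yields a `queue_depth` count and a `gpu_util` fraction in [0, 1]. The `Sample` type and the `queue_rising` helper are hypothetical. The query as written only checks `queue_depth > 0`, so the "rising" part of the title is modelled separately here as strictly increasing depth across a window.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One scrape of the two metrics the rule reads (hypothetical shape)."""
    queue_depth: int   # requests waiting before reaching the GPU
    gpu_util: float    # GPU utilisation as a fraction in [0, 1]


GPU_UTIL_THRESHOLD = 0.5  # threshold taken from the query


def rule_fires(s: Sample) -> bool:
    # Mirrors the query: queue_depth > 0 and gpu_util < 0.5
    return s.queue_depth > 0 and s.gpu_util < GPU_UTIL_THRESHOLD


def queue_rising(samples: list[Sample]) -> bool:
    # Assumption: "rising" means strictly increasing depth across the window.
    # The query itself does not check this.
    depths = [s.queue_depth for s in samples]
    return len(depths) >= 2 and all(a < b for a, b in zip(depths, depths[1:]))


window = [Sample(3, 0.31), Sample(7, 0.28), Sample(12, 0.30)]
print(rule_fires(window[-1]), queue_rising(window))  # True True
```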
Inspect max concurrent requests and batch settings before adding replicas or GPUs.
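Little's law gives a quick way to check whether the concurrency cap alone explains the backlog. If at most `max_concurrent` requests can be in flight and each spends `mean_latency_s` seconds in service, admitted throughput cannot exceed `max_concurrent / mean_latency_s`. Any arrival rate above that builds a queue even while the GPU sits idle. This is a minimal sketch: the function names and the numbers in the example are hypothetical, and real latency should be measured from your own serving metrics.

```python
def concurrency_bound_rps(max_concurrent: int, mean_latency_s: float) -> float:
    """Little's law: throughput = requests in flight / time each spends in service."""
    return max_concurrent / mean_latency_s


def concurrency_is_bottleneck(arrival_rps: float, max_concurrent: int,
                              mean_latency_s: float) -> bool:
    # Queue grows when requests arrive faster than the cap lets them be served.
    return arrival_rps > concurrency_bound_rps(max_concurrent, mean_latency_s)


# Hypothetical numbers: 8 concurrent slots at 0.5 s per request.
print(concurrency_bound_rps(8, 0.5))          # 16.0 requests/s at most
print(concurrency_is_bottleneck(30, 8, 0.5))  # True: 30 req/s exceeds the cap
```

If the check comes back true and `gpu_util` is still well below 0.5, raising the concurrency cap or batch size is likely to help before more hardware is, because the GPU has headroom to absorb the extra in-flight work.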