The data show that DualPath inference eases the KV cache I/O bottleneck by using RDMA and a second storage-to-decode path alongside the prefill path, which improves throughput, time to first token (TTFT), and latency.
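
To see why a second load path can relieve a cache I/O bottleneck, here is a toy bandwidth model in Python. It is only a sketch under stated assumptions, not DualPath's actual scheduler: the function names, the bandwidth figures, and the proportional split are all hypothetical, and it assumes the two paths do not share a common bottleneck.

```python
def single_path_time(total_bytes, prefill_bw):
    """Time to load a KV cache when only the prefill-side path is used."""
    if prefill_bw <= 0:
        raise ValueError("prefill_bw must be positive")
    return total_bytes / prefill_bw


def split_load(total_bytes, prefill_bw, decode_bw):
    """Split one KV cache load across two independent paths.

    Hypothetical policy: each path gets a share proportional to its
    bandwidth, so both finish at the same moment and the load time is
    total_bytes / (prefill_bw + decode_bw).
    """
    if prefill_bw <= 0 or decode_bw < 0:
        raise ValueError("bandwidths must be non-negative, prefill_bw positive")
    total_bw = prefill_bw + decode_bw
    prefill_bytes = total_bytes * prefill_bw / total_bw
    decode_bytes = total_bytes - prefill_bytes
    load_time = total_bytes / total_bw
    return prefill_bytes, decode_bytes, load_time


if __name__ == "__main__":
    # Illustrative numbers only (arbitrary units), not measurements.
    size, bw_prefill, bw_decode = 120.0, 10.0, 20.0
    print("single path:", single_path_time(size, bw_prefill))
    print("dual path:  ", split_load(size, bw_prefill, bw_decode))
```

In this model the load time falls from `size / bw_prefill` to `size / (bw_prefill + bw_decode)`, which is the intuition behind adding a path. A real system would also have to account for the cost of moving the cache between decode and prefill nodes.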