============ Serving Benchmark Result ============
Backend: sglang
Traffic request rate: inf
Max request concurrency: 1
Successful requests: 10
Benchmark duration (s): 4.45
Total input tokens: 1972
Total input text tokens: 1972
Total input vision tokens: 0
Total generated tokens: 2784
Total generated tokens (retokenized): 2770
Request throughput (req/s): 2.25
Input token throughput (tok/s): 442.89
Output token throughput (tok/s): 625.26
Peak output token throughput (tok/s): 635.00
Peak concurrent requests: 4
Total token throughput (tok/s): 1068.16
Concurrency: 1.00
----------------End-to-End Latency----------------
Mean E2E Latency (ms): 443.32
Median E2E Latency (ms): 493.29
---------------Time to First Token----------------
Mean TTFT (ms): 21.59
Median TTFT (ms): 20.89
P99 TTFT (ms): 24.81
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 1.47
Median TPOT (ms): 1.52
P99 TPOT (ms): 1.53
---------------Inter-Token Latency----------------
Mean ITL (ms): 1.52
Median ITL (ms): 1.51
P95 ITL (ms): 1.76
P99 ITL (ms): 1.93
Max ITL (ms): 8.28
==================================================
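The derived figures in this summary can be cross-checked against the raw totals. The sketch below assumes that each throughput is a total count divided by the benchmark duration, and that with concurrency 1 the requests run back to back. It also assumes a request's end-to-end latency is roughly its TTFT plus one inter-token gap per later token. These are standard definitions, not a copy of sglang's own code. The names SUMMARY and check are hypothetical, and the values are copied by hand from the block above. Because the printed duration of 4.45 s is rounded, the sketch recovers the unrounded duration from the output throughput first.

```python
"""Sanity check for the benchmark summary above (values copied by hand)."""

# Hypothetical dict holding figures transcribed from the summary block.
SUMMARY = {
    "successful_requests": 10,
    "total_input_tokens": 1972,
    "total_generated_tokens": 2784,
    "output_tok_per_s": 625.26,
    "mean_e2e_ms": 443.32,
    "mean_ttft_ms": 21.59,
    "mean_itl_ms": 1.52,
}


def check(s: dict) -> dict:
    """Recompute derived metrics, assuming throughput = total / duration."""
    # The printed duration is rounded to 2 decimals, so recover a more
    # precise value from output tokens / output throughput.
    dur = s["total_generated_tokens"] / s["output_tok_per_s"]
    req_per_s = s["successful_requests"] / dur
    in_per_s = s["total_input_tokens"] / dur
    # At concurrency 1 requests do not overlap, so the whole run should
    # take about N * mean E2E latency.
    serial_dur = s["successful_requests"] * s["mean_e2e_ms"] / 1000
    # Approximate E2E as TTFT plus one inter-token gap per remaining token.
    mean_out = s["total_generated_tokens"] / s["successful_requests"]
    e2e_est_ms = s["mean_ttft_ms"] + s["mean_itl_ms"] * (mean_out - 1)
    return {
        "duration_s": dur,
        "request_per_s": req_per_s,
        "input_tok_per_s": in_per_s,
        "total_tok_per_s": in_per_s + s["output_tok_per_s"],
        "serial_duration_s": serial_dur,
        "e2e_estimate_ms": e2e_est_ms,
    }


if __name__ == "__main__":
    for name, value in check(SUMMARY).items():
        print(f"{name:>18}: {value:.2f}")
```

The recovered duration, request throughput, input throughput, and total throughput agree with the printed values to within rounding. The TTFT-plus-ITL estimate also lands within about 0.1 ms of the reported mean E2E latency of 443.32 ms. This suggests the run was effectively serial, as the concurrency of 1.00 indicates.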