Support llama.cpp's cache_n in timings info #287
Conversation
Walkthrough

Adds a new cached token metric across proxy and UI: parses cache_n from response timings, stores it in TokenMetrics as CachedTokens/cache_tokens, records it in middleware, and displays it in Activity with new columns and relative time formatting. No control-flow or error-handling changes.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant UI as Activity Page (UI)
    participant API as Proxy
    participant MM as Metrics Middleware
    participant Mon as Metrics Monitor
    User->>UI: Open Activity
    UI->>API: GET /metrics
    API->>MM: Handle request
    Note over MM: Parse response timings<br/>Read cache_n → cachedTokens (-1 if absent)
    MM->>Mon: Record TokenMetrics{..., CachedTokens}
    Mon-->>API: Metrics snapshot (includes cache_tokens)
    API-->>UI: JSON metrics list
    UI->>UI: Render table: Cached, Prompt, Generated, Speed, Duration, Time (relative)
```
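The middleware step in the diagram reduces to reading one optional field and falling back to a sentinel. Below is a minimal sketch of that step, assuming the tidwall/gjson API that the review diffs reference via `jsonData.Get(...)`; the response body is illustrative, not a captured llama.cpp payload:

```go
package main

import (
	"fmt"

	"github.com/tidwall/gjson"
)

func main() {
	// Illustrative response body; only the timings block matters for this metric.
	body := `{"timings":{"cache_n":128,"prompt_n":12,"predicted_n":256}}`

	cachedTokens := -1 // sentinel: upstream did not report cache_n
	if v := gjson.Get(body, "timings.cache_n"); v.Exists() {
		cachedTokens = int(v.Int())
	}

	fmt.Println(cachedTokens) // 128 here; stays -1 when the field is absent
}
```

The -1 default is what later lets the UI distinguish "not reported" from "zero cache hits".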
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 0
🧹 Nitpick comments (6)
proxy/metrics_monitor.go (1)
16-16: Document sentinel semantics for CachedTokens

Clarify that -1 means “unknown” to keep API consumers aligned and avoid misinterpretation.

```diff
 type TokenMetrics struct {
 	ID        int       `json:"id"`
 	Timestamp time.Time `json:"timestamp"`
 	Model     string    `json:"model"`
-	CachedTokens    int     `json:"cache_tokens"`
+	// CachedTokens is the number of prompt tokens served from the KV cache. -1 = unknown.
+	CachedTokens    int     `json:"cache_tokens"`
 	InputTokens     int     `json:"input_tokens"`
 	OutputTokens    int     `json:"output_tokens"`
 	PromptPerSecond float64 `json:"prompt_per_second"`
 	TokensPerSecond float64 `json:"tokens_per_second"`
 	DurationMs      int     `json:"duration_ms"`
 }
```
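For consumers of the metrics JSON, the sentinel is easiest to see in serialized form. A small sketch using the same struct tags as above; the sample values are made up:

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// TokenMetrics mirrors the struct above; the json tags are what API consumers see.
type TokenMetrics struct {
	ID              int       `json:"id"`
	Timestamp       time.Time `json:"timestamp"`
	Model           string    `json:"model"`
	CachedTokens    int       `json:"cache_tokens"` // -1 = unknown (no cache_n in timings)
	InputTokens     int       `json:"input_tokens"`
	OutputTokens    int       `json:"output_tokens"`
	PromptPerSecond float64   `json:"prompt_per_second"`
	TokensPerSecond float64   `json:"tokens_per_second"`
	DurationMs      int       `json:"duration_ms"`
}

func main() {
	m := TokenMetrics{Timestamp: time.Now(), Model: "example-model", CachedTokens: -1, InputTokens: 42, OutputTokens: 7}
	out, _ := json.Marshal(m)
	// "cache_tokens":-1 means "not reported by upstream", not "zero cache hits".
	fmt.Println(string(out))
}
```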
ui/src/contexts/APIProvider.tsx (1)

31-31: Annotate unknown sentinel on cache_tokens

Backend uses -1 for unknown; add a short note so callers don’t assume non-negative only.

```diff
   model: string;
-  cache_tokens: number;
+  // -1 => unknown (timings not provided by upstream)
+  cache_tokens: number;
```
proxy/metrics_middleware.go (1)

75-76: Also accept upstream’s legacy tokens_cached (back-compat) and keep cache_n

Several llama.cpp server responses expose a top-level tokens_cached instead of timings.cache_n. Parse both to cover older and newer servers. Examples with tokens_cached are present in real server outputs. (github.com)

```diff
   // default values
-  cachedTokens := -1 // unknown or missing data
+  cachedTokens := -1 // unknown or missing data
@@
-  if cachedValue := jsonData.Get("timings.cache_n"); cachedValue.Exists() {
-    cachedTokens = int(cachedValue.Int())
-  }
+  // newer: timings.cache_n ; older: top-level tokens_cached
+  if cachedValue := jsonData.Get("timings.cache_n"); cachedValue.Exists() {
+    cachedTokens = int(cachedValue.Int())
+  } else if cachedValue := jsonData.Get("tokens_cached"); cachedValue.Exists() {
+    cachedTokens = int(cachedValue.Int())
+  }
 }
+
+// if timings block was absent, still try legacy top-level field
+if cachedTokens < 0 {
+  if cachedValue := jsonData.Get("tokens_cached"); cachedValue.Exists() {
+    cachedTokens = int(cachedValue.Int())
+  }
+}
@@
 rec.metricsMonitor.addMetrics(TokenMetrics{
   Timestamp:    time.Now(),
   Model:        rec.realModelName,
   CachedTokens: cachedTokens,
   InputTokens:  inputTokens,
```

Also applies to: 97-100, 102-111
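A standalone version of the suggested fallback, sketched against tidwall/gjson (the same lookup style as `jsonData.Get` above). Only the two field names mentioned in this comment are checked; anything beyond them would be speculative:

```go
package main

import (
	"fmt"

	"github.com/tidwall/gjson"
)

// cachedTokensFrom prefers the newer timings.cache_n and falls back to the
// legacy top-level tokens_cached; -1 means the server reported neither.
func cachedTokensFrom(body []byte) int {
	if v := gjson.GetBytes(body, "timings.cache_n"); v.Exists() {
		return int(v.Int())
	}
	if v := gjson.GetBytes(body, "tokens_cached"); v.Exists() {
		return int(v.Int())
	}
	return -1
}

func main() {
	newer := []byte(`{"timings":{"cache_n":128}}`)
	legacy := []byte(`{"tokens_cached":64}`)
	neither := []byte(`{}`)
	fmt.Println(cachedTokensFrom(newer), cachedTokensFrom(legacy), cachedTokensFrom(neither)) // 128 64 -1
}
```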
ui/src/pages/Activity.tsx (3)
18-20: Align copy with intent (“just now”)

Text says “just now” in comment; return value was “now”. Low-risk UX consistency fix.

```diff
-  if (diffInSeconds < 5) {
-    return "now";
-  }
+  if (diffInSeconds < 5) {
+    return "just now";
+  }
```
79-81: Show 0 cached tokens explicitly; reserve “-” for unknown

Distinguishes “no cache hits” (0) from “not reported” (-1).

```diff
-  {metric.cache_tokens > 0 ? metric.cache_tokens.toLocaleString() : "-"}
+  {metric.cache_tokens >= 0 ? metric.cache_tokens.toLocaleString() : "-"}
```
101-119: Make Tooltip keyboard-accessible and screen-reader friendly

Add focus handling and ARIA to improve a11y without changing visuals.

```diff
-const Tooltip: React.FC<TooltipProps> = ({ content }) => {
+const Tooltip: React.FC<TooltipProps> = ({ content }) => {
   return (
-    <div className="relative group inline-block">
+    <div className="relative group inline-block" tabIndex={0} aria-label={content}>
       ⓘ
       <div
-        className="absolute top-full left-1/2 transform -translate-x-1/2 mt-2
-          px-3 py-2 bg-gray-900 text-white text-sm rounded-md
-          opacity-0 group-hover:opacity-100 transition-opacity
-          duration-200 pointer-events-none whitespace-nowrap z-50 normal-case"
+        className="absolute top-full left-1/2 transform -translate-x-1/2 mt-2
+          px-3 py-2 bg-gray-900 text-white text-sm rounded-md
+          opacity-0 group-hover:opacity-100 group-focus:opacity-100 transition-opacity
+          duration-200 pointer-events-none whitespace-nowrap z-50 normal-case"
       >
         {content}
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)

- proxy/metrics_middleware.go (2 hunks)
- proxy/metrics_monitor.go (1 hunks)
- ui/src/contexts/APIProvider.tsx (1 hunks)
- ui/src/pages/Activity.tsx (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
ui/src/contexts/APIProvider.tsx (1)
- ui/src/pages/Models.tsx (5): totalRequests (136-145), StatsPanel (133-173), sum (143-143), sum (141-141), sum (142-142)

proxy/metrics_middleware.go (1)
- proxy/metrics_monitor.go (1): TokenMetrics (12-22)

ui/src/pages/Activity.tsx (1)
- ui/src/pages/Models.tsx (3): StatsPanel (133-173), totalRequests (136-145), sum (143-143)
Capture prompt cache metrics and surface them on Activities page in UI
* Add a config editor page
* Support llama.cpp's cache_n in timings info (mostlygeek#287)
  Capture prompt cache metrics and surface them on Activities page in UI
* Fix mostlygeek#288 Vite hot module reloading creating multiple SSE connections (mostlygeek#290)
  - move SSE (EventSource) connection to module level
  - manage EventSource as a singleton, closing open connection before reopening a new one
* Add model name copy button to Models UI

---------

Co-authored-by: Benson Wong <[email protected]>
Add support for llama.cpp's new prompt metrics (ggml-org/llama.cpp#15827).