Couch stats resource tracker v3 rebase main #5602

Open
wants to merge 94 commits into
base: main
94 commits
e9c4383
Add Couch Stats Resource Tracker (CSRT)
chewbranca Jun 7, 2024
e5835fc
Remove no longer used conf_get fun
chewbranca Mar 25, 2025
d1176b2
Cleanup Dialyzer specs
chewbranca Mar 25, 2025
18d5b0f
Fix type in metric name
chewbranca Apr 8, 2025
eaf2363
Update CSRT tests for ioq parallel read changes
chewbranca Apr 8, 2025
2345968
Add csrt_logger:register_matcher
chewbranca Apr 8, 2025
776d714
Rework changes_processed vs rows
chewbranca Apr 8, 2025
57118ab
Format code
chewbranca Apr 8, 2025
ae0cb07
CI Bump..
chewbranca Apr 9, 2025
9be2a57
Create delta prior to deleting the context
chewbranca Apr 9, 2025
b9a0ccd
Updates based on PR feedback
chewbranca Apr 25, 2025
2cfcf59
Address more PR feedback
chewbranca May 7, 2025
22af987
Fix erlfmt-check
chewbranca May 7, 2025
ad60c3c
Rework and fix csrt_util init_p ini lookup tests
chewbranca May 7, 2025
389c396
Rework delta handling back to normal process_message semantics
chewbranca May 8, 2025
a1ccdbf
More cleanup
chewbranca May 8, 2025
954584b
Erlfmt rexi_tests.erl
chewbranca May 8, 2025
b04c2d8
Revert "CI Bump.."
chewbranca May 17, 2025
3d7e655
More PR cleanup
chewbranca May 17, 2025
e873eb2
More cleanup
chewbranca Jun 2, 2025
ab98716
Cleanup #rctx{} and other reworkings
chewbranca Jun 3, 2025
7f0c41d
make erlfmt-format
chewbranca Jun 3, 2025
d866a61
Cleanup csrt_query:field/2
chewbranca Jun 3, 2025
4d2a5ba
Clarify is_rctx_stat_field
chewbranca Jun 3, 2025
8604aca
Fix csrt:inc/N typespec
chewbranca Jun 4, 2025
2131fb6
Batch accumulate_delta updates in single ets:update_counter call
chewbranca Jun 10, 2025
c9738fe
Fixup csrt_logger report tests
chewbranca Jun 10, 2025
fa6db32
Update deregister logic and testing
chewbranca Jun 11, 2025
64e0e84
make erlfmt-format
chewbranca Jun 11, 2025
9623db8
Assert registered matchers persist a after global reload
chewbranca Jun 11, 2025
92738f9
Cleanup is_enabled settings
chewbranca Jun 12, 2025
c9e3ae1
Rework updated_at logic
chewbranca Jun 12, 2025
a2cd51f
Add csrt_logger:matcher_on_long_reqs
chewbranca Jun 12, 2025
837f6d1
Cleanup Dialyzer findings
chewbranca Jun 12, 2025
a1dc061
make erlfmt-format
chewbranca Jun 12, 2025
5deaa68
Test csrt_util:field for all #rctx{} fields
chewbranca Jun 14, 2025
5110c41
Use dedicated transient CSRT supervisor
chewbranca Jun 14, 2025
113633f
Cleanup delta handling and type specs
chewbranca Jun 14, 2025
7417e2a
Fixup maybe_add_delta type restrucuring
chewbranca Jun 17, 2025
cc0a806
make erlfmt-format
chewbranca Jun 17, 2025
4ef6dd7
Add csrt:proc_window based on recon:proc_window
chewbranca Jun 17, 2025
97ae684
Add dedicated toggle to disable #rpc_worker{} reporting
chewbranca Jun 17, 2025
7153559
make erlfmt-format
chewbranca Jun 17, 2025
0eb72fd
Remove extraneous function head
chewbranca Jun 17, 2025
e64607a
Cleanup instantiation of base #rctx{} match spec
chewbranca Jun 17, 2025
2c9f5d5
Fix csrt_logger dbname io tests
chewbranca Jun 18, 2025
18769c4
make erlfmt-format
chewbranca Jun 19, 2025
c50264f
Cleanup matchers
chewbranca Jun 19, 2025
c4fb87e
Rework csrt_logger:add_matcher error type
chewbranca Jun 19, 2025
6218296
Cleanup Dialyzer and a few other things
chewbranca Jun 19, 2025
8a21722
Simple make_dt time conversions
chewbranca Jun 19, 2025
d4e53b7
Cleanup MatcherGen error handling
chewbranca Jun 19, 2025
fb05f92
make erlfmt-format
chewbranca Jun 19, 2025
b4827b8
Remove debug TODO
chewbranca Jun 19, 2025
5b3167d
Update HTTP API
iilyak Jun 10, 2025
45495fc
Hook in http updates and simple matcher querying
chewbranca Jun 24, 2025
e259e52
Add typespecs to sort_by/group_by/count_by
iilyak Jun 25, 2025
cc7f8ab
Factor out csrt_entry.erl
iilyak Jun 25, 2025
ba42c8a
Support more key types in csrt_entry:key/1
iilyak Jun 25, 2025
8b0147f
Replace csrt_util:map_to_rctx/1 with csrt_entry:from_map/1
iilyak Jun 25, 2025
65b9e40
Swap order of arguments in csrt_entry:value for consistency
iilyak Jun 25, 2025
fa0880d
Move rctx_record_info/0 to csrt_entry
iilyak Jun 25, 2025
b723df5
Move JSON conversion to csrt_entry
iilyak Jun 25, 2025
9947f4d
Rewrite csrt_query to support declarative queries
iilyak Jul 10, 2025
a5d486d
Create csrt_test_helper.erl
iilyak Jun 25, 2025
ae39d22
Add csrt_query_tests.erl suite
iilyak Jul 10, 2025
fb2c8c2
Add csrt_httpd_tests.erl suite
iilyak Jul 10, 2025
6414dc2
Add docs and various cleanup
chewbranca Jul 16, 2025
93bc894
Remove obsolete http logic
chewbranca Jul 18, 2025
e6b0212
Allow undefined in csrt:to_json/1
chewbranca Jul 22, 2025
debe52f
Update CSRT documentation
chewbranca Jul 22, 2025
2357812
Address PR feedback
chewbranca Jul 22, 2025
d757add
Remove config lookups from should_track_init_p
chewbranca Jul 22, 2025
8457fe4
Do not import query DSL functions from csrt
iilyak Jul 22, 2025
b762046
Toggle default enablements
chewbranca Jul 22, 2025
38608f7
Add more docs and additional PR cleanup
chewbranca Jul 22, 2025
38a53a0
Extract CSRT into dedicated couch_srt application
chewbranca Jul 23, 2025
86ba96b
make erlfmt-format
chewbranca Jul 23, 2025
00a02af
Add all_coordinators and all_rpc_workers matchers
chewbranca Jul 28, 2025
d5561f2
Don't report null JSON values
chewbranca Jul 28, 2025
1380b09
Cleanup couch_srt_logger_tests duplication and imports
chewbranca Jul 28, 2025
eb92a4d
Update couch_srt_query:group_by to use ets:select/3
chewbranca Jul 28, 2025
828667e
Rework documentation from PR feedback and move into official docs
chewbranca Jul 28, 2025
1ff2376
make erlfmt-format
chewbranca Jul 28, 2025
711b070
More PR cleanup
chewbranca Jul 28, 2025
d5241e7
Fix Sphinx docs make check
chewbranca Jul 28, 2025
6481338
Cleanup typos found in PR review
chewbranca Jul 28, 2025
71d11ea
Fix few typos in the docs
iilyak Jul 29, 2025
2418297
Return proper error when multiple keys are provided
iilyak Jul 30, 2025
6c0ab49
Add HTTP API docs for CSRT
iilyak Jul 31, 2025
690554d
Lint the CSRT documentation
iilyak Jul 31, 2025
da87fc3
Use json_string() type inside json_spec()
iilyak Aug 6, 2025
4a2af49
Fix typos in CSRT docs
iilyak Aug 6, 2025
13e75ae
Update links in CSRT docs
iilyak Aug 6, 2025
1 change: 1 addition & 0 deletions rebar.config.script
@@ -127,6 +127,7 @@ SubDirs = [
"src/couch_mrview",
"src/couch_replicator",
"src/couch_pse_tests",
"src/couch_srt",
"src/couch_stats",
"src/couch_peruser",
"src/couch_tests",
91 changes: 91 additions & 0 deletions rel/overlay/etc/default.ini
@@ -1150,3 +1150,94 @@ url = {{nouveau_url}}
;mem3_shards = true
;nouveau_index_manager = true
;dreyfus_index_manager = true

; Couch Stats Resource Tracker (CSRT)
[csrt]
;enable = false
;enable_init_p = false
;enable_reporting = false
;enable_rpc_reporting = false

; Truncate reports to not include zero values for counter fields. This is a
; simple way to save space and should be left enabled unless you need a fixed
; output structure in the process lifecycle reports.
;should_truncate_reports = true

; Limit queries to a maximum number of rows
;query_limit = 100
;query_cardinality_limit = 10000

; CSRT Logger Matchers
;
; These matchers are filters that decide whether or not to generate a process
; lifecycle report at the end of an HTTP request, quantifying the CouchDB
; resources used to fulfill that request. These filters are designed to make it
; easy to log a report for requests that utilize a lot of CouchDB resources,
; take a long time, or use heavy filtering, without having to enable report
; logging for _all_ requests. These reports can be enabled at the RPC worker
; level too, but for view queries and other aggregate operations, that can
; generate a report per shard interacted with to fulfill the request, which
; can amount to a lot of data. The logger matchers are a way to dynamically
; control what is being logged on the fly, or to tailor the quantity of logs
; generated so that usage information is stored in a predictable manner.
;
; These reports can be used to find potential workloads to refactor, but also
; to retroactively understand the workload during a particular window of time.
; The node level stats collected and reported can inform you that, for example,
; a great deal of IO operations and database reads are being performed, but they
; do not provide cardinality into the databases and requests inducing the
; resource usage. This is where the process lifecycle reports come in: after a
; request is completed, if it matched a filter, a report is logged containing
; the final quantitative counts of resource usage to fulfill that request as
; well as qualitative information like username, dbname, nonce, and more. See
; CSRT.md for more details.
;
; There is a series of default logger matchers designed to filter for requests
; that surpass a threshold on a particular dimension. For example, when enabled,
; the ioq_calls default matcher filters true for requests that invoke more than
; 10,000 IOQ calls. The default named matchers are enabled by name and boolean
; in the `[csrt_logger.matchers_enabled]` section, and similarly, the threshold
; value for each of the default matchers is specified in the config section
; `[csrt_logger.matchers_threshold]` by name and integer threshold quantity.
;
; The default loggers above operate against any HTTP requests flowing through
; CouchDB, whereas the `[csrt_logger.dbnames_io]` section provides a simple way
; to specify database-specific matchers, at the expense of the granularity
; available in the default matchers. The "dbnames_io" logger matcher filters
; for requests against a particular database that induce more than the specified
; threshold of IO operations. This is a generic IO catchall matcher, not
; specific to ioq_calls or docs_read, like the default matchers.
;
; CSRT dbname matchers
; Given a dbname and a positive integer, this will enable an IO matcher
; against the provided db for any requests that induce IO in quantities
; greater than the provided threshold on any one of: ioq_calls, rows_read,
; docs_read, get_kp_node, get_kv_node, or changes_processed.
[csrt_logger.dbnames_io]
; For example:
; foo = 100
; _dbs = 123
; _users = 234
; foo/bar = 200

; CSRT default matchers - enablement configuration
; The default CSRT loggers can be individually enabled below
[csrt_logger.matchers_enabled]
;all_coordinators = false
;all_rpc_workers = false
;docs_read = false
;rows_read = false
;docs_written = false
;long_reqs = false
;changes_processed = false
;ioq_calls = false

; CSRT default matchers - threshold configuration
; This specifies the integer threshold for each of the builtin matchers
[csrt_logger.matchers_threshold]
;docs_read = 1000
;rows_read = 1000
;docs_written = 500
;long_reqs = 60000
;changes_processed = 1000
;ioq_calls = 10000
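
The matcher semantics described in the default.ini comments above can be sketched as follows. This is a minimal illustration under stated assumptions, not the couch_srt implementation: the module name `csrt_matcher_sketch`, both function names, and the plain-map representation of the config sections and of a request report are all hypothetical. The only behavior taken from the comments is that a matcher fires when a counter is strictly greater than its threshold, and that a dbnames_io entry fires when any one of the listed IO fields exceeds the threshold.

```erlang
-module(csrt_matcher_sketch).
-export([matching/3, dbname_io_match/2]).

%% Hypothetical sketch; names and data shapes are assumptions.
%% Fields listed in the default.ini comment for [csrt_logger.dbnames_io].
-define(IO_FIELDS, [
    ioq_calls, rows_read, docs_read, get_kp_node, get_kv_node, changes_processed
]).

%% Enabled    :: #{Name => boolean()}, mirrors [csrt_logger.matchers_enabled]
%% Thresholds :: #{Name => integer()}, mirrors [csrt_logger.matchers_threshold]
%% Report     :: #{Name => integer()}, final counters for one request
%%
%% Returns the sorted names of enabled matchers whose counter in Report is
%% strictly greater than the configured threshold. Missing counters count as 0.
matching(Enabled, Thresholds, Report) ->
    lists:sort([
        Name
     || {Name, true} <- maps:to_list(Enabled),
        is_map_key(Name, Thresholds),
        maps:get(Name, Report, 0) > maps:get(Name, Thresholds)
    ]).

%% DbThresholds :: #{DbName :: binary() => pos_integer()},
%% mirrors [csrt_logger.dbnames_io].
%%
%% True when the report is for a configured database and at least one of
%% the IO fields exceeds that database's threshold.
dbname_io_match(DbThresholds, #{dbname := DbName} = Report) ->
    case maps:find(DbName, DbThresholds) of
        {ok, Threshold} ->
            lists:any(
                fun(Field) -> maps:get(Field, Report, 0) > Threshold end,
                ?IO_FIELDS
            );
        error ->
            false
    end;
dbname_io_match(_DbThresholds, _Report) ->
    false.
```

With `ioq_calls` enabled at a threshold of 10000, a report carrying `ioq_calls => 10001` matches, while one carrying exactly 10000 does not.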
2 changes: 2 additions & 0 deletions rel/reltool.config
@@ -38,6 +38,7 @@
couch_log,
couch_mrview,
couch_replicator,
couch_srt,
couch_stats,
couch_event,
couch_peruser,
@@ -103,6 +104,7 @@
{app, couch_log, [{incl_cond, include}]},
{app, couch_mrview, [{incl_cond, include}]},
{app, couch_replicator, [{incl_cond, include}]},
{app, couch_srt, [{incl_cond, include}]},
{app, couch_stats, [{incl_cond, include}]},
{app, couch_event, [{incl_cond, include}]},
{app, couch_peruser, [{incl_cond, include}]},
10 changes: 10 additions & 0 deletions src/chttpd/src/chttpd.erl
@@ -339,6 +339,10 @@ handle_request_int(MochiReq) ->
% Save client socket so that it can be monitored for disconnects
chttpd_util:mochiweb_client_req_set(MochiReq),

%% This is probably better in before_request, but having Path is nice
couch_srt:create_coordinator_context(HttpReq0, Path),
couch_srt:set_context_handler_fun({?MODULE, ?FUNCTION_NAME}),

{HttpReq2, Response} =
case before_request(HttpReq0) of
{ok, HttpReq1} ->
@@ -369,6 +373,7 @@ handle_request_int(MochiReq) ->

before_request(HttpReq) ->
try
couch_srt:set_context_handler_fun({?MODULE, ?FUNCTION_NAME}),
chttpd_stats:init(),
chttpd_plugin:before_request(HttpReq)
catch
@@ -388,6 +393,8 @@ after_request(HttpReq, HttpResp0) ->
HttpResp2 = update_stats(HttpReq, HttpResp1),
chttpd_stats:report(HttpReq, HttpResp2),
maybe_log(HttpReq, HttpResp2),
%% NOTE: do not set_context_handler_fun to preserve the Handler
couch_srt:destroy_context(),
HttpResp2.

process_request(#httpd{mochi_req = MochiReq} = HttpReq) ->
@@ -400,6 +407,7 @@ process_request(#httpd{mochi_req = MochiReq} = HttpReq) ->
RawUri = MochiReq:get(raw_path),

try
couch_srt:set_context_handler_fun({?MODULE, ?FUNCTION_NAME}),
couch_httpd:validate_host(HttpReq),
check_request_uri_length(RawUri),
check_url_encoding(RawUri),
@@ -425,10 +433,12 @@ handle_req_after_auth(HandlerKey, HttpReq) ->
HandlerKey,
fun chttpd_db:handle_request/1
),
couch_srt:set_context_handler_fun(HandlerFun),
AuthorizedReq = chttpd_auth:authorize(
possibly_hack(HttpReq),
fun chttpd_auth_request:authorize_request/1
),
couch_srt:set_context_username(AuthorizedReq),
{AuthorizedReq, HandlerFun(AuthorizedReq)}
catch
ErrorType:Error:Stack ->
2 changes: 2 additions & 0 deletions src/chttpd/src/chttpd_db.erl
@@ -83,6 +83,7 @@

% Database request handlers
handle_request(#httpd{path_parts = [DbName | RestParts], method = Method} = Req) ->
couch_srt:set_context_dbname(DbName),
case {Method, RestParts} of
{'PUT', []} ->
create_db_req(Req, DbName);
@@ -103,6 +104,7 @@ handle_request(#httpd{path_parts = [DbName | RestParts], method = Method} = Req)
do_db_req(Req, fun db_req/2);
{_, [SecondPart | _]} ->
Handler = chttpd_handlers:db_handler(SecondPart, fun db_req/2),
couch_srt:set_context_handler_fun(Handler),
do_db_req(Req, Handler)
end.

1 change: 1 addition & 0 deletions src/chttpd/src/chttpd_httpd_handlers.erl
@@ -20,6 +20,7 @@ url_handler(<<"_utils">>) -> fun chttpd_misc:handle_utils_dir_req/1;
url_handler(<<"_all_dbs">>) -> fun chttpd_misc:handle_all_dbs_req/1;
url_handler(<<"_dbs_info">>) -> fun chttpd_misc:handle_dbs_info_req/1;
url_handler(<<"_active_tasks">>) -> fun chttpd_misc:handle_task_status_req/1;
url_handler(<<"_active_resources">>) -> fun couch_srt_httpd:handle_resource_status_req/1;
Contributor

We have an API documentation section; we should probably document what the requests look like there.

Contributor Author

I've reworked the documentation into the formal Sphinx docs, and now that we've got a working set of CSRT docs to extend from, I'm hoping @iilyak can update those docs with his HTTP API updates. I was also wondering if there's a simple way to surface the inline docs from the couch_srt_query module, as those have a lot of great info. Where do those end up?

Contributor

> I'm hoping @iilyak can update those docs with his HTTP API updates.

Docs are added.

url_handler(<<"_scheduler">>) -> fun couch_replicator_httpd:handle_scheduler_req/1;
url_handler(<<"_node">>) -> fun chttpd_node:handle_node_req/1;
url_handler(<<"_reload_query_servers">>) -> fun chttpd_misc:handle_reload_query_servers_req/1;
20 changes: 20 additions & 0 deletions src/couch/priv/stats_descriptions.cfg
@@ -306,6 +306,10 @@
{type, counter},
{desc, <<"number of couch_server LRU operations skipped">>}
]}.
{[couchdb, couch_server, open], [
{type, counter},
{desc, <<"number of couch_server open operations invoked">>}
]}.
{[couchdb, query_server, vdu_rejects], [
{type, counter},
{desc, <<"number of rejections by validate_doc_update function">>}
@@ -422,6 +426,22 @@
{type, counter},
{desc, <<"number of legacy checksums found in couch_file instances">>}
]}.
{[couchdb, btree, get_node, kp_node], [
{type, counter},
{desc, <<"number of couch btree kp_nodes read">>}
]}.
{[couchdb, btree, get_node, kv_node], [
{type, counter},
{desc, <<"number of couch btree kv_nodes read">>}
]}.
{[couchdb, btree, write_node, kp_node], [
{type, counter},
{desc, <<"number of couch btree kp_nodes written">>}
]}.
{[couchdb, btree, write_node, kv_node], [
{type, counter},
{desc, <<"number of couch btree kv_nodes written">>}
]}.
{[pread, exceed_eof], [
{type, counter},
{desc, <<"number of the attempts to read beyond end of db file">>}
1 change: 1 addition & 0 deletions src/couch/src/couch.app.src
@@ -47,6 +47,7 @@
couch_log,
couch_event,
ioq,
couch_srt,
couch_stats,
couch_dist,
couch_quickjs
2 changes: 2 additions & 0 deletions src/couch/src/couch_btree.erl
@@ -472,6 +472,7 @@ reduce_tree_size(kp_node, NodeSize, [{_K, {_P, _Red, Sz}} | NodeList]) ->

get_node(#btree{fd = Fd}, NodePos) ->
{ok, {NodeType, NodeList}} = couch_file:pread_term(Fd, NodePos),
couch_stats:increment_counter([couchdb, btree, get_node, NodeType]),
{NodeType, NodeList}.

write_node(#btree{fd = Fd, compression = Comp} = Bt, NodeType, NodeList) ->
@@ -480,6 +481,7 @@ write_node(#btree{fd = Fd, compression = Comp} = Bt, NodeType, NodeList) ->
% now write out each chunk and return the KeyPointer pairs for those nodes
ToWrite = [{NodeType, Chunk} || Chunk <- Chunks],
WriteOpts = [{compression, Comp}],
couch_stats:increment_counter([couchdb, btree, write_node, NodeType]),
{ok, PtrSizes} = couch_file:append_terms(Fd, ToWrite, WriteOpts),
{ok, group_kps(Bt, NodeType, Chunks, PtrSizes)}.

6 changes: 6 additions & 0 deletions src/couch/src/couch_query_servers.erl
@@ -614,6 +614,12 @@ filter_docs(Req, Db, DDoc, FName, Docs) ->
end.

filter_docs_int(Db, DDoc, FName, JsonReq, JsonDocs) ->
%% Count usage in _int version as this can be repeated for OS error
%% Pros & cons... might not have actually processed `length(JsonDocs)` docs
%% but it certainly undercounts if we count in `filter_docs/5` above
%% TODO: replace with couchdb.query_server.*.ddoc_filter stats once we can
%% funnel back the stats used in the couchjs process to this caller process
couch_srt:js_filtered(length(JsonDocs)),
[true, Passes] = ddoc_prompt(
Db,
DDoc,
1 change: 1 addition & 0 deletions src/couch/src/couch_server.erl
@@ -114,6 +114,7 @@ sup_start_link(N) ->
gen_server:start_link({local, couch_server(N)}, couch_server, [N], []).

open(DbName, Options) ->
couch_stats:increment_counter([couchdb, couch_server, open]),
try
validate_open_or_create(DbName, Options),
open_int(DbName, Options)
10 changes: 9 additions & 1 deletion src/couch_log/src/couch_log_formatter.erl
@@ -470,7 +470,12 @@ format_meta(Meta) ->
lists:sort(
maps:fold(
fun(K, V, Acc) ->
[to_str(K, V) | Acc]
case to_str(K, V) of
"" ->
Acc;
Str ->
[Str | Acc]
end
end,
[],
Meta
@@ -487,6 +492,9 @@ format_meta(Meta) ->
%% - maps
%% However we are not going to try to distinguish lists from string
%% Atoms would be printed as strings
%% `null` JSON values are skipped
to_str(_K, null) ->
"";
to_str(K, _) when not (is_list(K) or is_atom(K)) ->
"";
to_str(K, Term) when is_list(Term) ->
2 changes: 2 additions & 0 deletions src/couch_log/test/eunit/couch_log_formatter_test.erl
@@ -34,11 +34,13 @@ format_report_etoolong_test() ->

format_report_test() ->
{ok, Entry} = couch_log_formatter:format_report(self(), report123, #{
empty => null,
foo => 123,
bar => "barStr",
baz => baz
}),
% Rely on `couch_log_formatter:format_meta/1` to sort keys
% `empty` is missing as `null` values are skipped
Formatted = "[bar=\"barStr\" baz=\"baz\" foo=123]",
?assertEqual(Formatted, lists:flatten(Entry#log_entry.msg)).
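
For readers following the `null` handling outside the CouchDB tree, here is a standalone sketch of the skip-empty fold that the formatter change introduces. The module name `format_meta_sketch` is hypothetical, and `to_str/2` is a deliberately reduced stand-in that only handles atom keys with integer, atom, or string values. It is not the full couch_log_formatter logic, only enough to reproduce the shape asserted in the test above.

```erlang
-module(format_meta_sketch).
-export([format_meta/1]).

%% Hypothetical, simplified stand-in for the formatter's meta rendering.
%% Entries that render to "" (including `null` values) are dropped before
%% sorting, so they leave no stray separators in the output.
format_meta(Meta) ->
    Strs = lists:sort(
        maps:fold(
            fun(K, V, Acc) ->
                case to_str(K, V) of
                    "" -> Acc;
                    Str -> [Str | Acc]
                end
            end,
            [],
            Meta
        )
    ),
    lists:flatten(["[", lists:join(" ", Strs), "]"]).

%% `null` must be matched first: it is an atom and would otherwise be
%% printed as "null" by the atom clause below.
to_str(_K, null) ->
    "";
to_str(K, V) when is_atom(K), is_integer(V) ->
    atom_to_list(K) ++ "=" ++ integer_to_list(V);
to_str(K, V) when is_atom(K), is_atom(V) ->
    atom_to_list(K) ++ "=\"" ++ atom_to_list(V) ++ "\"";
to_str(K, V) when is_atom(K), is_list(V) ->
    atom_to_list(K) ++ "=\"" ++ V ++ "\"";
to_str(_K, _V) ->
    "".
```

Calling `format_meta_sketch:format_meta(#{empty => null, foo => 123, bar => "barStr", baz => baz})` yields `[bar="barStr" baz="baz" foo=123]`, the same string the eunit test expects from the real formatter.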
