[work in progress] reporting query result statistics #525
Conversation
src/client.rs (Outdated)

```rust
@@ -1675,6 +1678,20 @@ where
            Some(message) => message,
            None => &self.buffer,
        };
        // TODO query sample rate from config
        let should_parse_query = pool.settings.query_parser_enabled
            && !server.in_transaction()
```
Can we also make sure it's not a begin statement we're caching or a set statement? Ideally we just check for select from
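The guard suggested above could be sketched along these lines. This is a naive, hypothetical illustration only (`should_cache` is an invented name, and the PR itself uses the `pg_query` parser rather than string matching): skip caching for `BEGIN`, `SET`, and anything that is not a plain `SELECT`.

```rust
// Hypothetical sketch of the suggested guard. A real implementation would
// use a SQL parser (e.g. pg_query) instead of this naive prefix check.
fn should_cache(query: &str) -> bool {
    let normalized = query.trim_start().to_ascii_lowercase();
    // Only cache plain SELECT statements; this rejects BEGIN, SET, etc.
    normalized.starts_with("select ")
}

fn main() {
    println!("{}", should_cache("SELECT 1 FROM t")); // true
    println!("{}", should_cache("BEGIN"));           // false
    println!("{}", should_cache("SET search_path = public")); // false
}
```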
Cargo.toml

```toml
@@ -3,6 +3,9 @@ name = "pgcat"
version = "1.1.1"
edition = "2021"

[profile.release]
debug = true
```
The executable is gonna be quite large. Not an issue for production imo.
Is this really required, though?
I committed it unnecessarily
It's great for debugging production issues, otherwise the backtraces are pretty slim. This just attaches debug symbols, but still compiles it with all the optimizations.
@dat2 Thanks for the PR. This is certainly exciting.
Cargo.toml

```toml
@@ -48,6 +51,8 @@ itertools = "0.10"
clap = { version = "4.3.1", features = ["derive", "env"] }
tracing = "0.1.37"
tracing-subscriber = { version = "0.3.17", features = ["json", "env-filter", "std"] }
pg_query = "0.7"
```
If we're adding `pg_query`, shouldn't we also remove `sqlparser` (and replace its usages with `pg_query`)?
yes! feel free to do it. I won't be able to get to it for a month or so
```rust
@@ -40,3 +41,10 @@ pub fn format_duration(duration: &chrono::Duration) -> String {
        days, hours, minutes, seconds, milliseconds
    )
}

pub fn hash_string(hash: &[u8]) -> String {
    // TODO assert length == 256
```
Suggested change:

```diff
-    // TODO assert length == 256
+    debug_assert_eq!(hash.len(), 256)
```
Is that enough?
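One thing worth noting: a SHA-256 digest is 32 bytes (256 bits), so asserting a byte length of 256 would be off by a factor of eight. A minimal, self-contained sketch of what `hash_string` might look like, assuming the digest is hex-encoded for display (this is an illustration, not necessarily the PR's actual implementation):

```rust
/// Hypothetical sketch: hex-encode a hash digest for display.
/// A SHA-256 digest is 32 bytes (256 *bits*), so the length
/// assertion checks 32 here, not 256.
pub fn hash_string(hash: &[u8]) -> String {
    debug_assert_eq!(hash.len(), 32);
    hash.iter().map(|b| format!("{:02x}", b)).collect()
}

fn main() {
    // A zeroed 32-byte digest hex-encodes to 64 '0' characters.
    println!("{}", hash_string(&[0u8; 32]));
}
```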
```rust
pub fn is_select(&self) -> bool {
    pg_query::parse(&self.text)
        .map(|tree| tree.select_tables() == tree.tables())
```
Suggested change:

```diff
-        .map(|tree| tree.select_tables() == tree.tables())
+        .map(|tree| !tree.select_tables().is_empty())
```

Is it really necessary to check the equality? It should be enough if `tree.select_tables()` isn't empty, right?
```rust
            QueryResultStats {
                statistics: LruCache::new(NonZeroUsize::new(1000).unwrap()),
            }
```
To avoid code duplication, I'd suggest:
```diff
-            QueryResultStats {
-                statistics: LruCache::new(NonZeroUsize::new(1000).unwrap()),
-            }
+            QueryResultStats::new()
```
```rust
impl QueryResultStats {
    pub(crate) fn new() -> QueryResultStats {
        QueryResultStats {
            statistics: LruCache::new(NonZeroUsize::new(1000).unwrap()),
```
Is it worth making this `1000` configurable? At least, it might be worth creating a named `const` for it.
yes, I had initially made a config option but reverted it to simplify it locally
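The named-constant suggestion could look roughly like the sketch below. `QUERY_RESULT_STATS_CAPACITY` is an invented name, and the struct is a stand-in: the real code holds an `lru::LruCache`, which is replaced here with just the capacity so the example stays self-contained and compiles with the standard library alone.

```rust
use std::num::NonZeroUsize;

// Hypothetical named constant for the LRU capacity, replacing the
// magic number 1000 at the construction site.
const QUERY_RESULT_STATS_CAPACITY: usize = 1000;

struct QueryResultStats {
    // Stand-in for `lru::LruCache<_, _>` to keep this sketch dependency-free.
    capacity: NonZeroUsize,
}

impl QueryResultStats {
    fn new() -> QueryResultStats {
        QueryResultStats {
            capacity: NonZeroUsize::new(QUERY_RESULT_STATS_CAPACITY)
                .expect("capacity must be non-zero"),
        }
    }
}

fn main() {
    println!("{}", QueryResultStats::new().capacity);
}
```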
In order to discover which queries are cacheable, we need to know roughly how many times we get the same result for a given query. This implements that by hashing a specific query's text, hashing the results we get back, and storing some metadata such as `first_seen`, `last_seen`, and `count`. We can compute a rough TTL for cacheable queries using `last_seen - first_seen` to see how often query results change (for the exact same query text).

On our databases, we have some tables that see thousands of writes per week but billions of reads per week, so we're trying to estimate how long we can cache those rows to take the load off for those billions of reads.
I'm able to make this work with:

```shell
LOG_LEVEL=debug cargo run -- pgcat.minimal.toml
PGPASSWORD=postgres psql -p 6432 -h localhost -d pgml -U postgres -w -c 'select 1'
PGPASSWORD=postgres psql -p 6432 -h localhost -d pgml -U postgres -w -c 'select 2'
PGPASSWORD=postgres psql -p 6432 -h localhost -d pgml -U postgres -w -c 'select 3'
PGPASSWORD=postgres psql -p 6432 -h localhost -d pgml -U postgres -w -c 'select 1'
PGPASSWORD=pgcat psql -p 6432 -h localhost -d pgcat -U pgcat -w -c 'show query_result_stats'
```