-
Notifications
You must be signed in to change notification settings - Fork 100
Meta without dets #548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Meta without dets #548
Conversation
9746b5a
to
6c5117f
Compare
218a8a7
to
4fa272f
Compare
Until now, we persisted metadata for all servers in a given Ra system in a single DETS file, owned by ra_log_meta gen batch server. This was a source of bottlenecks, so we move each servers metadata to a simple binary file. If the file doesn't exist, we try to fetch metadata from ra_log_meta and immediately persist it, so that we don't need ra_log_meta going forward.
Now that meta_fd is a part of the state, repeated calls to ra_server:init() lead to different initial states, so the test is adjusted not to expect the same result from both calls.
In this test we end up in update_last_applied with an undefined meta_fd set in base_state, so let's mock it as a no-op.
4fa272f
to
186d9f2
Compare
src/ra_server_meta.erl
Outdated
NodeAtomBin = atom_to_binary(NodeAtom, utf8), | ||
NameSize = byte_size(NameAtomBin), | ||
NodeSize = byte_size(NodeAtomBin), | ||
<<NameSize:8/unsigned, NameAtomBin/binary, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just noting that although an atom can only contain max 255 (possibly unicode) characters, the binary representation might require more than 255 bytes
1> Atom = list_to_atom(lists:duplicate(255, 256)).
'ĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀ'
2> byte_size(atom_to_binary(Atom)).
510
term_to_binary stores the atom size in 1 or 2 bytes https://www.erlang.org/doc/apps/erts/erl_ext_dist.html#atom_utf8_ext
sorry if this was considered already, and storing the size in 1 byte is intentional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was not considered.
I thought it would still limit utf8 atoms to 255 bytes. This is slighlty awekward as we'd theoretically need 4096 bytes just for the two atoms which would then take us over the standard block size.
We could consider compressing but we'd still need to ensure we're below 4000 or so bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry it's late and I clearly cant math. We'd just need 2052 bytes to encode the atoms so we're still well below the block size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for pointing this out. I've modified the format and code to use 2 bytes for atom sizes. Technically, if both atoms were full of UTF-8, we could still fail, but what are the odds?! :) Based on our testing, node names can't be UTF-8 (and the would need to be full of them for this to be a problem)
While an atom cannot be longer than 255 characters, its binary representation can be longer, if UTF-8 characters are present.
Deprecation of ra_log_meta with its DETS-based metadata persistence.
Instead, each server saves its own state in a simple binary file.
Of course for upgradeability, if a file doesn't exist, we take values from
ra_log_meta.