`gix free pack verify --statistics` uses ambiguous "KB" for SI kilobyte #1947
Comments
`gix free pack verify --statistics` formerly used "KB" for kilobytes (i.e., SI decimal kilobytes, units of 1000 bytes). This was somewhat ambiguous because it is occasionally also used for kibibytes (i.e., IEC binary kibibytes, units of 1024 bytes). Kilobytes and kibibytes can be more precisely distinguished by using kB for kilobytes (since "k" is the SI prefix for "kilo") and KiB for kibibytes (since decimal kilobytes are never written KiB). This adapts `gitoxide-core` to changes in `bytesize` and, in so doing, allows the SI unit symbol "kB" to be used. Fixes GitoxideLabs#1947
Thanks for bringing this up! If a In any case, I think this should be fixed and I am happy to help with
#1949 included an upgrade to So 77a3a1b in #1949 fixed this issue, at least so long as we construe this issue narrowly to be only about the behavior of As detailed above, it seems to me that, for amounts of data in units that are ultimately based on bytes, we shouldn't use SI decimal units like "kB" (formerly "KB") here at all, but that we should instead use IEC binary units like "KiB". This is subjective. Switching to binary IEC units would have come free (other than updating the journey tests) with upgrading The reason I went with keeping decimal SI units instead (with the less ambiguous "kB" symbol) is that I wanted to make the smallest change to observable behavior that fixed the bug and allowed all tests to pass in #1949. A bigger concern is that there may be other areas where the observable behavior has also changed already due to #1949 upgrading In addition, the crates for which
How do I determine whether a change to prodash itself requires a major version bump? As far as I can tell, it doesn't re-export
It's likely, though not certain, that I may be able to open a PR for that tonight.
And adapt `Bytes::format_bytes()` to the API change. The old `bytesize::to_string()` function is confusing in its second parameter `si_prefix: bool`, which is why it was removed. But it looks like a value of `false`, as we were passing, actually caused decimal SI units (rather than binary IEC units) to be used. It's not obvious which units `prodash` intended to use `bytesize` to convert to and display in. But this seems to be a minimal change to adapt to the new major version. In spite of the name `si_prefix` for the old parameter, experiments show that setting it to `true` caused values to be converted to and presented in binary IEC units. Although this attempts not to change the units that are used, it is expected to produce observable differences for some of them, in how they are presented. In particular, a decimal SI kilobyte is least ambiguously abbreviated "kB" (because "k" is an SI unit symbol prefix for "kilo" in its meaning of 1000), but it was previously written as "KB". It is now expected to be written as "kB". See also: GitoxideLabs/gitoxide#1947 (comment)
And adapt `Bytes::format_bytes()` to the API change. The old `bytesize::to_string()` function is confusing in its second parameter `si_prefix: bool`, which is why it was removed. But it looks like a value of `false`, as we were passing, actually caused decimal SI units (rather than binary IEC units) to be used. It's not obvious which units `prodash` intended to use `bytesize` to convert to and display in. But this seems to be a minimal change to adapt to the new major version. In spite of the name `si_prefix` for the old parameter, experiments show that setting it to `true` caused values to be converted to and presented in binary IEC units. Although this attempts not to change which actual units are used, it does produce observable differences for some of them, in how they are presented. In particular, a decimal SI kilobyte is least ambiguously abbreviated "kB" (because "k" is an SI unit symbol prefix for "kilo" in its meaning of 1000), but it was previously written as "KB". It is now expected to be written as "kB". Tests catch this distinction, and are updated here accordingly to assert that the generally preferable "kB" symbol for decimal SI kilobyte is used. See also: GitoxideLabs/gitoxide#1947 (comment)
This upgrades the `bytesize` dependency in `Cargo.toml` from version 1.0.1 (which was usually selecting 1.3.3) to 2.0.1 (which is the latest version). There were some nontrivial API changes from major version 1 to 2. Accordingly, this adapts `Bytes::format_bytes()` to the API change. The old `bytesize::to_string()` function was confusing in its second parameter `si_prefix: bool`, which is why it was removed. But it looks like a value of `false`, as we were passing, actually caused decimal SI units (rather than binary IEC units) to be used. It's not obvious which units `prodash` intended to use `bytesize` to convert to and display in. But this seems to be a minimal change to adapt to the new major version. In spite of the name `si_prefix` for the old parameter, experiments show that setting it to `true` caused values to be converted to and presented in binary IEC units. Although this attempts not to change which actual units are used, it does produce observable differences for some of them, in how they are presented. In particular, a decimal SI kilobyte is least ambiguously abbreviated "kB" (because "k" is an SI unit symbol prefix for "kilo" in its meaning of 1000), but it was previously written as "KB". It is now expected to be written as "kB". Tests catch this distinction, and are updated here accordingly to assert that the generally preferable "kB" symbol for decimal SI kilobyte is used. See also: GitoxideLabs/gitoxide#1947 (comment)
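To illustrate why a bare `bool` such as `si_prefix` is easy to misread, here is a minimal sketch of a formatter that takes an explicit unit-system argument instead. Everything in it (`UnitSystem`, `format_bytes`, the rounding to one decimal place) is a hypothetical stand-in written for this example; it is not the API of `bytesize` or `prodash`, and it does not claim to reproduce their exact output.

```rust
/// Hypothetical unit-system selector; not part of `bytesize` or `prodash`.
#[derive(Clone, Copy)]
enum UnitSystem {
    /// SI decimal units: 1 kB = 1000 bytes.
    SiDecimal,
    /// IEC binary units: 1 KiB = 1024 bytes.
    IecBinary,
}

/// Format a byte count with one decimal place in the requested unit system.
fn format_bytes(bytes: u64, system: UnitSystem) -> String {
    let (base, prefixes): (f64, [&str; 6]) = match system {
        UnitSystem::SiDecimal => (1000.0, ["k", "M", "G", "T", "P", "E"]),
        UnitSystem::IecBinary => (1024.0, ["Ki", "Mi", "Gi", "Ti", "Pi", "Ei"]),
    };
    // Values below one full unit are shown as plain bytes.
    if (bytes as f64) < base {
        return format!("{} B", bytes);
    }
    // Divide until the value fits the largest applicable prefix.
    let mut value = bytes as f64 / base;
    let mut index = 0;
    while value >= base && index < prefixes.len() - 1 {
        value /= base;
        index += 1;
    }
    format!("{:.1} {}B", value, prefixes[index])
}

fn main() {
    println!("{}", format_bytes(1500, UnitSystem::SiDecimal)); // prints "1.5 kB"
    println!("{}", format_bytes(1536, UnitSystem::IecBinary)); // prints "1.5 KiB"
}
```

With an enum, the call site states which unit system is wanted, so a reader does not have to guess what `true` or `false` selects.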
No worries at all - I missed it too, but admittedly I also didn't think much about it. The system as is (without automation) is prone to failure when faced with very fallible humans.
I never considered the display for humans could be a breaking change, only breaking the build and in the worst case, behaviour. Thanks again for all your help with this, it's much appreciated and I am definitely feeling like I should rush things a little less.
I'm not sure how big of a change is needed before that would qualify as breaking. One factor that made me wonder about it is that the exact output of
Right, the journey tests. These are from a time when I nailed the output 'just because', and before it was clear that
In this case, the initial change before I did anything to adapt to the new version of
-compressed entries size : 51.8 KB
-decompressed entries size : 103.7 KB
-total object size : 288.7 KB
-pack size : 51.9 KB
+compressed entries size : 50.5 KiB
+decompressed entries size : 101.3 KiB
+total object size : 281.9 KiB
+pack size : 50.7 KiB
When that appears as part of larger output, or if it is formatted in a non-monospace font, it is not immediately obvious that what has happened is that the units being used have changed from units of 1000 bytes to units of 1024 bytes. So, from my perspective, the journey test was helpful to me here. Without it, I would not have noticed this issue, and it might not have occurred to me that there could be skew in the behavior of
To be clear, I am not saying this to argue that more journey tests be created for
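The arithmetic behind that diff can be sketched as follows. The byte count below is an assumption, chosen only because it is consistent with the first row of the diff (51.8 in units of 1000 bytes, 50.5 in units of 1024 bytes); the real value is not given in this thread. The helper `in_units` is hypothetical and is not how `bytesize` is implemented.

```rust
/// Render a byte count in units of `base` bytes with one decimal place.
/// Hypothetical helper for illustration only.
fn in_units(bytes: u64, base: u64, symbol: &str) -> String {
    format!("{:.1} {}", bytes as f64 / base as f64, symbol)
}

fn main() {
    // Assumed byte count, consistent with the "compressed entries size" row.
    let compressed: u64 = 51_760;
    println!("{}", in_units(compressed, 1000, "kB")); // prints "51.8 kB"
    println!("{}", in_units(compressed, 1024, "KiB")); // prints "50.5 KiB"
}
```

The same number of bytes yields a figure about 2.4% smaller in KiB than in kB, which is the size of the shift visible in each row of the diff.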
In a way, the current suite of journey-tests is overly detailed where it does exist for However, doing so is probably rarely necessary if there are enough unit tests, and adding more can always be done if coverage is too low in some places.
Current behavior 😯
As attested by the current journey test snapshots, `gix free pack verify --statistics` outputs compression-related sizes in units of "KB":
gitoxide/tests/snapshots/plumbing/no-repo/pack/verify/index-with-statistics-success
Lines 17 to 21 in 79dabb0
But it is not immediately clear what unit that actually is. Is it…
It turns out that, in this case, 1 KB = 1 kB, but it is not obvious.
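One way to picture the ambiguity is a lookup from unit symbol to bytes per unit. This is a sketch under the conventions described in this issue, and `bytes_per_unit` is a hypothetical name rather than anything in gitoxide: "kB" and "KiB" each pin down a single value, while "KB" alone does not.

```rust
/// Bytes per unit for a symbol, or `None` when the symbol does not settle it.
/// Hypothetical helper for illustration only.
fn bytes_per_unit(symbol: &str) -> Option<u64> {
    match symbol {
        "kB" => Some(1000),  // SI decimal kilobyte
        "KiB" => Some(1024), // IEC binary kibibyte
        "KB" => None,        // used in practice for both 1000 and 1024 bytes
        _ => None,           // anything else is out of scope for this sketch
    }
}

fn main() {
    for symbol in ["kB", "KiB", "KB"] {
        println!("{} -> {:?}", symbol, bytes_per_unit(symbol));
    }
}
```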
Expected behavior 🤔
It is also not obvious what unit is intended.
`gitoxide-core` uses the `bytesize` library to display the units:
gitoxide/gitoxide-core/src/pack/verify.rs
Lines 235 to 244 in 79dabb0
What unit is displayed and what symbol is used to represent it varies across major versions of `bytesize`. In current (i.e. recent stable) releases, the default unit is the IEC binary kibibyte, which it abbreviates KiB; while one can explicitly request the SI decimal kilobyte, which it abbreviates kB. But old versions of `bytesize` behave differently, defaulting to the SI decimal kilobyte, and also abbreviating it with the non-SI symbol KB. The new behavior came in as of `bytesize` 2.0.0. But `gitoxide-core` depends on:
gitoxide/gitoxide-core/Cargo.toml
Line 61 in 79dabb0
I suggest upgrading to `bytesize` 2.0.* and deciding whether we actually want…
…units of 1000 bytes abbreviated kB, in which case the above would be changed to:
gitoxide/gitoxide-core/src/pack/verify.rs
Lines 235 to 244 in 46df372
…or units of 1024 bytes abbreviated KiB, in which case no change would be needed to that source code file. (Though it could, if desired, be made explicit by calling `iec()` where the SI alternative calls `si()`.)
Git behavior
I'm not sure if there's a Git behavior that should be considered to correspond exactly to this, since `gix` doesn't have or aim for the same interface as `git`, and since `git verify-pack` does not show file sizes in its statistics:
But some other `git` commands do show sizes of things in "human" units. For example:
When it is run without `-H` the `size-pack` value is shown with no unit, but git-count-objects(1) documents it as being in units of KiB. With neither `-v` nor `-H`, one gets:
There, "kilobytes" is ambiguous. One might think it means decimal SI kilobytes (1000 bytes). But actually that occurrence of "kilobytes" means binary IEC kibibytes, as revealed by:
I don't think any of this has much bearing on what we should do, since it's about display behavior that makes no effort to be similar to Git. However, it may be that the preference in Git for using binary IEC units--rather than decimal SI units--reflects a preference for those units, or would lead users to expect that gitoxide use such units. (My personal preference is also for binary IEC units.)
Any change here, especially if it includes upgrading `bytesize`, should be fairly convenient for me to include in a larger PR that I am already working on. (Although the above-linked code currently fixes this by keeping them SI decimal kilobytes and changing the unit to "kB", I did that because it is closer to the current behavior, not to express a preference for that approach.)
Steps to reproduce 🕹
Check the journey test snapshot file shown above and observe that the journey tests are passing. Alternatively, run:
This shows the following, which are "KB" units where by "KB" it means what would be less ambiguously called "kB":
Broadening the scope: Other uses of ambiguous units
I've framed this in terms of `gix free pack verify --statistics` because that's what I stumbled upon first (EliahKagan#18 (comment)), and because the exact way that formats sizes is under test, and because I didn't really think through the full scope this issue should have. But this should very possibly be construed more broadly: some other places also show ambiguous units, and also show what I believe to be decimal SI units when they might perhaps better show binary IEC units.
In contrast, Git uses binary IEC units:
Upgrading `bytesize` in all gitoxide crates' `Cargo.toml` does not affect that. I'm not sure if that's because `prodash` depends on `bytesize` 1.3.3, or for some other reason.