
support torrenting for fetching and serving packages #23236


Open
andrewrk opened this issue Mar 13, 2025 · 1 comment
Labels
enhancement · zig build system · package management
Milestone
unplanned

Comments

@andrewrk
Member

This idea builds upon `zig build --watch` mode, where the build runner stays running for a long time, keeping a project's build graph up-to-date.

The idea is to add a new flag, `-ftorrent`, which makes the build runner additionally fetch and serve packages in a decentralized manner.

We could easily get away with using direct fetching only, so why bother with this?

  1. It could be made seamless for the user, transparently improving the user experience via faster downloads, and reducing package hosting costs for those who self-host their own packages. Sure, the build runner has to grow some features internally, but the user doesn't have to do anything different.
  2. Lean harder into decentralization by depending less on charity from Microsoft and other large corporations. This is like maintaining a disaster kit for earthquakes or whatnot. You could probably get away with not dealing with it, but when shit hits the fan, it can save your life. There are intangible benefits from not depending on corpo charity for critical infrastructure.

Given there may be multiple instances of `zig build --watch` running, there are two ways this could work:

  • One of the processes wins the race to obtain a lock and becomes the singleton doing the torrenting (see the sketch after this list).
  • Each process participates as an independent node.
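A minimal sketch of the lock-race variant, assuming the lock file lives in the global zig cache; the file name and function are hypothetical, not part of any existing API:

```zig
const std = @import("std");

/// Try to become the singleton torrent node by taking an exclusive,
/// non-blocking lock on a file in the global zig cache. The file name
/// "torrent.lock" is hypothetical.
fn tryBecomeTorrentSingleton(cache_dir: std.fs.Dir) !?std.fs.File {
    const file = cache_dir.createFile("torrent.lock", .{
        .lock = .exclusive,
        .lock_nonblocking = true,
    }) catch |err| switch (err) {
        // Another `zig build --watch` process already holds the lock;
        // this one stays a plain fetcher (or an independent node).
        error.WouldBlock => return null,
        else => return err,
    };
    // Keep the handle open for the lifetime of the process; the lock
    // is released when the file is closed or the process exits.
    return file;
}
```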

Either way, all packages that are present on the system (i.e. in `~/.cache/zig/p`) would be seeded, unless the package has `private: true` in the build manifest. There is precedent for this in npm.
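For illustration, that could look like this in `build.zig.zon`; the `private` field is the proposal here, not something the format supports today, and the other fields are abbreviated:

```zig
.{
    .name = .my_package,
    .version = "0.1.0",
    // Hypothetical field: opt this package out of seeding entirely,
    // mirroring npm's "private": true.
    .private = true,
    .dependencies = .{},
    .paths = .{"."},
}
```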

One neat feature this brings to the table is that the most popular packages, which are used the most and therefore need to be fetched the most, end up being seeded the most. This also provides a nice popularity metric. Gaming the metric means donating extra bandwidth to the network, so go ahead and game the metric: everybody wins.

There are some settings which might be valuable to be stored persistently, such as:

  • default mode (`-ftorrent`, `-fno-torrent`).
    • Unfortunately, even with the ability to mark packages `private: true`, having the default be on would be too aggressive. However, it would be nice if users could decide once to enable it and not have to remember to pass the flag with every CLI invocation.
  • bandwidth limiting.
  • zig cache file size limit. This is independent of this issue; I'm just pointing out that we already have a relevant use case for persistent settings.

Figuring out whether and how to store persistent settings such as those is a separate issue.
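Purely as a sketch of what such persistent settings might contain (the file location, field names, and defaults here are all made up):

```zig
// e.g. a zon file in a well-known config directory (hypothetical)
.{
    // default for when neither -ftorrent nor -fno-torrent is passed
    .torrent = false,
    // seeding bandwidth cap in KiB/s; 0 means unlimited
    .torrent_upload_limit_kib = 512,
    // global zig cache size limit in MiB (independent of this issue)
    .cache_size_limit_mib = 10240,
}
```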

Here are some ways this could work:

  • Keep unpacked packages in global zig cache. Data is transferred uncompressed.
    • Downside: more bandwidth usage than direct fetching of compressed artifact
  • Keep unpacked packages in global zig cache. Data is packed and compressed into a canonical format on the fly for transfer.
    • Downside: unpredictable CPU usage spikes when such artifacts are created
    • Downside: more garbage in the cache
  • Revisit #14283 (use an additional, project-local copy of dependency trees). After fetching and unpacking, data is re-packed and compressed into a canonical format for long-term storage in global zig cache. When fetching, cache hits from global cache are unpacked directly into project-local cache.
    • Downside: more work when fetching
    • Downside: more copies of package data
    • Upside: project-local package data can be hacked up freely with no consequences for other projects
    • Upside: less likely for global cache to become corrupted and invalidate nodes in the network
    • Upside: canonical repackings omit filtered out files, resulting in less bandwidth usage than direct fetching
    • Upside: provides opportunity to commit dependencies to source control, which some people like to do
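Whichever variant wins, every node has to seed byte-identical data, so the canonical packed and compressed artifact would be hashed once and the digest used as the content address peers request. A sketch of that hashing step using `std.crypto`; the function name and the choice of SHA-256 are assumptions, not a spec:

```zig
const std = @import("std");

/// Compute the content address peers use to request a package by
/// hashing the canonically packed and compressed artifact. Streaming
/// in fixed-size chunks keeps memory usage flat for large packages.
fn contentAddress(artifact: std.fs.File) ![32]u8 {
    var hasher = std.crypto.hash.sha2.Sha256.init(.{});
    var buf: [64 * 1024]u8 = undefined;
    while (true) {
        const n = try artifact.read(&buf);
        if (n == 0) break;
        hasher.update(buf[0..n]);
    }
    var digest: [32]u8 = undefined;
    hasher.final(&digest);
    return digest;
}
```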

When nothing needs to be fetched, it might be worth waiting out a timeout before starting to seed, since users are likely to start and kill the process in rapid succession. On the other hand, the state could be maintained on disk so that doing so doesn't "reset" the node but only introduces latency.
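A sketch of that grace period; the 30-second figure and the notion of persisted session state are assumptions for illustration:

```zig
const std = @import("std");

/// Wait out a grace period before joining the swarm, so that a
/// `zig build --watch` process that is started and killed within
/// seconds never begins seeding. If session state (known peers,
/// piece availability) is persisted in the global cache, restarting
/// only re-pays this latency rather than resetting the node.
fn delayedSeedStart(start_seeding: *const fn () void) void {
    std.Thread.sleep(30 * std.time.ns_per_s); // arbitrary duration
    start_seeding();
}
```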

@LordMZTE
Contributor

Neat concept, here's an idea I'd just like to throw in: What about IPFS? I think it's uniquely well suited to this due to its content-addressed nature (i.e. no extra hashing/verification necessary).
