wget transport doesn't support range requests #3799

Closed
hvr opened this issue Sep 7, 2016 · 15 comments
Comments

@hvr
Member

hvr commented Sep 7, 2016

wget's own logic can't cope with a partial (206) response when a Range header is injected via e.g.

--header=Range: bytes=49859152-49932995

instead it assumes the request to be faulty and retries; see also this SO question.

Instead, wget supports a --start-pos= flag for range requests; however, it doesn't allow specifying an end position (or a length). And our CDN doesn't seem to like open-ended range requests of the style

Range: bytes=49859152-

This appears to make wget currently unusable for performing incremental downloads of 01-index.tar.gz (NB: curl and plain-http both work fine).
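For illustration, the two header shapes mentioned above (a closed range and an open-ended one) can be built as follows. This is only a minimal sketch; the helper names `range_header` and `range_length` are hypothetical and are not part of cabal.

```python
from typing import Optional


def range_header(start: int, end: Optional[int] = None) -> str:
    """Build an HTTP Range header value.

    `end` is inclusive; passing None yields an open-ended range.
    (Hypothetical helper, for illustration only.)
    """
    if start < 0:
        raise ValueError("start must be non-negative")
    if end is None:
        return f"bytes={start}-"
    if end < start:
        raise ValueError("end must not precede start")
    return f"bytes={start}-{end}"


def range_length(start: int, end: int) -> int:
    """Number of bytes covered by a closed range (both ends inclusive)."""
    return end - start + 1
```

With the numbers from this report, `range_header(49859152, 49932995)` gives the closed form and `range_header(49859152)` gives the open-ended form that the CDN reportedly dislikes.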

IMO, cabal should either refuse to use wget for range requests or fall back to full downloads and emit nasty warnings about wasting network bandwidth...
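The "fall back and warn" option could look roughly like the sketch below. It is not cabal's implementation: the function name `plan_download` and the capability table are assumptions, with the table filled in from what this thread reports (curl and plain-http work, wget does not).

```python
import warnings

# Assumption taken from this thread: curl and plain-http handle ranged
# requests, while wget does not.
RANGE_CAPABLE = {"curl": True, "plain-http": True, "wget": False}


def plan_download(transport: str, want_range: bool) -> str:
    """Return "range" or "full" for a download with the given transport.

    Unknown transports are conservatively treated as not range-capable.
    (Hypothetical helper, for illustration only.)
    """
    if not want_range:
        return "full"
    if RANGE_CAPABLE.get(transport, False):
        return "range"
    warnings.warn(
        f"{transport} cannot do range requests; downloading the whole file, "
        "which wastes network bandwidth"
    )
    return "full"
```

Refusing outright instead of falling back would mean raising an error at the point where the warning is emitted.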

/cc @dcoutts @gbaz

@ezyang
Contributor

ezyang commented Sep 8, 2016

Is there a reason why wget is default? Can we paper this over by just changing the default?

@hvr
Member Author

hvr commented Sep 8, 2016

iirc the defaulting goes like this (not sure where powershell fits in):

  1. use explicitly requested http-transport mode if specified via config-file or CLI flag
  2. if there's curl, use curl
  3. if there's wget, use wget
  4. as last resort, use http-plain

@23Skidoo
Member

23Skidoo commented Sep 8, 2016

@hvr

not sure where powershell fits in

After curl and wget: https://github.com/haskell/cabal/blob/master/cabal-install/Distribution/Client/HttpUtils.hs#L252
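Putting the two comments together, the defaulting order reads: explicit request, curl, wget, powershell, then the built-in transport. The sketch below only illustrates that order; it is not the code in HttpUtils.hs, and `pick_transport` and its `available` parameter are hypothetical names.

```python
import shutil
from typing import Callable, Optional


def _on_path(program: str) -> bool:
    """True if an executable with this name is found on PATH."""
    return shutil.which(program) is not None


def pick_transport(
    requested: Optional[str] = None,
    available: Callable[[str], bool] = _on_path,
) -> str:
    """Choose a transport following the order described in this thread.

    (Hypothetical helper, for illustration only.)
    """
    # 1. An explicit choice from the config file or CLI flag always wins.
    if requested is not None:
        return requested
    # 2.-4. Otherwise take the first external program that is available.
    for program in ("curl", "wget", "powershell"):
        if available(program):
            return program
    # 5. Last resort: the built-in plain HTTP transport.
    return "plain-http"
```

Passing `available` explicitly makes the order easy to check without depending on what is installed on the machine.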

@dcoutts
Contributor

dcoutts commented Sep 8, 2016

We could consider adjusting the default order when using the secure downloads, since we don't need https at all. So we could use an order of: explicitly-specified, http-plain. The motivation for curl/wget is https uploads, and https downloads for the classic 00-index, but it's not needed for the secure 01-index stuff.

@hvr
Member Author

hvr commented Sep 8, 2016

@dcoutts While I generally agree that http-plain is desirable for secure repos (it avoids the need to fork/exec external processes -> fewer moving pieces and less latency), there's one caveat: the retry logic currently differs significantly between http-plain, curl and wget; see also #3386

@23Skidoo
Member

23Skidoo commented Sep 8, 2016

The motivation for curl/wget is https uploads, and https downloads for the classic 00-index, but it's not needed for the secure 01-index stuff.

HTTPS downloads are still a good idea for proprietary hackages.

@ezyang
Contributor

ezyang commented Sep 8, 2016

OK, so it sounds like this ticket should be solved with a warning instead.

@omefire
Collaborator

omefire commented Sep 14, 2016

I'm looking for a way to test some patch I've written.

Is there a way to specify the range request from the command line?
E.g.: cabal --http-transport=wget --headers='Range: 1-20' update

If not, which component actually sends these range headers?
Here's the link to the patch: #3841

@dcoutts
Contributor

dcoutts commented Sep 14, 2016

HTTPS downloads are still a good idea for proprietary hackages.

@23Skidoo it's a good idea if they're not yet using the secure mode. So it's a property of secure / classic rather than central vs other.

@23Skidoo
Member

23Skidoo commented Sep 14, 2016

@dcoutts

it's a good idea if they're not yet using the secure mode. So it's a property of secure / classic rather than central vs other.

The idea is that when you're not using HTTPS someone sniffing the traffic can get the source code of your proprietary packages.

@ezyang
Contributor

ezyang commented Sep 15, 2016

#3841 fixed this so that cabal now warns if this happens. But the underlying bug still exists.

@23Skidoo
Member

But the underlying bug still exists.

I'm not sure what else we can do here besides maybe dropping support for wget.

@omefire
Collaborator

omefire commented Sep 15, 2016

Yep, nothing else for us to do here, as the bug is with wget.

@hvr
Member Author

hvr commented Sep 16, 2016

@23Skidoo sure, but if you specify a public http:// hackage repo, you clearly don't need confidentiality; btw, does cabal even support using credentials (user/pass and/or ssl client certs) for private repos?

So right now, I'd say that it's preferable to skip wget when downloading the index & packages if you don't need confidentiality (i.e. a plain http:// url was specified), and rather use curl or http-plain which do support range requests.
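As a rough sketch of that proposal: pick the candidate transports from the URL scheme and leave wget out when a plain http:// URL was given. The function name `candidate_transports` is hypothetical, and the https branch assumes (as the earlier comments suggest, but do not state outright) that only curl and wget can speak https.

```python
from typing import List
from urllib.parse import urlsplit


def candidate_transports(repo_url: str) -> List[str]:
    """List transports worth trying for a repo URL, in preference order.

    (Hypothetical helper, for illustration only.)
    """
    scheme = urlsplit(repo_url).scheme.lower()
    if scheme == "http":
        # No confidentiality needed: prefer transports that support
        # range requests, i.e. skip wget.
        return ["curl", "plain-http"]
    # Assumption: anything else (notably https) needs an external tool.
    return ["curl", "wget"]
```

For example, `candidate_transports("http://hackage.haskell.org/")` leaves wget out, while an https:// URL still allows it.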

@23Skidoo
Member

23Skidoo commented Sep 16, 2016

btw, does cabal even support using credentials (user/pass and/or ssl client certs) for private repos?

You can use https://user:pass@hackage URLs, it'll work with the wget transport (but not other ones, it looks like). #2761 and #2763 are relevant.

5 participants