flutter packages get repeatedly tries to fetch packages if a package in pubspec does not exist #2242
I think the cause here is that Pub uses the same exit code (69) for "server unavailable" and "this package does not exist", and Flutter assumes it's always the former. I think the best fix would be for pub to have separate exit codes for these; then Flutter can retry only for the first. However, I don't know whether changing the exit code would be breaking. (This issue might be better raised at https://github.com/flutter/flutter and https://github.com/dart-lang/pub - I thought it was a timeout issue when I first pasted the link to this repo :-))
Traced this to flutter/flutter#11882. Any chance @Hixie has insights on why Flutter retries?
Side note on exit code 69: the file defining pub's exit codes references BSD's sysexits manpage, which specifies: …
Thus, it's probably fine to use it as a permanent error for the case where a package does not exist on the remote server, or even as a catch-all, as is the case in command_runner.dart. If we did want to provide a separate exit code for temporary failures, I would think …
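For reference, sysexits.h defines 69 as EX_UNAVAILABLE ("service unavailable") and 75 as EX_TEMPFAIL ("temporary failure; the user is invited to retry"). A minimal sketch of what splitting the two could look like for a caller such as Flutter, assuming pub adopted distinct codes; the wrapper below is an editor's illustration, not pub's or Flutter's actual code:

```dart
import 'dart:io';

// Exit codes as defined in BSD's sysexits.h.
const exUnavailable = 69; // EX_UNAVAILABLE: service unavailable (permanent)
const exTempFail = 75; // EX_TEMPFAIL: temporary failure, retry is reasonable

/// Hypothetical wrapper: retry `pub get` only on a temporary-failure
/// exit code, and treat everything else (including a nonexistent
/// package) as fatal.
Future<int> pubGetDistinguishingFailures({int maxAttempts = 5}) async {
  for (var attempt = 1; ; attempt++) {
    final result = await Process.run('pub', ['get']);
    if (result.exitCode != exTempFail || attempt == maxAttempts) {
      return result.exitCode; // success, permanent error, or out of retries
    }
    await Future.delayed(Duration(seconds: 2 * attempt));
  }
}

Future<void> main() async {
  exit(await pubGetDistinguishingFailures());
}
```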
Note: I was able to reproduce this on the Flutter master channel.
Pub has temporary failures very frequently (many times a day in our CI alone), so we try to wrap every invocation of pub with infinite retry and truncated exponential backoff. (We actually have a bunch more places where we call …) It would be ideal for the …
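What such a wrapper might look like; this is an editor's sketch of infinite retry with truncated exponential backoff, not Flutter's actual implementation (the initial delay, one-minute cap, and jitter are assumptions):

```dart
import 'dart:io';
import 'dart:math';

/// Sketch: rerun `pub get` until it succeeds, doubling the delay on each
/// failure but capping it at a maximum (hence "truncated" backoff).
Future<void> pubGetWithBackoff() async {
  final rand = Random();
  var delay = const Duration(seconds: 1);
  const maxDelay = Duration(minutes: 1); // assumed cap
  while (true) {
    final result = await Process.run('pub', ['get']);
    if (result.exitCode == 0) return;
    // Jitter keeps many CI jobs from retrying in lockstep.
    await Future.delayed(delay + Duration(milliseconds: rand.nextInt(1000)));
    delay = (delay * 2 > maxDelay) ? maxDelay : delay * 2;
  }
}

Future<void> main() async {
  await pubGetWithBackoff();
}
```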
@Hixie, thanks for the insight. Who should we talk to to get some of those logs? We're very keen to improve reliability of both the pub client and the server. Maybe the retry logic doesn't handle all network failures, or we have too few retries.
Just run pub get in a loop; it'll fail sooner or later...
Running …
So presumably, if my network wobbled legitimately, this would happen. Should it be retrying?
See also flutter/flutter#10764 from 2017.
I ran:

```dart
import 'dart:io';
import 'dart:async';

main() async {
  while (true) {
    // Start each run from a clean slate.
    try {
      Directory('test/cache').deleteSync(recursive: true);
      File('test/pubspec.lock').deleteSync();
      File('test/.packages').deleteSync();
    } on FileSystemException {
      // Ignore.
    }
    final version =
        ((await Process.run('pub', ['--version'])).stdout as String).trim();
    final stopwatch = Stopwatch()..start();
    final r = await Process.run('pub', ['get', '--no-precompile'],
        environment: {'PUB_CACHE': 'cache'}, workingDirectory: 'test');
    stopwatch.stop();
    // Append the outcome of each run to a log file.
    final s = File('logs').openWrite(mode: FileMode.append);
    if (r.exitCode == 0) {
      s.writeln(
          '**** Succeeded pub get ${DateTime.now().toIso8601String()} $version ${stopwatch.elapsed}');
    } else {
      s.writeln(
          '**** Failed pub get ${DateTime.now().toIso8601String()} $version Error code ${r.exitCode} ${stopwatch.elapsed}');
      s.write('stdOut:\n${r.stdout}');
      s.write('stdErr:\n${r.stderr}');
    }
    await s.close();
    await Future.delayed(Duration(seconds: 10));
  }
}
```

test/pubspec.yaml contains:

```yaml
name: test_app
dev_dependencies:
  test:
```

It has now done 12715 runs in a row with no failures.
Can you reproduce #2242 (comment)? That fails with no retries for me. (It's contrived, but I don't think it's impossible that legitimate network failures could fail in the same way, and it seems like a path that isn't retrying.)
I only have a wired network on my workstation. I tried disabling that, and I got a few retries and a message: …
What if you force it to fail on DNS by giving a bad hostname, like this? …
That fails in the same way for me without having to kill my network.
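(The example itself was lost in formatting; the general shape is a hosted dependency whose URL points at a hostname that will never resolve, so pub get dies at DNS lookup. The names below are made up; `.invalid` is a TLD reserved to never resolve.)

```yaml
# Hypothetical pubspec.yaml forcing a DNS failure rather than an HTTP error.
name: test_app
dependencies:
  some_package:
    hosted:
      name: some_package
      url: https://does-not-exist.invalid
```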
Is that hitting the network? You probably need a pubspec.yaml that involves interacting with the server.
I think Travis runs in GCP, from looking at a few of their IPs. But it's certainly possible that DNS lookup failures are hard to produce this way. I can't find any docs for GCP, but in EC2, DNS lookup failures are expected to be retried. And I suspect we don't attempt this, as we have some undocumented OS error codes we do not retry. We tried to do some archaeology on why we don't retry these specific error codes, but all we could find was something about migrating from … (lines 75 to 87 at d150679).
I suspect one of these could be DNS-lookup related. In any case, retrying any I/O-related error from the OS is not a risk. The worst that can happen is that it takes longer before the end user is informed that (a) they don't have internet, or (b) the server they specified doesn't exist.
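A sketch of that "retry any OS-level I/O error" stance, as an editor's illustration around a generic network action rather than pub's actual fetch path (the helper name and retry schedule are assumptions):

```dart
import 'dart:io';

/// Sketch: treat every OS-level network error (DNS failure, connection
/// reset, timeout, ...) as transient and retry, instead of allowlisting
/// specific errno values. Permanent conditions such as a missing
/// package surface as HTTP errors, not as SocketExceptions.
Future<T> retryOnOsError<T>(Future<T> Function() action,
    {int maxAttempts = 7}) async {
  for (var attempt = 1; ; attempt++) {
    try {
      return await action();
    } on SocketException {
      if (attempt == maxAttempts) rethrow;
      await Future.delayed(Duration(seconds: attempt));
    }
  }
}

Future<void> main() async {
  final status = await retryOnOsError(() async {
    final client = HttpClient();
    try {
      final request = await client.getUrl(Uri.parse('https://pub.dev/'));
      return (await request.close()).statusCode;
    } finally {
      client.close();
    }
  });
  print('pub.dev responded with HTTP $status');
}
```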
By coincidence, I just had this in Dart-Code's integration tests on Travis. It's the same failure as my contrived example above, but real (a DNS failure, and not retried): …
I don't know the underlying cause (whether it's on Travis's side or the DNS servers for pub), but I do suspect that if it had retried a few times with a delay, it probably would've subsequently worked.
Also, I think there's something missing from that error. The full job log is at https://travis-ci.org/Dart-Code/Dart-Code/jobs/610241513.
These are exactly the error codes I propose we stop special-casing in #2254. I probably still have to make it more graceful, but DNS lookups should be retried. It seems a bit unintuitive, because the OS does DNS resolution for us, but I certainly recall having this issue in EC2 a few years ago.
Hi, is there a way to limit the number of retries? And maybe the time between retries, too? I don't want our build server to try the operation an unlimited number of times. The last mistyped URL, for example, caused it to retry 232 times before I killed it.
This is still an issue; the Flutter action will run indefinitely, eating up run time. Any way to limit this?
It would be really cool to set a limit on retries for pub get. Using the VS Code extension, I can only stop it with …
Yep, this cost me my 3000 monthly CI minutes. That's not good. This has to be fixed by including a default timeout that can be changed if necessary.
Facing the same issue; it costs a lot of CI minutes.
We run our own CI environment for around 300 developers. Builds that don't fail in a timely manner and keep occupying build nodes when they hit an error on pub get create bottlenecks and accumulate unnecessary cost for us.
Same issue in our CI. A flag to set a limit on retries of pub get would be necessary.
I fixed this issue by using the GitHub Actions CI feature …
That's my solution as well. By the way, depending on project size, 1 minute might be a bit low.
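(The feature name above was lost in formatting; given the one-minute remark, it is presumably GitHub Actions' step-level timeout-minutes, which fails a step that runs too long. A sketch, with an illustrative step name and limit:)

```yaml
# Sketch: cap how long `pub get` may run so an infinite retry loop
# fails the job instead of burning CI minutes.
steps:
  - name: Fetch dependencies
    run: flutter packages get
    timeout-minutes: 5
```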
We removed the infinite retries in …
Hi,
I just stumbled upon this when trying to add a new package to the project and making a typo. Instead of just getting a message that the package doesn't exist, I got this: …
and it repeats forever.
I'm on …