Analysis server hangs with Linux 5.5 and IntelliJ based IDEs #40589
Comments
Dart also hangs while compiling Flutter project. Dart process was stuck while generating kernel snapshot. Running on 5.4 LTS kernel everything works again.... |
Yeah this is probably more general than just the analyzer. |
Based on the symptoms this is Dart VM related: potentially a deadlock in the VM or a bug in the dart:io implementation. If you could attach GDB and provide stack traces for all threads, that would help until we get access to a machine with the new kernel, which might take some time. |
I'm on Backtrace when the analysis server hangs
Backtrace when
|
@simolus3 The fact that it only affects IJ, not VSCode perhaps points to the issue with the plugin itself? @DanTup @devoncarew might suggest how to troubleshoot the plugin, enable logging of the interaction between IJ plugin and analysis server |
Mine was stuck at the Gradle build, nothing to do with IntelliJ. This was on a relatively big Flutter project. A small empty Flutter project compiles just fine. But when switching the kernel to 5.4 there are no issues... Will see if I can add a GDB thread dump soon. |
The stack trace posted above does not seem to indicate any kind of locking deadlock in the VM, it is possible some IO event was missed and the event handler is not waking up indicating a potential issue with dart:io. |
I spun up a cloud instance with a 5.5.3 kernel. Going to run tests to see if it reproduces the issue. |
This appears to be affecting Flutter CLI users: flutter/flutter#49185 |
The following simple program reproduces the hang on a compute instance with a 5.5.x Linux kernel:

import 'dart:io' as io;
import 'dart:convert';

void main(List<String> args) async {
  if (args.length != 1) {
    print('Usage: run.dart child|parent');
    return;
  }
  if (args[0] == 'child') {
    // The child floods stdout so the pipe to the parent fills up.
    for (int i = 0; i < 30000; i++) {
      print('line $i');
    }
    print('done');
    return;
  } else if (args[0] == 'parent') {
    // The parent re-runs this script as a child and echoes its output.
    final p = await io.Process.start(
        io.Platform.executable, [io.Platform.script.toFilePath(), 'child']);
    p.stdout.transform(utf8.decoder).listen((x) => print('stdout: $x'));
    p.stderr.transform(utf8.decoder).listen((x) => print('stderr: $x'));
    final exitCode = await p.exitCode;
    print('process exited with $exitCode');
  }
}
The parent process would then hang. Based on my cursory analysis (I have never looked at this part of the code before) I think this is a bug in our code - we don't seem to be using I am not sure why this problem only surfaces now - but I see that there were some changes to the Kernel around epoll, so this might have caused it to surface. We should either stop using ET mode or we should fix our code to follow man page guidelines. Assigning to @zichangg for actual implementation work. |
The Fuchsia eventhandler is based on a mechanism that only has level-triggering, so that could serve as a guide for that route. However, I'd recommend trying to follow the guidance for edge-triggering first to see if that is a simpler change. |
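For readers unfamiliar with the man page guidance mentioned above: with EPOLLET the usual pattern is to make the descriptor non-blocking and keep reading until read() fails with EAGAIN. The following is a minimal sketch of that pattern on a plain pipe, not the Dart VM's event handler; the helper names fill_pipe, drain_until_eagain and et_roundtrip are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: write up to `count` filler bytes, return bytes written. */
static long fill_pipe(int fd, long count) {
  char chunk[1024];
  long written = 0;
  assert(count >= 0);
  memset(chunk, 'x', sizeof chunk);
  while (written < count) {
    size_t want = (size_t)(count - written);
    if (want > sizeof chunk) want = sizeof chunk;
    ssize_t n = write(fd, chunk, want);
    if (n <= 0) break; /* pipe full (non-blocking) or error */
    written += n;
  }
  return written;
}

/* Hypothetical helper: read a non-blocking fd until EAGAIN or EOF.
 * Returns the total number of bytes read, or -1 on a real error. */
long drain_until_eagain(int fd) {
  char buf[4096];
  long total = 0;
  for (;;) {
    ssize_t n = read(fd, buf, sizeof buf);
    if (n > 0) { total += n; continue; }
    if (n == 0) return total;                /* EOF */
    if (errno == EINTR) continue;            /* interrupted, retry */
    if (errno == EAGAIN || errno == EWOULDBLOCK) return total; /* drained */
    return -1;
  }
}

/* Hypothetical demo: push `count` bytes through a pipe registered with
 * EPOLLIN | EPOLLET, wait for one event, then drain. `count` is assumed
 * to fit in the pipe buffer. Returns bytes drained, or -1 on failure. */
long et_roundtrip(long count) {
  int fds[2];
  if (pipe(fds) != 0) return -1;
  long result = -1;
  int ep = epoll_create1(0);
  struct epoll_event ev = {0}, ready;
  ev.events = EPOLLIN | EPOLLET;
  ev.data.fd = fds[0];
  if (ep != -1 &&
      fcntl(fds[0], F_SETFL, O_NONBLOCK) == 0 &&
      fcntl(fds[1], F_SETFL, O_NONBLOCK) == 0 &&
      epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
      fill_pipe(fds[1], count) == count &&
      epoll_wait(ep, &ready, 1, 1000) == 1) {
    result = drain_until_eagain(ready.data.fd);
  }
  if (ep != -1) close(ep);
  close(fds[0]);
  close(fds[1]);
  return result;
}
```

Calling et_roundtrip(10000) from a main() should report all 10000 bytes after a single wakeup, since the drain loop does not return to epoll_wait() while data is still buffered.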
Thanks for reproduction! @mraleph
I think the reason we used edge-triggered is to be able to control the amount of bytes for each Yes, we didn't follow the rules. But I tried a c++ program locally, even if
I suspect this is related to the amount of bytes to be read in the pipe. For your sample program, if the iteration count is reduced to 3000, it won't hang. Here is a similar one I found. https://bugzilla.kernel.org/show_bug.cgi?id=205933 |
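To make the "leftover bytes" point concrete, here is a small sketch (an illustration on a plain pipe, not the VM code) of what happens when a reader takes only part of the data after an edge-triggered event: a second non-blocking poll reports nothing, because no new data arrived, while a level-triggered registration keeps reporting the descriptor as readable. The function name events_after_partial_read is hypothetical.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: write 10 bytes to a pipe, consume the first readiness
 * event, read only 5 bytes, then poll again with a zero timeout.
 * Returns the number of events reported by the second poll, or -1 if the
 * setup failed. Pass 1 for edge-triggered, 0 for level-triggered. */
int events_after_partial_read(int edge_triggered) {
  assert(edge_triggered == 0 || edge_triggered == 1);
  int fds[2];
  if (pipe(fds) != 0) return -1;
  int result = -1;
  int ep = epoll_create1(0);
  struct epoll_event ev = {0}, ready;
  char buf[5];
  ev.events = EPOLLIN | (edge_triggered ? EPOLLET : 0);
  ev.data.fd = fds[0];
  if (ep != -1 &&
      epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
      write(fds[1], "0123456789", 10) == 10 &&
      epoll_wait(ep, &ready, 1, 1000) == 1 &&   /* first event */
      read(fds[0], buf, sizeof buf) == 5) {     /* partial read */
    /* 5 bytes are still buffered; ask again without blocking. */
    result = epoll_wait(ep, &ready, 1, 0);
  }
  if (ep != -1) close(ep);
  close(fds[0]);
  close(fds[1]);
  return result;
}
```

In the edge-triggered case the remaining bytes stay in the pipe until some later write produces a new event, which is one way a reader that relies on epoll_wait() alone can end up waiting indefinitely.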
@zichangg I just copied the changes in your hotfix in to |
That would not change anything. That file is not read by the VM. You would need to rebuild flutter engine artifacts (flutter host dart-sdk specifically) from source. |
Interesting - can you paste your reproduction here? I would be curious to look at it.
Yeah, I have tried this as well while reproducing the issue. It does fix my particular reproduction indeed though it still does not entirely follow the man page recommendations |
We should do a proper fix and backport it to stable. |
This is the program I used locally:

#define MAX_EVENTS 5
#define READ_SIZE 5
#include <stdio.h>     // for fprintf(), printf(), perror()
#include <unistd.h>    // for close(), read()
#include <sys/epoll.h> // for epoll_create1(), epoll_ctl(), struct epoll_event

int main()
{
  int running = 10, event_count, i;
  printf("Stdin will be added into ET epoll, each read only take 5 bytes at most\n");
  ssize_t bytes_read; // read() returns ssize_t; -1 signals an error
  char read_buffer[1000];
  struct epoll_event event, events[MAX_EVENTS];
  int epoll_fd = epoll_create1(0);
  if (epoll_fd == -1)
  {
    fprintf(stderr, "Failed to create epoll file descriptor\n");
    return 1;
  }
  // Register stdin (fd 0) for edge-triggered read notifications.
  event.events = EPOLLIN | EPOLLET;
  event.data.fd = 0;
  if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, 0, &event))
  {
    fprintf(stderr, "Failed to add file descriptor to epoll\n");
    close(epoll_fd);
    return 1;
  }
  while (running)
  {
    printf("\nPolling for input...\n");
    event_count = epoll_wait(epoll_fd, events, MAX_EVENTS, -1);
    printf("%d ready events\n", event_count);
    for (i = 0; i < event_count; i++)
    {
      printf("Reading file descriptor '%d' -- ", events[i].data.fd);
      // Deliberately read at most READ_SIZE bytes per event.
      bytes_read = read(events[i].data.fd, read_buffer, READ_SIZE);
      if (bytes_read < 0)
      {
        perror("read");
        continue;
      }
      printf("%zd bytes read.\n", bytes_read);
      read_buffer[bytes_read] = '\0';
      printf("Read %s \n", read_buffer);
    }
    running--;
  }
  if (close(epoll_fd))
  {
    fprintf(stderr, "Failed to close epoll file descriptor\n");
    return 1;
  }
  return 0;
}
I think we should definitely follow the man page. I was curious about why this problem only surfaces now.
The most straight-forward solution is to drain |
@zichangg If I test your program like so
It hangs even on my workstation with an older Kernel without reading the input properly. |
Same here. I'm on archlinux with latest linux kernel (5.5+) and |
On Kernel update on LTS (5.4.20-1 today) linux is down. |
I downgraded my Arch Linux from 5.5.4 earlier in the week to 5.4.15, which was the latest 5.4.x kernel in my cache, and it works. I got help from the Arch Linux forums and here is the post where I documented what I did in case it would help anyone else. As it turns out, what I did wasn't the best method; jlg pointed out that using LTS may be the best method, see here and the following post. |
A quick update: |
if NativeSocket.read() has not specified `len`, I will expect caller wants all bytes(this is not documented, correct me if I'm wrong). Check the available bytes after read(), if there are leftover, issue another read(). This will eventually drain the pipe. Bug: #40589 Change-Id: I3b0be9ef8258defd1f71f85b3154702401e0f31c Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/136322 Reviewed-by: Siva Annamalai <[email protected]> Reviewed-by: Lasse R.H. Nielsen <[email protected]> Commit-Queue: Zichang Guo <[email protected]>
The linked fix seems to be a Dart only change - did we not need to change the way we're polling in the C++ code? |
As I understand it, the Dart-side draining the socket is sufficient to satisfy the (newly enforced) epoll edge-triggering preconditions. |
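The actual fix lives in the Dart NativeSocket code, which is not shown in this thread. As a rough C analogy of the strategy described in the commit message ("check the available bytes after read(), if there are leftover, issue another read()"), the sketch below uses the FIONREAD ioctl to ask how many bytes are still buffered. The names read_all_available and demo_read_all are hypothetical, and the sketch assumes the caller was already told the descriptor is readable.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: read, then keep reading while the kernel still
 * reports buffered bytes. Assumes `fd` is known to be readable, otherwise
 * the first read() may block. Returns total bytes read, or -1 on error. */
long read_all_available(int fd) {
  char buf[256];
  long total = 0;
  int available = 0;
  do {
    ssize_t n = read(fd, buf, sizeof buf);
    if (n < 0) return -1;
    if (n == 0) break;                               /* EOF */
    total += n;
    if (ioctl(fd, FIONREAD, &available) != 0) return -1;
  } while (available > 0);                           /* leftovers: read again */
  return total;
}

/* Hypothetical demo: put `count` bytes (1..4096) into a pipe and read them
 * back with the helper above. Returns bytes read, or -1 on failure. */
long demo_read_all(long count) {
  char payload[4096];
  int fds[2];
  long result = -1;
  assert(count > 0 && count <= (long)sizeof payload);
  memset(payload, 'x', sizeof payload);
  if (pipe(fds) != 0) return -1;
  if (write(fds[1], payload, (size_t)count) == count) {
    result = read_all_available(fds[0]);
  }
  close(fds[0]);
  close(fds[1]);
  return result;
}
```

Compared with looping until EAGAIN, this variant does not require the descriptor to be non-blocking, at the cost of one extra ioctl per read.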
Will there be a problem running Logs
|
The fixes in master haven't solved the problem yet. In IntelliJ much functionality seems to work once, then never again. No autocomplete. Corrected syntax errors remain as errors, which requires restarting IntelliJ to make them go away. Oddly enough, dartfmt was consistently failing but seems to be working now. [✓] Flutter (Channel master, v1.15.4-pre.170, on Linux, locale en_US.UTF-8) There were several updates to various kernel epoll code recently in kernel 5.6 but they haven't changed the Dart problems. 5.6.0-0.rc3.git0.1.fc33.x86_64 Flutter plugin 43.0.2 on AS 3.6 from Feb 12. |
@everyone You have to wait until next dart sdk release that contains the merged fix. flutter upgrade from master branch won't fix the issue for you now. |
So can we somehow compile the SDK ourselves? Or is a hotfix SDK version being released in the next couple of days? Because Dart SDK releases usually only happen every couple of months, right? |
We will update flutter/flutter#49185 when this rolls into flutter. No estimates yet about when it will appear on Flutter stable. |
Since the change has landed, I'll close the issue. |
…in pipe Cherry-pick https://dart-review.googlesource.com/c/sdk/+/136322 and https://dart-review.googlesource.com/c/sdk/+/138909. Since several other cls has not been landed, this change only applies some necessary parts to fix the problem. Bug: #40589 Change-Id: I3b0be9ef8258defd1f71f85b3154702401e0f31c Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/136322 Reviewed-by: Siva Annamalai <[email protected]> Reviewed-by: Lasse R.H. Nielsen <[email protected]> Commit-Queue: Zichang Guo <[email protected]>
After upgrading to Linux 5.5 (I'm on Linux 5.5.2-1-MANJARO), analysis in IntelliJ stops working after a while (analysis errors view doesn't update anymore, formatting times out, no autocomplete, ...). The analyzer diagnostics page is still reachable after that happens, but it appears that the analysis server just stops serving regular requests. Interestingly, I can't reproduce this with VS Code. I tried deleting ~/.dartServer/, but that didn't fix the problem.
I first thought this was an IntelliJ problem, but flutter/flutter#49185 (comment) makes me think that this is analyzer related. I wanted to open another issue here because for me that also happens on the non-Flutter Dart SDK (both stable & latest dev).
I can reproduce this consistently and on different projects, so I'd be glad to provide more information if that's necessary.