Java QpsWorker produces many exceptions #2372
I agree that the log file is very big, but I don't see the exceptions you are talking about. Could you paste a bit of it here or say which scenario it is in? The most output for a test I saw was:
@ejona86 It's a huge number of "half closed already called" exceptions.
This doesn't fix any failure in the qps test, but it causes the test to produce less useless log output of the form:

    java.lang.IllegalStateException: call was half-closed
        at com.google.common.base.Preconditions.checkState(Preconditions.java:174)
        at io.grpc.internal.ClientCallImpl.sendMessage(ClientCallImpl.java:380)
        at io.grpc.stub.ClientCalls$CallToStreamObserverAdapter.onNext(ClientCalls.java:299)
        at io.grpc.benchmarks.driver.LoadClient$AsyncPingPongWorker$1.onNext(LoadClient.java:406)
        at io.grpc.benchmarks.driver.LoadClient$AsyncPingPongWorker$1.onNext(LoadClient.java:400)
        at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:382)
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessageRead.runInContext(ClientCallImpl.java:473)
        ... 7 more

Fixes grpc#2372
This doesn't impact test behavior per se, but it causes the test to produce less useless log output of the form:

    java.lang.IllegalStateException: call was half-closed
        at com.google.common.base.Preconditions.checkState(Preconditions.java:174)
        at io.grpc.internal.ClientCallImpl.sendMessage(ClientCallImpl.java:380)
        at io.grpc.stub.ClientCalls$CallToStreamObserverAdapter.onNext(ClientCalls.java:299)
        at io.grpc.benchmarks.driver.LoadClient$AsyncPingPongWorker$1.onNext(LoadClient.java:406)
        at io.grpc.benchmarks.driver.LoadClient$AsyncPingPongWorker$1.onNext(LoadClient.java:400)
        at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:382)
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessageRead.runInContext(ClientCallImpl.java:473)
        ... 7 more

Fixes grpc#2372
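The trace above shows the async ping-pong worker calling onNext() on a call that was already half-closed, which is exactly the race a shutdown guard avoids. Below is a minimal sketch of such a guard using the io.grpc.stub.StreamObserver API; the class name, generics, and the way the call is started from a generated async stub are illustrative assumptions, not the actual LoadClient code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

import io.grpc.stub.StreamObserver;

/**
 * Sketch of a shutdown-guarded async ping-pong loop. Names are illustrative;
 * the real benchmark worker is io.grpc.benchmarks.driver.LoadClient.
 */
final class GuardedPingPong<ReqT, RespT> implements StreamObserver<RespT> {
  private final AtomicBoolean shutdown = new AtomicBoolean();
  private final ReqT request;
  private StreamObserver<ReqT> requestObserver; // set when the call is started

  GuardedPingPong(ReqT request) {
    this.request = request;
  }

  /** Hook up the request observer returned by a generated async stub and send the first ping. */
  void start(StreamObserver<ReqT> requestObserver) {
    this.requestObserver = requestObserver;
    requestObserver.onNext(request);
  }

  @Override
  public void onNext(RespT response) {
    if (shutdown.get()) {
      // Without this guard, sending after shutdown races with halfClose() and
      // produces the "call was half-closed" IllegalStateException spam.
      requestObserver.onCompleted();
      return;
    }
    requestObserver.onNext(request); // keep the ping-pong going
  }

  @Override
  public void onError(Throwable t) {
    // record the failure; nothing further to send on this call
  }

  @Override
  public void onCompleted() {
    // server finished the stream
  }

  void shutdown() {
    shutdown.set(true);
  }
}
```

With a generated async stub this would be wired up roughly as `GuardedPingPong<Req, Resp> worker = new GuardedPingPong<>(request); worker.start(stub.streamingCall(worker));`, where streamingCall stands in for whatever bidi-streaming method the benchmark service exposes.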
Thanks! Now the log went from 90 MB to 60 MB, but there are still some unwanted log messages:
Reopened this issue.
@jtattermusch, I'm finding the performance tests far too hard to use. All I want to do is run them locally to test, but I've spent hours trying to find the right script and am still pretty discontent.
I remember at one point you had to use environment variables to drive some of this, but I can't find a script to get me going further in that direction.
Not sure how running performance tests could be any easier. If you want to run just one scenario, just run:
In the local mode, that just builds Java and C++ (C++ is needed because you need qps_json_driver) and runs the scenario with the right parameters. If you really DO want to use qps_json_driver directly (you absolutely shouldn't need to), you can run …
@jtattermusch In general, none of the scripts inside the tools/ directory work right off the bat; they all require extra setup. Running the command you just posted results in a compilation error:
@carl-mastrangelo I didn't have that problem. Maybe you need a …
Nope, that has already been done.
It looks like the build process prefers the system protobuf instead of the one included in the submodule. Forcing it to use the one from the repo (why isn't this the default?) gets a little further:
@jtattermusch, I don't require the process to be super-easy and I expect bumps, but I write below to give you a glimpse of what I tried. This is unlikely to be exhaustive, but it should make clear that I'm suffering a death by a thousand cuts. Many of these issues seem addressable. And all of this is with a little bit of knowledge of the overall architecture; if I didn't have that, I'd have been even more hosed.
Is there any documentation, if I don't already know the proper script? There are lots of scripts, and it's really hard to find the "top level" ones. And while digging through, it's not clear what the prerequisites for these scripts are, which I will grant is partially because I don't know what they do, when they are used, or what their purpose is. I'm okay with digging in some, but some hints go a long way. Maybe I just don't know where the hints are. My #1 goal was to find a command to run to reproduce results. I saw …

What are the workers? I only see one Java worker in the logs, but surely there is more than one worker. It seemed C++ was in the mix because I saw C core-style log statements. But is it the client, or the server? Is C++ always used? I still am not quite certain; it seems maybe only Java is used. If I grep for java_protobuf_async_streaming_ping_pong_secure I don't find anything. But now that I read scenario_config.py I see that if I had searched for java_protobuf_async_streaming_ping_pong_ then I'd have found it. I see a call to …

If I have to know something out-of-band, like QPS_WORKERS, then drop me a bone if it is missing. Don't try to connect to corrupt hostname strings. In this case, I knew that some scripts used a magical environment variable, but couldn't figure out its name or format. I didn't know that particular script needed it, but if I could have found more information I would have tried it just to see if it fixed my problem. It doesn't seem like the format of QPS_WORKERS is documented anywhere (not even a comment in the code). The fact that the order matters is only clear if you read the majority of RunScenario in driver.cc. (See the sketch after this comment for the kind of up-front check I mean.)

That isn't going well, so I start at the top of the logs, see run_performance_tests, run it, and it runs things locally. Unfortunately, it is taking forever. I used …

The log output was really hard for me to get through. It was a long time before I found where a scenario started ("RUNNING SCENARIO"), because I stopped looking higher when I saw "PASSED: qps_worker_node_0". But then I couldn't find the scenario name. So I look higher and I see things like "START: qps_json_driver.quit". "Well, that certainly seems like it would be part of the previous test." But no. And I see stuff like "QPS worker "grpc-performance-server-8core:10400" is still running" as part of qps_json_driver.quit, which worries me because that contributes to the errors I'm seeing, and I also think "You had one job!" Even now I have trouble finding RUNNING SCENARIO, because it blends in with the JSON; I have to Ctrl+F to find it. Also, the RUNNING SCENARIO line comes after the command I need to run to produce that scenario. I expected the highest-level script to demarcate its work.

This log snippet is obviously supposed to deal with Java because it has java in the name, but there is no Java log output. Wat? You can see the …
I'd love not to need to, but I didn't want to run a full benchmark; I wanted to just run it for a little while to test. I just want warmup_seconds = 1 and benchmark_seconds = 1. If that were an argument, that'd be fine. Since I also hadn't figured out …
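Regarding the QPS_WORKERS point above, here is a minimal sketch of the kind of fail-fast check being asked for. It assumes the variable is a comma-separated list of host:port entries whose order matters (the authoritative parsing lives in RunScenario in driver.cc); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that validates QPS_WORKERS before anything tries to connect. */
public final class QpsWorkersCheck {
  public static List<String> requireWorkers() {
    String raw = System.getenv("QPS_WORKERS");
    if (raw == null || raw.trim().isEmpty()) {
      throw new IllegalStateException(
          "QPS_WORKERS is not set; expected a comma-separated host:port list such as "
              + "\"host1:10000,host2:10010\" (order matters, see RunScenario in driver.cc)");
    }
    List<String> workers = new ArrayList<>();
    for (String entry : raw.split(",")) {
      String hostPort = entry.trim();
      int colon = hostPort.lastIndexOf(':');
      if (colon <= 0 || colon == hostPort.length() - 1) {
        throw new IllegalArgumentException("Malformed QPS_WORKERS entry: \"" + hostPort + "\"");
      }
      workers.add(hostPort);
    }
    return workers;
  }

  private QpsWorkersCheck() {}
}
```

Failing loudly with the expected format in the error message is cheap and would avoid the "connecting to corrupt hostname strings" confusion described above.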
I have no idea why, but it seems like the log spam dropped 10x. The change I made should have only affected shutdown scenarios, not the invalid data frames. Since it isn't reproducing anymore, I am not sure what else there is to do here.
I wrote this one a while ago, to make identifying the top-level scripts easier. I agree there are a lot of scripts in tools/run_tests.py. That's …

It uses an env variable, which I think is confusing. Of course this is …

The capabilities of jobset.py are a bit limited, so I think I've implemented …
Thanks for the feedback!
We've recently enabled printing output from QPS workers in the benchmark suite, but the logs are 93 MB, which makes them completely useless.
It seems that's mostly because the Java worker is super chatty (tons of exceptions get printed to the console).
https://grpc-testing.appspot.com/view/Performance/job/gRPC_performance_experiment/3192/consoleFull
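While the underlying spam is being fixed, one way to keep a worker's console readable is to raise the logging threshold before starting it. This is only a sketch: it assumes the chatter goes through java.util.logging under the "io.grpc" logger namespace (which grpc-java uses) and at INFO/FINE level; exceptions logged at WARNING or above, or printed directly to stderr, would not be affected. The launcher class name is hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical launcher that quiets gRPC logging before starting a QPS worker. */
public final class QuietWorkerMain {
  // Keep a strong reference; java.util.logging holds loggers weakly, so the
  // level setting could otherwise be lost when the Logger is garbage collected.
  private static final Logger GRPC_LOGGER = Logger.getLogger("io.grpc");

  public static void main(String[] args) {
    GRPC_LOGGER.setLevel(Level.WARNING); // drop INFO/FINE chatter, keep real errors
    // ... start the QPS worker here ...
  }

  private QuietWorkerMain() {}
}
```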