-
Notifications
You must be signed in to change notification settings - Fork 14.7k
KAFKA-4049: Fix transient failure in RegexSourceIntegrationTest #1746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-4049: Fix transient failure in RegexSourceIntegrationTest #1746
Conversation
ping @dguy @enothereska @mjsax @ijuma @ewencp for reviews. |
} | ||
|
||
if (!testCondition.conditionMet()) { | ||
if (!conditionMet) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change really required? It seems like pretty much any test that requires it is going to be fragile as it implies that the condition may only be met for < 100ms. Normally I would expect anything using waitForCondition
to expect that condition to switch to true and stay in that state until the test returns from waitForCondition
and takes further action such that the condition could change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
If we did hit this problem, then perhaps we should throw an exception if the condition changes between both checks with a message about fixing the condition to make the tests less fragile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm convinced, will change it back.
Jenkins failed on |
Discussed with @hachikuji about this issue, and it may be the result of KAFKA-3949, but it is a bit hard to confirm as I have not been able to re-produce locally. So I'll hold on this PR until that issue is fixed to see if it does resolve the transient failure. |
Waiting for consecutive jenkins builds on #1762 now. |
…into K4049-RegexSourceIntegrationTest-failure
#1762 seems not the root cause of this failure, will continue investigating in this PR. |
Triggered Jenkins build 5181 - 5185 for this commit. |
Apache Jenkins builds are not from expected commit, using https://jenkins.confluent.io/job/kafka-branch-builder/ instead, with consecutive 74 - 78 |
Consecutive tests in https://jenkins.confluent.io/job/kafka-branch-builder/ from 78 to 82. @ijuma mind taking a look again? |
public boolean conditionMet() { | ||
List<String> assignedTopics = testStreamThread.assignedTopicPartitions.get(SECOND_UPDATE); | ||
return assignedTopics != null && !assignedTopics.contains("TEST-TOPIC-A") && assignedTopics.contains("TEST-TOPIC-B"); | ||
System.out.println("deleted: " + testStreamThread.assignedTopicPartitions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was probably left by mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm aware of the changes and I intended to delete them before merging after your reviews. I should be more explicit on these two prints.
@guozhangwang it's worth noting that this test never failed in Confluent Jenkins (as far as I could see), only Apache Jenkins. Apache Jenkins runs the tests with fork=1 while Confluent uses a higher fork count. Usually this makes Apache Jenkins more stable, but it may be having the opposite effect in this case (or there's another reason why the test fails in Apache Jenkins). If you want to be confident that this change fixes the issue, we'd want to run it in Apache Jenkins. |
This failure became quite rare at some point, last failure in trunk jdk7 build was a while back. Since this PR seems like an improvement anyway, LGTM (assuming the tests pass). If the test fails again, we can reopen the JIRA. |
Transient failure in Jenkins is not relevant; created https://issues.apache.org/jira/browse/KAFKA-4075 for it. |
Thanks @ijuma , merged to trunk. |
No description provided.