Skip to content

Conversation

hzh0425
Copy link
Member

@hzh0425 hzh0425 commented Jun 11, 2022

Make sure set the target branch to develop

What is the purpose of the change

tracking issue: #4330

Brief changelog

Use confirm offset in ReputMessageService

Verifying this change

XXXX

Follow this checklist to help us incorporate your contribution quickly and easily. Notice, it would be helpful if you could finish the following 5 checklist(the last one is not necessary)before request the community to review your PR.

  • Make sure there is a Github issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a Github issue. Your pull request should address just this issue, without pulling in other changes - one PR resolves one issue.
  • Format the pull request title like [ISSUE #123] Fix UnknownException when host config not exist. Each commit in the pull request should have a meaningful subject line and body.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Write necessary unit-test(over 80% coverage) to verify your logic correction, more mock a little better when cross module dependency exist. If the new feature or significant change is committed, please remember to add integration-test in test module.
  • Run mvn -B clean apache-rat:check findbugs:findbugs checkstyle:checkstyle to make sure basic checks pass. Run mvn clean install -DskipITs to make sure unit-test pass. Run mvn clean test-compile failsafe:integration-test to make sure integration-test pass.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

Copy link
Contributor

@RongtongJin RongtongJin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

在doReut方法中当已controller方式启动时没有开启ConfirmOffset的能力

Comment on lines 223 to 234
if (this.defaultMessageStore.getMessageStoreConfig().getBrokerRole() == BrokerRole.SYNC_MASTER) {
final Set<String> currentSyncStateSet = getSyncStateSet();
long confirmOffset = this.defaultMessageStore.getMaxPhyOffset();
for (HAConnection connection : this.connectionList) {
if (currentSyncStateSet.contains(connection.getClientAddress())) {
confirmOffset = Math.min(confirmOffset, connection.getSlaveAckOffset());
}
}
return confirmOffset;
} else {
if (this.haClient != null) {
return this.haClient.getConfirmOffset();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里可以做一个优化。。把confirmOffset记录下来,利用空间换时间,再每次slave ack的时候更新。。不然每次reput一条消息都会调用一下,比较消耗资源。

@hzh0425
Copy link
Member Author

hzh0425 commented Jun 11, 2022

问题场景: 网络分区下, master 与 slave, master 与 controller 都分区了, 但是没有和 client 分区. client 继续往 master 发送消息. 虽然在 allAckSyncStateSet = true 下, 不会发送成功, 但是这些消息仍然会被 reput by reputMessageService.
后续切换 master 的时候, 就会出现消息丢失的现象.
解决办法就是让 reputMessageService 利用 autoSwitchHaSerivce 中的 ConfirmOffset 作为 reput 边界, 这样当消息发送失败时 (allAckSyncStateSet = true), confirmOffset 不会增大, reputMessageService 就不会 reput 这些失败的消息.

@codecov-commenter
Copy link

codecov-commenter commented Jun 11, 2022

Codecov Report

Merging #4449 (679d268) into 5.0.0-beta-dledger-controller (81dc938) will increase coverage by 0.10%.
The diff coverage is 100.00%.

@@                         Coverage Diff                         @@
##             5.0.0-beta-dledger-controller    #4449      +/-   ##
===================================================================
+ Coverage                            43.17%   43.28%   +0.10%     
- Complexity                            6343     6369      +26     
===================================================================
  Files                                  841      841              
  Lines                                59824    59857      +33     
  Branches                              8152     8161       +9     
===================================================================
+ Hits                                 25830    25909      +79     
+ Misses                               30643    30594      -49     
- Partials                              3351     3354       +3     
Impacted Files Coverage Δ
...org/apache/rocketmq/store/DefaultMessageStore.java 55.85% <100.00%> (+1.97%) ⬆️
...cketmq/store/ha/autoswitch/AutoSwitchHAClient.java 75.93% <100.00%> (+0.46%) ⬆️
...mq/store/ha/autoswitch/AutoSwitchHAConnection.java 73.51% <100.00%> (-0.20%) ⬇️
...ketmq/store/ha/autoswitch/AutoSwitchHAService.java 65.06% <100.00%> (+7.82%) ⬆️
...org/apache/rocketmq/common/stats/StatsItemSet.java 41.79% <0.00%> (-8.96%) ⬇️
...ache/rocketmq/common/stats/MomentStatsItemSet.java 39.13% <0.00%> (-4.35%) ⬇️
...rocketmq/common/message/MessageClientIDSetter.java 71.42% <0.00%> (-2.39%) ⬇️
...va/org/apache/rocketmq/logging/inner/Appender.java 30.33% <0.00%> (-2.25%) ⬇️
...q/controller/impl/manager/ReplicasInfoManager.java 63.22% <0.00%> (-1.29%) ⬇️
...a/org/apache/rocketmq/store/StoreStatsService.java 49.17% <0.00%> (-1.11%) ⬇️
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 81dc938...679d268. Read the comment docs.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.02%) to 47.298% when pulling 5882be7 on hzh0425:feature/use-confirmOffset into 6645d0a on apache:5.0.0-beta-dledger-controller.

@coveralls
Copy link

coveralls commented Jun 11, 2022

Coverage Status

Coverage increased (+0.2%) to 47.503% when pulling 679d268 on hzh0425:feature/use-confirmOffset into 6645d0a on apache:5.0.0-beta-dledger-controller.

@RongtongJin RongtongJin merged commit f038cf9 into apache:5.0.0-beta-dledger-controller Jun 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants