HDFS-17599. Fix the mismatch between locations and indices for mover #6979
Description of PR
JIRA: HDFS-16557.
We set the EC policy to (6+3) and also have nodes in the ENTERING_MAINTENANCE state. When we moved the data of some directories from SSD to HDD, some block moves failed because the disk was full, as shown in the figure below (blk_-9223372033441574269).
We tried the move again and got the error "Replica does not exist". The fsck output shows that the mover looked up the wrong block ID (blk_-9223372033441574270) when moving the block.
Mover Logs:

FSCK Info:

Root Cause:
Similar to HDFS-16333: when the mover is initialized, only LIVE nodes are processed. As a result, a datanode in the ENTERING_MAINTENANCE state is filtered out of the locations when DBlockStriped is initialized, but the indices are not adapted, so the lengths of the locations and the indices no longer match. The EC block then computes the wrong block ID when getting an internal block (see DBlockStriped#getInternalBlock).
Solution:
When initializing DBlockStriped, if any location is filtered out, we need to remove the corresponding element from the indices so that the two stay aligned.
How was this patch tested?
Pass the unit test.
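The root cause and fix described above can be illustrated with a minimal, self-contained sketch. It is not the patch itself: the class StripedIndexSketch and its methods filterIndices and internalBlockId are hypothetical names, the boolean array stands in for the LIVE check on each location, and the sketch assumes that an internal block ID is the block group ID plus the block's index in the group. The block group ID and the position of the filtered node are also assumptions, chosen so that the arithmetic lands on the two block IDs quoted in the description.

```java
import java.util.Arrays;

class StripedIndexSketch {
    /** Keep only the indices whose location at the same position passed the filter. */
    static byte[] filterIndices(boolean[] live, byte[] indices) {
        byte[] kept = new byte[indices.length];
        int n = 0;
        for (int i = 0; i < indices.length; i++) {
            if (live[i]) {
                kept[n++] = indices[i];
            }
        }
        return Arrays.copyOf(kept, n);
    }

    /** Assumed rule: internal block ID = block group ID + index within the group. */
    static long internalBlockId(long groupId, byte[] indices, int posInLocations) {
        return groupId + indices[posInLocations];
    }

    public static void main(String[] args) {
        long groupId = -9223372033441574272L;       // assumed block group ID
        boolean[] live = {true, true, false, true}; // third node is not LIVE (assumed)
        byte[] indices = {0, 1, 2, 3};              // one index per original location

        // After filtering, the node holding index 3 sits at position 2 in locations.
        int pos = 2;

        // Stale indices: position 2 still maps to index 2, which is the wrong block.
        System.out.println("stale:   blk_" + internalBlockId(groupId, indices, pos));

        // Adapted indices: position 2 maps to index 3, which is the intended block.
        byte[] adapted = filterIndices(live, indices);
        System.out.println("adapted: blk_" + internalBlockId(groupId, adapted, pos));
    }
}
```

Filtering the indices in the same pass as the locations keeps the two arrays the same length, so a position in one always refers to the same internal block in the other.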