Skip to content

Conversation

liubingxing
Copy link
Contributor

@liubingxing liubingxing commented Jan 20, 2022

JIRA: HDFS-16432

image

In our cluster, namenode block report will held write lock for a long time if the storage block number more than 100000. So we want to add a yield mechanism in block reporting process to avoid holding write lock too long.

  1. Ensure that the processing of the same block is in the same write lock.
  2. Because StorageInfo.addBlock will moves the block to the head of blockList, so we can collect blocks that have not been reported by delimiter block.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 37s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 32m 46s trunk passed
+1 💚 compile 1m 25s trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 compile 1m 25s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 1m 3s trunk passed
+1 💚 mvnsite 1m 34s trunk passed
+1 💚 javadoc 1m 4s trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 35s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 3m 12s trunk passed
+1 💚 shadedclient 23m 22s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 21s the patch passed
+1 💚 compile 1m 24s the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 javac 1m 24s the patch passed
+1 💚 compile 1m 14s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 1m 14s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 50s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 284 unchanged - 0 fixed = 286 total (was 284)
+1 💚 mvnsite 1m 18s the patch passed
+1 💚 javadoc 0m 54s the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 27s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 3m 19s the patch passed
+1 💚 shadedclient 22m 18s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 226m 22s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 45s The patch does not generate ASF License warnings.
327m 10s
Reason Tests
Failed junit tests hadoop.hdfs.TestRollingUpgrade
hadoop.tools.TestHdfsConfigFields
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3907/1/artifact/out/Dockerfile
GITHUB PR #3907
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux 5ac6200321be 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 0561a74c3f8adbb6c3df8d727f9bf3bf765ad401
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3907/1/testReport/
Max. process+thread count 3741 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3907/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 36s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 32m 32s trunk passed
+1 💚 compile 1m 26s trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 compile 1m 19s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 1m 2s trunk passed
+1 💚 mvnsite 1m 31s trunk passed
+1 💚 javadoc 1m 3s trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 33s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 3m 14s trunk passed
+1 💚 shadedclient 22m 13s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 16s the patch passed
+1 💚 compile 1m 18s the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 javac 1m 18s the patch passed
+1 💚 compile 1m 13s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 1m 13s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 52s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 284 unchanged - 0 fixed = 286 total (was 284)
+1 💚 mvnsite 1m 20s the patch passed
+1 💚 javadoc 0m 58s the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 31s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 3m 57s the patch passed
+1 💚 shadedclient 24m 2s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 225m 59s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 46s The patch does not generate ASF License warnings.
326m 37s
Reason Tests
Failed junit tests hadoop.hdfs.TestRollingUpgrade
hadoop.tools.TestHdfsConfigFields
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3907/2/artifact/out/Dockerfile
GITHUB PR #3907
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux f446d3465fe1 4.15.0-161-generic #169-Ubuntu SMP Fri Oct 15 13:41:54 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 09824a5f7cf93f8f544a0e6fdadac4ece694ec25
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3907/2/testReport/
Max. process+thread count 3802 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3907/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@liubingxing
Copy link
Contributor Author

liubingxing commented Jan 22, 2022

@tasanuma @Hexiaoqiao Please take a look.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 39s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 33m 26s trunk passed
+1 💚 compile 1m 27s trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 compile 1m 20s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 1m 0s trunk passed
+1 💚 mvnsite 1m 29s trunk passed
+1 💚 javadoc 1m 5s trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 32s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 3m 28s trunk passed
+1 💚 shadedclient 23m 45s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 23s the patch passed
+1 💚 compile 1m 23s the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 javac 1m 23s the patch passed
+1 💚 compile 1m 18s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 1m 18s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 55s the patch passed
+1 💚 mvnsite 1m 23s the patch passed
+1 💚 javadoc 0m 53s the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 25s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 3m 25s the patch passed
+1 💚 shadedclient 22m 47s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 233m 43s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 43s The patch does not generate ASF License warnings.
335m 36s
Reason Tests
Failed junit tests hadoop.tools.TestHdfsConfigFields
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3907/3/artifact/out/Dockerfile
GITHUB PR #3907
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux 89dfda1372da 4.15.0-161-generic #169-Ubuntu SMP Fri Oct 15 13:41:54 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 2367054
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3907/3/testReport/
Max. process+thread count 3928 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3907/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@jojochuang
Copy link
Contributor

Thanks for opening a PR.
However, I am not so sure if this is the right approach. The whole design namenode assumes a global namespace lock and within it, data structure access is consistent. Once you yield in the middle, there's no such guarantee and it could run into data races.

As for the block report taking time -- is it still a problem after removing the folded tree set structure (HDFS-13671)?

@liubingxing
Copy link
Contributor Author

Thanks @jojochuang for your comment. The block report taking time we reported in this PR was obtained after merging HDFS-13671. I think that the block report time will become longer after merging HDFS-13671 because FoldedTreeSet have a better performance in dealing FBR.

@sodonnel
Copy link
Contributor

sodonnel commented Feb 8, 2022

I am also concerned about yielding the lock in the middle of processing a storage. I would be interested to understand where the time is spend doing the block report processing inside the namenode, to see if we can improve the performance of the code and avoid yielding the lock.

Have you tried reproducing this problem on a small otherwise idle cluster and attaching the async profiler to the namenode process to see where the code is taking time during block reporting? I think that analysis would be valuable before we change the locking, as there may be something which can be improved around the performance.

@liubingxing
Copy link
Contributor Author

liubingxing commented Feb 9, 2022

Hi @sodonnel, I have tried reproducing this problem on a test cluster and the cpu profile of block report process look like as follow.
image
image
image

It looks like most of the time were taking by the #processReportedBlock and #moveBlockToHead, and the #moveBlockToHead is necessary after removing the folded tree set structure.
If you have any good suggestions, please let me know.

@sodonnel
Copy link
Contributor

sodonnel commented Feb 9, 2022

@liubingxing Could you attach the flame chart svg file to the Jira so I can look at it in more detail please?

@liubingxing
Copy link
Contributor Author

@sodonnel I attached the flame chart svg file to the Jira.

@yuanboliu
Copy link
Member

@liubingxing sorry to interrupt, Have you ever tried this patch in your production clusters?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants