Conversation

virajjasani
Contributor

No description provided.

@virajjasani force-pushed the HADOOP-17115-trunk branch from c6b6e5b to 0db00ee on May 6, 2021 18:08
@virajjasani
Contributor Author

@aajisaka you may want to take a look while QA is in progress (the QA here is a full build, hence it will take 17+ hours for sure).

@virajjasani
Contributor Author

virajjasani commented May 8, 2021

So far only one build has been able to successfully post QA results back to the PR; the rest have been aborted (I think this is expected when a full build is run).
However, I have taken screenshots of the QA results of many of the aborted builds (results that were not posted on this PR):

[Nine screenshots of QA results from aborted builds, captured May 7–9, 2021]



@busbey left a comment


Looking good so far. A few minor issues noted below.

I'd be more comfortable with this patch if we got solid runs out of QA, especially since there's a banned-import rule added. I think to do that we'll need this broken up into multiple PRs. It'd be best to add the import bans for the parts of the project that have been converted as we go:

  • first the utility definition and changes to hadoop-common and hadoop-tools
  • dependent on the first PR changes to yarn
  • dependent on the first PR changes to HDFS
  • dependent on the first PR changes to mapreduce
  • after the rest, add the top-level banned import rule.

This also minimizes the amount of churn if a nightly build fails and we need to back out a particular PR from one or more branches.

Contributor

Can we put this in org.apache.hadoop.util instead of referencing a thirdparty in the package? Doing so also avoids us needing a package-info.java for the new package.

Contributor Author

Sounds good.

Contributor

I prefer to keep the nonguava package so that we have a clear view of what has been implemented to replace Guava. Otherwise, it will not be straightforward to see which classes originally came from hadoop.util and which were added recently.
Later, when the move away from Guava is complete, we can move those classes back into util if necessary.

avoids us needing a package-info.java for the new package.

package-info is not a big deal to add.
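
For what it's worth, a package-info.java for a new package really is small. A hedged sketch, assuming a hypothetical org.apache.hadoop.util.nonguava package and Hadoop's usual audience/stability annotations:

```java
// Minimal package-info.java sketch; the package name below is hypothetical,
// and the annotations follow Hadoop's common convention for internal packages.
@InterfaceAudience.Private
@InterfaceStability.Unstable
package org.apache.hadoop.util.nonguava;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
```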

Contributor

There are already classes in o.a.hadoop.util with an origin in Guava (e.g., LimitInputStream), so why the distinction?

@virajjasani left a comment

Sure, let me restrict this PR to hadoop-common and hadoop-tools only; once it is merged, I will create the dependent PRs.

Contributor Author

Sounds good.

@amahussein
Contributor

Sure, let me restrict this PR to hadoop-common and hadoop-tools only; once it is merged, I will create the dependent PRs.

@virajjasani thanks for the changes.

I agree with @busbey that the PR should be split into modules.
However, since you already have an idea of what wrappers need to be implemented to replace the Guava Sets, it may be better to create a PR only for the wrappers (including TreeSets); then, once that code is merged, you can replace the Guava Sets usage in every module.

I am not quite sure I understand why Sets.newHashSet(E element...) should be used. This is coming from Guava.
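
For reference, a minimal sketch of what such a Guava-mirroring varargs factory looks like (an illustration, not the exact patch code; the class shape and the capacity hint are assumptions):

```java
import java.util.Collections;
import java.util.HashSet;

public final class Sets {
  private Sets() {
    // static-utility class; no instances
  }

  // Mirrors Guava's Sets.newHashSet(E...) so existing call sites such as
  // Sets.newHashSet("a", "b", "c") can switch with only an import change.
  public static <E> HashSet<E> newHashSet(E... elements) {
    HashSet<E> set = new HashSet<>(elements.length); // simple capacity hint
    Collections.addAll(set, elements);
    return set;
  }
}
```

Keeping the varargs signature is what makes the replacement a drop-in: callers keep the same call shape and only change the import. (As written, javac flags the generic varargs with a heap-pollution warning; that exact warning comes up later in this thread.)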

@amahussein
Contributor

So far only one build has been able to successfully post QA results back to the PR; the rest have been aborted (I think this is expected when a full build is run).
However, I have taken screenshots of the QA results of many of the aborted builds (results that were not posted on this PR):

@virajjasani Thanks for making sure there is a record of all the unit test results. I can imagine how much time and effort it took to take the snapshots and upload them.
Next time, can you please use text format (i.e., copy and paste)? That will make it possible to search for a specific class in those test results.

@virajjasani
Contributor Author

@busbey There is one problem here, though. Adding maven-enforcer-plugin to ban imports is somehow not working at the top-level module (e.g. hadoop-common-project); it works only in sub-modules (e.g. hadoop-common). Hence, we would have to add it to all sub-modules (hadoop-common, hadoop-auth, hadoop-annotations, etc.) under the hadoop-common project.
Do you have any recommendation here? Perhaps we can add it to all sub-modules for now, then remove it from all of them in the final PR and keep it only in the central project pom?

@virajjasani force-pushed the HADOOP-17115-trunk branch from 0db00ee to 5cfb6e8 on May 10, 2021 17:58
@virajjasani changed the title from "HADOOP-17115. Replace Guava Sets usage by Hadoop's own Sets" to "HADOOP-17115. Replace Guava Sets usage by Hadoop's own Sets in hadoop-common and hadoop-tools" on May 10, 2021
@virajjasani
Contributor Author

However, since you already have an idea of what wrappers need to be implemented to replace the Guava Sets, it may be better to create a PR only for the wrappers (including TreeSets); then, once that code is merged, you can replace the Guava Sets usage in every module.

I just saw this comment when the page refreshed. I think this idea also looks nice and clean, but I believe covering hadoop-common and hadoop-tools in the initial change is also clean, and it gives us a few usages in place for the first commit. WDYT?

@virajjasani force-pushed the HADOOP-17115-trunk branch from 5cfb6e8 to ce13623 on May 10, 2021 18:22
@virajjasani requested a review from busbey on May 10, 2021 18:23
@busbey
Contributor

busbey commented May 10, 2021

There is one problem here, though. Adding maven-enforcer-plugin to ban imports is somehow not working at the top-level module (e.g. hadoop-common-project); it works only in sub-modules (e.g. hadoop-common). Hence, we would have to add it to all sub-modules (hadoop-common, hadoop-auth, hadoop-annotations, etc.) under the hadoop-common project.
Do you have any recommendation here? Perhaps we can add it to all sub-modules for now, then remove it from all of them in the final PR and keep it only in the central project pom?

Describe the failure? Is it not complaining when a banned import is present?

@virajjasani
Contributor Author

virajjasani commented May 10, 2021

Describe the failure? Is it not complaining when a banned import is present?

That's correct. It doesn't complain. We can proceed with either of these options:

  1. Keep the plugin present in the affected sub-modules (as is the current state of this PR), or
  2. Do not use the plugin for now, and add it in the last PR, whose only change would be to update the main pom.


@virajjasani
Contributor Author

virajjasani commented May 12, 2021

@busbey I guess we can't do much about this Javac warning:

hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Sets.java:80:47:[unchecked] Possible heap pollution from parameterized vararg type E

@busbey left a comment

I'm going to see if I can reproduce the enforcer failure and see what's up. Fix for the Javac warnings below.
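
Presumably the fix is the standard one for a generic varargs method: annotating it with @SafeVarargs (valid on static methods), which asserts that the method does nothing unsafe with the E... array and silences the heap-pollution warning at both the declaration and call sites. A sketch under that assumption, continuing the wrapper shape sketched earlier:

```java
import java.util.Collections;
import java.util.HashSet;

public final class Sets {
  private Sets() {
    // static-utility class; no instances
  }

  // @SafeVarargs is safe here: the method never writes into the varargs
  // array or exposes it as Object[]; it only reads the elements.
  @SafeVarargs
  public static <E> HashSet<E> newHashSet(E... elements) {
    HashSet<E> set = new HashSet<>(elements.length);
    Collections.addAll(set, elements);
    return set;
  }
}
```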

@virajjasani force-pushed the HADOOP-17115-trunk branch from ce13623 to a314771 on May 12, 2021 19:13

@virajjasani force-pushed the HADOOP-17115-trunk branch from a314771 to 92a7595 on May 14, 2021 13:40

@virajjasani force-pushed the HADOOP-17115-trunk branch from 92a7595 to 0005b92 on May 14, 2021 20:02

@busbey left a comment

Describe the failure? Is it not complaining when a banned import is present?
That's correct. It doesn't complain. We can proceed with either of these options:

  • Keep the plugin present in affected sub-modules (as is the current state of this PR), or
  • Do not use plugin for now and use it in the last PR with the only change to update main pom.

I had things work fine for me by adding the plugin to the build section of the hadoop-project pom. That's where we should eventually put the definition. Until all modules are ready for it, though, we either need to target specific modules like you've got here, or put it into a profile that modules can opt in to or out of. Requiring opt-out would let us mark the modules we know will be done as follow-ups to this PR, and would keep folks from introducing a new use in some module that currently doesn't have any. I'm fine with either approach; I think the opt-out provides better coverage, but the current per-module definition is easier for folks to reason about if they're not already familiar with the behavior of maven profiles.
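
For illustration, the banned-import configuration being discussed might look roughly like the sketch below. This assumes the de.skuzzle restrict-imports-enforcer-rule extension (maven-enforcer's built-in rules do not ban imports by themselves); the version and the exact banned names are assumptions, not taken from this PR:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <dependencies>
    <!-- third-party extension supplying the RestrictImports rule -->
    <dependency>
      <groupId>de.skuzzle.enforcer</groupId>
      <artifactId>restrict-imports-enforcer-rule</artifactId>
      <version>2.1.0</version><!-- version is an assumption -->
    </dependency>
  </dependencies>
  <executions>
    <execution>
      <id>banned-illegal-imports</id>
      <phase>process-sources</phase>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <restrictImports implementation="de.skuzzle.enforcer.restrictimports.rule.RestrictImports">
            <includeTestCode>true</includeTestCode>
            <reason>Use Hadoop's own Sets instead of Guava's Sets</reason>
            <bannedImports>
              <bannedImport>org.apache.hadoop.thirdparty.com.google.common.collect.Sets</bannedImport>
              <bannedImport>com.google.common.collect.Sets</bannedImport>
            </bannedImports>
          </restrictImports>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Hoisting this block into a profile in the parent pom would let modules opt out while they are still being converted; the trade-off is the reduced readability of profiles noted above.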

I think this is close to landing. I have one correctness comment below, a minor license tracking concern, and some javadoc cleanup.

@amahussein please follow up here if there's anything you feel strongly about in reviewing that isn't covered in the current patch + feedback.

@virajjasani left a comment

Thanks for the review. I've addressed all the concerns and haven't squashed the commits, so the new changes are easy to identify.

I think the opt-out provides better coverage, but the current per-module definition is easier for folks to reason about if they're not already familiar with the behavior of maven profiles.

True. I think we can continue with the current behaviour as is, and as part of the last PR, remove it from all sub-modules and keep it only in the high-level pom.

@virajjasani force-pushed the HADOOP-17115-trunk branch from 7602bb6 to 7c63f9a on May 16, 2021 12:32

@virajjasani
Contributor Author

Thanks @busbey, let me squash all the commits into a single one.

@virajjasani force-pushed the HADOOP-17115-trunk branch from 67685bd to c5ccb5c on May 18, 2021 13:51
@busbey
Contributor

busbey commented May 18, 2021

Next time I'd recommend just having the committer take care of squashing. That'll save waiting to get QA results on the new commit ref.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 36s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 1s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 8 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 16m 10s Maven dependency ordering for branch
+1 💚 mvninstall 20m 21s trunk passed
+1 💚 compile 23m 33s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 compile 18m 37s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 checkstyle 3m 46s trunk passed
+1 💚 mvnsite 5m 31s trunk passed
+1 💚 javadoc 4m 29s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 5m 2s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 7m 41s trunk passed
+1 💚 shadedclient 14m 42s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 27s Maven dependency ordering for patch
+1 💚 mvninstall 3m 14s the patch passed
+1 💚 compile 20m 5s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javac 20m 5s the patch passed
+1 💚 compile 18m 4s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 javac 18m 4s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 3m 45s root: The patch generated 0 new + 81 unchanged - 1 fixed = 81 total (was 82)
+1 💚 mvnsite 5m 29s the patch passed
+1 💚 xml 0m 7s The patch has no ill-formed XML file.
+1 💚 javadoc 4m 30s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 5m 1s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 💚 spotbugs 8m 48s the patch passed
+1 💚 shadedclient 14m 48s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 16m 59s hadoop-common in the patch passed.
+1 💚 unit 3m 39s hadoop-kms in the patch passed.
+1 💚 unit 19m 13s hadoop-distcp in the patch passed.
+1 💚 unit 11m 43s hadoop-dynamometer-infra in the patch passed.
+1 💚 unit 13m 3s hadoop-dynamometer in the patch passed.
+1 💚 unit 2m 13s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 58s The patch does not generate ASF License warnings.
276m 37s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2985/39/artifact/out/Dockerfile
GITHUB PR #2985
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell xml spotbugs checkstyle
uname Linux 5494fc3cda95 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / c5ccb5c
Default Java Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2985/39/testReport/
Max. process+thread count 1251 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-common-project/hadoop-kms hadoop-tools/hadoop-distcp hadoop-tools/hadoop-dynamometer/hadoop-dynamometer-infra hadoop-tools/hadoop-dynamometer hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2985/39/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@busbey merged commit e4062ad into apache:trunk on May 20, 2021
@virajjasani deleted the HADOOP-17115-trunk branch on May 20, 2021 18:16
kiran-maturi pushed a commit to kiran-maturi/hadoop that referenced this pull request Nov 24, 2021
DremioQA pushed a commit to dremio/hadoop that referenced this pull request Jun 14, 2023
… own Sets in hadoop-common and hadoop-tools (apache#2985)

Signed-off-by: Sean Busbey <[email protected]>
(cherry picked from commit e4062ad)

Change-Id: Id7c3b050d4179fea9b6d5cd904f9b680d55eab3e
DremioQA pushed a commit to dremio/hadoop that referenced this pull request Feb 2, 2024
… own Sets in hadoop-common and hadoop-tools (apache#2985)

Signed-off-by: Sean Busbey <[email protected]>
(cherry picked from commit e4062ad)

Change-Id: Id7c3b050d4179fea9b6d5cd904f9b680d55eab3e
DremioQA pushed a commit to dremio/hadoop that referenced this pull request Apr 5, 2024
This list captures the current state of non-upstream changes in our branch
that are not in the public repo.

---Changes cherry-picked to branch-3.3.6-dremio from branch-3.3.2-dremio---
The below changes were on branch-3.3.2-dremio and needed to be brought to
branch-3.3.6-dremio to prevent regressing scenarios these changes addressed.

HADOOP-18928: S3AFileSystem URL encodes twice where Path has trailing / (proposed)
DX-69726: Bumping okie from 1.6.0 to 3.4.0 (CVE-2023-3635)
DX-69726: Bumping okie from 1.6.0 to 3.4.0 (CVE-2023-3635)
DX-66470: Allow for custom shared key signer for ABFS
DX-66673: Backport HADOOP-18602. Remove netty3 dependency
DX-66673: Backport MAPREDUCE-7434. Fix ShuffleHandler tests. Contributed by Tamas Domok
DX-66673: Backport MAPREDUCE-7431. ShuffleHandler refactor and fix after Netty4 upgrade. (apache#5311)
DX-66673: Backport HADOOP-15327. Upgrade MR ShuffleHandler to use Netty4 apache#3259. Contributed by Szilard Nemeth.
DX-66673: Backport HADOOP-17115. Replace Guava Sets usage by Hadoop's own Sets in hadoop-common and hadoop-tools (apache#2985)
HADOOP-18676. jettison dependency override in hadoop-common lib
DX-52816: Downgrade azure-data-lake-store-sdk to 2.3.3 to support dremio version.
DX-52701: Remove node based module by Naveen Kumar
DX-32012: Adding BatchList Iterator for ListFiles by “ajmeera.nagaraju”
DX-18552: Make file status check optional in S3AFileSystem create()
Add flag to skip native tests by Laurent Goujon
DX-21904: Support S3 requester-pays headers by Brandon Huang
DX-21471: Fix checking of use of OAuth credentials with AzureNativeFileSystem
DX-19314: make new kms format configurable
DX-17058 Add FileSystem to META-INF/services
DX-17317 Fix incorrect parameter passed into AzureADAuthenticator-getTokenUsingClientCreds by TiffanyLam
DX-17276 Azure AD support for StorageV1 by James Duong
DX-17276 Add Azure AD support in Dremio's hadoop-azure library for Storage V1 support
unwraps BindException in HttpServer2

---Changes picked up by moving to 3.3.6---
The below changes were changes on branch-3.3.2-dremio that did not need to
come to branch-3.3.6-dremio as the public 3.3.6 branch contained the fixes
already.

DX-67500: Backport HADOOP-18136. Verify FileUtils.unTar() handling of missing .tar files.
DX-66673: Backport HADOOP-18079. Upgrade Netty to 4.1.77. (apache#3977)
DX-66673: Backport HADOOP-11245. Update NFS gateway to use Netty4 (apache#2832) (apache#4997)
DX-64051: Bump jettison from 1.1 to 1.5.4 in hadoop/branch-3.3.2-dremio
DX-64051: Bump jettison from 1.1 to 1.5.4 in hadoop/branch-3.3.2-dremio
DX-63800 Bump commons-net from 3.6 to 3.9.0 to address CVE-2021-37533
DX-27168: removing org.codehaus.jackson

Change-Id: I6cdb968e33826105caff96e1c3d2c6313a550689