Conversation

Contributor

@cnauroth cnauroth commented Aug 13, 2025

Description of PR

Add native support for GCS connector

How was this patch tested?

The new module contains integration tests that we've run against a live GCS bucket.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 3s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 21 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 10m 23s Maven dependency ordering for branch
+1 💚 mvninstall 19m 13s trunk passed
+1 💚 compile 8m 20s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 compile 7m 31s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 1m 59s trunk passed
+1 💚 mvnsite 1m 26s trunk passed
+1 💚 javadoc 2m 10s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 50s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+0 🆗 spotbugs 0m 32s branch/hadoop-project no spotbugs output file (spotbugsXml.xml)
+1 💚 shadedclient 24m 11s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 24m 27s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 2m 53s Maven dependency ordering for patch
+1 💚 mvninstall 11m 44s the patch passed
+1 💚 compile 8m 5s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javac 8m 5s the patch passed
+1 💚 compile 7m 19s the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 javac 7m 19s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 59s the patch passed
+1 💚 mvnsite 1m 50s the patch passed
+1 💚 javadoc 2m 34s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 2m 18s the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+0 🆗 spotbugs 0m 27s hadoop-project has no data from spotbugs
+1 💚 shadedclient 24m 25s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 0m 25s hadoop-project in the patch passed.
+1 💚 unit 0m 31s hadoop-gcp in the patch passed.
+1 💚 unit 71m 26s hadoop-tools in the patch passed.
+1 💚 asflicense 0m 46s The patch does not generate ASF License warnings.
221m 1s
Subsystem Report/Notes
Docker ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7869/1/artifact/out/Dockerfile
GITHUB PR #7869
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle markdownlint
uname Linux c884f9f4e391 5.15.0-143-generic #153-Ubuntu SMP Fri Jun 13 19:10:45 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / faff8a4
Default Java Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7869/1/testReport/
Max. process+thread count 1028 (vs. ulimit of 5500)
modules C: hadoop-project hadoop-tools/hadoop-gcp hadoop-tools U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7869/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

<scope>test</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
Contributor

We need to remove the JUnit4 dependency.

Contributor Author

Great point! I sent up #7872 for this.

@@ -108,7 +108,7 @@
 <findbugs.version>3.0.5</findbugs.version>
 <dnsjava.version>3.6.1</dnsjava.version>

-<guava.version>27.0-jre</guava.version>
+<guava.version>33.1.0-jre</guava.version>
Contributor

If we update the version of the JAR package, the LICENSE-binary file should also be updated accordingly.

Member

Searching for "import com.google.common" returns no results, so I'm wondering how upgrading Guava is related to this PR.

No offense, but given the many painful experiences that Hadoop ecosystem projects have had with Guava, I think we should be careful about introducing new components that depend heavily on Guava, especially one that requires a specific version of Guava.

Contributor Author

Guava is a dependency of the GCS SDK. Without this change, there is a dependency convergence problem:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.5.0:enforce (depcheck) on project hadoop-gcp: 
[ERROR] Rule 0: org.apache.maven.enforcer.rules.dependency.DependencyConvergence failed with message:
[ERROR] Failed while enforcing releasability.
[ERROR] 
[ERROR] Dependency convergence error for org.codehaus.mojo:animal-sniffer-annotations:jar:1.17 paths to dependency are:
[ERROR] +-org.apache.hadoop:hadoop-gcp:jar:3.5.0-SNAPSHOT
[ERROR]   +-com.google.cloud:google-cloud-storage:jar:2.52.0:compile
[ERROR]     +-com.google.guava:guava:jar:27.0-jre:compile
[ERROR]       +-org.codehaus.mojo:animal-sniffer-annotations:jar:1.17:compile
[ERROR] and
[ERROR] +-org.apache.hadoop:hadoop-gcp:jar:3.5.0-SNAPSHOT
[ERROR]   +-com.google.cloud:google-cloud-storage:jar:2.52.0:compile
[ERROR]     +-org.codehaus.mojo:animal-sniffer-annotations:jar:1.24:compile

However, we don't necessarily need to upgrade it project-wide. I sent up #7883 to revert this change in hadoop-project/pom.xml and handle the versioning needs entirely within hadoop-gcp/pom.xml.
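
For readers unfamiliar with this kind of failure: one general Maven technique for a convergence error is to pin the conflicting transitive artifact in the module's own dependencyManagement section, so that every path resolves to a single version. The fragment below is only a sketch of that technique. It is not taken from #7883, and the choice to pin animal-sniffer-annotations at 1.24 is an assumption based solely on the enforcer output above; other artifacts might need the same treatment before the rule passes.

```xml
<!-- Hypothetical fragment for hadoop-tools/hadoop-gcp/pom.xml; not the actual content of #7883. -->
<dependencyManagement>
  <dependencies>
    <!-- Assumption: force the single version that google-cloud-storage requests directly,
         so the 1.17 path through guava 27.0-jre no longer diverges. -->
    <dependency>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>animal-sniffer-annotations</artifactId>
      <version>1.24</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Running mvn dependency:tree on the module afterwards is a reasonable way to check which version each path resolves to.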

Contributor Author

I ended up needing to revert #7883, so as it stands in the feature branch right now, it still has the Guava upgrade in hadoop-project/pom.xml. Details here:

#7883 (comment)

arunkumarchacko and others added 17 commits August 29, 2025 16:36
Closes apache#7761

Co-authored-by: Chris Nauroth <[email protected]>
Signed-off-by: Chris Nauroth <[email protected]>
Closes apache#7779

Co-authored-by: Chris Nauroth <[email protected]>
Signed-off-by: Chris Nauroth <[email protected]>
Closes apache#7877

Signed-off-by: Ayush Saxena <[email protected]>
Reviewed-by: Arunkumar Chacko <[email protected]>
Closes apache#7874

Signed-off-by: Steve Loughran <[email protected]>
Reviewed-by: Arunkumar Chacko <[email protected]>
Reviewed-by: Cheng Pan <[email protected]>
… and mark exclusion in hadoop-tools-dist.

Closes apache#7904

Signed-off-by: Shilun Fan <[email protected]>
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 20s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+0 🆗 shelldocs 0m 0s Shelldocs was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 22 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 9m 44s Maven dependency ordering for branch
+1 💚 mvninstall 19m 11s trunk passed
+1 💚 compile 8m 26s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 compile 7m 24s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 2m 1s trunk passed
+1 💚 mvnsite 15m 50s trunk passed
+1 💚 javadoc 5m 30s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 5m 1s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+0 🆗 spotbugs 0m 15s branch/hadoop-project no spotbugs output file (spotbugsXml.xml)
+0 🆗 spotbugs 0m 16s branch/hadoop-tools/hadoop-tools-dist no spotbugs output file (spotbugsXml.xml)
+0 🆗 spotbugs 0m 15s branch/hadoop-cloud-storage-project/hadoop-cloud-storage no spotbugs output file (spotbugsXml.xml)
+1 💚 shadedclient 38m 27s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 38m 40s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 3m 4s Maven dependency ordering for patch
+1 💚 mvninstall 35m 53s the patch passed
+1 💚 compile 8m 14s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javac 8m 14s the patch passed
-1 ❌ compile 7m 56s /patch-compile-root-jdkPrivateBuild-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.txt root in the patch failed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.
-1 ❌ javac 7m 56s /patch-compile-root-jdkPrivateBuild-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.txt root in the patch failed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 56s the patch passed
+1 💚 mvnsite 12m 1s the patch passed
+1 💚 shellcheck 0m 0s No new issues.
+1 💚 javadoc 5m 31s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 5m 7s the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+0 🆗 spotbugs 0m 15s hadoop-project has no data from spotbugs
+0 🆗 spotbugs 0m 20s hadoop-cloud-storage-project/hadoop-cloud-storage has no data from spotbugs
+0 🆗 spotbugs 0m 21s hadoop-tools/hadoop-tools-dist has no data from spotbugs
+1 💚 shadedclient 20m 31s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 631m 7s /patch-unit-root.txt root in the patch passed.
+1 💚 asflicense 1m 7s The patch does not generate ASF License warnings.
877m 56s
Reason Tests
Failed junit tests hadoop.yarn.server.router.subcluster.fair.TestYarnFederationWithFairScheduler
hadoop.yarn.server.router.webapp.TestFederationWebApp
hadoop.yarn.server.router.webapp.TestRouterWebServicesREST
hadoop.mapreduce.v2.TestUberAM
hadoop.yarn.sls.appmaster.TestAMSimulator
Subsystem Report/Notes
Docker ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7869/2/artifact/out/Dockerfile
GITHUB PR #7869
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle markdownlint shellcheck shelldocs
uname Linux f5f8b991a287 5.15.0-143-generic #153-Ubuntu SMP Fri Jun 13 19:10:45 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / ba3a887
Default Java Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7869/2/testReport/
Max. process+thread count 4266 (vs. ulimit of 5500)
modules C: hadoop-project hadoop-common-project/hadoop-common hadoop-tools/hadoop-gcp hadoop-tools . hadoop-cloud-storage-project/hadoop-cloud-storage hadoop-tools/hadoop-tools-dist U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7869/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2 shellcheck=0.7.0
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@cnauroth
Contributor Author

cnauroth commented Sep 3, 2025

Something odd happened in the last pre-commit run. I re-triggered it manually:

https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7869/4/

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 36s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+0 🆗 shelldocs 0m 1s Shelldocs was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 22 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 8m 37s Maven dependency ordering for branch
+1 💚 mvninstall 19m 29s trunk passed
+1 💚 compile 8m 24s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 compile 7m 17s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 3m 9s trunk passed
+1 💚 mvnsite 12m 42s trunk passed
+1 💚 javadoc 5m 27s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 5m 3s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+0 🆗 spotbugs 0m 15s branch/hadoop-project no spotbugs output file (spotbugsXml.xml)
+0 🆗 spotbugs 0m 27s branch/hadoop-tools/hadoop-tools-dist no spotbugs output file (spotbugsXml.xml)
+0 🆗 spotbugs 0m 16s branch/hadoop-cloud-storage-project/hadoop-cloud-storage no spotbugs output file (spotbugsXml.xml)
+1 💚 shadedclient 38m 2s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 38m 16s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 2m 35s Maven dependency ordering for patch
+1 💚 mvninstall 36m 38s the patch passed
+1 💚 compile 8m 12s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javac 8m 12s the patch passed
+1 💚 compile 7m 24s the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 javac 7m 24s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 2m 2s the patch passed
+1 💚 mvnsite 9m 13s the patch passed
+1 💚 shellcheck 0m 0s No new issues.
+1 💚 javadoc 5m 19s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 5m 7s the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+0 🆗 spotbugs 0m 15s hadoop-project has no data from spotbugs
+0 🆗 spotbugs 0m 19s hadoop-cloud-storage-project/hadoop-cloud-storage has no data from spotbugs
+0 🆗 spotbugs 0m 19s hadoop-tools/hadoop-tools-dist has no data from spotbugs
+1 💚 shadedclient 20m 59s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 654m 53s /patch-unit-root.txt root in the patch passed.
+1 💚 asflicense 0m 55s The patch does not generate ASF License warnings.
896m 22s
Reason Tests
Failed junit tests hadoop.yarn.server.router.subcluster.fair.TestYarnFederationWithFairScheduler
hadoop.yarn.server.router.webapp.TestFederationWebApp
hadoop.yarn.server.router.webapp.TestRouterWebServicesREST
hadoop.mapreduce.v2.TestUberAM
hadoop.yarn.sls.appmaster.TestAMSimulator
Subsystem Report/Notes
Docker ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7869/4/artifact/out/Dockerfile
GITHUB PR #7869
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle markdownlint shellcheck shelldocs
uname Linux 0f4d9b52cf08 5.15.0-143-generic #153-Ubuntu SMP Fri Jun 13 19:10:45 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / ff4d997
Default Java Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7869/4/testReport/
Max. process+thread count 3991 (vs. ulimit of 5500)
modules C: hadoop-project hadoop-common-project/hadoop-common hadoop-tools/hadoop-gcp hadoop-tools . hadoop-cloud-storage-project/hadoop-cloud-storage hadoop-tools/hadoop-tools-dist U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7869/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2 shellcheck=0.7.0
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@cnauroth
Contributor Author

cnauroth commented Sep 4, 2025

The last pre-submit run had a few unrelated test failures. Otherwise, it was clean.

5 participants