Skip to content

Conversation

ctubbsii
Copy link
Member

Description of PR

  • Use Yetus 0.14.1 from downloads.apache.org in yetus-wrapper
  • Use Maven 3.8.8 from downloads.apache.org in Win 10 Dockerfile
  • Point users to downloads.apache.org for JVSC
  • Use Solr 8.11.2 from downloads.apache.org in YARN Dockerfile

How was this patch tested?

Download links verified to work.

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?

* Use Yetus 0.14.1 from downloads.apache.org in yetus-wrapper
* Use Maven 3.8.8 from downloads.apache.org in Win 10 Dockerfile
* Point users to downloads.apache.org for JVSC
* Use Solr 8.11.2 from downloads.apache.org in YARN Dockerfile
Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good. The solr update in the yarn application catalog is a little concerning to me because it's a major version update, and I am not sure if there are tests covering the Docker image usage. Too bad, it seems the no one's taking care of the YARN application catalog code.

Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I guess we want to this change in all active branches? i.e. branch-3.3 branch-3.2 and branch-2.10.

@ctubbsii
Copy link
Member Author

Also, I guess we want to this change in all active branches? i.e. branch-3.3 branch-3.2 and branch-2.10.

As a stop-gap, it would be good to prevent any downstream users from hitting this on all new releases, as it can trigger a site-wide ban if they've used the archives too much. Building Hadoop releases from source shouldn't cause users to be banned from access to the ASF.

But, this is only a stop-gap solution, regardless of where it is applied. Once these dependencies roll over to the archives, then you'll have the problem of users being unable to build Hadoop releases from source without first patching its build in some way. So, a more complete solution still needs to be created.

The Dockerfile changes are probably okay (as I imagine those are optional, and users can derive their own Dockerfiles easily enough from these as reference), and the generic message about JVSC is certainly okay. You could probably get away with just applying those changes to the trunk.

The main problem to address across all branches is the yetus-wrapper's use of the archives. Perhaps one of the following would work?

  1. yetus can be made an optional part of the build (a dev-only profile that is inactive by default when users build from source)?
  2. Or you can bundle yetus into the release as a build tool so it doesn't need to go to the archives (might be against ASF policy for source releases, but perhaps there's an exception to the rule)?
  3. Or perhaps yetus is stable enough that you can just reference downloads.apache.org/yetus/latest instead of a specific version?
  4. Or maybe the build instructions should just tell the user that they need to download or install it as a build prerequisite, rather than have the Hadoop scripts download it?
  5. Or perhaps Yetus can publish its releases to Maven Central or another place, from which these can be downloaded?
    I'm not sure what the best solution is, but I definitely think this part should be fixed in all branches, somehow.

@steveloughran
Copy link
Contributor

  • solr change should be pulled out as its own thing, as it includes a version change.
  • yetus in source tarballs is interesting. I'm happy for a source bundle to include it as it makes for more of a "all the tools you need are here" world. (ignoring maven artifacts...)

@ctubbsii
Copy link
Member Author

  • yetus in source tarballs is interesting. I'm happy for a source bundle to include it as it makes for more of a "all the tools you need are here" world. (ignoring maven artifacts...)

Unfortunately, I'm pretty sure it's against policy. This is the same issue that people face when they want to include the gradlew or mvnw jars in their sources.

@steveloughran
Copy link
Contributor

yetus is special compared to gradle as it is asf, but then so would thriftc and mvn be.

@ctubbsii
Copy link
Member Author

ctubbsii commented Jun 29, 2023

yetus is special compared to gradle as it is asf, but then so would thriftc and mvn be.

From https://www.apache.org/legal/release-policy.html#source-packages: A source release SHOULD not contain compiled code

So, it's fine if you're just bundling and redistributing Yetus scripts, which are also source. I'm not sure if Yetus contains compiled code, or if it's just scripts, and I'm not sure which parts are used by Hadoop's build.

Copy link
Member

@ayushtkn ayushtkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds fair enough to do, but I don't think we should mix this with version upgrades, should do them separately

@ayushtkn ayushtkn changed the title [HADOOP-18786] Use CDN instead of ASF archive HADOOP-18786. Use CDN instead of ASF archive Jun 29, 2023
@ctubbsii
Copy link
Member Author

The version bumps are merely incidental to the actual issue this PR and JIRA are intending to expose: that the use of the archives should be avoided.

If committers want to bump the versions first, I can rebase this PR once that is done. However, these changes are pretty trivial, so I'm not really needed for that part. Once the committers decide to act on this, they can rebase or merge however they see fit. Up to now, these "should" comments about what to do about the version bumps have been very unclear to me... it looks like discussion among yourselves... but if it's a request for me to change something in this PR, please state the request clearly.

@steveloughran
Copy link
Contributor

reviewing this.

@ctubbsii do we need to bump up the versions for the move to downloads to work?

@ctubbsii
Copy link
Member Author

@steveloughran wrote:

@ctubbsii do we need to bump up the versions for the move to downloads to work?

The old versions only exist in the archives, so yes. But, as previously said, this is really only a temporary fix, as it forces the build to depend on a transient state on downloads.a.o. The Hadoop devs need to figure out a more permanent solution. This is just a stop-gap.

@ctubbsii
Copy link
Member Author

ctubbsii commented May 8, 2024

This was previously approved, and I've answered all the questions raised. I just resolved the merge conflicts from upstream, where some lines got moved around in the Dockerfile for Windows. Is anybody willing to merge this?

@steveloughran
Copy link
Contributor

The main thing I want to be sure is from this build, what gets into the distro? only stuff from the maven repo, right? that is: this PR MUST NOT force updates in the binaries we ship.

@ctubbsii
Copy link
Member Author

The main thing I want to be sure is from this build, what gets into the distro? only stuff from the maven repo, right? that is: this PR MUST NOT force updates in the binaries we ship.

I don't quite understand the question. The premise seems to be that the current build is only grabbing stuff from the Maven repo. However, that's not true currently, and that's part of the problem. The build currently grabs stuff from the archives, and not just from the Maven repo. Those are the URLs that this PR changes... to use the ASF CDN instead of the archives. The only change that might affect the distro is a couple of tools do not have that version available in the CDN anymore, so a version bump was necessary to be able to grab it from the CDN instead of from the archives. However, I don't know if those affect the binaries in the distro either, or if those are only used as unshipped build tools. But even if it does change the binaries in some way, the current situation of automatically going to the ASF archives cannot continue... it makes offline builds very hard, and the download of things from the archives causes frequent builds to trigger automated bans of ASF services, because the archives aren't meant to be used this way (for routine builds).

@steveloughran
Copy link
Contributor

ok, let's merge and see what happens.

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
+1

@steveloughran steveloughran merged commit 2e77b7b into apache:trunk May 14, 2024
@steveloughran
Copy link
Contributor

thanks, in trunk. can you do a pr cherrypicking to branch-3.4 so we can keep that in sync. No need for more reviews, just a yetus test run.

K0K0V0K pushed a commit to K0K0V0K/hadoop that referenced this pull request May 17, 2024
* Use Yetus 0.14.1 from downloads.apache.org in yetus-wrapper
* Use Maven 3.8.8 from downloads.apache.org in Win 10 Dockerfile
* Point users to downloads.apache.org for JVSC
* Use Solr 8.11.2 from downloads.apache.org in YARN Dockerfile

Contributed by Christopher Tubbs
K0K0V0K pushed a commit to K0K0V0K/hadoop that referenced this pull request May 17, 2024
* Use Yetus 0.14.1 from downloads.apache.org in yetus-wrapper
* Use Maven 3.8.8 from downloads.apache.org in Win 10 Dockerfile
* Point users to downloads.apache.org for JVSC
* Use Solr 8.11.2 from downloads.apache.org in YARN Dockerfile

Contributed by Christopher Tubbs
@ctubbsii
Copy link
Member Author

thanks, in trunk. can you do a pr cherrypicking to branch-3.4 so we can keep that in sync. No need for more reviews, just a yetus test run.

I'd be hesitant to backport this to previous versions. I'd feel more comfortable establishing it as the norm going forward, than risk changing the build process for previous release versions.

@ctubbsii ctubbsii deleted the HADOOP-18786 branch May 28, 2024 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants