x/pkgsite: invalid links for internal v2+ github enterprise modules #61404

Open
@redloaf

What is the URL of the page with the issue?

This is a bug report about a v2+ module in a non-public repository, served by a self-hosted pkgsite. The URL is of this form:
https://my-pkgsite.mycompany.internal/github.mycompany.internal/myorg/myrepo/v2

What is your user agent?

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36

Screenshot

As this is an internal codebase, we would prefer not to share a screenshot.
In any case, the primary bug is not in the rendering of the page; it is in the link to the source code associated with a symbol.
For example, for the hypothetical repository above, the pages would include a link to https://github.mycompany.internal/myorg/myrepo.git/blob/v2.0.0/v2/mypkg/myfile.go

Note specifically the .git and /v2/ in this link. The latter is included unconditionally, regardless of whether the repository uses the major branch layout or the major subdirectory layout.

What did you do?

Running a local instance of pkgsite, I requested indexing of a v2 package in a private GitHub Enterprise repository of the form github.mycompany.internal/myorg/myrepo/v2 that follows the major branch layout. The links to the code were broken, following a pattern such as https://github.mycompany.internal/myorg/myrepo.git/blob/v2.0.0/v2/mypkg/myfile.go. This has several problems:

  • GitHub Enterprise has a redirect for repo URLs (e.g. URLs of the form https://github.mycompany.internal/myorg/myrepo.git) but issues 404s for any subpaths, including the /blob/ subpath.
  • The URLs imply that pkgsite expects the repo to follow the major subdirectory layout (v2/go.mod), and so expects to find mypkg/myfile.go at v2/mypkg/myfile.go, even though the repo uses the major branch layout.
  • Transient errors may cause pkgsite to cache incorrect results, because adjustVersionedModuleDirectory treats any error as if it were a 404.
  • When fetching the go.mod BlobURL to distinguish the major branch layout from the major subdirectory layout, pkgsite does not verify that the content at that URL is a valid go.mod file for the module being queried. It treats any 200 response as confirmation that go.mod exists in the major version subdirectory, even if the 200 is the result of a redirect to a login page.
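To illustrate the last point, here is a minimal sketch of the kind of content check that a bare status-code test cannot provide. The helper names `declaredModulePath` and `isGoModFor` are hypothetical and are not part of pkgsite; the scanner is deliberately simplified and does not implement the full go.mod grammar.

```go
package main

import (
	"bufio"
	"strings"
)

// declaredModulePath returns the path named by the first "module" directive
// in a go.mod body, or "" if none is found. Simplified on purpose: it handles
// `module path` and `module "path"` plus trailing line comments only.
func declaredModulePath(gomod string) string {
	sc := bufio.NewScanner(strings.NewReader(gomod))
	for sc.Scan() {
		line := sc.Text()
		// Drop a trailing "// comment", if any.
		if i := strings.Index(line, "//"); i >= 0 {
			line = line[:i]
		}
		fields := strings.Fields(line)
		if len(fields) == 2 && fields[0] == "module" {
			return strings.Trim(fields[1], `"`)
		}
	}
	return ""
}

// isGoModFor reports whether body looks like the go.mod of wantModule.
// An HTML login page served with a 200 status has no module directive and
// is therefore rejected.
func isGoModFor(body, wantModule string) bool {
	p := declaredModulePath(body)
	return p != "" && p == wantModule
}
```

With a check like this, a login page returned after a redirect would fail validation, as would a root go.mod that declares the v0/v1 module path. A real implementation would more likely use a proper go.mod parser, such as `modfile.ModulePath` from golang.org/x/mod, rather than a hand-written scanner.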

In our internal deployment, pkgsite pulls Go modules from an internal deployment of Athens and does not have direct access to the internal git repositories. We would be happy to give it this access, but that does not appear to be an option today.

What did you expect to see?

FileURL should link to https://github.mycompany.internal/myorg/myrepo/blob/v2.0.0/mypkg/myfile.go

What did you see instead?

https://github.mycompany.internal/myorg/myrepo.git/blob/v2.0.0/v2/mypkg/myfile.go
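The difference between the expected and the actual link comes down to two inputs: whether the repository URL keeps its .git suffix, and which directory the module lives in. A rough sketch, assuming a GitHub-style `/blob/<tag>/<path>` layout; the function `fileURL` and its parameters are hypothetical and do not reflect pkgsite's actual source-link code:

```go
package main

import (
	"strings"
)

// fileURL builds a GitHub-style link to a file at a given tag.
//
//	repoURL:   repository URL, possibly ending in ".git"
//	tag:       version tag, e.g. "v2.0.0"
//	moduleDir: module directory within the repo; "" for the major branch
//	           layout, "v2" for the major subdirectory layout
//	file:      path relative to the module root
func fileURL(repoURL, tag, moduleDir, file string) string {
	// Strip the VCS suffix; the report notes that GitHub Enterprise
	// returns 404 for subpaths under "<repo>.git".
	repo := strings.TrimSuffix(strings.TrimSuffix(repoURL, "/"), ".git")
	parts := []string{repo, "blob", tag}
	if moduleDir != "" {
		parts = append(parts, moduleDir)
	}
	parts = append(parts, file)
	return strings.Join(parts, "/")
}
```

Calling `fileURL("https://github.mycompany.internal/myorg/myrepo.git", "v2.0.0", "", "mypkg/myfile.go")` yields the expected link above; the broken link corresponds to keeping the .git suffix and passing "v2" as the module directory.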

Analysis

It appears the regular expressions intended to match internal GitHub (and GitLab?) instances do not match module paths that have a major version >= 2. Therefore the repository metadata is fetched dynamically and is not stripped of its VCS suffix. ModuleInfo then calls adjustVersionedModuleDirectory to perform a HEAD request on the go.mod file and considers any 200 response successful, even if it is a login page (after a redirect). The question of repository layout (major branch vs. major subdirectory) must be resolved by querying the repository itself. For private repositories, there does not seem to be a means to configure authentication so that pkgsite can accurately derive these answers.
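One way to avoid both the false positive (a login page) and the cached false negative (a transient error) would be to make the probe three-valued instead of boolean. The sketch below is an illustration only; `probeResult` and `classifyProbe` are hypothetical names, and treating every redirected 200 as inconclusive is a conservative assumption, not pkgsite's current behavior.

```go
package main

import "net/http"

// probeResult is the outcome of checking for <major>/go.mod in a repository.
type probeResult int

const (
	probeUnknown probeResult = iota // inconclusive: do not cache a layout decision
	probeExists                     // go.mod is present in the major subdirectory
	probeMissing                    // go.mod is definitely absent
)

// classifyProbe interprets a probe response. status is the final HTTP status,
// redirected reports whether the final URL differs from the requested one,
// and err is any transport-level error.
func classifyProbe(status int, redirected bool, err error) probeResult {
	switch {
	case err != nil:
		// Timeouts, DNS failures and the like say nothing about the layout.
		return probeUnknown
	case status == http.StatusNotFound || status == http.StatusGone:
		return probeMissing
	case status == http.StatusOK && !redirected:
		return probeExists
	default:
		// 5xx, 401/403, or a 200 reached via redirect (possibly a login page).
		return probeUnknown
	}
}
```

Only `probeExists` and `probeMissing` would be safe to act on and cache; `probeUnknown` would call for a retry or a fallback.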

Proposed solutions

There are many ways to proceed here, one of which is to permit pkgsite to be configured with specific "code hosts."
In this case, we would configure a "code host" for github.mycompany.internal, and its configuration would supersede the regular expression matching. This "code host" could be configured with its type (e.g. GitHub Enterprise) and API credentials, which would let pkgsite query the API directly (to see if the file v2/go.mod exists) rather than relying on unauthenticated HTTP requests. Some code hosts may offer a standard means of serving a single file.
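As a rough sketch of what such a configuration could look like, the following shows a code host entry and a lookup that would run before any pattern matching. Everything here is hypothetical: pkgsite has no `codeHost` type today, and the field names, including `Kind` and `TokenEnv`, are placeholders.

```go
package main

import "strings"

// codeHost describes one explicitly configured code host (hypothetical).
type codeHost struct {
	Host     string // host name, e.g. "github.mycompany.internal"
	Kind     string // host type, e.g. "github-enterprise"
	TokenEnv string // name of the environment variable holding an API token
}

// lookupCodeHost returns the configured entry whose Host equals the first
// element of modulePath. A match would take precedence over the built-in
// regular expressions.
func lookupCodeHost(hosts []codeHost, modulePath string) (codeHost, bool) {
	host, _, _ := strings.Cut(modulePath, "/")
	for _, h := range hosts {
		if strings.EqualFold(h.Host, host) {
			return h, true
		}
	}
	return codeHost{}, false
}
```

Because the lookup keys only on the host name, it is unaffected by a /v2 suffix on the module path, which is where the current pattern matching appears to fail.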

Another approach would be to use the existing RawURL, if available, so that pkgsite can affirmatively parse the go.mod file (after redirects) to ensure it is a valid go.mod for the package being fetched. However, this would still require solving the authentication issue.

Labels: NeedsInvestigation, pkgsite
