Spec: How to handle submodules #36
Comments
This is why I think we need to think outside the repo box, and why SPDX’s definition of a “Package” could come in handy. IMO, we should concentrate on what’s being published by the person in charge of the package/repo/tarball/… If a repo wants to be REUSE compliant and the Git submodule in question is a repository under the control of the person in charge of the “main” repo, they should fix it in both repos. If it’s not, they should (ask) upstream (to) fix it (i.e. ignore the subrepo). If a source code tarball is created from that repo and its submodule(s), and that tarball should be REUSE compliant, the repo itself is not of direct importance, and its submodules even less so. But then they should make sure that everything in the tarball is REUSE compliant: what got in there from the original repo, all its submodules, and any extra added files of whichever origin (1st or 3rd party) (i.e. ignore the subrepo). |
I like this in principle, but I think that it complicates REUSE by a lot. It's the difference between:
and
Moreover, it's much more technically challenging to alter the structure of the resulting tarball than the structure of the repo. Anybody with very slight technical skill knows how to manipulate files in a directory (the repo). Not everybody knows how to alter the build process to change what files go into a tarball and how. So I'm kind of stuck on this, because REUSE needs to actually be easy to implement if it wants to gain any traction. And the easiest target is the repo, not the tarball. |
@carmenbianca I see your concern, but there is a reason why SPDX is looking at different types of Packages. If we want to limit ourselves to VCS repositories (and their forges), we should acknowledge we are tackling the issue from only one specific front (and Git is just one of the VCSs in use). If we do that, we should make doubly sure to state this limitation explicitly and work with SPDX and (other) tooling projects, including build systems, on how to reuse REUSE data in the final source code distribution. If, though, we want to cover source code regardless of which step of development it is in and which system is used to develop it (e.g. some still contribute In any case, I would rather not have a helper tool direct and limit the scope of the specification. As cool as the IMHO we should keep the Spec wide-reaching, but limit the Tutorial and Tool (initially) to just repos. |
I don't think this characterisation is entirely accurate. It's not exactly limiting oneself to VCS, but to "the canonical source", or "the development source", instead of the tarball or package, which is a derivative of the aforementioned source.
This makes some sense to me. Technically there is nothing that prevents the spec from applying to tarballs and repos alike. But even though the mechanics of REUSE compliance are the same, the way to get there differs a lot between the two. But I suppose I have two unrelated questions:
If the answer to question 1 is either the repo or both, then we probably still need to handle submodules somehow. Possibly. |
I think that if someone wants both their development repo and their source code tarballs to be REUSE-compliant, we should not prevent them from doing so. One use case I see is making packages in NPM, PyPI, and other scripting-language repositories REUSE-compliant as well.
I would take the molecular approach, where for a “project” each repo/tarball/package/… (i.e. Package according to SPDX) is a separate object to be considered REUSE compliant or not. That would also ease adoption: if a Git repo is REUSE compliant, but its source tarball or its NPM package is not, that does not take away the REUSE compliance of the project’s repo. Also, if someone were to e.g. create source packages (tarballs or otherwise) and made sure that they were REUSE compliant, but the upstream repository was not (perhaps the project is dead or not interested), those packages/tarballs would also still be compliant and provide additional value to their downstream.
I do not see the problem here. The same rules apply to files whether in a repository or in an archive. In the end, when you clone it to your disk or unarchive it, you are left with files (and directories) which need to include appropriate license and copyright info. |
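To illustrate the point that a clone and an unpacked archive can be checked the same way, here is a minimal sketch that walks any directory and lists files lacking licence or copyright tags. It is a deliberate simplification and not the reuse tool's logic: `missing_info` is a hypothetical name, and the sketch only looks for in-file tags, ignoring `.license` sidecar files, `.reuse/dep5`, and the `LICENSES/` directory.

```python
from pathlib import Path

LICENSE_TAG = "SPDX-License-Identifier:"
# Simplified set of copyright markers, assumed for this sketch.
COPYRIGHT_TAGS = ("SPDX-FileCopyrightText:", "Copyright", "©")


def missing_info(root: Path) -> list[Path]:
    """Return files under root that lack a licence tag or a copyright tag."""
    missing = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Skip directories and anything inside VCS metadata.
        if not path.is_file() or ".git" in relative.parts:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        has_license = LICENSE_TAG in text
        has_copyright = any(tag in text for tag in COPYRIGHT_TAGS)
        if not (has_license and has_copyright):
            missing.append(relative)
    return missing
```

The function takes only a directory, so it does not matter whether that directory came from `git clone` or from extracting a tarball.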
+1. Does this need mentioning somewhere?
Technically yes, but the steps to get there are more difficult. Is this just a case of "figure it out yourself"? |
I think the FAQ would be a good target. IMHO it can already be understood that way in the Spec, especially if we take on the SPDX Package definition.
I think it’s more a case of “figure it out yourself”. Or rather, the Tutorial can easily concentrate on repos, and if people come ask, we can write a separate short tutorial or question in the FAQ to tackle this and other use cases. |
Seems good to me. Then back on topic to the thread: Should the spec deal with submodules somehow, or just ignore the topic and leave it to the tool to deal with them? The tool could do three things, I think:
One of these could be the default behaviour, and the other two could be added as optional flags. |
Correct, the second option is the most logical behaviour, and the current behaviour of the tool. |
As I said, I would say that submodules are not something the Spec itself needs to handle. |
+1 to the SPDX approach of defining a package and what @silverhook wrote in his first post here. Regarding the default treatment of submodules, I tend towards ignoring them, but mentioning them explicitly in the output. Otherwise, it would be highly demotivating for larger projects using a lot of submodules to become REUSE compliant. However, I like the optional flags, especially for the recursive check. |
I also think that the Spec should not handle submodules. If an external project is part of a project, it may contain its own licensing information. It does not necessarily have to be a VCS submodule. Can dep5 be used to locate the license text? |
Not really. DEP5 only really marks the copyright of files. As far as the spec is concerned, the project directory that contains submodules is a single Project/Package, and should be checked as such. A project does not support having multiple LICENSES/ directories or .reuse/dep5 files. And the spec very likely won't be altered to support such a configuration either, because it's a bit of an esoteric requirement. Instead, you can do the following things:
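The behaviour suggested in the comment above (ignore submodules by default but name them in the output, with an opt-in flag to check them too) could be sketched roughly as follows. The function name `plan_check` and the `include_submodules` parameter are hypothetical and chosen for illustration only; they are not options of the reuse tool.

```python
from pathlib import PurePosixPath


def plan_check(files, submodule_paths, include_submodules=False):
    """Return (files to check, submodule paths to report as skipped).

    `files` and `submodule_paths` are paths relative to the project root.
    """
    submodules = [PurePosixPath(p) for p in submodule_paths]
    to_check = []
    for name in files:
        path = PurePosixPath(name)
        # A file belongs to a submodule if the submodule path is one of its parents.
        in_submodule = any(s == path or s in path.parents for s in submodules)
        if in_submodule and not include_submodules:
            continue
        to_check.append(name)
    # When submodules are skipped, list them so the output can mention them.
    reported = [] if include_submodules else sorted(str(s) for s in submodules)
    return to_check, reported
```

For example, with `files = ["README.md", "vendor/lib/a.c"]` and `submodule_paths = ["vendor/lib"]`, the default call checks only `README.md` and reports `vendor/lib` as skipped.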
|
From recent discussions, the summary seems to be:
|
The tool command is already there, and it automatically excludes submodules by default. However, we need to add this in the spec, ideally under this part:
|
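For readers wondering how submodule directories can be identified in order to exclude them, one possible approach is to read the `path` entries from the repository's `.gitmodules` file. The sketch below assumes the common, simple layout of that file (no quoted values, no inline comments); `submodule_paths` is a hypothetical helper, and this is not a description of how the reuse tool itself detects submodules.

```python
import re

# Matches lines such as "\tpath = vendor/lib" and captures the value.
_PATH_LINE = re.compile(r"^\s*path\s*=\s*(.+?)\s*$")


def submodule_paths(gitmodules_text: str) -> list[str]:
    """Extract the `path` entries from the contents of a .gitmodules file."""
    paths = []
    in_submodule = False
    for line in gitmodules_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Only [submodule "..."] sections are of interest here.
            in_submodule = stripped.startswith("[submodule")
            continue
        match = _PATH_LINE.match(line)
        if in_submodule and match:
            paths.append(match.group(1))
    return paths
```

The resulting list can then be used to skip those directories when walking the project tree.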
The chosen language should be broad enough to also include Meson subprojects → fsfe/reuse-tool#496 |
@carmenbianca Hm, so you think the current proposal in #99 does not meet this criterion? If not, how should we express it? |
See also: fsfe/reuse-tool#29
The spec currently doesn't really cover the scenario of having a Git submodule. My instinct is to ignore them, but I'm not sure if this is the correct approach. Because the submodule could initially be REUSE-compliant, but then it stops being compliant, possibly indirectly meaning that neither is your project.
Alternatively, if someone included a carbon copy of a REUSE-compliant project in a subdirectory (i.e., not a submodule), the LICENSES directory and .reuse/dep5 file of that carbon copy would not be detected. I am not sure if we should support this scenario, though, because it's kind of esoteric.