Skip to content

Allow '+' in a repository name (e.g. dbus-c++, canl-c++, biosig4c++, lv2-c++-tools) #13943

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
2 of 6 tasks
davidlt opened this issue Dec 11, 2020 · 15 comments
Closed
2 of 6 tasks
Labels
reviewed/wontfix The problem described in this issue/fixed in this pull request is not a problem we will fix type/feature Completely new functionality. Can only be merged if feature freeze is not active.

Comments

@davidlt
Copy link

davidlt commented Dec 11, 2020

  • Gitea version (or commit ref): 1.12.4 (downloaded from GitHub)
  • Git version: Go1.14.8
  • Operating system: Fedora 31
  • Database (use [x]):
    • PostgreSQL
    • MySQL
    • MSSQL
    • SQLite
  • Can you reproduce the bug at https://try.gitea.io:
  • Log gist:

Description

I am trying to migrated some repositories and noticed that repositories that contain "+" in the name are rejected. I am getting:

Repository name should contain only alphanumeric, dash ('-'), underscore ('_') and dot ('.') characters.

Would it be possible to allow "+" character in the repository name? I would like to keep the same name for repository (i.e. avoid renaming it to e.g. dbus-cpp) as that could cause name collisions and would break some scripts.

Example repository: https://src.fedoraproject.org/rpms/dbus-c++.git

Screenshots

@zeripath
Copy link
Contributor

zeripath commented Dec 11, 2020

There are three problems here:

  1. Gitea is descended from Gogs and Gogs decided to store its repositories on the filesystem as owner/reponame.git.
  2. Therefore usernames and repository names need to be POSIX fully portable filenames (matching regexp: [A-Za-z0-9\._-]+).
  3. The decision to restrict repository and user names to this very limited set of characters that do not need escaping mean that repository and user names are not escaped throughout Gogs and Gitea.

Ultimately the correct solution is to move Gitea's repositories to repository_id.git however, we would have to ensure that all uses of the repository names - and usernames are all escaped correctly. That would allow us to allow any unicode character in the username and repository name but then we would probably need someway of coping with unicode homographs - likely similar to the Firefox solution.

So as you can see allowing + would not be a simple endeavour. Ultimately one we should do but not one we are ready to do as yet.

@zeripath zeripath added type/feature Completely new functionality. Can only be merged if feature freeze is not active. type/question Issue needs no code to be fixed, only a description on how to fix it yourself. labels Dec 11, 2020
@silverwind
Copy link
Member

silverwind commented Dec 11, 2020

+ is forbidden on GitHub and probably others too and I think it's our best interest to keep repo names portable between hosts.

@clarfonthey
Copy link
Contributor

Longer-term I think that what would make the most sense is to have three tiers of configured settings, namely:

POSIX: Only allow the existing set of characters.
ASCII: Permissive mode, but restricted to ASCII characters only (e.g. what the OP is suggesting)
Unicode: Extended unicode identifiers, e.g. what's listed in UAX31-D2: ttps://www.unicode.org/reports/tr31/#R8

IMHO whatever is allowed for repos should also be the requirements for usernames as well. They shouldn't have separate requirements, so extending one should extend the other.

@andrewlukoshko
Copy link

BTW gitlab now allows plus sign in repo names.
https://gitlab.com/gitlab-org/gitlab/-/merge_requests/67997

@techknowlogick
Copy link
Member

We need to update how we store repos on disk to allow for this, Gitlab changed to using sharded uuids on disk, rather than user/repo combo.

@silverwind
Copy link
Member

silverwind commented Oct 8, 2021

Therefore usernames and repository names need to be POSIX fully portable filenames (matching regexp: [A-Za-z0-9._-]+)

Actually I don't think there is any OS that actually forbids a +, see https://github.com/sindresorhus/filename-reserved-regex/blob/main/index.js for a regex that matches filenames with non-portable characters in it. Most of these restricted characters come from macOS and Windows, with a few additional static forbidden names on Windows. On Linux file systems, one can pretty much assume only / is forbidden.

@zeripath
Copy link
Contributor

zeripath commented Oct 9, 2021

The definition of POSIX fully portable filenames comes from:

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_281


On the filesystem I remain convinced that there would be problems on Windows (FAT32 for example), SMB/NFS and other weird filesytems. Because of this the only safe solution is to move to repo ID based git directory names. On most linux based filesystems the filenames are simply a string of bytes which disalllow / and \x00. The determination that any string of bytes maps to a character is based on the character encoding. Once you start allowing general unicode characters as a filename you are demanding that the filesystem is mounted with a unicode encoding - something you cannot be sure of. You also fall victim to encoding issues due to change of encoding.

However, even if you are happy about the filesystem, it should be noted that if you use characters outside of the POSIX fully portable filename list we will have to urlencode and urldecode reponames.

Every place that takes reponames would have to be checked for safety and encoding. Not one single endpoint would be left untouched and the SDK would need to be changed similarly. It is not difficult to imagine a situation whereby there is significant security hole introduced by this. We would also need to think about all of the attacks using visually-alike unicode homoglyphs. (see punycode attacks)

This is a substantial amount of work and whilst I would be willing to work on it, I would have to be paid a non-insignificant amount to do it. Someone else could attempt to implement this in another PR, however, they would have to pay serious consideration to the above problems. Simply just allowing + and/or more characters isn't enough.

@clarfonthey
Copy link
Contributor

I think that in the short-term, my suggestion to add a configuration option that lets you decide what level of filename restrictions you want would be a decent idea. That way, we would still have the POSIX restriction by default, but folks who understand the restrictions of their current systems and would like to change that can do so. Additionally, I think that even if we move to an alternative way of storing the repos on disk, we should still have the ability to restrict repo names to, for example, the POSIX restrictions, since I feel that a lot of folks will want to be able to still enforce that.

The specific options would be likely something along the lines of:

  1. POSIX: Only allow the existing set of characters.
  2. ASCII: Extends to allow additional ASCII characters (right now plus is the only one I can think of)
  3. Unicode: Extended unicode identifiers, e.g. what's listed in UAX31-D2: ttps://www.unicode.org/reports/tr31/#R8
  4. Maybe custom allowed/disallowed lists of characters (probably listed as unicode ranges) would also be a nice addition.

I personally have a feeling that this would be substantially easier to implement than refactoring repo storage on disk, and wouldn't require any special migration logic.

@delvh
Copy link
Member

delvh commented Oct 9, 2021

I personally have a feeling that this would be substantially easier to implement than refactoring repo storage on disk, and wouldn't require any special migration logic.

There are some problems with this approach I noticed while thinking about it for a second:
First of all, that way you are forever bound to your current filesystem architecture (or one allowing at least the same chars), as you cannot copy repos that are non-conformant to an architecture that does not support this. In practice, this means that you'll be "stuck" with a Unix system and cannot move to a Windows system anymore.
Second and more importantly, how do you migrate repos that do not conform then? Sure, you can simply strip the non-conformant chars, but then again this is extra code you have to write. Especially as there is more than one instance where this applies to. Right now, I can think of adding this logic to repos and their corresponding wikis, but I'm almost certain more problems might arise eventually.

@clarfonthey
Copy link
Contributor

There have been loads of ways of renaming things that don't conform to filename requirements over the years. For migrating repos, it seems easy enough to do a lossy conversion and then keep the original name somewhere, or even prompt the user depending on how many repos are being migrated.

@zeripath
Copy link
Contributor

Look I think we're going round in circles.

The time I have spent repeatedly explaining why it is not simple is time I could have spent fixing other things in Gitea.

The issue is not that this problem is insurmountable but that it is a lot of work, with a lot of risks of introducing subtle security problems. The issues on the filesystem are just one part of the problem - the solution there is to clearly just use repoids.

I have explained how it could be done, what would need to happen to make me do it, and where the security problems are likely to be, but one last time:

There are essentially two options:

  1. Move to allow any unicode character and store things natively in utf8/unicode.
  2. Keep the fully posix portable filenames internally but render (and convert passed in) names with punycode with IDN prefix (Or some other mapping - if + can be handled then punycode may be best though, + doesn't normally get converted here so if there is no way to encode this may well be dead in the water.)

Move to allow almost any unicode character in reponames (and usernames - with the likely exception of '@')

If you want to move forward with making this possible, then several PRs would be needed that:

  • Ensure that all uses of reponame and username in urls are urlencoded and decoded properly.
  • Consider how to encode emails
  • Consider how this affects webhooks - it may not but I'm not certain.
  • Double-check that there is always correct decoding of ssh passed username/reponames
  • Do the same thing to the SDK and Tea.
  • Ensure that the PAM, SMTP and LDAP authentication sources tolerate this correctly.
  • Sort out the filesystem - the answer here has to be move to use repo ids for the repo names - I'm sorry but that's the only solution that is going to definitely be safe.
  • Explain how to prevent homograph attacks or make homograph attacks more obvious.
  • Consider how to provide settings to enable disable this.
  • Convince some maintainers to spend time looking through each and every line of the code to ensure that security leak hasn't been added.

It's highly likely that this solution would still introduce subtle sec holes.

Keep the fully posix portable filenames internally but render (and convert passed in) names with punycode with IDN prefix or some other mapping

This is a somewhat safer alternative but would forever preclude a move to 1. It would however mean that repos etc could be moved to github, eg. A user or repo named 機動戦士ガンダム would be mapped to xn--mckxbvf4c359tdwhuyqdqp and either could be used. It's unlikely that there would be unintended collisions here the xn-- prefix is so unusual that anyone using it almost certainly means punycode. The flaw here is the + symbol doesn't get punycoded by default and I don't know if + can be encoded in punycode.

You'd still need to map any non-fully portable posix name to its mapping whenever it is passed in as an url (or elsewhere) but if you miss any those would just be bugs rather than full blown security holes. You'd need to add renderers on the UI - again if you miss any it's just a bug.

You'd still need to explain how to prevent homograph attacks or make them obvious and describe settings to turn on/off this functionality.

Reviewers would still need to go through each and every endpoint looking for sec holes.


I won't pretend to declare which is better right now but there are considerable risks with both. However, please stop the discussions about allowing filenames with + in them - yes they generally work but there are going to be subtle cases when things don't work - moving to repo-id directories is the solution here. The on-disk repo name is just one part of the problem and not even the place where the most pain is: The URLs are the problem for example + means in URLs.

If you really want this either provide the PRs or fund someone to do it. It would require changes at every level in Gitea to make work and we're looking at multiple PRs. URL encoding and decoding the names when going into URLs would be the first step for the first option.

@clarfonthey
Copy link
Contributor

Was this fixed or just closed?

@techknowlogick
Copy link
Member

I've been going through and cleaning up proposals that have just remained open for sometime. Not saying that this won't be implemented at some time, but keeping many issues (1.8k at the start of the cleanup) open is preventing users from finding other issues.

@wxiaoguang wxiaoguang added the reviewed/wontfix The problem described in this issue/fixed in this pull request is not a problem we will fix label Apr 15, 2022
@wxiaoguang
Copy link
Contributor

wxiaoguang commented Apr 15, 2022

zeripath has explained very clearly, I would treat it as a 'won't-be-fixed' problem at the moment.

If Gitea changes a lot (more than a lot) in future, this proposal might be feasible then, but not now.

@wxiaoguang wxiaoguang removed the type/question Issue needs no code to be fixed, only a description on how to fix it yourself. label Apr 15, 2022
@clarfonthey
Copy link
Contributor

Ah, makes sense. Thanks for clarifying!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
reviewed/wontfix The problem described in this issue/fixed in this pull request is not a problem we will fix type/feature Completely new functionality. Can only be merged if feature freeze is not active.
Projects
None yet
Development

No branches or pull requests

8 participants