
Commit a160445

fix
1 parent cb10f27 commit a160445

17 files changed, +292 -51 lines changed


.gitignore

+1
@@ -11,6 +11,7 @@ _test
 .idea
 # Goland's output filename can not be set manually
 /go_build_*
+/gitea_*
 
 # MS VSCode
 .vscode

cmd/doctor_convert.go

+2-2
@@ -37,8 +37,8 @@ func runDoctorConvert(ctx *cli.Context) error {
 
 	switch {
 	case setting.Database.Type.IsMySQL():
-		if err := db.ConvertUtf8ToUtf8mb4(); err != nil {
-			log.Fatal("Failed to convert database from utf8 to utf8mb4: %v", err)
+		if err := db.ConvertDatabaseTable(); err != nil {
+			log.Fatal("Failed to convert database & table: %v", err)
 			return err
 		}
 		fmt.Println("Converted successfully, please confirm your database's character set is now utf8mb4")

custom/conf/app.example.ini

+2
@@ -351,6 +351,7 @@ NAME = gitea
 USER = root
 ;PASSWD = ;Use PASSWD = `your password` for quoting if you use special characters in the password.
 ;SSL_MODE = false ; either "false" (default), "true", or "skip-verify"
+;CHARSET_COLLATION = ; Empty as default, Gitea will try to find a case-sensitive collation
 ;;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
@@ -382,6 +383,7 @@ USER = root
 ;NAME = gitea
 ;USER = SA
 ;PASSWD = MwantsaSecurePassword1
+;CHARSET_COLLATION = ; Empty as default, Gitea will try to find a case-sensitive collation
 ;;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;

docs/content/help/faq.en-us.md

-18
@@ -371,24 +371,6 @@ If you are receiving an error line containing `Error 1071: Specified key was too
 then you are attempting to run Gitea on tables which use the ISAM engine. While this may have worked by chance in previous versions of Gitea, it has never been officially supported and
 you must use InnoDB. You should run `ALTER TABLE table_name ENGINE=InnoDB;` for each table in the database.
 
-If you are using MySQL 5, another possible fix is
-
-```mysql
-SET GLOBAL innodb_file_format=Barracuda;
-SET GLOBAL innodb_file_per_table=1;
-SET GLOBAL innodb_large_prefix=1;
-```
-
-## Why Are Emoji Broken On MySQL
-
-Unfortunately MySQL's `utf8` charset does not completely allow all possible UTF-8 characters, in particular Emoji.
-They created a new charset and collation called `utf8mb4` that allows for emoji to be stored but tables which use
-the `utf8` charset, and connections which use the `utf8` charset will not use this.
-
-Please run `gitea doctor convert`, or run `ALTER DATABASE database_name CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;`
-for the database_name and run `ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;`
-for each table in the database.
-
 ## Why are Emoji displaying only as placeholders or in monochrome
 
 Gitea requires the system or browser to have one of the supported Emoji fonts installed, which are Apple Color Emoji, Segoe UI Emoji, Segoe UI Symbol, Noto Color Emoji and Twemoji Mozilla. Generally, the operating system should already provide one of these fonts, but especially on Linux, it may be necessary to install them manually.

docs/content/help/faq.zh-cn.md

-19
@@ -375,25 +375,6 @@ Gitea provides a subcommand `gitea migrate` to initialize the database, and then you
 error line, then you are attempting to run Gitea on tables which use the ISAM engine. While this may have worked by chance in previous versions of Gitea, it has never been officially supported, and
 you must use InnoDB. You should run `ALTER TABLE table_name ENGINE=InnoDB;` for each table in the database.
 
-If you are using MySQL 5, another possible fix is:
-
-```mysql
-SET GLOBAL innodb_file_format=Barracuda;
-SET GLOBAL innodb_file_per_table=1;
-SET GLOBAL innodb_large_prefix=1;
-```
-
-## Why are Emoji displayed incorrectly on MySQL
-
-Unfortunately, MySQL's `utf8` charset does not fully allow all possible UTF-8 characters, in particular Emoji.
-They created a charset and collation called `utf8mb4` that allows Emoji to be stored, but tables and connections
-that use the utf8 charset will not use it.
-
-Please run `gitea doctor convert`, or run `ALTER DATABASE database_name CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;` on the database
-and run `ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;` on each table.
-
-You also need to set the database charset in the `app.ini` file to `CHARSET=utf8mb4`.
-
 ## Why are Emoji displayed only as placeholders or monochrome images
 
 Gitea requires the system or browser to have one of the supported Emoji fonts installed, such as Apple Color Emoji, Segoe UI Emoji, Segoe UI Symbol, Noto Color Emoji and Twemoji Mozilla. Generally, the operating system should already provide one of these fonts, but especially on Linux, it may be necessary to install them manually.

docs/content/installation/database-preparation.en-us.md

+6-2
@@ -61,10 +61,14 @@ Note: All steps below requires that the database engine of your choice is instal
 
 Replace username and password above as appropriate.
 
-4. Create database with UTF-8 charset and collation. Make sure to use `utf8mb4` charset instead of `utf8` as the former supports all Unicode characters (including emojis) beyond _Basic Multilingual Plane_. Also, collation chosen depending on your expected content. When in doubt, use either `unicode_ci` or `general_ci`.
+4. Create database with UTF-8 charset and case-sensitive collation.
+
+`utf8mb4_bin` is a common collation for both MySQL/MariaDB.
+When Gitea starts, it will try to find a better collation (`utf8mb4_0900_as_cs` or `uca1400_as_cs`) and alter the database if it is possible.
+If you would like to use other collation, you can set `[database].CHARSET_COLLATION` in the `app.ini` file.
 
 ```sql
-CREATE DATABASE giteadb CHARACTER SET 'utf8mb4' COLLATE 'utf8mb4_unicode_ci';
+CREATE DATABASE giteadb CHARACTER SET 'utf8mb4' COLLATE 'utf8mb4_bin';
 ```
 
 Replace database name as appropriate.
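
The startup behaviour described in this documentation change (prefer an explicitly configured collation, otherwise take the first candidate the server offers) can be sketched in isolation. This is a minimal illustration rather than Gitea's implementation: `pickCollation` and the `available` map are hypothetical names, and only the candidate order is taken from the commit.

```go
package main

import "fmt"

// pickCollation is a hypothetical helper: it returns the configured
// collation when one is set, otherwise the first candidate that the
// server reports as available. The bool is false when nothing matches.
func pickCollation(configured string, candidates []string, available map[string]bool) (string, bool) {
	if configured != "" {
		return configured, true
	}
	for _, c := range candidates {
		if available[c] {
			return c, true
		}
	}
	return "", false
}

func main() {
	// Candidate order as listed for MySQL/MariaDB in this commit.
	candidates := []string{"utf8mb4_0900_as_cs", "uca1400_as_cs", "utf8mb4_bin"}

	// Assumed server state for the example: only two collations exist.
	available := map[string]bool{"uca1400_as_cs": true, "utf8mb4_bin": true}

	got, ok := pickCollation("", candidates, available)
	fmt.Println(got, ok)
}
```

Because the candidates are tried in order, `utf8mb4_bin` acts as the fallback when neither accent- and case-sensitive collation is available.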

docs/content/installation/database-preparation.zh-cn.md

+4-2
@@ -59,10 +59,12 @@ menu:
 
 Replace the username and password above as appropriate.
 
-4. Create the database with the UTF-8 charset and collation. Make sure to use the `**utf8mb4**` charset instead of `utf8`, as the former supports all Unicode characters (including emoji) beyond the _Basic Multilingual Plane_. Choose the collation according to your expected content. When in doubt, use `unicode_ci` or `general_ci`.
+4. Create the database with the UTF-8 charset and a case-sensitive collation.
+
+After Gitea starts, it will try to alter the database to use a more suitable charset; if you would like to specify your own charset rule, you can set `[database].CHARSET_COLLATION` in app.ini.
 
 ```sql
-CREATE DATABASE giteadb CHARACTER SET 'utf8mb4' COLLATE 'utf8mb4_unicode_ci';
+CREATE DATABASE giteadb CHARACTER SET 'utf8mb4' COLLATE 'utf8mb4_bin';
 ```
 
 Replace the database name as appropriate.

models/db/collation.go

+171
@@ -0,0 +1,171 @@
+// Copyright 2023 The Gitea Authors. All rights reserved.
+// SPDX-License-Identifier: MIT
+
+package db
+
+import (
+	"errors"
+	"fmt"
+	"strings"
+
+	"code.gitea.io/gitea/modules/container"
+	"code.gitea.io/gitea/modules/log"
+	"code.gitea.io/gitea/modules/setting"
+
+	"xorm.io/xorm"
+	"xorm.io/xorm/schemas"
+)
+
+type CheckCollationsResult struct {
+	ExpectedCollation            string
+	AvailableCollation           container.Set[string]
+	DatabaseCollation            string
+	InconsistentCollationColumns []string
+	IsCollationCaseSensitive     func(s string) bool
+	CollationEquals              func(a, b string) bool
+}
+
+func findAvailableCollationsMySQL(x *xorm.Engine) (ret container.Set[string], err error) {
+	var res []struct {
+		Collation string
+	}
+	if err = x.SQL("SHOW COLLATION WHERE (Collation = 'utf8mb4_bin') OR (Collation LIKE '%\\_as\\_cs%')").Find(&res); err != nil {
+		return nil, err
+	}
+	ret = make(container.Set[string], len(res))
+	for _, r := range res {
+		ret.Add(r.Collation)
+	}
+	return ret, nil
+}
+
+func findAvailableCollationsMSSQL(x *xorm.Engine) (ret container.Set[string], err error) {
+	var res []struct {
+		Name string
+	}
+	if err = x.SQL("SELECT * FROM sys.fn_helpcollations() WHERE name LIKE '%\\_CS\\_AS%'").Find(&res); err != nil {
+		return nil, err
+	}
+	ret = make(container.Set[string], len(res))
+	for _, r := range res {
+		ret.Add(r.Name)
+	}
+	return ret, nil
+}
+
+func CheckCollations(x *xorm.Engine) (*CheckCollationsResult, error) {
+	dbTables, err := x.DBMetas()
+	if err != nil {
+		return nil, err
+	}
+
+	res := &CheckCollationsResult{}
+	res.CollationEquals = func(a, b string) bool { return a == b }
+
+	var candidateCollations []string
+	if x.Dialect().URI().DBType == schemas.MYSQL {
+		if _, err = x.SQL("SELECT @@collation_database").Get(&res.DatabaseCollation); err != nil {
+			return nil, err
+		}
+		res.IsCollationCaseSensitive = func(s string) bool {
+			return s == "utf8mb4_bin" || strings.HasSuffix(s, "_as_cs")
+		}
+		candidateCollations = []string{"utf8mb4_0900_as_cs", "uca1400_as_cs", "utf8mb4_bin"}
+		res.AvailableCollation, err = findAvailableCollationsMySQL(x)
+		if err != nil {
+			return nil, err
+		}
+		res.CollationEquals = func(a, b string) bool {
+			// MariaDB adds the "utf8mb4_" prefix, eg: "utf8mb4_uca1400_as_cs", but not the name "uca1400_as_cs" in "SHOW COLLATION"
+			// At the moment, it's safe to ignore the database difference, just trim the prefix and compare. It could be fixed easily if there is any problem in the future.
+			return a == b || strings.TrimPrefix(a, "utf8mb4_") == strings.TrimPrefix(b, "utf8mb4_")
+		}
+	} else if x.Dialect().URI().DBType == schemas.MSSQL {
+		if _, err = x.SQL("SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation')").Get(&res.DatabaseCollation); err != nil {
+			return nil, err
+		}
+		res.IsCollationCaseSensitive = func(s string) bool {
+			return strings.HasSuffix(s, "_CS_AS")
+		}
+		candidateCollations = []string{"Latin1_General_CS_AS"}
+		res.AvailableCollation, err = findAvailableCollationsMSSQL(x)
+		if err != nil {
+			return nil, err
+		}
+	} else {
+		return nil, nil
+	}
+
+	if res.DatabaseCollation == "" {
+		return nil, errors.New("unable to get collation for current database")
+	}
+
+	res.ExpectedCollation = setting.Database.CharsetCollation
+	if res.ExpectedCollation == "" {
+		for _, collation := range candidateCollations {
+			if res.AvailableCollation.Contains(collation) {
+				res.ExpectedCollation = collation
+				break
+			}
+		}
+	}
+
+	if res.ExpectedCollation == "" {
+		return nil, errors.New("unable to find a suitable collation for current database")
+	}
+
+	for _, table := range dbTables {
+		for _, col := range table.Columns() {
+			if col.Collation != "" && (!res.IsCollationCaseSensitive(col.Collation) || !res.CollationEquals(col.Collation, res.DatabaseCollation)) {
+				res.InconsistentCollationColumns = append(res.InconsistentCollationColumns, fmt.Sprintf("%s.%s", table.Name, col.Name))
+			}
+		}
+	}
+
+	return res, nil
+}
+
+func CheckCollationsDefaultEngine() (*CheckCollationsResult, error) {
+	return CheckCollations(x)
+}
+
+func alterDatabaseCollation(x *xorm.Engine, checkResult *CheckCollationsResult) error {
+	if x.Dialect().URI().DBType == schemas.MYSQL {
+		_, err := x.Exec("ALTER DATABASE CHARACTER SET utf8mb4 COLLATE " + checkResult.ExpectedCollation)
+		return err
+	} else if x.Dialect().URI().DBType == schemas.MSSQL {
+		// MSSQL has many limitations on changing database collation, it could fail in many cases
+		_, err := x.Exec("ALTER DATABASE CURRENT COLLATE " + checkResult.ExpectedCollation)
+		return err
+	}
+	return errors.New("unsupported database type")
+}
+
+func preprocessDatabaseCollation(x *xorm.Engine) {
+	r, err := CheckCollations(x)
+	if err != nil {
+		log.Error("Failed to check database collation: %v")
+	}
+	if r == nil {
+		return // no check result means the database doesn't need to do such check/process (at the moment ....)
+	}
+
+	if !r.CollationEquals(r.DatabaseCollation, r.ExpectedCollation) {
+		if err = alterDatabaseCollation(x, r); err != nil {
+			log.Error("Failed to change database collation to %q: %v", r.ExpectedCollation, err)
+		} else {
+			if r, err = CheckCollations(x); err != nil {
+				log.Fatal("Failed to check database collation again after altering: %v", err) // impossible case
+			}
+			log.Warn("Current database has been altered to use collation %q", r.DatabaseCollation)
+		}
+	}
+
+	if !r.IsCollationCaseSensitive(r.DatabaseCollation) {
+		log.Warn("Current database is using a case-insensitive collation %q, although Gitea could work with it, there might be some rare cases which don't work as expected.", r.DatabaseCollation)
+	}
+
+	if len(r.InconsistentCollationColumns) > 0 {
+		log.Error("There are %d table columns have inconsistent collation, they should use %q. Please go to admin panel Self Check page or refer to Gitea document", len(r.InconsistentCollationColumns), r.DatabaseCollation)
+	}
+}
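
The column check in `CheckCollations` is easier to follow without the xorm plumbing. The sketch below reproduces the two MySQL predicates from the file above and applies them to plain values. The `column` type and the `inconsistentColumns` helper are hypothetical stand-ins for the schema metadata that xorm returns, so treat this as an illustration of the rule rather than the code Gitea runs.

```go
package main

import (
	"fmt"
	"strings"
)

// column is a hypothetical stand-in for the table/column metadata
// that the real code reads from the database.
type column struct {
	Table, Name, Collation string
}

// isCaseSensitiveMySQL mirrors the MySQL predicate in the commit.
func isCaseSensitiveMySQL(s string) bool {
	return s == "utf8mb4_bin" || strings.HasSuffix(s, "_as_cs")
}

// collationEquals mirrors the prefix-tolerant comparison in the commit:
// MariaDB may report "utf8mb4_uca1400_as_cs" for "uca1400_as_cs".
func collationEquals(a, b string) bool {
	return a == b || strings.TrimPrefix(a, "utf8mb4_") == strings.TrimPrefix(b, "utf8mb4_")
}

// inconsistentColumns lists "table.column" for every column that has a
// collation which is case-insensitive or differs from the database's.
func inconsistentColumns(cols []column, dbCollation string) []string {
	var out []string
	for _, c := range cols {
		if c.Collation == "" {
			continue // non-text columns carry no collation
		}
		if !isCaseSensitiveMySQL(c.Collation) || !collationEquals(c.Collation, dbCollation) {
			out = append(out, c.Table+"."+c.Name)
		}
	}
	return out
}

func main() {
	cols := []column{
		{"user", "name", "utf8mb4_bin"},
		{"user", "email", "utf8mb4_general_ci"},
		{"repository", "id", ""},
	}
	fmt.Println(inconsistentColumns(cols, "utf8mb4_bin"))
}
```

Note that a column is flagged even when it matches the database collation if that collation is itself case-insensitive, which is why both conditions are joined with `||`.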

models/db/convert.go

+10-5
@@ -14,13 +14,18 @@ import (
 	"xorm.io/xorm/schemas"
 )
 
-// ConvertUtf8ToUtf8mb4 converts database and tables from utf8 to utf8mb4 if it's mysql and set ROW_FORMAT=dynamic
-func ConvertUtf8ToUtf8mb4() error {
+// ConvertDatabaseTable converts database and tables from utf8 to utf8mb4 if it's mysql and set ROW_FORMAT=dynamic
+func ConvertDatabaseTable() error {
 	if x.Dialect().URI().DBType != schemas.MYSQL {
 		return nil
 	}
 
-	_, err := x.Exec(fmt.Sprintf("ALTER DATABASE `%s` CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci", setting.Database.Name))
+	r, err := CheckCollations(x)
+	if err != nil {
+		return err
+	}
+
+	_, err = x.Exec(fmt.Sprintf("ALTER DATABASE `%s` CHARACTER SET utf8mb4 COLLATE %s", setting.Database.Name, r.ExpectedCollation))
 	if err != nil {
 		return err
 	}
@@ -30,11 +35,11 @@ func ConvertUtf8ToUtf8mb4() error {
 		return err
 	}
 	for _, table := range tables {
-		if _, err := x.Exec(fmt.Sprintf("ALTER TABLE `%s` ROW_FORMAT=dynamic;", table.Name)); err != nil {
+		if _, err := x.Exec(fmt.Sprintf("ALTER TABLE `%s` ROW_FORMAT=dynamic", table.Name)); err != nil {
 			return err
 		}
 
-		if _, err := x.Exec(fmt.Sprintf("ALTER TABLE `%s` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;", table.Name)); err != nil {
+		if _, err := x.Exec(fmt.Sprintf("ALTER TABLE `%s` CONVERT TO CHARACTER SET utf8mb4 COLLATE %s", table.Name, r.ExpectedCollation)); err != nil {
 			return err
 		}
 	}
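
To see which SQL the new `ConvertDatabaseTable` sends, it may help to build the statements without executing them. The sketch below does that for an assumed database name and table list. `conversionStatements` is a hypothetical helper that does not exist in Gitea; only the statement templates are copied from the diff.

```go
package main

import "fmt"

// conversionStatements is a hypothetical helper that returns, in order,
// the statements the converted function would execute: one ALTER DATABASE,
// then ROW_FORMAT and CONVERT TO for each table.
func conversionStatements(dbName, collation string, tables []string) []string {
	stmts := []string{
		fmt.Sprintf("ALTER DATABASE `%s` CHARACTER SET utf8mb4 COLLATE %s", dbName, collation),
	}
	for _, t := range tables {
		stmts = append(stmts,
			fmt.Sprintf("ALTER TABLE `%s` ROW_FORMAT=dynamic", t),
			fmt.Sprintf("ALTER TABLE `%s` CONVERT TO CHARACTER SET utf8mb4 COLLATE %s", t, collation),
		)
	}
	return stmts
}

func main() {
	// Example inputs are assumptions made for illustration only.
	for _, s := range conversionStatements("gitea", "utf8mb4_bin", []string{"user", "repository"}) {
		fmt.Println(s)
	}
}
```

The collation is no longer hard-coded to `utf8mb4_general_ci`: it comes from `r.ExpectedCollation`, so the doctor command and the startup check agree on the target.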

models/db/engine.go

+2
@@ -178,6 +178,8 @@ func InitEngineWithMigration(ctx context.Context, migrateFunc func(*xorm.Engine)
 		return err
 	}
 
+	preprocessDatabaseCollation(x)
+
 	if err = x.Ping(); err != nil {
 		return err
 	}

modules/setting/database.go

+4-3
@@ -35,6 +35,7 @@ var (
 		Path              string
 		LogSQL            bool
 		MysqlCharset      string
+		CharsetCollation  string
 		Timeout           int // seconds
 		SQLiteJournalMode string
 		DBConnectRetries  int
@@ -67,7 +68,7 @@ func loadDBSetting(rootCfg ConfigProvider) {
 	}
 	Database.Schema = sec.Key("SCHEMA").String()
 	Database.SSLMode = sec.Key("SSL_MODE").MustString("disable")
-	Database.MysqlCharset = sec.Key("MYSQL_CHARSET").MustString("utf8mb4") // do not document it, end users won't need it.
+	Database.CharsetCollation = sec.Key("CHARSET_COLLATION").String()
 
 	Database.Path = sec.Key("PATH").MustString(filepath.Join(AppDataPath, "gitea.db"))
 	Database.Timeout = sec.Key("SQLITE_TIMEOUT").MustInt(500)
@@ -105,8 +106,8 @@ func DBConnStr() (string, error) {
 		if tls == "disable" { // allow (Postgres-inspired) default value to work in MySQL
 			tls = "false"
 		}
-		connStr = fmt.Sprintf("%s:%s@%s(%s)/%s%scharset=%s&parseTime=true&tls=%s",
-			Database.User, Database.Passwd, connType, Database.Host, Database.Name, paramSep, Database.MysqlCharset, tls)
+		connStr = fmt.Sprintf("%s:%s@%s(%s)/%s%scharset=utf8mb4&parseTime=true&tls=%s",
+			Database.User, Database.Passwd, connType, Database.Host, Database.Name, paramSep, tls)
 	case "postgres":
 		connStr = getPostgreSQLConnectionString(Database.Host, Database.User, Database.Passwd, Database.Name, Database.SSLMode)
 	case "mssql":
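
With `MYSQL_CHARSET` no longer read, the MySQL connection string always carries `charset=utf8mb4`. The standalone sketch below shows the resulting string for sample values. `mysqlDSN` is a hypothetical function, and the way `paramSep` is chosen here (`?` unless the database name already contains one) is an assumption, since that part of `DBConnStr` is outside the hunk shown.

```go
package main

import (
	"fmt"
	"strings"
)

// mysqlDSN is a hypothetical, simplified version of the MySQL branch of
// DBConnStr after this commit: the charset is fixed to utf8mb4.
func mysqlDSN(user, passwd, connType, host, name, tls string) string {
	// Assumption: use "&" when the name already carries query parameters.
	paramSep := "?"
	if strings.Contains(name, paramSep) {
		paramSep = "&"
	}
	if tls == "disable" { // Postgres-style default mapped to MySQL's "false", as in the diff
		tls = "false"
	}
	return fmt.Sprintf("%s:%s@%s(%s)/%s%scharset=utf8mb4&parseTime=true&tls=%s",
		user, passwd, connType, host, name, paramSep, tls)
}

func main() {
	// Sample credentials are placeholders for illustration.
	fmt.Println(mysqlDSN("root", "secret", "tcp", "127.0.0.1:3306", "gitea", "disable"))
}
```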

options/locale/locale_en-US.ini

+8
@@ -2691,6 +2691,7 @@ teams.invite.description = Please click the button below to join the team.
 
 [admin]
 dashboard = Dashboard
+self_check = Self Check
 identity_access = Identity & Access
 users = User Accounts
 organizations = Organizations
@@ -3216,6 +3217,13 @@ notices.desc = Description
 notices.op = Op.
 notices.delete_success = The system notices have been deleted.
 
+self_check.no_problem_found = No problem found yet.
+self_check.database_collation_mismatch = Expect database to use collation: %s
+self_check.database_collation_case_insensitive = Database is using a collation %s, which is an insensitive collation. Although Gitea could work with it, there might be some rare cases which don't work as expected.
+self_check.database_inconsistent_collation_columns = Database is using collation %s, but these columns are using mismatched collations. It might cause some unexpected problems.
+self_check.database_fix_mysql = For MySQL/MariaDB users, you could use the "gitea doctor convert" command to fix the collation problems, or you could also fix the problem by "ALTER ... COLLATE ..." SQLs manually.
+self_check.database_fix_mssql = For MSSQL users, you could only fix the problem by "ALTER ... COLLATE ..." SQLs manually at the moment.
+
 [action]
 create_repo = created repository <a href="%s">%s</a>
 rename_repo = renamed repository from <code>%[1]s</code> to <a href="%[2]s">%[3]s</a>
