MySQL case sensitivity fix #28651

Closed
wants to merge 6 commits into from
11 changes: 8 additions & 3 deletions cmd/doctor_convert.go
@@ -37,11 +37,16 @@ func runDoctorConvert(ctx *cli.Context) error {

switch {
case setting.Database.Type.IsMySQL():
- if err := db.ConvertUtf8ToUtf8mb4(); err != nil {
- log.Fatal("Failed to convert database from utf8 to utf8mb4: %v", err)
+ charset, collation, err := db.GetDesiredCharsetAndCollation()
+ if err != nil {
+ log.Fatal("Failed to determine the desired database charset or collation: %v", err)
return err
}
- fmt.Println("Converted successfully, please confirm your database's character set is now utf8mb4")
+ if err := db.ConvertCharsetAndCollation(charset, collation); err != nil {
+ log.Fatal("Failed to convert database from utf8 to %s: %v", charset, err)
+ return err
+ }
+ fmt.Printf("Converted successfully, please confirm your database's character set is now %s, and collation is set to %s\n", charset, collation)
case setting.Database.Type.IsMSSQL():
if err := db.ConvertVarcharToNVarchar(); err != nil {
log.Fatal("Failed to convert database from varchar to nvarchar: %v", err)
5 changes: 5 additions & 0 deletions cmd/web.go
@@ -15,6 +15,7 @@ import (

_ "net/http/pprof" // Used for debugging if enabled and a web server is running

+ "code.gitea.io/gitea/models/db"
"code.gitea.io/gitea/modules/container"
"code.gitea.io/gitea/modules/graceful"
"code.gitea.io/gitea/modules/log"
@@ -193,6 +194,10 @@ func serveInstalled(ctx *cli.Context) error {

routers.InitWebInstalled(graceful.GetManager().HammerContext())

+ if err := db.SanityCheck(); err != nil {
+ log.Warn("database sanity check warning: %s", err)
+ }
+
// We check that AppDataPath exists here (it should have been created during installation)
// We can't check it in `InitWebInstalled`, because some integration tests
// use cmd -> InitWebInstalled, but the AppDataPath doesn't exist during those tests.
6 changes: 4 additions & 2 deletions docs/content/help/faq.en-us.md
@@ -385,10 +385,12 @@ Unfortunately MySQL's `utf8` charset does not completely allow all possible UTF-
They created a new charset and collation called `utf8mb4` that allows for emoji to be stored but tables which use
the `utf8` charset, and connections which use the `utf8` charset will not use this.

- Please run `gitea doctor convert`, or run `ALTER DATABASE database_name CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;`
- for the database_name and run `ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;`
+ Please run `gitea doctor convert`, or run `ALTER DATABASE database_name CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;`
+ for the database_name and run `ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;`
for each table in the database.

+ The most appropriate collate function depends on your variant of the database: for MySQL, it is `utf8mb4_0900_as_cs`, for MariaDB, it is `uca1400_as_cs`. Both of them support `utf8mb4_bin`, so that's the common ground. `gitea doctor convert` will choose the best one for you automatically.
+
## Why are Emoji displaying only as placeholders or in monochrome

Gitea requires the system or browser to have one of the supported Emoji fonts installed, which are Apple Color Emoji, Segoe UI Emoji, Segoe UI Symbol, Noto Color Emoji and Twemoji Mozilla. Generally, the operating system should already provide one of these fonts, but especially on Linux, it may be necessary to install them manually.
6 changes: 3 additions & 3 deletions docs/content/installation/database-preparation.en-us.md
@@ -61,13 +61,13 @@ Note: All steps below requires that the database engine of your choice is instal

Replace username and password above as appropriate.

- 4. Create database with UTF-8 charset and collation. Make sure to use `utf8mb4` charset instead of `utf8` as the former supports all Unicode characters (including emojis) beyond _Basic Multilingual Plane_. Also, collation chosen depending on your expected content. When in doubt, use either `unicode_ci` or `general_ci`.
+ 4. Create database with UTF-8 charset and collation. Make sure to use `utf8mb4` charset instead of `utf8` as the former supports all Unicode characters (including emojis) beyond _Basic Multilingual Plane_. Also, collation chosen depending on your expected content (such as `utf8mb4_0900_as_cs` for MySQL, or `uca1400_as_cs` for MariaDB, or `utf8mb4_bin` that works for both). When in doubt, leave it unset, and Gitea will adjust the database to use the most fitting one.

```sql
- CREATE DATABASE giteadb CHARACTER SET 'utf8mb4' COLLATE 'utf8mb4_unicode_ci';
+ CREATE DATABASE giteadb CHARACTER SET 'utf8mb4' COLLATE 'utf8mb4_bin';
```

- Replace database name as appropriate.
+ Replace database name and the collate function as appropriate.

5. Grant all privileges on the database to database user created above.

6 changes: 3 additions & 3 deletions models/db/convert.go
@@ -15,12 +15,12 @@ import (
)

// ConvertUtf8ToUtf8mb4 converts database and tables from utf8 to utf8mb4 if it's mysql and set ROW_FORMAT=dynamic
- func ConvertUtf8ToUtf8mb4() error {
+ func ConvertCharsetAndCollation(charset, collation string) error {
if x.Dialect().URI().DBType != schemas.MYSQL {
return nil
}

- _, err := x.Exec(fmt.Sprintf("ALTER DATABASE `%s` CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci", setting.Database.Name))
+ _, err := x.Exec(fmt.Sprintf("ALTER DATABASE `%s` CHARACTER SET `%s` COLLATE `%s`", setting.Database.Name, charset, collation))
if err != nil {
return err
}
@@ -34,7 +34,7 @@ func ConvertUtf8ToUtf8mb4() error {
return err
}

- if _, err := x.Exec(fmt.Sprintf("ALTER TABLE `%s` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;", table.Name)); err != nil {
+ if _, err := x.Exec(fmt.Sprintf("ALTER TABLE `%s` CONVERT TO CHARACTER SET `%s` COLLATE `%s`", table.Name, charset, collation)); err != nil {
return err
}
}
131 changes: 131 additions & 0 deletions models/db/engine.go
@@ -150,6 +150,114 @@ func InitEngine(ctx context.Context) error {
return nil
}

+ func findCaseSensitiveCollation() (string, error) {
+ if x.Dialect().URI().DBType != schemas.MYSQL {
+ return "", nil
+ }
+
+ v, err := x.DBVersion()
+ if err != nil {
+ return "", nil
+ }
+
+ var collation string
+ switch v.Edition {
+ case "MariaDB":
+ collation = "uca1400_as_cs"
+ default:
+ collation = "utf8mb4_0900_as_cs"
+ }
+
+ return collation, nil
+ }
+
+ func GetDesiredCharsetAndCollation() (string, string, error) {
+ if x.Dialect().URI().DBType != schemas.MYSQL {
+ return "", "", nil
+ }
+
+ var charset string
+ var collation string
+ var err error
+ if setting.Database.DefaultCharset == "" {
+ charset = "utf8mb4"
+ } else {
+ charset = setting.Database.DefaultCharset
+ }
+ if setting.Database.DefaultCollation == "" {
+ collation, err = findCaseSensitiveCollation()
+ if err != nil {
+ return "", "", err
+ }
+ } else {
+ collation = setting.Database.DefaultCollation
+ }
+ return charset, collation, nil
+ }
+
+ func SanityCheck() error {
+ // We do not have any sanity checks for engines other than MySQL
+ if !setting.Database.Type.IsMySQL() {
+ return nil
+ }
+
+ expectedCharset, expectedCollation, err := GetDesiredCharsetAndCollation()
+ if err != nil {
+ return err
+ }
+
+ // check that the database collation is set to a case sensitive one.
+ var collation []string
+ _, err = x.SQL("SELECT default_collation_name FROM information_schema.schemata WHERE schema_name = ?",
+ setting.Database.Name).Get(&collation)
+ if err != nil {
+ return err
+ }
+ // For mariadb, when we set the collation to uca1400_as_cs, that is
+ // translated to utf8mb4_uca1400_as_cs, hence the suffix check.
+ if !strings.HasSuffix(collation[0], expectedCollation) {
+ return fmt.Errorf(`database collation ("%s") is not %s. Consider running "gitea doctor convert"`, collation[0], expectedCollation)
+ }
+
+ // check the database character set
+ var charset []string
+ _, err = x.SQL("SELECT default_character_set_name FROM information_schema.schemata WHERE schema_name = ?", setting.Database.Name).Get(&charset)
+ if err != nil {
+ return err
+ }
+ if charset[0] != expectedCharset {
+ return fmt.Errorf(`database charset ("%s") is not %s. Consider running "gitea doctor convert"`, charset[0], expectedCharset)
+ }
+
+ // check table collations and character sets
+ tables, err := x.DBMetas()
+ if err != nil {
+ return err
+ }
+ for _, table := range tables {
+ _, err := x.SQL("SELECT CCSA.character_set_name FROM information_schema.tables T, information_schema.collation_character_set_applicability CCSA WHERE CCSA.collation_name = T.table_collation AND T.table_schema = ? AND T.table_name = ?",
+ setting.Database.Name, table.Name).Get(&charset)
+ if err != nil {
+ return err
+ }
+ if charset[0] != expectedCharset {
+ return fmt.Errorf(`table charset for '%s' (%s) is not %s. Consider running "gitea doctor convert"`, table.Name, charset[0], expectedCharset)
+ }
+
+ _, err = x.SQL("SELECT CCSA.collation_name FROM information_schema.tables T, information_schema.collation_character_set_applicability CCSA WHERE CCSA.collation_name = T.table_collation AND T.table_schema = ? AND T.table_name = ?",
+ setting.Database.Name, table.Name).Get(&collation)
+ if err != nil {
+ return err
+ }
+ if !strings.HasSuffix(collation[0], expectedCollation) {
+ return fmt.Errorf(`table collation for '%s' (%s) is not %s. Consider running "gitea doctor convert"`, table.Name, collation[0], expectedCollation)
+ }
+ }
+
+ // if all is well, return without an error
+ return nil
+ }
+
// SetDefaultEngine sets the default engine for db
func SetDefaultEngine(ctx context.Context, eng *xorm.Engine) {
x = eng
@@ -185,6 +293,29 @@ func InitEngineWithMigration(ctx context.Context, migrateFunc func(*xorm.Engine)
return err
}

+ // If we're using MySQL, and there are no tables, set the database charset
+ // and collation to the desired ones. This will help cases where the
+ // database is created automatically, and with the wrong settings (such as
+ // when using the official mysql/mariadb container images).
+ if x.Dialect().URI().DBType == schemas.MYSQL {
+ tables, err := x.DBMetas()
+ if err != nil {
+ return err
+ }
+
+ if len(tables) == 0 {
+ charset, collation, err := GetDesiredCharsetAndCollation()
+ if err != nil {
+ return err
+ }
+
+ _, err = x.Exec(fmt.Sprintf("ALTER DATABASE `%s` DEFAULT CHARACTER SET `%s` COLLATE `%s`", setting.Database.Name, charset, collation))
+ if err != nil {
+ return err
+ }
+ }
+ }
+
// We have to run migrateFunc here in case the user is re-running installation on a previously created DB.
// If we do not then table schemas will be changed and there will be conflicts when the migrations run properly.
//
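
Review note on the collation checks in `SanityCheck`: the comparison relies on `strings.HasSuffix`, which returns true for any string when the expected suffix is empty, and the code indexes `collation[0]` without checking that the query returned a row. The sketch below isolates that logic so it can be reasoned about without a database. It is an illustration only; `desiredCollation` and `collationMatches` are hypothetical names, and the guards for empty input are an assumption about the behaviour one might want, not what this PR implements.

```go
package main

import (
	"fmt"
	"strings"
)

// desiredCollation mirrors the edition switch shown in the diff:
// MariaDB gets uca1400_as_cs, anything else gets utf8mb4_0900_as_cs.
func desiredCollation(edition string) string {
	if edition == "MariaDB" {
		return "uca1400_as_cs"
	}
	return "utf8mb4_0900_as_cs"
}

// collationMatches applies the same suffix comparison as the diff, but
// (as an assumption of this sketch) treats an empty result set or an empty
// expected value as a mismatch instead of panicking or passing silently.
func collationMatches(rows []string, expected string) bool {
	if len(rows) == 0 || expected == "" {
		return false
	}
	return strings.HasSuffix(rows[0], expected)
}

func main() {
	// MariaDB reports uca1400_as_cs with a charset prefix, hence the suffix check.
	fmt.Println(collationMatches([]string{"utf8mb4_uca1400_as_cs"}, desiredCollation("MariaDB"))) // true

	// A case-insensitive collation does not match the desired one.
	fmt.Println(collationMatches([]string{"utf8mb4_general_ci"}, desiredCollation("MySQL"))) // false

	// No rows returned: reported as a mismatch rather than an index panic.
	fmt.Println(collationMatches(nil, "utf8mb4_bin")) // false
}
```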
2 changes: 1 addition & 1 deletion models/git/branch.go
@@ -103,7 +103,7 @@ func (err ErrBranchesEqual) Unwrap() error {
type Branch struct {
ID int64
RepoID int64 `xorm:"UNIQUE(s)"`
- Name string `xorm:"UNIQUE(s) NOT NULL"` // git's ref-name is case-sensitive internally, however, in some databases (mssql, mysql, by default), it's case-insensitive at the moment
+ Name string `xorm:"UNIQUE(s) NOT NULL"`
CommitID string
CommitMessage string `xorm:"TEXT"` // it only stores the message summary (the first line)
PusherID int64
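
Review note: the comment removed from `Name` described why collation matters here. Git ref names are case-sensitive, so `main` and `Main` are distinct branches, but a unique index over `(RepoID, Name)` under a case-insensitive collation treats them as the same key. The sketch below simulates that with an in-memory map. It is an illustration only: `branchKey` and `insertBranch` are hypothetical, and lowercasing with `strings.ToLower` is a rough stand-in for a `_ci` collation, not its actual comparison rules.

```go
package main

import (
	"fmt"
	"strings"
)

// branchKey stands in for the UNIQUE(s) index over RepoID and Name.
type branchKey struct {
	repoID int64
	name   string
}

// insertBranch simulates inserting a row under a unique index. When
// caseSensitive is false the name is lowercased before the lookup, which
// approximates (but does not reproduce) a case-insensitive collation.
func insertBranch(index map[branchKey]bool, repoID int64, name string, caseSensitive bool) error {
	key := branchKey{repoID: repoID, name: name}
	if !caseSensitive {
		key.name = strings.ToLower(name)
	}
	if index[key] {
		return fmt.Errorf("duplicate entry %q for repo %d", name, repoID)
	}
	index[key] = true
	return nil
}

func main() {
	ci := map[branchKey]bool{}
	fmt.Println(insertBranch(ci, 1, "main", false)) // <nil>
	fmt.Println(insertBranch(ci, 1, "Main", false)) // duplicate entry "Main" for repo 1

	cs := map[branchKey]bool{}
	fmt.Println(insertBranch(cs, 1, "main", true)) // <nil>
	fmt.Println(insertBranch(cs, 1, "Main", true)) // <nil>
}
```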
11 changes: 7 additions & 4 deletions modules/setting/database.go
@@ -34,7 +34,6 @@ var (
SSLMode string
Path string
LogSQL bool
- MysqlCharset string
Timeout int // seconds
SQLiteJournalMode string
DBConnectRetries int
@@ -44,6 +43,8 @@ var (
ConnMaxLifetime time.Duration
IterateBufferSize int
AutoMigration bool
+ DefaultCharset string
+ DefaultCollation string
}{
Timeout: 500,
IterateBufferSize: 50,
@@ -67,7 +68,6 @@ func loadDBSetting(rootCfg ConfigProvider) {
}
Database.Schema = sec.Key("SCHEMA").String()
Database.SSLMode = sec.Key("SSL_MODE").MustString("disable")
- Database.MysqlCharset = sec.Key("MYSQL_CHARSET").MustString("utf8mb4") // do not document it, end users won't need it.

Database.Path = sec.Key("PATH").MustString(filepath.Join(AppDataPath, "gitea.db"))
Database.Timeout = sec.Key("SQLITE_TIMEOUT").MustInt(500)
@@ -86,6 +86,9 @@ func loadDBSetting(rootCfg ConfigProvider) {
Database.DBConnectRetries = sec.Key("DB_RETRIES").MustInt(10)
Database.DBConnectBackoff = sec.Key("DB_RETRY_BACKOFF").MustDuration(3 * time.Second)
Database.AutoMigration = sec.Key("AUTO_MIGRATION").MustBool(true)
+
+ Database.DefaultCharset = sec.Key("DEFAULT_CHARSET").String()
+ Database.DefaultCollation = sec.Key("DEFAULT_COLLATION").String()
}

// DBConnStr returns database connection string
@@ -105,8 +108,8 @@ func DBConnStr() (string, error) {
if tls == "disable" { // allow (Postgres-inspired) default value to work in MySQL
tls = "false"
}
- connStr = fmt.Sprintf("%s:%s@%s(%s)/%s%scharset=%s&parseTime=true&tls=%s",
- Database.User, Database.Passwd, connType, Database.Host, Database.Name, paramSep, Database.MysqlCharset, tls)
+ connStr = fmt.Sprintf("%s:%s@%s(%s)/%s%sparseTime=true&tls=%s",
+ Database.User, Database.Passwd, connType, Database.Host, Database.Name, paramSep, tls)
case "postgres":
connStr = getPostgreSQLConnectionString(Database.Host, Database.User, Database.Passwd, Database.Name, Database.SSLMode)
case "mssql":