Language column in language_stat table is too small (needs to be at least 34) #12379

Closed
2 of 7 tasks
somera opened this issue Jul 30, 2020 · 6 comments · Fixed by #12396

@somera

somera commented Jul 30, 2020

  • Gitea version (or commit ref): 1.12.3
  • Git version: 2.25.1
  • Operating system: Linux nuc-mini-server 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Database (use [x]):
    • PostgreSQL
    • MySQL
    • MSSQL
    • SQLite
  • Can you reproduce the bug at https://try.gitea.io:
    • Yes (provide example URL)
    • No
    • Not relevant
  • Log gist:
2020/07/31 11:33:25 ...m.io/xorm/core/tx.go:157:QueryContext() [I] [SQL] INSERT INTO "language_stat" ("repo_id","commit_id","is_primary","language","size","created_unix") VALUES ($1, $2, $3, $4, $5, $6) RETURNING "id" [786 dd0f55d6cee3bcf7d483522684933dc73f6b1831 true Glyph Bitmap Distribution Format 141557 1596188005] - 259.548µs
2020/07/31 11:33:25 ...o/xorm/session_tx.go:46:Rollback() [I] [SQL] ROLL BACK [] - 133.808µs
2020/07/30 19:20:46 ...dexer/stats/queue.go:24:handle() [E] stats queue idexer.Index(786) failed: pq: Wert zu lang für Typ character varying(30)
2020/07/31 11:33:25 ...dexer/stats/queue.go:24:handle() [E] stats queue idexer.Index(786) failed: pq: Wert zu lang für Typ character varying(30)

Enry v2 can detect languages with names longer than the currently provided maximum of 30 characters.

A quick look at Enry's source code shows that the longest language name it can currently detect is 34 characters (see the code below).

We therefore need to provide a migration that widens this column, or consider truncating the language name that Enry reports.


package main

import (
	"fmt"

	"github.com/go-enry/go-enry/v2/data"
)

func main() {
	maxLangLen := 0
	maxLang := ""
	for lang := range data.ExtensionsByLanguage {
		if len(lang) > maxLangLen {
			maxLang = lang
			maxLangLen = len(lang)
		}
	}
	for _, vals := range data.LanguagesByExtension {
		for _, lang := range vals {
			if len(lang) > maxLangLen {
				maxLang = lang
				maxLangLen = len(lang)
			}
		}
	}
	for lang := range data.LanguagesLogProbabilities {
		if len(lang) > maxLangLen {
			maxLang = lang
			maxLangLen = len(lang)
		}
	}
	fmt.Println("Max", maxLangLen, maxLang)
}
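
The alternative mentioned above, shortening the detected language name so that it fits the existing column, could look roughly like the sketch below. This is only an illustration and not code from Gitea: the helper truncateLanguage and the constant maxLanguageLen are hypothetical names. It counts runes rather than bytes, on the assumption that the column limit is expressed in characters, as it is for PostgreSQL's character varying(n).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// maxLanguageLen is a hypothetical limit that mirrors the VARCHAR(30) column.
const maxLanguageLen = 30

// truncateLanguage is a hypothetical helper. It returns name unchanged when it
// has at most limit characters, and otherwise only its first limit characters.
// Length is measured in runes, not bytes.
func truncateLanguage(name string, limit int) string {
	if limit <= 0 {
		return ""
	}
	if utf8.RuneCountInString(name) <= limit {
		return name
	}
	return string([]rune(name)[:limit])
}

func main() {
	name := "Glyph Bitmap Distribution Format"
	// Number of characters in the name taken from the log above.
	fmt.Println(utf8.RuneCountInString(name))
	// The same name cut down to fit the 30-character column.
	fmt.Printf("%q\n", truncateLanguage(name, maxLanguageLen))
}
```

Truncation has a cost: two long names that share their first 30 characters would collide, and a shortened name no longer matches the name Enry uses, so widening the column is probably the safer option.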
@somera changed the title from "DB Columnt to short error" to "DB Column to short error" on Jul 30, 2020
@zeripath
Contributor

@somera are there really no other lines there?

That pq log looks very difficult to read.

I'd also turn off stacktrace logging; it's just not useful in general.

@somera
Author

somera commented Jul 31, 2020

> @somera are there really no other lines there?
>
> That pq log looks very difficult to read.
>
> I'd also turn off stacktrace logging; it's just not useful in general.

Is there a connection to #12380?

I see this after the Gitea (re)start process:

==> xorm.log <==
2020/07/31 11:33:25 ...m.io/xorm/core/db.go:154:QueryContext() [I] [SQL] SELECT "id", "owner_id", "owner_name", "lower_name", "name", "description", "website", "original_service_type", "original_url", "default_branch", "num_watches", "num_stars", "num_forks", "num_issues", "num_closed_issues", "num_pulls", "num_closed_pulls", "num_milestones", "num_closed_milestones", "is_private", "is_empty", "is_archived", "is_mirror", "status", "is_fork", "fork_id", "is_template", "template_id", "size", "is_fsck_enabled", "close_issues_via_commit_in_any_branch", "topics", "avatar", "created_unix", "updated_unix" FROM "repository" WHERE "id"=$1 LIMIT 1 [786] - 325.673µs
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/core/db.go:154 (0xb96044)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_raw.go:56 (0xc13923)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_get.go:105 (0xc05617)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_get.go:92 (0xc049a5)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_get.go:25 (0xc0452e)
        /go/src/code.gitea.io/gitea/models/repo.go:1737 (0x118a91c)
        /go/src/code.gitea.io/gitea/models/repo.go:1748 (0x16b60d2)
        /go/src/code.gitea.io/gitea/modules/indexer/stats/db.go:18 (0x16b60a9)
        /go/src/code.gitea.io/gitea/modules/indexer/stats/queue.go:23 (0x16b6b5a)
        /go/src/code.gitea.io/gitea/modules/queue/unique_queue_channel.go:59 (0x16a523c)
        /go/src/code.gitea.io/gitea/modules/queue/workerpool.go:383 (0x16a3cc1)
        /go/src/code.gitea.io/gitea/modules/queue/workerpool.go:238 (0x16a6042)
        /usr/local/go/src/runtime/asm_amd64.s:1373 (0x46f030)


2020/07/31 11:33:25 ...m.io/xorm/core/db.go:154:QueryContext() [I] [SQL] SELECT "id", "repo_id", "commit_sha", "indexer_type" FROM "repo_indexer_status" WHERE ("indexer_type" = $1) AND "repo_id"=$2 LIMIT 1 [1 786] - 217.979µs
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/core/db.go:154 (0xb96044)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_raw.go:56 (0xc13923)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_get.go:105 (0xc05617)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_get.go:92 (0xc049a5)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_get.go:25 (0xc0452e)
        /go/src/code.gitea.io/gitea/models/repo_indexer.go:72 (0x1199200)
        /go/src/code.gitea.io/gitea/models/repo_indexer.go:89 (0x16b6127)
        /go/src/code.gitea.io/gitea/modules/indexer/stats/db.go:26 (0x16b60fd)
        /go/src/code.gitea.io/gitea/modules/indexer/stats/queue.go:23 (0x16b6b5a)
        /go/src/code.gitea.io/gitea/modules/queue/unique_queue_channel.go:59 (0x16a523c)
        /go/src/code.gitea.io/gitea/modules/queue/workerpool.go:383 (0x16a3cc1)
        /go/src/code.gitea.io/gitea/modules/queue/workerpool.go:238 (0x16a6042)
        /usr/local/go/src/runtime/asm_amd64.s:1373 (0x46f030)


2020/07/31 11:33:25 ...m.io/xorm/core/tx.go:36:BeginTx() [I] [SQL] BEGIN TRANSACTION [] - 183.399µs
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/core/tx.go:36 (0xb9cd86)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_tx.go:16 (0xc1a2d6)
        /go/src/code.gitea.io/gitea/models/repo_language_stats.go:111 (0x119aabb)
        /go/src/code.gitea.io/gitea/modules/indexer/stats/db.go:53 (0x16b62d0)
        /go/src/code.gitea.io/gitea/modules/indexer/stats/queue.go:23 (0x16b6b5a)
        /go/src/code.gitea.io/gitea/modules/queue/unique_queue_channel.go:59 (0x16a523c)
        /go/src/code.gitea.io/gitea/modules/queue/workerpool.go:383 (0x16a3cc1)
        /go/src/code.gitea.io/gitea/modules/queue/workerpool.go:238 (0x16a6042)
        /usr/local/go/src/runtime/asm_amd64.s:1373 (0x46f030)


2020/07/31 11:33:25 ...m.io/xorm/core/tx.go:157:QueryContext() [I] [SQL] SELECT "id", "repo_id", "commit_id", "is_primary", "language", "size", "created_unix" FROM "language_stat" WHERE ("repo_id" = $1) ORDER BY "size" DESC [786] - 398.125µs
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/core/tx.go:157 (0xb9dfe3)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_raw.go:63 (0xc13abf)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_find.go:152 (0xc005be)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_find.go:148 (0xbffd1e)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_find.go:31 (0xbff413)
        /go/src/code.gitea.io/gitea/models/repo_language_stats.go:65 (0x119a367)
        /go/src/code.gitea.io/gitea/models/repo_language_stats.go:116 (0x119ab14)
        /go/src/code.gitea.io/gitea/modules/indexer/stats/db.go:53 (0x16b62d0)
        /go/src/code.gitea.io/gitea/modules/indexer/stats/queue.go:23 (0x16b6b5a)
        /go/src/code.gitea.io/gitea/modules/queue/unique_queue_channel.go:59 (0x16a523c)
        /go/src/code.gitea.io/gitea/modules/queue/workerpool.go:383 (0x16a3cc1)
        /go/src/code.gitea.io/gitea/modules/queue/workerpool.go:238 (0x16a6042)
        /usr/local/go/src/runtime/asm_amd64.s:1373 (0x46f030)


2020/07/31 11:33:25 ...m.io/xorm/core/tx.go:157:QueryContext() [I] [SQL] INSERT INTO "language_stat" ("repo_id","commit_id","is_primary","language","size","created_unix") VALUES ($1, $2, $3, $4, $5, $6) RETURNING "id" [786 dd0f55d6cee3bcf7d483522684933dc73f6b1831 false Python 8871 1596188005] - 392.057µs
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/core/tx.go:157 (0xb9dfe3)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_raw.go:63 (0xc13abf)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_raw.go:127 (0xc14408)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_insert.go:383 (0xc0b9aa)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_insert.go:84 (0xc0891e)
        /go/src/code.gitea.io/gitea/models/repo_language_stats.go:147 (0x119aff2)
        /go/src/code.gitea.io/gitea/modules/indexer/stats/db.go:53 (0x16b62d0)
        /go/src/code.gitea.io/gitea/modules/indexer/stats/queue.go:23 (0x16b6b5a)
        /go/src/code.gitea.io/gitea/modules/queue/unique_queue_channel.go:59 (0x16a523c)
        /go/src/code.gitea.io/gitea/modules/queue/workerpool.go:383 (0x16a3cc1)
        /go/src/code.gitea.io/gitea/modules/queue/workerpool.go:238 (0x16a6042)
        /usr/local/go/src/runtime/asm_amd64.s:1373 (0x46f030)


2020/07/31 11:33:25 ...m.io/xorm/core/tx.go:157:QueryContext() [I] [SQL] INSERT INTO "language_stat" ("repo_id","commit_id","is_primary","language","size","created_unix") VALUES ($1, $2, $3, $4, $5, $6) RETURNING "id" [786 dd0f55d6cee3bcf7d483522684933dc73f6b1831 true Glyph Bitmap Distribution Format 141557 1596188005] - 259.548µs
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/core/tx.go:157 (0xb9dfe3)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_raw.go:63 (0xc13abf)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_raw.go:127 (0xc14408)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_insert.go:383 (0xc0b9aa)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_insert.go:84 (0xc0891e)
        /go/src/code.gitea.io/gitea/models/repo_language_stats.go:147 (0x119aff2)
        /go/src/code.gitea.io/gitea/modules/indexer/stats/db.go:53 (0x16b62d0)
        /go/src/code.gitea.io/gitea/modules/indexer/stats/queue.go:23 (0x16b6b5a)
        /go/src/code.gitea.io/gitea/modules/queue/unique_queue_channel.go:59 (0x16a523c)
        /go/src/code.gitea.io/gitea/modules/queue/workerpool.go:383 (0x16a3cc1)
        /go/src/code.gitea.io/gitea/modules/queue/workerpool.go:238 (0x16a6042)
        /usr/local/go/src/runtime/asm_amd64.s:1373 (0x46f030)


2020/07/31 11:33:25 ...o/xorm/session_tx.go:46:Rollback() [I] [SQL] ROLL BACK [] - 133.808µs
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_tx.go:46 (0xc1a670)
        /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session.go:135 (0xbeb29f)
        /go/src/code.gitea.io/gitea/models/repo_language_stats.go:154 (0x119b02b)
        /go/src/code.gitea.io/gitea/modules/indexer/stats/db.go:53 (0x16b62d0)
        /go/src/code.gitea.io/gitea/modules/indexer/stats/queue.go:23 (0x16b6b5a)
        /go/src/code.gitea.io/gitea/modules/queue/unique_queue_channel.go:59 (0x16a523c)
        /go/src/code.gitea.io/gitea/modules/queue/workerpool.go:383 (0x16a3cc1)
        /go/src/code.gitea.io/gitea/modules/queue/workerpool.go:238 (0x16a6042)
        /usr/local/go/src/runtime/asm_amd64.s:1373 (0x46f030)



==> gitea.log <==
2020/07/31 11:33:25 ...dexer/stats/queue.go:24:handle() [E] stats queue idexer.Index(786) failed: pq: Wert zu lang für Typ character varying(30)
        /go/src/code.gitea.io/gitea/modules/indexer/stats/queue.go:24 (0x16b6b97)
        /go/src/code.gitea.io/gitea/modules/queue/unique_queue_channel.go:59 (0x16a523c)
        /go/src/code.gitea.io/gitea/modules/queue/workerpool.go:383 (0x16a3cc1)
        /go/src/code.gitea.io/gitea/modules/queue/workerpool.go:238 (0x16a6042)
        /usr/local/go/src/runtime/asm_amd64.s:1373 (0x46f030)

@zeripath
Contributor

You are likely seeing the logging in #12380 because this is failing. #12380 is not really a problem, and I've proposed a PR to move that logging down to Debug level.

The issue here is that the language of at least one of your files has been detected as Glyph Bitmap Distribution Format.

That name is 32 characters long, which is more than the column's limit of 30, so Postgres reports: Wert zu lang für Typ character varying(30) ("value too long for type character varying(30)").

I don't know much about the language stats detector, but have you explicitly added types etc.?

@somera
Author

somera commented Jul 31, 2020

> You are likely seeing the logging in #12380 because this is failing. #12380 is not really a problem, and I've proposed a PR to move that logging down to Debug level.
>
> The issue here is that the language of at least one of your files has been detected as Glyph Bitmap Distribution Format.
>
> That name is 32 characters long, which is more than the column's limit of 30, so Postgres reports: Wert zu lang für Typ character varying(30) ("value too long for type character varying(30)").
>
> I don't know much about the language stats detector, but have you explicitly added types etc.?

Which types? Where? PostgreSQL?

My PostgreSQL 11 is a default installation without any special extensions.

@zeripath
Contributor

OK, it looks like enry, the package that detects language types from files, has a maximum detected language length of 34. That limit needs to be changed in the code, and we need to think of a migration for this.

As a fix for you right now:

In Postgres do: ALTER TABLE language_stat ALTER COLUMN language TYPE VARCHAR(34)

or even set it to 50 with ALTER TABLE language_stat ALTER COLUMN language TYPE VARCHAR(50), in case enry decides that the languages need to be longer.
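
For anyone who would rather script the workaround above than type it into psql, a minimal sketch follows. It is not the migration that Gitea ships: widenLanguageColumnSQL, widenLanguageColumn and minLanguageWidth are hypothetical names, the statement is the PostgreSQL syntax quoted above and may differ on other databases, and actually running it needs an open *sql.DB backed by a PostgreSQL driver, which the sketch leaves out.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// minLanguageWidth is the longest language name reported for enry v2 in this thread.
const minLanguageWidth = 34

// widenLanguageColumnSQL is a hypothetical helper that builds the PostgreSQL
// statement from the comment above. It rejects widths that would still be too
// small for enry v2.
func widenLanguageColumnSQL(width int) (string, error) {
	if width < minLanguageWidth {
		return "", fmt.Errorf("width %d is below the %d characters enry v2 can produce", width, minLanguageWidth)
	}
	return fmt.Sprintf("ALTER TABLE language_stat ALTER COLUMN language TYPE VARCHAR(%d)", width), nil
}

// widenLanguageColumn runs the statement on an already opened PostgreSQL
// connection. Opening the connection and registering a driver are left out.
func widenLanguageColumn(ctx context.Context, db *sql.DB, width int) error {
	stmt, err := widenLanguageColumnSQL(width)
	if err != nil {
		return err
	}
	_, err = db.ExecContext(ctx, stmt)
	return err
}

func main() {
	// Only print the statement here; no database is touched.
	stmt, err := widenLanguageColumnSQL(50)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(stmt)
}
```

Stop Gitea or take a backup before changing the schema by hand, since the stats indexer may be writing to the table at the same time.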

@zeripath changed the title from "DB Column to short error" to "Language column in language_stat table is too small (needs to be at least 34)" on Jul 31, 2020
@somera
Author

somera commented Jul 31, 2020

thx. now it works. I changed it to 50.

zeripath added a commit to zeripath/gitea that referenced this issue Aug 2, 2020
In go-gitea#12379 it was discovered that enry v2 has a maximum language length
of 34 characters which is larger than the 30 previously provided.

This PR updates the language column to 50.

Fix go-gitea#12379

Signed-off-by: Andrew Thornton <[email protected]>
zeripath added a commit that referenced this issue Aug 4, 2020
In #12379 it was discovered that enry v2 has a maximum language length
of 34 characters which is larger than the 30 previously provided.

This PR updates the language column to 50.

Fix #12379
@go-gitea locked and limited conversation to collaborators on Nov 24, 2020