
Querier: Queries sometimes hang forever after the Ingester has been running for several hours #685

@mizeng

Description

Describe the bug
After the Loki Ingester has been running for some time, log queries through the Querier hang forever. However, label queries still work at the same time.

To Reproduce
Steps to reproduce the behavior:

  1. Start Loki with each component running separately (one pod per component), keep ingesting logs, and let the Ingester run for several hours.
  2. Query: ./logcli --addr="http://querier:3100" --username="admin" --password="admin" labels works.
  3. Query: ./logcli --addr="http://querier:3100" --username="admin" --password="admin" query '{threadName="LokiSink0"}' --since=10m hangs forever.
  4. Recreate the Ingester pod, then the query works again.
➜  loki git:(master) ✗ ./logcli --addr="http://querier:3100" --username="admin" --password="admin" query '{threadName="LokiSink0"}' --since=10m
2019/06/20 09:24:41 http://querier:3100/api/prom/query?query=%7BthreadName%3D%22LokiSink0%22%7D&limit=30&start=1560993281045092000&end=1560993881045092000&direction=BACKWARD&regexp=
2019/06/20 09:24:41 Common labels: {hostname="xxx", namespace="performanceTest", threadName="LokiSink0"}
2019-06-20T01:24:41Z {} Thu Jun 20 01:24:41 UTC 2019 This is performance test, thread=1, schedule=1000, logNum=10000
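
For reference, the same request can be issued directly against the Querier HTTP API; this is just the URL logcli prints above. A minimal sketch (credentials and parameters mirror the logcli flags; start/end are omitted here since logcli computes them from --since, and the --max-time value is only illustrative):

# same /api/prom/query request that logcli issues, with a client-side timeout
curl -u admin:admin -G --max-time 60 "http://querier:3100/api/prom/query" \
  --data-urlencode 'query={threadName="LokiSink0"}' \
  --data-urlencode 'limit=30' \
  --data-urlencode 'direction=BACKWARD'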

Expected behavior
Querier should not hang.

Environment:

  • Infrastructure: Kubernetes

Screenshots, promtail config, or terminal output
Env: 1 Distributor, 1 Ingester, 1 Table Manager, 1 Querier
Store:
Index Store: BoltDB on CephFS
Chunk Store: FileSystem on CephFS

The Ingester has a 20GB memory limit and is currently only using about 8GB.
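
For context, this store setup corresponds roughly to the following section of the Loki config. This is only a sketch: the directory paths, schema version, and index period are assumptions for illustration (with the directories being CephFS mounts), not values taken from this report.

schema_config:
  configs:
  - from: 2019-01-01            # assumed
    store: boltdb               # index store
    object_store: filesystem    # chunk store
    schema: v9                  # assumed
    index:
      prefix: index_
      period: 168h              # assumed

storage_config:
  boltdb:
    directory: /loki/index      # BoltDB index files on a CephFS mount (assumed path)
  filesystem:
    directory: /loki/chunks     # chunk files on a CephFS mount (assumed path)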

Logs from the Querier for the hanging query above:

level=debug ts=2019-06-20T01:09:14.433633657Z caller=http.go:114 request="&QueryRequest{Query:{threadName=\"LokiSink0\"},Limit:30,Start:2019-06-20 00:59:14.146264 +0000 UTC,End:2019-06-20 01:09:14.146264 +0000 UTC,Direction:BACKWARD,Regex:,}"
ts=2019-06-20T01:09:14.434250471Z caller=spanlogger.go:37 org_id=fake trace_id=73ceef22e1a03e75 method=SeriesStore.GetChunkRefs level=debug metric=logs
ts=2019-06-20T01:09:14.434300888Z caller=spanlogger.go:37 org_id=fake trace_id=73ceef22e1a03e75 method=SeriesStore.lookupSeriesByMetricNameMatchers level=debug metricName=logs matchers=1
ts=2019-06-20T01:09:14.434354001Z caller=spanlogger.go:37 org_id=fake trace_id=73ceef22e1a03e75 method=SeriesStore.lookupSeriesByMetricNameMatcher level=debug metricName=logs matcher="threadName=\"LokiSink0\""
ts=2019-06-20T01:09:14.434400838Z caller=spanlogger.go:37 org_id=fake trace_id=73ceef22e1a03e75 method=SeriesStore.lookupSeriesByMetricNameMatcher level=debug queries=1
ts=2019-06-20T01:09:14.434461812Z caller=spanlogger.go:37 org_id=fake trace_id=73ceef22e1a03e75 method=cachingIndexClient.cacheFetch level=debug hits=0 misses=1

Logs from the Querier for the same query when it does not hang:

level=debug ts=2019-06-20T01:40:13.733006595Z caller=http.go:114 request="&QueryRequest{Query:{threadName=\"LokiSink0\"},Limit:30,Start:2019-06-20 01:35:13.447327 +0000 UTC,End:2019-06-20 01:40:13.447327 +0000 UTC,Direction:BACKWARD,Regex:,}"
ts=2019-06-20T01:40:13.764794175Z caller=spanlogger.go:37 org_id=fake trace_id=649671e1a0b78f48 method=SeriesStore.GetChunkRefs level=debug metric=logs
ts=2019-06-20T01:40:13.764880813Z caller=spanlogger.go:37 org_id=fake trace_id=649671e1a0b78f48 method=SeriesStore.lookupSeriesByMetricNameMatchers level=debug metricName=logs matchers=1
ts=2019-06-20T01:40:13.765321343Z caller=spanlogger.go:37 org_id=fake trace_id=649671e1a0b78f48 method=SeriesStore.lookupSeriesByMetricNameMatcher level=debug metricName=logs matcher="threadName=\"LokiSink0\""
ts=2019-06-20T01:40:13.765426524Z caller=spanlogger.go:37 org_id=fake trace_id=649671e1a0b78f48 method=SeriesStore.lookupSeriesByMetricNameMatcher level=debug queries=1
ts=2019-06-20T01:40:13.765464052Z caller=spanlogger.go:37 org_id=fake trace_id=649671e1a0b78f48 method=cachingIndexClient.cacheFetch level=debug hits=0 misses=1
ts=2019-06-20T01:40:13.765596379Z caller=spanlogger.go:37 org_id=fake trace_id=649671e1a0b78f48 method=SeriesStore.lookupSeriesByMetricNameMatcher level=debug entries=3
ts=2019-06-20T01:40:13.765628761Z caller=spanlogger.go:37 org_id=fake trace_id=649671e1a0b78f48 method=SeriesStore.lookupSeriesByMetricNameMatcher level=debug ids=3
ts=2019-06-20T01:40:13.765646999Z caller=spanlogger.go:37 org_id=fake trace_id=649671e1a0b78f48 method=SeriesStore.lookupSeriesByMetricNameMatchers level=debug msg="post intersection" ids=3
ts=2019-06-20T01:40:13.765660083Z caller=spanlogger.go:37 org_id=fake trace_id=649671e1a0b78f48 method=SeriesStore.GetChunkRefs level=debug series-ids=3
ts=2019-06-20T01:40:13.765678606Z caller=spanlogger.go:37 org_id=fake trace_id=649671e1a0b78f48 method=SeriesStore.lookupChunksBySeries level=debug seriesIDs=3
ts=2019-06-20T01:40:13.765696522Z caller=spanlogger.go:37 org_id=fake trace_id=649671e1a0b78f48 method=SeriesStore.lookupChunksBySeries level=debug queries=3
ts=2019-06-20T01:40:13.765754328Z caller=spanlogger.go:37 org_id=fake trace_id=649671e1a0b78f48 method=cachingIndexClient.cacheFetch level=debug hits=0 misses=3
ts=2019-06-20T01:40:13.789312745Z caller=spanlogger.go:37 org_id=fake trace_id=649671e1a0b78f48 method=SeriesStore.lookupChunksBySeries level=debug entries=0
ts=2019-06-20T01:40:13.789402637Z caller=spanlogger.go:37 org_id=fake trace_id=649671e1a0b78f48 method=SeriesStore.GetChunkRefs level=debug chunk-ids=0
ts=2019-06-20T01:40:13.789417386Z caller=spanlogger.go:37 org_id=fake trace_id=649671e1a0b78f48 method=SeriesStore.GetChunkRefs level=debug chunks-post-filtering=0
level=debug ts=2019-06-20T01:40:29.336996744Z caller=logging.go:44 traceID=649671e1a0b78f48 msg="GET /api/prom/query?query=%7BthreadName%3D%22LokiSink0%22%7D&limit=30&start=1560994513447327000&end=1560994813447327000&direction=BACKWARD&regexp= (200) 15.604040415s"
