Closed
Description
What version of Go are you using (go version
)?
$ go version go version go1.15beta1 linux/amd64
Does this issue reproduce with the latest release?
Yes, 1.15beta1 is the latest release.
What operating system and processor architecture are you using (go env
)?
Linux amd64
What did you do?
Upgrade one production service instance from Go 1.14.4 to Go 1.15beta1. It's a quite complicated proxy-like service, currently, I can't reproduce by a simple test case.
What did you expect to see?
Run well as before.
What did you see instead?
The latency of this service instance sharply increases. I dump a trace of this process and find out after sweep termination, mark assist use nearly all cpus in about 20ms:
MMU:
part of GC trace:
gc 1958 @1686.290s 2%: 0.73+32+0.12 ms clock, 17+422/185/0.46+2.9 ms cpu, 1538->1541->788 MB, 1577 MB goal, 24 P
gc 1959 @1687.261s 2%: 0.34+31+0.13 ms clock, 8.3+397/176/0+3.2 ms cpu, 1537->1541->795 MB, 1576 MB goal, 24 P
gc 1960 @1688.215s 2%: 0.60+31+0.12 ms clock, 14+408/176/0+2.8 ms cpu, 1551->1554->796 MB, 1590 MB goal, 24 P
gc 1961 @1689.211s 2%: 0.44+31+0.10 ms clock, 10+409/177/0.036+2.6 ms cpu, 1552->1556->806 MB, 1592 MB goal, 24 P
gc 1962 @1690.182s 2%: 0.78+32+0.16 ms clock, 18+440/185/0.006+3.8 ms cpu, 1572->1578->803 MB, 1613 MB goal, 24 P
gc 1963 @1691.177s 2%: 0.28+39+0.19 ms clock, 6.7+434/213/0.012+4.7 ms cpu, 1566->1569->802 MB, 1606 MB goal, 24 P
gc 1964 @1692.139s 2%: 2.0+37+0.14 ms clock, 48+447/211/0+3.4 ms cpu, 1564->1567->812 MB, 1604 MB goal, 24 P
gc 1965 @1693.148s 2%: 0.50+40+0.18 ms clock, 12+477/224/0+4.3 ms cpu, 1584->1587->821 MB, 1625 MB goal, 24 P
gc 1966 @1694.139s 2%: 0.98+41+0.15 ms clock, 23+405/234/0.20+3.7 ms cpu, 1602->1605->814 MB, 1642 MB goal, 24 P
gc 1967 @1695.135s 2%: 1.2+30+0.13 ms clock, 30+439/170/0+3.2 ms cpu, 1588->1592->799 MB, 1629 MB goal, 24 P
gc 1968 @1696.128s 2%: 2.5+36+0.14 ms clock, 61+463/203/0+3.3 ms cpu, 1559->1561->799 MB, 1598 MB goal, 24 P
gc 1969 @1697.089s 2%: 0.40+37+1.3 ms clock, 9.6+480/206/0+32 ms cpu, 1558->1561->795 MB, 1598 MB goal, 24 P
gc 1970 @1698.056s 2%: 9.5+30+1.1 ms clock, 230+509/222/0+26 ms cpu, 1550->1552->793 MB, 1590 MB goal, 24 P
gc 1971 @1699.022s 2%: 3.0+34+0.17 ms clock, 73+457/196/0+4.1 ms cpu, 1547->1550->793 MB, 1586 MB goal, 24 P
gc 1972 @1699.982s 2%: 0.62+34+0.11 ms clock, 15+446/195/0+2.8 ms cpu, 1546->1549->792 MB, 1586 MB goal, 24 P
gc 1973 @1700.970s 2%: 1.0+35+0.15 ms clock, 24+468/195/0+3.7 ms cpu, 1546->1548->790 MB, 1585 MB goal, 24 P
gc 1974 @1701.951s 2%: 0.64+32+0.098 ms clock, 15+431/175/0+2.3 ms cpu, 1542->1545->792 MB, 1581 MB goal, 24 P
gc 1975 @1702.916s 2%: 1.0+30+0.16 ms clock, 26+378/170/0+3.9 ms cpu, 1545->1548->794 MB, 1584 MB goal, 24 P
gc 1976 @1703.899s 2%: 0.86+31+0.10 ms clock, 20+401/164/0.009+2.4 ms cpu, 1548->1552->799 MB, 1588 MB goal, 24 P
gc 1977 @1704.875s 2%: 0.94+33+0.25 ms clock, 22+449/190/0+6.1 ms cpu, 1559->1562->791 MB, 1599 MB goal, 24 P
gc 1978 @1705.841s 2%: 0.25+34+0.19 ms clock, 6.1+439/193/0+4.7 ms cpu, 1544->1547->784 MB, 1583 MB goal, 24 P
gc 1979 @1706.801s 2%: 1.3+39+0.16 ms clock, 32+427/210/0+3.8 ms cpu, 1529->1533->785 MB, 1568 MB goal, 24 P
gc 1980 @1707.770s 2%: 1.0+31+0.19 ms clock, 24+355/179/0+4.7 ms cpu, 1532->1535->785 MB, 1571 MB goal, 24 P
gc 1981 @1708.732s 2%: 0.64+31+0.10 ms clock, 15+454/180/0+2.5 ms cpu, 1532->1535->786 MB, 1571 MB goal, 24 P
gc 1982 @1709.702s 2%: 0.30+27+0.097 ms clock, 7.2+357/151/0+2.3 ms cpu, 1534->1537->789 MB, 1573 MB goal, 24 P
gc 1983 @1710.669s 2%: 0.60+27+0.10 ms clock, 14+350/147/0+2.5 ms cpu, 1540->1543->789 MB, 1579 MB goal, 24 P
gc 1984 @1711.640s 2%: 0.66+28+0.27 ms clock, 16+405/161/0+6.5 ms cpu, 1538->1542->788 MB, 1578 MB goal, 24 P
gc 1985 @1712.588s 2%: 16+22+0.16 ms clock, 391+441/216/0.068+3.8 ms cpu, 1537->1540->789 MB, 1576 MB goal, 24 P
gc 1986 @1713.566s 2%: 0.35+29+0.13 ms clock, 8.5+419/170/0+3.2 ms cpu, 1539->1542->791 MB, 1578 MB goal, 24 P
gc 1987 @1714.545s 2%: 0.45+28+0.088 ms clock, 10+371/160/0.24+2.1 ms cpu, 1543->1545->790 MB, 1582 MB goal, 24 P
gc 1988 @1715.523s 2%: 0.37+29+0.12 ms clock, 8.9+388/166/0.51+3.1 ms cpu, 1540->1544->792 MB, 1580 MB goal, 24 P
gc 1989 @1716.481s 2%: 0.29+31+0.13 ms clock, 7.0+420/178/0.10+3.1 ms cpu, 1545->1548->791 MB, 1584 MB goal, 24 P
gc 1990 @1717.439s 2%: 0.36+33+0.14 ms clock, 8.7+425/191/0.064+3.4 ms cpu, 1543->1546->791 MB, 1582 MB goal, 24 P
gc 1991 @1718.417s 2%: 0.26+30+0.21 ms clock, 6.3+346/174/0+5.2 ms cpu, 1543->1546->790 MB, 1582 MB goal, 24 P
gc 1992 @1719.391s 2%: 1.6+34+0.18 ms clock, 40+415/178/0+4.3 ms cpu, 1542->1544->790 MB, 1581 MB goal, 24 P
gc 1993 @1720.356s 2%: 0.27+33+0.37 ms clock, 6.5+414/180/0.40+8.9 ms cpu, 1542->1545->791 MB, 1581 MB goal, 24 P
gc 1994 @1721.330s 2%: 0.29+33+0.16 ms clock, 7.1+470/194/0.067+3.9 ms cpu, 1544->1547->793 MB, 1583 MB goal, 24 P
gc 1995 @1722.308s 2%: 0.33+36+0.14 ms clock, 7.9+501/209/0+3.4 ms cpu, 1547->1551->790 MB, 1586 MB goal, 24 P
gc 1996 @1723.280s 2%: 1.8+36+0.14 ms clock, 45+432/206/0+3.4 ms cpu, 1542->1545->793 MB, 1581 MB goal, 24 P
gc 1997 @1724.249s 2%: 0.37+35+0.82 ms clock, 8.9+380/195/0+19 ms cpu, 1547->1550->795 MB, 1586 MB goal, 24 P
gc 1998 @1725.216s 2%: 0.29+34+0.10 ms clock, 7.1+402/190/0.22+2.4 ms cpu, 1551->1553->793 MB, 1591 MB goal, 24 P
gc 1999 @1726.207s 2%: 0.37+34+0.13 ms clock, 8.8+411/196/0+3.3 ms cpu, 1547->1550->788 MB, 1586 MB goal, 24 P
gc 2000 @1727.180s 2%: 0.29+28+0.10 ms clock, 6.9+391/155/0.070+2.5 ms cpu, 1538->1542->794 MB, 1577 MB goal, 24 P
gc 2001 @1728.144s 2%: 0.71+24+0.093 ms clock, 17+351/138/0+2.2 ms cpu, 1548->1551->805 MB, 1588 MB goal, 24 P
gc 2002 @1729.131s 2%: 0.33+40+0.19 ms clock, 8.1+501/226/0+4.5 ms cpu, 1572->1576->809 MB, 1611 MB goal, 24 P
gc 2003 @1730.113s 2%: 0.33+33+0.16 ms clock, 8.0+431/190/0+3.8 ms cpu, 1578->1581->795 MB, 1618 MB goal, 24 P
gc 2004 @1731.079s 2%: 1.3+36+0.18 ms clock, 31+478/203/0+4.4 ms cpu, 1551->1555->796 MB, 1591 MB goal, 24 P
gc 2005 @1732.043s 2%: 0.36+32+0.11 ms clock, 8.8+454/189/0+2.6 ms cpu, 1552->1555->795 MB, 1592 MB goal, 24 P
gc 2006 @1733.020s 2%: 0.29+33+0.31 ms clock, 7.1+386/187/0+7.4 ms cpu, 1552->1555->795 MB, 1591 MB goal, 24 P
gc 2007 @1733.966s 2%: 0.33+40+0.10 ms clock, 8.0+498/230/0.12+2.5 ms cpu, 1551->1555->796 MB, 1591 MB goal, 24 P
gc 2008 @1734.935s 2%: 3.3+32+0.13 ms clock, 81+458/180/0+3.1 ms cpu, 1553->1556->796 MB, 1592 MB goal, 24 P
gc 2009 @1735.898s 2%: 15+24+0.18 ms clock, 371+515/220/0+4.4 ms cpu, 1553->1556->794 MB, 1592 MB goal, 24 P
gc 2010 @1736.868s 2%: 0.50+32+0.15 ms clock, 12+448/181/0+3.8 ms cpu, 1549->1552->795 MB, 1588 MB goal, 24 P
gc 2011 @1737.833s 2%: 1.6+33+0.10 ms clock, 40+384/186/0.55+2.6 ms cpu, 1551->1553->791 MB, 1590 MB goal, 24 P
gc 2012 @1738.805s 2%: 0.73+41+0.18 ms clock, 17+550/233/0+4.3 ms cpu, 1543->1546->795 MB, 1582 MB goal, 24 P
Metadata
Metadata
Assignees
Labels
Type
Projects
Relationships
Development
No branches or pull requests
Activity
ALTree commentedon Jul 15, 2020
cc @aclements @mknyszek
mknyszek commentedon Jul 15, 2020
@choleraehyq thanks for the detailed report!
I'll take a look through changes that were made in the runtime that could be relevant, since I can't think of what could be causing this off the top of my head.
Judging by the
gctrace
,GOGC
is 100. But, my main observation is that GCs frequently seem to start too early. For instance, look at GC#2000. At the end of that GC we had 794 MiB live, setting the goal for GC#2001 at 1588 MiB. However, if you look at when a GC started and ended in that cycle, it started at heap size 1548 MiB and ended at 1551 MiB. This pattern is consistent across nearly all pairs of GCs in the trace you posted. This pattern indicates to me that the pacer thinks there's a lot of work to do and has driven the GCs to start earlier, but they're ending earlier than expected. Can you easily produce a comparablegctrace
for Go 1.14.4? It would be good to see how the behavior has changed (aside from just mark assist really taking over CPU cycles during GC). Also running withGODEBUG=gctrace=1,gcpacertrace=1
would give us more insight into where the pacer is at during all of this.My other observation is that we're spending a comparatively high amount of CPU time in mark assists vs. background marking (like 410 vs. 180 on (a very very rough guess) average).
choleraehyq commentedon Jul 16, 2020
@mknyszek Thanks for your response. I'll produce
GODEBUG=gctrace=1,gcpacertrace=1
for Go 1.14.4 and Go1.15beta1 later.choleraehyq commentedon Jul 16, 2020
@mknyszek
1.15beta1:
1.14.4
mknyszek commentedon Jul 16, 2020
Thanks for the quick turnaround!
Ah, OK. I see what's going on re: early GCs. GCs appear to be relatively cheap for your application, so we're maxing out on how late we can start a GC. This happens in both 1.14 and 1.15, so that on its own shouldn't be the problem. Otherwise, at first glance, these two traces look very similar. I'll take a closer look.
mknyszek commentedon Jul 23, 2020
Sorry for the delayed reply.
Taking a closer look at both the traces, I don't see any significant differences that would lead to what you see in the execution trace. CPU time spent in assists and background GC are similar in both, though there's a lot more time spent in assists than in background GC. This suggests to me that your application is allocating fairly heavily, but there isn't much work for the GC to do (either there are just a few big allocations, there isn't much depth in the points-to graph, or most of the heap just doesn't survive). Maybe one difference is that
u_a
seems a little higher in Go 1.14, suggesting that perhaps assists are more effective in Go 1.14? But, it fluctuates a lot so it's hard to tell. It's notable though thatu_a
(the actual utilization) is 2x higher than the goal utilization (u_g
) which is kind of surprising, and does align with all the time spent in assists. I feel like there may be a broader pacing bug here? The GC should be trying to bring down the utilization in both 1.14.4 and 1.15 by making the GCs longer (see "Goal 2." under "Optimization Goals" in https://docs.google.com/document/d/1wmjrocXIWTr1JxU-3EQBI6BK6KgtiFArkG47XK73xIQ/edit#).@randall77 @aclements any thoughts on this?
Just to be totally clear, is that execution trace screencap you included reproducible in Go 1.14 or is this issue totally new? The data in each GC trace suggests that this should be reproducible.
choleraehyq commentedon Jul 25, 2020
@mknyszek I take a look at the execution trace of Go1.14, it's indeed quite similar to Go1.15. But my application latency is really higher using Go1.15, with the same application code and workload. I'll dig into it later.
runtime: export total GC Assist ns in MemStats and GCStats
gopherbot commentedon Sep 19, 2022
Change https://go.dev/cl/431877 mentions this issue:
runtime: export total GC Assist ns in MemStats and GCStats
mknyszek commentedon Sep 19, 2022
Mark assists should have gone down significantly in a variety of cases thanks to the pacer changes in Go 1.18 (specifically, the target mark assist percentage is now 0). If you have a chance @choleraehyq, please double-check! I'm inclined to mark this as resolved for now.
choleraehyq commentedon Sep 22, 2022
@mknyszek I don't find any latency problem now, thanks!