Skip to content

VC4 still cause weston hang every time I use it #1354

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Fantu opened this issue Mar 20, 2016 · 11 comments
Closed

VC4 still cause weston hang every time I use it #1354

Fantu opened this issue Mar 20, 2016 · 11 comments

Comments

@Fantu
Copy link
Contributor

Fantu commented Mar 20, 2016

Hi, I did many tests time ago with vc4 having always hang problem.
Now I did another fast test updating to latest kernel 4.4.6 downloaded from firmware next branch but the problem persist.

After rpi boot I also had this calltrace about drm if can be useful:

Jan 31 15:34:40 jessie-rpi kernel: [ 11.771346] WARNING: CPU: 3 PID: 0 at drivers/gpu/drm/drm_irq.c:287 drm_update_vblank_count+0x188/0x368 drm
Jan 31 15:34:40 jessie-rpi kernel: [ 11.781765] Modules linked in: binfmt_misc vc4 drm_kms_helper drm syscopyarea sysfillrect sysimgblt fb_sys_fops bcm2835_gpiomem i2c_bcm2708 bcm2835_wdt uio_pdrv_genirq uio evdev snd_bcm2835 snd_pcm snd_timer snd
Jan 31 15:34:40 jessie-rpi kernel: [ 11.801559] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.6-v7+ #861
Jan 31 15:34:40 jessie-rpi kernel: [ 11.808119] Hardware name: BCM2709
Jan 31 15:34:40 jessie-rpi kernel: [ 11.811672] <80018734> from <80014068>
Jan 31 15:34:40 jessie-rpi kernel: [ 11.819684] <80014068> from <8031f364>
Jan 31 15:34:40 jessie-rpi kernel: [ 11.827250] <8031f364> from <80025300>
Jan 31 15:34:40 jessie-rpi kernel: [ 11.835620] <80025300> from <800253ec>
Jan 31 15:34:40 jessie-rpi kernel: [ 11.844766] <800253ec> from [<7f08aa9c>](drm_update_vblank_count+0x188/0x368 [drm])
Jan 31 15:34:40 jessie-rpi kernel: [ 11.854953] [<7f08aa9c>](drm_update_vblank_count [drm]) from [<7f08acec>](vblank_disable_and_save+0x70/0x80 [drm])
Jan 31 15:34:40 jessie-rpi kernel: [ 11.865935] [<7f08acec>](vblank_disable_and_save [drm]) from [<7f08ad70>](vblank_disable_fn+0x74/0xa0 [drm])
Jan 31 15:34:40 jessie-rpi kernel: [ 11.876328] [<7f08ad70>](vblank_disable_fn [drm]) from <80081b10>
Jan 31 15:34:40 jessie-rpi kernel: [ 11.885319] <80081b10> from <80081e30>
Jan 31 15:34:40 jessie-rpi kernel: [ 11.893860] <80081e30> from <80029040>
Jan 31 15:34:40 jessie-rpi kernel: [ 11.902312] <80029040> from <80029620>
Jan 31 15:34:40 jessie-rpi kernel: [ 11.909871] <80029620> from <80070fd4>
Jan 31 15:34:40 jessie-rpi kernel: [ 11.917969] <80070fd4> from <80010a74>
Jan 31 15:34:40 jessie-rpi kernel: [ 11.926245] <80010a74> from <8000951c>
Jan 31 15:34:40 jessie-rpi kernel: [ 11.935506] <8000951c> from <805ed284>
Jan 31 15:34:40 jessie-rpi kernel: [ 11.944664] Exception stack(0xba141f48 to 0xba141f90)
Jan 31 15:34:40 jessie-rpi kernel: [ 11.949889] 1f40: 00000000 ba706348 00000000 00000000 ba140000 8089e5dc
Jan 31 15:34:40 jessie-rpi kernel: [ 11.958337] 1f60: 10c0387d 8089e580 805f1e54 80903758 00000000 ba141fa4 8089f4f8 ba141f98
Jan 31 15:34:40 jessie-rpi kernel: [ 11.966779] 1f80: 80010b34 80010b38 600b0013 ffffffff
Jan 31 15:34:40 jessie-rpi kernel: [ 11.972012] <805ed284> from <80010b38>
Jan 31 15:34:40 jessie-rpi kernel: [ 11.979662] <80010b38> from <80063dbc>
Jan 31 15:34:40 jessie-rpi kernel: [ 11.988028] <80063dbc> from <80063fe4>
Jan 31 15:34:40 jessie-rpi kernel: [ 11.996929] <80063fe4> from <80015c98>
Jan 31 15:34:40 jessie-rpi kernel: [ 12.006271] <80015c98> from <000095ac>

I saw recently bugfix backport in 4.4 branch but some seems missed or I'm wrong?
Probably a check here can be useful:
https://github.com/anholt/linux/commits/drm-vc4-4.5
Seems all vc4 commits (including some fixes) applied to kernel 4.5, probably apply the missed ones also to rpi branch can be useful and if possible also backport them to kernel 4.4.
I don't have time to try backport them, rebuild kernel and test it today but if can be useful I can try it next weekend.
Any advice is appreciated.

Thanks for any reply and sorry for my bad english.

@maxnet
Copy link
Contributor

maxnet commented Mar 22, 2016

Starting a Qt 4 embedded application that just wants to use the framebuffer (/dev/fb0) directly (without any acceleration or anything fancy) also hangs the system instantly when vc4 overlay is active.

No kernel messages or other errors, just complete hang (serial console no longer responds, cannot turn numlock on/off on keyboard)

@Fantu
Copy link
Contributor Author

Fantu commented Jun 16, 2016

I updated kernel to latest build from firmware repository and retried vc4, after short time rpi2 become unusable, this is a short cut from kern.log:

Jun 16 15:12:52 jessie-rpi kernel: [ 8944.211575] [drm:vc4_bo_create [vc4]] ERROR Failed to allocate from CMA:
Jun 16 15:12:52 jessie-rpi kernel: [ 8944.218542] [drm] num bos allocated: 285
Jun 16 15:12:52 jessie-rpi kernel: [ 8944.218558] [drm] size bos allocated: 242296kb
Jun 16 15:12:52 jessie-rpi kernel: [ 8944.218569] [drm] num bos used: 282
Jun 16 15:12:52 jessie-rpi kernel: [ 8944.218579] [drm] size bos used: 239976kb
Jun 16 15:12:52 jessie-rpi kernel: [ 8944.218589] [drm] num bos cached: 3
Jun 16 15:12:52 jessie-rpi kernel: [ 8944.218598] [drm] size bos cached: 2320kb
Jun 16 15:12:52 jessie-rpi kernel: [ 8944.218641] [drm:vc4_validate_bin_cl [vc4]] ERROR 0x00000000: packet 112 (VC4_PACKET_TILE_BINNING_MODE_CONFIG) failed to validate
Jun 16 15:12:52 jessie-rpi kernel: [ 8944.237992] vc4-drm soc:gpu: failed to allocate buffer with size 1056768
Jun 16 15:12:52 jessie-rpi kernel: [ 8944.244983] vc4-drm soc:gpu: failed to allocate buffer with size 1056768
Jun 16 15:12:52 jessie-rpi kernel: [ 8944.251830] [drm:vc4_validate_bin_cl [vc4]] ERROR 0x00000000: packet 112 (VC4_PACKET_TILE_BINNING_MODE_CONFIG) failed to validate
Jun 16 15:12:52 jessie-rpi kernel: [ 8944.264294] vc4-drm soc:gpu: failed to allocate buffer with size 1089536
Jun 16 15:12:52 jessie-rpi kernel: [ 8944.271215] vc4-drm soc:gpu: failed to allocate buffer with size 1089536
Jun 16 15:12:54 jessie-rpi kernel: [ 8946.196277] [drm] Resetting GPU.

from a fast search one related problem seems solved in latest vc4 bugfix patches upstream but missed here in actual rpi production branch (4.4): http://www.spinics.net/lists/dri-devel/msg109520.html

@pelwell @anholt Can someone backports all missed fixes if possible please?

Thanks for any reply and sorry for my bad english

@pelwell
Copy link
Contributor

pelwell commented Jun 16, 2016

The upstream commit that I could find (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/drm_gem_cma_helper.c?id=afe705be38f1e65b173868486944377186b9f206) is a different solution to the same problem, but if Eric points us at the required commits or creates a PR we'll merge it.

@anholt
Copy link
Contributor

anholt commented Jun 26, 2016

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/drm_gem_cma_helper.c?id=afe705be38f1e65b173868486944377186b9f206 should only be relevant if you've got the change that triggered the gem_free_object calls and NULL pointer derefs. I don't see those in the logs here.

Given:

Jun 16 15:12:52 jessie-rpi kernel: [ 8944.218542] [drm] num bos allocated: 285
Jun 16 15:12:52 jessie-rpi kernel: [ 8944.218558] [drm] size bos allocated: 242296kb

you've completely exhausted your CMA. If this is how much weston has allocated, then you're going to need to wait for me to implement swapping objects out of CMA. However, given:

Jun 16 15:12:52 jessie-rpi kernel: [ 8944.218589] [drm] num bos cached: 3
Jun 16 15:12:52 jessie-rpi kernel: [ 8944.218598] [drm] size bos cached: 2320kb

we've either screwed up our stats or you've failed your allocation before we've purged the cache. vc4_bo_create() should be trying to purge the cache to get the job done. This is suspicious. Printing the current emit/finished_seqnos during stats dumping might be interesting to see if we've screwed up our wait somehow.

@mranostay
Copy link
Contributor

@anholt Question about swapping out of CMA. Would this to be a SWAP partition or compressing of objects?

@JamesH65
Copy link
Contributor

Closing due to lack of activity. Reopen if you feel this issue is still relevant.

@schloegl
Copy link
Contributor

This issue is still relevant, I'm running raspbian/stretch with kernel
Linux raspi3 4.9.41-v7+ #1023 SMP Tue Aug 8 16:00:15 BST 2017 armv7l GNU/Linux

and it crashes frequently when running certain applications (firefox, gcompris, etc.). Checking /var/log/syslog shows the exact some entries.
crash.logs.20170819.txt

@anholt
Copy link
Contributor

anholt commented Aug 21, 2017

Looks like rpi-4.9.y never got the backport of a fix from April. I've tacked it onto my rpi-4.9.y PR.

@popcornmix
Copy link
Collaborator

Latest rpi-update kernel includes the latest vc4 backports referred to by @anholt - does it help?

@schloegl
Copy link
Contributor

Thanks for the hint. I run rpi-update and rebooted. After putting some load on it which requires a lot of memory (firefox, and gcompris, and working with it for some time, the machine crash eventually, again.

Looking at the syslog, it does not seem to be the same error. I noticed the following issues

  1. /var/log/syslog contain now a number of messages like
    alloc_contig_range: [21000, 22200) PFNs busy
    which I've not seen before. It is there independent of the load and does not seem to be a problem in terms of stability.

  2. occasionally, vc4 still does "unwind backtrace", which is not nice,

  3. raspi3 pulseaudio[2725]: [pulseaudio] shm.c: mmap() failed: not enough memory

eventually, the screen went black, on open ssh connection to the raspi still accepted input, but the first command never finished, and the machine had to be rebooted.

In a second attempt, the machine crashes, it means it does not accept any input form mouse and keyboard, or the screen goes black. An open ssh terminal session, does still accept input, but when exiting a command (htop), it did not finish. however ping to the rasp still did respond. Again, syslog contained

`Aug 26 21:28:48 raspi3 kernel: [ 1502.204578] ------------[ cut here ]------------
Aug 26 21:28:48 raspi3 kernel: [ 1502.204645] WARNING: CPU: 1 PID: 583 at drivers/gpu/drm/vc4/vc4_plane.c:756 vc4_plane_async_set_fb+0xa4/0xa8 [vc4]
Aug 26 21:28:48 raspi3 kernel: [ 1502.204648] Modules linked in: evdev joydev panel_raspberrypi_touchscreen hid_cherry vc4 brcmfmac drm_kms_helper brcmutil cfg80211 drm rfkill snd_soc_core snd_compress snd_pcm_dmaengine syscopyarea snd_bcm2835 sysfillrect snd_pcm sysimgblt snd_timer i2c_gpio fb_sys_fops snd i2c_algo_bit bcm2835_gpiomem i2c_bcm2835 uio_pdrv_genirq uio fixed ip_tables x_tables ipv6
Aug 26 21:28:48 raspi3 kernel: [ 1502.204726] CPU: 1 PID: 583 Comm: Xorg Not tainted 4.9.45-v7+ #1031
Aug 26 21:28:48 raspi3 kernel: [ 1502.204729] Hardware name: BCM2835
Aug 26 21:28:48 raspi3 kernel: [ 1502.204754] [<8010fb3c>] (unwind_backtrace) from [<8010c058>] (show_stack+0x20/0x24)
Aug 26 21:28:48 raspi3 kernel: [ 1502.204767] [<8010c058>] (show_stack) from [<80456104>] (dump_stack+0xd4/0x118)
Aug 26 21:28:48 raspi3 kernel: [ 1502.204777] [<80456104>] (dump_stack) from [<8011d344>] (__warn+0xf8/0x110)
Aug 26 21:28:48 raspi3 kernel: [ 1502.204784] [<8011d344>] (__warn) from [<8011d42c>] (warn_slowpath_null+0x30/0x38)
Aug 26 21:28:48 raspi3 kernel: [ 1502.204817] [<8011d42c>] (warn_slowpath_null) from [<7f479b5c>] (vc4_plane_async_set_fb+0xa4/0xa8 [vc4])
Aug 26 21:28:48 raspi3 kernel: [ 1502.204888] [<7f479b5c>] (vc4_plane_async_set_fb [vc4]) from [<7f479c2c>] (vc4_update_plane+0xcc/0x154 [vc4])
Aug 26 21:28:48 raspi3 kernel: [ 1502.205078] [<7f479c2c>] (vc4_update_plane [vc4]) from [<7f19ff48>] (__setplane_internal+0x190/0x238 [drm])
Aug 26 21:28:48 raspi3 kernel: [ 1502.205319] [<7f19ff48>] (__setplane_internal [drm]) from [<7f1a0110>] (drm_mode_cursor_universal+0x120/0x1bc [drm])
Aug 26 21:28:48 raspi3 kernel: [ 1502.205545] [<7f1a0110>] (drm_mode_cursor_universal [drm]) from [<7f1a0238>] (drm_mode_cursor_common+0x8c/0x194 [drm])
Aug 26 21:28:48 raspi3 kernel: [ 1502.205772] [<7f1a0238>] (drm_mode_cursor_common [drm]) from [<7f1a08e8>] (drm_mode_cursor2_ioctl+0x18/0x1c [drm])
Aug 26 21:28:48 raspi3 kernel: [ 1502.206003] [<7f1a08e8>] (drm_mode_cursor2_ioctl [drm]) from [<7f183c60>] (drm_ioctl+0x20c/0x428 [drm])
Aug 26 21:28:48 raspi3 kernel: [ 1502.206131] [<7f183c60>] (drm_ioctl [drm]) from [<802837b4>] (do_vfs_ioctl+0xac/0x820)
Aug 26 21:28:48 raspi3 kernel: [ 1502.206142] [<802837b4>] (do_vfs_ioctl) from [<80283f6c>] (SyS_ioctl+0x44/0x6c)
Aug 26 21:28:48 raspi3 kernel: [ 1502.206152] [<80283f6c>] (SyS_ioctl) from [<801080c0>] (ret_fast_syscall+0x0/0x1c)
Aug 26 21:28:48 raspi3 kernel: [ 1502.206158] ---[ end trace f9c8ab4871cfb4a2 ]---

`

@anholt
Copy link
Contributor

anholt commented Aug 28, 2017

OK, looks like the original problem here is fixed.

  1. That message should have been removed from the kernel, I wonder if they dropped that patch from the downstream tree.

  2. I don't know what you mean by "vc4 does unwind backtrace"

  3. That's not me.

  4. Let's open a new bug report to handle that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants