This issue was moved to a discussion. You can continue the conversation there.

Can't see traceback #1340


Closed
LukenAgile42 opened this issue Aug 22, 2019 · 23 comments

@LukenAgile42

LukenAgile42 commented Aug 22, 2019

class FooConsumer(SyncConsumer):
    def connect(self):
        assert False

    ...

produces

WebSocket HANDSHAKING /foo/ [127.0.0.1:00000]
WebSocket DISCONNECT /foo/ [127.0.0.1:00000]

With no traceback

python==3.6.8
django==1.11.16
channels==2.2.0
daphne==2.1.0
channels-redis==2.1.1
asgi-redis==1.4.3
asgiref==2.2.0

macOS Mojave 10.14.1

When I try to reproduce this with a minimal setup, it gives traceback

Exception inside application: 
  File "/path/to/django_sandbox/venv/lib/python3.6/site-packages/channels/sessions.py", line 183, in __call__
 ... 
  File "/path/to/django_sandbox/foo/consumers.py", line 7, in connect
    assert False

as expected.

Adding a print

logger.error("Exception inside application: %s", exception_output)
print(exception_output)

inside daphne.server.Server.application_checker lets me see the output, so it seems Daphne swallows the errors somehow.
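As an aside, here is one generic way a logger.error() call can produce nothing while a bare print() gets through (a standalone sketch of stdlib logging mechanics only, not a claim about Daphne's internals; the "demo.swallowed" logger name is made up):

```python
import io
import logging

# If a logger has only a NullHandler attached and propagation is disabled,
# records are accepted and then discarded -- logger.error() emits nothing,
# while print() always reaches stdout.
logger = logging.getLogger("demo.swallowed")
logger.addHandler(logging.NullHandler())  # accepts records, emits nothing
logger.propagate = False                  # never reaches the root handlers

buf = io.StringIO()
logging.getLogger().addHandler(logging.StreamHandler(buf))

logger.error("Exception inside application: %s", "assert False")
print("captured by root handler:", repr(buf.getvalue()))  # → captured by root handler: ''
```

A print() call, by contrast, bypasses the logging tree entirely, which matches the observation above.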

@Ivo-Donchev

Ivo-Donchev commented Oct 9, 2019

I'm having the same issue: if anything inside the socket fails there's no stacktrace in the logs.

@LukenAgile42 We've implemented a really simple class decorator (just for easier local debugging) that catches any exception inside the decorated class's methods and prints the stacktrace:

def debuggable_function_decorator(func):
    from functools import wraps
    import traceback

    @wraps(func)
    def wrapper(*arg, **kwargs):
        try:
            return func(*arg, **kwargs)
        except Exception as exc:
            print(traceback.format_exc())
            raise

    return wrapper


def debuggable_class_decorator(cls):
    for attr, value in vars(cls).items():
        if callable(value):
            setattr(cls, attr, debuggable_function_decorator(value))

    return cls


class debugmeta(type):
    def __new__(cls, clsname, bases, clsdict):
        return debuggable_class_decorator(
            super().__new__(cls, clsname, bases, clsdict)
        )


class SomeConsumer(JsonWebsocketConsumer, metaclass=debugmeta):
    pass

However, it would be great if someone could take a look! 🙂

@ambsw-technology

ambsw-technology commented Oct 22, 2019

We just ran into a comparable issue. On a test system, it was due to a missing HOSTS entry for the Redis server. We're seeing the same behavior on a prototype production system, but what we're seeing is so non-specific that the actual error could be pretty much anything in the consumer.

EDIT: To be clear, we're getting the slightly more interesting:

24.13.66.23:443 - - [22/Oct/2019:11:31:12] "WSCONNECTING /ws" - -
24.13.66.23:443 - - [22/Oct/2019:11:31:12] "WSCONNECT /ws" - -
24.13.66.23:443 - - [22/Oct/2019:11:31:12] "WSDISCONNECT /ws" - -
# ... rinse, repeat

EDIT2: In our prototype production instance (AWS), the problem was that TransitEncryption defaults to "true" in the ElastiCache Redis template we were using. We weren't using rediss:// and were accessing the server via a CNAME that didn't match the certificate.

Obviously, the details are less important than the fact that none of these errors bubble up into logging (Django or otherwise).

@carltongibson

OK, so there's clearly logging lacking here.

Question: if you dig in, or use @Ivo-Donchev's suggestion, where are the errors occurring? Can we add logging in the right places there to surface the issues?

(Anyone want to put together a minimal project that replicates this that we can play with?)

@ambsw-technology

OK. If I wrap my receive logic in a try/except block, I get the exception. If I re-raise it and trace it, it climbs the following stack:

[screenshot of the call stack in the debugger]

If I watch the error, it reaches concurrent/futures/thread.py:59 where it calls self.future.set_exception(exc). In PyCharm, the error eventually bubbles to my Console:

ERROR 2019-10-26 22:27:03,135 server 16752 22660 Exception inside application: Multiple exceptions: [Errno 10061] Connect call failed ('::1', 6379, 0, 0), [Errno 10061] Connect call failed ('127.0.0.1', 6379)
  File "C:\Users\<User>\Envs\workflow_admin_dev\lib\site-packages\channels\sessions.py", line 183, in __call__
    return await self.inner(receive, self.send)
  File "C:\Users\<User>\Envs\workflow_admin_dev\lib\site-packages\channels\middleware.py", line 41, in coroutine_call
    await inner_instance(receive, send)
  File "C:\Users\<User>\Envs\workflow_admin_dev\lib\site-packages\channels\consumer.py", line 59, in __call__
    [receive, self.channel_receive], self.dispatch
  File "C:\Users\<User>\Envs\workflow_admin_dev\lib\site-packages\channels\utils.py", line 58, in await_many_dispatch
    await task
  File "C:\Users\<User>\Envs\workflow_admin_dev\lib\site-packages\channels_redis\core.py", line 429, in receive
    real_channel
  File "C:\Users\<User>\Envs\workflow_admin_dev\lib\site-packages\channels_redis\core.py", line 484, in receive_single
    index, channel_key, timeout=self.brpop_timeout
  File "C:\Users\<User>\Envs\workflow_admin_dev\lib\site-packages\channels_redis\core.py", line 324, in _brpop_with_clean
    async with self.connection(index) as connection:
  File "C:\Users\<User>\Envs\workflow_admin_dev\lib\site-packages\channels_redis\core.py", line 820, in __aenter__
    self.conn = await self.pool.pop()
  File "C:\Users\<User>\Envs\workflow_admin_dev\lib\site-packages\channels_redis\core.py", line 70, in pop
    conns.append(await aioredis.create_redis(**self.host, loop=loop))
  File "C:\Users\<User>\Envs\workflow_admin_dev\lib\site-packages\aioredis\commands\__init__.py", line 175, in create_redis
    loop=loop)
  File "C:\Users\<User>\Envs\workflow_admin_dev\lib\site-packages\aioredis\connection.py", line 108, in create_connection
    timeout, loop=loop)
  File "C:\Python37_64\Lib\asyncio\tasks.py", line 388, in wait_for
    return await fut
  File "C:\Users\<User>\Envs\workflow_admin_dev\lib\site-packages\aioredis\stream.py", line 19, in open_connection
    lambda: protocol, host, port, **kwds)
  File "C:\Python37_64\Lib\asyncio\base_events.py", line 957, in create_connection
    ', '.join(str(exc) for exc in exceptions)))
  Multiple exceptions: [Errno 10061] Connect call failed ('::1', 6379, 0, 0), [Errno 10061] Connect call failed ('127.0.0.1', 6379)

I'm not clear why a production runner doesn't behave the same way.

@Ivo-Donchev

Ivo-Donchev commented Nov 1, 2019

Hey guys 🙂 ,

I think I found the issue 🔥 🎉

So, what I did:

  1. I created a super basic Django project (as @carltongibson suggested) and started trying to reproduce the bug - https://github.com/HackSoftware/django-channels-playground
  2. I couldn't reproduce it in this clean state ❌
  3. So I reproduced it in the project where we first hit it and started slicing things out of the code
  4. It turned out that django-debug-toolbar was causing the issue in our case :( So it's not a channels issue

I searched in the other issues and found this - #1193 (there's a suggested quickfix there)

I started debugging the django-debug-toolbar code to find the root of the problem and turned out that this class is causing it - https://github.com/jazzband/django-debug-toolbar/blob/87eb27ffb2d0aea60b8b139360ece4d9fc91b47e/debug_toolbar/utils.py#L222

It's used by code hooked into the python logging module, and it collects only the logs from the current thread (which the Channels code is not running on) 😞

#1340 (comment) - this was solving the problem because the print forces the message to standard output rather than going through the python logging module

Hope this will help someone ^
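To illustrate the mechanism described above (a toy sketch only, not django-debug-toolbar's actual code; the ThreadLocalCollector class is made up for illustration), a handler that keeps only records emitted on its creating thread silently drops everything logged from worker threads:

```python
import logging
import threading

class ThreadLocalCollector(logging.Handler):
    """Toy handler mimicking a per-thread log collector: it keeps a record
    only if it was emitted on the thread that created the handler."""

    def __init__(self):
        super().__init__()
        self.owner = threading.get_ident()
        self.records = []

    def emit(self, record):
        if threading.get_ident() == self.owner:
            self.records.append(record)
        # records from any other thread are silently dropped

logger = logging.getLogger("demo.threaded")
logger.propagate = False
collector = ThreadLocalCollector()
logger.addHandler(collector)

logger.error("from main thread")  # kept

worker = threading.Thread(target=lambda: logger.error("from worker thread"))
worker.start()
worker.join()  # the worker's record was dropped

print([r.getMessage() for r in collector.records])  # → ['from main thread']
```

Since consumer code runs outside the thread that installed such a handler, its exceptions never appear in the collected logs.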

@ambsw-technology

We're not using the debug toolbar. However, it certainly could be an issue with other middleware.

@carltongibson

Hi @Ivo-Donchev — Thanks for the follow-up! Super.
(I'll leave this open for now, because I want to step through it myself when I have the moment.)

@Pithikos

Pithikos commented Feb 12, 2020

This issue still persists. Has anyone found a workaround? I'm not using the debug toolbar, and this leaves me working blind.

@LukenAgile42

@Ivo-Donchev
I tried your debug decorator, but it didn't produce any output.

I also removed django-debug-toolbar, but the issue is still there.

@Pithikos

Something else that worked for me is to set the root logger explicitly in settings:

LOGGING = {
        ..
        "loggers": {
            "": { # Root logger
                "handlers": ["console"],
                "level": "INFO",
            },
           ..
      ..
}
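Filled out into a complete, loadable config (the console handler definition below is an assumption; adapt the names and levels to your project), the fragment might look like:

```python
import logging
import logging.config

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,  # keep loggers created before this config
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        "": {  # root logger: catches anything that propagates up the tree
            "handlers": ["console"],
            "level": "INFO",
        },
    },
}

logging.config.dictConfig(LOGGING)
logging.getLogger("channels.consumer").error("now visible via the root logger")
```

With a handler on the root logger, records from any library logger that still propagates (Channels' included) reach the console.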

@Ivo-Donchev

@Ivo-Donchev
I tried your debug decorator, but it didn't produce any output.

I also removed django-debug-toolbar, but the issue is still there.

@LukenAgile42 Hmm. Can I see your code? (if it's open source, of course 😄)

@LukenAgile42

LukenAgile42 commented Feb 12, 2020

It's closed source, but my usage was just

class FooConsumer(AsyncJsonWebsocketConsumer, metaclass=debugmeta):
    async def connect(self):
        assert False

I tried it while inheriting from JsonWebsocketConsumer as well.

@LukenAgile42

I stripped away all middleware and apps, except for some core django ones, but the logging was still broken. Must be an infrastructure issue somehow.

@carltongibson

carltongibson commented Feb 12, 2020

If this is really a logging issue, try installing logging_tree https://pypi.org/project/logging_tree/ and then inspecting the actual loggers in the connect function. Put a breakpoint in. Log something; is it displayed? (Sometimes a dictConfig will override all the existing loggers, so you don't get the config you thought you had.)

I tried your debug decorator, but it didn't produce any output.

The decorator is print()-ing, so why isn't stdout being displayed? (The thought being that maybe the exception is being swallowed somewhere, but even so the print() should do its job. Where is stdout going?)

Without a minimal reproduction it's really hard to see what can be happening.
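logging_tree draws the whole tree for you; if you can't install it, a rough stdlib-only approximation (the dump_loggers helper below is made up for illustration) is to walk logging's registry of known loggers:

```python
import logging

def dump_loggers():
    """Report each known logger's effective level, handlers, and propagate
    flag -- a crude stand-in for logging_tree.printout()."""
    lines = []
    for name in [""] + sorted(logging.root.manager.loggerDict):
        logger = logging.getLogger(name)
        lines.append("%s level=%s handlers=%s propagate=%s" % (
            name or "<root>",
            logging.getLevelName(logger.getEffectiveLevel()),
            [type(h).__name__ for h in logger.handlers],
            logger.propagate,
        ))
    return "\n".join(lines)

# e.g. after attaching a console handler to the "channels" logger:
logging.getLogger("channels").addHandler(logging.StreamHandler())
print(dump_loggers())
```

Running this inside connect() shows exactly which loggers exist and whether a dictConfig has stripped their handlers.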

@LukenAgile42

@carltongibson

No stdout. Just

WebSocket HANDSHAKING /foo/ [127.0.0.1:00000]
WebSocket DISCONNECT /foo/ [127.0.0.1:00000]

as before.

@Ivo-Donchev

@LukenAgile42 Can you set up a minimal Django app so we can see it reproduced in isolation? I'd love to play with it and help fix it.

@ivanff

ivanff commented Apr 14, 2020

+1, having this problem too

@ambsw-technology

ambsw-technology commented Apr 22, 2020

I keep hoping to actually build an example of this for testing, but haven't found time yet. I wonder if it's enough to analyze the stack trace I posted above.

Shouldn't we expect to log (and possibly re-raise) the error at some point in that process? It seems to me that a logical place to do so is at the "boundary" between packages. It may not be fair to blame a third-party package per se for an error at a boundary, but errors are certainly far more likely there, e.g. when incompatible changes are made to the base package.

In this case, channels.utils.await_many_dispatch seems to wrap activities from other modules (channels_redis above but presumably others). It seems likely that you'd want to know what went wrong in that list of tasks, but:

  • There's no except clause in the outer try, so errors in tasks appear to make us jump straight to finally.
  • In the finally, we actually await the task (here), which causes the exception to be emitted, but there's no bare except to log (or even handle) the issue.
  • As a result, I believe the cancel isn't sent to every task. This may not technically be a bug, but it's certainly not ideal.

If one of the tasks errors, isn't this method a logical place to catch and log it?

  • My conclusion is that the correct place to fix this is the try block inside the finally. I believe it should include a bare except that logs and (perhaps counter-intuitively) passes.
  • This ensures that the cancel attempt is made for all of the tasks. We should probably accumulate any exceptions in a list and raise that list (as a new exception) once we're done cancelling.
  • I originally wanted to handle exceptions earlier (e.g. in the outer try), but I believe that would create a race condition. Assume a task fails and puts us in the except block; if a second task fails in the finally, we're back at square one... might as well handle all of them here.

To ensure all of the cancels are attempted ASAP, I suspect the block should look more like this:

        for task in tasks:
            task.cancel()
        failures = []
        for task in tasks:
            try:
                await task
            except asyncio.CancelledError:
                pass
            except Exception as e:
                # Not sure what logger to get/use here; maybe the caller of
                # `await_many_dispatch` actually does the logging of the
                # exception raised below.
                logging.error(str(e))
                failures.append(e)
        if failures:
            raise MultipleExceptions(failures)
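A self-contained sketch of this cancel-then-collect pattern (MultipleExceptions is a hypothetical exception type, as in the snippet above; the two coroutines are stand-ins for the dispatched tasks):

```python
import asyncio

class MultipleExceptions(Exception):
    """Hypothetical container exception, as proposed above."""
    def __init__(self, failures):
        super().__init__("%d task(s) failed" % len(failures))
        self.failures = failures

async def fails():
    raise RuntimeError("boom")

async def runs_forever():
    await asyncio.Event().wait()

async def main():
    tasks = [asyncio.create_task(c()) for c in (fails, runs_forever)]
    await asyncio.sleep(0)  # yield so the first task gets to run and fail
    for task in tasks:      # request cancellation of everything first
        task.cancel()
    failures = []
    for task in tasks:      # then await each one so exceptions surface
        try:
            await task
        except asyncio.CancelledError:
            pass
        except Exception as exc:  # collect instead of losing the error
            failures.append(exc)
    if failures:
        raise MultipleExceptions(failures)

try:
    asyncio.run(main())
except MultipleExceptions as exc:
    print("collected:", [type(e).__name__ for e in exc.failures])  # → collected: ['RuntimeError']
```

Cancelling every task before awaiting any of them means one failure can't prevent the cancel request from reaching the rest.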

@ambsw-technology

P.S. this is the caller. There isn't a logger at this level either, but at least a message here would be put into the context of the AsyncConsumer. In theory, await_many_dispatch could be used elsewhere, though GitHub's usage tracking only shows the two calls in this file on lines 58 and 62.

@raj2121990

Is there any progress on this issue? I am also facing this and am unable to see any sort of logs.

@Wrufesh

Wrufesh commented Jul 9, 2020

Same here

backend:

WebSocket HANDSHAKING /webrtc/ [127.0.0.1:38882]
WebSocket DISCONNECT /webrtc/ [127.0.0.1:38882]
WebSocket HANDSHAKING /webrtc/ [127.0.0.1:39086]
WebSocket DISCONNECT /webrtc/ [127.0.0.1:39086]
WebSocket HANDSHAKING /webrtc/ [127.0.0.1:39114]
WebSocket DISCONNECT /webrtc/ [127.0.0.1:39114]
WebSocket HANDSHAKING /webrtc/ [127.0.0.1:39208]
WebSocket DISCONNECT /webrtc/ [127.0.0.1:39208]

frontend:

WebSocket connection to 'ws://localhost:8000/webrtc/' failed: Error during WebSocket handshake: Unexpected response code: 500

@eugenzor

I also had this issue, but I finally figured out that I had simply forgotten to call

await self.accept()

dimadk24 pushed a commit to dimadk24/english-fight that referenced this issue Feb 15, 2021
Doesn't play nice with django channels
Silences exceptions in consumers
See:
django/channels#1340
django/channels#1193
django-commons/django-debug-toolbar#1300
@eriktelepovsky

Something else that worked for me is to set the root logger explicitly in settings:

LOGGING = {
        ..
        "loggers": {
            "": { # Root logger
                "handlers": ["console"],
                "level": "INFO",
            },
           ..
      ..
}

Thank you @Pithikos. This actually helped me to show the traceback in the console. Is there a way to find out which logger handled the exception?

django locked and limited conversation to collaborators on Jun 22, 2021
