
Job hanging / Event Rule for virtual machine model launching custom script causes job to hang #19204

Closed
chaeynz opened this issue Apr 15, 2025 · 6 comments · Fixed by #19297
Labels
severity: low (Does not significantly disrupt application functionality, or a workaround is available)
status: accepted (This issue has been accepted for implementation)
type: bug (A confirmed report of unexpected behavior in the application)

Comments

@chaeynz

chaeynz commented Apr 15, 2025

Deployment Type

Self-hosted

NetBox Version

4.2.6

Python Version

3.11

Steps to Reproduce

Redis server v=7.0.15 sha=00000000:0 malloc=jemalloc-5.3.0 bits=64 build=c89c70d1d28059e4
(probably not relevant, but here u go)

  1. Add a custom script to NetBox that returns data
  2. Create an Event Rule for Virtual Machine with Action: Launch Script
  3. Create or modify a virtual machine

Script:

from extras.scripts import *
class MyCustomScript(Script):

    class Meta:
        name = "My Custom Script"
        description = "My description"


    def run(self, data, commit):

        return data

Virtual Machine Custom Fields:
2x Selection
1x Multiple Selection
1x Object:
Image

Platform Custom Fields:
1x Selection

Tenant Custom Fields:
1x Object: Netbox-DNS->Record

Here is a quick JSON dump of the virtual machine I created.

{
    "id": 153,
    "url": "https://mydomain/api/virtualization/virtual-machines/153/",
    "display_url": "mydomain/virtualization/virtual-machines/153/",
    "display": "test",
    "name": "test",
    "status": {
        "value": "active",
        "label": "Active"
    },
    "site": {
        "id": 2,
        "url": "https://mydomain/api/dcim/sites/2/",
        "display": "NEXS_00",
        "name": "NEXS_00",
        "slug": "nexs_00",
        "description": "Chaeynz Home"
    },
    "cluster": {
        "id": 1,
        "url": "https://mydomain/api/virtualization/clusters/1/",
        "display": "RC",
        "name": "RC",
        "description": ""
    },
    "device": null,
    "serial": "",
    "role": null,
    "tenant": null,
    "platform": null,
    "primary_ip": null,
    "primary_ip4": null,
    "primary_ip6": null,
    "vcpus": null,
    "memory": null,
    "disk": null,
    "description": "",
    "comments": "",
    "config_template": null,
    "local_context_data": null,
    "tags": [],
    "custom_fields": {
        "resource_pool": null,
        "machine_service_category": null,
        "nfs_share": null,
        "vm_prefix": null
    },
    "config_context": {},
    "created": "2025-04-15T20:24:08.790042Z",
    "last_updated": "2025-04-15T20:42:32.559175Z",
    "interface_count": 0,
    "virtual_disk_count": 0
}

I did not investigate further, but it seems to me there is some kind of loop going on. From what I can see, the code that launches scripts has try/except blocks.
Maybe it is because of the relations to other objects?

I hope this is useful, if you have any questions I will be happy to help!

Expected Behavior

The script should return the virtual machine's JSON data.
Or at least time out?

Observed Behavior

After 1 hour, the jobs that ran against my virtual machine model are still hanging.


@chaeynz added the status: needs triage and type: bug labels Apr 15, 2025
@chaeynz
Author

chaeynz commented Apr 15, 2025

When running against the tenant object, for example, it works without problems.


@chaeynz
Author

chaeynz commented Apr 15, 2025

note
#18222
extras/events.py

        # Compile event data
        event_data = event_rule.action_data or {}
        event_data.update(data)

@arthanson added the status: needs owner and severity: low labels and removed the status: needs triage label Apr 17, 2025
@arthanson
Collaborator

Weird, it does seem to only do it for Virtual Machines. It gets the following stack trace:

Traceback (most recent call last):
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/rq/worker.py", line 1633, in perform_job
    return_value = job.perform()
                   ^^^^^^^^^^^^^
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/rq/job.py", line 1331, in perform
    self._result = self._execute()
                   ^^^^^^^^^^^^^^^
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/rq/job.py", line 1365, in _execute
    result = self.func(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ahanson/dev/work/netbox/netbox/netbox/jobs.py", line 82, in handle
    job.terminate(status=JobStatusChoices.STATUS_ERRORED, error=repr(e))
  File "/Users/ahanson/dev/work/netbox/netbox/core/models/jobs.py", line 201, in terminate
    self.save()
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/django/db/models/base.py", line 892, in save
    self.save_base(
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/django/db/models/base.py", line 998, in save_base
    updated = self._save_table(
              ^^^^^^^^^^^^^^^^^
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/django/db/models/base.py", line 1130, in _save_table
    updated = self._do_update(
              ^^^^^^^^^^^^^^^^
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/django/db/models/base.py", line 1195, in _do_update
    return filtered._update(values) > 0
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/django/db/models/query.py", line 1278, in _update
    return query.get_compiler(self.db).execute_sql(CURSOR)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/django/db/models/sql/compiler.py", line 2003, in execute_sql
    cursor = super().execute_sql(result_type)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/django/db/models/sql/compiler.py", line 1574, in execute_sql
    cursor.execute(sql, params)
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/django/db/backends/utils.py", line 122, in execute
    return super().execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/django/db/backends/utils.py", line 79, in execute
    return self._execute_with_wrappers(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/django/db/backends/utils.py", line 92, in _execute_with_wrappers
    return executor(sql, params, many, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/django_prometheus/db/common.py", line 69, in execute
    return super().execute(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/psycopg/cursor.py", line 93, in execute
    self._conn.wait(
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/psycopg/connection.py", line 407, in wait
    return waiting.wait(gen, self.pgconn.socket, interval=interval)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "psycopg_c/_psycopg/waiting.pyx", line 198, in psycopg_c._psycopg.wait_c
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/psycopg/_cursor_base.py", line 194, in _execute_gen
    pgq = self._convert_query(query, params)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/psycopg/_cursor_base.py", line 454, in _convert_query
    pgq.convert(query, params)
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/psycopg/_queries.py", line 268, in convert
    self.dump(vars)
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/psycopg/_queries.py", line 278, in dump
    self.params = tuple(
                  ^^^^^^
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/psycopg/_queries.py", line 279, in <genexpr>
    self._tx.as_literal(p) if p is not None else b"NULL" for p in params
    ^^^^^^^^^^^^^^^^^^^^^^
  File "psycopg_c/_psycopg/transform.pyx", line 204, in psycopg_c._psycopg.Transformer.as_literal
  File "psycopg_c/_psycopg/waiting.pyx", line 213, in psycopg_c._psycopg.wait_c
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/psycopg/adapt.py", line 59, in quote
    value = self.dump(obj)
            ^^^^^^^^^^^^^^
  File "/Users/ahanson/dev/work/netbox/venv/lib/python3.12/site-packages/psycopg/types/json.py", line 152, in dump
    data = dumps(obj)
           ^^^^^^^^^^
  File "/Users/ahanson/.pyenv/versions/3.12.3/lib/python3.12/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ahanson/.pyenv/versions/3.12.3/lib/python3.12/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ahanson/.pyenv/versions/3.12.3/lib/python3.12/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/Users/ahanson/.pyenv/versions/3.12.3/lib/python3.12/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type __proxy__ is not JSON serializable

@jeremystretch self-assigned this Apr 23, 2025
@jeremystretch added the status: accepted label and removed the status: needs owner label Apr 23, 2025
@jeremystretch
Member

Weird, it does seem to only do it for Virtual Machines

This is reproducible for other objects as well (sites, for instance).

This is happening because the data passed to the script's run() can contain Python objects which may not be directly serializable as JSON. In this case, it appears to be the gettext() wrapper on the object's status label (which is used to support translation).

Here's a reproduction which isolates the root issue from the custom script & event rule:

>>> import json
>>> from virtualization.api.serializers import VirtualMachineSerializer
>>> 
>>> vm = VirtualMachine.objects.first()
>>> data = VirtualMachineSerializer(vm, context={'request': None}).data
>>> json.dumps(data)
Traceback (most recent call last):
  File "/usr/lib/python3.10/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<console>", line 1, in <module>
  File "/usr/lib/python3.10/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.10/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.10/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type __proxy__ is not JSON serializable
>>> type(data['status']['label'])
<class 'django.utils.functional.lazy.<locals>.__proxy__'>
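When a payload is larger than this one, it can help to find out which nested value is the offender before reaching for a fix. The following is a minimal sketch using only the standard library; `LazyLabel` is a hypothetical stand-in for Django's lazy translation proxy, and `find_unserializable` is an illustrative helper, not part of NetBox or Django.

```python
import json


class LazyLabel:
    """Hypothetical stand-in for a lazy translation proxy (not Django's class)."""

    def __init__(self, text):
        self._text = text

    def __str__(self):
        return self._text


def find_unserializable(value, path="$"):
    """Return (path, type name) pairs for values the default encoder rejects."""
    if isinstance(value, dict):
        found = []
        for key, item in value.items():
            found.extend(find_unserializable(item, f"{path}.{key}"))
        return found
    if isinstance(value, (list, tuple)):
        found = []
        for index, item in enumerate(value):
            found.extend(find_unserializable(item, f"{path}[{index}]"))
        return found
    try:
        # Leaf value: let the default encoder decide.
        json.dumps(value)
    except TypeError:
        return [(path, type(value).__name__)]
    return []


# Trimmed-down payload shaped like the serializer output above.
data = {
    "name": "test",
    "status": {"value": "active", "label": LazyLabel("Active")},
    "tags": [],
}
problems = find_unserializable(data)
print(problems)
```

The walker only inspects dict values and list or tuple items, so unusual dict keys are not reported; for the payload in this issue that is enough to point at the status label.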

In practice, the exception is raised because NetBox is trying to save the output returned by run() in a JSONField, and the default encoder for this field (json.JSONEncoder) doesn't support such objects. We could change it to DjangoJSONEncoder, which forces the resolution of Promise objects to serializable strings.
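The difference between the two encoders can be sketched without Django installed. In this sketch, `LazyLabel` and `LazyAwareEncoder` are hypothetical stand-ins: the encoder only mimics the one behavior relevant here (converting a lazy object to a string), whereas the real `DjangoJSONEncoder` covers additional types.

```python
import json


class LazyLabel:
    """Hypothetical stand-in for a lazy translation proxy (not Django's class)."""

    def __init__(self, text):
        self._text = text

    def __str__(self):
        return self._text


class LazyAwareEncoder(json.JSONEncoder):
    """Illustrative encoder that resolves LazyLabel objects to plain strings."""

    def default(self, o):
        if isinstance(o, LazyLabel):
            return str(o)
        # Anything else still raises TypeError via the base class.
        return super().default(o)


payload = {"status": {"value": "active", "label": LazyLabel("Active")}}

# The default encoder fails in the same way as the traceback above.
try:
    json.dumps(payload)
    default_error = None
except TypeError as exc:
    default_error = str(exc)

# The custom encoder resolves the lazy value first.
encoded = json.dumps(payload, cls=LazyAwareEncoder)
print(default_error)
print(encoded)
```

In a Django model, the equivalent switch is the `encoder` argument of `JSONField`, for example `JSONField(encoder=DjangoJSONEncoder)`; whether the merged fix takes exactly that form is not stated in this thread.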

@jeremystretch
Member

I should note for posterity that the fix for this bug is limited to the specific scenario detailed above. The switch to using DjangoJSONEncoder will not magically serialize arbitrary objects, but it does provide better serialization support in general.
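For script authors whose output may contain other object types, one possible workaround is to coerce the result to plain JSON types before returning it from `run()`. This is a sketch of that idea and an assumption on my part, not a NetBox recommendation; `Opaque` and `to_plain_json` are hypothetical names, and the conversion is lossy because every unsupported value becomes its string form.

```python
import json


class Opaque:
    """Hypothetical arbitrary object that no JSON encoder knows about."""

    def __str__(self):
        return "opaque-value"


def to_plain_json(value):
    """Round-trip through JSON, stringifying anything the encoder rejects."""
    # default=str is called only for objects json cannot encode natively.
    return json.loads(json.dumps(value, default=str))


result = to_plain_json({"id": 153, "extra": Opaque(), "tags": []})
print(result)
```

Values that JSON already supports pass through unchanged, so this is safe to apply to a serializer payload as a whole.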

@chaeynz
Author

chaeynz commented Apr 24, 2025

I should note for posterity that the fix for this bug is limited to the specific scenario detailed above. The switch to using DjangoJSONEncoder will not magically serialize arbitrary objects, but it does provide better serialization support in general.

Why was it hanging, though? Shouldn't the job have returned the TypeError?

4 participants