
v1.15.0: OmniScraperGraph not working: Error parsing input keys for ImageToText #580

@LorenzoPaleari

Describe the bug
OmniScraperGraph throws an error. Tested with the minimal example on GitHub:
omni_scraper_openai.py

To Reproduce

mkdir test
cd test
python3 -m venv venv
source venv/bin/activate
pip install scrapegraphai \
    "scrapegraphai[burr]" \
    "scrapegraphai[more-browser-options]" \
    "scrapegraphai[other-language-models]" \
    langchain_google_vertexai --no-cache-dir     # to have a clean environment
# It does not start without all of these libraries. This is potentially a bug in itself.
playwright install

# Use the OpenAI example provided on GitHub
# Set the OpenAI API key in .env
python3 omni_scraper_openai.py
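Since the example reportedly does not start unless every optional extra is installed, a quick way to see which pieces are actually importable inside the venv is a check along these lines. This is a minimal sketch, not part of ScrapeGraphAI; the import names in `MODULES` (`scrapegraphai`, `burr`, `playwright`, `langchain_google_vertexai`) are assumptions about the modules behind the packages installed above.

```python
import importlib.util

# Assumed import names for the packages installed above; adjust as needed.
MODULES = ["scrapegraphai", "burr", "playwright", "langchain_google_vertexai"]


def check_modules(names):
    """Return a mapping of module name -> whether importlib can locate it."""
    status = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises for a dotted name whose parent package is missing
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_modules(MODULES).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Using `find_spec` rather than a bare `import` reports every missing module in one pass instead of stopping at the first `ImportError`.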

Output

--- Executing Fetch Node ---
--- (Fetching HTML from: https://perinim.github.io/projects/) ---
--- Executing Parse Node ---
--- Executing ImageToText Node ---
Traceback (most recent call last):
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/scrapegraphai/nodes/base_node.py", line 112, in get_input_keys
    input_keys = self._parse_input_keys(state, self.input)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/scrapegraphai/nodes/base_node.py", line 236, in _parse_input_keys
    raise ValueError("No state keys matched the expression.")
ValueError: No state keys matched the expression.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/lollo/Desktop/test/test.py", line 42, in <module>
    result = omni_scraper_graph.run()
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/scrapegraphai/graphs/omni_scraper_graph.py", line 124, in run
    self.final_state, self.execution_info = self.graph.execute(inputs)
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/scrapegraphai/graphs/base_graph.py", line 263, in execute
    return self._execute_standard(initial_state)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/scrapegraphai/graphs/base_graph.py", line 185, in _execute_standard
    raise e
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/scrapegraphai/graphs/base_graph.py", line 169, in _execute_standard
    result = current_node.execute(state)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/scrapegraphai/nodes/image_to_text_node.py", line 54, in execute
    input_keys = self.get_input_keys(state)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/scrapegraphai/nodes/base_node.py", line 116, in get_input_keys
    raise ValueError(f"Error parsing input keys for {self.node_name}: {str(e)}")
ValueError: Error parsing input keys for ImageToText: No state keys matched the expression.
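The traceback shows the node resolving its declared input keys against the graph state before running, and failing because nothing matched. The sketch below is a simplified, hypothetical model of that check, not ScrapeGraphAI's actual `_parse_input_keys`. It assumes, consistent with the Burr error further down, that ImageToText declares `img_urls` as its input, and it models only `|` (alternatives) and `&` (keys required together), without parentheses.

```python
def parse_input_keys(state: dict, expression: str) -> list:
    """Hypothetical, simplified resolver for node input expressions.

    Alternatives are separated by '|'; keys inside an alternative are joined by '&'.
    The first alternative whose keys are all present in `state` wins.
    """
    for alternative in expression.split("|"):
        keys = [k.strip() for k in alternative.split("&") if k.strip()]
        if keys and all(k in state for k in keys):
            return keys
    raise ValueError("No state keys matched the expression.")


def get_input_keys(node_name: str, state: dict, expression: str) -> list:
    """Wrap resolver failures with the node name, mirroring the traceback message."""
    try:
        return parse_input_keys(state, expression)
    except ValueError as e:
        # Raising inside `except` chains implicitly, which produces
        # "During handling of the above exception, another exception occurred".
        raise ValueError(f"Error parsing input keys for {node_name}: {e}")


# Assumed scenario: Fetch wrote `doc` but no `img_urls`.
state = {"url": "https://perinim.github.io/projects/", "user_prompt": "<prompt>", "doc": ["<html>...</html>"]}

try:
    get_input_keys("ImageToText", state, "img_urls")
except ValueError as err:
    print(err)  # Error parsing input keys for ImageToText: No state keys matched the expression.

print(get_input_keys("ImageToText", {**state, "img_urls": []}, "img_urls"))  # ['img_urls']
```

In this model, the error means only that `img_urls` was absent from the state when ImageToText ran; the node itself never got to process any images.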

After adding Burr arguments to the graph config:

"burr_kwargs": {
    "project_name": "test-scraper",
    "app_instance_id": "1234",
}

the run fails earlier, already in the Fetch action:
Starting action: Fetch
--- Executing Fetch Node ---
--- (Fetching HTML from: https://perinim.github.io/projects/) ---

********************************************************************************
-------------------------------------------------------------------
Oh no an error! Need help with Burr?
Join our discord and ask for help! https://discord.gg/4FxBMyzW5n
-------------------------------------------------------------------
> Action: `Fetch` encountered an error!<
> State (at time of action):
{'__SEQUENCE_ID': 0,
 'url': 'https://perinim.github.io/projects/',
 'user_prompt': "'List me all the projects with their titles and im..."}
> Inputs (at time of action):
{}
********************************************************************************
Traceback (most recent call last):
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/burr/core/application.py", line 561, in _step
    new_state = _run_reducer(next_action, self._state, result, next_action.name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/burr/core/application.py", line 199, in _run_reducer
    _validate_reducer_writes(reducer, new_state, name)
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/burr/core/application.py", line 174, in _validate_reducer_writes
    raise ValueError(
ValueError: State is missing write keys after running: Fetch. Missing keys are: {'link_urls', 'img_urls'}. Has writes: ['doc', 'link_urls', 'img_urls']
Finishing action: Fetch
Traceback (most recent call last):
  File "/Users/lollo/Desktop/test/test.py", line 46, in <module>
    result = omni_scraper_graph.run()
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/scrapegraphai/graphs/omni_scraper_graph.py", line 124, in run
    self.final_state, self.execution_info = self.graph.execute(inputs)
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/scrapegraphai/graphs/base_graph.py", line 260, in execute
    result = bridge.execute(initial_state)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/scrapegraphai/integrations/burr_bridge.py", line 215, in execute
    last_action, result, final_state = self.burr_app.run(
                                       ^^^^^^^^^^^^^^^^^^
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/burr/telemetry.py", line 273, in wrapped_fn
    return call_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/burr/core/application.py", line 893, in run
    next(gen)
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/burr/core/application.py", line 838, in iterate
    prior_action, result, state = self.step(inputs=inputs)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/burr/core/application.py", line 515, in step
    out = self._step(inputs=inputs, _run_hooks=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/burr/core/application.py", line 568, in _step
    raise e
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/burr/core/application.py", line 561, in _step
    new_state = _run_reducer(next_action, self._state, result, next_action.name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/burr/core/application.py", line 199, in _run_reducer
    _validate_reducer_writes(reducer, new_state, name)
  File "/Users/lollo/Desktop/test/venv/lib/python3.11/site-packages/burr/core/application.py", line 174, in _validate_reducer_writes
    raise ValueError(
ValueError: State is missing write keys after running: Fetch. Missing keys are: {'link_urls', 'img_urls'}. Has writes: ['doc', 'link_urls', 'img_urls']
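With Burr enabled, the failure surfaces one step earlier. The Fetch action declares that it writes `doc`, `link_urls` and `img_urls`, but the state after it runs contains only `doc`. The sketch below is a hypothetical re-creation of that kind of post-action check, not Burr's actual `_validate_reducer_writes`. It shows why the same keys appear under both "Missing keys" and "Has writes": the second list is the declaration, not what was actually written.

```python
def validate_writes(action_name: str, declared_writes: list, new_state: dict) -> None:
    """Hypothetical check: every key an action declares in `writes` must exist afterwards."""
    missing = {key for key in declared_writes if key not in new_state}
    if missing:
        raise ValueError(
            f"State is missing write keys after running: {action_name}. "
            f"Missing keys are: {missing}. Has writes: {declared_writes}"
        )


# Assumed scenario matching the log: Fetch produced `doc` but neither URL list.
state_after_fetch = {
    "url": "https://perinim.github.io/projects/",
    "user_prompt": "<prompt>",
    "doc": ["<html>...</html>"],
}

try:
    validate_writes("Fetch", ["doc", "link_urls", "img_urls"], state_after_fetch)
except ValueError as err:
    print(err)
```

Read together, the two tracebacks suggest one underlying gap: in this setup the Fetch node's result does not include `link_urls` and `img_urls`. Without Burr, that gap shows up in ImageToText. With Burr, the write check catches it right after Fetch.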

