test_datasets HF scenario fails in CI, sometimes, failing to fetch simpleqa dataset #1959

@booxter

Description

System Info

.

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

AFAIU the test run only fails intermittently; it passes when repeated. The dataset does exist, with the desired train split: https://huggingface.co/datasets/llamastack/simpleqa

Error logs

tests/integration/datasets/test_datasets.py::test_register_and_iterrows[meta-llama/Llama-3.2-3B-Instruct-eval/messages-answer-source0-huggingface-10] FAILED [ 33%]
tests/integration/datasets/test_datasets.py::test_register_and_iterrows[meta-llama/Llama-3.2-3B-Instruct-eval/messages-answer-source1-localfs-2] PASSED [ 66%]
tests/integration/datasets/test_datasets.py::test_register_and_iterrows[meta-llama/Llama-3.2-3B-Instruct-eval/messages-answer-source2-localfs-5] PASSED [100%]

=================================== FAILURES ===================================
_ test_register_and_iterrows[meta-llama/Llama-3.2-3B-Instruct-eval/messages-answer-source0-huggingface-10] _
tests/integration/datasets/test_datasets.py:87: in test_register_and_iterrows
    iterrow_response = llama_stack_client.datasets.iterrows(dataset.identifier, limit=limit)
.venv/lib/python3.10/site-packages/llama_stack_client/resources/datasets.py:147: in iterrows
    return self._get(
.venv/lib/python3.10/site-packages/llama_stack_client/_base_client.py:1171: in get
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
llama_stack/distribution/library_client.py:177: in request
    return asyncio.run(self.async_client.request(*args, **kwargs))
../../../.local/share/uv/python/cpython-3.10.17-linux-x86_64-gnu/lib/python3.10/asyncio/runners.py:44: in run
    return loop.run_until_complete(main)
../../../.local/share/uv/python/cpython-3.10.17-linux-x86_64-gnu/lib/python3.10/asyncio/base_events.py:649: in run_until_complete
    return future.result()
llama_stack/distribution/library_client.py:265: in request
    response = await self._call_non_streaming(
llama_stack/distribution/library_client.py:286: in _call_non_streaming
    result = await matched_func(**body)
llama_stack/distribution/routers/routers.py:696: in iterrows
    return await self.routing_table.get_provider_impl(dataset_id).iterrows(
llama_stack/providers/remote/datasetio/huggingface/huggingface.py:78: in iterrows
    loaded_dataset = hf_datasets.load_dataset(path, **params)
.venv/lib/python3.10/site-packages/datasets/load.py:2129: in load_dataset
    builder_instance = load_dataset_builder(
.venv/lib/python3.10/site-packages/datasets/load.py:1849: in load_dataset_builder
    dataset_module = dataset_module_factory(
.venv/lib/python3.10/site-packages/datasets/load.py:1727: in dataset_module_factory
    raise FileNotFoundError(
E   FileNotFoundError: Couldn't find any data file at /home/runner/work/llama-stack/llama-stack/llamastack/simpleqa. Couldn't find 'llamastack/simpleqa' on the Hugging Face Hub either: LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.
---------------------------- Captured stdout setup -----------------------------
INFO     2025-04-15 16:07:44,638 llama_stack.providers.remote.inference.ollama.ollama:99 inference: checking            
         connectivity to Ollama at `http://localhost:11434`...                                                          
WARNING  2025-04-15 16:07:47,785 root:72 uncategorized: Warning: `bwrap` is not available. Code interpreter tool will   
         not work correctly.                                                                                            
INFO     2025-04-15 16:07:47,951 llama_stack.providers.remote.inference.ollama.ollama:338 inference: Pulling embedding  
         model `all-minilm:latest` if necessary...                                                                      
------------------------------ Captured log setup ------------------------------
INFO     llama_stack.providers.remote.inference.ollama.ollama:ollama.py:99 checking connectivity to Ollama at `http://localhost:11434`...
WARNING  root:agents.py:72 Warning: `bwrap` is not available. Code interpreter tool will not work correctly.
INFO     llama_stack.providers.remote.inference.ollama.ollama:ollama.py:338 Pulling embedding model `all-minilm:latest` if necessary...
=========================== short test summary info ============================
FAILED tests/integration/datasets/test_datasets.py::test_register_and_iterrows[meta-llama/Llama-3.2-3B-Instruct-eval/messages-answer-source0-huggingface-10] - FileNotFoundError: Couldn't find any data file at /home/runner/work/llama-stack/llama-stack/llamastack/simpleqa. Couldn't find 'llamastack/simpleqa' on the Hugging Face Hub either: LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.

Error copied from: https://github.com/meta-llama/llama-stack/actions/runs/14474048390/job/40595420299?pr=1957
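Since the failure looks like a transient Hub/network error rather than a missing dataset, one possible mitigation (my sketch, not something the codebase currently does) would be to retry the flaky `hf_datasets.load_dataset` call in `llama_stack/providers/remote/datasetio/huggingface/huggingface.py` with backoff. Sketched generically below; `load_with_retries` and the stand-in `flaky_load` are hypothetical names for illustration:

```python
import time

def load_with_retries(load_fn, *args, retries=3, delay=1.0, **kwargs):
    """Retry a flaky loader on transient errors with exponential backoff.

    Hypothetical helper: FileNotFoundError is what hf_datasets.load_dataset
    raises in the traceback above when the Hub lookup fails transiently.
    """
    last_exc = None
    for attempt in range(retries):
        try:
            return load_fn(*args, **kwargs)
        except (FileNotFoundError, ConnectionError) as exc:
            last_exc = exc
            time.sleep(delay * (2 ** attempt))  # back off before retrying
    raise last_exc

# Stand-in for the flaky Hub fetch: fails twice, then succeeds.
calls = {"n": 0}
def flaky_load():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient Hub error")
    return "dataset"

result = load_with_retries(flaky_load, retries=5, delay=0)
print(result)
```

In the real provider, `load_fn` would be `hf_datasets.load_dataset` with the same `path` and `params` as today; the retry only masks transient CI network blips and still raises if the dataset is genuinely unreachable.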

Expected behavior

CI passes consistently.
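A lower-touch CI-side mitigation (an assumption on my part; the repo is not confirmed to use this) would be rerunning failed tests via the pytest-rerunfailures plugin, e.g. in `pytest.ini`:

```ini
[pytest]
addopts = --reruns 2 --reruns-delay 5
```

This papers over the flakiness rather than fixing it, so a retry closer to the Hub fetch itself would be preferable.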

Metadata

Labels

bug (Something isn't working)
