feat: Add zero-shot classification example uv script #155
base: main
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##             main     #155   +/-   ##
=======================================
  Coverage   92.56%   92.56%
=======================================
  Files          64       64
  Lines        3281     3281
=======================================
  Hits         3037     3037
  Misses        244      244
```
Managed to get the `uv` script to run with:

but it seems to fail due to:

Looks like libGL.so.1 (OpenGL) is not on the Docker container running the HF job.
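The missing-library failure can be confirmed from inside the job before OpenCV is even imported; a minimal diagnostic sketch (the helper name is hypothetical, not part of the PR):

```python
import ctypes.util

def has_libgl() -> bool:
    """Report whether the OpenGL runtime (libGL) that
    opencv-python links against is present on this machine."""
    return ctypes.util.find_library("GL") is not None

if __name__ == "__main__":
    print("libGL present:", has_libgl())
```

If this prints `False` inside the container, the image needs the OpenGL system package before the OCR dependencies can import.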
@davanstrien is there a way to use a custom Docker container?
@ivyleavedtoadflax You can use any image as base AFAIK, as long as it's on Docker Hub. Maybe give https://hub.docker.com/r/nvidia/cuda a shot?
Also failed. I started looking into using a custom Docker container: the same one HF uses, but with the OpenCV deps added. Note that this is happening because some of the OCR dependencies require it. If we moved the OCR dependencies to …
@ivyleavedtoadflax Done in #163. Also updated to the new API design. Try again? Shouldn't require …
Great, I will try again.
Weirdly this does not solve the issue; it continues to ask for the …
```
# Conflicts:
#   sieves/tasks/preprocessing/ingestion/ingestion_import.py
#   uv.lock
```
@ivyleavedtoadflax Fixed and tested. I can run this with:

LMK if this works for you too?
Looks great! One quick suggestion: it would be better to load the HF token from the environment so it can be passed as a secret in the jobs command, i.e. instead of a `--hf-token` flag, just grab it from the environment:

```python
token = os.environ.get("HF_TOKEN") or get_token()
if token:
    login(token=token)
```

Then in the HF Jobs example:

```shell
hfjobs run --flavor l4x1 \
    -s HF_TOKEN \
    uv run https://github.com/raw/MantisAI/sieves/main/examples/create_classification_dataset.py \
    --input-dataset stanfordnlp/imdb \
    --column text \
    --labels "positive,negative" \
    --model HuggingFaceTB/SmolLM-360M-Instruct \
    --output-dataset your-username/imdb-classified
```

Otherwise, this looks very nice!
```
# Conflicts:
#   sieves/tests/tasks/test_optimization.py
```
Good catch, thanks! Added in 2bbdc5e. I'll leave this PR open for a bit longer in case @ivyleavedtoadflax has any further comments, then I'll merge.
Description

Adds an example `uv` script for zero-shot classification using `sieves`. The script provides a complete workflow for classifying Hugging Face datasets with automatic device detection, structured outputs, and dataset publishing. Inspired by https://huggingface.co/datasets/uv-scripts/classification.
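The "automatic device detection" mentioned above typically comes down to a small helper like the following sketch (assuming a torch backend; the function name is illustrative, not the script's actual code):

```python
def detect_device() -> str:
    """Pick the best available accelerator, falling back to CPU.
    torch is treated as an optional dependency in this sketch."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

print(detect_device())
```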
Caution

I don't have access to a pro/enterprise HF account yet, so I couldn't test running this with `hf jobs`. Maybe @davanstrien or @ivyleavedtoadflax can give it a shot?

Related Issues
-
Changes Made

- `examples/create_classification_dataset_with_sieves.py`: a comprehensive uv-compatible script for zero-shot text classification
- `to_hf_dataset()` method with proper label normalization and multi-label support

Key Features of the Example Script:

- `uv` compatibility: uses PEP 723 inline script metadata for dependency management

Usage Examples:
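For reference, a PEP 723 inline-metadata header that `uv run` picks up looks roughly like this (the dependency list here is an assumption for illustration, not the PR's exact one):

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "sieves",
#     "datasets",
# ]
# ///
def main() -> None:
    # uv reads the comment block above and resolves the listed
    # dependencies into an ephemeral environment before running this.
    print("dependencies declared inline")

if __name__ == "__main__":
    main()
```

With this header, `uv run create_classification_dataset_with_sieves.py` needs no prior `pip install`.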
Checklist
Screenshots/Examples (if applicable)
The script provides comprehensive logging output showing: