Releases: IBM/unitxt
Unitxt 1.26.5
What's Changed
- For load_dataset, use_cache default value is taken from settings by @eladven in #1880
- Support watsonx.ai on-prem credentials by @pratapkishorevarma in #1883
- extend condition to also filter by whether a field exists or not by @dafnapension in #1879
- fix performance test by @dafnapension in #1884
- Add support for inline-defined templates in the UI by @Chemafiz in #1886
- Mitigate HTTP 403 errors in pandas by @bnayahu in #1888
- Biggen benchmark and pearson correlation metric by @martinscooper in #1887
- Update version to 1.26.5 by @elronbandel in #1889
New Contributors
- @pratapkishorevarma made their first contribution in #1883
- @Chemafiz made their first contribution in #1886
Full Changelog: 1.26.4...1.26.5
Unitxt 1.26.4
What's Changed
- Add more Judgebench benchmarks by @martinscooper in #1869
- Make sqlite3 not an optional dependency by @elronbandel in #1871
- Removed legacy topicality, idk, and groundness metrics that worked only on BAM by @yoavkatz in #1875
- Bench and models by @martinscooper in #1872
- Handle a case in ToolCallPostProcessor where prediction is an empty list of tools by @yoavkatz in #1874
- Update version to 1.26.4 by @elronbandel in #1876
Full Changelog: 1.26.3...1.26.4
Unitxt 1.26.3
What's Changed
- LLM Judge: Improve context/prediction fields parsing by @martinscooper in #1856
- Fixed bug in tool inference by @yoavkatz in #1868
- Added a new MetricBasedNer that allows calculating entity similarity using any Unitxt metric by @yoavkatz in #1860
- Update version to 1.26.3 by @elronbandel in #1870
Full Changelog: 1.26.2...1.26.3
Unitxt 1.26.2
What's Changed
- Add tot dataset by @elronbandel in #1865
- Add tokenizer_name to base huggingface inference engines by @elronbandel in #1862
- Add hf to cross provider inference engine by @yoavkatz in #1866
- Update version to 1.26.2 by @elronbandel in #1867
Full Changelog: 1.26.1...1.26.2
Unitxt 1.26.1
Lock datasets dependency to <4.0.0
The latest datasets v4.0.0 release removes support for loading datasets with trust_remote_code=True. This change breaks compatibility with many datasets currently in the Unitxt catalog, as several datasets require this feature to load properly.
This patch restricts the datasets version to below 4.0.0 until we can find or develop replacements for affected datasets.
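If you manage the dependency yourself, here is a minimal sanity check (assuming the packaging library is installed) that your environment satisfies the constraint:

from packaging import version
import datasets

# Unitxt 1.26.1 pins datasets below 4.0.0 so catalog datasets relying on trust_remote_code still load
assert version.parse(datasets.__version__) < version.parse("4.0.0")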
Unitxt 1.26.0 - Multi Threading
Main changes:
- Made Unitxt thread-safe so it can run in multi-threaded environments.
- Added an option to set the sampling seed for demos (in-context examples) via demos_sampling_seed, which allows running the same dataset with different demo examples (see the sketch after the example below).
- Improved printouts of instance scores with to_markdown() and summary. For example:
results = evaluate(predictions=predictions, data=dataset)
print(results.instance_scores.summary)
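
A minimal sketch combining both features; the card, template, parameter values, and placeholder predictions are illustrative, and demos_sampling_seed is passed like any other recipe argument:

from unitxt import load_dataset, evaluate

dataset = load_dataset(
    card="cards.wnli",  # illustrative catalog card
    template="templates.classification.multi_class.relation.default",
    num_demos=3,
    demos_pool_size=10,
    demos_sampling_seed=42,  # fixes which in-context demos are sampled
    loader_limit=100,
    split="test",
)

predictions = ["entailment"] * len(dataset)  # placeholder predictions
results = evaluate(predictions=predictions, data=dataset)
print(results.instance_scores.summary)        # compact per-instance table
print(results.instance_scores.to_markdown())  # markdown rendering of the same scores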

All changes:
- Add to_markdown() to InstanceScores to pretty print output by @yoavkatz in #1846
- Improved InstanceScores summary to be readable and in decent width by @yoavkatz in #1847
- Improve multi turn tool calling example by @elronbandel in #1848
- Add metrics documentation including range, directionality and references by @elronbandel in #1850
- Fix sacrebleu documentation by @elronbandel in #1851
- Add F1 score documentation to F1Fast metric class by @elronbandel in #1852
- Add more llmjudge benchmarks by @martinscooper in #1804
- Fix llama scout name and url on rits by @martinscooper in #1857
- Add demos_sampling_seed to recipe api by @elronbandel in #1858
- Add comprehensive multi threading support and tests by @elronbandel in #1853
- Update BlueBench to match the original implementation by @bnayahu in #1855
Full Changelog: 1.25.0...1.26.0
Unitxt 1.25.0 - Improved Error Messages
Main changes
- Error messages were simplified and improved. Each failure now produces a short stack trace, followed by the context in which the error occurred, a link to the relevant help documentation, and then the detailed error message:
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ 🦄 Unitxt Error Context │
│ -------------------------------------------------------------------------------------------------------------------- │
│ - Python: 3.10.17 │
│ - Unitxt: 1.25.0 │
│ - Stage: Metric Processing │
│ - Stream: all_data>> │
│ - Object: KeyValueExtraction (https://www.unitxt.ai/en/latest/unitxt.metrics.html#unitxt.metrics.KeyValueExtraction)│
│ - Help: https://www.unitxt.ai/en/latest/docs/adding_metric.html │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Each reference is expected to be of type 'Dict[str, str]' in metrics.key_value_extraction.accuracy metric. Received reference of type <class 'str'>: Austin
- Added Granite Thinking support, including an example.
- Added a flag in the format to determine whether to place the template instructions once in the system turn, or in the user turns (for each demo and for the final input). This is important because some models delete their default system prompt when they receive an external system prompt.
- Added an option to get the generated text in the metadata when calling infer_log_prob(). Previously only the individual tokens were returned. See the example code (a hedged sketch also follows this list).
- Added support for multi-turn dialog metrics. See the tool calling example.
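For reference, a heavily hedged sketch of retrieving the full generated text together with log probabilities; the engine class, model and provider names, the return_meta_data flag, and the generated_text field are assumptions inferred from the notes above rather than a confirmed API, so consult the inference documentation for the exact names:

from unitxt import load_dataset
from unitxt.inference import CrossProviderInferenceEngine  # assumed import path

dataset = load_dataset(
    card="cards.wnli",  # illustrative catalog card
    template="templates.classification.multi_class.relation.default",
    loader_limit=5,
    split="test",
)

engine = CrossProviderInferenceEngine(model="llama-3-3-70b-instruct", provider="watsonx")  # illustrative
outputs = engine.infer_log_probs(dataset, return_meta_data=True)  # assumed method and flag names
for output in outputs:
    print(output.generated_text)  # full generated text, returned with metadata as of 1.25.0 (assumed field)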
What's Changed
- Add Multi Turn Metrics Support by @elronbandel in #1579
- add a test for faithfulness with an external client and fetch artifact by @matanor in #1824
- Fix rits model names and the judges that use them by @martinscooper in #1825
- Add option to store template instruction in user role and not system role and added granite thinking example by @yoavkatz in #1667
- Bluebench fixes by @bnayahu in #1828
- Fix huggingface auto model log probs evaluation by @elronbandel in #1829
- Add support for tool calling in HFAutoModelInferenceEngine by @elronbandel in #1827
- Changed artifact.to_yaml() to use standard dict to yaml API + added example to create a yaml representation of a data card by @yoavkatz in #1831
- Fix CLI issues by @bnayahu in #1832
- Allow changing default ollama api_base by @martinscooper in #1830
- fix bug when WML does not return any content or tool call by @yoavkatz in #1835
- Arena hard fix by @bnayahu in #1836
- Add full generated text when running infer_log_prob() with meta data enabled. by @yoavkatz in #1834
- Improved parsing of MT bench style scores by @yoavkatz in #1839
- Use os.path.join to create infer cache dir path by @martinscooper in #1840
- Add multi turn tool calling task and support multiple tools per call by @elronbandel in #1811
- Results summarization utility for the CLI by @bnayahu in #1842
- Improved error messages by @elronbandel in #1838
- Improve Text2SQL Metrics: Refactoring, New Execution Metric, and Bug Fixes by @oktie in #1841
- Update coverage exclusions by @elronbandel in #1843
- Use full artifact representation as the cache key for the dataset by @elronbandel in #1644
- Update version to 1.25.0 by @elronbandel in #1844
Full Changelog: 1.24.0...1.25.0
Unitxt 1.24.0
What's Changed
- External client for wml infer engine by @matanor in #1817
- Improved JoinStream error messages by @yoavkatz in #1819
- Added param to control confidence interval calculation in evaluate api by @yoavkatz in #1815
- extend code coverage some by @dafnapension in #1814
- Make api_key_env_var optional in LoadFromAPI by @martinscooper in #1799
- Fix Issue with multi byte token decoding by @elronbandel in #1821
- Fix ruff format pre-commit by @elronbandel in #1822
- Test eval utils with external client by @matanor in #1820
- Improved and Optimized JaccardIndex, Spearman, StringContainment metrics and added MSE and RMSE metrics by @elronbandel in #1816
- Update version to 1.24.0 by @elronbandel in #1823
Full Changelog: 1.23.1...1.24.0
Unitxt 1.23.1
What's Changed
- Add more metrics for schema linking by @kurhula in #1788
- Fixed argument_value_precision by @yoavkatz in #1794
- Fix granite guardian agentic metric and align it with unitxt built in tool calling types by @elronbandel in #1786
- Allow running benchmarks and recipes in cli by @elronbandel in #1785
- Add ToRR Benchmark Readme file by @csrajmohan in #1793
- Add tool calling correctness metric by @elronbandel in #1796
- Remove IBM branding from opensource doc by @yoavkatz in #1802
- Add LoadJsonFile loader and tests by @elronbandel in #1801
- LLM judge judgebench benchmarks by @martinscooper in #1800
- Added granite tool calling system prompt by @Narayanan-V-Eswar in #1798
- Documentation updates by @yoavkatz in #1790
- Cards for the Real MM RAG datasets by @assaftibm in #1795
- Add more judges by @martinscooper in #1808
- Fixed problematic load of json with a single dictionary line. by @yoavkatz in #1806
- Add more cross provider models by @martinscooper in #1807
- Fix model name by @martinscooper in #1809
- watsonx.ai mistral small support by @LukaszCmielowski in #1810
- Fix: number of batches calculation is incorrect by @martinscooper in #1805
- Fix example dependencies installation by @elronbandel in #1812
- Update version to 1.23.1 by @elronbandel in #1818
New Contributors
- @kurhula made their first contribution in #1788
- @LukaszCmielowski made their first contribution in #1810
Full Changelog: 1.23.0...1.23.1
Unitxt 1.23.0
Main changes
- Revised the tool calling tasks and metrics introduced in 1.22.4 - a non-backward-compatible change. Existing datasets have been addressed.
- Fixed support for running HF models with HFAutoModelInferenceEngine (multi-GPU + tokenization issue)
- Added to_yaml() to create a yaml representation of the card that can be used for running custom datasets in Granite.build (see the sketch below)
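A minimal sketch of producing the YAML representation of a card; fetch_artifact and the catalog entry name are used here for illustration:

from unitxt.artifact import fetch_artifact

card, _ = fetch_artifact("cards.wnli")  # illustrative catalog card
print(card.to_yaml())                   # yaml representation added in this release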
What's Changed
- Fix batching support for hf Dataset in HFAutoModelInferenceEngine by @elronbandel in #1771
- Fix litellm inference without task_data by @elronbandel in #1772
- Added to_yaml shorthand function to artifact by @yoavkatz in #1768
- Simplify tool calling base types by @elronbandel in #1773
- Added tool calling to wml chat by @pawelknes in #1782
- Reverting to datasets=351 can solve problems in test catalog preparation by @dafnapension in #1784
- Update ibm wml engine #1775 by @MikolajCharchut in #1781
- Fix HF AutoModel tokenization issue with chat template + issue with multi GPU by @OfirArviv in #1779
- Performance to report accurate times based on end-to-end time() diffs, rather than accumulate cProfile numbers over methods whose names seem relevant by @dafnapension in #1783
- Add support to mix args and textual query in load_dataset by @elronbandel in #1778
- Add installation of spacy as a binary dependency for examples regression tests by @elronbandel in #1787
- Improvements to tool calling - NON BACKWARD COMPATIBLE CHANGES by @Narayanan-V-Eswar in #1770
- Added example for standalone metric evaluation by @yoavkatz in #1769
- Update version to 1.23.0 by @elronbandel in #1789
New Contributors
- @Narayanan-V-Eswar made their first contribution in #1770
Full Changelog: 1.22.4...1.23.0