
Port to orjson from ujson #8584



Open: wants to merge 1 commit into base: main


aadya940

As noted in #8540, ujson is in maintenance mode.

    @@ -318,7 +318,7 @@ def _save_data_to_local_file(train_data: list[dict[str, Any]], data_format: Trai
             elif data_format == TrainDataFormat.COMPLETION:
                 _validate_completion_data(item)
     
    -        f.write(ujson.dumps(item) + "\n")
    +        f.write(orjson.dumps(item).decode() + "\n")
Contributor

I recommend changing the open(file_path, "w") call above to "wb" mode to eliminate the .decode() calls happening here.

The end result will be:

Suggested change:

    -f.write(orjson.dumps(item).decode() + "\n")
    +f.write(orjson.dumps(item) + b"\n")

Comment on lines 61 to +63
     with open(file_path, "w") as f:
         for item in data:
    -        f.write(ujson.dumps(item) + "\n")
    +        f.write(orjson.dumps(item).decode() + "\n")
Contributor

As above, I recommend eliminating the calls to .decode():

    with open(file_path, "wb") as f:
        for item in data:
            f.write(orjson.dumps(item) + b"\n")

This suggestion applies to the other change in this file.

Comment on lines 251 to +252
     with open(path, encoding="utf-8") as f:
    -    state = ujson.loads(f.read())
    +    state = orjson.loads(f.read().encode('utf-8'))
Contributor

This is the last place I'll leave feedback, but in general the suggestion is "don't decode content only to re-encode it immediately":

            with open(path, "rb") as f:
                state = orjson.loads(f.read())

Also, these paths are pathlib objects, so this is better still:

            state = orjson.loads(path.read_bytes())

okhat (Collaborator) commented Aug 10, 2025

Thanks so much @aadya940 and thanks @kurtmckee for the comments!

@aadya940 Would you mind addressing the failures (mostly ruff? maybe some tests?) and looking at the review comments? (I didn't dive into them.)

0xEval pushed a commit to 0xEval/dspy-orjson-mig that referenced this pull request Aug 13, 2025