Port to orjson from ujson #8584
base: main
@@ -318,7 +318,7 @@ def _save_data_to_local_file(train_data: list[dict[str, Any]], data_format: Trai
 elif data_format == TrainDataFormat.COMPLETION:
     _validate_completion_data(item)

-f.write(ujson.dumps(item) + "\n")
+f.write(orjson.dumps(item).decode() + "\n")
I recommend changing the open(file_path, "w") call above to "wb" mode, which eliminates the decode happening here.
The end result will be:
-f.write(orjson.dumps(item).decode() + "\n")
+f.write(orjson.dumps(item) + b"\n")
 with open(file_path, "w") as f:
     for item in data:
-        f.write(ujson.dumps(item) + "\n")
+        f.write(orjson.dumps(item).decode() + "\n")
As above, I recommend eliminating the calls to .decode():
with open(file_path, "wb") as f:
    for item in data:
        f.write(orjson.dumps(item) + b"\n")
This suggestion applies to the other change in this file.
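For reference, the binary-mode write suggested above can be sketched end to end as follows. This is a minimal illustration, not the PR's actual code: `save_jsonl`, the sample records, and the temporary file name are hypothetical, and the fallback to the standard-library `json` module exists only so the sketch runs where orjson is not installed.

```python
import json
import tempfile
from pathlib import Path

try:
    import orjson

    dumps = orjson.dumps  # orjson.dumps returns UTF-8 encoded bytes
except ImportError:
    # Assumption for this sketch only: emulate the bytes return type
    # with the standard library when orjson is unavailable.
    def dumps(obj):
        return json.dumps(obj).encode("utf-8")


def save_jsonl(file_path, data):
    """Write one JSON object per line without leaving the bytes domain."""
    with open(file_path, "wb") as f:
        for item in data:
            # bytes + bytes: no .decode() needed in binary mode
            f.write(dumps(item) + b"\n")


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "train.jsonl"  # hypothetical file name
    save_jsonl(out, [{"prompt": "hi"}, {"prompt": "bye"}])
    lines = out.read_bytes().splitlines()
```

Opening the file in "wb" mode also means the bytes are written exactly as produced, without depending on the platform's default text encoding or newline translation.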
 with open(path, encoding="utf-8") as f:
-    state = ujson.loads(f.read())
+    state = orjson.loads(f.read().encode('utf-8'))
This is the last place I'll leave feedback, but in general the suggestion is "don't decode content only to re-encode it immediately":
with open(path, "rb") as f:
    state = orjson.loads(f.read())
Also, these are pathlib objects, so this is simpler still:
state = orjson.loads(path.read_bytes())
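A small self-contained sketch of the read side, under the same caveats: `load_state`, the file name, and the sample payload are made up for illustration, and the standard-library fallback is only there so the example runs without orjson. Both `orjson.loads` and `json.loads` accept bytes directly, so no text-mode decode followed by a re-encode is needed.

```python
import json
import tempfile
from pathlib import Path

try:
    import orjson

    loads = orjson.loads  # accepts bytes, bytearray, memoryview, and str
except ImportError:
    # Assumption for this sketch only: json.loads also accepts UTF-8 bytes.
    loads = json.loads


def load_state(path: Path):
    """Parse a JSON file straight from its raw bytes."""
    return loads(path.read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "state.json"  # hypothetical file name
    # Non-ASCII content stored as UTF-8 bytes on disk
    path.write_bytes(b'{"step": 3, "name": "caf\xc3\xa9"}')
    state = load_state(path)
```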
Thanks so much @aadya940, and thanks @kurtmckee for the comments! @aadya940, would you mind addressing the failures (mostly ruff, or maybe some tests?) and checking out the comments? (I didn't dive into them.)
As specified in #8540, ujson is in maintenance mode.