Jsonlines export error #2615
Comments
Thanks for reporting @TevenLeScao! I'm having a look...
(not sure what just happened with the assignments, sorry)
For some reason this happens (both
@TevenLeScao we are using
@TevenLeScao I have just checked it: this was a bug in
Thanks! I'm creating a PR
Well, I thought it was me who had taken on this issue... 😅
Sorry, I was also talking to Teven offline, so I already had the PR ready before noticing x)
I was also already working on my PR... Never mind. Next time we should check whether somebody is (self-)assigned to an issue and still working on it before taking it over... 😄
The fix is available on
Describe the bug
When exporting large datasets to JSON Lines (c4 in my case), the created file has an error every 9999 lines: the 9999th and 10000th lines are concatenated, which breaks the JSON Lines format. This seems to be related to batching, since the default batch size is 10000.
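To illustrate the suspected mechanism, here is a minimal sketch, assuming the exporter serializes each batch separately and writes the chunks back to back. `write_jsonl_buggy` and `write_jsonl_fixed` are hypothetical helpers, not the `datasets` implementation: if each batch is newline-joined but not newline-terminated, the last row of one batch and the first row of the next end up on the same line.

```python
import io
import json


def write_jsonl_buggy(rows, f, batch_size):
    """Hypothetical batched writer: rows inside a batch are joined with
    newlines, but the batch itself is NOT terminated with a newline."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        f.write("\n".join(json.dumps(row) for row in batch))


def write_jsonl_fixed(rows, f, batch_size):
    """Same hypothetical writer, but every batch ends with a newline,
    so consecutive batches never share a line."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        f.write("\n".join(json.dumps(row) for row in batch) + "\n")


rows = [{"id": i} for i in range(7)]

buggy = io.StringIO()
write_jsonl_buggy(rows, buggy, batch_size=3)
buggy_lines = buggy.getvalue().splitlines()
# ['{"id": 0}', '{"id": 1}', '{"id": 2}{"id": 3}', '{"id": 4}', '{"id": 5}{"id": 6}']

fixed = io.StringIO()
write_jsonl_fixed(rows, fixed, batch_size=3)
fixed_lines = fixed.getvalue().splitlines()
# one JSON object per line, from '{"id": 0}' to '{"id": 6}'
```

With `batch_size=3`, rows 2/3 and 5/6 collide at each batch boundary; with a batch size of 10000 the same collision would happen once per batch, which is consistent with the periodic error reported above.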
Steps to reproduce the bug
This is what I'm running:
In Python:
Then, outside of Python:
Expected results
Properly separated lines
Actual results
The last line is a concatenation of two lines
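To locate the broken lines in an exported file, one option is to parse each line on its own and record the ones that fail. Below is a minimal sketch; `find_bad_jsonl_lines` is a hypothetical helper that accepts any iterable of lines, such as an open text file.

```python
import json


def find_bad_jsonl_lines(lines):
    """Return the 1-indexed numbers of lines that are not a single valid JSON value.

    `lines` can be any iterable of strings, e.g. an open text file.
    Blank lines (such as an empty final line) are skipped.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            json.loads(line)
        except json.JSONDecodeError:
            # Two objects concatenated on one line fail with "Extra data".
            bad.append(lineno)
    return bad


sample = ['{"id": 0}\n', '{"id": 1}{"id": 2}\n', '{"id": 3}\n']
print(find_bad_jsonl_lines(sample))  # [2]
```

For a real export, pass the opened output file, e.g. `open(path, encoding="utf-8")`, where `path` is whatever file the export wrote to.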
Environment info
`datasets` version: 1.9.1.dev0