ENH: add option to save json without escaping forward slashes #61442

Open
@ellisbrown

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I love pandas and use it extensively. one very common use case for me is saving large json / jsonl files to describe ML training datasets. unfortunately, pandas uses ujson under the hood, which automatically escapes forward slashes (writing / as \/). forward slashes are very common in my dataset files, where they appear in filepaths to images/videos/etc.

the escaped filepaths cause issues with some (non-pandas) downstream libs that ingest my json/jsonl dataset files. so instead of using the native pandas .to_json() function, I have to import the json package and manually write the file myself. this can be much slower for very large files

I am ok living with this inconvenience, but it seems to me to be a gap in the pandas api. perhaps adding an option to prevent the escaping would be a good enhancement
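To make the behavior concrete, here is a minimal sketch using only the standard-library json module. The escaped string is a hand-written stand-in for the kind of output described above, and the file path is a made-up example value. It illustrates that the escaped form is still valid JSON (a conforming parser decodes \/ to /), so the trouble only arises with consumers that read the text without fully decoding it.

```python
import json

# Hand-written stand-in for the escaped output described above;
# the path is a made-up example value.
escaped = r'{"image": "data\/train\/0001.jpg"}'
plain = '{"image": "data/train/0001.jpg"}'

# A conforming JSON parser treats "\/" and "/" as the same character.
same = json.loads(escaped) == json.loads(plain)

# The standard-library encoder leaves forward slashes unescaped.
reencoded = json.dumps(json.loads(escaped))
```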

Feature Description

add a new parameter to pandas.DataFrame.to_json(), such as escape_forward_slashes

def to_json(self, ..., escape_forward_slashes=True) -> str | None:
    ...

or even a ujson_options dict

def to_json(self, ..., ujson_options=None) -> str | None:  # None instead of a mutable {} default
    ...
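Until something like this exists, the effect of escape_forward_slashes=False can be approximated by post-processing the serialized text. The sketch below is only an illustration: unescape_forward_slashes is a hypothetical helper name, not a pandas or ujson function, and the sample string is hand-written. It walks over every backslash escape as a unit, so an escaped backslash that happens to be followed by a slash is left alone.

```python
import json
import re

# Matches any JSON backslash escape: a backslash plus the character after it.
_ESCAPE = re.compile(r"\\(.)")


def unescape_forward_slashes(json_text: str) -> str:
    # Hypothetical helper (not part of pandas): turn the two-character
    # escape backslash + slash into a bare slash and keep every other
    # escape as-is. Consuming escapes pairwise keeps an escaped backslash
    # intact even when a slash follows it.
    return _ESCAPE.sub(
        lambda m: "/" if m.group(1) == "/" else m.group(0), json_text
    )


# Hand-written sample standing in for serialized output with escaped slashes.
raw = r'{"image":"data\/train\/0001.jpg"}'
cleaned = unescape_forward_slashes(raw)

# Both forms decode to the same data; only the text representation differs.
round_trips = json.loads(cleaned) == json.loads(raw)
```

With a DataFrame df, this could be applied to the string that df.to_json() returns when no path is given, for example unescape_forward_slashes(df.to_json(orient="records", lines=True)), at the cost of holding the whole output in memory.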

Alternative Solutions

instead of

df.to_json(path)

you have to manually use the json package

import json

with open(path, "w") as f:
    json.dump(df.to_dict(orient="records"), f)
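Note that the snippet above writes a single JSON array. For jsonl output, a rough equivalent is to write one object per line. The sketch below assumes records is a list of dicts such as the one df.to_dict(orient="records") returns. write_jsonl is a hypothetical helper name, and the sample rows are made up. Values that the json module cannot serialize (timestamps, for instance) would need extra handling that is not shown here.

```python
import io
import json


def write_jsonl(records, f):
    # Hypothetical helper: write one JSON object per line.
    # json.dumps does not escape forward slashes.
    for record in records:
        f.write(json.dumps(record))
        f.write("\n")


# Made-up rows standing in for df.to_dict(orient="records").
rows = [{"image": "data/train/0001.jpg"}, {"image": "data/train/0002.jpg"}]

buffer = io.StringIO()  # an open text file works the same way
write_jsonl(rows, buffer)
output = buffer.getvalue()
```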

Additional Context

also note that the ujson project explicitly states

this library has been put into a maintenance-only mode... Users are encouraged to migrate to orjson which is both much faster and less likely to introduce a surprise buffer overflow vulnerability in the future.

so it might be worth migrating to orjson during this development effort

Labels

  • Enhancement

  • IO JSON (read_json, to_json, json_normalize)

  • Needs Triage (Issue that has not been reviewed by a pandas team member)
