Merged
Changes from 2 commits
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v2.1.0.rst
@@ -372,10 +372,10 @@ I/O
^^^
- :meth:`DataFrame.to_orc` now raising ``ValueError`` when non-default :class:`Index` is given (:issue:`51828`)
- :meth:`DataFrame.to_sql` now raising ``ValueError`` when the name param is left empty while using SQLAlchemy to connect (:issue:`52675`)
- Bug in :func:`json_normalize`, json_normalize cannot parse metadata fields list type (:issue:`#37782`)
Member

Suggested change
- Bug in :func:`json_normalize`, json_normalize cannot parse metadata fields list type (:issue:`#37782`)
- Bug in :func:`json_normalize`, json_normalize cannot parse metadata fields list type (:issue:`37782`)

Contributor Author

@mroeschke I'm seeing this message:

"This branch has conflicts that must be resolved. Only those with write access to this repository can merge pull requests. Conflicting files: doc/source/whatsnew/v2.1.0.rst"

Can you please take a look and guide me on this?

Member

Please follow these instructions to update your branch and fix the conflict: https://pandas.pydata.org/docs/development/contributing.html#updating-your-pull-request

- Bug in :func:`read_hdf` not properly closing store after a ``IndexError`` is raised (:issue:`52781`)
- Bug in :func:`read_html`, style elements were read into DataFrames (:issue:`52197`)
- Bug in :func:`read_html`, tail texts were removed together with elements containing ``display:none`` style (:issue:`51629`)
-

Period
^^^^^^
12 changes: 11 additions & 1 deletion pandas/io/json/_normalize.py
@@ -535,5 +535,15 @@ def _recursive_extract(data, path, seen_meta, level: int = 0) -> None:
             raise ValueError(
                 f"Conflicting metadata name {k}, need distinguishing prefix "
             )
-        result[k] = np.array(v, dtype=object).repeat(lengths)
+        #### FIX BUG #37782: https://github.com/pandas-dev/pandas/issues/37782
+
+        values = np.array(v, dtype=object)
+
+        if values.ndim > 1:
+            # GH#37782
+            values = np.empty((len(v),), dtype=object)
+            for i, v in enumerate(v):
+                values[i] = v
+
+        result[k] = values.repeat(lengths)
     return result
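
Editor's note on the change above: the following is a minimal sketch, not part of the diff, of why the extra branch seems to be needed. When every metadata value is a list of the same length, `np.array(v, dtype=object)` builds a 2-D array, and `repeat` without an axis flattens it, so the list elements get repeated instead of the lists themselves. The helper name `to_object_column` and the sample values are hypothetical and exist only for illustration.

```python
import numpy as np


def to_object_column(v):
    """Hypothetical helper mirroring the patched branch: keep each
    metadata value as one element of a 1-D object array."""
    values = np.array(v, dtype=object)
    if values.ndim > 1:
        # Equal-length lists were stacked into a 2-D array; rebuild a
        # 1-D object array whose elements are the original lists.
        values = np.empty((len(v),), dtype=object)
        for i, item in enumerate(v):
            values[i] = item
    return values


meta = [[1, 2]]   # one record whose metadata value is a list (made-up data)
lengths = [3]     # that record contributes three rows

# Old behaviour: the 2-D array is flattened, so 1 and 2 are each repeated.
naive = np.array(meta, dtype=object).repeat(lengths)

# Patched behaviour: the list itself is repeated once per row.
fixed = to_object_column(meta).repeat(lengths)

print(naive.shape)
print(fixed.shape)
```

The naive result has six entries for a three-row frame, which is consistent with the length mismatch reported in the issue. The sketch uses a separate loop variable (`item`) rather than rebinding `v` as the diff does; that is a readability choice here, not a behavioural difference.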
16 changes: 16 additions & 0 deletions pandas/tests/io/json/test_normalize.py
@@ -137,6 +137,11 @@ def max_level_test_input_data():
     ]


+@pytest.fixture
+def parse_metadata_fields_list_type():
Member

Can you just hardcode this in the function?

Contributor Author

@felipemaion May 5, 2023

@mroeschke like that?

    def test_fields_list_type_normalize(self):
        parse_metadata_fields_list_type = [{"values": [1, 2, 3], "metadata": {"listdata": [1, 2]}}]
        result = json_normalize(
            parse_metadata_fields_list_type,
            record_path=["values"],
            meta=[["metadata", "listdata"]],
        )
        expected = DataFrame(
            {0: [1, 2, 3], "metadata.listdata": [[1, 2], [1, 2], [1, 2]]}
        )
        tm.assert_frame_equal(result, expected)
        
Should I push it? I wrote it as a separate function only because I was following other examples in the file.
Let me know.

Member

Yes

Contributor Author

@mroeschke I just did it, but things look different. I'm not sure if it was pushed properly. Can you check, please?
(This is my first PR for a big project.)

+    return [{"values": [1, 2, 3], "metadata": {"listdata": [1, 2]}}]
+
+
 class TestJSONNormalize:
     def test_simple_records(self):
         recs = [
@@ -170,6 +175,17 @@ def test_simple_normalize(self, state_data):

         tm.assert_frame_equal(result, expected)

+    def test_fields_list_type_normalize(self, parse_metadata_fields_list_type):
+        result = json_normalize(
+            parse_metadata_fields_list_type,
+            record_path=["values"],
+            meta=[["metadata", "listdata"]],
+        )
+        expected = DataFrame(
+            {0: [1, 2, 3], "metadata.listdata": [[1, 2], [1, 2], [1, 2]]}
+        )
+        tm.assert_frame_equal(result, expected)
+
     def test_empty_array(self):
         result = json_normalize([])
         expected = DataFrame()