Skip to content

add ntv-pandas to the ecosystem #55421

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 37 commits into from
Oct 6, 2023
Merged

add ntv-pandas to the ecosystem #55421

merged 37 commits into from
Oct 6, 2023

Conversation

loco-philippe
Copy link
Contributor

As defined in the conclusion of the PDEP-12, add ntv-pandas to the ecosystem .

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@@ -345,6 +345,23 @@ which pandas excels.

## IO

### [NTV-pandas](https://github.com/loco-philippe/ntv-pandas)*: A semantic, compact and reversible JSON-pandas converter*

pandas provides JSON converter but three limitations are present:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contributiin. Can you phrase this not talking about pandas or its limitations, but about NTV-pandas directly? Something like "NTV-pandas provides a JSON converter with more data types than the ones supoorted by pandas directly. Its main features are: ...". You don't need to use my description of course, but I think it's more useful for users to go direct to what the project does than starting by pandas limitations. If users can have an intuition on whether the project can be for them in a more concise way, then they can quickly jump to the project site for the details.

What I'd add here is a very concise example, so users can faster understand not only what the project does, but what it takes to use it.

Happy to discuss further if you disagree on my points, but I think this approach will make readers life easier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your quick reply and your advise !

You are right and I suggest changing the text as below,:

NTV-pandas provides a JSON converter with more data types than the ones supported by pandas directly.

*It supports the following data-types: *

The interface is always reversible (conversion round trip) with two formats (JSON-NTV and JSON-TableSchema).

NTV-pandas was developed originally in the json-NTV project

Is it better and sufficiently clear and concise ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this looks great. Just couple of styling comments, I wouldn't use bold for the line between * if that's the idea. And I would use "data types" consistently in the bullet points instead of dtype and data-type. I'd also remove the commas at the end of the bullet points (or be consistent).

The last line on where it was developed doesn't seem relevant to me for the average pandas user potentially interested in this, I'd leave this to the project page itself.

And as I said, if you can show a minimal example on how the library is used, I think that can help users understand better before clicking in the link if this project may be of interest.

Anyways, great job, I think it's useful this way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> I took the comments into account and added the example. That's what happens (not too long ?):

NTV-pandas provides a JSON converter with more data types than the ones supported by pandas directly.

It supports the following data types:

The interface is always reversible (conversion round trip) with two formats (JSON-NTV and JSON-TableSchema).

data example

In [1]: from shapely.geometry import Point
        from datetime import date
        import pandas as pd
        import ntv_pandas as npd

In [2]: data = {'dates::date':  [date(1964,1,1), date(1985,2,5)],
                'value32':      pd.Series([12, 10], dtype='int32'),
                'coord::point': [Point(1,2), Point(3,4)],
                'unique':       True }
        df = pd.DataFrame(data)

In [3]: df
Out[3]:     dates::date  value32  coord::point   unique
        1    1964-01-01       12   POINT (1 2)   True
        2    1985-02-05       10   POINT (3 4)   True

JSON representation

In [4]: pprint(npd.to_json(df), sort_dicts=False)
Out[4]: {':tab': {'index': [0, 1],
                  'dates::date': ['1964-01-01', '1985-02-05'],
                  'value32::int32': [12, 10],
                  'coord::point': [[1.0, 2.0], [3.0, 4.0]],
                  'unique': True}}

In [5]: pprint(npd.to_json(df, table=True), width=100, sort_dicts=False)
Out[5]: {'schema': {'fields': [{'name': 'index', 'type': 'integer'},
                               {'name': 'dates', 'type': 'date'},
                               {'name': 'value32', 'type': 'integer', 'format': 'int32'},
                               {'name': 'coord', 'type': 'geopoint', 'format': 'array'},
                               {'name': 'unique', 'type': 'boolean'}],
                    'primaryKey': ['index'],
                    'pandas_version': '1.4.0'},
         'data': [{'index': 0, 'dates': '1964-01-01', 'value32': 12, 'coord': [1.0, 2.0], 'unique': True},
                  {'index': 1, 'dates': '1985-02-05', 'value32': 10, 'coord': [3.0, 4.0], 'unique': True}]}

Reversibility

In [6]: print(npd.read_json(npd.to_json(df)).equals(df),
              npd.read_json(npd.to_json(df, table=True)).equals(df))
Out[6]: True True

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first part of the example can be reduced (without [3]):

In [1]: from shapely.geometry import Point
        import pandas as pd
        import ntv_pandas as npd

In [2]: df = pd.DataFrame({'dates::date':  [datetime.date(1964,1,1), datetime.date(1985,2,5)],
                           'value32':      pd.Series([12, 10], dtype='int32'),
                           'coord::point': [Point(1,2), Point(3,4)],
                           'unique':       True }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the example is indeed way too big. What do you think about just something like:

import ntv_pandas as npd

df = npd.read_json('data.json')  # load (maybe add some arg that highlights the difference with pandas.read_json?
npd.to_json(df)  # save (not sure if the code you wrote about is correct, seems to be missing the path where to save)

In any case, can you update the PR with whatever you think it's best, it's easier to review in the PR than in a comment.

Btw, if you are the author of the library, do you know that you can register an accessor so on import ntv_pandas you get something like df.npd.to_json() available? You can check the docs on extending pandas for more info, it's trivial to implement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the PR is updated !

You are right with the accessor, i try this:

@pd.api.extensions.register_dataframe_accessor("npd")
class NtvAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def to_json(self, **kwargs):
        return ntv_pandas.to_json(self._obj, **kwargs)

and it works !!!

@mroeschke mroeschke added the Docs label Oct 6, 2023
@mroeschke mroeschke added this to the 2.2 milestone Oct 6, 2023
@mroeschke mroeschke merged commit a6893c1 into pandas-dev:main Oct 6, 2023
@mroeschke
Copy link
Member

Thanks @loco-philippe

@datapythonista
Copy link
Member

Thanks @loco-philippe, very nice, please have a look at the rendered site, this update should be updated shortly, and open a follow up PR is something is not displayed as expected.

@loco-philippe
Copy link
Contributor Author

Thank you very much for your time and help, you were efficient and quick !

The rendering of the bullet list is not ok -> I will open an other PR.

Note: I have the same rendering problem with the last update of the PDEP-12, i will open another PR too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants