add `ntv-pandas` to the ecosystem #55421

loco-philippe · 2023-10-06T07:52:48Z

As defined in the conclusion of PDEP-12, add ntv-pandas to the ecosystem.

closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

datapythonista · 2023-10-06T08:06:38Z

web/pandas/community/ecosystem.md

@@ -345,6 +345,23 @@ which pandas excels.

 ## IO

+### [NTV-pandas](https://github.com/loco-philippe/ntv-pandas)*: A semantic, compact and reversible JSON-pandas converter*
+
+pandas provides JSON converter but three limitations are present:


Thanks for the contribution. Can you phrase this so that it talks about NTV-pandas directly, rather than about pandas or its limitations? Something like "NTV-pandas provides a JSON converter with more data types than the ones supported by pandas directly. Its main features are: ...". You don't need to use my description of course, but I think it's more useful for users to go directly to what the project does than to start with pandas limitations. If users can get an intuition of whether the project is for them in a more concise way, then they can quickly jump to the project site for the details.

What I'd add here is a very concise example, so users can more quickly understand not only what the project does, but also what it takes to use it.

Happy to discuss further if you disagree with my points, but I think this approach will make readers' lives easier.

Thank you for your quick reply and your advice!

You are right, and I suggest changing the text as below:

NTV-pandas provides a JSON converter with more data types than the ones supported by pandas directly.

*It supports the following data-types: *

- pandas dtype,
- data-type defined in the NTV format
- data-type defined in Table Schema specification,

The interface is always reversible (conversion round trip) with two formats (JSON-NTV and JSON-TableSchema).

NTV-pandas was developed originally in the json-NTV project

Is it better, and sufficiently clear and concise?

Yes, this looks great. Just a couple of styling comments: I wouldn't use bold for the line between the asterisks, if that's the idea. And I would use "data types" consistently in the bullet points instead of "dtype" and "data-type". I'd also remove the commas at the end of the bullet points (or be consistent).

The last line about where it was developed doesn't seem relevant to me for the average pandas user potentially interested in this; I'd leave it to the project page itself.

And as I said, if you can show a minimal example of how the library is used, I think that can help users understand, before clicking on the link, whether this project may be of interest.

Anyway, great job, I think it's useful this way.

I took the comments into account and added the example. Here is the result (is it too long?):

NTV-pandas provides a JSON converter with more data types than the ones supported by pandas directly.

It supports the following data types:

- pandas data types
- data types defined in the NTV format
- data types defined in Table Schema specification

The interface is always reversible (conversion round trip) with two formats (JSON-NTV and JSON-TableSchema).

data example

    In [1]: from pprint import pprint
            from shapely.geometry import Point
            from datetime import date
            import pandas as pd
            import ntv_pandas as npd

    In [2]: data = {'dates::date': [date(1964,1,1), date(1985,2,5)],
                    'value32': pd.Series([12, 10], dtype='int32'),
                    'coord::point': [Point(1,2), Point(3,4)],
                    'unique': True}
            df = pd.DataFrame(data)

    In [3]: df
    Out[3]:
      dates::date  value32 coord::point  unique
    0  1964-01-01       12  POINT (1 2)    True
    1  1985-02-05       10  POINT (3 4)    True

JSON representation

    In [4]: pprint(npd.to_json(df), sort_dicts=False)
    Out[4]: {':tab': {'index': [0, 1],
                      'dates::date': ['1964-01-01', '1985-02-05'],
                      'value32::int32': [12, 10],
                      'coord::point': [[1.0, 2.0], [3.0, 4.0]],
                      'unique': True}}

    In [5]: pprint(npd.to_json(df, table=True), width=100, sort_dicts=False)
    Out[5]: {'schema': {'fields': [{'name': 'index', 'type': 'integer'},
                                   {'name': 'dates', 'type': 'date'},
                                   {'name': 'value32', 'type': 'integer', 'format': 'int32'},
                                   {'name': 'coord', 'type': 'geopoint', 'format': 'array'},
                                   {'name': 'unique', 'type': 'boolean'}],
                        'primaryKey': ['index'],
                        'pandas_version': '1.4.0'},
             'data': [{'index': 0, 'dates': '1964-01-01', 'value32': 12, 'coord': [1.0, 2.0], 'unique': True},
                      {'index': 1, 'dates': '1985-02-05', 'value32': 10, 'coord': [3.0, 4.0], 'unique': True}]}

Reversibility

    In [6]: print(npd.read_json(npd.to_json(df)).equals(df),
                  npd.read_json(npd.to_json(df, table=True)).equals(df))
    Out[6]: True True

The first part of the example can be reduced (without [3]):

    In [1]: import datetime
            from shapely.geometry import Point
            import pandas as pd
            import ntv_pandas as npd

    In [2]: df = pd.DataFrame({'dates::date': [datetime.date(1964,1,1), datetime.date(1985,2,5)],
                               'value32': pd.Series([12, 10], dtype='int32'),
                               'coord::point': [Point(1,2), Point(3,4)],
                               'unique': True})
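To make the reversibility idea concrete, here is a minimal standard-library sketch of why a `name::type` suffix (as in `'dates::date'` above) allows a lossless round trip through JSON. It is not the NTV-pandas implementation: the helper names `to_typed_json` and `from_typed_json` are hypothetical, and only a single `date` type is handled.

```python
import json
from datetime import date


def to_typed_json(columns):
    """Serialize columns to JSON, tagging date columns with a '::date' suffix.

    Hypothetical helper for illustration only; not part of ntv-pandas.
    """
    out = {}
    for name, values in columns.items():
        if all(isinstance(v, date) for v in values):
            # Record the type in the column name so it can be restored later.
            out[f"{name}::date"] = [v.isoformat() for v in values]
        else:
            out[name] = list(values)
    return json.dumps(out)


def from_typed_json(text):
    """Rebuild the columns, converting '::date' columns back to date objects."""
    columns = {}
    for key, values in json.loads(text).items():
        name, sep, type_name = key.partition("::")
        if sep and type_name == "date":
            columns[name] = [date.fromisoformat(v) for v in values]
        else:
            columns[key] = values
    return columns


data = {"dates": [date(1964, 1, 1), date(1985, 2, 5)], "value": [12, 10]}
restored = from_typed_json(to_typed_json(data))
print(restored == data)
```

Without the type suffix, the dates would come back as plain strings, and the comparison with the original data would fail.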

I think the example is indeed way too big. What do you think about just something like:

    import ntv_pandas as npd

    df = npd.read_json('data.json')  # load (maybe add some arg that highlights the difference with pandas.read_json?)
    npd.to_json(df)  # save (not sure if the code you wrote above is correct, it seems to be missing the path where to save)

In any case, can you update the PR with whatever you think is best? It's easier to review in the PR than in a comment.

Btw, if you are the author of the library, do you know that you can register an accessor, so that on `import ntv_pandas` you get something like `df.npd.to_json()` available? You can check the docs on extending pandas for more info; it's trivial to implement.

The PR is updated!

You are right about the accessor. I tried this:

    import pandas as pd
    import ntv_pandas

    @pd.api.extensions.register_dataframe_accessor("npd")
    class NtvAccessor:
        def __init__(self, pandas_obj):
            self._obj = pandas_obj

        def to_json(self, **kwargs):
            return ntv_pandas.to_json(self._obj, **kwargs)

and it works!
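For readers who want to try the accessor pattern without installing ntv-pandas, the following sketch registers a similar accessor that delegates to pandas' own `DataFrame.to_json` instead. The accessor name `demo_json` and the class `DemoJsonAccessor` are made up for this illustration; only `pd.api.extensions.register_dataframe_accessor` and `DataFrame.to_json(orient="table")` are real pandas APIs.

```python
import json

import pandas as pd


# "demo_json" is a hypothetical accessor name chosen for this sketch.
@pd.api.extensions.register_dataframe_accessor("demo_json")
class DemoJsonAccessor:
    def __init__(self, pandas_obj):
        # pandas passes the DataFrame the accessor is attached to.
        self._obj = pandas_obj

    def to_json(self, **kwargs):
        # Delegate to pandas' built-in Table Schema export; a library such as
        # ntv-pandas would call its own converter here instead.
        return self._obj.to_json(orient="table", **kwargs)


df = pd.DataFrame({"value": [12, 10]})
payload = json.loads(df.demo_json.to_json())
print([row["value"] for row in payload["data"]])
```

Once the module defining the accessor has been imported, every DataFrame exposes it as an attribute, which is what makes `df.npd.to_json()` possible for a third-party library.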

mroeschke · 2023-10-06T16:10:22Z

Thanks @loco-philippe

datapythonista · 2023-10-06T16:47:18Z

Thanks @loco-philippe, very nice. Please have a look at the rendered site, which should be updated shortly, and open a follow-up PR if something is not displayed as expected.

loco-philippe · 2023-10-06T20:42:13Z

Thank you very much for your time and help, you were efficient and quick!

The bullet list is not rendered correctly, so I will open another PR.

Note: I have the same rendering problem with the last update of PDEP-12; I will open another PR for that too.

loco-philippe added 30 commits June 18, 2023 15:09

Create 0007-compact-and-reversible-JSON-interface.md

049b7b0

Merge branch 'main' into ntv

6dc92e4

Merge branch 'main' into ntv

cd6884d

change PDEP number (7 -> 12)

09ed538

Merge branch 'main' into ntv

13e6fd2

Add FAQ to the PDEPS 0012

f4d1f5e

Merge branch 'main' into ntv

fff7adb

Merge remote-tracking branch 'upstream/main' into ntv

92732e6

Update 0012-compact-and-reversible-JSON-interface.md

8d0f2f4

Update 0012-compact-and-reversible-JSON-interface.md

3f3aae0

Update 0012-compact-and-reversible-JSON-interface.md

82b1992

pre-commit codespell

a051d9c

Merge remote-tracking branch 'upstream/main' into ntv

a0f16dd

Update 0012-compact-and-reversible-JSON-interface.md

63d92ec

Update 0012-compact-and-reversible-JSON-interface.md

aca4a47

delete summary

d0b41a6

delete mermaid flowchart

3f7135a

with summary, without mermaid flowchart

16d7201

rename Annexe -> Appendix

08cf17b

Merge remote-tracking branch 'upstream/main' into ntv

7da5c63

add tableschema specification

ec31662

add orient="table"

4dbb822

Add Table Schema extension

38e92b2

Update 0012-compact-and-reversible-JSON-interface.md

1e3f793

Update 0012-compact-and-reversible-JSON-interface.md

8dad555

Merge remote-tracking branch 'upstream/main' into ntv

2239c66

Update 0012-compact-and-reversible-JSON-interface.md

7e7d878

Update 0012-compact-and-reversible-JSON-interface.md

fbc5fe5

Merge remote-tracking branch 'upstream/main' into ntv

cceee0b

Update 0012-compact-and-reversible-JSON-interface.md

65bee1d

loco-philippe added 3 commits October 1, 2023 17:01

Update 0012-compact-and-reversible-JSON-interface.md

c567277

Merge remote-tracking branch 'upstream/main' into ntv

698835c

add 'ntv-pandas' in the 'ecosystem' file

1bbe9bf

loco-philippe requested a review from datapythonista as a code owner October 6, 2023 07:52

Update ecosystem.md

3d234de

datapythonista reviewed Oct 6, 2023


loco-philippe added 3 commits October 6, 2023 16:03

Update ecosystem.md

cf1a82d

Merge remote-tracking branch 'upstream/main' into ntv

f555a75

Update ecosystem.md

fcbfa72

mroeschke added the Docs label Oct 6, 2023

mroeschke added this to the 2.2 milestone Oct 6, 2023

mroeschke approved these changes Oct 6, 2023


mroeschke merged commit a6893c1 into pandas-dev:main Oct 6, 2023

loco-philippe mentioned this pull request Oct 6, 2023

WEB: Fix rendering of ntv-pandas ecosystem page #55430

Merged

5 tasks
