-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
add ntv-pandas
to the ecosystem
#55421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
web/pandas/community/ecosystem.md
Outdated
@@ -345,6 +345,23 @@ which pandas excels. | |||
|
|||
## IO | |||
|
|||
### [NTV-pandas](https://github.com/loco-philippe/ntv-pandas)*: A semantic, compact and reversible JSON-pandas converter* | |||
|
|||
pandas provides JSON converter but three limitations are present: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contributiin. Can you phrase this not talking about pandas or its limitations, but about NTV-pandas directly? Something like "NTV-pandas provides a JSON converter with more data types than the ones supoorted by pandas directly. Its main features are: ...". You don't need to use my description of course, but I think it's more useful for users to go direct to what the project does than starting by pandas limitations. If users can have an intuition on whether the project can be for them in a more concise way, then they can quickly jump to the project site for the details.
What I'd add here is a very concise example, so users can faster understand not only what the project does, but what it takes to use it.
Happy to discuss further if you disagree on my points, but I think this approach will make readers life easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your quick reply and your advise !
You are right and I suggest changing the text as below,:
NTV-pandas provides a JSON converter with more data types than the ones supported by pandas directly.
*It supports the following data-types: *
- pandas
dtype
, - data-type defined in the NTV format
- data-type defined in Table Schema specification,
The interface is always reversible (conversion round trip) with two formats (JSON-NTV and JSON-TableSchema).
NTV-pandas was developed originally in the json-NTV project
Is it better and sufficiently clear and concise ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this looks great. Just couple of styling comments, I wouldn't use bold for the line between * if that's the idea. And I would use "data types" consistently in the bullet points instead of dtype and data-type. I'd also remove the commas at the end of the bullet points (or be consistent).
The last line on where it was developed doesn't seem relevant to me for the average pandas user potentially interested in this, I'd leave this to the project page itself.
And as I said, if you can show a minimal example on how the library is used, I think that can help users understand better before clicking in the link if this project may be of interest.
Anyways, great job, I think it's useful this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-> I took the comments into account and added the example. That's what happens (not too long ?):
NTV-pandas provides a JSON converter with more data types than the ones supported by pandas directly.
It supports the following data types:
- pandas data types
- data types defined in the NTV format
- data types defined in Table Schema specification
The interface is always reversible (conversion round trip) with two formats (JSON-NTV and JSON-TableSchema).
data example
In [1]: from shapely.geometry import Point
from datetime import date
import pandas as pd
import ntv_pandas as npd
In [2]: data = {'dates::date': [date(1964,1,1), date(1985,2,5)],
'value32': pd.Series([12, 10], dtype='int32'),
'coord::point': [Point(1,2), Point(3,4)],
'unique': True }
df = pd.DataFrame(data)
In [3]: df
Out[3]: dates::date value32 coord::point unique
1 1964-01-01 12 POINT (1 2) True
2 1985-02-05 10 POINT (3 4) True
JSON representation
In [4]: pprint(npd.to_json(df), sort_dicts=False)
Out[4]: {':tab': {'index': [0, 1],
'dates::date': ['1964-01-01', '1985-02-05'],
'value32::int32': [12, 10],
'coord::point': [[1.0, 2.0], [3.0, 4.0]],
'unique': True}}
In [5]: pprint(npd.to_json(df, table=True), width=100, sort_dicts=False)
Out[5]: {'schema': {'fields': [{'name': 'index', 'type': 'integer'},
{'name': 'dates', 'type': 'date'},
{'name': 'value32', 'type': 'integer', 'format': 'int32'},
{'name': 'coord', 'type': 'geopoint', 'format': 'array'},
{'name': 'unique', 'type': 'boolean'}],
'primaryKey': ['index'],
'pandas_version': '1.4.0'},
'data': [{'index': 0, 'dates': '1964-01-01', 'value32': 12, 'coord': [1.0, 2.0], 'unique': True},
{'index': 1, 'dates': '1985-02-05', 'value32': 10, 'coord': [3.0, 4.0], 'unique': True}]}
Reversibility
In [6]: print(npd.read_json(npd.to_json(df)).equals(df),
npd.read_json(npd.to_json(df, table=True)).equals(df))
Out[6]: True True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first part of the example can be reduced (without [3]):
In [1]: from shapely.geometry import Point
import pandas as pd
import ntv_pandas as npd
In [2]: df = pd.DataFrame({'dates::date': [datetime.date(1964,1,1), datetime.date(1985,2,5)],
'value32': pd.Series([12, 10], dtype='int32'),
'coord::point': [Point(1,2), Point(3,4)],
'unique': True }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the example is indeed way too big. What do you think about just something like:
import ntv_pandas as npd
df = npd.read_json('data.json') # load (maybe add some arg that highlights the difference with pandas.read_json?
npd.to_json(df) # save (not sure if the code you wrote about is correct, seems to be missing the path where to save)
In any case, can you update the PR with whatever you think it's best, it's easier to review in the PR than in a comment.
Btw, if you are the author of the library, do you know that you can register an accessor so on import ntv_pandas
you get something like df.npd.to_json()
available? You can check the docs on extending pandas for more info, it's trivial to implement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the PR is updated !
You are right with the accessor, i try this:
@pd.api.extensions.register_dataframe_accessor("npd")
class NtvAccessor:
def __init__(self, pandas_obj):
self._obj = pandas_obj
def to_json(self, **kwargs):
return ntv_pandas.to_json(self._obj, **kwargs)
and it works !!!
Thanks @loco-philippe |
Thanks @loco-philippe, very nice, please have a look at the rendered site, this update should be updated shortly, and open a follow up PR is something is not displayed as expected. |
Thank you very much for your time and help, you were efficient and quick ! The rendering of the bullet list is not ok -> I will open an other PR. Note: I have the same rendering problem with the last update of the PDEP-12, i will open another PR too |
As defined in the conclusion of the PDEP-12, add
ntv-pandas
to the ecosystem .doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.