Set python-snappy as optional dependency to work with Python 3.11 on pip install #116

Closed
weiji14 opened this issue May 20, 2023 · 2 comments · Fixed by #117
weiji14 commented May 20, 2023

Is your feature request related to a problem? Please describe.

Trying to install spatialpandas in a Python 3.11 environment currently fails due to a hard dependency on python-snappy which doesn't have wheels for Python 3.11 (see intake/python-snappy#124).

```shell
mamba create --name temp python=3.11
mamba activate temp
python -m pip install spatialpandas==0.4.7
```

produces this traceback:

Collecting spatialpandas==0.4.7
  Using cached spatialpandas-0.4.7-py2.py3-none-any.whl (120 kB)
Collecting dask (from spatialpandas==0.4.7)
  Using cached dask-2023.5.0-py3-none-any.whl (1.2 MB)
Collecting fsspec (from spatialpandas==0.4.7)
  Using cached fsspec-2023.5.0-py3-none-any.whl (160 kB)
Collecting numba (from spatialpandas==0.4.7)
  Downloading numba-0.57.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (3.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.6/3.6 MB 22.8 MB/s eta 0:00:00
Collecting pandas (from spatialpandas==0.4.7)
  Downloading pandas-2.0.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.2/12.2 MB 19.9 MB/s eta 0:00:00
Collecting param (from spatialpandas==0.4.7)
  Using cached param-1.13.0-py2.py3-none-any.whl (87 kB)
Collecting pyarrow>=1.0 (from spatialpandas==0.4.7)
  Downloading pyarrow-12.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (38.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38.9/38.9 MB 3.2 MB/s eta 0:00:00
Collecting python-snappy (from spatialpandas==0.4.7)
  Downloading python-snappy-0.6.1.tar.gz (24 kB)
  Preparing metadata (setup.py) ... done
Collecting retrying (from spatialpandas==0.4.7)
  Using cached retrying-1.3.4-py3-none-any.whl (11 kB)
Collecting numpy>=1.16.6 (from pyarrow>=1.0->spatialpandas==0.4.7)
  Downloading numpy-1.24.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.3/17.3 MB 7.6 MB/s eta 0:00:00
Collecting click>=8.0 (from dask->spatialpandas==0.4.7)
  Using cached click-8.1.3-py3-none-any.whl (96 kB)
Collecting cloudpickle>=1.5.0 (from dask->spatialpandas==0.4.7)
  Using cached cloudpickle-2.2.1-py3-none-any.whl (25 kB)
Collecting packaging>=20.0 (from dask->spatialpandas==0.4.7)
  Using cached packaging-23.1-py3-none-any.whl (48 kB)
Collecting partd>=1.2.0 (from dask->spatialpandas==0.4.7)
  Using cached partd-1.4.0-py3-none-any.whl (18 kB)
Collecting pyyaml>=5.3.1 (from dask->spatialpandas==0.4.7)
  Downloading PyYAML-6.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (757 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 757.9/757.9 kB 3.3 MB/s eta 0:00:00
Collecting toolz>=0.10.0 (from dask->spatialpandas==0.4.7)
  Using cached toolz-0.12.0-py3-none-any.whl (55 kB)
Collecting importlib-metadata>=4.13.0 (from dask->spatialpandas==0.4.7)
  Using cached importlib_metadata-6.6.0-py3-none-any.whl (22 kB)
Collecting llvmlite<0.41,>=0.40.0dev0 (from numba->spatialpandas==0.4.7)
  Downloading llvmlite-0.40.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (42.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.1/42.1 MB 9.5 MB/s eta 0:00:00
Collecting python-dateutil>=2.8.2 (from pandas->spatialpandas==0.4.7)
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting pytz>=2020.1 (from pandas->spatialpandas==0.4.7)
  Using cached pytz-2023.3-py2.py3-none-any.whl (502 kB)
Collecting tzdata>=2022.1 (from pandas->spatialpandas==0.4.7)
  Using cached tzdata-2023.3-py2.py3-none-any.whl (341 kB)
Collecting six>=1.7.0 (from retrying->spatialpandas==0.4.7)
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting zipp>=0.5 (from importlib-metadata>=4.13.0->dask->spatialpandas==0.4.7)
  Using cached zipp-3.15.0-py3-none-any.whl (6.8 kB)
Collecting locket (from partd>=1.2.0->dask->spatialpandas==0.4.7)
  Using cached locket-1.0.0-py2.py3-none-any.whl (4.4 kB)
Building wheels for collected packages: python-snappy
  Building wheel for python-snappy (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [27 lines of output]
      /home/user/mambaforge/envs/temp/lib/python3.11/site-packages/setuptools/_distutils/dist.py:265: UserWarning: Unknown distribution option: 'cffi_modules'
        warnings.warn(msg)
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-311
      creating build/lib.linux-x86_64-cpython-311/snappy
      copying src/snappy/__main__.py -> build/lib.linux-x86_64-cpython-311/snappy
      copying src/snappy/snappy.py -> build/lib.linux-x86_64-cpython-311/snappy
      copying src/snappy/hadoop_snappy.py -> build/lib.linux-x86_64-cpython-311/snappy
      copying src/snappy/snappy_cffi_builder.py -> build/lib.linux-x86_64-cpython-311/snappy
      copying src/snappy/snappy_cffi.py -> build/lib.linux-x86_64-cpython-311/snappy
      copying src/snappy/snappy_formats.py -> build/lib.linux-x86_64-cpython-311/snappy
      copying src/snappy/__init__.py -> build/lib.linux-x86_64-cpython-311/snappy
      running build_ext
      building 'snappy._snappy' extension
      creating build/temp.linux-x86_64-cpython-311
      creating build/temp.linux-x86_64-cpython-311/src
      creating build/temp.linux-x86_64-cpython-311/src/snappy
      gcc -pthread -B /home/user/mambaforge/envs/temp/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/user/mambaforge/envs/temp/include -fPIC -O2 -isystem /home/user/mambaforge/envs/temp/include -fPIC -I/home/user/mambaforge/envs/temp/include/python3.11 -c src/snappy/crc32c.c -o build/temp.linux-x86_64-cpython-311/src/snappy/crc32c.o
      gcc -pthread -B /home/user/mambaforge/envs/temp/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/user/mambaforge/envs/temp/include -fPIC -O2 -isystem /home/user/mambaforge/envs/temp/include -fPIC -I/home/user/mambaforge/envs/temp/include/python3.11 -c src/snappy/snappymodule.cc -o build/temp.linux-x86_64-cpython-311/src/snappy/snappymodule.o
      src/snappy/snappymodule.cc:33:10: fatal error: snappy-c.h: No such file or directory
         33 | #include <snappy-c.h>
            |          ^~~~~~~~~~~~
      compilation terminated.
      error: command '/usr/bin/gcc' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for python-snappy
  Running setup.py clean for python-snappy
Failed to build python-snappy
ERROR: Could not build wheels for python-snappy, which is required to install pyproject.toml-based projects

Describe the solution you'd like


Convert python-snappy from a required to an optional dependency in the setup.py file:

spatialpandas/setup.py

Lines 31 to 40 in bc3e52c

```python
install_requires = [
    'dask',
    'fsspec',
    'numba',
    'pandas',
    'param',
    'pyarrow >=1.0',
    'python-snappy',
    'retrying',
]
```
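One possible shape for this change (a sketch only, assuming standard setuptools extras; the extra's name and final layout are up to the maintainers) would move python-snappy into an `extras_require` group:

```python
# Sketch of making python-snappy an optional extra in setup.py.
# The "snappy" extra name here is hypothetical, not the actual
# spatialpandas decision.
install_requires = [
    'dask',
    'fsspec',
    'numba',
    'pandas',
    'param',
    'pyarrow >=1.0',
    'retrying',
]

extras_require = {
    # Pulled in only via e.g. `pip install "spatialpandas[snappy]"`
    'snappy': ['python-snappy'],
}
```

With that split, a plain `pip install spatialpandas` would no longer try to build the python-snappy C extension, while users who need snappy-compressed parquet could still opt in through the extra.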

Looking at the codebase, I only see snappy mentioned for the parquet I/O in two places:

```python
compression="snappy",
```

```python
    compression: Optional[str] = "snappy",
    filesystem: Optional[fsspec.AbstractFileSystem] = None,
    index: Optional[bool] = None,
    storage_options: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> None:
    if filesystem is not None:
        filesystem = validate_coerce_filesystem(path, filesystem, storage_options)
    # Standard pandas to_parquet with pyarrow engine
    to_parquet_args = {
        "df": df,
        "path": path,
        "engine": "pyarrow",
        "compression": compression,
        "filesystem": filesystem,
        "index": index,
        **kwargs,
    }
    if PANDAS_GE_12:
        to_parquet_args.update({"storage_options": storage_options})
    else:
        if filesystem is None:
            filesystem = validate_coerce_filesystem(path, filesystem, storage_options)
        to_parquet_args.update({"filesystem": filesystem})
    pd_to_parquet(**to_parquet_args)


def read_parquet(
    path: PathType,
    columns: Optional[Iterable[str]] = None,
    filesystem: Optional[fsspec.AbstractFileSystem] = None,
    storage_options: Optional[Dict[str, Any]] = None,
    engine_kwargs: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> GeoDataFrame:
    engine_kwargs = engine_kwargs or {}
    filesystem = validate_coerce_filesystem(path, filesystem, storage_options)
    if LEGACY_PYARROW:
        basic_kwargs = dict(validate_schema=False)
    else:
        basic_kwargs = dict(use_legacy_dataset=False)
    # Load using pyarrow to handle parquet files and directories across filesystems
    dataset = ParquetDataset(
        path,
        filesystem=filesystem,
        **basic_kwargs,
        **engine_kwargs,
        **kwargs,
    )
    if LEGACY_PYARROW:
        metadata = _load_parquet_pandas_metadata(
            path,
            filesystem=filesystem,
            storage_options=storage_options,
            engine_kwargs=engine_kwargs,
        )
    else:
        metadata = dataset.schema.pandas_metadata
    # If columns specified, prepend index columns to it
    if columns is not None:
        all_columns = set(column['name'] for column in metadata.get('columns', []))
        index_col_metadata = metadata.get('index_columns', [])
        extra_index_columns = []
        for idx_metadata in index_col_metadata:
            if isinstance(idx_metadata, str):
                name = idx_metadata
            elif isinstance(idx_metadata, dict):
                name = idx_metadata.get('name', None)
            else:
                name = None
            if name is not None and name not in columns and name in all_columns:
                extra_index_columns.append(name)
        columns = extra_index_columns + list(columns)
    df = dataset.read(columns=columns).to_pandas()
    # Return result
    return GeoDataFrame(df)


def to_parquet_dask(
    ddf: DaskGeoDataFrame,
    path: PathType,
    compression: Optional[str] = "snappy",
```
So for operations that don't use parquet I/O, python-snappy should not be needed at all. Note that pandas supports other compression methods such as gzip, as listed at https://pandas.pydata.org/pandas-docs/version/2.0/reference/api/pandas.DataFrame.to_parquet.html, though snappy compression is currently the default.
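As a rough illustration of how the dependency could become optional at runtime (a hypothetical guard, not the actual spatialpandas code), the parquet module could probe for the import and fall back to another codec when snappy is missing:

```python
# Hypothetical sketch: choose a default parquet compression codec based on
# whether python-snappy is importable. Not the actual spatialpandas code.
try:
    import snappy  # noqa: F401  # provided by the python-snappy package
    DEFAULT_COMPRESSION = "snappy"
except ImportError:
    DEFAULT_COMPRESSION = "gzip"  # supported by pyarrow without extra deps


def to_parquet(df, path, compression=None):
    """Write df to parquet, defaulting to whichever codec is available."""
    if compression is None:
        compression = DEFAULT_COMPRESSION
    df.to_parquet(path, engine="pyarrow", compression=compression)
```

This keeps `compression="snappy"` working for users who have python-snappy installed, while everyone else gets a working default instead of an install-time build failure.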

Describe alternatives you've considered


Ideally, python-snappy would release Python 3.11 compatible wheels at intake/python-snappy#124, but the last commit on that repo was on 17 Mar 2022, so that seems unlikely to happen anytime soon.

Additional context


I noticed that there was a PR checking for Python 3.11 compatibility at #113, but in that case python-snappy was installed from conda-forge (which does provide Python 3.11 builds: https://anaconda.org/conda-forge/python-snappy/files?version=0.6.1) rather than from PyPI.

For historical context, snappy was added as a required dependency in 498e7fc/#60.

Happy to open a PR to make python-snappy optional if the above sounds good!

@ianthomas23 (Member) commented

> Happy to open a PR to make python-snappy optional if the above sounds good!

Hi @weiji14. Yes please!

It looks like from intake/python-snappy#124 that the recommendation is to replace use of python-snappy with cramjam instead. But that is long-term, at the moment I'd be happy if our CI passes without python-snappy.

weiji14 commented May 22, 2023

> It looks like from andrix/python-snappy#124 that the recommendation is to replace use of python-snappy with cramjam instead. But that is long-term, at the moment I'd be happy if our CI passes without python-snappy.

Cool, started a PR at #117. We could look into cramjam separately, it looks like a promising replacement built on Rust!
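For reference, a minimal sketch of what a cramjam-based round trip might look like (hypothetical usage based on cramjam's `snappy.compress`/`snappy.decompress` functions; the snippet guards the import so it degrades gracefully when cramjam isn't installed):

```python
# Hypothetical sketch of using cramjam as a drop-in snappy codec.
# cramjam is a Rust-backed compression library shipping binary wheels,
# so there is no C extension to build at install time.
data = b"spatialpandas" * 100

try:
    import cramjam

    compressed = bytes(cramjam.snappy.compress(data))
    roundtrip = bytes(cramjam.snappy.decompress(compressed))
except ImportError:
    # cramjam not installed in this environment; nothing to demonstrate
    roundtrip = data

assert roundtrip == data
```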
