error when `from datafusion import SessionContext` #830

Closed
@l1t1

Describe the bug
Importing SessionContext fails with a DLL load error:


>>> from datafusion import SessionContext
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python38\lib\site-packages\datafusion\__init__.py", line 29, in <module>
    from .context import (
  File "D:\Python38\lib\site-packages\datafusion\context.py", line 22, in <module>
    from ._internal import SessionConfig as SessionConfigInternal
ImportError: DLL load failed while importing _internal: The specified procedure could not be found.

To Reproduce

pip install datafusion -U
python
from datafusion import SessionContext

Expected behavior
The import should work as it did in older versions, such as 36.0:

>>> from datafusion import SessionContext
>>> ctx = SessionContext()
>>> ctx.register_parquet("taxi", "d:/yellow_tripdata_2022-01.parquet")
>>> x="select passenger_count, count(*) from taxi where passenger_count is not null group by passenger_count order by passenger_count"
>>> df = ctx.sql(x)
>>> df
DataFrame()
+-----------------+----------+
| passenger_count | COUNT(*) |
+-----------------+----------+
| 0.0             | 52061    |
| 1.0             | 1794055  |
| 2.0             | 343026   |
| 3.0             | 84570    |
| 4.0             | 35321    |
| 5.0             | 51338    |
| 6.0             | 32037    |
| 7.0             | 9        |
| 8.0             | 8        |
| 9.0             | 3        |
+-----------------+----------+

Additional context
My OS is Windows 7.
My Python version:

Python 3.8.8 (tags/v3.8.8:024d805, Feb 19 2021, 13:18:16) [MSC v.1928 64 bit (AMD64)] on win32

Activity

Added the bug label (Something isn't working) on Aug 22, 2024

Michael-J-Ward (Contributor) commented on Aug 22, 2024

I don't have a Windows machine, but I did encounter a similar error on Linux when my environment couldn't load a shared C++ library that pyarrow needed (see the failed run at the bottom of this comment).

  • Could you provide the output of the pyarrow check and the install, as I have below?
  • Is this the same machine / environment that you previously used with version 36.0.0?
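
The details requested above can also be collected in one pass. The following is a minimal sketch, not part of the original thread: the helper names `installed_version` and `environment_report` are hypothetical, and the package list is only an assumption about what is relevant here. It reads versions from installed package metadata rather than importing the packages, so it should still work when the import itself fails with a DLL error.

```python
import platform
import sys
from importlib import metadata  # standard library since Python 3.8


def installed_version(package: str) -> str:
    """Return the installed version of a package without importing it."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=("datafusion", "pyarrow", "numpy")) -> dict:
    """Collect interpreter, OS, and package versions for a bug report."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    for name in packages:
        report[name] = installed_version(name)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting this output into the issue gives the same information as the manual `pyarrow.__version__` check and the pip log, plus the OS version.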

Successful install and import

Checking pyarrow

❯ python
Python 3.12.4 (main, Jun  6 2024, 18:26:44) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
>>> pyarrow.__version__
'17.0.0'

Installing

pip install datafusion -U
Collecting datafusion
  Using cached datafusion-40.1.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (10 kB)
Requirement already satisfied: pyarrow>=11.0.0 in /nix/store/3zsajax8hkvl1yc9fygpjn702m2qwh7m-python3.12-pyarrow-17.0.0/lib/python3.12/site-packages (from datafusion) (17.0.0)
Collecting typing-extensions (from datafusion)
  Using cached typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Requirement already satisfied: numpy>=1.16.6 in /nix/store/5qnnxrlcfiiv9b84cj1n02gnfq2hbsp4-python3.12-numpy-1.26.4/lib/python3.12/site-packages (from pyarrow>=11.0.0->datafusion) (1.26.4)
Using cached datafusion-40.1.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.5 MB)
Using cached typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Installing collected packages: typing-extensions, datafusion
Successfully installed datafusion-40.1.0 typing-extensions-4.12.2

Running

python
Python 3.12.4 (main, Jun  6 2024, 18:26:44) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from datafusion import SessionContext
>>> ctx = SessionContext()

Failed because my dev environment wasn't set up properly

python
Python 3.12.4 (main, Jun  6 2024, 18:26:44) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from datafusion import SessionContext
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mike/workspace/rust-python-coverage/.venv/lib/python3.12/site-packages/datafusion/__init__.py", line 29, in <module>
    from .context import (
  File "/home/mike/workspace/rust-python-coverage/.venv/lib/python3.12/site-packages/datafusion/context.py", line 30, in <module>
    from datafusion.dataframe import DataFrame
  File "/home/mike/workspace/rust-python-coverage/.venv/lib/python3.12/site-packages/datafusion/dataframe.py", line 35, in <module>
    from datafusion.expr import Expr
  File "/home/mike/workspace/rust-python-coverage/.venv/lib/python3.12/site-packages/datafusion/expr.py", line 28, in <module>
    import pyarrow as pa
  File "/home/mike/workspace/rust-python-coverage/.venv/lib/python3.12/site-packages/pyarrow/__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
ImportError: libstdc++.so.6: cannot open shared object file: No such file or directory
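
A missing shared library like the one in this traceback can be checked directly, before importing pyarrow or datafusion. This is a rough sketch using the standard `ctypes` module; the helper name `can_load_shared_library` is hypothetical. It only reports whether the dynamic loader can resolve the library in the current process, which may differ from how an extension module resolves its own dependencies.

```python
import ctypes


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the library (or one of its dependencies) cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    # libstdc++.so.6 is the library named in the Linux traceback above.
    for lib in ("libstdc++.so.6",):
        status = "loadable" if can_load_shared_library(lib) else "NOT loadable"
        print(f"{lib}: {status}")
```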

l1t1 (Author) commented on Aug 22, 2024

Checking pyarrow

>>> import pyarrow
>>> pyarrow.__version__
'15.0.0'

Installing

pip install datafusion -U
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: datafusion in d:\python38\lib\site-packages (36.0.0)
Collecting datafusion
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/9f/12/3a6bf3baa1759315f14cbfb8006efee2ae8971c378e7363732342ff58417/datafusion-40.1.0-cp38-abi3-win_amd64.whl (17.9 MB)
     ---------------------------------------- 17.9/17.9 MB 2.3 MB/s eta 0:00:00
Requirement already satisfied: pyarrow>=11.0.0 in d:\python38\lib\site-packages (from datafusion) (15.0.0)
Requirement already satisfied: typing-extensions in d:\python38\lib\site-packages (from datafusion) (4.10.0)
Requirement already satisfied: numpy<2,>=1.16.6 in d:\python38\lib\site-packages (from pyarrow>=11.0.0->datafusion) (1.21.0)
Installing collected packages: datafusion
  Attempting uninstall: datafusion
    Found existing installation: datafusion 36.0.0
    Uninstalling datafusion-36.0.0:
      Successfully uninstalled datafusion-36.0.0
Successfully installed datafusion-40.1.0

The machine and environment are the same ones I used with version 36.0.0.

Michael-J-Ward (Contributor) commented on Aug 22, 2024

I know this isn't your exact setup, but I was able to spin up a VM with:

  • Windows 10
  • Python 3.11.9

I was able to successfully install with pip install datafusion -U and run

>>> import pyarrow
>>> pyarrow.__version__
'17.0.0'
>>> from datafusion import SessionContext
>>> ctx = SessionContext()

So... I'm at an impasse.

Have you tried a completely fresh virtual environment?
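
One way to run that fresh-environment test is sketched below. This is an illustration rather than a procedure given in the thread: the helper names and the `df-clean-env` directory are hypothetical. It creates a new venv, installs datafusion into it, and tries the failing import using the new environment's interpreter.

```python
import subprocess
import sys
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a venv (the layout differs on Windows)."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def fresh_env_commands(env_dir: Path) -> list:
    """Build the commands: create the venv, install datafusion, try the import."""
    py = str(venv_python(env_dir))
    return [
        [sys.executable, "-m", "venv", str(env_dir)],
        [py, "-m", "pip", "install", "-U", "datafusion"],
        [py, "-c", "from datafusion import SessionContext; SessionContext()"],
    ]


def run_fresh_env_check(env_dir: Path) -> None:
    """Run each step in order; raises CalledProcessError at the first failure."""
    for command in fresh_env_commands(env_dir):
        subprocess.run(command, check=True)


# Example (needs network access, so it is not run automatically):
# run_fresh_env_check(Path("df-clean-env"))
```

If the last command fails in a clean environment as well, leftover packages from the old install are unlikely to be the cause.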

l1t1 (Author) commented on Aug 23, 2024

Thank you.
This may be related to rust-lang/rust#121317.
Perhaps newer Rust versions no longer support Windows 7 / Python 3.8.
Newer Polars versions also fail to import on this machine.
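
To test that hypothesis on a given machine, the Windows version can be read from Python. The sketch below assumes Windows 10 as the cutoff, which the thread does not confirm, and the helper names are hypothetical. Windows 7 reports itself as NT 6.1, while Windows 10 and later report a major version of 10.

```python
import sys


def is_older_than_windows_10(major: int, minor: int = 0) -> bool:
    """Compare an NT version number against 10.0 (assumed cutoff)."""
    return (major, minor) < (10, 0)


def describe_host() -> str:
    """Summarize whether this host falls below the assumed cutoff."""
    if sys.platform != "win32":
        return "Not a Windows host; this check does not apply."
    version = sys.getwindowsversion()  # only available on Windows
    label = f"Windows NT {version.major}.{version.minor}"
    if is_older_than_windows_10(version.major, version.minor):
        return f"{label}: older than Windows 10."
    return f"{label}: Windows 10 or newer."


if __name__ == "__main__":
    print(describe_host())
```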

          error when `from datafusion import SessionContext` · Issue #830 · apache/datafusion-python