@bjornjorgensen bjornjorgensen commented Jun 23, 2023

Describe your changes

This will install pandas version 1.5.3 in all Spark images.
Spark has a module called pandas API on Spark, which uses pandas for many tasks.
Apache Spark has decided not to upgrade pandas API on Spark to pandas 2.0 before Apache Spark 4.0:
https://lists.apache.org/thread/jbxrpf81flr6rk1poj9mqjyl8p34ro6w
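
For illustration, the range that the later commits in this pull request mention (`>=1.5.3,<2.0.0`) can be expressed as a small check. This is a minimal sketch, not the test added in this pull request; the function names `parse_release` and `is_supported_pandas` are hypothetical, and the parser only handles plain dotted release numbers.

```python
from __future__ import annotations


def parse_release(version: str) -> tuple[int, ...]:
    """Turn a release string such as '1.5.3' into (1, 5, 3).

    Hypothetical helper: parsing stops at the first non-numeric
    character, so '2.0.0rc1' is read as (2, 0, 0).
    """
    parts: list[int] = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:
            # A suffix such as 'rc1' ends the numeric part of the version.
            break
    return tuple(parts)


def is_supported_pandas(version: str) -> bool:
    """Return True when the version satisfies >=1.5.3,<2.0.0."""
    return (1, 5, 3) <= parse_release(version) < (2, 0, 0)
```

Inside an image, the input to `is_supported_pandas` could be `pandas.__version__`.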

Issue ticket if applicable

Fix: #1924

Checklist (especially for first-time contributors)

  • I have performed a self-review of my code
  • If it is a core feature, I have added thorough tests
  • I will try not to use force-push to make the review process easier for reviewers
  • I have updated the documentation for significant changes

@bjornjorgensen
Contributor Author

@mathbunnyru do you understand this? :)

Member

@mathbunnyru mathbunnyru left a comment

I added a few suggestions; please take a look.

@bjornjorgensen
Contributor Author

I think that we will see Apache Spark 4.0.0 in January or February 2024, and then we can remove this.

@mathbunnyru
Member

> I think that we will see Apache Spark 4.0.0 in January or February 2024, and then we can remove this.

Great, thanks.

@bjornjorgensen
Contributor Author

Is the error caused by the cache?

@mathbunnyru
Member

> Is the error caused by the cache?

No, we need to add a new exception here: https://github.com/jupyter/docker-stacks/blob/main/tests/base-notebook/test_packages.py#L67
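
As a rough sketch of what such an exception list does: packages named in it are skipped when the test decides which packages to check. Only the name `EXCLUDED_PACKAGES` and the `"pandas[version='>"` entry come from this pull request's commit messages; the `"python"` entry and the helper names `is_excluded` and `packages_to_check` are hypothetical, and the real test file may work differently.

```python
# Hypothetical exclusion list. The pandas entry is copied verbatim from a
# commit message in this pull request; "python" is only a placeholder.
EXCLUDED_PACKAGES = [
    "pandas[version='>",
    "python",
]


def is_excluded(package: str) -> bool:
    """Return True when the package should be skipped by the test."""
    return package in EXCLUDED_PACKAGES


def packages_to_check(requested: list[str]) -> list[str]:
    """Keep only the requested packages that are not excluded."""
    return [package for package in requested if not is_excluded(package)]
```

With this shape, adding one string to the list is enough to stop a package from being checked.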

@bjornjorgensen
Contributor Author

Hmm, OK. So I just need to add pandas to that list?

@mathbunnyru mathbunnyru reopened this Jun 27, 2023
@mathbunnyru mathbunnyru merged commit df5d516 into jupyter:main Jun 27, 2023
@mathbunnyru
Member

mathbunnyru commented Jun 27, 2023

@bjornjorgensen thank you!
I have finally merged this :)
The images should update in approximately 2 hours.

kentwait pushed a commit to kentwait/docker-stacks that referenced this pull request Aug 3, 2023
* 1.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add note

* typo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove test

* >=1.5.3 and <2.0.0

* update test

* Update pyspark-notebook/Dockerfile

Co-authored-by: Ayaz Salikhov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update pyspark-notebook/Dockerfile

Co-authored-by: Ayaz Salikhov <[email protected]>

* move test to file

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pandas to EXCLUDED_PACKAGES

* add 1.5.3,<2.0.0 and sort list

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add '

* "pandas[version='>"

* Rename test_pandas_version.py to unit_pandas_version.py

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ayaz Salikhov <[email protected]>
Development

Successfully merging this pull request may close these issues.

[BUG] - Pin pandas version for spark images
2 participants