[SPARK-53182][PYTHON][DOCS] Fix broken and missing links in PySpark DataFrames user guide #51851

Open: wants to merge 1 commit into base: master


@jonnycomes jonnycomes commented Aug 5, 2025

What changes were proposed in this pull request?

This PR fixes two small issues in the PySpark DataFrames user guide:

  1. Replaces a broken external link to the data manipulation section. The previous link pointed to an outdated Databricks-hosted page; it now points to Chapter 3 of the PySpark user guide:
    Chapter 3: Function Junction - Data manipulation with PySpark

  2. Adds a missing link for the section on saving DataFrames to persistent storage. The text previously said “TODO: add link.” This has been replaced with a correct reference to:
    Chapter 7: Load and Behold - Data loading, storage, file formats

Why are the changes needed?

These changes improve the quality and usability of the documentation by fixing a broken link and completing a placeholder that may confuse users. They ensure readers are directed to up-to-date, relevant internal documentation instead of an outdated or unavailable external resource.

Does this PR introduce any user-facing change?

Yes. It updates two markdown cells in the dataframes.ipynb user guide notebook, affecting how users navigate to related documentation when reading the generated HTML docs.

How was this patch tested?

The documentation was built locally using make html in the python/docs directory. The rendered output for the notebook was reviewed in a browser to confirm the links appear and function correctly.
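
As a complement to the manual browser check, a small script could scan the notebook's Markdown cells for leftover placeholders and list the link targets for review. The following is only a minimal sketch, not part of this patch: the helper names `scan_notebook` and `scan_notebook_file`, the sample cells, and the `./chapter3.html` target are hypothetical, and it assumes the standard notebook JSON layout (a top-level `cells` list whose entries carry `cell_type` and `source`).

```python
import json
import re

# Matches inline Markdown links of the form [text](target).
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def scan_notebook(nb: dict) -> dict:
    """Collect link targets and leftover TODO placeholders from Markdown cells."""
    links, todos = [], []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # Cell source is stored either as one string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        links.extend((index, target) for _, target in LINK_RE.findall(text))
        if "TODO" in text:
            todos.append(index)
    return {"links": links, "todos": todos}


def scan_notebook_file(path: str) -> dict:
    """Load a notebook from disk (hypothetical path) and scan it."""
    with open(path, encoding="utf-8") as f:
        return scan_notebook(json.load(f))


if __name__ == "__main__":
    # Hypothetical in-memory notebook standing in for dataframes.ipynb.
    sample = {
        "cells": [
            {"cell_type": "markdown", "source": ["See [Chapter 3](./chapter3.html).\n"]},
            {"cell_type": "markdown", "source": "Saving data: TODO: add link"},
            {"cell_type": "code", "source": ["df.show()  # TODO in code is ignored\n"]},
        ]
    }
    print(scan_notebook(sample))
```

A check like this only finds placeholders and lists targets; it does not confirm that a link resolves, so the rendered HTML still needs to be reviewed.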

[Screenshots of the rendered output, taken 2025-08-07 at 11:53:19 AM and 11:52:54 AM]

Was this patch authored or co-authored using generative AI tooling?

No.

- Updated outdated external link to data manipulation section (now Chapter 3) with a working internal link to the official PySpark documentation.
- Replaced "TODO: add link" with correct reference to Chapter 7 on data loading.

@xinrong-meng (Member)

Would you create a JIRA ticket and reference it in the PR title? Please see https://spark.apache.org/contributing.html
Also, it would be great if a screenshot of the rendered output could be attached to the PR description.

@jonnycomes changed the title from "Fix broken and missing links in PySpark DataFrames user guide" to "[SPARK-53182] Fix broken and missing links in PySpark DataFrames user guide" on Aug 7, 2025

@jonnycomes (Author)

Thanks! I've created the JIRA ticket, updated the PR title accordingly, and added screenshots of the rendered output to the description.

This is my first time contributing—thanks for your guidance!

@HyukjinKwon changed the title from "[SPARK-53182] Fix broken and missing links in PySpark DataFrames user guide" to "[SPARK-53182][PYTHON] Fix broken and missing links in PySpark DataFrames user guide" on Aug 12, 2025

@HyukjinKwon (Member)

cc @asl3 FYI

@HyukjinKwon changed the title from "[SPARK-53182][PYTHON] Fix broken and missing links in PySpark DataFrames user guide" to "[SPARK-53182][PYTHON][DOCS] Fix broken and missing links in PySpark DataFrames user guide" on Aug 12, 2025

@asl3 (Contributor) left a comment

thanks for the contribution!

4 participants