diff --git a/doc/source/development/code_style.rst b/doc/source/development/code_style.rst
index 77c8d56765e5e..7bbfc010fbfb2 100644
--- a/doc/source/development/code_style.rst
+++ b/doc/source/development/code_style.rst
@@ -28,7 +28,7 @@ Testing
Failing tests
--------------
-See https://docs.pytest.org/en/latest/skipping.html for background.
+See https://docs.pytest.org/en/latest/how-to/skipping.html for background.
Do not use ``pytest.xfail``
---------------------------
diff --git a/doc/source/development/contributing_codebase.rst b/doc/source/development/contributing_codebase.rst
index f72145475e5a4..4826921d4866b 100644
--- a/doc/source/development/contributing_codebase.rst
+++ b/doc/source/development/contributing_codebase.rst
@@ -155,7 +155,7 @@ Python (PEP8 / black)
pandas follows the `PEP8 `_ standard
and uses `Black `_ and
-`Flake8 `_ to ensure a consistent code
+`Flake8 `_ to ensure a consistent code
format throughout the project. We encourage you to use :ref:`pre-commit `.
:ref:`Continuous Integration ` will run those tools and
@@ -204,7 +204,7 @@ Import formatting
pandas uses `isort `__ to standardise import
formatting across the codebase.
-A guide to import layout as per pep8 can be found `here `__.
+A guide to import layout as per pep8 can be found `here `__.
A summary of our current import sections (in order):
@@ -449,7 +449,7 @@ continuous integration services, once your pull request is submitted.
However, if you wish to run the test suite on a branch prior to submitting the pull request,
then the continuous integration services need to be hooked to your GitHub repository. Instructions are here
for `GitHub Actions `__ and
-`Azure Pipelines `__.
+`Azure Pipelines `__.
A pull-request will be considered for merging when you have an all 'green' build. If any tests are failing,
then you will get a red 'X', where you can click through to see the individual failed tests.
@@ -776,10 +776,10 @@ Running the performance test suite
Performance matters and it is worth considering whether your code has introduced
performance regressions. pandas is in the process of migrating to
-`asv benchmarks `__
+`asv benchmarks `__
to enable easy monitoring of the performance of critical pandas operations.
These benchmarks are all found in the ``pandas/asv_bench`` directory, and the
-test results can be found `here `__.
+test results can be found `here `__.
To use all features of asv, you will need either ``conda`` or
``virtualenv``. For more details please check the `asv installation
@@ -787,7 +787,7 @@ webpage `_.
To install asv::
- pip install git+https://github.com/spacetelescope/asv
+ pip install git+https://github.com/airspeed-velocity/asv
If you need to run a benchmark, change your directory to ``asv_bench/`` and run::
diff --git a/doc/source/development/contributing_environment.rst b/doc/source/development/contributing_environment.rst
index fda4f3ecf6dbf..5f36a2a609c9f 100644
--- a/doc/source/development/contributing_environment.rst
+++ b/doc/source/development/contributing_environment.rst
@@ -82,7 +82,7 @@ You will need `Build Tools for Visual Studio 2019
In the installer, select the "C++ build tools" workload.
You can install the necessary components on the commandline using
-`vs_buildtools.exe `_:
+`vs_buildtools.exe `_:
.. code::
@@ -138,8 +138,8 @@ Creating a Python environment
Now create an isolated pandas development environment:
-* Install either `Anaconda `_, `miniconda
- `_, or `miniforge `_
+* Install either `Anaconda `_, `miniconda
+ `_, or `miniforge `_
* Make sure your conda is up to date (``conda update conda``)
* Make sure that you have :any:`cloned the repository `
* ``cd`` to the pandas source directory
@@ -181,7 +181,7 @@ To return to your root environment::
conda deactivate
-See the full conda docs `here `__.
+See the full conda docs `here `__.
Creating a Python environment (pip)
@@ -238,7 +238,7 @@ Consult the docs for setting up pyenv `here `__.
Below is a brief overview on how to set-up a virtual environment with Powershell
under Windows. For details please refer to the
-`official virtualenv user guide `__
+`official virtualenv user guide `__
Use an ENV_DIR of your choice. We'll use ~\\virtualenvs\\pandas-dev where
'~' is the folder pointed to by either $env:USERPROFILE (Powershell) or
diff --git a/doc/source/development/debugging_extensions.rst b/doc/source/development/debugging_extensions.rst
index 894277d304020..7ba2091e18853 100644
--- a/doc/source/development/debugging_extensions.rst
+++ b/doc/source/development/debugging_extensions.rst
@@ -80,7 +80,7 @@ Once the process launches, simply type ``run`` and the test suite will begin, st
Checking memory leaks with valgrind
===================================
-You can use `Valgrind `_ to check for and log memory leaks in extensions. For instance, to check for a memory leak in a test from the suite you can run:
+You can use `Valgrind `_ to check for and log memory leaks in extensions. For instance, to check for a memory leak in a test from the suite you can run:
.. code-block:: sh
diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst
index a7a10e192a9a7..5347aab2c731a 100644
--- a/doc/source/development/extending.rst
+++ b/doc/source/development/extending.rst
@@ -468,7 +468,7 @@ This would be more or less equivalent to:
The backend module can then use other visualization tools (Bokeh, Altair,...)
to generate the plots.
-Libraries implementing the plotting backend should use `entry points `__
+Libraries implementing the plotting backend should use `entry points `__
to make their backend discoverable to pandas. The key is ``"pandas_plotting_backends"``. For example, pandas
registers the default "matplotlib" backend as follows.
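The entry-point mechanism this hunk documents can be sketched as a minimal ``setup.py`` for a third-party backend. This is a hedged illustration, not the pandas registration itself: the package name ``my-backend`` and module ``my_backend`` are hypothetical; only the entry-point group ``"pandas_plotting_backends"`` comes from the text above.

```python
# setup.py sketch for a hypothetical plotting-backend package.
# ``my_backend`` is an assumed module exposing the plotting functions
# pandas dispatches to; the group name "pandas_plotting_backends" is
# the key documented in the section above.
from setuptools import setup

setup(
    name="my-backend",
    version="0.1",
    py_modules=["my_backend"],
    entry_points={
        "pandas_plotting_backends": [
            "my_backend = my_backend",
        ],
    },
)
```

With this metadata installed, ``pd.set_option("plotting.backend", "my_backend")`` would let pandas discover the backend without the user importing it explicitly.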
diff --git a/doc/source/development/maintaining.rst b/doc/source/development/maintaining.rst
index a0e9ba53acd00..a8521039c5427 100644
--- a/doc/source/development/maintaining.rst
+++ b/doc/source/development/maintaining.rst
@@ -237,4 +237,4 @@ a milestone before tagging, you can request the bot to backport it with:
.. _governance documents: https://github.com/pandas-dev/pandas-governance
-.. _list of permissions: https://help.github.com/en/github/setting-up-and-managing-organizations-and-teams/repository-permission-levels-for-an-organization
+.. _list of permissions: https://docs.github.com/en/organizations/managing-access-to-your-organizations-repositories/repository-roles-for-an-organization
diff --git a/doc/source/development/roadmap.rst b/doc/source/development/roadmap.rst
index 2689e7e45f3ff..ccdb4f1fafae4 100644
--- a/doc/source/development/roadmap.rst
+++ b/doc/source/development/roadmap.rst
@@ -203,4 +203,4 @@ We improved the pandas documentation
* :ref:`getting_started` contains a number of resources intended for new
pandas users coming from a variety of backgrounds (:issue:`26831`).
-.. _pydata-sphinx-theme: https://github.com/pandas-dev/pydata-sphinx-theme
+.. _pydata-sphinx-theme: https://github.com/pydata/pydata-sphinx-theme
diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst
index 8ef1a358c2e7d..16cae9bbfbf46 100644
--- a/doc/source/ecosystem.rst
+++ b/doc/source/ecosystem.rst
@@ -19,7 +19,7 @@ development to remain focused around its original requirements.
This is an inexhaustive list of projects that build on pandas in order to provide
tools in the PyData space. For a list of projects that depend on pandas,
see the
-`libraries.io usage page for pandas `_
+`GitHub network dependents for pandas `_
or `search pypi for pandas `_.
We'd like to make it easier for users to find these projects, if you know of other
@@ -30,8 +30,8 @@ substantial projects that you feel should be on this list, please let us know.
Data cleaning and validation
----------------------------
-`Pyjanitor `__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`Pyjanitor `__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Pyjanitor provides a clean API for cleaning data, using method chaining.
@@ -71,19 +71,19 @@ a long-standing special relationship with pandas. Statsmodels provides powerful
econometrics, analysis and modeling functionality that is out of pandas' scope.
Statsmodels leverages pandas objects as the underlying data container for computation.
-`sklearn-pandas `__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`sklearn-pandas `__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Use pandas DataFrames in your `scikit-learn `__
ML pipeline.
`Featuretools `__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Featuretools is a Python library for automated feature engineering built on top of pandas. It excels at transforming temporal and relational datasets into feature matrices for machine learning using reusable feature engineering "primitives". Users can contribute their own primitives in Python and share them with the rest of the community.
`Compose `__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compose is a machine learning tool for labeling data and prediction engineering. It allows you to structure the labeling process by parameterizing prediction problems and transforming time-driven relational data into target values with cutoff times that can be used for supervised learning.
@@ -115,8 +115,8 @@ simplicity produces beautiful and effective visualizations with a
minimal amount of code. Altair works with pandas DataFrames.
-`Bokeh `__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`Bokeh `__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Bokeh is a Python interactive visualization library for large datasets that natively uses
the latest web technologies. Its goal is to provide elegant, concise construction of novel
@@ -147,7 +147,7 @@ estimation while plotting, aggregating across observations and visualizing the
fit of statistical models to emphasize patterns in a dataset.
`plotnine `__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Hadley Wickham's `ggplot2 `__ is a foundational exploratory visualization package for the R language.
Based on `"The Grammar of Graphics" `__ it
@@ -161,10 +161,10 @@ A good implementation for Python users is `has2k1/plotnine `__ leverages `Vega
`__ to create plots within Jupyter Notebook.
-`Plotly `__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`Plotly `__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-`Plotly’s `__ `Python API `__ enables interactive figures and web shareability. Maps, 2D, 3D, and live-streaming graphs are rendered with WebGL and `D3.js `__. The library supports plotting directly from a pandas DataFrame and cloud-based collaboration. Users of `matplotlib, ggplot for Python, and Seaborn `__ can convert figures into interactive web-based plots. Plots can be drawn in `IPython Notebooks `__ , edited with R or MATLAB, modified in a GUI, or embedded in apps and dashboards. Plotly is free for unlimited sharing, and has `cloud `__, `offline `__, or `on-premise `__ accounts for private use.
+`Plotly’s `__ `Python API `__ enables interactive figures and web shareability. Maps, 2D, 3D, and live-streaming graphs are rendered with WebGL and `D3.js `__. The library supports plotting directly from a pandas DataFrame and cloud-based collaboration. Users of `matplotlib, ggplot for Python, and Seaborn `__ can convert figures into interactive web-based plots. Plots can be drawn in `IPython Notebooks `__, edited with R or MATLAB, modified in a GUI, or embedded in apps and dashboards. Plotly is free for unlimited sharing, and has `offline `__ or `on-premise `__ accounts for private use.
`Lux `__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -179,7 +179,7 @@ A good implementation for Python users is `has2k1/plotnine `__ that highlights interesting trends and patterns in the dataframe. Users can leverage any existing pandas commands without modifying their code, while being able to visualize their pandas data structures (e.g., DataFrame, Series, Index) at the same time. Lux also offers a `powerful, intuitive language `__ that allow users to create `Altair `__, `matplotlib `__, or `Vega-Lite `__ visualizations without having to think at the level of code.
+By printing out a dataframe, Lux automatically `recommends a set of visualizations `__ that highlights interesting trends and patterns in the dataframe. Users can leverage any existing pandas commands without modifying their code, while being able to visualize their pandas data structures (e.g., DataFrame, Series, Index) at the same time. Lux also offers a `powerful, intuitive language `__ that allows users to create `Altair `__, `matplotlib `__, or `Vega-Lite `__ visualizations without having to think at the level of code.
`Qtpandas `__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -204,8 +204,7 @@ invoked with the following command
dtale.show(df)
D-Tale integrates seamlessly with Jupyter notebooks, Python terminals, Kaggle
-& Google Colab. Here are some demos of the `grid `__
-and `chart-builder `__.
+& Google Colab. Here are some demos of the `grid `__.
`hvplot `__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -220,7 +219,7 @@ It can be loaded as a native pandas plotting backend via
.. _ecosystem.ide:
IDE
-------
+---
`IPython `__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -264,7 +263,7 @@ debugging and profiling functionality of a software development tool with the
data exploration, interactive execution, deep inspection and rich visualization
capabilities of a scientific environment like MATLAB or Rstudio.
-Its `Variable Explorer `__
+Its `Variable Explorer `__
allows users to view, manipulate and edit pandas ``Index``, ``Series``,
and ``DataFrame`` objects like a "spreadsheet", including copying and modifying
values, sorting, displaying a "heatmap", converting data types and more.
@@ -274,9 +273,9 @@ Spyder can also import data from a variety of plain text and binary files
or the clipboard into a new pandas DataFrame via a sophisticated import wizard.
Most pandas classes, methods and data attributes can be autocompleted in
-Spyder's `Editor `__ and
-`IPython Console `__,
-and Spyder's `Help pane `__ can retrieve
+Spyder's `Editor `__ and
+`IPython Console `__,
+and Spyder's `Help pane `__ can retrieve
and render Numpydoc documentation on pandas objects in rich text with Sphinx
both automatically and on-demand.
@@ -312,8 +311,8 @@ The following data feeds are available:
* Stooq Index Data
* MOEX Data
-`Quandl/Python `__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`Quandl/Python `__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Quandl API for Python wraps the Quandl REST API to return
pandas DataFrames with timeseries indexes.
@@ -324,8 +323,8 @@ PyDatastream is a Python interface to the
REST API to return indexed pandas DataFrames with financial data.
This package requires valid credentials for this API (non free).
-`pandaSDMX `__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`pandaSDMX `__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pandaSDMX is a library to retrieve and acquire statistical data
and metadata disseminated in
`SDMX `_ 2.1, an ISO-standard
@@ -357,8 +356,8 @@ with pandas.
Domain specific
---------------
-`Geopandas `__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`Geopandas `__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Geopandas extends pandas data objects to include geographic information which support
geometric operations. If your work entails maps and geographical coordinates, and
@@ -398,7 +397,7 @@ any Delta table into Pandas dataframe.
.. _ecosystem.out-of-core:
Out-of-core
--------------
+-----------
`Blaze `__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -436,8 +435,8 @@ can selectively scale parts of their pandas DataFrame applications.
print(df3)
-`Dask `__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`Dask `__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dask is a flexible parallel computing library for analytics. Dask
provides a familiar ``DataFrame`` interface for out-of-core, parallel and distributed computing.
@@ -475,8 +474,8 @@ time-consuming tasks like ingesting data (``read_csv``, ``read_excel``,
df = pd.read_csv("big.csv") # use all your cores!
-`Odo `__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`Odo `__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Odo provides a uniform API for moving data between different formats. It uses
pandas own ``read_csv`` for CSV IO and leverages many existing packages such as
@@ -500,8 +499,8 @@ It also displays progress bars.
df.parallel_apply(func)
-`Vaex `__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`Vaex `__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Increasingly, packages are being built on top of pandas to address specific needs in data preparation, analysis and visualization. Vaex is a Python library for Out-of-Core DataFrames (similar to pandas), to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, standard deviation, etc., on an N-dimensional grid up to a billion (10\ :sup:`9`) objects/rows per second. Visualization is done using histograms, density plots and 3d volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, zero memory copy policy and lazy computations for best performance (no memory wasted).
@@ -575,11 +574,11 @@ Library Accessor Classes Description
.. _pathlib.Path: https://docs.python.org/3/library/pathlib.html
.. _pint-pandas: https://github.com/hgrecco/pint-pandas
.. _composeml: https://github.com/alteryx/compose
-.. _datatest: https://datatest.readthedocs.io/
+.. _datatest: https://datatest.readthedocs.io/en/stable/
.. _woodwork: https://github.com/alteryx/woodwork
Development tools
-----------------------------
+-----------------
`pandas-stubs `__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/doc/source/getting_started/comparison/comparison_with_r.rst b/doc/source/getting_started/comparison/comparison_with_r.rst
index 864081002086b..f91f4218c3429 100644
--- a/doc/source/getting_started/comparison/comparison_with_r.rst
+++ b/doc/source/getting_started/comparison/comparison_with_r.rst
@@ -31,7 +31,7 @@ Quick reference
We'll start off with a quick reference guide pairing some common R
operations using `dplyr
-`__ with
+`__ with
pandas equivalents.
@@ -326,8 +326,8 @@ table below shows how these data structures could be mapped in Python.
| data.frame | dataframe |
+------------+-------------------------------+
-|ddply|_
-~~~~~~~~
+ddply
+~~~~~
An expression using a data.frame called ``df`` in R where you want to
summarize ``x`` by ``month``:
@@ -372,8 +372,8 @@ For more details and examples see :ref:`the groupby documentation
reshape / reshape2
------------------
-|meltarray|_
-~~~~~~~~~~~~~
+meltarray
+~~~~~~~~~
An expression using a 3 dimensional array called ``a`` in R where you want to
melt it into a data.frame:
@@ -390,8 +390,8 @@ In Python, since ``a`` is a list, you can simply use list comprehension.
a = np.array(list(range(1, 24)) + [np.NAN]).reshape(2, 3, 4)
pd.DataFrame([tuple(list(x) + [val]) for x, val in np.ndenumerate(a)])
-|meltlist|_
-~~~~~~~~~~~~
+meltlist
+~~~~~~~~
An expression using a list called ``a`` in R where you want to melt it
into a data.frame:
@@ -412,8 +412,8 @@ In Python, this list would be a list of tuples, so
For more details and examples see :ref:`the Intro to Data Structures
documentation `.
-|meltdf|_
-~~~~~~~~~~~~~~~~
+meltdf
+~~~~~~
An expression using a data.frame called ``cheese`` in R where you want to
reshape the data.frame:
@@ -447,8 +447,8 @@ In Python, the :meth:`~pandas.melt` method is the R equivalent:
For more details and examples see :ref:`the reshaping documentation
`.
-|cast|_
-~~~~~~~
+cast
+~~~~
In R ``acast`` is an expression using a data.frame called ``df`` in R to cast
into a higher dimensional array:
@@ -577,20 +577,5 @@ For more details and examples see :ref:`categorical introduction `
.. |subset| replace:: ``subset``
.. _subset: https://stat.ethz.ch/R-manual/R-patched/library/base/html/subset.html
-.. |ddply| replace:: ``ddply``
-.. _ddply: https://cran.r-project.org/web/packages/plyr/plyr.pdf#Rfn.ddply.1
-
-.. |meltarray| replace:: ``melt.array``
-.. _meltarray: https://cran.r-project.org/web/packages/reshape2/reshape2.pdf#Rfn.melt.array.1
-
-.. |meltlist| replace:: ``melt.list``
-.. meltlist: https://cran.r-project.org/web/packages/reshape2/reshape2.pdf#Rfn.melt.list.1
-
-.. |meltdf| replace:: ``melt.data.frame``
-.. meltdf: https://cran.r-project.org/web/packages/reshape2/reshape2.pdf#Rfn.melt.data.frame.1
-
-.. |cast| replace:: ``cast``
-.. cast: https://cran.r-project.org/web/packages/reshape2/reshape2.pdf#Rfn.cast.1
-
.. |factor| replace:: ``factor``
.. _factor: https://stat.ethz.ch/R-manual/R-devel/library/base/html/factor.html
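The ``melt``/``cast`` hunks above compare R's reshape2 to :meth:`pandas.melt`. As a minimal sketch of the pandas side (the sample values here are illustrative, mirroring the ``cheese`` frame referenced in the section, not data from the source):

```python
import pandas as pd

# Illustrative wide frame, loosely following the R ``cheese`` example.
cheese = pd.DataFrame(
    {
        "first": ["John", "Mary"],
        "last": ["Doe", "Bo"],
        "height": [5.5, 6.0],
        "weight": [130, 150],
    }
)

# Wide-to-long reshape: one output row per (person, measured variable).
long_form = cheese.melt(id_vars=["first", "last"])
```

``long_form`` has columns ``first``, ``last``, ``variable``, and ``value``, with one row per id/variable pair, which is the shape R's ``melt.data.frame`` produces.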
diff --git a/doc/source/getting_started/comparison/comparison_with_sas.rst b/doc/source/getting_started/comparison/comparison_with_sas.rst
index c2392af63b6ff..5a624c9c55782 100644
--- a/doc/source/getting_started/comparison/comparison_with_sas.rst
+++ b/doc/source/getting_started/comparison/comparison_with_sas.rst
@@ -96,7 +96,7 @@ Reading external data
Like SAS, pandas provides utilities for reading in data from
many formats. The ``tips`` dataset, found within the pandas
-tests (`csv `_)
+tests (`csv `_)
will be used in many of the following examples.
SAS provides ``PROC IMPORT`` to read csv data into a data set.
@@ -335,7 +335,7 @@ Extracting substring by position
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SAS extracts a substring from a string based on its position with the
-`SUBSTR `__ function.
+`SUBSTR `__ function.
.. code-block:: sas
@@ -538,7 +538,7 @@ This means that the size of data able to be loaded in pandas is limited by your
machine's memory, but also that the operations on that data may be faster.
If out of core processing is needed, one possibility is the
-`dask.dataframe `_
+`dask.dataframe `_
library (currently in development) which
provides a subset of pandas functionality for an on-disk ``DataFrame``
diff --git a/doc/source/getting_started/comparison/comparison_with_spreadsheets.rst b/doc/source/getting_started/comparison/comparison_with_spreadsheets.rst
index e3380db7c821e..a7148405ba8a0 100644
--- a/doc/source/getting_started/comparison/comparison_with_spreadsheets.rst
+++ b/doc/source/getting_started/comparison/comparison_with_spreadsheets.rst
@@ -11,7 +11,7 @@ of how various spreadsheet operations would be performed using pandas. This page
terminology and link to documentation for Excel, but much will be the same/similar in
`Google Sheets `_,
`LibreOffice Calc `_,
-`Apple Numbers `_, and other
+`Apple Numbers `_, and other
Excel-compatible spreadsheet software.
.. include:: includes/introduction.rst
@@ -85,7 +85,7 @@ In a spreadsheet, `values can be typed directly into cells `__
+Both `Excel `__
and :ref:`pandas <10min_tut_02_read_write>` can import data from various sources in various
formats.
diff --git a/doc/source/getting_started/comparison/comparison_with_stata.rst b/doc/source/getting_started/comparison/comparison_with_stata.rst
index 9831f8e29b338..636778a2ca32e 100644
--- a/doc/source/getting_started/comparison/comparison_with_stata.rst
+++ b/doc/source/getting_started/comparison/comparison_with_stata.rst
@@ -92,7 +92,7 @@ Reading external data
Like Stata, pandas provides utilities for reading in data from
many formats. The ``tips`` data set, found within the pandas
-tests (`csv `_)
+tests (`csv `_)
will be used in many of the following examples.
Stata provides ``import delimited`` to read csv data into a data set in memory.
@@ -496,6 +496,6 @@ Disk vs memory
pandas and Stata both operate exclusively in memory. This means that the size of
data able to be loaded in pandas is limited by your machine's memory.
If out of core processing is needed, one possibility is the
-`dask.dataframe `_
+`dask.dataframe `_
library, which provides a subset of pandas functionality for an
on-disk ``DataFrame``.
diff --git a/doc/source/getting_started/install.rst b/doc/source/getting_started/install.rst
index 05c47d5cdf4f7..1cc74eeddbddb 100644
--- a/doc/source/getting_started/install.rst
+++ b/doc/source/getting_started/install.rst
@@ -12,7 +12,7 @@ cross platform distribution for data analysis and scientific computing.
This is the recommended installation method for most users.
Instructions for installing from source,
-`PyPI `__, `ActivePython `__, various Linux distributions, or a
+`PyPI `__, `ActivePython `__, various Linux distributions, or a
`development version `__ are also provided.
.. _install.version:
@@ -70,18 +70,18 @@ and involves downloading the installer which is a few hundred megabytes in size.
If you want to have more control on which packages, or have a limited internet
bandwidth, then installing pandas with
-`Miniconda `__ may be a better solution.
+`Miniconda `__ may be a better solution.
-`Conda `__ is the package manager that the
+`Conda `__ is the package manager that the
`Anaconda `__ distribution is built upon.
It is a package manager that is both cross-platform and language agnostic
(it can play a similar role to a pip and virtualenv combination).
`Miniconda `__ allows you to create a
minimal self contained Python installation, and then use the
-`Conda `__ command to install additional packages.
+`Conda `__ command to install additional packages.
-First you will need `Conda `__ to be installed and
+First you will need `Conda `__ to be installed and
downloading and running the `Miniconda
`__
will do this for you. The installer
@@ -143,8 +143,8 @@ Installing with ActivePython
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Installation instructions for
-`ActivePython `__ can be found
-`here `__. Versions
+`ActivePython `__ can be found
+`here `__. Versions
2.7, 3.5 and 3.6 include pandas.
Installing using your Linux distribution's package manager.
@@ -158,10 +158,10 @@ The commands in this table will install pandas for Python 3 from your distributi
Debian, stable, `official Debian repository `__ , ``sudo apt-get install python3-pandas``
- Debian & Ubuntu, unstable (latest packages), `NeuroDebian `__ , ``sudo apt-get install python3-pandas``
+ Debian & Ubuntu, unstable (latest packages), `NeuroDebian `__ , ``sudo apt-get install python3-pandas``
Ubuntu, stable, `official Ubuntu repository `__ , ``sudo apt-get install python3-pandas``
OpenSuse, stable, `OpenSuse Repository `__ , ``zypper in python3-pandas``
- Fedora, stable, `official Fedora repository `__ , ``dnf install python3-pandas``
+ Fedora, stable, `official Fedora repository `__ , ``dnf install python3-pandas``
Centos/RHEL, stable, `EPEL repository `__ , ``yum install python3-pandas``
**However**, the packages in the linux package managers are often a few versions behind, so
@@ -199,7 +199,7 @@ the code base as of this writing. To run it on your machine to verify that
everything is working (and that you have all of the dependencies, soft and hard,
installed), make sure you have `pytest
`__ >= 6.0 and `Hypothesis
-`__ >= 3.58, then run:
+`__ >= 3.58, then run:
::
diff --git a/doc/source/getting_started/overview.rst b/doc/source/getting_started/overview.rst
index 306eb28d23fe7..320d2da01418c 100644
--- a/doc/source/getting_started/overview.rst
+++ b/doc/source/getting_started/overview.rst
@@ -75,7 +75,7 @@ Some other notes
specialized tool.
- pandas is a dependency of `statsmodels
- `__, making it an important part of the
+ `__, making it an important part of the
statistical computing ecosystem in Python.
- pandas has been used extensively in production in financial applications.
diff --git a/doc/source/getting_started/tutorials.rst b/doc/source/getting_started/tutorials.rst
index a349251bdfca6..a4c555ac227e6 100644
--- a/doc/source/getting_started/tutorials.rst
+++ b/doc/source/getting_started/tutorials.rst
@@ -90,11 +90,11 @@ Video tutorials
* `Data analysis in Python with pandas `_
(2016-2018)
`GitHub repo `__ and
- `Jupyter Notebook `__
+ `Jupyter Notebook `__
* `Best practices with pandas `_
(2018)
`GitHub repo `__ and
- `Jupyter Notebook `__
+ `Jupyter Notebook `__
Various tutorials
diff --git a/doc/source/user_guide/basics.rst b/doc/source/user_guide/basics.rst
index 40ff1049e5820..a34d4891b9d77 100644
--- a/doc/source/user_guide/basics.rst
+++ b/doc/source/user_guide/basics.rst
@@ -848,8 +848,8 @@ have introduced the popular ``(%>%)`` (read pipe) operator for R_.
The implementation of ``pipe`` here is quite clean and feels right at home in Python.
We encourage you to view the source code of :meth:`~DataFrame.pipe`.
-.. _dplyr: https://github.com/hadley/dplyr
-.. _magrittr: https://github.com/smbache/magrittr
+.. _dplyr: https://github.com/tidyverse/dplyr
+.. _magrittr: https://github.com/tidyverse/magrittr
.. _R: https://www.r-project.org
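The ``pipe`` discussion this hunk touches can be illustrated with a short chain. Both helper functions here are hypothetical, written only to show how ``pipe`` threads a DataFrame through plain functions in the style of R's ``%>%``:

```python
import pandas as pd

def drop_small(df, col, threshold):
    # Hypothetical helper: keep rows where ``col`` meets the threshold.
    return df[df[col] >= threshold]

def add_prefix(df, prefix):
    # Hypothetical helper: prefix every column name.
    return df.rename(columns=lambda c: f"{prefix}{c}")

df = pd.DataFrame({"a": [1, 2, 3]})

# Each pipe call passes the frame as the first argument, so the chain
# reads top-to-bottom like a dplyr pipeline.
out = df.pipe(drop_small, col="a", threshold=2).pipe(add_prefix, prefix="x_")
```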
diff --git a/doc/source/user_guide/cookbook.rst b/doc/source/user_guide/cookbook.rst
index 7dcc2576e69c2..f88f4a9708c45 100644
--- a/doc/source/user_guide/cookbook.rst
+++ b/doc/source/user_guide/cookbook.rst
@@ -228,7 +228,7 @@ Ambiguity arises when an index consists of integers with a non-zero start or non
df2.loc[1:3] # Label-oriented
`Using inverse operator (~) to take the complement of a mask
-`__
+`__
.. ipython:: python
@@ -258,7 +258,7 @@ New columns
df
`Keep other columns when using min() with groupby
-`__
+`__
.. ipython:: python
@@ -388,7 +388,7 @@ Sorting
*******
`Sort by specific column or an ordered list of columns, with a MultiIndex
-`__
+`__
.. ipython:: python
@@ -403,7 +403,7 @@ Levels
`__
`Flatten Hierarchical columns
-`__
+`__
.. _cookbook.missing_data:
@@ -554,7 +554,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
ts
`Create a value counts column and reassign back to the DataFrame
-`__
+`__
.. ipython:: python
@@ -661,7 +661,7 @@ Pivot
The :ref:`Pivot ` docs.
`Partial sums and subtotals
-`__
+`__
.. ipython:: python
@@ -868,7 +868,7 @@ Timeseries
`__
`Constructing a datetime range that excludes weekends and includes only certain times
-`__
+`__
`Vectorized Lookup
`__
@@ -1034,7 +1034,7 @@ Data in/out
-----------
`Performance comparison of SQL vs HDF5
-`__
+`__
.. _cookbook.csv:
@@ -1068,12 +1068,6 @@ using that handle to read.
Dealing with bad lines :issue:`2886`
-`Dealing with bad lines II
-`__
-
-`Reading CSV with Unix timestamps and converting to local timezone
-`__
-
`Write a multi-row index CSV without writing duplicates
`__
@@ -1246,7 +1240,7 @@ csv file and creating a store by chunks, with date parsing as well.
`__
`Large Data work flows
-`__
+`__
`Reading in a sequence of files, then providing a global unique index to a store while appending
`__
@@ -1377,7 +1371,7 @@ Computation
-----------
`Numerical integration (sample-based) of a time series
-`__
+`__
Correlation
***********
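Two of the cookbook recipes whose links are updated above — taking the complement of a mask with ``~``, and creating a value-counts column and assigning it back — can be sketched as follows (the sample data is illustrative, not from the linked recipes):

```python
import pandas as pd

df = pd.DataFrame({"AAA": [4, 5, 5, 7], "BBB": [10, 20, 30, 40]})

# Complement of a boolean mask via the inverse operator (~).
mask = df["AAA"] <= 5
complement = df[~mask]  # rows where AAA > 5

# Per-group value counts broadcast back onto the frame with transform.
df["Count"] = df.groupby("AAA")["AAA"].transform("size")
```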
diff --git a/doc/source/user_guide/enhancingperf.rst b/doc/source/user_guide/enhancingperf.rst
index c78d972f33d65..eef41eb4be80e 100644
--- a/doc/source/user_guide/enhancingperf.rst
+++ b/doc/source/user_guide/enhancingperf.rst
@@ -35,7 +35,7 @@ by trying to remove for-loops and making use of NumPy vectorization. It's always
optimising in Python first.
This tutorial walks through a "typical" process of cythonizing a slow computation.
-We use an `example from the Cython documentation `__
+We use an `example from the Cython documentation `__
but in the context of pandas. Our final cythonized solution is around 100 times
faster than the pure Python solution.
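Before reaching for Cython, the "remove for-loops" advice this hunk sits in can be seen with a plain NumPy comparison — a minimal sketch (the data is arbitrary), not the Cython example the section links to:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(1000.0), "b": np.arange(1000.0)})

# Pure-Python loop: one interpreted addition per row.
loop_total = sum(x + y for x, y in zip(df["a"], df["b"]))

# Vectorized equivalent: a single NumPy-level operation.
vec_total = (df["a"] + df["b"]).sum()
```

Both compute the same total; the vectorized form avoids the per-row Python overhead, which is the first optimization the tutorial recommends trying.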
diff --git a/doc/source/user_guide/gotchas.rst b/doc/source/user_guide/gotchas.rst
index 1de978b195382..bf764316df373 100644
--- a/doc/source/user_guide/gotchas.rst
+++ b/doc/source/user_guide/gotchas.rst
@@ -341,7 +341,7 @@ Why not make NumPy like R?
Many people have suggested that NumPy should simply emulate the ``NA`` support
present in the more domain-specific statistical programming language `R
-`__. Part of the reason is the NumPy type hierarchy:
+`__. Part of the reason is the NumPy type hierarchy:
.. csv-table::
:header: "Typeclass","Dtypes"
diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
index f3be3277003ee..be761bb97f320 100644
--- a/doc/source/user_guide/io.rst
+++ b/doc/source/user_guide/io.rst
@@ -26,7 +26,7 @@ The pandas I/O API is a set of top level ``reader`` functions accessed like
text;`XML `__;:ref:`read_xml`;:ref:`to_xml`
text; Local clipboard;:ref:`read_clipboard`;:ref:`to_clipboard`
binary;`MS Excel `__;:ref:`read_excel`;:ref:`to_excel`
- binary;`OpenDocument `__;:ref:`read_excel`;
+ binary;`OpenDocument `__;:ref:`read_excel`;
binary;`HDF5 Format `__;:ref:`read_hdf`;:ref:`to_hdf`
binary;`Feather Format `__;:ref:`read_feather`;:ref:`to_feather`
binary;`Parquet Format `__;:ref:`read_parquet`;:ref:`to_parquet`
@@ -2622,7 +2622,7 @@ You can even pass in an instance of ``StringIO`` if you so desire:
that having so many network-accessing functions slows down the documentation
build. If you spot an error or an example that doesn't run, please do not
hesitate to report it over on `pandas GitHub issues page
- `__.
+ `__.
Read a URL and match a table that contains specific text:
@@ -4992,7 +4992,7 @@ control compression: ``complevel`` and ``complib``.
rates but is somewhat slow.
- `lzo `_: Fast
compression and decompression.
- - `bzip2 `_: Good compression rates.
+ - `bzip2 `_: Good compression rates.
- `blosc `_: Fast compression and
decompression.
@@ -5001,10 +5001,10 @@ control compression: ``complevel`` and ``complib``.
- `blosc:blosclz `_ This is the
default compressor for ``blosc``
- `blosc:lz4
- `_:
+ `_:
A compact, very popular and fast compressor.
- `blosc:lz4hc
- `_:
+ `_:
A tweaked version of LZ4, produces better
compression ratios at the expense of speed.
- `blosc:snappy `_:
@@ -5588,7 +5588,7 @@ SQL queries
The :mod:`pandas.io.sql` module provides a collection of query wrappers to both
facilitate data retrieval and to reduce dependency on DB-specific API. Database abstraction
is provided by SQLAlchemy if installed. In addition you will need a driver library for
-your database. Examples of such drivers are `psycopg2 `__
+your database. Examples of such drivers are `psycopg2 `__
for PostgreSQL or `pymysql `__ for MySQL.
For `SQLite `__ this is
included in Python's standard library by default.
@@ -5620,7 +5620,7 @@ The key functions are:
the provided input (database table name or sql query).
Table names do not need to be quoted if they have special characters.
-In the following example, we use the `SQlite `__ SQL database
+In the following example, we use the `SQLite `__ SQL database
engine. You can use a temporary SQLite database where data are stored in
"memory".
@@ -5784,7 +5784,7 @@ Possible values are:
specific backend dialect features.
Example of a callable using PostgreSQL `COPY clause
-`__::
+`__::
# Alternative to_sql() *method* for DBs that support COPY FROM
import csv
@@ -6046,7 +6046,7 @@ pandas integrates with this external package. if ``pandas-gbq`` is installed, yo
use the pandas methods ``pd.read_gbq`` and ``DataFrame.to_gbq``, which will call the
respective functions from ``pandas-gbq``.
-Full documentation can be found `here `__.
+Full documentation can be found `here `__.
.. _io.stata:
@@ -6254,7 +6254,7 @@ Obtain an iterator and read an XPORT file 100,000 lines at a time:
The specification_ for the xport file format is available from the SAS
web site.
-.. _specification: https://support.sas.com/techsup/technote/ts140.pdf
+.. _specification: https://support.sas.com/content/dam/SAS/support/en/technical-papers/record-layout-of-a-sas-version-5-or-6-data-set-in-sas-transport-xport-format.pdf
No official documentation is available for the SAS7BDAT format.
@@ -6296,7 +6296,7 @@ avoid converting categorical columns into ``pd.Categorical``:
More information about the SAV and ZSAV file formats is available here_.
-.. _here: https://www.ibm.com/support/knowledgecenter/en/SSLVMB_22.0.0/com.ibm.spss.statistics.help/spss/base/savedatatypes.htm
+.. _here: https://www.ibm.com/docs/en/spss-statistics/22.0.0
.. _io.other:
@@ -6314,7 +6314,7 @@ xarray_ provides data structures inspired by the pandas ``DataFrame`` for workin
with multi-dimensional datasets, with a focus on the netCDF file format and
easy conversion to and from pandas.
-.. _xarray: https://xarray.pydata.org/
+.. _xarray: https://xarray.pydata.org/en/stable/
.. _io.perf:
diff --git a/doc/source/user_guide/missing_data.rst b/doc/source/user_guide/missing_data.rst
index 1621b37f31b23..3052ee3001681 100644
--- a/doc/source/user_guide/missing_data.rst
+++ b/doc/source/user_guide/missing_data.rst
@@ -470,7 +470,7 @@ at the new values.
interp_s = ser.reindex(new_index).interpolate(method="pchip")
interp_s[49:51]
-.. _scipy: https://www.scipy.org
+.. _scipy: https://scipy.org/
.. _documentation: https://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation
.. _guide: https://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
@@ -580,7 +580,7 @@ String/regular expression replacement
backslashes than strings without this prefix. Backslashes in raw strings
will be interpreted as an escaped backslash, e.g., ``r'\' == '\\'``. You
should `read about them
- `__
+ `__
if this is unclear.
Replace the '.' with ``NaN`` (str -> str):
diff --git a/doc/source/user_guide/style.ipynb b/doc/source/user_guide/style.ipynb
index 1c7b710553dec..2dc40e67338b4 100644
--- a/doc/source/user_guide/style.ipynb
+++ b/doc/source/user_guide/style.ipynb
@@ -1196,7 +1196,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "You can create \"heatmaps\" with the `background_gradient` and `text_gradient` methods. These require matplotlib, and we'll use [Seaborn](https://stanford.edu/~mwaskom/software/seaborn/) to get a nice colormap."
+ "You can create \"heatmaps\" with the `background_gradient` and `text_gradient` methods. These require matplotlib, and we'll use [Seaborn](http://seaborn.pydata.org/) to get a nice colormap."
]
},
{
diff --git a/doc/source/user_guide/visualization.rst b/doc/source/user_guide/visualization.rst
index de5058466693e..404914dbc7a69 100644
--- a/doc/source/user_guide/visualization.rst
+++ b/doc/source/user_guide/visualization.rst
@@ -272,7 +272,7 @@ horizontal and cumulative histograms can be drawn by
plt.close("all")
See the :meth:`hist ` method and the
-`matplotlib hist documentation `__ for more.
+`matplotlib hist documentation `__ for more.
The existing interface ``DataFrame.hist`` to plot histogram still can be used.
@@ -410,7 +410,7 @@ For example, horizontal and custom-positioned boxplot can be drawn by
See the :meth:`boxplot ` method and the
-`matplotlib boxplot documentation `__ for more.
+`matplotlib boxplot documentation `__ for more.
The existing interface ``DataFrame.boxplot`` to plot boxplot still can be used.
@@ -674,7 +674,7 @@ bubble chart using a column of the ``DataFrame`` as the bubble size.
plt.close("all")
See the :meth:`scatter ` method and the
-`matplotlib scatter documentation `__ for more.
+`matplotlib scatter documentation `__ for more.
.. _visualization.hexbin:
@@ -734,7 +734,7 @@ given by column ``z``. The bins are aggregated with NumPy's ``max`` function.
plt.close("all")
See the :meth:`hexbin ` method and the
-`matplotlib hexbin documentation `__ for more.
+`matplotlib hexbin documentation `__ for more.
.. _visualization.pie:
@@ -839,7 +839,7 @@ If you pass values whose sum total is less than 1.0, matplotlib draws a semicirc
@savefig series_pie_plot_semi.png
series.plot.pie(figsize=(6, 6));
-See the `matplotlib pie documentation `__ for more.
+See the `matplotlib pie documentation `__ for more.
.. ipython:: python
:suppress:
@@ -956,7 +956,7 @@ for more information. By coloring these curves differently for each class
it is possible to visualize data clustering. Curves belonging to samples
of the same class will usually be closer together and form larger structures.
-**Note**: The "Iris" dataset is available `here `__.
+**Note**: The "Iris" dataset is available `here `__.
.. ipython:: python
@@ -1113,10 +1113,10 @@ unit interval). The point in the plane, where our sample settles to (where the
forces acting on our sample are at an equilibrium) is where a dot representing
our sample will be drawn. Depending on which class that sample belongs it will
be colored differently.
-See the R package `Radviz `__
+See the R package `Radviz `__
for more information.
-**Note**: The "Iris" dataset is available `here `__.
+**Note**: The "Iris" dataset is available `here `__.
.. ipython:: python
@@ -1384,7 +1384,7 @@ tick locator methods, it is useful to call the automatic
date tick adjustment from matplotlib for figures whose ticklabels overlap.
See the :meth:`autofmt_xdate ` method and the
-`matplotlib documentation `__ for more.
+`matplotlib documentation `__ for more.
Subplots
~~~~~~~~
@@ -1620,7 +1620,7 @@ as seen in the example below.
There also exists a helper function ``pandas.plotting.table``, which creates a
table from :class:`DataFrame` or :class:`Series`, and adds it to an
``matplotlib.Axes`` instance. This function can accept keywords which the
-matplotlib `table `__ has.
+matplotlib `table `__ has.
.. ipython:: python
diff --git a/doc/source/user_guide/window.rst b/doc/source/user_guide/window.rst
index dea3e8f3089e2..d1244f62cc1e4 100644
--- a/doc/source/user_guide/window.rst
+++ b/doc/source/user_guide/window.rst
@@ -287,7 +287,7 @@ and we want to use an expanding window where ``use_expanding`` is ``True`` other
3 3.0
4 10.0
-You can view other examples of ``BaseIndexer`` subclasses `here `__
+You can view other examples of ``BaseIndexer`` subclasses `here `__.
.. versionadded:: 1.1
diff --git a/doc/source/whatsnew/v0.16.2.rst b/doc/source/whatsnew/v0.16.2.rst
index 40d764e880c9c..c6c134a383e11 100644
--- a/doc/source/whatsnew/v0.16.2.rst
+++ b/doc/source/whatsnew/v0.16.2.rst
@@ -83,7 +83,7 @@ popular ``(%>%)`` pipe operator for R_.
See the :ref:`documentation ` for more. (:issue:`10129`)
-.. _dplyr: https://github.com/hadley/dplyr
+.. _dplyr: https://github.com/tidyverse/dplyr
.. _magrittr: https://github.com/smbache/magrittr
.. _R: http://www.r-project.org
diff --git a/doc/source/whatsnew/v0.20.0.rst b/doc/source/whatsnew/v0.20.0.rst
index cdd10014e71f0..faf4b1ac44d5b 100644
--- a/doc/source/whatsnew/v0.20.0.rst
+++ b/doc/source/whatsnew/v0.20.0.rst
@@ -328,7 +328,7 @@ more information about the data.
You must enable this by setting the ``display.html.table_schema`` option to ``True``.
.. _Table Schema: http://specs.frictionlessdata.io/json-table-schema/
-.. _nteract: http://nteract.io/
+.. _nteract: https://nteract.io/
.. _whatsnew_0200.enhancements.scipy_sparse:
diff --git a/doc/source/whatsnew/v1.1.0.rst b/doc/source/whatsnew/v1.1.0.rst
index 9f3ccb3e14116..ebd76d97e78b3 100644
--- a/doc/source/whatsnew/v1.1.0.rst
+++ b/doc/source/whatsnew/v1.1.0.rst
@@ -265,7 +265,7 @@ SSH, FTP, dropbox and github. For docs and capabilities, see the `fsspec docs`_.
The existing capability to interface with S3 and GCS will be unaffected by this
change, as ``fsspec`` will still bring in the same packages as before.
-.. _Azure Datalake and Blob: https://github.com/dask/adlfs
+.. _Azure Datalake and Blob: https://github.com/fsspec/adlfs
.. _fsspec docs: https://filesystem-spec.readthedocs.io/en/latest/
diff --git a/doc/source/whatsnew/v1.4.0.rst b/doc/source/whatsnew/v1.4.0.rst
index bb93ce1a12b2a..1ca4e8cc97df0 100644
--- a/doc/source/whatsnew/v1.4.0.rst
+++ b/doc/source/whatsnew/v1.4.0.rst
@@ -972,7 +972,7 @@ Groupby/resample/rolling
- Bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` when using a :class:`pandas.api.indexers.BaseIndexer` subclass that returned unequal start and end arrays would segfault instead of raising a ``ValueError`` (:issue:`44470`)
- Bug in :meth:`Groupby.nunique` not respecting ``observed=True`` for Categorical grouping columns (:issue:`45128`)
- Bug in :meth:`GroupBy.head` and :meth:`GroupBy.tail` not dropping groups with ``NaN`` when ``dropna=True`` (:issue:`45089`)
-- Fixed bug in :meth:`GroupBy.__iter__` after selecting a subset of columns in a :class:`GroupBy` object, which returned all columns instead of the chosen subset (:issue:`#44821`)
+- Fixed bug in :meth:`GroupBy.__iter__` after selecting a subset of columns in a :class:`GroupBy` object, which returned all columns instead of the chosen subset (:issue:`44821`)
- Bug in :meth:`Groupby.rolling` when non-monotonic data passed, fails to correctly raise ``ValueError`` (:issue:`43909`)
- Fixed bug where grouping by a :class:`Series` that has a categorical data type and length unequal to the axis of grouping raised ``ValueError`` (:issue:`44179`)
diff --git a/pandas/io/sas/sas_xport.py b/pandas/io/sas/sas_xport.py
index d8a3412e05d05..eefb619b0fd9f 100644
--- a/pandas/io/sas/sas_xport.py
+++ b/pandas/io/sas/sas_xport.py
@@ -5,7 +5,7 @@
The file format is defined here:
-https://support.sas.com/techsup/technote/ts140.pdf
+https://support.sas.com/content/dam/SAS/support/en/technical-papers/record-layout-of-a-sas-version-5-or-6-data-set-in-sas-transport-xport-format.pdf
"""
from __future__ import annotations