Merged
Changes from 5 commits
4 changes: 4 additions & 0 deletions doc/source/user_guide/io.rst
@@ -4735,6 +4735,7 @@ Write to a feather file.
Read from a feather file.

.. ipython:: python
:okwarning:

result = pd.read_feather("example.feather")
result
@@ -4818,6 +4819,7 @@ Write to a parquet file.
Read from a parquet file.

.. ipython:: python
:okwarning:

result = pd.read_parquet("example_fp.parquet", engine="fastparquet")
result = pd.read_parquet("example_pa.parquet", engine="pyarrow")
@@ -4827,6 +4829,7 @@ Read from a parquet file.
Read only certain columns of a parquet file.

.. ipython:: python
:okwarning:

result = pd.read_parquet(
"example_fp.parquet",
@@ -4895,6 +4898,7 @@ Partitioning Parquet files
Parquet supports partitioning of data based on the values of one or more columns.

.. ipython:: python
:okwarning:

df = pd.DataFrame({"a": [0, 0, 1, 1], "b": [0, 1, 0, 1]})
df.to_parquet(path="test", engine="pyarrow", partition_cols=["a"], compression=None)
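The ``:okwarning:`` lines added throughout ``io.rst`` tell the IPython Sphinx directive to tolerate warnings emitted while the example runs, so the doc build does not fail on the new deprecation warning. As a minimal stdlib sketch of what that amounts to (the ``read_with_warning`` stand-in is hypothetical; in the docs the warning comes from real calls like ``pd.read_parquet``):

```python
import warnings

def read_with_warning():
    # Hypothetical stand-in for a reader that emits a deprecation
    # warning during a transition period.
    warnings.warn(
        "Accepting ndim=None in the Block constructor is deprecated, "
        "this will raise in a future version.",
        FutureWarning,
        stacklevel=2,
    )
    return "data"

# Ignore the warning, which is roughly what ``:okwarning:`` permits:
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    result = read_with_warning()

# Or record it, to confirm exactly which category is raised:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    read_with_warning()
```

The recording form is also how a test would pin down the warning category before the ``:okwarning:`` markers can eventually be removed.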
7 changes: 7 additions & 0 deletions doc/source/user_guide/scale.rst
@@ -71,6 +71,7 @@ To load the columns we want, we have two options.
Option 1 loads in all the data and then filters to what we need.

.. ipython:: python
:okwarning:

columns = ["id_0", "name_0", "x_0", "y_0"]

@@ -79,6 +80,7 @@ Option 1 loads in all the data and then filters to what we need.
Option 2 only loads the columns we request.

.. ipython:: python
:okwarning:

pd.read_parquet("timeseries_wide.parquet", columns=columns)

@@ -98,6 +100,7 @@ referred to as "low-cardinality" data). By using more efficient data types, you
can store larger datasets in memory.

.. ipython:: python
:okwarning:

ts = pd.read_parquet("timeseries.parquet")
ts
@@ -206,6 +209,7 @@ counts up to this point. As long as each individual file fits in memory, this will
work for arbitrary-sized datasets.

.. ipython:: python
:okwarning:

%%time
files = pathlib.Path("data/timeseries/").glob("ts*.parquet")
@@ -289,6 +293,7 @@ returns a Dask Series with the same dtype and the same name.
To get the actual result you can call ``.compute()``.

.. ipython:: python
:okwarning:

%time ddf["name"].value_counts().compute()

@@ -322,6 +327,7 @@ Dask implements the most used parts of the pandas API. For example, we can do
a familiar groupby aggregation.

.. ipython:: python
:okwarning:

%time ddf.groupby("name")[["x", "y"]].mean().compute().head()

@@ -345,6 +351,7 @@ we need to supply the divisions manually.
Now we can do things like fast random access with ``.loc``.

.. ipython:: python
:okwarning:

ddf.loc["2002-01-01 12:01":"2002-01-01 12:05"].compute()

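The ``scale.rst`` hunk around line 206 describes processing one file at a time while keeping running counts, so only a single chunk ever needs to fit in memory. A stdlib-only sketch of that running-aggregation pattern (the ``chunks`` data is hypothetical; in the docs each chunk would come from ``pd.read_parquet`` on one of the ``ts*.parquet`` files):

```python
from collections import Counter

# Hypothetical stand-in for the per-file data.
chunks = [
    ["Alice", "Bob", "Alice"],
    ["Bob", "Bob", "Carol"],
]

# Running aggregation: fold each chunk into the accumulated counts,
# discarding the chunk afterwards. This is the shape of the chunked
# value_counts computation the docs build up before introducing Dask.
counts = Counter()
for chunk in chunks:
    counts.update(chunk)  # partial value_counts for this chunk

print(counts.most_common(2))  # → [('Bob', 3), ('Alice', 2)]
```

Dask's ``ddf["name"].value_counts().compute()`` performs essentially this split/fold/combine automatically across partitions.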
47 changes: 18 additions & 29 deletions pandas/core/internals/blocks.py
@@ -134,16 +134,20 @@ def __init__(self, values, placement, ndim: int):
1 for SingleBlockManager/Series, 2 for BlockManager/DataFrame
"""
# TODO(EA2D): ndim will be unnecessary with 2D EAs
self.ndim = self._check_ndim(values, ndim)
self.mgr_locs = placement
self.values = self._maybe_coerce_values(values)
self.ndim = self._check_ndim(values, ndim)

if self._validate_ndim and self.ndim and len(self.mgr_locs) != len(self.values):
raise ValueError(
f"Wrong number of items passed {len(self.values)}, "
f"placement implies {len(self.mgr_locs)}"
)

if self.is_extension and self.ndim == 2 and len(self.mgr_locs) != 1:
# TODO(EA2D): check unnecessary with 2D EAs
raise AssertionError("block.size != values.size")

def _maybe_coerce_values(self, values):
"""
Ensure we have correctly-typed values.
@@ -180,7 +184,19 @@ def _check_ndim(self, values, ndim):
ValueError : the number of dimensions do not match
"""
if ndim is None:
ndim = values.ndim
warnings.warn(
"Accepting ndim=None in the Block constructor is deprecated, "
"this will raise in a future version.",
FutureWarning,
stacklevel=3,
)
if self.is_extension:
if len(self.mgr_locs) != 1:
ndim = 1
else:
ndim = 2
else:
ndim = values.ndim

if self._validate_ndim and values.ndim != ndim:
raise ValueError(
@@ -1667,33 +1683,6 @@ class ExtensionBlock(Block):

values: ExtensionArray

def __init__(self, values, placement, ndim: int):
"""
Initialize a non-consolidatable block.

'ndim' may be inferred from 'placement'.

This will continue to call __init__ for the other base
classes mixed in with this Mixin.
"""

# Placement must be converted to BlockPlacement so that we can check
# its length
if not isinstance(placement, libinternals.BlockPlacement):
placement = libinternals.BlockPlacement(placement)

# Maybe infer ndim from placement
if ndim is None:
if len(placement) != 1:
ndim = 1
else:
ndim = 2
super().__init__(values, placement, ndim=ndim)

if self.ndim == 2 and len(self.mgr_locs) != 1:
# TODO(EA2D): check unnecessary with 2D EAs
raise AssertionError("block.size != values.size")

@property
def shape(self):
# TODO(EA2D): override unnecessary with 2D EAs
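The net effect of the ``blocks.py`` diff is that ``ExtensionBlock`` no longer overrides ``__init__`` to infer ``ndim``; the inference moves into ``Block._check_ndim``, behind a ``FutureWarning``. A simplified, self-contained sketch of the inference rule as the new code expresses it (the free-function signature here is hypothetical; in pandas this is a method reading ``self.is_extension`` and ``self.mgr_locs``):

```python
import warnings

def check_ndim(values_ndim, ndim, is_extension, n_placements):
    # Sketch of the deprecated ndim=None path in Block._check_ndim.
    if ndim is None:
        warnings.warn(
            "Accepting ndim=None in the Block constructor is deprecated, "
            "this will raise in a future version.",
            FutureWarning,
            stacklevel=2,
        )
        if is_extension:
            # A 1D extension array backing exactly one column lives in
            # a 2D block; multiple placements imply a 1D block.
            ndim = 1 if n_placements != 1 else 2
        else:
            # Ordinary ndarray-backed blocks take the array's own ndim.
            ndim = values_ndim
    return ndim
```

Note the ordering change in ``__init__`` (``self.mgr_locs`` is now assigned before ``_check_ndim`` runs) supports exactly this: the inference needs the placement length, which previously only the ``ExtensionBlock`` override had access to.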