Skip to content

Make special exception in referenceFS #1120

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Nov 21, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
76 changes: 52 additions & 24 deletions .github/workflows/main.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -8,56 +8,47 @@ on:

jobs:
linux:
name: ${{ matrix.TOXENV }}-pytest
name: ${{ matrix.PY }}-pytest
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
TOXENV: [py38, py39, py310, s3fs, gcsfs]
PY: ["3.8", "3.9", "3.10"]

env:
TOXENV: ${{ matrix.TOXENV }}
CIRUN: true

steps:
- name: Checkout
uses: actions/checkout@v2

- name: Setup Miniconda
uses: conda-incubator/setup-miniconda@v2
- name: Setup conda
uses: mamba-org/provision-with-micromamba@main
with:
auto-update-conda: true
auto-activate-base: false
activate-environment: test_env
environment-file: ci/environment-py38.yml
extra-specs: python=${{ matrix.PY }}

- name: Run Tests
shell: bash -l {0}
run: |
tox -v
pytest -v

win:
name: ${{ matrix.TOXENV }}-pytest-win
name: pytest-win
runs-on: windows-2019
strategy:
fail-fast: false
matrix:
TOXENV: [py39]

env:
TOXENV: ${{ matrix.TOXENV }}
CIRUN: true

steps:
- name: Checkout
uses: actions/checkout@v2

- name: Setup Miniconda
uses: conda-incubator/setup-miniconda@v2
- name: Setup conda
uses: mamba-org/provision-with-micromamba@main
with:
auto-update-conda: true
auto-activate-base: false
activate-environment: test_env
environment-file: ci/environment-win.yml

- name: Run Tests
Expand All @@ -81,12 +72,9 @@ jobs:
- name: Checkout
uses: actions/checkout@v2

- name: Setup Miniconda
uses: conda-incubator/setup-miniconda@v2
- name: Setup conda
uses: mamba-org/provision-with-micromamba@main
with:
auto-update-conda: true
auto-activate-base: false
activate-environment: test_env
environment-file: ci/environment-downstream.yml

- name: Local install
Expand All @@ -97,10 +85,13 @@ jobs:
git tag -a 3000 -m "fake"
pip install -e .

- name: Clone s3fs
shell: bash -l {0}
run: git clone https://github.com/fsspec/s3fs

- name: Install s3fs
shell: bash -l {0}
run: |
git clone https://github.com/fsspec/s3fs
pip install -e ./s3fs --no-deps

- name: Run fsspec tests
Expand All @@ -117,3 +108,40 @@ jobs:
shell: bash -l {0}
run: |
pytest -v dask/dask/bytes

fsspec_friends:
name: ${{ matrix.FRIEND }}-pytest
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
FRIEND: [gcsfs, s3fs]

env:
CIRUN: true
BOTO_CONFIG: /dev/null
AWS_ACCESS_KEY_ID: foobar_key
AWS_SECRET_ACCESS_KEY: foobar_secret

steps:
- name: Checkout
uses: actions/checkout@v2

- name: Setup conda
uses: mamba-org/provision-with-micromamba@main
with:
environment-file: ci/environment-friends.yml

- name: Clone
shell: bash -l {0}
run: git clone https://github.com/fsspec/${{ matrix.FRIEND }}

- name: Install
shell: bash -l {0}
run: |
pip install -e . --no-deps
pip install -e ./${{ matrix.FRIEND }} --no-deps

- name: Test
shell: bash -l {0}
run: pytest -v ${{ matrix.FRIEND }}
28 changes: 13 additions & 15 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,26 +33,25 @@ Please refer to [RTD](https://filesystem-spec.readthedocs.io/en/latest/?badge=la

## Develop

fsspec uses [tox](https://tox.readthedocs.io/en/latest/) and
[tox-conda](https://github.com/tox-dev/tox-conda) to manage dev and test
environments. First, install conda with tox and tox-conda in a base environment
(eg. ``conda install -c conda-forge tox tox-conda``). Calls to ``tox`` can then be
used to configure a development environment and run tests.

First, setup a development conda environment via ``tox -e {env}`` where ``env`` is one of ``{py38,py39,py310}``.
This will install fsspec dependencies, test & dev tools, and install fsspec in develop
mode. You may activate the dev environment under ``.tox/{env}`` via ``conda activate .tox/{env}``.
fsspec uses GitHub Actions for CI. Environment files can be found
in the "ci/" directory. Note that the main environment is called "py38",
but it is expected that the version of python installed be adjustable at
CI runtime. For local use, pick a version suitable for you.

### Testing

Tests can be run in the dev environment, if activated, via ``pytest fsspec``.

Alternatively, the full fsspec test suite can also be run via ``tox``, which will
also build the appropriate environment (see above), with the environment specified
by the TOXENV environment variable.

The full fsspec suite requires a system-level docker, docker-compose, and fuse
installation.
installation. If only making changes to one backend implementation, it is
not generally necessary to run all tests locally.

It is expected that contributors ensure that any change to fsspec does not
cause issues or regressions for either other fsspec-related packages such
as gcsfs and s3fs, nor for downstream users of fsspec. The "downstream" CI
run and corresponding environment file run a set of tests from the dask
test suite, and very minimal tests against pandas and zarr from the test_dowstream.py
Copy link
Contributor

@joshmoore joshmoore Nov 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit late, but "test_downstream.py"? But otherwise, ❤️ for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I just meant the spelling in the README. Sorry, no excuse not to have opened a PR: #1121

module in this repo.

### Code Formatting

Expand All @@ -62,7 +61,6 @@ Run ``black fsspec`` from the root of the filesystem_spec repository to
auto-format your code. Additionally, many editors have plugins that will apply
``black`` as you edit files. ``black`` is included in the ``tox`` environments.


Optionally, you may wish to setup [pre-commit hooks](https://pre-commit.com) to
automatically run ``black`` when you make a git commit.
Run ``pre-commit install --install-hooks`` from the root of the
Expand Down
1 change: 0 additions & 1 deletion ci/environment-downstream.yml
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
name: test_env
channels:
- conda-forge
- defaults
dependencies:
- python=3.9
- dask
Expand Down
28 changes: 28 additions & 0 deletions ci/environment-friends.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
name: test_env
channels:
- conda-forge
dependencies:
- python=3.9
- pytest
- pytest-asyncio
- pytest-benchmark
- pytest-cov
- pytest-mock
- pytest-vcr
- pip
- pytest
- ujson
- requests
- decorator
- google-auth
- aiohttp
- google-auth-oauthlib
- flake8
- black
- google-cloud-core
- google-api-core
- google-api-python-client
- httpretty
- aiobotocore
- "moto>=4"
- flask
40 changes: 36 additions & 4 deletions ci/environment-py38.yml
Original file line number Diff line number Diff line change
@@ -1,8 +1,40 @@
name: test_env
channels:
- conda-forge
- defaults
dependencies:
- python=3.8
- tox
- tox-conda
# - python=3.8 # set by env
- pip
- paramiko
- requests
- zstandard
- python-snappy
- aiohttp
- lz4
- distributed
- dask
- pyarrow
- panel
- notebook
- pygit2
- git
- s3fs
- pyftpdlib
- cloudpickle
- pytest
- pytest-asyncio
- pytest-benchmark
- pytest-cov
- pytest-mock
- pytest-vcr
- py
- fusepy
- tomli < 2
- msgpack-python
- python-libarchive-c
- numpy
- nomkl
- jinja2
- tqdm
- pip:
- hadoop-test-cluster
- smbprotocol
2 changes: 1 addition & 1 deletion ci/environment-win.yml
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
name: test_env
channels:
- conda-forge
- defaults
dependencies:
- python=3.9
- aiohttp
- pip
- requests
Expand Down
35 changes: 32 additions & 3 deletions fsspec/implementations/reference.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,16 @@
logger = logging.getLogger("fsspec.reference")


class ReferenceNotReachable(RuntimeError):
def __init__(self, reference, target, *args):
super().__init__(*args)
self.reference = reference
self.target = target

def __str__(self):
return f'Reference "{self.reference}" failed to fetch target {self.target}'


def _first(d):
return list(d.values())[0]

Expand Down Expand Up @@ -213,7 +223,10 @@ def loop(self):
def _cat_common(self, path, start=None, end=None):
path = self._strip_protocol(path)
logger.debug(f"cat: {path}")
part = self.references[path]
try:
part = self.references[path]
except KeyError:
raise FileNotFoundError(path)
if isinstance(part, str):
part = part.encode()
if isinstance(part, bytes):
Expand Down Expand Up @@ -254,15 +267,21 @@ async def _cat_file(self, path, start=None, end=None, **kwargs):
if isinstance(part_or_url, bytes):
return part_or_url[start:end]
protocol, _ = split_protocol(part_or_url)
return await self.fss[protocol]._cat_file(part_or_url, start=start, end=end)
try:
await self.fss[protocol]._cat_file(part_or_url, start=start, end=end)
except Exception as e:
raise ReferenceNotReachable(path, part_or_url) from e

def cat_file(self, path, start=None, end=None, **kwargs):
part_or_url, start0, end0 = self._cat_common(path, start=start, end=end)
if isinstance(part_or_url, bytes):
return part_or_url[start:end]
protocol, _ = split_protocol(part_or_url)
# TODO: start and end should be passed to cat_file, not sliced
return self.fss[protocol].cat_file(part_or_url, start=start0, end=end0)
try:
return self.fss[protocol].cat_file(part_or_url, start=start0, end=end0)
except Exception as e:
raise ReferenceNotReachable(path, part_or_url) from e

def pipe_file(self, path, value, **_):
"""Temporarily add binary data or reference as a file"""
Expand Down Expand Up @@ -360,6 +379,16 @@ def cat(self, path, recursive=False, on_error="raise", **kwargs):
elif np == u and s >= ns and e <= ne:
out[p] = b[s - ns : (e - ne) or None]

for k, v in out.copy().items():
if isinstance(v, Exception):
ex = out[k]
new_ex = ReferenceNotReachable(k, self.references[k])
new_ex.__cause__ = ex
if on_error == "raise":
raise new_ex
elif on_error != "omit":
out[k] = new_ex

if len(out) == 1 and isinstance(path, str) and "*" not in path:
return _first(out)
return out
Expand Down
10 changes: 6 additions & 4 deletions fsspec/implementations/tests/test_cached.py
Original file line number Diff line number Diff line change
Expand Up @@ -884,7 +884,7 @@ def test_expiry():
assert detail["time"] - start_time > 0.09


def test_equality():
def test_equality(tmpdir):
"""Test sane behaviour for equality and hashing.

Make sure that different CachingFileSystem only test equal to each other
Expand All @@ -897,9 +897,11 @@ def test_equality():
from fsspec.implementations.local import LocalFileSystem

lfs = LocalFileSystem()
cfs1 = CachingFileSystem(fs=lfs, cache_storage="raspberry")
cfs2 = CachingFileSystem(fs=lfs, cache_storage="banana")
cfs3 = CachingFileSystem(fs=lfs, cache_storage="banana")
dir1 = f"{tmpdir}/raspberry"
dir2 = f"{tmpdir}/banana"
cfs1 = CachingFileSystem(fs=lfs, cache_storage=dir1)
cfs2 = CachingFileSystem(fs=lfs, cache_storage=dir2)
cfs3 = CachingFileSystem(fs=lfs, cache_storage=dir2)
assert cfs1 == cfs1
assert cfs1 != cfs2
assert cfs1 != cfs3
Expand Down
Loading