Skip to content

Pull of ABSStore #287

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
tjcrone opened this issue Aug 2, 2018 · 13 comments
Closed

Pull of ABSStore #287

tjcrone opened this issue Aug 2, 2018 · 13 comments

Comments

@tjcrone
Copy link
Member

tjcrone commented Aug 2, 2018

I would like to have my abs_store branch pulled not into master, but into a new branch for testing purposes. This branch implements an ABSStore MutableMapping for Azure Blob Storage that has been tested to some degree, but I do not think is ready for master. However I think it would be easiest for me to continue development on a branch here, and I would be willing to maintain this bit of code going forward. Please let me know if this sounds like a good idea, or if there are better ways to go about this. Thanks.

cc @mzjp2

@mzjp2
Copy link
Member

mzjp2 commented Aug 2, 2018

Agree with @tjcrone on this re:ease of development.

I'm working with @dazzag24 and we'd both be happy to help test and maintain and help with development as part of our work with zarr and Azure as well, if tjcrone is willing. :)

@alimanfoo
Copy link
Member

Thanks @tjcrone, sounds good to me. I've invited you, @mzjp2 and @dazzag24 to the zarr-developers org and @zarr-developers/core-devs team.

@jakirkham
Copy link
Member

Yeah it would be great to grow what Zarr has to offer in terms of native cloud storage support.

I'd be slightly inclined to get it into master with a bunch of under development notes in the docs or similar warnings, but that's just me. :)

@martindurant
Copy link
Member

I don't know if this is a good place to mention it, but my conception of filesystems-spec includes a mapping of the type used by zarr that, if a FS adheres to the spec, would come automatically.

I realise Blob is a bit different (because you are only ever reading whole files, at least in this implementation), but a) I don't think filesystem abstractions really belong in zarr and b) it would be cool to come to some kind of consensus about the interface, consistent across backends.

@alimanfoo
Copy link
Member

It might be worth developing in a branch other than master for now, I'd like to keep master for code that has 100% test coverage, even if the feature is experimental. That said I guess there is an open question about how to unit test stores that access remote services.

Btw I think it would be reasonable to mark this as an experimental feature for at least the first release, if you'd like to proceed cautiously.

@alimanfoo
Copy link
Member

Thanks @martindurant, the filesystems-spec looks very useful. If the desire is to implement a full fs-like interface to Azure then I think you're right, that should live outside zarr. I can also see the value of implementing just an Azure mapping/store class for zarr, which is a smaller and simpler piece of code and can include any optimisations for zarr, and thus could live inside zarr. Personally I'm happy with either direction and happy to defer judgement.

@shikharsg
Copy link
Contributor

Hi everyone. I work with @mzjp2 and @dazzag24. For testing unit test stores, there is this option for azure blob: https://github.com/azure/azurite. I'm testing this now and will do a PR soon.

@martindurant
Copy link
Member

That is a good find, @shikharsg . I wonder if it's fully-featured enough to build a full filesystem, as I would like.

@shikharsg
Copy link
Contributor

shikharsg commented Aug 13, 2018

well if you see the todo items for that repo: https://github.com/azure/azurite#blob-storage---api-implementation-status, there are no major features missing which would stop you from making a full filesystem, and the missing ones can be added to your full filesytem as and when they are completed.

Basically there is

  1. SAS Support (azure equivalent of managing access permissions to s3)
  2. Set Blob Tier (azure equivalent of amazon s3 storage classes)
  3. Get Blob Service Stats (this gets info about geo replication of data)

What I was actually thinking about is (as @alimanfoo suggested) it would be great to have a small piece of code inside zarr that implements the azure mapping(not full file system like), while also having a full file system like library outside of zarr. The former is almost already implemented and I will be happy to contribute to the latter.

@martindurant
Copy link
Member

Sounds very reasonable.
I would make a FileSystem that is essentially the same as gcsfs/s3fs (adhering to the spec), except that for reading, the block-size is predetermined (and you need to care about the type of the blob), and for writing, the way you create file parts and then finalise is a bit different.

@martindurant
Copy link
Member

I can confirm that azurite in a docker container allows you to put a blob and read it back again. I didn't use any security in trying that.

@shikharsg
Copy link
Contributor

shikharsg commented Aug 14, 2018

So I've filled in most gaps in @tjcrone's implementation of ABSStore with tests included in a fork here. All tests pass(apart from some pickle problems in python 2.7 which I'm trying to fix now) and work with Azurite. I'm still hesitant to submit a PR as it's not decided yet if we want to put this in the main zarr repo or in a new repo like s3fs.

@jakirkham
Copy link
Member

Done in PR ( #288 ) and integrated into master with PR ( #345 ).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants