Mutable mapping for Azure Blob #345

Merged
merged 90 commits into from
Mar 1, 2019
fd5bfb5
basic implementation working
rabernat Mar 29, 2018
a26752c
docs and cleanup
rabernat Mar 29, 2018
23dd8f6
fixed client_kwargs bug
rabernat Mar 30, 2018
dec75dd
Add ABSStore mutable mapping
Apr 15, 2018
13c2077
Fix import syntax
Apr 15, 2018
86603fa
Get open_zarr() working
tjcrone Apr 16, 2018
4b999ba
Change account variable names
tjcrone Apr 16, 2018
677ec1c
Fix client.exists() logging issue
tjcrone Apr 16, 2018
d9be9ba
Minor comment changes
tjcrone Apr 16, 2018
13b1ee8
Get to_zarr() working
tjcrone Apr 16, 2018
e5564c3
Remove state['container'] delete
tjcrone Apr 17, 2018
a85e559
Implement rmdir
tjcrone Apr 17, 2018
272d234
Add docstring for ABSStore
tjcrone Aug 2, 2018
bb406a0
Remove GCSStore from this branch
tjcrone Aug 2, 2018
937d162
Fixed missing argument in getsize of ABStore
mzjp2 Aug 2, 2018
74920c4
Specified prefix argument in rmdir for abstore
mzjp2 Aug 2, 2018
bd1648b
Fixed join string error in dir_path in ABStore
mzjp2 Aug 2, 2018
e79cb4f
Merge pull request #2 from mzjp2/abs_store
tjcrone Aug 2, 2018
0e71f70
Remove logging work-around as the issue was fixed in azure-storage 1.3.0
tjcrone Aug 3, 2018
de5bb9c
Clean up docstring
tjcrone Aug 3, 2018
13a6d30
Remove more GCSStore code
tjcrone Aug 3, 2018
7b52e39
Move utility functions into ABSStore class
tjcrone Aug 6, 2018
29a8697
Merge pull request #288 from friedrichknuth/abs_store
tjcrone Aug 6, 2018
36139cb
implemented the rest of the mutable mapping functions. tests pass wit…
shikharsg Aug 12, 2018
bda0c3f
using local blob emulator for storage.ABSStore testing
shikharsg Aug 14, 2018
447c473
fixed PY2 array.array error in storage.ABSStore
shikharsg Aug 14, 2018
c6858ed
create test container if not exists in ABSStore test
shikharsg Aug 14, 2018
8e51b3b
added more tests for ABSStore
shikharsg Aug 14, 2018
ec4e3f1
reverted blob client creation to inside of ABSStore
shikharsg Aug 15, 2018
bde7b5e
added group test for ABSStore
shikharsg Aug 15, 2018
b86cf53
emulator connection string not needed
shikharsg Aug 15, 2018
f66dadd
fixed import statement location and put azure-storage-blob in require…
shikharsg Aug 15, 2018
b8f60fe
fixed pickle tests
shikharsg Aug 15, 2018
edd5a71
fixed listdir in ABSStore
shikharsg Aug 15, 2018
3fbe589
fixed getsize
shikharsg Aug 16, 2018
4b8560e
Fixed PY2 pickle test. python 2 pickle can't pickle instance methods
shikharsg Aug 16, 2018
631051c
implemented the suggestion from here: https://github.com/zarr-develop…
shikharsg Sep 4, 2018
ea93352
flake-8 fixes
shikharsg Sep 5, 2018
dd17cd4
merged master with abs_store
shikharsg Nov 27, 2018
08fe155
added azure-storage-blob
shikharsg Nov 27, 2018
957b405
first attempt at docker build with azurite
shikharsg Nov 30, 2018
9c128db
azure storage emulator in appveyor
shikharsg Nov 30, 2018
2da2453
syntax correction
shikharsg Nov 30, 2018
a09e2c9
checking if emulator is preinstalled
shikharsg Nov 30, 2018
5ce6a4c
syntax fix
shikharsg Nov 30, 2018
bf8aa37
syntax fix
shikharsg Nov 30, 2018
730255c
syntax fix
shikharsg Nov 30, 2018
8dc2f5d
removed wrong syntax
shikharsg Dec 12, 2018
85a5670
storage emulator with docker
shikharsg Dec 12, 2018
b2cda56
merged abs_store with upstream
shikharsg Dec 12, 2018
a09fb61
trying different appveyor image
shikharsg Dec 13, 2018
168ba50
flake 8 fixes
shikharsg Dec 13, 2018
e0de99b
full coverage
shikharsg Dec 13, 2018
3efe802
verbose logs for pip install to see appveyor error
shikharsg Dec 14, 2018
8f85315
trying to run locally installed emulator
shikharsg Dec 14, 2018
d1bb9ce
single-double quote yaml fix
shikharsg Dec 14, 2018
735c661
cmd prefix
shikharsg Dec 14, 2018
979a438
double quotes around exe file path
shikharsg Dec 14, 2018
5beace1
double quotes within single quotes with environment variable substitu…
shikharsg Dec 14, 2018
68bda4e
trying appveyor build with VS2015 image
shikharsg Dec 14, 2018
77db637
added comment and removed verbosity option for pip install
shikharsg Dec 14, 2018
bcdc839
list_abs_directory to list only directory blob using delimiter option…
shikharsg Dec 14, 2018
ac286ce
fixed ABSStore docs
shikharsg Dec 14, 2018
cdaceb7
fixed windows path listdir error
shikharsg Dec 16, 2018
b6b3024
ABSStore refactoring
shikharsg Dec 16, 2018
b6eebc8
moved py2 array.array checking to numcodecs ensure bytes
shikharsg Dec 23, 2018
101be91
Merge branch 'master' into abs_store
shikharsg Jan 22, 2019
3abe79d
syntax fix
shikharsg Jan 22, 2019
3ad6d9c
flake8 fix
shikharsg Jan 22, 2019
ab38119
fixed ABSStore parameter name container
shikharsg Jan 23, 2019
05aab41
removed context manager from ABSStore
shikharsg Jan 23, 2019
90b5e3a
ABSStore.__delitem__ now takes only 1 azure storage API call
shikharsg Jan 23, 2019
4636d5d
docs
shikharsg Jan 23, 2019
8c3863f
Update zarr/storage.py
alimanfoo Jan 23, 2019
b238f0b
removed global import of azure storage library
shikharsg Feb 1, 2019
9770876
added ABSStore to zarr root import
shikharsg Feb 2, 2019
3ed4814
added ABSStore to tutorial.rst
shikharsg Feb 2, 2019
7b08aba
fixed docs
shikharsg Feb 2, 2019
6fc869d
trying to fix tutorial.rst
shikharsg Feb 2, 2019
e9a402e
flake8 fix
shikharsg Feb 2, 2019
8aa3a01
fixing tutorial.rst
shikharsg Feb 2, 2019
a9940a2
fixed ABSStore in tutorial
shikharsg Feb 4, 2019
4d5b6d1
docs
shikharsg Feb 4, 2019
45b0642
Merge branch 'master' into abs_store
shikharsg Feb 9, 2019
3a4f4d9
small change to docs
shikharsg Feb 12, 2019
1da44b1
Merge branch 'abs_store' of github.com:shikharsg/zarr into abs_store
shikharsg Feb 12, 2019
b51fb78
cleaned create blob code
Feb 21, 2019
4af5ebe
flake8 fix
Feb 21, 2019
13a7dc4
Update docs/release.rst
alimanfoo Mar 1, 2019
a062e18
Apply suggestions from code review
alimanfoo Mar 1, 2019
5 changes: 5 additions & 0 deletions .travis.yml
@@ -12,6 +12,7 @@ addons:
- libdb-dev

services:
- docker
- redis-server
- mongodb

@@ -24,6 +25,10 @@ matrix:
dist: xenial
sudo: true

before_install:
- docker pull arafato/azurite
Member

In the future, this may move to an azure org on Docker Hub (Azure/Azurite#94), but it hasn't happened yet. Happy with leaving this as is. Just raising awareness.

Contributor Author

Thanks for pointing that out.

- mkdir ~/blob_emulator
- docker run -e executable=blob -d -t -p 10000:10000 -v ~/blob_emulator:/opt/azurite/folder arafato/azurite
before_script:
- mongo mydb_test --eval 'db.createUser({user:"travis",pwd:"test",roles:["readWrite"]});'
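The `docker run -d` step in `before_install` returns as soon as the container has started, which may be before the emulator accepts connections. A minimal sketch of a readiness check, assuming the blob endpoint is the port 10000 published above; `wait_for_port` is a hypothetical helper, not part of zarr or Azurite:

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=10000, timeout=30.0, interval=0.5):
    """Return True once a TCP connection succeeds, False if `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A test session could call `wait_for_port()` once before creating any store and skip the ABSStore tests when it returns `False`.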

11 changes: 11 additions & 0 deletions appveyor.yml
@@ -2,13 +2,18 @@ branches:
only:
- master

# The VS C++ compiler path does not seem to be in the PATH environment variable of
# the Visual Studio 2017 build VM, which causes the pyosreplace package build to fail.
image: Visual Studio 2015

environment:

global:
# SDK v7.0 MSVC Express 2008's SetEnv.cmd script will fail if the
# /E:ON and /V:ON options are not enabled in the batch script interpreter
# See: http://stackoverflow.com/a/13751649/163740
CMD_IN_ENV: "cmd /E:ON /V:ON /C .\\build.cmd"
EMULATOR_LOC: C:\\Program Files (x86)\\Microsoft SDKs\\Azure\\Storage Emulator\\AzureStorageEmulator.exe

matrix:

@@ -36,5 +41,11 @@ install:

build: off

before_test:
- '"%EMULATOR_LOC%" start'

test_script:
- "%CMD_IN_ENV% python -m pytest -v --pyargs zarr"

after_test:
- '"%EMULATOR_LOC%" stop'
2 changes: 2 additions & 0 deletions docs/api/storage.rst
@@ -33,6 +33,8 @@ Storage (``zarr.storage``)
.. automethod:: invalidate_values
.. automethod:: invalidate_keys

.. autoclass:: ABSStore

.. autoclass:: ConsolidatedMetadataStore

.. autofunction:: init_array
3 changes: 3 additions & 0 deletions docs/release.rst
@@ -9,6 +9,9 @@ Release notes
Enhancements
~~~~~~~~~~~~

* New storage backend, backed by Azure Blob Storage, class :class:`zarr.storage.ABSStore`.
  All data is stored as block blobs. By :user:`Shikhar Goenka <shikharsg>` and :user:`Tim Crone <tjcrone>`, :issue:`345`.

* Add "consolidated" metadata as an experimental feature: use
:func:`zarr.convenience.consolidate_metadata` to copy all metadata from the various
metadata keys within a dataset hierarchy under a single key, and
15 changes: 15 additions & 0 deletions docs/tutorial.rst
@@ -787,6 +787,21 @@ Here is an example using S3Map to read an array created previously::
    >>> z[:].tostring()
    b'Hello from the cloud!'

Zarr now also has a built-in storage backend for Azure Blob Storage.
The class is :class:`zarr.storage.ABSStore` (requires
`azure-storage-blob <https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python>`_
to be installed)::

    >>> store = zarr.ABSStore(container='test', prefix='zarr-testing', blob_service_kwargs={'is_emulated': True})
    >>> root = zarr.group(store=store, overwrite=True)
    >>> z = root.zeros('foo/bar', shape=(1000, 1000), chunks=(100, 100), dtype='i4')
    >>> z[:] = 42

When using an actual storage account, provide the ``account_name`` and
``account_key`` arguments to :class:`zarr.storage.ABSStore`; the
example above only tests against the emulator. Please also note
that this is an experimental feature.
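
For a real account, the arguments could be assembled from the environment rather than hard-coded. The sketch below is illustrative only: the helper name `abs_store_kwargs` and the environment variable names `AZURE_STORAGE_ACCOUNT` and `AZURE_STORAGE_KEY` are assumptions made for this example, not something zarr reads itself. When no credentials are set, it falls back to the emulator settings used in the tutorial example.

```python
import os


def abs_store_kwargs(container, prefix, environ=None):
    """Build keyword arguments for zarr.ABSStore from environment variables."""
    environ = os.environ if environ is None else environ
    # Hypothetical variable names chosen for this sketch.
    name = environ.get("AZURE_STORAGE_ACCOUNT")
    key = environ.get("AZURE_STORAGE_KEY")
    kwargs = {"container": container, "prefix": prefix}
    if name and key:
        kwargs.update(account_name=name, account_key=key)
    else:
        # No credentials found: target the local storage emulator.
        kwargs["blob_service_kwargs"] = {"is_emulated": True}
    return kwargs


# Usage (requires zarr and azure-storage-blob to be installed):
# store = zarr.ABSStore(**abs_store_kwargs("test", "zarr-testing"))
```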

Note that retrieving data from a remote service via the network can be significantly
slower than retrieving data from a local file system, and will depend on network latency
and bandwidth between the client and server systems. If you are experiencing poor
1 change: 1 addition & 0 deletions requirements_test.txt
@@ -9,3 +9,4 @@ pytest-cov
s3fs
setuptools-scm
tox
azure-storage-blob
2 changes: 1 addition & 1 deletion zarr/__init__.py
@@ -8,7 +8,7 @@
                            ones_like, full_like, open_array, open_like, create)
 from zarr.storage import (DictStore, DirectoryStore, ZipStore, TempStore,
                           NestedDirectoryStore, DBMStore, LMDBStore, SQLiteStore,
-                          LRUStoreCache, RedisStore, MongoDBStore)
+                          LRUStoreCache, ABSStore, RedisStore, MongoDBStore)
 from zarr.hierarchy import group, open_group, Group
 from zarr.sync import ThreadSynchronizer, ProcessSynchronizer
 from zarr.codecs import *
147 changes: 147 additions & 0 deletions zarr/storage.py
@@ -1879,6 +1879,153 @@ def __delitem__(self, key):
self._invalidate_value(key)


class ABSStore(MutableMapping):
    """Storage class using Azure Blob Storage (ABS).

    Parameters
    ----------
    container : string
        The name of the ABS container to use.
    prefix : string
        Location of the "directory" to use as the root of the storage hierarchy
        within the container.
    account_name : string
        The Azure blob storage account name.
    account_key : string
        The Azure blob storage account access key.
    blob_service_kwargs : dictionary
        Extra arguments to be passed into the azure blob client, e.g. when
        using the emulator, pass in blob_service_kwargs={'is_emulated': True}.

    Notes
    -----
    In order to use this store, you must install the Microsoft Azure Storage SDK for Python.
    """

    def __init__(self, container, prefix, account_name=None, account_key=None,
                 blob_service_kwargs=None):
        from azure.storage.blob import BlockBlobService
        self.container = container
        self.prefix = normalize_storage_path(prefix)
        self.account_name = account_name
        self.account_key = account_key
        if blob_service_kwargs is not None:
            self.blob_service_kwargs = blob_service_kwargs
        else:  # pragma: no cover
            self.blob_service_kwargs = dict()
        self.client = BlockBlobService(self.account_name, self.account_key,
                                       **self.blob_service_kwargs)

    # needed for pickling
    def __getstate__(self):
        state = self.__dict__.copy()
        del state['client']
        return state

    def __setstate__(self, state):
        from azure.storage.blob import BlockBlobService
        self.__dict__.update(state)
        self.client = BlockBlobService(self.account_name, self.account_key,
                                       **self.blob_service_kwargs)

    @staticmethod
    def _append_path_to_prefix(path, prefix):
        return '/'.join([normalize_storage_path(prefix),
                         normalize_storage_path(path)])

    @staticmethod
    def _strip_prefix_from_path(path, prefix):
        # normalized things will not have any leading or trailing slashes
        path_norm = normalize_storage_path(path)
        prefix_norm = normalize_storage_path(prefix)
        return path_norm[(len(prefix_norm)+1):]

    def __getitem__(self, key):
        from azure.common import AzureMissingResourceHttpError
        blob_name = '/'.join([self.prefix, key])
        try:
            blob = self.client.get_blob_to_bytes(self.container, blob_name)
            return blob.content
        except AzureMissingResourceHttpError:
            raise KeyError('Blob %s not found' % blob_name)

    def __setitem__(self, key, value):
        value = ensure_bytes(value)
        blob_name = '/'.join([self.prefix, key])
        self.client.create_blob_from_bytes(self.container, blob_name, value)

    def __delitem__(self, key):
        from azure.common import AzureMissingResourceHttpError
        try:
            self.client.delete_blob(self.container, '/'.join([self.prefix, key]))
        except AzureMissingResourceHttpError:
            raise KeyError('Blob %s not found' % key)

    def __eq__(self, other):
        return (
            isinstance(other, ABSStore) and
            self.container == other.container and
            self.prefix == other.prefix
        )

    def keys(self):
        return list(self.__iter__())

    def __iter__(self):
        for blob in self.client.list_blobs(self.container, self.prefix + '/'):
            yield self._strip_prefix_from_path(blob.name, self.prefix)

    def __len__(self):
        return len(self.keys())

    def __contains__(self, key):
        blob_name = '/'.join([self.prefix, key])
        if self.client.exists(self.container, blob_name):
            return True
        else:
            return False

    def listdir(self, path=None):
        store_path = normalize_storage_path(path)
        # prefix is normalized to not have a trailing slash
        dir_path = self.prefix
        if store_path:
            dir_path = dir_path + '/' + store_path
        dir_path += '/'
        items = list()
        for blob in self.client.list_blobs(self.container, prefix=dir_path, delimiter='/'):
            if '/' in blob.name[len(dir_path):]:
                items.append(self._strip_prefix_from_path(
                    blob.name[:blob.name.find('/', len(dir_path))], dir_path))
            else:
                items.append(self._strip_prefix_from_path(blob.name, dir_path))
        return items

    def rmdir(self, path=None):
        dir_path = normalize_storage_path(self._append_path_to_prefix(path, self.prefix)) + '/'
        for blob in self.client.list_blobs(self.container, prefix=dir_path):
            self.client.delete_blob(self.container, blob.name)

    def getsize(self, path=None):
        store_path = normalize_storage_path(path)
        fs_path = self.prefix
        if store_path:
            fs_path = self._append_path_to_prefix(store_path, self.prefix)
        if self.client.exists(self.container, fs_path):
            return self.client.get_blob_properties(self.container,
                                                   fs_path).properties.content_length
        else:
            size = 0
            for blob in self.client.list_blobs(self.container, prefix=fs_path + '/',
                                               delimiter='/'):
                if '/' not in blob.name[len(fs_path + '/'):]:
                    size += blob.properties.content_length
            return size

    def clear(self):
        self.rmdir()
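
The `__getstate__`/`__setstate__` pair in `ABSStore` keeps the store picklable by dropping the live client and rebuilding it on load. A self-contained sketch of the same pattern follows; `FakeClient` and `PicklableStore` are hypothetical stand-ins (not zarr or Azure SDK classes) so that it runs without the Azure SDK:

```python
import pickle
import threading


class FakeClient:
    """Stand-in for a service client; it holds a lock, so it cannot be pickled."""

    def __init__(self, account_name, account_key):
        self.account_name = account_name
        self.account_key = account_key
        self._lock = threading.Lock()


class PicklableStore:
    def __init__(self, container, account_name=None, account_key=None):
        self.container = container
        self.account_name = account_name
        self.account_key = account_key
        self.client = FakeClient(account_name, account_key)

    def __getstate__(self):
        # Copy the attributes and leave out the unpicklable client.
        state = self.__dict__.copy()
        del state['client']
        return state

    def __setstate__(self, state):
        # Restore the plain attributes, then build a fresh client from them.
        self.__dict__.update(state)
        self.client = FakeClient(self.account_name, self.account_key)


store = PicklableStore('test', account_name='foo', account_key='bar')
clone = pickle.loads(pickle.dumps(store))
```

One consequence of this design is that `account_key` is part of the pickled state, so a pickled store probably deserves the same care as the key itself.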


class SQLiteStore(MutableMapping):
"""Storage class using SQLite.

27 changes: 25 additions & 2 deletions zarr/tests/test_core.py
@@ -12,11 +12,12 @@
 import numpy as np
 from numpy.testing import assert_array_equal, assert_array_almost_equal
 import pytest
+from azure.storage.blob import BlockBlobService


 from zarr.storage import (DirectoryStore, init_array, init_group, NestedDirectoryStore,
-                          DBMStore, LMDBStore, SQLiteStore, atexit_rmtree, atexit_rmglob,
-                          LRUStoreCache)
+                          DBMStore, LMDBStore, SQLiteStore, ABSStore, atexit_rmtree,
+                          atexit_rmglob, LRUStoreCache)
 from zarr.core import Array
 from zarr.errors import PermissionError
 from zarr.compat import PY2, text_type, binary_type, zip_longest
@@ -1322,6 +1323,28 @@ def test_nbytes_stored(self):
        assert expect_nbytes_stored == z.nbytes_stored


class TestArrayWithABSStore(TestArray):

    @staticmethod
    def absstore():
        blob_client = BlockBlobService(is_emulated=True)
        blob_client.delete_container('test')
        blob_client.create_container('test')
        store = ABSStore(container='test', prefix='zarrtesting/', account_name='foo',
                         account_key='bar', blob_service_kwargs={'is_emulated': True})
        store.rmdir()
        return store

    def create_array(self, read_only=False, **kwargs):
        store = self.absstore()
        kwargs.setdefault('compressor', Zlib(1))
        cache_metadata = kwargs.pop('cache_metadata', True)
        cache_attrs = kwargs.pop('cache_attrs', True)
        init_array(store, **kwargs)
        return Array(store, read_only=read_only, cache_metadata=cache_metadata,
                     cache_attrs=cache_attrs)


class TestArrayWithNestedDirectoryStore(TestArrayWithDirectoryStore):

    @staticmethod
16 changes: 15 additions & 1 deletion zarr/tests/test_hierarchy.py
@@ -13,12 +13,13 @@
 import numpy as np
 from numpy.testing import assert_array_equal
 import pytest
+from azure.storage.blob import BlockBlobService


 from zarr.storage import (DictStore, DirectoryStore, ZipStore, init_group, init_array,
                           array_meta_key, group_meta_key, atexit_rmtree,
                           NestedDirectoryStore, DBMStore, LMDBStore, SQLiteStore,
-                          atexit_rmglob, LRUStoreCache)
+                          ABSStore, atexit_rmglob, LRUStoreCache)
 from zarr.core import Array
 from zarr.compat import PY2, text_type
 from zarr.hierarchy import Group, group, open_group
@@ -864,6 +865,19 @@ def create_store():
        return store, None


class TestGroupWithABSStore(TestGroup):

    @staticmethod
    def create_store():
        blob_client = BlockBlobService(is_emulated=True)
        blob_client.delete_container('test')
        blob_client.create_container('test')
        store = ABSStore(container='test', prefix='zarrtesting/', account_name='foo',
                         account_key='bar', blob_service_kwargs={'is_emulated': True})
        store.rmdir()
        return store, None
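
The same emulator reset (delete the container, recreate it, build the store, clear the prefix) also appears in the other ABSStore test classes in this change. One way it could be shared is sketched below. `make_emulated_store` is a hypothetical helper, and the client and store classes are passed in so the snippet runs without the Azure SDK; in the test suite they would be `BlockBlobService` and `ABSStore`. The recording fakes exist only to exercise the helper here.

```python
def make_emulated_store(client_cls, store_cls, container='test', prefix='zarrtesting/'):
    """Reset the emulator container and return a fresh, empty store."""
    blob_client = client_cls(is_emulated=True)
    blob_client.delete_container(container)
    blob_client.create_container(container)
    store = store_cls(container=container, prefix=prefix, account_name='foo',
                      account_key='bar', blob_service_kwargs={'is_emulated': True})
    store.rmdir()
    return store


class RecordingClient:
    """Fake client that records the calls made to it."""
    calls = []

    def __init__(self, **kwargs):
        RecordingClient.calls.append(('init', kwargs))

    def delete_container(self, name):
        RecordingClient.calls.append(('delete', name))

    def create_container(self, name):
        RecordingClient.calls.append(('create', name))


class RecordingStore:
    """Fake store that remembers its arguments and whether it was cleared."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs
        self.cleared = False

    def rmdir(self):
        self.cleared = True


store = make_emulated_store(RecordingClient, RecordingStore)
```

With such a helper, each `create_store` could reduce to `return make_emulated_store(BlockBlobService, ABSStore)`, with `, None` appended for the group tests.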


class TestGroupWithNestedDirectoryStore(TestGroup):

    @staticmethod
17 changes: 15 additions & 2 deletions zarr/tests/test_storage.py
@@ -14,14 +15,15 @@
 import numpy as np
 from numpy.testing import assert_array_equal, assert_array_almost_equal
 import pytest
+from azure.storage.blob import BlockBlobService


 from zarr.storage import (init_array, array_meta_key, attrs_key, DictStore,
                           DirectoryStore, ZipStore, init_group, group_meta_key,
                           getsize, migrate_1to2, TempStore, atexit_rmtree,
                           NestedDirectoryStore, default_compressor, DBMStore,
-                          LMDBStore, SQLiteStore, MongoDBStore, RedisStore,
-                          atexit_rmglob, LRUStoreCache, ConsolidatedMetadataStore)
+                          LMDBStore, SQLiteStore, ABSStore, atexit_rmglob, LRUStoreCache,
+                          ConsolidatedMetadataStore, MongoDBStore, RedisStore)
 from zarr.meta import (decode_array_metadata, encode_array_metadata, ZARR_FORMAT,
                        decode_group_metadata, encode_group_metadata)
 from zarr.compat import PY2
@@ -1370,6 +1371,18 @@ def test_format_compatibility():
assert compressor.get_config() == z.compressor.get_config()


class TestABSStore(StoreTests, unittest.TestCase):

    def create_store(self):
        blob_client = BlockBlobService(is_emulated=True)
        blob_client.delete_container('test')
        blob_client.create_container('test')
        store = ABSStore(container='test', prefix='zarrtesting/', account_name='foo',
                         account_key='bar', blob_service_kwargs={'is_emulated': True})
        store.rmdir()
        return store
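
These store tests need a running emulator. To see the key layout without one, here is a rough in-memory sketch of how keys map to blob names under a prefix and how `listdir` reduces them to immediate children. `PrefixedDict` is a hypothetical teaching aid, not part of zarr, and it only approximates `ABSStore` (for instance, it uses `str.strip('/')` in place of `normalize_storage_path`).

```python
from collections.abc import MutableMapping


class PrefixedDict(MutableMapping):
    """Dict-backed mapping that stores every key under '<prefix>/'."""

    def __init__(self, prefix):
        self.prefix = prefix.strip('/')
        self.blobs = {}  # blob name -> bytes, standing in for a container

    def _blob_name(self, key):
        return '/'.join([self.prefix, key])

    def __getitem__(self, key):
        return self.blobs[self._blob_name(key)]

    def __setitem__(self, key, value):
        self.blobs[self._blob_name(key)] = bytes(value)

    def __delitem__(self, key):
        del self.blobs[self._blob_name(key)]

    def __iter__(self):
        # Strip '<prefix>/' so callers only ever see store-relative keys.
        start = len(self.prefix) + 1
        for name in self.blobs:
            yield name[start:]

    def __len__(self):
        return len(self.blobs)

    def listdir(self, path=''):
        # Keep only the first path segment below `path`, without duplicates.
        path = path.strip('/')
        base = path + '/' if path else ''
        children = []
        for key in self:
            if key.startswith(base):
                child = key[len(base):].split('/', 1)[0]
                if child not in children:
                    children.append(child)
        return children


store = PrefixedDict('zarrtesting/')
store['.zgroup'] = b'{}'
store['foo/bar/0.0'] = b'\x00'
store['foo/bar/0.1'] = b'\x01'
```

Here `store.blobs` holds names such as `zarrtesting/foo/bar/0.0`, while iteration and `listdir` work on the store-relative keys, which is the separation the prefix helpers in `ABSStore` provide.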


class TestConsolidatedMetadataStore(unittest.TestCase):

    def test_bad_format(self):