
Commit 2d331ba

Mutable mapping for Azure Blob (#345)
* basic implementation working
* docs and cleanup
* fixed client_kwargs bug
* Add ABSStore mutable mapping
* Fix import syntax
* Get open_zarr() working
* Change account variable names
* Fix client.exists() logging issue
* Minor comment changes
* Get to_zarr() working
* Remove state['container'] delete
* Implement rmdir
* Add docstring for ABSStore
* Remove GCSStore from this branch
* Fixed missing argument in getsize of ABStore
  Was missing self.container_name as an argument
* Specified prefix argument in rmdir for abstore
* Fixed join string error in dir_path in ABStore
  Join only accepts one argument, using os.path.join(x,y) formats the string as a valid file path for us.
* Remove logging work-around as the issue was fixed in azure-storage 1.3.0
* Clean up docstring
* Remove more GCSStore code
* Move utility functions into ABSStore class
* implemented the rest of the mutable mapping functions. tests pass with python 3.5
* using local blob emulator for storage.ABSStore testing
* fixed PY2 array.array error in storage.ABSStore
* create test container if not exists in ABSStore test
* added more tests for ABSStore
* reverted blob client creation to inside of ABSStore
* added group test for ABSStore
* emulator connection string not needed
* fixed import statement location and put azure-storage-blob in requirements
* fixed pickle tests
* fixed listdir in ABSStore
* fixed getsize
* Fixed PY2 pickle test. python 2 pickle can't pickle instance methods
* implemented the suggestion from here: zarr-developers/zarr-python#293 (comment)
* flake-8 fixes
* added azure-storage-blob
* first attempt at docker build with azurite
* azure storage emulator in appveyor
* syntax correction
* checking if emulator is preinstalled
* syntax fix
* syntax fix
* syntax fix
* removed wrong syntax
* storage emulator with docker
* trying different appveyor image
* flake 8 fixes
* full coverage
* verbose logs for pip install to see appveyor error
* trying to run locally installed emulator
* single-double quote yaml fix
* cmd prefix
* double quotes around exe file path
* double quotes within single quotes with environment variable substitution
* trying appveyor build with VS2015 image ;
* added comment and removed verbosity option for pip install
* list_abs_directory to list only directory blob using delimiter option in azure blob client
* fixed ABSStore docs
* fixed windows path listdir error
* ABSStore refactoring
* moved py2 array.array checking to numcodecs ensure bytes
* syntax fix
* flake8 fix
* fixed ABSStore parameter name container
* removed context manager from ABSStore
* ABSStore.__delitem__ now takes only 1 azure storage API call
* docs
* Update zarr/storage.py
  Co-Authored-By: shikharsg <[email protected]>
* removed global import of azure storage library
* added ABSStore to zarr root import
* added ABSStore to tutorial.rst
* fixed docs
* trying to fix tutorial.rst
* flake8 fix
* fixing tutorial.rst
* fixed ABSStore in tutorial
* docs
* small change to docs
* cleaned create blob code
* flake8 fix
* Update docs/release.rst
  Co-Authored-By: shikharsg <[email protected]>
* Apply suggestions from code review
  Co-Authored-By: shikharsg <[email protected]>
1 parent af9a548 commit 2d331ba

11 files changed, with 240 additions and 6 deletions

.travis.yml

Lines changed: 5 additions & 0 deletions
@@ -12,6 +12,7 @@ addons:
       - libdb-dev

 services:
+  - docker
   - redis-server
   - mongodb

@@ -24,6 +25,10 @@ matrix:
       dist: xenial
       sudo: true

+before_install:
+  - docker pull arafato/azurite
+  - mkdir ~/blob_emulator
+  - docker run -e executable=blob -d -t -p 10000:10000 -v ~/blob_emulator:/opt/azurite/folder arafato/azurite
 before_script:
   - mongo mydb_test --eval 'db.createUser({user:"travis",pwd:"test",roles:["readWrite"]});'

appveyor.yml

Lines changed: 11 additions & 0 deletions
@@ -2,13 +2,18 @@ branches:
   only:
     - master

+# The VS C++ compiler path does not seem to be in the PATH environment variable of
+# the Visual Studio 2017 build VM, which causes the pyosreplace package build to fail
+image: Visual Studio 2015
+
 environment:

   global:
     # SDK v7.0 MSVC Express 2008's SetEnv.cmd script will fail if the
     # /E:ON and /V:ON options are not enabled in the batch script interpreter
     # See: http://stackoverflow.com/a/13751649/163740
     CMD_IN_ENV: "cmd /E:ON /V:ON /C .\\build.cmd"
+    EMULATOR_LOC: C:\\Program Files (x86)\\Microsoft SDKs\\Azure\\Storage Emulator\\AzureStorageEmulator.exe

   matrix:

@@ -36,5 +41,11 @@ install:

 build: off

+before_test:
+  - '"%EMULATOR_LOC%" start'
+
 test_script:
   - "%CMD_IN_ENV% python -m pytest -v --pyargs zarr"
+
+after_test:
+  - '"%EMULATOR_LOC%" stop'

docs/api/storage.rst

Lines changed: 2 additions & 0 deletions
@@ -33,6 +33,8 @@ Storage (``zarr.storage``)
     .. automethod:: invalidate_values
     .. automethod:: invalidate_keys

+.. autoclass:: ABSStore
+
 .. autoclass:: ConsolidatedMetadataStore

 .. autofunction:: init_array

docs/release.rst

Lines changed: 3 additions & 0 deletions
@@ -9,6 +9,9 @@ Release notes
 Enhancements
 ~~~~~~~~~~~~

+* New storage backend, backed by Azure Blob Storage, class :class:`zarr.storage.ABSStore`.
+  All data is stored as block blobs. By :user:`Shikhar Goenka <shikharsg>` and :user:`Tim Crone <tjcrone>`, :issue:`345`.
+
 * Add "consolidated" metadata as an experimental feature: use
   :func:`zarr.convenience.consolidate_metadata` to copy all metadata from the various
   metadata keys within a dataset hierarchy under a single key, and

docs/tutorial.rst

Lines changed: 15 additions & 0 deletions
@@ -801,6 +801,21 @@ Here is an example using S3Map to read an array created previously::
     >>> z[:].tostring()
     b'Hello from the cloud!'

+Zarr now also has a built-in storage backend for Azure Blob Storage.
+The class is :class:`zarr.storage.ABSStore` (requires
+`azure-storage-blob <https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python>`_
+to be installed)::
+
+    >>> store = zarr.ABSStore(container='test', prefix='zarr-testing', blob_service_kwargs={'is_emulated': True})
+    >>> root = zarr.group(store=store, overwrite=True)
+    >>> z = root.zeros('foo/bar', shape=(1000, 1000), chunks=(100, 100), dtype='i4')
+    >>> z[:] = 42
+
+When using an actual storage account, provide ``account_name`` and
+``account_key`` arguments to :class:`zarr.storage.ABSStore`; the
+above client only tests against the emulator. Please also note
+that this is an experimental feature.
+
 Note that retrieving data from a remote service via the network can be significantly
 slower than retrieving data from a local file system, and will depend on network latency
 and bandwidth between the client and server systems. If you are experiencing poor

requirements_test.txt

Lines changed: 1 addition & 0 deletions
@@ -9,3 +9,4 @@ pytest-cov
 s3fs
 setuptools-scm
 tox
+azure-storage-blob

zarr/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
                            ones_like, full_like, open_array, open_like, create)
 from zarr.storage import (DictStore, DirectoryStore, ZipStore, TempStore,
                           NestedDirectoryStore, DBMStore, LMDBStore, SQLiteStore,
-                          LRUStoreCache, RedisStore, MongoDBStore)
+                          LRUStoreCache, ABSStore, RedisStore, MongoDBStore)
 from zarr.hierarchy import group, open_group, Group
 from zarr.sync import ThreadSynchronizer, ProcessSynchronizer
 from zarr.codecs import *

zarr/storage.py

Lines changed: 147 additions & 0 deletions
@@ -1879,6 +1879,153 @@ def __delitem__(self, key):
             self._invalidate_value(key)


+class ABSStore(MutableMapping):
+    """Storage class using Azure Blob Storage (ABS).
+
+    Parameters
+    ----------
+    container : string
+        The name of the ABS container to use.
+    prefix : string
+        Location of the "directory" to use as the root of the storage hierarchy
+        within the container.
+    account_name : string
+        The Azure blob storage account name.
+    account_key : string
+        The Azure blob storage account access key.
+    blob_service_kwargs : dictionary
+        Extra arguments to be passed into the azure blob client, for e.g. when
+        using the emulator, pass in blob_service_kwargs={'is_emulated': True}.
+
+    Notes
+    -----
+    In order to use this store, you must install the Microsoft Azure Storage SDK for Python.
+    """
+
+    def __init__(self, container, prefix, account_name=None, account_key=None,
+                 blob_service_kwargs=None):
+        from azure.storage.blob import BlockBlobService
+        self.container = container
+        self.prefix = normalize_storage_path(prefix)
+        self.account_name = account_name
+        self.account_key = account_key
+        if blob_service_kwargs is not None:
+            self.blob_service_kwargs = blob_service_kwargs
+        else:  # pragma: no cover
+            self.blob_service_kwargs = dict()
+        self.client = BlockBlobService(self.account_name, self.account_key,
+                                       **self.blob_service_kwargs)
+
+    # needed for pickling
+    def __getstate__(self):
+        state = self.__dict__.copy()
+        del state['client']
+        return state
+
+    def __setstate__(self, state):
+        from azure.storage.blob import BlockBlobService
+        self.__dict__.update(state)
+        self.client = BlockBlobService(self.account_name, self.account_key,
+                                       **self.blob_service_kwargs)
+
+    @staticmethod
+    def _append_path_to_prefix(path, prefix):
+        return '/'.join([normalize_storage_path(prefix),
+                         normalize_storage_path(path)])
+
+    @staticmethod
+    def _strip_prefix_from_path(path, prefix):
+        # normalized things will not have any leading or trailing slashes
+        path_norm = normalize_storage_path(path)
+        prefix_norm = normalize_storage_path(prefix)
+        return path_norm[(len(prefix_norm)+1):]
+
+    def __getitem__(self, key):
+        from azure.common import AzureMissingResourceHttpError
+        blob_name = '/'.join([self.prefix, key])
+        try:
+            blob = self.client.get_blob_to_bytes(self.container, blob_name)
+            return blob.content
+        except AzureMissingResourceHttpError:
+            raise KeyError('Blob %s not found' % blob_name)
+
+    def __setitem__(self, key, value):
+        value = ensure_bytes(value)
+        blob_name = '/'.join([self.prefix, key])
+        self.client.create_blob_from_bytes(self.container, blob_name, value)
+
+    def __delitem__(self, key):
+        from azure.common import AzureMissingResourceHttpError
+        try:
+            self.client.delete_blob(self.container, '/'.join([self.prefix, key]))
+        except AzureMissingResourceHttpError:
+            raise KeyError('Blob %s not found' % key)
+
+    def __eq__(self, other):
+        return (
+            isinstance(other, ABSStore) and
+            self.container == other.container and
+            self.prefix == other.prefix
+        )
+
+    def keys(self):
+        return list(self.__iter__())
+
+    def __iter__(self):
+        for blob in self.client.list_blobs(self.container, self.prefix + '/'):
+            yield self._strip_prefix_from_path(blob.name, self.prefix)
+
+    def __len__(self):
+        return len(self.keys())
+
+    def __contains__(self, key):
+        blob_name = '/'.join([self.prefix, key])
+        if self.client.exists(self.container, blob_name):
+            return True
+        else:
+            return False
+
+    def listdir(self, path=None):
+        store_path = normalize_storage_path(path)
+        # prefix is normalized to not have a trailing slash
+        dir_path = self.prefix
+        if store_path:
+            dir_path = dir_path + '/' + store_path
+        dir_path += '/'
+        items = list()
+        for blob in self.client.list_blobs(self.container, prefix=dir_path, delimiter='/'):
+            if '/' in blob.name[len(dir_path):]:
+                items.append(self._strip_prefix_from_path(
+                    blob.name[:blob.name.find('/', len(dir_path))], dir_path))
+            else:
+                items.append(self._strip_prefix_from_path(blob.name, dir_path))
+        return items
+
+    def rmdir(self, path=None):
+        dir_path = normalize_storage_path(self._append_path_to_prefix(path, self.prefix)) + '/'
+        for blob in self.client.list_blobs(self.container, prefix=dir_path):
+            self.client.delete_blob(self.container, blob.name)
+
+    def getsize(self, path=None):
+        store_path = normalize_storage_path(path)
+        fs_path = self.prefix
+        if store_path:
+            fs_path = self._append_path_to_prefix(store_path, self.prefix)
+        if self.client.exists(self.container, fs_path):
+            return self.client.get_blob_properties(self.container,
+                                                   fs_path).properties.content_length
+        else:
+            size = 0
+            for blob in self.client.list_blobs(self.container, prefix=fs_path + '/',
+                                               delimiter='/'):
+                if '/' not in blob.name[len(fs_path + '/'):]:
+                    size += blob.properties.content_length
+            return size
+
+    def clear(self):
+        self.rmdir()
+
+
 class SQLiteStore(MutableMapping):
     """Storage class using SQLite.
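To illustrate how ABSStore maps Zarr keys onto flat blob names, here is a minimal sketch that runs without Azure. It is not the code from this commit: `normalize` is a simplified, hypothetical stand-in for zarr's `normalize_storage_path`, and `listdir` derives the immediate children from a plain list of blob names instead of relying on the `delimiter` option of the Azure client.

```python
def normalize(path):
    """Simplified stand-in (assumption) for zarr's normalize_storage_path:
    drops leading/trailing slashes and empty segments."""
    if not path:
        return ''
    return '/'.join(seg for seg in str(path).replace('\\', '/').split('/') if seg)


def append_path_to_prefix(path, prefix):
    # Same idea as ABSStore._append_path_to_prefix: "<prefix>/<path>"
    return '/'.join([normalize(prefix), normalize(path)])


def strip_prefix_from_path(path, prefix):
    # The +1 skips the '/' that separates the prefix from the rest of the name
    return normalize(path)[len(normalize(prefix)) + 1:]


def listdir(blob_names, prefix, path=None):
    """Return the immediate children of `path`, given a flat list of blob names."""
    dir_path = normalize(prefix)
    store_path = normalize(path)
    if store_path:
        dir_path += '/' + store_path
    dir_path += '/'
    children = []
    for name in blob_names:
        if not name.startswith(dir_path):
            continue
        # Keep only the first segment below dir_path
        child = name[len(dir_path):].split('/', 1)[0]
        if child not in children:
            children.append(child)
    return children


# Hypothetical blob names, laid out the way a small Zarr hierarchy might be
blobs = [
    'zarrtesting/.zgroup',
    'zarrtesting/foo/.zgroup',
    'zarrtesting/foo/bar/.zarray',
    'zarrtesting/foo/bar/0.0',
]

print(append_path_to_prefix('foo/bar', 'zarrtesting/'))
print(strip_prefix_from_path('zarrtesting/foo/bar/0.0', 'zarrtesting'))
print(listdir(blobs, 'zarrtesting'))
print(listdir(blobs, 'zarrtesting', 'foo/bar'))
```

Because blob storage has no real directories, a "directory" exists only as a shared name prefix, which is why both listing and removal in the store work by prefix.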

zarr/tests/test_core.py

Lines changed: 25 additions & 2 deletions
@@ -13,11 +13,12 @@
 import numpy as np
 from numpy.testing import assert_array_equal, assert_array_almost_equal
 import pytest
+from azure.storage.blob import BlockBlobService


 from zarr.storage import (DirectoryStore, init_array, init_group, NestedDirectoryStore,
-                          DBMStore, LMDBStore, SQLiteStore, atexit_rmtree, atexit_rmglob,
-                          LRUStoreCache)
+                          DBMStore, LMDBStore, SQLiteStore, ABSStore, atexit_rmtree,
+                          atexit_rmglob, LRUStoreCache)
 from zarr.core import Array
 from zarr.errors import PermissionError
 from zarr.compat import PY2, text_type, binary_type, zip_longest
@@ -1408,6 +1409,28 @@ def test_nbytes_stored(self):
         assert expect_nbytes_stored == z.nbytes_stored


+class TestArrayWithABSStore(TestArray):
+
+    @staticmethod
+    def absstore():
+        blob_client = BlockBlobService(is_emulated=True)
+        blob_client.delete_container('test')
+        blob_client.create_container('test')
+        store = ABSStore(container='test', prefix='zarrtesting/', account_name='foo',
+                         account_key='bar', blob_service_kwargs={'is_emulated': True})
+        store.rmdir()
+        return store
+
+    def create_array(self, read_only=False, **kwargs):
+        store = self.absstore()
+        kwargs.setdefault('compressor', Zlib(1))
+        cache_metadata = kwargs.pop('cache_metadata', True)
+        cache_attrs = kwargs.pop('cache_attrs', True)
+        init_array(store, **kwargs)
+        return Array(store, read_only=read_only, cache_metadata=cache_metadata,
+                     cache_attrs=cache_attrs)
+
+
 class TestArrayWithNestedDirectoryStore(TestArrayWithDirectoryStore):

     @staticmethod

zarr/tests/test_hierarchy.py

Lines changed: 15 additions & 1 deletion
@@ -13,12 +13,13 @@
 import numpy as np
 from numpy.testing import assert_array_equal
 import pytest
+from azure.storage.blob import BlockBlobService


 from zarr.storage import (DictStore, DirectoryStore, ZipStore, init_group, init_array,
                           array_meta_key, group_meta_key, atexit_rmtree,
                           NestedDirectoryStore, DBMStore, LMDBStore, SQLiteStore,
-                          atexit_rmglob, LRUStoreCache)
+                          ABSStore, atexit_rmglob, LRUStoreCache)
 from zarr.core import Array
 from zarr.compat import PY2, text_type
 from zarr.hierarchy import Group, group, open_group
@@ -864,6 +865,19 @@ def create_store():
         return store, None


+class TestGroupWithABSStore(TestGroup):
+
+    @staticmethod
+    def create_store():
+        blob_client = BlockBlobService(is_emulated=True)
+        blob_client.delete_container('test')
+        blob_client.create_container('test')
+        store = ABSStore(container='test', prefix='zarrtesting/', account_name='foo',
+                         account_key='bar', blob_service_kwargs={'is_emulated': True})
+        store.rmdir()
+        return store, None
+
+
 class TestGroupWithNestedDirectoryStore(TestGroup):

     @staticmethod

zarr/tests/test_storage.py

Lines changed: 15 additions & 2 deletions
@@ -15,14 +15,15 @@
 import numpy as np
 from numpy.testing import assert_array_equal, assert_array_almost_equal
 import pytest
+from azure.storage.blob import BlockBlobService


 from zarr.storage import (init_array, array_meta_key, attrs_key, DictStore,
                           DirectoryStore, ZipStore, init_group, group_meta_key,
                           getsize, migrate_1to2, TempStore, atexit_rmtree,
                           NestedDirectoryStore, default_compressor, DBMStore,
-                          LMDBStore, SQLiteStore, MongoDBStore, RedisStore,
-                          atexit_rmglob, LRUStoreCache, ConsolidatedMetadataStore)
+                          LMDBStore, SQLiteStore, ABSStore, atexit_rmglob, LRUStoreCache,
+                          ConsolidatedMetadataStore, MongoDBStore, RedisStore)
 from zarr.meta import (decode_array_metadata, encode_array_metadata, ZARR_FORMAT,
                        decode_group_metadata, encode_group_metadata)
 from zarr.compat import PY2
@@ -1513,6 +1514,18 @@ def test_format_compatibility():
                 assert compressor.get_config() == z.compressor.get_config()


+class TestABSStore(StoreTests, unittest.TestCase):
+
+    def create_store(self):
+        blob_client = BlockBlobService(is_emulated=True)
+        blob_client.delete_container('test')
+        blob_client.create_container('test')
+        store = ABSStore(container='test', prefix='zarrtesting/', account_name='foo',
+                         account_key='bar', blob_service_kwargs={'is_emulated': True})
+        store.rmdir()
+        return store
+
+
 class TestConsolidatedMetadataStore(unittest.TestCase):

     def test_bad_format(self):
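The commit message mentions several pickle fixes, and ABSStore supports pickling by dropping its client in `__getstate__` and rebuilding it in `__setstate__`. The sketch below shows that pattern in isolation, under stated assumptions: `FakeClient` and `PicklableStore` are hypothetical names, and the `threading.Lock` only serves to make the client unpicklable, the way a real network client might be.

```python
import pickle
import threading


class FakeClient:
    """Hypothetical stand-in for a service client that cannot be pickled."""

    def __init__(self, account_name, account_key, **kwargs):
        self.account_name = account_name
        self.account_key = account_key
        self.kwargs = kwargs
        self._lock = threading.Lock()  # locks cannot be pickled


class PicklableStore:
    """Keeps only plain configuration in its pickled state."""

    def __init__(self, container, prefix, account_name=None, account_key=None,
                 client_kwargs=None):
        self.container = container
        self.prefix = prefix
        self.account_name = account_name
        self.account_key = account_key
        self.client_kwargs = client_kwargs if client_kwargs is not None else {}
        self.client = FakeClient(account_name, account_key, **self.client_kwargs)

    def __getstate__(self):
        # Copy the attributes and leave the client out of the pickled state
        state = self.__dict__.copy()
        del state['client']
        return state

    def __setstate__(self, state):
        # Restore the configuration, then build a fresh client from it
        self.__dict__.update(state)
        self.client = FakeClient(self.account_name, self.account_key,
                                 **self.client_kwargs)


store = PicklableStore('test', 'zarrtesting', 'foo', 'bar', {'is_emulated': True})
clone = pickle.loads(pickle.dumps(store))
print(clone.container, clone.prefix, clone.client is not store.client)
```

Storing the constructor arguments on the instance is what makes this work: everything needed to recreate the client survives the round trip, while the client itself does not have to.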
