Merged
79 commits
bcb21a1
Add model-based implementation
itrushkin Sep 10, 2021
63e1da6
Shared implementation for both APIs
itrushkin Oct 4, 2021
4d3a014
Move FedCurv logic in single entry point
itrushkin Oct 4, 2021
e45aeaa
Merge branch 'develop' into fedcurv
itrushkin Oct 4, 2021
403247a
Update implementation
itrushkin Oct 5, 2021
2509d95
Fix linter
itrushkin Oct 8, 2021
01c1fc6
Merge branch 'develop' into fedcurv
itrushkin Oct 8, 2021
c6e7be9
Add FedCurv tutorial
itrushkin Oct 15, 2021
fdfed5e
Merge branch 'develop' into fedcurv
itrushkin Oct 15, 2021
c1fe0b0
Fix linter
itrushkin Oct 15, 2021
366e9ee
Automate envoys population
itrushkin Oct 18, 2021
cc5657d
Update envoy folders generation
itrushkin Oct 22, 2021
a67a148
Allow multiple devices
itrushkin Oct 28, 2021
c4af719
Allow multiple devices
itrushkin Oct 28, 2021
0225fb1
Debugging
itrushkin Oct 28, 2021
7874bb9
Cast local FedCurv parameters to model device
itrushkin Oct 28, 2021
8699153
Unify devices in FIM calculation
itrushkin Oct 29, 2021
2486df6
Change loss to NLL
itrushkin Jan 12, 2022
637c03a
Sync devices for tasks
itrushkin Jan 18, 2022
57cde85
Clear notebook
itrushkin Jan 20, 2022
43b4f2f
Merge branch 'develop' into fedcurv
itrushkin Jan 20, 2022
316be87
Fix envoy config structure
itrushkin Jan 24, 2022
7572681
Adjust data splitter
itrushkin Feb 11, 2022
cc89231
Fix aggregation function decorator
itrushkin Feb 11, 2022
197d86b
Remove debug lines
itrushkin Feb 11, 2022
393c1a3
Update shard descriptor
itrushkin Feb 11, 2022
f496832
Fix linter
itrushkin Feb 11, 2022
2b0fca1
tmp
itrushkin Feb 15, 2022
7d366b7
Merge branch 'develop' into fedcurv
itrushkin Feb 15, 2022
cfd4483
Shuffle data in lognormal split
itrushkin Feb 17, 2022
5c51e86
Add debugging information in data_splitter
itrushkin Feb 17, 2022
a8a7c04
Remove shuffling
itrushkin Feb 17, 2022
dc2c0ee
Fix linter
itrushkin Feb 18, 2022
1443256
Add debug information
itrushkin Feb 18, 2022
9a70a54
Avoid double loss backward
itrushkin Feb 18, 2022
352862a
Add file debugging
itrushkin Feb 21, 2022
edefe7f
Detach variables from first backward pass
itrushkin Feb 21, 2022
606cd46
Detach PyTorch graph variables
itrushkin Feb 21, 2022
221aba6
Retain PyTorch graph
itrushkin Feb 22, 2022
3be3c63
Introduce separate PyTorch graph variables for FedCurv
itrushkin Feb 22, 2022
d9986ff
Use data prop of gradients
itrushkin Feb 22, 2022
0c2023d
Revert backward call
itrushkin Feb 22, 2022
e14c289
fix
itrushkin Feb 22, 2022
f3291f7
Detach FedCurv variables
itrushkin Feb 22, 2022
f49bb78
Extend logging
itrushkin Feb 22, 2022
ec0266f
Extend logs
itrushkin Feb 22, 2022
dc8b9ad
Specify file mode
itrushkin Feb 22, 2022
1e02179
Debug changes
itrushkin Feb 24, 2022
9d16ae9
Debug changes
itrushkin Feb 24, 2022
6bf55bd
Debug changes
itrushkin Feb 24, 2022
d2f3679
Debug changes
itrushkin Feb 24, 2022
765700d
Change saving mechanism
itrushkin Feb 25, 2022
080359c
Detach FedCurv variables from PyTorch graph
itrushkin Feb 25, 2022
6f2923f
Detach global variables
itrushkin Feb 25, 2022
6b96fc5
Add const term to penalty function
itrushkin Feb 28, 2022
f1f81af
Fix linter
itrushkin Feb 28, 2022
5239bcb
Aggregate constant term by summation
itrushkin Feb 28, 2022
949c7c4
Initialize constant buffer
itrushkin Feb 28, 2022
78101a1
Remove TinyImageNetFedCurv workspace
itrushkin Feb 28, 2022
c65a7c8
Restore PyTorch Histology workspace
itrushkin Feb 28, 2022
7c866c3
Add FedCurv Histology tutorial
itrushkin Feb 28, 2022
25f5f9b
Remove debugging lines
itrushkin Feb 28, 2022
a40fb05
Fix linter
itrushkin Feb 28, 2022
4a8145a
Remove CUDA device specification
itrushkin Feb 28, 2022
6944d8e
Add readme
itrushkin Feb 28, 2022
85b65a4
Revert UNet and TinyImageNet workspaces
itrushkin Feb 28, 2022
326b6a5
Add license header
itrushkin Mar 1, 2022
80ed4cb
Merge branch 'develop' into fedcurv
itrushkin Mar 1, 2022
fb245a4
Move aggregation function to aggregation_functions module
itrushkin Mar 1, 2022
e3c230d
Fix import module path
itrushkin Mar 1, 2022
e4ee0de
Merge branch 'develop' into fedcurv
itrushkin Mar 1, 2022
d4f0ce7
Fix merge
itrushkin Mar 1, 2022
25ab073
Apply suggestion from review
itrushkin Mar 1, 2022
4390624
Apply suggestion from review
itrushkin Mar 1, 2022
ca6c798
Apply suggestion to review
itrushkin Mar 1, 2022
3ffaa5b
Change os.walk to Path.glob
itrushkin Mar 1, 2022
107ad22
Parametrize director address
itrushkin Mar 1, 2022
3f7072d
Update readme
itrushkin Mar 1, 2022
0587b14
Create separate Python environment for each envoy
itrushkin Mar 1, 2022
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
#!/bin/bash
set -e

fx envoy start -n env_one --disable-tls --envoy-config-path envoy_config.yaml -dh localhost -dp 50051
fx envoy start -n env_one --disable-tls --envoy-config-path envoy_config.yaml -dh localhost -dp 50051
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
# PyTorch tutorial for FedCurv Federated Learning method on Histology dataset

To demonstrate results on a non-IID data distribution, this tutorial includes a shard descriptor with a custom data splitter that splits the data log-normally. The federation consists of 8 envoys.
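
To give a feel for how skewed a log-normal split can be, here is a minimal sketch that draws one log-normal weight per envoy and turns the weights into shard sizes. It is not the algorithm used by OpenFL's `LogNormalNumPyDataSplitter` (which operates on class labels); the helper name `lognormal_shard_sizes` and the total of 4000 samples are illustrative assumptions, while `mu=0` and `sigma=2` mirror the values passed in the shard descriptor.

```python
import random


def lognormal_shard_sizes(total, n_shards, mu=0.0, sigma=2.0, seed=0):
    """Split `total` samples into `n_shards` sizes proportional to log-normal draws.

    Hypothetical helper for illustration only; not part of OpenFL.
    """
    rng = random.Random(seed)
    # One positive weight per shard, drawn from a log-normal distribution.
    weights = [rng.lognormvariate(mu, sigma) for _ in range(n_shards)]
    scale = total / sum(weights)
    # Flooring keeps the running sum at or below `total`.
    sizes = [int(w * scale) for w in weights]
    # Give the rounding remainder to the largest shard so sizes sum to `total`.
    sizes[sizes.index(max(sizes))] += total - sum(sizes)
    return sizes


sizes = lognormal_shard_sizes(total=4000, n_shards=8)
print(sizes)
```

With `sigma=2` a few envoys typically receive most of the samples, which is the kind of imbalance the tutorial is meant to exercise.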

Your Python environment must have OpenFL installed.

1. Run the Director instance:
```
cd director
bash start_director.sh
```

2. In a separate terminal, execute:
```
cd envoy
bash populate_envoys.sh # This creates all envoy folders in the current directory
bash start_envoys.sh # This launches all envoy instances
```

3. In a separate terminal, launch Jupyter Lab:
```
cd workspace
jupyter lab
```

4. Open your browser at the port Jupyter Lab reports, then open `pytorch_histology.ipynb` from the Jupyter web interface.

5. Execute all cells in order.
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
settings:
  listen_host: localhost
  listen_port: 50051
  sample_shape: ['150', '150']
  target_shape: ['1']
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
#!/bin/bash
set -e

fx director start --disable-tls -c director_config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
#!/bin/bash
set -e
FQDN=$1
fx director start -c director_config.yaml -rc cert/root_ca.crt -pk cert/"${FQDN}".key -oc cert/"${FQDN}".crt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
histology_data
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
params:
  cuda_devices: []

optional_plugin_components: {}

shard_descriptor:
  template: histology_shard_descriptor.HistologyShardDescriptor
  params:
    data_folder: histology_data
    rank_worldsize: 1,8
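
The `rank_worldsize` value is a single comma-separated string that the shard descriptor splits into two integers. The sketch below shows that parsing step in isolation; the function name `parse_rank_worldsize` and the range check are assumptions added for illustration and are not present in the descriptor itself.

```python
def parse_rank_worldsize(value: str):
    """Parse a 'rank,worldsize' string such as '1,8' into two ints.

    Hypothetical helper; the range check is an added assumption.
    """
    rank, worldsize = (int(num) for num in value.split(','))
    if not 1 <= rank <= worldsize:
        raise ValueError(f'rank {rank} is outside 1..{worldsize}')
    return rank, worldsize


print(parse_rank_worldsize('1,8'))  # (1, 8)
```

Ranks are 1-based here, which matches the `rank - 1` indexing used when the descriptor selects its shard.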
Original file line number Diff line number Diff line change
@@ -0,0 +1,138 @@
# Copyright (C) 2020-2021 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

"""Histology Shard Descriptor."""

import logging
import os
from pathlib import Path
from typing import Tuple
from urllib.request import urlretrieve
from zipfile import ZipFile

import numpy as np
from PIL import Image

from openfl.interface.interactive_api.shard_descriptor import ShardDataset
from openfl.interface.interactive_api.shard_descriptor import ShardDescriptor
from openfl.utilities import tqdm_report_hook
from openfl.utilities import validate_file_hash
from openfl.utilities.data_splitters.numpy import LogNormalNumPyDataSplitter


logger = logging.getLogger(__name__)


class HistologyShardDataset(ShardDataset):
    """Histology shard dataset class."""

    TRAIN_SPLIT_RATIO = 0.8

    def __init__(self, data_folder: Path, data_type='train', rank=1, worldsize=1):
        """Histology shard dataset class."""
        self.data_type = data_type
        root = Path(data_folder) / 'Kather_texture_2016_image_tiles_5000'
        # Sort class names so every envoy maps them to the same indices,
        # regardless of the order in which the filesystem lists directories.
        classes = sorted(d.name for d in root.iterdir() if d.is_dir())
        class_to_idx = {cls_name: i for i, cls_name in enumerate(classes)}
        self.samples = []
        root = root.absolute()
        for target_class in sorted(class_to_idx.keys()):
            class_index = class_to_idx[target_class]
            target_dir = root / target_class
            for path in sorted(target_dir.glob('*')):
                item = path, class_index
                self.samples.append(item)
        # Fixed seed: all envoys shuffle identically before the train/test split.
        np.random.seed(0)
        np.random.shuffle(self.samples)
        idx_range = list(range(len(self.samples)))
        idx_sep = int(len(idx_range) * HistologyShardDataset.TRAIN_SPLIT_RATIO)
        train_idx, test_idx = np.split(idx_range, [idx_sep])
        data_splitter = LogNormalNumPyDataSplitter(
            mu=0,
            sigma=2,
            num_classes=8,
            classes_per_col=2,
            min_samples_per_class=5)
        if data_type == 'train':
            labels = np.array(self.samples)[train_idx][:, 1].astype(int)
            # train_idx starts at 0, so positions within `labels` are also
            # positions within self.samples.
            self.idx = data_splitter.split(labels, worldsize)[rank - 1]
        else:
            labels = np.array(self.samples)[test_idx][:, 1].astype(int)
            # split() returns positions within `labels` (the test slice), so map
            # them back through test_idx to index self.samples correctly.
            local_idx = data_splitter.split(labels, worldsize)[rank - 1]
            self.idx = test_idx[np.asarray(local_idx, dtype=int)]

    def __len__(self) -> int:
        """Return the len of the shard dataset."""
        return len(self.idx)

    def load_pil(self, path):
        """Load image."""
        with open(path, 'rb') as f:
            img = Image.open(f)
            return img.convert('RGB')

    def __getitem__(self, index: int) -> Tuple['Image', int]:
        """Return an item by the index."""
        path, target = self.samples[self.idx[index]]
        sample = self.load_pil(path)
        return sample, target


class HistologyShardDescriptor(ShardDescriptor):
    """Shard descriptor class."""

    URL = ('https://zenodo.org/record/53169/files/Kather_'
           'texture_2016_image_tiles_5000.zip?download=1')
    FILENAME = 'Kather_texture_2016_image_tiles_5000.zip'
    ZIP_SHA384 = ('7d86abe1d04e68b77c055820c2a4c582a1d25d2983e38ab724e'
                  'ac75affce8b7cb2cbf5ba68848dcfd9d84005d87d6790')
    DEFAULT_PATH = Path('.') / 'data'

    def __init__(
            self,
            data_folder: Path = DEFAULT_PATH,
            rank_worldsize: str = '1,1',
            **kwargs
    ):
        """Initialize HistologyShardDescriptor."""
        self.data_folder = Path.cwd() / data_folder
        self.download_data()
        self.rank, self.worldsize = tuple(int(num) for num in rank_worldsize.split(','))

    def download_data(self):
        """Download prepared shard dataset."""
        os.makedirs(self.data_folder, exist_ok=True)
        filepath = self.data_folder / HistologyShardDescriptor.FILENAME
        if not filepath.exists():
            reporthook = tqdm_report_hook()
            urlretrieve(HistologyShardDescriptor.URL, filepath, reporthook)  # nosec
            validate_file_hash(filepath, HistologyShardDescriptor.ZIP_SHA384)
            with ZipFile(filepath, 'r') as f:
                f.extractall(self.data_folder)

    def get_dataset(self, dataset_type):
        """Return a shard dataset by type."""
        return HistologyShardDataset(
            data_folder=self.data_folder,
            data_type=dataset_type,
            rank=self.rank,
            worldsize=self.worldsize
        )

    @property
    def sample_shape(self):
        """Return the sample shape info."""
        shape = self.get_dataset('train')[0][0].size
        return [str(dim) for dim in shape]

    @property
    def target_shape(self):
        """Return the target shape info."""
        target = self.get_dataset('train')[0][1]
        shape = np.array([target]).shape
        return [str(dim) for dim in shape]

    @property
    def dataset_description(self) -> str:
        """Return the shard dataset description."""
        return (f'Histology dataset, shard number {self.rank}'
                f' out of {self.worldsize}')
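
`download_data` checks the downloaded archive against `ZIP_SHA384` before extracting it. The following sketch shows what such a check might look like using only the standard library. It is not OpenFL's `validate_file_hash` implementation; `sha384_of` and `check_file_hash` are hypothetical names, and the sample file contents are made up for the demonstration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha384_of(path, chunk_size=1 << 20):
    """Return the hex SHA-384 digest of a file, reading it in chunks."""
    digest = hashlib.sha384()
    with open(path, 'rb') as f:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def check_file_hash(path, expected):
    """Raise ValueError if the file's SHA-384 digest differs from `expected`."""
    actual = sha384_of(path)
    if actual != expected:
        raise ValueError(f'Hash mismatch for {path}: got {actual}')


# Demonstration on a throwaway file (contents are arbitrary).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'sample.bin'
    sample.write_bytes(b'histology')
    expected = hashlib.sha384(b'histology').hexdigest()
    check_file_hash(sample, expected)  # passes silently when the digests match
```

Verifying the digest before calling `extractall` means a corrupted or tampered download is rejected before any of its contents touch the data folder.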
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
#!/bin/bash
DIRECTOR_HOST=${1:-'localhost'}
DIRECTOR_PORT=${2:-'50051'}
PYTHON=${3:-'python3.8'}

for i in {1..8}
do
    mkdir $i
    cd $i
    echo "shard_descriptor:
  template: histology_shard_descriptor.HistologyShardDescriptor
  params:
    data_folder: histology_data
    rank_worldsize: $i,8
" > envoy_config.yaml

    eval ${PYTHON} '-m venv venv'
    echo "source venv/bin/activate
pip install ../../../../.. # install OpenFL
pip install -r requirements.txt
fx envoy start -n env_$i --disable-tls --envoy-config-path envoy_config.yaml -dh ${DIRECTOR_HOST} -dp ${DIRECTOR_PORT}
" > start_envoy.sh
    cp ../requirements.txt .
    cp ../histology_shard_descriptor.py .
    cd ..
done
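
For readers who prefer to see the per-envoy config generation outside of shell quoting, the loop above can be sketched in Python. This is an illustrative equivalent of the config-writing step only (no virtual environments or start scripts); `write_envoy_configs` and `CONFIG_TEMPLATE` are hypothetical names, and the demonstration writes into a temporary directory rather than the tutorial's `envoy` folder.

```python
import tempfile
from pathlib import Path

# Same YAML structure that populate_envoys.sh echoes into envoy_config.yaml.
CONFIG_TEMPLATE = """\
shard_descriptor:
  template: histology_shard_descriptor.HistologyShardDescriptor
  params:
    data_folder: histology_data
    rank_worldsize: {rank},{worldsize}
"""


def write_envoy_configs(base_dir, worldsize=8):
    """Create one numbered folder per envoy, each with its own envoy_config.yaml."""
    paths = []
    for rank in range(1, worldsize + 1):
        envoy_dir = Path(base_dir) / str(rank)
        envoy_dir.mkdir(parents=True, exist_ok=True)
        config_path = envoy_dir / 'envoy_config.yaml'
        config_path.write_text(CONFIG_TEMPLATE.format(rank=rank, worldsize=worldsize))
        paths.append(config_path)
    return paths


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    created = write_envoy_configs(tmp)
    n_created = len(created)
    last_config = created[-1].read_text()
print(n_created)
```

The only value that differs between envoys is the rank in `rank_worldsize`, which is what assigns each envoy its own shard of the log-normal split.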
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
Pillow==8.3.2
tqdm==4.48.2
numpy==1.19.1
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
#!/bin/bash
set -e

fx envoy start -n env_one --disable-tls --envoy-config-path envoy_config.yaml -dh localhost -dp 50051
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
#!/bin/bash
set -e
ENVOY_NAME=$1
DIRECTOR_FQDN=$2

fx envoy start -n "$ENVOY_NAME" --envoy-config-path envoy_config.yaml -dh "$DIRECTOR_FQDN" -dp 50051 -rc cert/root_ca.crt -pk cert/"$ENVOY_NAME".key -oc cert/"$ENVOY_NAME".crt
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
#!/bin/bash
set -e

cd 1 && bash start_envoy.sh &
cd 2 && bash start_envoy.sh &
cd 3 && bash start_envoy.sh &
cd 4 && bash start_envoy.sh &
cd 5 && bash start_envoy.sh &
cd 6 && bash start_envoy.sh &
cd 7 && bash start_envoy.sh &
cd 8 && bash start_envoy.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
logs