Gramine + OpenFL #339
In the README, step 4 should be fx workspace graminize and not dockerize.
fixed
```
export KEY_LOCATION=.

openssl genrsa -3 -out $KEY_LOCATION/key.pem 3072
```
This week the gramine project added a function to generate the signing key with the cryptography package: gramineproject/gramine@3def085
We should reuse/copy this functionality to avoid adding OpenSSL as another dependency
I've just checked and this command is not in the package available via apt yet
```
# there is an issue for libprotobuf-c in gramine repo, install from apt for now

# graminelibos is under this dir
ENV PYTHONPATH=/usr/local/lib/python3.8/site-packages/:/usr/lib/python3/dist-packages/:
```
This path might be different for different users. Maybe we could add a note for users to know that we are using this as PYTHONPATH.
This path is actually fixed, as it is inside the container (or image 🤷♂️)
Approved
OpenFL + Gramine
This manual will help you run OpenFL with the Aggregator-based workflow inside an SGX enclave using Gramine.
Prerequisites
Building machine:
Machines that will run an Aggregator and Collaborator containers should have the following:
/dev/sgx_enclave
/var/run/aesmd/aesm.socket
This is a short list; see more in the Gramine docs.
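As a quick sanity check before starting containers, the two paths listed above can be tested from a shell. This is a minimal sketch: the function name check_sgx_prereqs is hypothetical and not part of OpenFL or Gramine, and the optional path arguments exist only to make the check easy to try out on a machine without SGX.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OpenFL or Gramine): reports whether the
# SGX device node and the AESM socket are visible on this machine.
check_sgx_prereqs() {
    local device="${1:-/dev/sgx_enclave}"
    local socket="${2:-/var/run/aesmd/aesm.socket}"
    local missing=0
    local path

    for path in "$device" "$socket"; do
        if [ -e "$path" ]; then
            echo "found:   $path"
        else
            echo "missing: $path"
            missing=1
        fi
    done
    # Exit status 0 means both paths exist, 1 means at least one is missing.
    return "$missing"
}
```

Calling check_sgx_prereqs with no arguments checks the default paths. A passing check only shows that the paths exist; it does not prove that the driver and the AESM service are configured correctly.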
Workflow
The user will mainly interact with the OpenFL CLI, the Docker CLI, and other command-line tools. The user is also expected to modify the plan.yaml file and the Python code under the workspace/src folder to set up an FL Experiment.
On the building machine (Data Scientist's node):
Modify the code and the plan.yaml, set up your training procedure.
Pay attention to the following:
Default workspaces (templates) in OpenFL differ in their data downloading procedures. Workspaces with data loading flow that do not require changes to run with Gramine include:
Find out the FQDN of the aggregator machine and use it for plan initialization.
For example, on a Unix-like OS try the following command:
(If this FQDN does not work for your federation, try using the machine's IP address instead.)
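One possible way to look up the name, as a hedged sketch to run on the aggregator machine: it assumes that hostname -f is available and falls back to uname -n, which may return only a short host name. The function name get_agg_fqdn is hypothetical.

```shell
#!/usr/bin/env bash
# Print this machine's fully qualified domain name.
# `hostname -f` can fail when the name cannot be resolved (or when the
# tool is not installed), so fall back to the node name from `uname -n`.
get_agg_fqdn() {
    hostname -f 2>/dev/null || uname -n
}

AGG_FQDN="$(get_agg_fqdn)"
echo "AGG_FQDN=${AGG_FQDN}"
```

If the printed name is not resolvable from the collaborator machines, the IP address is the safer choice.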
Then pass the result as the AGG_FQDN parameter to:
The signing key will be used to calculate hashes of trusted files. If you plan to test the application without SGX (gramine-direct), you do not need a signer key.
This key will not be packed into the final Docker image.
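The key generation step can be wrapped in a small guard, sketched below. It reuses the openssl genrsa call quoted in the review thread (3072-bit RSA with public exponent 3, which is what SGX enclave signing requires). The function name generate_signing_key and the overwrite check are assumptions added for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the `openssl genrsa` call used for the
# enclave signing key: 3072-bit RSA with public exponent 3.
generate_signing_key() {
    local key_location="${1:-.}"
    local key_path="${key_location}/key.pem"

    if ! command -v openssl >/dev/null 2>&1; then
        echo "openssl not found; cannot generate a signing key" >&2
        return 1
    fi
    if [ -e "$key_path" ]; then
        echo "refusing to overwrite existing key: $key_path" >&2
        return 1
    fi
    # The subshell keeps the restrictive umask local to the key file.
    (umask 077 && openssl genrsa -3 -out "$key_path" 3072 2>/dev/null)
}
```

For example, generate_signing_key "$KEY_LOCATION" writes key.pem into that directory and fails if a key is already there.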
This command will build and save a Docker image with your Experiment. The saved image will contain all the required files to start a process in an enclave.
If a signing key is not provided, the application will be built without SGX support, but it can still be started with the gramine-direct executable.
Image distribution:
The data scientist (builder) must now transfer the Docker image to the aggregator and collaborator machines. The Aggregator will also need the initial model weights.
If there is a connection between machines, you may use scp. In other cases, use the transfer channel that suits your situation.
Send files to the aggregator machine:
Send the image archive to collaborator machines:
Please keep in mind that if you run a test Federation with data downloaded from the internet, you should also transfer or download the data to the collaborator machines.
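The scp-based distribution described above could be scripted roughly as follows. This is only a sketch: the function name distribute_image, the DRY_RUN switch, and every file and host name passed to it are hypothetical, and files are copied to the remote user's home directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the image archive (and, for the aggregator,
# the initial weights) to the running machines with scp.
# Set DRY_RUN=1 to print the scp commands instead of executing them.
distribute_image() {
    local image_archive="$1" init_weights="$2" agg_host="$3"
    shift 3
    local run=()
    if [ -n "${DRY_RUN:-}" ]; then run=(echo); fi

    # The aggregator needs both the image and the initial model weights.
    "${run[@]}" scp "$image_archive" "$init_weights" "${agg_host}:" || return 1

    # Collaborators only need the image archive.
    local col_host
    for col_host in "$@"; do
        "${run[@]}" scp "$image_archive" "${col_host}:" || return 1
    done
}
```

Running it once with DRY_RUN=1 shows exactly which commands would be issued before anything is sent.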
On the running machines (Aggregator and Collaborator nodes):
Execute the following command on all running machines:
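Since the image was saved to an archive on the building machine, one plausible way to import it on a running machine is docker load. The sketch below assumes that approach; the function name load_experiment_image and the archive path passed to it are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: import a saved Docker image archive, failing early
# with a clear message when the archive has not been transferred yet.
load_experiment_image() {
    local image_archive="$1"

    if [ ! -f "$image_archive" ]; then
        echo "image archive not found: $image_archive" >&2
        return 1
    fi
    # `docker load` is the counterpart of `docker save`.
    docker load --input "$image_archive"
}
```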
Certificate exchange is a large, separate topic. To run an experiment following the OpenFL Aggregator-based workflow, a user must follow the established procedure; please refer to the docs.
Following the above-mentioned procedure, running machines will acquire certificates. Moreover, as a result of this procedure, the aggregator machine will also obtain a cols.yaml file (required to start an experiment) with the registered collaborators' names, and the collaborator machines will obtain data.yaml files.
We recommend replicating the OpenFL workspace folder structure on all the machines and following the usual certifying procedure. Finally, on the aggregator node you should have the following folder structure:
On collaborator nodes:
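A small check along these lines can confirm that the files named above are in place before a container is started. The locations plan/cols.yaml, plan/data.yaml, and cert/ are assumptions based on the usual OpenFL workspace layout, and check_workspace is a hypothetical helper, not an OpenFL command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a workspace folder contains the files a
# given role needs. The relative paths are assumed, not taken from OpenFL.
check_workspace() {
    local role="$1" workspace="${2:-.}"
    local required
    case "$role" in
        aggregator)   required="plan/cols.yaml cert" ;;
        collaborator) required="plan/data.yaml cert" ;;
        *) echo "unknown role: $role" >&2; return 2 ;;
    esac

    local status=0 item
    for item in $required; do
        if [ ! -e "${workspace}/${item}" ]; then
            echo "missing: ${workspace}/${item}" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, check_workspace aggregator /path/to/workspace returns a non-zero status and lists what is missing if the certification procedure has not been completed.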
To speed up the certification process for one-node test runs, it makes sense to use the OpenFL integration test script [make this a link after merge] openfl/tests/github/test_graminize.sh, which will create the required folders and certify an aggregator and two collaborators.
Run the Federation in enclaves
On the Aggregator machine run:
On the Collaborator machines run:
No SGX run (gramine-direct):
The user may run an experiment under Gramine without SGX. Note that we do not mount the sgx_enclave device and instead pass a --security-opt option that allows the syscalls required by gramine-direct.
On the Aggregator machine run:
On the Collaborator machines run:
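The difference between the two run modes can be summarized as a sketch of the docker options involved. Only the device, the AESM socket, and the use of --security-opt come from the text above; the value seccomp=unconfined is an assumption, the function name gramine_docker_opts is hypothetical, and the image name, workspace volumes, and the command started in the container are deliberately left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the docker options that differ between an SGX
# run and a gramine-direct run.
gramine_docker_opts() {
    local mode="$1"
    case "$mode" in
        sgx)
            # Expose the SGX device and the AESM socket to the container.
            echo "--device=/dev/sgx_enclave" \
                 "--volume=/var/run/aesmd/aesm.socket:/var/run/aesmd/aesm.socket"
            ;;
        direct)
            # No SGX device; relax seccomp so gramine-direct can issue the
            # syscalls it needs. The exact value is an assumption.
            echo "--security-opt seccomp=unconfined"
            ;;
        *)
            echo "usage: gramine_docker_opts sgx|direct" >&2
            return 2
            ;;
    esac
}
```

The output is meant to be spliced into a docker run command line, for example via $(gramine_docker_opts sgx), next to the options that are the same in both modes.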
The Routine
The Gramine+OpenFL PR brings in the openfl-gramine folder, which contains the following files:
There is a file access peculiarity that should be kept in mind during debugging and development.
Both Dockerfiles are read from the bare-metal OpenFL installation, i.e. from the OpenFL package on the building machine, whereas the Gramine manifest template and the Makefile are read at image build time from the local (in-image) OpenFL package.
Thus, if one wants to make changes to the gramine manifest template or the Makefile, they should change the OpenFL installation procedure in Dockerfile.gramine, so their changes may be pulled to the base image. One option is to push the changes to a GitHub fork and install OpenFL from this fork.
In this case, to rebuild the image, use fx workspace graminize --rebuild; the --rebuild flag passes '--no-cache' to the docker build command.
Another option is to copy the OpenFL source files from an on-disk cloned repo, but this means that the user must build the graminized image from the repo directory using the Docker CLI.
Known issues:
During cert sign request generation, cols.yaml on collaborators remains empty and data.yaml is extended if needed. On the aggregator, cols.yaml is updated during the signing procedure and data.yaml remains unmodified.
error: Disallowing access to file '/usr/local/lib/python3.8/__pycache__/signal.cpython-38.pyc.3423950304'; file is not protected, trusted or allowed.
TO-DO:
fx workspace create --prefix WORKSPACE_NAME command without --template option to the OpenFL CLI, which will create just an empty workspace with the right folder structure.
fx *actor* start --from image