Oms in cloud
OpenM++ web-service (oms) can provide basic computational resource management for your local computer or for a cluster of servers on a local network or in the cloud. It can manage a model runs queue if your computational resources (CPU and memory) are limited, and it can automatically start and stop cloud servers.
The examples below assume you are familiar with the basics of Oms: openM++ web-service.
If you want a model runs queue, or if you are using openM++ in the cloud and want to automatically scale cloud resources up and down (e.g. start and stop virtual machines for model runs), then start oms with the job control option:
oms -oms.JobDir job
The following directory structure is expected:
./ -> oms "root" directory, by default it is current directory
html/ -> web-UI directory with HTML, js, css, images...
disk.ini -> (optional) disk usage control settings to set storage quotas
etc/ -> config files directory, contain template(s) to run models
log/ -> recommended log files directory
models/
bin/ -> default model.exe and model.sqlite directory
log/ -> default directory for models run log files
doc/ -> models documentation directory
home/ -> user personal home directory
io/download -> user directory for download files
io/upload -> user directory to upload files
job/ -> model run jobs control directory
job.ini -> job control settings
active/ -> active model run state files
history/ -> model run history files
past/ -> (optional) shadow copy of history folder, invisible to the end user
queue/ -> model run queue files
state/ -> jobs state and computational servers state files
jobs.queue.paused -> if such file exists then jobs queue is paused
jobs.queue.all.paused -> if such file exists then all jobs in all queues are paused
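If you are setting this up from scratch, the job control sub-directories can be created up front. A minimal sketch following the layout above (directory names are taken from the tree; oms may also create some of them on its own):

mkdir -p job/active job/history job/queue job/state
# optional: put queue settings into job/job.ini (see examples below)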
By default oms assumes:
- all models are running on localhost
- there are no limits on CPU cores or memory usage
You can create a model run queue on your local computer by setting a limit on the number of CPU cores available.
To do it, modify the job.ini file in the job directory, for example:
[Common]
LocalCpu = 8 ; localhost CPU cores limit, localhost limits are applied only to non-MPI jobs
LocalMemory = 0 ; gigabytes, localhost memory limit, zero means no limits
You don't have to set memory limits until model run memory requirements are known.
The CPU cores limit in job.ini does not need to match the actual number of cores.
You can have 8 cores on your PC and set LocalCpu = 16, which allows 200% overload and may significantly slow down your local machine.
Or, if you set LocalCpu = 4, then your models would be able to use only half of the actual cores.
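To check how many cores your machine actually has before choosing a LocalCpu value, you can use standard Linux tools:

# number of logical CPUs available
nproc
# physical cores, sockets and threads per core
lscpu | grep -E '^(CPU\(s\)|Core\(s\) per socket|Thread\(s\) per core|Socket\(s\))'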
Example of local network (LAN) cluster:
- small front-end server with 4 cores
- 4 back-end servers: cpc-1, cpc-2, cpc-3, cpc-4 with 16 cores each
[Common]
LocalCpu = 4 ; localhost CPU cores limit, localhost limits are applied only to non-MPI jobs
LocalMemory = 0 ; gigabytes, localhost memory limit, zero means no limits
MpiCpu = 40 ; max MPI cpu cores available for each oms instance, zero means oms instances can use all cpu's available
MpiMemory = 0 ; gigabytes, max MPI memory available for each oms instance, zero means oms instances can use all memory available
MpiMaxThreads = 8 ; max number of modelling threads per MPI process
MaxErrors = 10 ; errors threshold for compute server or cluster
Servers = cpc-1, cpc-2, cpc-3, cpc-4 ; computational servers or clusters
[cpc-1]
Cpu = 16 ; default: 1 CPU core
Memory = 0 ; zero means no limits
[cpc-2]
Cpu = 16 ; default: 1 CPU core
Memory = 0 ; zero means no limits
[cpc-3]
Cpu = 16 ; default: 1 CPU core
Memory = 0 ; zero means no limits
[cpc-4]
Cpu = 16 ; default: 1 CPU core
Memory = 0 ; zero means no limits
; OpenMPI hostfile (on Linux)
;
; cpm slots=1 max_slots=1
; cpc-1 slots=2
; cpc-3 slots=4
;
[hostfile]
HostFileDir = models/log
HostName = @-HOST-@
CpuCores = @-CORES-@
RootLine = cpm slots=1 max_slots=1
HostLine = @-HOST-@ slots=@-CORES-@
; MS-MPI machinefile (on Windows with Microsoft MPI)
;
; cpm:1
; cpc-1:2
; cpc-3:4
;
; [hostfile]
; HostFileDir = models\log
; HostName = @-HOST-@
; CpuCores = @-CORES-@
; RootLine = cpm:1
; HostLine = @-HOST-@:@-CORES-@
Based on the job.ini above, oms will create an MPI hostfile with back-end server assignments for each particular model run.
In order to use that hostfile you should modify the model run template(s) in the openM++ etc/ directory.
For example, on Linux with OpenMPI:
{{/*
oms web-service:
Template to run modelName_mpi executable on Linux using OpenMPI
It is not recommended to use root process for modelling
Oms web-service using template for exec.Command(exeName, Args...):
- skip empty lines
- substitute template arguments
- first non-empty line is a name of executable to run
- each other line is a command line argument for executable
Arguments of template:
ModelName string // model name
ExeStem string // base part of model exe name, usually modelName
Dir string // work directory to run the model
BinDir string // bin directory where model exe is located
MpiNp int // number of MPI processes
HostFile string // if not empty then path to hostfile
Args []string // model command line arguments
Env map[string]string // environment variables to run the model
Example of result:
mpirun --hostfile host.ini --bind-to none --oversubscribe -wdir models/bin -x key=value ./modelName_mpi -OpenM.LogToFile false
*/}}
mpirun
--bind-to
none
--oversubscribe
{{with .HostFile}}
--hostfile
{{.}}
{{end}}
{{with .Dir}}
-wdir
{{.}}
{{end}}
{{range $key, $val := .Env}}
-x
{{$key}}={{$val}}
{{end}}
{{.BinDir}}/{{.ExeStem}}_mpi
{{range .Args}}
{{.}}
{{end}}
Note: If you are using OpenMPI then it is a good idea to have --oversubscribe --bind-to none as above, in order to avoid MPI model run failures or performance degradation.
If you are using Microsoft MPI on Windows servers then modify the etc\ model template file(s) to be similar to:
{{/*
oms web-service:
Template to run modelName_mpi.exe on Windows Microsoft MPI using machinefile
To use this template rename it into:
mpi.ModelRun.template.txt
Oms web-service using template for exec.Command(exeName, Args...):
- skip empty lines
- substitute template arguments
- first non-empty line is a name of executable to run
- each other line is a command line argument for executable
Arguments of template:
ModelName string // model name
ExeStem string // base part of model exe name, usually modelName
Dir string // work directory to run the model
BinDir string // bin directory where model exe is located
DbPath string // absolute path to sqlite database file: models/bin/model.sqlite
MpiNp int // number of MPI processes
HostFile string // if not empty then path to hostfile
Args []string // model command line arguments
Env map[string]string // environment variables to run the model
Example of result:
mpiexec -machinefile hosts.ini -wdir models\bin -env key value ..\bin\modelName_mpi -OpenM.LogToFile false
*/}}
mpiexec
{{with .HostFile}}
-machinefile
{{.}}
{{end}}
{{with .Dir}}
-wdir
{{.}}
{{end}}
{{range $key, $val := .Env}}
-env
{{$key}}
{{$val}}
{{end}}
{{.BinDir}}\{{.ExeStem}}_mpi
{{range .Args}}
{{.}}
{{end}}
Use oms jobs control abilities to organize a model runs queue and, if required, automatically scale cloud resources up and down, e.g. start and stop virtual machines or nodes.
For example, if you want to have two users, Alice and Bob, who are running models, then start oms as:
bin/oms -l localhost:4050 -oms.RootDir alice -oms.Name alice -ini oms.ini
bin/oms -l localhost:4060 -oms.RootDir bob -oms.Name bob -ini oms.ini
where the content of oms.ini is:
[oms]
JobDir = ../job
EtcDir = ../etc
HomeDir = models/home
AllowDownload = true
AllowUpload = true
LogRequest = true
[OpenM]
LogFilePath = log/oms.log
LogToFile = true
LogUseDailyStamp = true
LogToConsole = false
The above assumes the following directory structure:
./ -> current directory
bin/
oms -> oms web service executable, on Windows: `oms.exe`
dbcopy -> dbcopy utility executable, on Windows: `dbcopy.exe`
html/ -> web-UI directory with HTML, js, css, images...
disk.ini -> (optional) disk usage control settings to set storage quotas for Bob and Alice
etc/ -> config files directory, contain template(s) to run models
alice/ -> user Alice "root" directory
log/ -> recommended Alice's log files directory
models/
bin/ -> Alice's model.exe and model.sqlite directory
log/ -> Alice's directory for models run log files
doc/ -> models documentation directory
home/ -> Alice's personal home directory
io/download -> Alice's directory for download files
io/upload -> Alice's directory to upload files
bob/ -> user Bob "root" directory
log/ -> recommended Bob's log files directory
models/
bin/ -> Bob's model.exe and model.sqlite directory
log/ -> Bob's directory for models run log files
doc/ -> models documentation directory
home/ -> Bob's personal home directory
io/download -> Bob's directory for download files
io/upload -> Bob's directory to upload files
job/ -> model run jobs control directory, it must be shared between all users
job.ini -> (optional) job control settings
active/ -> active model run state files
history/ -> model run history files
past/ -> (optional) shadow copy of history folder, invisible to the end user
queue/ -> model run queue files
state/ -> jobs state and computational servers state files
jobs.queue.paused -> if such file exists then jobs queue is paused
jobs.queue.all.paused -> if such file exists then all jobs in all queues are paused
You don't have to follow that directory structure; it is flexible and can be customized through oms run options.
IMPORTANT: The job directory must be in a SHARED location, accessible to all users who are using the same queue and the same computational resources (servers, nodes, clusters).
You don't need to create OS users; e.g. Alice and Bob do not need login accounts on your server (cloud, Active Directory, etc.). All you need is to set up some authentication mechanism and a reverse proxy which would allow Alice to access localhost:4050 and Bob to access localhost:4060 on your front-end. The actual OS user can have any name, e.g. oms:
sudo -u oms OM_ROOT=/shared/alice bash -c 'source ~/.bashrc; bin/oms -l localhost:4050 -oms.RootDir alice -oms.Name alice -ini oms.ini &'
sudo -u oms OM_ROOT=/shared/bob bash -c 'source ~/.bashrc; bin/oms -l localhost:4060 -oms.RootDir bob -oms.Name bob -ini oms.ini &'
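Before putting a reverse proxy and authentication in front of these ports, you can verify that each oms instance answers on its own port. A quick check, assuming the default oms API routes:

# each instance should return its list of published models as JSON
curl http://localhost:4050/api/model-list
curl http://localhost:4060/api/model-list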
You may want to set limits on disk space usage and enforce storage cleanup by users. It can be done through the etc/disk.ini file.
If etc/disk.ini exists then the oms web-service will monitor and report disk usage by user(s) and may set a limit on storage space.
You can set a limit for an individual user, for a group of users, and a grand total limit on the storage space used by all users.
If a user exceeds the disk space quotas then she/he cannot run models or upload files to the cloud; only download is available.
The user can Cleanup Disk Space through the UI.
Example of disk.ini:
; Example of storage usage control settings
; "user" term below means oms instance
; "user name" is oms instance name, for example: "localhost_4040"
;
; if etc/disk.ini file exists then storage usage control is active
[Common]
; seconds, storage scan interval, if too small then default value used
;
ScanInterval = 0
; GBytes, user storage quota, default: 0 (unlimited)
;
UserLimit = 0
; GBytes, total storage quota for all users, default: 0 (unlimited)
; if non-zero then it restricts the total storage size of all users
;
AllUsersLimit = 128
; Database cleanup script:
; creates new model.sqlite database and copy model data
;
DbCleanup = etc/db-cleanup_linux.sh
; user groups can be created to simplify settings
;
Groups = Low, High, Others
[Low]
Users = localhost_4040, bob, alice
UserLimit = 2
[High]
Users = king, boss, chief
UserLimit = 20
[king]
UserLimit = 100 ; override storage settings for oms instance "king"
; "me" is not a member of any group
;
[me]
UserLimit = 0 ; unlimited
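The DbCleanup script referenced above (etc/db-cleanup_linux.sh) ships with openM++. The sketch below is only a hedged illustration of the general idea of rebuilding a SQLite file to reclaim space; it is not the actual script, which may work differently:

#!/bin/bash
# hypothetical illustration: compact a model SQLite database into a new file
# requires SQLite 3.27+ for VACUUM INTO; run only when the model is not in use
db_path="$1"   # e.g. models/bin/model.sqlite
if [ -z "$db_path" ] || [ ! -f "$db_path" ];
then
echo "ERROR: invalid (empty or missing) database path: $db_path"
exit 1
fi
sqlite3 "$db_path" "VACUUM INTO '${db_path}.new';" && mv "${db_path}.new" "$db_path"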
There is a small front-end server with 4 cores and 4 back-end servers, cpc-1, cpc-2, cpc-3 and cpc-4, with 16 cores each. You are using a public cloud and want to pay only for actual usage of the back-end servers:
- server(s) must be started automatically when a user (Alice or Bob) wants to run the model;
- server(s) must stop after the model run is completed, to reduce cloud cost.
The scripts below are also available on our GitHub.
[Common]
LocalCpu = 4 ; localhost CPU cores limit, localhost limits are applied only to non-MPI jobs
LocalMemory = 0 ; gigabytes, localhost memory limit, zero means no limits
MpiMaxThreads = 8 ; max number of modelling threads per MPI process
MaxErrors = 10 ; errors threshold for compute server or cluster
IdleTimeout = 900 ; seconds, idle time before stopping server or cluster
StartTimeout = 180 ; seconds, max time to start server or cluster
StopTimeout = 180 ; seconds, max time to stop server or cluster
Servers = cpc-1, cpc-2, cpc-3, cpc-4 ; computational servers or clusters
StartExe = /bin/bash ; default executable to start server, if empty then server is always ready, no startup
StopExe = /bin/bash ; default executable to stop server, if empty then server is always ready, no shutdown
ArgsBreak = -@- ; arguments delimiter in StartArgs or StopArgs line
; delimiter can NOT contain ; or # chars, which are reserved for # comments
; it can be any other delimiter of your choice, e.g.: +++
; StartArgs = ../etc/compute-start.sh ; default command line arguments to start server, server name will be appended
; StopArgs = ../etc/compute-stop.sh ; default command line arguments to stop server, server name will be appended
[cpc-1]
Cpu = 16 ; default: 1 CPU core
Memory = 0 ; zero means no limits
StartArgs = ../etc/compute-start-4.sh-@-us-zone-b-@-cpc-1
StopArgs = ../etc/compute-stop-4.sh-@-us-zone-b-@-cpc-1
[cpc-2]
Cpu = 16 ; default: 1 CPU core
Memory = 0 ; zero means no limits
StartArgs = ../etc/compute-start-4.sh-@-us-zone-c-@-cpc-2
StopArgs = ../etc/compute-stop-4.sh-@-us-zone-c-@-cpc-2
[cpc-3]
Cpu = 16 ; default: 1 CPU core
Memory = 0 ; zero means no limits
StartArgs = ../etc/compute-start-4.sh-@-us-zone-d-@-cpc-3
StopArgs = ../etc/compute-stop-4.sh-@-us-zone-d-@-cpc-3
[cpc-4]
Cpu = 16 ; default: 1 CPU core
Memory = 0 ; zero means no limits
StartArgs = ../etc/compute-start-4.sh-@-us-zone-a-@-cpc-4
StopArgs = ../etc/compute-stop-4.sh-@-us-zone-a-@-cpc-4
; OpenMPI hostfile
;
; cpm slots=1 max_slots=1
; cpc-1 slots=2
; cpc-3 slots=4
;
[hostfile]
HostFileDir = models/log
HostName = @-HOST-@
CpuCores = @-CORES-@
RootLine = cpm slots=1 max_slots=1
HostLine = @-HOST-@ slots=@-CORES-@
; MS-MPI machinefile (on Windows with Microsoft MPI)
;
; cpm:1
; cpc-1:2
; cpc-3:4
;
; [hostfile]
; HostFileDir = models\log
; HostName = @-HOST-@
; CpuCores = @-CORES-@
; RootLine = cpm:1
; HostLine = @-HOST-@:@-CORES-@
Oms is using StartExe and StartArgs in order to start each server. On Linux the result of the job.ini above is:
/bin/bash etc/compute-start.sh cpc-1
On Windows you can use cmd or PowerShell to control servers. The related part of job.ini can look like:
StartExe = cmd ; default executable to start server, if empty then server is always ready, no startup
StopExe = cmd ; default executable to stop server, if empty then server is always ready, no shutdown
StartArgs = /C-@-etc\compute-start.bat ; default command line arguments to start server, server name will be appended
StopArgs = /C-@-etc\compute-stop.bat ; default command line arguments to stop server, server name will be appended
which results in the following command to start the server:
cmd /C etc\compute-start.bat cpc-1
Start and stop scripts can look like (Google cloud version):
#!/bin/bash
#
# start computational server, run as:
#
# sudo -u $USER-NAME compute-start.sh host-name
srv_zone="us-zone-b"
srv_name="$1"
if [ -z "$srv_name" ] || [ -z "$srv_zone" ] ;
then
echo "ERROR: invalid (empty) server name or zone: $srv_name $srv_zone"
exit 1
fi
gcloud compute instances start $srv_name --zone $srv_zone
status=$?
if [ $status -ne 0 ];
then
echo "ERROR $status at start of: $srv_name"
exit $status
fi
# wait until MPI is ready
for i in 1 2 3 4; do
sleep 10
echo "[$i] mpirun -n 1 -H $srv_name hostname"
mpirun -n 1 -H $srv_name hostname
status=$?
if [ $status -eq 0 ] ; then break; fi
done
if [ $status -ne 0 ];
then
echo "ERROR $status from MPI at start of: $srv_name"
exit $status
fi
echo "Start OK: $srv_name"
#!/bin/bash
#
# stop computational server, run as:
#
# sudo -u $USER-NAME compute-stop.sh host-name
# set -e
srv_zone="us-zone-b"
srv_name="$1"
if [ -z "$srv_name" ] || [ -z "$srv_zone" ] ;
then
echo "ERROR: invalid (empty) server name or zone: $srv_name $srv_zone"
exit 1
fi
for i in 1 2 3 4 5 6 7; do
gcloud compute instances stop $srv_name --zone $srv_zone
status=$?
if [ $status -eq 0 ] ; then break; fi
sleep 10
done
if [ $status -ne 0 ];
then
echo "ERROR $status at stop of: $srv_name"
exit $status
fi
echo "Stop OK: $srv_name"
There is a small front-end server with 4 cores and 2 back-end servers, dc1 and dc2, with 4 cores each. You are using a public cloud and want to pay only for actual usage of the back-end servers:
- server(s) must be started automatically when a user (Alice or Bob) wants to run the model;
- server(s) must stop after the model run is completed, to reduce cloud cost.
The scripts below are also available on our GitHub.
[Common]
LocalCpu = 4 ; localhost CPU cores limit, localhost limits are applied only to non-MPI jobs
LocalMemory = 0 ; gigabytes, localhost memory limit, zero means unlimited
MpiMaxThreads = 8 ; max number of modelling threads per MPI process
MaxErrors = 10 ; errors threshold for compute server or cluster
IdleTimeout = 900 ; seconds, idle time before stopping server or cluster
StartTimeout = 90 ; seconds, max time to start server or cluster
StopTimeout = 90 ; seconds, max time to stop server or cluster
Servers = dc1, dc2 ; computational servers or clusters for MPI jobs
StartExe = /bin/bash ; default executable to start server, if empty then server is always ready, no startup
StopExe = /bin/bash ; default executable to stop server, if empty then server is always ready, no shutdown
StartArgs = ../etc/az-start.sh-@-dm_group ; default command line arguments to start server, server name will be appended
StopArgs = ../etc/az-stop.sh-@-dm_group ; default command line arguments to stop server, server name will be appended
ArgsBreak = -@- ; arguments delimiter in StartArgs or StopArgs line
; delimiter can NOT contain ; or # chars, which are reserved for # comments
; it can be any other delimiter of your choice, e.g.: +++
[dc1]
Cpu = 4 ; default: 1 CPU core
Memory = 0
[dc2]
Cpu = 4 ; default: 1 CPU core
Memory = 0
; OpenMPI hostfile
;
; dm slots=1 max_slots=1
; dc1 slots=2
; dc2 slots=4
;
[hostfile]
HostFileDir = models/log
HostName = @-HOST-@
CpuCores = @-CORES-@
RootLine = dm slots=1 max_slots=1
HostLine = @-HOST-@ slots=@-CORES-@
Oms is using StartExe and StartArgs in order to start each server. On Linux the result of the job.ini above is similar to:
/bin/bash etc/az-start.sh dm_group dc1
Start and stop scripts can look like (Azure cloud version):
#!/bin/bash
#
# start Azure server, run as:
#
# sudo -u $USER-NAME az-start.sh resource-group host-name
# set -e
res_group="$1"
srv_name="$2"
if [ -z "$srv_name" ] || [ -z "$res_group" ] ;
then
echo "ERROR: invalid (empty) server name or resource group: $srv_name $res_group"
exit 1
fi
# login
az login --identity
status=$?
if [ $status -ne 0 ];
then
echo "ERROR $status from az login at start of: $res_group $srv_name"
exit $status
fi
# Azure VM start
az vm start -g "$res_group" -n "$srv_name"
status=$?
if [ $status -ne 0 ];
then
echo "ERROR $status at: az vm start -g $res_group -n $srv_name"
exit $status
fi
# wait until MPI is ready
for i in 1 2 3 4 5; do
sleep 10
echo "[$i] mpirun -n 1 -H $srv_name hostname"
mpirun -n 1 -H $srv_name hostname
status=$?
if [ $status -eq 0 ] ; then break; fi
done
if [ $status -ne 0 ];
then
echo "ERROR $status from MPI at start of: $srv_name"
exit $status
fi
echo "Start OK: $srv_name"
#!/bin/bash
#
# stop Azure server, run as:
#
# sudo -u $USER-NAME az-stop.sh resource-group host-name
# set -e
res_group="$1"
srv_name="$2"
if [ -z "$srv_name" ] || [ -z "$res_group" ] ;
then
echo "ERROR: invalid (empty) server name or resource group: $srv_name $res_group"
exit 1
fi
# login
az login --identity
status=$?
if [ $status -ne 0 ];
then
echo "ERROR $status from az login at start of: $res_group $srv_name"
exit $status
fi
# Azure VM stop
for i in 1 2 3 4; do
az vm deallocate -g "$res_group" -n "$srv_name"
status=$?
if [ $status -eq 0 ] ; then break; fi
sleep 10
done
if [ $status -ne 0 ];
then
echo "ERROR $status at stop of: $srv_name"
exit $status
fi
echo "Stop OK: $srv_name"
Security considerations:
In this wiki I am describing the simplest, but least secure, configuration; for your production environment you may want to:
- use a separate web front-end server and a separate oms control server, with a firewall in between
- never use the front-end web-server OS user as the oms control server OS user
- not use the same OS user (like oms) for everyone, but create a different one for each of your model users, like Alice and Bob in the example above.
Of course the web front-end UI of your production environment must be protected by https:// with proper authentication and authorization.
All of that is out of scope of this wiki; please consult your organization's security guidelines.
I am also not describing here how to configure web servers, how to create a reverse proxy, how to install SSL certificates, etc. There are plenty of great materials on those topics; just please think about security in the first place.
The cloud examples here assume a Debian or Ubuntu Linux server setup; you can apply them to RedHat Linux with minimal adjustments. OpenM++ does support Microsoft Windows clusters, but configuring them is a more complex task and out of scope for this wiki.
Our simple cluster consists of a front-end web-UI server with host name dm and multiple back-end computational servers: dc1, dc2, ....
Front-end server OS setup
The front-end dm server must have a web server installed (Apache or nginx, for example), a static IP and DNS records for your domain.
Choose Debian-11, Ubuntu 22.04 or RedHat 9 (Rocky, AlmaLinux) as your base system and create the dm cloud virtual machine; at least 4 cores are recommended.
We will create two disks on dm: a boot disk and a fast SSD data disk where all user data and models are stored.
Set the timezone, install OpenMPI and (optionally) SQLite:
sudo timedatectl set-timezone America/Toronto
sudo apt-get install openmpi-bin
sudo apt-get install sqlite3
# check result:
mpirun hostname -A
Create the SSD data disk, mount it on /mirror and use it to store all user data and models:
# init new SSD, use lsblk to find which /dev it is
lsblk
sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sda
sudo mkdir /mirror
sudo mount -o discard,defaults /dev/sda /mirror
# check results:
ls -la /mirror
# add new disk to fstab, mount by UUID:
sudo blkid /dev/sda
sudo nano /etc/fstab
# add your UUID mount:
UUID=98765432-d09a-4936-b85f-a61da123456789 /mirror ext4 discard,defaults 0 2
Create NFS shares:
sudo mkdir -p /mirror/home
sudo mkdir -p /mirror/data
sudo apt install nfs-kernel-server
# add shares into exports:
sudo nano /etc/exports
# export user homes and data, data can be exported read-only, rw is not required
/mirror/home *(rw,sync,no_root_squash,no_subtree_check)
/mirror/data *(rw,sync,no_root_squash,no_subtree_check)
sudo systemctl restart nfs-kernel-server
# check results:
/sbin/showmount -e dm
systemctl status nfs-kernel-server
Create the 'oms' service account with login disabled. I am using 1108 as the user id and group id, but it is an example only and 1108 has no special meaning:
export OMS_UID=1108
export OMS_GID=1108
sudo addgroup --gid $OMS_GID oms
sudo adduser --home /mirror/home/oms --disabled-password --gecos "" --gid $OMS_GID -u $OMS_UID oms
sudo chown -R oms:oms /mirror/data
# increase stack size for models to 64 MB (65536 KB)
sudo -u oms nano /mirror/home/oms/.bashrc
# ~/.bashrc: executed by bash(1) for non-login shells.
# openM++
# some models require stack size:
#
ulimit -S -s 65536
#
# end of openM++
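To verify that the stack size setting is picked up the same way model runs launched by oms will see it (the launch commands shown earlier source ~/.bashrc explicitly), run the check below; it should print 65536:

sudo -u oms bash -c 'source /mirror/home/oms/.bashrc; ulimit -s'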
Password-less ssh for the oms service account:
sudo su -l oms
cd ~
mkdir .ssh
ssh-keygen -f .ssh/id_rsa -t rsa -N '' -C oms
# create .ssh/config with content below:
nano .ssh/config
Host *
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
LogLevel ERROR
cp -p .ssh/id_rsa.pub .ssh/authorized_keys
chmod 700 .ssh
chmod 600 .ssh/id_rsa
chmod 644 .ssh/id_rsa.pub
chmod 644 .ssh/config
chmod 644 .ssh/authorized_keys
exit # logout from 'oms' user
# check ssh for oms user, it should work without any prompts, without any Yes/No questions:
sudo -u oms ssh dm
Check openMPI under 'oms' service account:
sudo -u oms mpirun hostname
sudo -u oms mpirun -H dm hostname
Done with the dm server OS setup; reboot it and start creating the dc1, dc2, ... back-end servers.
Back-end computational servers setup
I am describing it for dc1, assuming you will create a base image from it and use it for all other back-end servers.
On Azure it makes sense to create a virtual machine scale set instead of individual servers.
Choose Debian-11, Ubuntu 22.04 or RedHat 9 (Rocky, AlmaLinux) as your base system and create the dc1 cloud virtual machine; at least 16 cores are recommended.
It does not require a fast SSD; use a regular small HDD, because no model data are stored on the back-end: it is only the OS boot disk, nothing else.
Back-end servers should not be visible from the internet; they should be visible only from the front-end dm server.
Set the timezone and install OpenMPI:
sudo timedatectl set-timezone America/Toronto
sudo apt-get install openmpi-bin
# check result:
mpirun hostname -A
Mount the NFS shares from the dm server:
sudo mkdir -p /mirror/home
sudo mkdir -p /mirror/data
sudo apt install nfs-common
/sbin/showmount -e dm
sudo mount -t nfs dm:/mirror/home /mirror/home
sudo mount -t nfs dm:/mirror/data /mirror/data
systemctl status mirror-home.mount
systemctl status mirror-data.mount
# if above OK then add nfs share mounts into fstab:
sudo nano /etc/fstab
# fstab records:
dm:/mirror/home /mirror/home nfs defaults 0 0
dm:/mirror/data /mirror/data nfs defaults 0 0
# (optional) reboot node and make sure shares are mounted:
systemctl status mirror-home.mount
systemctl status mirror-data.mount
Create the 'oms' service account with login disabled.
It must have exactly the same user id and group id as the oms user on dm; I am using 1108 as an example:
export OMS_UID=1108
export OMS_GID=1108
sudo /sbin/addgroup --gid $OMS_GID oms
sudo adduser --no-create-home --home /mirror/home/oms --disabled-password --gecos "" --gid $OMS_GID -u $OMS_UID oms
# check 'oms' service account access to shared files:
sudo -u oms -- ls -la /mirror/home/oms/.ssh/
Optional: if you are using an Azure virtual machine scale set then the cloud-init config can be:
#cloud-config
#
runcmd:
- addgroup --gid 1108 oms
- adduser --no-create-home --home /mirror/home/oms --disabled-password --gecos "" --gid 1108 -u 1108 oms
Check openMPI under 'oms' service account:
sudo -u oms mpirun hostname
sudo -u oms mpirun -H dc1 hostname
sudo -u oms mpirun -H dm hostname
Done with the dc1 OS setup; clone it for all other back-end servers.
After you have created all back-end servers, check OpenMPI across the entire cluster, for example:
sudo -u oms mpirun -H dm,dc1,dc2,dc3,dc4,dc5,dc6,dc7,dc8,dc9,dc10 hostname
Now log back in to your dm front-end and create the standard openM++ directory structure at /mirror/data/, copy models, and create user directories as described for the "users" Alice and Bob above; a sketch to create this layout follows the directory tree below.
Bob and Alice are your model users; they should not have an OS login. The oms user with disabled login is used to run the models on behalf of Alice and Bob.
I would also recommend having at least one "user" for your own tests, to verify system status and to test and run the models when you publish them.
For that I usually create a "user" named test.
/mirror/data/
bin/
oms -> oms web service executable
dbcopy -> dbcopy utility executable
html/ -> web-UI directory with HTML, js, css, images...
etc/ -> config files directory, contain template(s) to run models
disk.ini -> (optional) disk usage control settings to set storage quotas for Bob and Alice
log/ -> recommended log files directory
alice/ -> user Alice "root" directory
log/ -> recommended Alice's log files directory
models/
bin/ -> Alice's model.exe and model.sqlite directory
log/ -> Alice's directory for models run log files
doc/ -> models documentation directory
home/ -> Alice's personal home directory
io/download -> Alice's directory for download files
io/upload -> Alice's directory to upload files
bob/ -> user Bob "root" directory
log/ -> recommended Bob's log files directory
models/
bin/ -> Bob's model.exe and model.sqlite directory
log/ -> Bob's directory for models run log files
doc/ -> models documentation directory
home/ -> Bob's personal home directory
io/download -> Bob's directory for download files
io/upload -> Bob's directory to upload files
job/ -> model run jobs control directory, it must be shared between all users
job.ini -> (optional) job control settings
active/ -> active model run state files
history/ -> model run history files
past/ -> (optional) shadow copy of history folder, invisible to the end user
queue/ -> model run queue files
state/ -> jobs state and computational servers state files
oms/ -> oms init.d files, see examples on our GitHub
oms.ini -> oms config, see content above
test/ -> user test "root" directory, for admin internal use
-> .... user test subdirectories here
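A hedged sketch to create this layout under /mirror/data (user names alice, bob and test follow the example above; adjust to your own users):

cd /mirror/data
sudo -u oms mkdir -p bin html etc log oms job/active job/history job/queue job/state
for u in alice bob test; do
sudo -u oms mkdir -p $u/log $u/models/bin $u/models/log $u/models/doc $u/home $u/io/download $u/io/upload
done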
The tree above also contains an oms/ directory with init.d files to restart oms when the front-end dm server is rebooted.
You can find examples of those on our GitHub.
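As a hedged alternative to the init.d examples, a systemd unit per oms instance can be used; the unit name, paths and port below are assumptions matching the alice example above, not files shipped with openM++:

# create a systemd unit for the 'alice' oms instance and enable it at boot
sudo tee /etc/systemd/system/oms-alice.service >/dev/null <<'EOF'
[Unit]
Description=openM++ web-service (oms) for user alice
After=network.target remote-fs.target

[Service]
Type=simple
User=oms
WorkingDirectory=/mirror/data
ExecStart=/mirror/data/bin/oms -l localhost:4050 -oms.RootDir alice -oms.Name alice -ini oms.ini
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now oms-alice.service

Repeat the same pattern for bob and test, changing the port, RootDir and Name.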