mOS for HPC v0.4 Administrator's Guide
See the mOS for HPC v0.4 Release Notes for information about platform requirements.
The mOS for HPC source can be checked out from GitHub at https://github.com/01org/mOS.
[admin@head-1 ~]$ git clone https://github.com/01org/mOS.git
Cloning into 'mOS'...
remote: Counting objects: 5049465, done.
remote: Total 5049465 (delta 0), reused 0 (delta 0), pack-reused 5049465
Receiving objects: 100% (5049465/5049465), 1.07 GiB | 1.25 MiB/s, done.
Resolving deltas: 100% (4155499/4155499), done.
Checking out files: 100% (56357/56357), done.
[admin@head-1 ~]$ cd mOS
[admin@head-1 mOS]$ git checkout 4.9.5_0.4.mos
Branch 4.9.5_0.4.mos set up to track remote branch 4.9.5_0.4.mos from origin.
Switched to a new branch '4.9.5_0.4.mos'
[admin@head-1 mOS]$
We recommend building kernel RPMs to install mOS for HPC. This is best done on a head node with plenty of memory and CPU resources. The minimum build system requirements are listed at https://www.kernel.org/doc/html/latest/process/changes.html. Run the following commands from the directory where you checked out mOS for HPC:
[admin@head-1 mOS]$ cp .config.mos .config
[admin@head-1 mOS]$ make olddefconfig
HOSTCC scripts/basic/fixdep
HOSTCC scripts/kconfig/conf.o
SHIPPED scripts/kconfig/zconf.tab.c
SHIPPED scripts/kconfig/zconf.lex.c
SHIPPED scripts/kconfig/zconf.hash.c
HOSTCC scripts/kconfig/zconf.tab.o
HOSTLD scripts/kconfig/conf
scripts/kconfig/conf --olddefconfig Kconfig
#
# configuration written to .config
#
[admin@head-1 mOS]$ make -j 32 binrpm-pkg
scripts/kconfig/conf --silentoldconfig Kconfig
CHK include/config/kernel.release
UPD include/config/kernel.release
make KBUILD_SRC=
SYSTBL arch/x86/entry/syscalls/../../include/generated/asm/syscalls_32.h
SYSHDR arch/x86/entry/syscalls/../../include/generated/asm/unistd_32_ia32.h
SYSHDR arch/x86/entry/syscalls/../../include/generated/asm/unistd_64_x32.h
SYSTBL arch/x86/entry/syscalls/../../include/generated/asm/syscalls_64.h
SYSHDR arch/x86/entry/syscalls/../../include/generated/uapi/asm/unistd_32.h
SYSHDR arch/x86/entry/syscalls/../../include/generated/uapi/asm/unistd_64.h
SYSHDR arch/x86/entry/syscalls/../../include/generated/uapi/asm/unistd_x32.h
HOSTCC scripts/basic/bin2c
CHK include/config/kernel.release
WRAP arch/x86/include/generated/asm/clkdev.h.
.
.
+ cp System.map /nfshome/admin/rpmbuild/BUILDROOT/kernel-4.9.5_0.4.mos_gac28c00-2.x86_64/boot/System.map-4.9.5_0.4.mos-gac28c00
+ cp .config /nfshome/admin/rpmbuild/BUILDROOT/kernel-4.9.5_0.4.mos_gac28c00-2.x86_64/boot/config-4.9.5_0.4.mos-gac28c00
+ bzip2 -9 --keep vmlinux
+ mv vmlinux.bz2 /nfshome/admin/rpmbuild/BUILDROOT/kernel-4.9.5_0.4.mos_gac28c00-2.x86_64/boot/vmlinux-4.9.5_0.4.mos-gac28c00.bz2
+ /usr/lib/rpm/brp-compress
Processing files: kernel-4.9.5_0.4.mos_gac28c00-2.x86_64
Provides: kernel = 4.9.5_0.4.mos_gac28c00-2 kernel(x86-64) = 4.9.5_0.4.mos_gac28c00-2 kernel-4.9.5_0.4.mos-gac28c00
Requires(interp): /bin/sh /bin/sh /bin/sh
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Requires(post): /bin/sh
Requires(preun): /bin/sh
Requires(postun): /bin/sh
Requires: libc.so.6()(64bit) libc.so.6(GLIBC_2.14)(64bit) libc.so.6(GLIBC_2.2.5)(64bit) libc.so.6(GLIBC_2.3)(64bit) libc.so.6(GLIBC_2.3.4)(64bit) libc.so.6(GLIBC_2.6)(64bit) libc.so.6(GLIBC_2.7)(64bit) libm.so.6()(64bit) libm.so.6(GLIBC_2.2.5)(64bit) rtld(GNU_HASH)
Processing files: kernel-headers-4.9.5_0.4.mos_gac28c00-2.x86_64
Provides: kernel-headers = 4.9.5_0.4.mos_gac28c00 kernel-headers = 4.9.5_0.4.mos_gac28c00-2 kernel-headers(x86-64) = 4.9.5_0.4.mos_gac28c00-2
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Obsoletes: kernel-headers
Processing files: kernel-mOS-4.9.5_0.4.mos_gac28c00-2.x86_64
Provides: kernel-mOS = 4.9.5_0.4.mos_gac28c00-2 kernel-mOS(x86-64) = 4.9.5_0.4.mos_gac28c00-2
Requires(rpmlib): rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1 rpmlib(CompressedFileNames) <= 3.0.4-1
Checking for unpackaged file(s): /usr/lib/rpm/check-files /nfshome/admin/rpmbuild/BUILDROOT/kernel-4.9.5_0.4.mos_gac28c00-2.x86_64
Wrote: /nfshome/admin/rpmbuild/RPMS/x86_64/kernel-4.9.5_0.4.mos_gac28c00-2.x86_64.rpm
Wrote: /nfshome/admin/rpmbuild/RPMS/x86_64/kernel-headers-4.9.5_0.4.mos_gac28c00-2.x86_64.rpm
Wrote: /nfshome/admin/rpmbuild/RPMS/x86_64/kernel-mOS-4.9.5_0.4.mos_gac28c00-2.x86_64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.TuY4ZP
+ umask 022
+ cd .
+ rm -rf /nfshome/admin/rpmbuild/BUILDROOT/kernel-4.9.5_0.4.mos_gac28c00-2.x86_64
+ exit 0
rm binkernel.spec
[admin@head-1 mOS]$
The following installation instructions are for an Intel(R) Xeon Phi(TM) processor based compute node (named knl in this example) with a CentOS distribution. Red Hat distributions will be very similar. Other distributions may have different methods of installation. Please consult the documentation for your distribution to determine appropriate procedures. You will need root authority to install the kernel.
At a minimum, install the kernel-4.9.5_0.4.mos_gac28c00-2.x86_64.rpm and kernel-mOS-4.9.5_0.4.mos_gac28c00-2.x86_64.rpm RPMs into your compute node image. The exact RPM names may vary depending on the state of the code, whether a local version name is specified (for example, through make menuconfig), and how many times you have built mOS for HPC. However, the 4.9.5_0.4.mos part of the name should remain constant.
[admin@knl ~]$ sudo rpm -ivh /nfshome/admin/rpmbuild/RPMS/x86_64/kernel-4.9.5_0.4.mos_gac28c00-2.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:kernel-4.9.5_0.4.mos_gac28c00-2 ################################# [100%]
[admin@knl ~]$ sudo rpm -ivh /nfshome/admin/rpmbuild/RPMS/x86_64/kernel-mOS-4.9.5_0.4.mos_gac28c00-2.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:kernel-mOS-4.9.5_0.4.mos_gac28c00################################# [100%]
[admin@knl ~]$
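Because the exact RPM names vary between builds, it can help to select the two required packages by the constant 4.9.5_0.4.mos part of the name. The following is a minimal sketch; the function name list_mos_rpms is hypothetical, and it assumes the RPMs keep the kernel- and kernel-mOS- prefixes shown in the build output above.

```shell
#!/bin/bash
# Hypothetical helper: print the kernel and kernel-mOS RPMs for this release
# found in the directory given as $1. The kernel-headers RPM is not matched
# because both patterns require the version to follow the package name directly.
list_mos_rpms() {
    local dir=$1 f
    for f in "$dir"/kernel-4.9.5_0.4.mos*.rpm \
             "$dir"/kernel-mOS-4.9.5_0.4.mos*.rpm; do
        # An unmatched glob stays literal, so skip names that do not exist.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

For the build shown earlier, `list_mos_rpms /nfshome/admin/rpmbuild/RPMS/x86_64` would be expected to print the two RPM paths passed to rpm -ivh above.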
After RPM installation, the kernel needs to be added to the grub menu. The kernel parameters needed for mOS for HPC are taken from the GRUB_CMDLINE_LINUX variable in /etc/default/grub. Update or replace the GRUB_CMDLINE_LINUX variable in that file as follows:
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8 selinux=0 rd.lvm.lv=centos/root rd.lvm.lv=centos/swap intel_idle.max_cstate=1 intel_pstate=disable nmi_watchdog=0 mce=ignore_ce tsc=reliable transparent_hugepage=never isolcpus=1,19,69,87,137,155,205,223 lwkcpus=1.52-67,256-271:69.120-135,188-203:137.2-17,206-221:205.70-85,138-153:19.20-35,224-239:87.88-103,156-171:155.36-51,240-255:223.104-119,172-187 lwkmem=0:16G,1:16G,2:16G,3:16G,4:3990M,5:3990M,6:3990M,7:3990M lwkmem_debug=0"
Note: The above GRUB_CMDLINE_LINUX setting is for a 68 core Xeon Phi processor in SNC-4 cluster mode, Flat memory mode. Please see the Uncore Configuration menu in the BIOS setup. The minimum parameters required for mOS for HPC are the parameters starting with isolcpus.
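In the setting above, the isolcpus list is exactly the set of Linux CPUs that appear before the period in each lwkcpus group. The sketch below derives that list from an lwkcpus value so the two parameters can be kept consistent. The function name syscall_cpus is hypothetical, and the sketch assumes the utility-thread CPUs are the same as the syscall CPUs, as they are in this example.

```shell
#!/bin/bash
# Hypothetical helper: extract the Linux (syscall) CPUs from an lwkcpus value.
# $1 is the value only, e.g. "28.1-13,29-41:42.15-27,43-55".
syscall_cpus() {
    printf '%s\n' "$1" |
        tr ':' '\n' |      # one group per line
        cut -d. -f1 |      # keep the CPU in front of the period
        sort -n |          # numeric order, as in the isolcpus list above
        paste -sd, -       # join back into a comma-separated list
}

syscall_cpus '28.1-13,29-41:42.15-27,43-55'   # prints 28,42
```

If additional Linux CPUs are set aside for utility threads only, they would need to be appended to the isolcpus value by hand.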
The last step is to update the grub configuration using the grub2-mkconfig command. Please ensure that appropriate rd.lvm.lv settings are specified for your system. You will need root authority to update grub. The grub configuration file is grub.cfg. The location of this file varies. The example below shows a system where it is located in /boot/efi/EFI/centos/grub.cfg. Other systems might have it in /boot/grub2/grub.cfg. You may want to save a backup copy of your grub.cfg file.
[admin@knl ~]$ sudo grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
Note: This command will add the kernel parameters in GRUB_CMDLINE_LINUX to every entry in the grub menu. You should preserve and restore the existing kernel entries in grub.cfg after running grub2-mkconfig.
Caution: Not all combinations and variations of boot parameters have been validated and tested.
The OS may fail to boot if, for example, lwkcpus and lwkmem are not properly set for your system.
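Since the location of grub.cfg varies and a backup is advisable, the two steps can be sketched as small helpers. The function names and the .bak suffix are assumptions made for illustration; they are not part of mOS or grub.

```shell
#!/bin/bash
# Hypothetical helper: print the first existing regular file among the
# candidate paths given as arguments; return 1 if none exists.
first_existing() {
    local p
    for p in "$@"; do
        if [ -f "$p" ]; then
            printf '%s\n' "$p"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: copy a file to <file>.bak, preserving mode and
# timestamps, and print the path of the backup.
backup_file() {
    cp -p "$1" "$1.bak" && printf '%s\n' "$1.bak"
}
```

On a system like the one in this guide, `first_existing /boot/efi/EFI/centos/grub.cfg /boot/grub2/grub.cfg` would be expected to report the file to back up. The backup itself needs root authority, and the saved copy can later be used to restore the original kernel entries.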
Name | Recommended Value | Description |
---|---|---|
tsc | reliable | This parameter marks the TSC as reliable. If it is not explicitly set, the kernel continually checks that the TSC is stable and in sync across the CPUs: it sets a 0.5 second timer on a CPU, validates, and then sets the timer on the next CPU in the mask of available CPUs. This flow continues as the system runs, introducing unwanted noise. |
mce | ignore_ce | If this value is not set, the machine check code wakes up at 5 minute intervals, polls all of the machine check register status banks, and potentially logs correctable machine check info that the hardware stored in these banks. An alternative way to control this and other related machine check settings is to write to files in the /sys/devices/system/machinecheck/ directory. The mOS kernel will need to provide support for reading and logging correctable machine check info stored in these banks without disrupting an HPC application every 5 minutes. Disabling correctable error handling, as is done currently, is not by itself an acceptable long-term solution. |
intel_idle.max_cstate | 1 | Do not allow idle CPUs to drop below cstate 1. This allows the CPUs to be quickly awakened from halt. |
nmi_watchdog | 0 | Disable the NMI watchdog interrupt to eliminate this additional source of noise on the CPUs. An alternative method of turning off the watchdog is writing a zero to the system file /proc/sys/kernel/nmi_watchdog. This approach would eliminate the need to set it here. |
intel_pstate | disable | Do not allow the system to dynamically adjust the frequency of the CPUs. When running HPC applications, we want a stable, consistent CPU frequency across the entire job. |
transparent_hugepage | never | |
isolcpus | (see description) | Isolate these CPUs from normal Linux scheduling activities. The value provided should be the list of Linux CPUs that are designated as the CPUs to execute the migrated syscalls and the Linux CPUs used to host utility threads. The list of LWK CPUs need not be specified here since these CPUs are implicitly included in the isolcpus mask. |
lwkcpus | topology dependent | CPUs to be controlled by mOS. This includes the CPUs that will be exclusively owned by mOS (implicitly marked as 'isolated') and also Linux CPUs that will be used by mOS to host utility threads and to execute migrated system calls. Entries are separated by colons; each entry names a Linux CPU, followed by a period and the LWK CPUs whose system calls it hosts. For example: lwkcpus=28.1-13,29-41:42.15-27,43-55. In this configuration, there are two Linux CPUs, 28 and 42, designated to handle syscalls. CPU 28 will host syscalls for LWK CPUs 1-13 and 29-41. CPU 42 will host syscalls for LWK CPUs 15-27 and 43-55. |
lwkmem | topology dependent | Designate memory for use by mOS. The amount of memory requested is specified in parse_mem format (K, M, G). Memory can also be designated per NUMA domain. Example: lwkmem=126G. Example: lwkmem=0:58G,1:58G |
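To sanity-check an lwkmem value against the memory installed in the node, the per-domain amounts can be summed. This is a rough sketch only: the function names are hypothetical, and it assumes the K, M and G suffixes are binary multiples (1024-based), as in the kernel's usual memory-size parsing.

```shell
#!/bin/bash
# Hypothetical helper: convert a size such as 16G or 3990M to bytes.
# Assumption: K, M and G are powers of 1024.
size_to_bytes() {
    local s=$1 n=${1%[KMGkmg]}
    case $s in
        *[Kk]) echo $((n * 1024)) ;;
        *[Mm]) echo $((n * 1024 * 1024)) ;;
        *[Gg]) echo $((n * 1024 * 1024 * 1024)) ;;
        *)     echo $((n)) ;;
    esac
}

# Hypothetical helper: total bytes requested by an lwkmem value, given either
# as a single amount ("126G") or per NUMA domain ("0:58G,1:58G").
lwkmem_total() {
    local total=0 entry
    local IFS=,
    for entry in $1; do
        # Drop an optional "<domain>:" prefix before converting.
        total=$((total + $(size_to_bytes "${entry#*:}")))
    done
    echo "$total"
}

lwkmem_total '0:58G,1:58G'   # prints 124554051584
```

The total can then be compared with the "Designated ... bytes of LWK memory" line that mOS writes to the kernel log at boot.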
The mOS for HPC build is expected to use the following config options:
- CONFIG_NO_HZ_FULL=y
- CONFIG_RCU_NOCB_CPU=y
- CONFIG_NO_HZ_FULL_ALL=y
- CONFIG_RCU_NOCB_CPU_ALL=y
If these options are not used, additional kernel boot parameters may need to be specified.
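One way to confirm the options listed above is to check the .config file used for the build (or the config-* file installed under /boot). The sketch below prints any of the four options that are not set to y; the function name missing_mos_options is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print each expected option that is not set to "y"
# in the kernel config file given as $1. Prints nothing if all are set.
missing_mos_options() {
    local config=$1 opt
    for opt in CONFIG_NO_HZ_FULL CONFIG_RCU_NOCB_CPU \
               CONFIG_NO_HZ_FULL_ALL CONFIG_RCU_NOCB_CPU_ALL; do
        # -x matches whole lines, so CONFIG_NO_HZ_FULL=y does not
        # accidentally match CONFIG_NO_HZ_FULL_ALL=y.
        grep -qx "${opt}=y" "$config" || printf '%s\n' "$opt"
    done
}
```

Running `missing_mos_options .config` from the mOS source directory should print nothing when all four options are enabled.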
If mOS for HPC has been properly installed and configured, the grub boot menu should have an entry for mOS. Select the 4.9.5_0.4.mos entry during boot.
CentOS Linux (4.9.5_0.4.mos-gac28c00) 7 (Core)
CentOS Linux (3.10.0-327.36.3.el7.x86_64) 7 (Core)
CentOS Linux (3.10.0-327.el7.x86_64) 7 (Core)
Use the ^ and v keys to change the selection.
Press 'e' to edit the selected item, or 'c' for a command prompt.
To test that yod is functional, launch a simple application using yod:
$ yod /bin/echo hello
hello
If mOS for HPC LWK memory is active, you should see some LWK entries in the mapping:
$ yod cat /proc/self/maps | grep LWK
0060b000-0060d000 rw-p 00000000 00:00 0 LWK
00800000-00a00000 rw-p 00000000 00:00 0 [heap] LWK
2aaaaaaab000-2aaaaaaaf000 rw-p 00000000 00:00 0 LWK
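To go one step beyond checking that LWK entries exist, the sizes of the LWK-backed regions can be added up. The following bash sketch assumes the LWK tag is the last field of each mapping line, as in the output above; the function name lwk_mapped_bytes is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read /proc/<pid>/maps lines on stdin and print the
# total size in bytes of the regions whose line ends in "LWK".
lwk_mapped_bytes() {
    local range rest total=0
    while read -r range rest; do
        case $rest in
            # range is "<start>-<end>" in hex; 16# tells bash the base.
            *LWK) total=$((total + 16#${range#*-} - 16#${range%-*})) ;;
        esac
    done
    echo "$total"
}
```

A possible invocation is `yod cat /proc/self/maps | lwk_mapped_bytes`. For the three lines shown above, the sum is 8192 + 2097152 + 16384 = 2121728 bytes.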
Check the dmesg log for mOS entries:
$ sudo dmesg | grep mOS
[ 0.000000] mOS: lwkcpus_mask: "2-17,20-67,70-85,88-135,138-153,156-203,206-221,224-271"
[ 0.000000] mOS-mem: There are 8 on-line NUMA domains.
[ 0.000000] mOS-mem: Designated 82309021696 bytes of LWK memory.
[ 0.694013] mOS: CPUs 0-1,18-19,68-69,86-87,136-137,154-155,204-205,222-223,272-279 will not move syscalls
[ 0.704880] mOS: CPUs 2-17,206-221 will move syscalls onto CPUs 137
[ 0.711914] mOS: CPUs 20-35,224-239 will move syscalls onto CPUs 19
[ 0.718960] mOS: CPUs 36-51,240-255 will move syscalls onto CPUs 155
[ 0.726084] mOS: CPUs 52-67,256-271 will move syscalls onto CPUs 1
[ 0.733026] mOS: CPUs 70-85,138-153 will move syscalls onto CPUs 205
[ 0.740145] mOS: CPUs 88-103,156-171 will move syscalls onto CPUs 87
[ 0.747267] mOS: CPUs 104-119,172-187 will move syscalls onto CPUs 223
[ 0.754572] mOS: CPUs 120-135,188-203 will move syscalls onto CPUs 69
[ 0.761781] mOS: These CPUs are isolated: 1-17,19-67,69-85,87-135,137-153,155-203,205-221,223-271
[ 41.863294] mOS: set unbound workqueue to 0-1,18-19,68-69,86-87,136-137,154-155,204-205,222-223,272-279 rc=0
[ 41.918227] mOS: Assigned LWK CPUs: 2-17,20-67,70-85,88-135,138-153,156-203,206-221,224-271
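The "will move syscalls onto" lines can be cross-checked against the lwkcpus boot parameter. The sketch below prints the mapping implied by an lwkcpus value in the same wording as the log; the function name lwkcpus_mapping is hypothetical, and the lines come out in the order of the groups in the parameter, which may differ from the order in dmesg.

```shell
#!/bin/bash
# Hypothetical helper: print one line per group of an lwkcpus value,
# phrased like the mOS dmesg output. $1 is the value only, e.g.
# "28.1-13,29-41:42.15-27,43-55".
lwkcpus_mapping() {
    local group
    local IFS=:
    for group in $1; do
        # ${group#*.} is the LWK CPU list, ${group%%.*} the Linux CPU.
        printf 'CPUs %s will move syscalls onto CPUs %s\n' \
            "${group#*.}" "${group%%.*}"
    done
}

lwkcpus_mapping '28.1-13,29-41:42.15-27,43-55'
# prints:
# CPUs 1-13,29-41 will move syscalls onto CPUs 28
# CPUs 15-27,43-55 will move syscalls onto CPUs 42
```

On a running node, the value could be taken from /proc/cmdline and the output compared with `sudo dmesg | grep 'will move syscalls onto'`.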
Validate that yod is using all of the specified LWK CPUs:
$ [ "$(yod cat /sys/kernel/mOS/lwkcpus_reserved)" == "$(cat /sys/kernel/mOS/lwkcpus)" ] && echo "mOS for HPC is operational" || echo "mOS for HPC not operational"
mOS for HPC is operational
When mOS is booted and managing resources, an obvious question is what common system tools tell you about the machine state. Their behavior is summarized here.
Command / Tool | Behavior |
---|---|
top, htop | Behaves as expected showing CPU utilization, process placement across CPUs |
/proc/meminfo, free | Shows memory allocated to Linux only (mOS takes the designated system memory away from Linux during boot). The free command displays Linux memory information only (see /proc/meminfo). |
dmesg | The mOS kernel writes information to the syslog, which is a good place to check operational health |
debugging and profiling tools | mOS maintains compatibility with Linux so that tools such as ptrace, strace, and gdb continue to work as expected. In addition, Intel Parallel Studio XE tools such as Intel(R) VTune(TM) Amplifier and Intel(R) Advisor also work as designed |