Parallelcluster 2.1.1 with raid 0 config on Cent OS 7 fails in create cluster #823
Comments
Thanks for the detailed bug report. The problem is the attachment point.
We get the device from: https://github.com/aws/aws-parallelcluster-cookbook/blob/develop/files/default/attachVolume.py#L43
It appears that this line is failing to get the devices currently running on CentOS 7.
Yes, /dev/sdb has that 20 GB disk that wasn't expected to be there in the first place.
The block device returned by the parallelcluster-ebsnvme-id script must be in format suitable for udev rules
This fix aws/aws-parallelcluster#823
Signed-off-by: Luca Carrogu <[email protected]>
The block device returned by the parallelcluster-ebsnvme-id script must be in format suitable for udev rules
E.g.
- without -u flag
  parallelcluster-ebsnvme-id -b /dev/nvme0n1 return sda1
  parallelcluster-ebsnvme-id -b /dev/nvme1n1 return /dev/sdb
- with -u flag
  parallelcluster-ebsnvme-id -u -b /dev/nvme0n1 return sda1
  parallelcluster-ebsnvme-id -u -b /dev/nvme1n1 return sdb
This fix aws/aws-parallelcluster#823
Signed-off-by: Luca Carrogu <[email protected]>
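The difference the commit message describes comes down to whether the returned name keeps its /dev/ prefix. The sketch below only illustrates that normalization. It is not the code of parallelcluster-ebsnvme-id, and the function name `to_udev_name` and its `udev` parameter are hypothetical names chosen for this example.

```python
def to_udev_name(device: str, udev: bool = False) -> str:
    """Hypothetical helper: mimic the effect of the -u flag described above.

    With udev=True, a leading "/dev/" is stripped so the name can be used
    in a udev rule; names without the prefix are returned unchanged.
    """
    prefix = "/dev/"
    if udev and device.startswith(prefix):
        return device[len(prefix):]
    return device


# Mirrors the examples from the commit message (illustrative only).
print(to_udev_name("sda1"))                 # no flag, no prefix: unchanged
print(to_udev_name("/dev/sdb"))             # no flag: prefix kept
print(to_udev_name("/dev/sdb", udev=True))  # with flag: prefix stripped
```

The point of the fix is that both volumes end up with names in the same form, so a consumer of the output does not have to handle both cases.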
Hi, This bug only affects NVMe-based instances (c5 and m5), so as a temporary workaround you can use a non-NVMe instance such as m4. We've patched the issue in aws/aws-parallelcluster-cookbook#253, which will be part of the next ParallelCluster release. Thanks for the bug report!
I changed master_instance_type to m4.large but got similar failure for MasterServer:
Please re-create the cluster with the I was able to launch an m4 based Raid 0 cluster using parallelcluster 2.1.1 with no issues. |
I attached cfn-init.log I see two problems:
Hi,
Thanks for the clarification on item 1. For item 2, the cluster creation completes successfully with the m4.large master instance type and without the encrypted and ebs_kms_key_id options. I double-checked that my username can access the KMS key in the console, but I'm not sure why cfn-init.log has this error:
The issue is that the KMS key doesn't have the IAM permissions needed for it to be retrieved on the master. Since it needs these IAM permissions for cluster creation (and it needs the name of the role), you need to use a custom
We've added a tutorial to the docs explaining how to do this in better detail: https://aws-parallelcluster.readthedocs.io/en/develop/tutorials/04_encrypted_ebs.html (I know it's confusing)
I hit exactly the same issue when attaching two EBS volumes. I think aws/aws-parallelcluster-cookbook#253 will fix the problem. I'm just posting it here for the record.
pcluster version: 2.1.1
Major error message:
Configuration file:
No error with only one EBS volume. No error when using |
Yes, same error. Please use m4/c4 instances until the next release of ParallelCluster.
@sean-smith I followed the steps to create the ParallelClusterInstancePolicy and ParallelClusterInstanceRole, added the role to the key users, and can confirm that the RAID disks are encrypted and attached to the master. However, the other two disks (the 15 GB master OS disk and the 20 GB shared disk) were unencrypted. I guess this is a different issue that is not related to the RAID configuration. Feel free to close this issue.
@ahmedelz They can be encrypted with https://aws-parallelcluster.readthedocs.io/en/latest/configuration.html#encrypted
You'll need an ebs section, even if you're not using EBS, to encrypt that 20 GB drive. For example:
On the docs page, if you click on |
@sean-smith The ebs section did indeed help encrypt the 20 GB drive. Is there any way to encrypt the master OS disk?
@ahmedelz At the moment there's no way to do so. We'll update this thread should we add that functionality in the future.
enrico-usai/aws-parallelcluster-cookbook@15e2f84
@enrico-usai I'm not sure I understand why this was closed. The referenced check-in addresses the first bug mentioned above, but @sean-smith mentioned that there is no way to encrypt the master OS disk and added the enhancement tag.
Hi @ahmedelz, BTW I think we can keep this issue closed since it is related to the RAID configuration.
Environment:
Bug description and how to reproduce:
Deploying a ParallelCluster 2.1.1 cluster with a RAID 0 configuration fails with this error.
I thought the failure could be because I'm using encrypted EBS volumes with a custom KMS key, but after I commented out both the encrypted and ebs_kms_key_id settings I still got the same failure.
Additional context:
When I created the cluster with the --norollback option, I could see that the master has a 20 GB disk mounted and exported under /shared. I also noticed that the two disks for the RAID 0 configuration are not attached to the master.
Attachments:
cfn-init.log
cloud-init.log