Configure backoff and retry options for credentials provider authentication refresh #59

Closed
@olileach

Description

We are running a large number of processes on EMR: 10 YARN jobs on a single EC2 instance, each YARN job spawning 8 processes using Java Futures. Our EMR cluster has several EC2 instances; some exhibit authentication problems and some don't. After the EMR jobs have been running for a few hours, we see intermittent authentication failures when the aws-msk-iam-auth library tries to refresh the IAM token in order to continue processing messages from MSK. Here's the error message we receive:


ExtractionConsumer:116 - Extraction Kafka processor has failed: topic=develop_headstate_adjustment  
org.apache.kafka.common.errors.SaslAuthenticationException: An error: (java.security.PrivilegedActionException:   
javax.security.sasl.SaslException: Failed to find AWS IAM Credentials [Caused by com.amazonaws.SdkClientException: 
Unable to load AWS credentials from any provider in the chain [com.amazonaws.auth.AWSCredentialsProviderChain@523d0202: 
Unable to load AWS credentials from any provider in the chain: [EnvironmentVariableCredentialsProvider: 
Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID 
(or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)), 
SystemPropertiesCredentialsProvider: 
Unable to load AWS credentials from Java system properties (aws.accessKeyId and aws.secretKey), 
WebIdentityTokenCredentialsProvider: You must specify a value for roleArn and roleSessionName,
software.amazon.msk.auth.iam.internals.EnhancedProfileCredentialsProvider@1f17510: 
Profile file contained no credentials for profile 'default': ProfileFile(profiles=[]), 
com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper@525bf518: null]]]) 
occurred when evaluating SASL token received from the Kafka Broker. 
Kafka Client will go to AUTHENTICATION_FAILED state.
Caused by: javax.security.sasl.SaslException: Failed to find AWS IAM Credentials

The credentials provider should be using the instance profile attached to the EC2 instance. If you follow the errors above, you can see that the lookup walks the provider chain described at https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html, gets no credentials from any provider, and then fails. The key point is that the issue is intermittent: most of the time, authentication works. However, when no provider in the default chain returns credentials, the YARN job fails and the EMR job fails with it.

I can see where the token refresh callback is:

protected void handleCallback(AWSCredentialsCallback callback) throws IOException {

It would be great to have a config option that lets us set a backoff and retry policy for refreshing the IAM credentials, to handle situations where the metadata service may be throttling requests under particularly high load.

Similar to the backoff for the number of connections to MSK, we would like options to configure the backoff in ms (say 1000 or 2000) and the number of retry attempts.

So if the option is specified, sleep 1 or 2 seconds (or a time based on the provided configuration) and retry up to 3 times?
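
To make the request concrete, here is a rough sketch of the behaviour I have in mind. It is not the library's code: `CredentialRetry`, `loadWithRetry`, `maxAttempts` and `backoffMs` are hypothetical names, and the `Supplier` stands in for whatever call actually loads the credentials. It assumes the failure surfaces as a `RuntimeException` (which is the case for `SdkClientException` in the v1 SDK) and uses a fixed sleep between attempts.

```java
import java.util.function.Supplier;

// Hypothetical helper, for illustration only; not part of aws-msk-iam-auth.
class CredentialRetry {

    /**
     * Calls loader up to maxAttempts times, sleeping backoffMs between
     * failed attempts. Rethrows the last failure if every attempt fails.
     */
    static <T> T loadWithRetry(Supplier<T> loader, int maxAttempts, long backoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return loader.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        // Fixed backoff before the next attempt.
                        Thread.sleep(backoffMs);
                    } catch (InterruptedException ie) {
                        // Preserve the interrupt flag and give up early.
                        Thread.currentThread().interrupt();
                        throw e;
                    }
                }
            }
        }
        throw last;
    }
}
```

With something like this, the credential lookup could be wrapped as `CredentialRetry.loadWithRetry(() -> provider.getCredentials(), 3, 1000L)`, where `provider` is whichever credentials provider the library resolves and the two numbers come from the proposed config options.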

Thanks in advance
