Description
Key information
- RFC PR: feat: add parameter utility #96
- Related issue(s), if known:
- Area: Utilities
- Meet tenets: Yes
Summary
Add a utility to facilitate retrieval of parameters from the SSM Parameter Store or Secrets Manager, with support for caching parameter values so that the value is not retrieved at every execution.
Motivation
Many Lambda users use either the Parameter Store or Secrets Manager to store things such as feature flags, third-party API keys, etc. In a serverless context, a simple approach is to either fetch at every invocation (which might be costly or run into API limits) or fetch once at initialisation (meaning no control over expiration and refresh). This utility would simplify that experience for customers.
Proposal
This utility should provide a very simple interface for customers who don't want to deep-dive into how it works. To prevent hitting the throughput limit of the Parameter Store, it should have a default cache duration in the single-digit seconds (e.g. 5).
Basic usage
```python
# For SSM Parameter
from aws_lambda_powertools.utilities import get_parameter

# For Secrets Manager
from aws_lambda_powertools.utilities import get_secret


def handler(event, context):
    param = get_parameter("my-parameter")
    secret = get_secret("my-secret")
```
Changing the default cache duration
```python
from aws_lambda_powertools.utilities import get_parameter


def handler(event, context):
    # Only refresh after 300 seconds
    param = get_parameter("my-parameter", max_age=300)
```
Converting from a specific format
```python
from aws_lambda_powertools.utilities import get_parameter


def handler(event, context):
    # Transform into a dict from a JSON string
    param = get_parameter("my-parameter", format="json")

    # Transform into bytes from base64
    param = get_parameter("my-parameter", format="binary")
```
Retrieve multiple parameters from a path
```python
from aws_lambda_powertools.utilities import get_parameters


def handler(event, context):
    params = get_parameters("/param/path")
    # Access the item using a param.name notation
    print(params.Subparam)

    # Other modifiers are supported
    params = get_parameters("/param/path", format="json")
    print(params.Subparam["key"])

    # Supports recursive fetching
    params = get_parameters("/param/path", recursive=True)
    print(params.Subparam.Subsubparam)
```
Drawbacks
- This would add a dependency on boto3. Many functions probably use it in some form, but Powertools doesn't require it directly at the moment. However, botocore is already pulled in transitively through the X-Ray SDK.
- Many problems around parameters can be solved using environment variables, so the usefulness is limited to cases where the value could change on short notice.
Rationale and alternatives
- What other designs have been considered? Why not them? Replicating the ssm-cache-python feature set; however, that might be too feature-rich for this use case.
- What is the impact of not doing this? Users who want to retrieve dynamic parameters will have to think about the expiration logic themselves if they don't want to risk getting throttled at scale.
Unresolved questions
Optional stash area for topics that need further development, e.g. TBD.
heitorlessa commented on Jul 27, 2020

nmoutschen commented on Jul 28, 2020
I lack data here to know how frequently this is used.
In the context of an API key parameter or feature flags, this could be done through a single parameter. I'm also thinking that only supporting a single parameter would, by design, encourage using a single parameter because of the Parameter Store API limits.
The default throughput limit is 40 RPS, which means a maximum concurrency of 200 assuming a timeout of 5 seconds. Users could encounter unexpected behaviour or a higher number of API calls because of how GetParametersByPath works: "If the service reaches an internal limit while processing the results, it stops the operation and returns the matching values up to that point and a NextToken."
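As a quick sanity check of that math (assuming each warm execution environment makes at most one API call per cache window):

```python
# Worked version of the concurrency estimate above; the 40 RPS default limit
# and the 5-second window come from the comment, the rest is arithmetic.
api_limit_rps = 40        # default Parameter Store throughput limit
cache_window_seconds = 5  # assumed cache TTL / timeout
calls_per_second_per_execution = 1 / cache_window_seconds
max_concurrency = api_limit_rps / calls_per_second_per_execution
print(max_concurrency)  # 200.0 concurrent executions before throttling
```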
How would developers access the parameter value within the function? I'm not sure how to design this so it looks convenient from a developer perspective.
heitorlessa commented on Jul 28, 2020

nmoutschen commented on Jul 28, 2020
I agree on the shared state; there's potential usefulness there beyond the Parameter Store.
heitorlessa commented on Jul 28, 2020
cc @bahrmichael @Nr18 @keithrozario @michaelbrewer - Would love to get your inputs on this as we scope this new utility
heitorlessa commented on Jul 28, 2020
As we will depend on boto3 and possibly other libraries as we grow our utilities for Powertools, it's worth thinking about whether customers would be fine with having another PyPI package just for utilities.
e.g.
UPDATE: An alternative could be bringing a few kilobytes into Powertools with new utilities, but in order to use some of them there would be a conscious, explicit decision to bring in extra dependencies, like:

```bash
pip install aws-lambda-powertools[boto]
```
UPDATE 2: Well, the X-Ray SDK brings botocore (47M) in anyway, so if we were to suggest `boto3` as an extra dependency for certain utilities, that will add 1M extra, which is negligible in comparison. Creating a separate lib to account for the extra 1M isn't worth the operational cost as a whole: docs, changelog, another package, etc. This means we can do:
Nr18 commented on Jul 29, 2020
@heitorlessa in the project that I am currently working on, we are investigating if we could use `aws_lambda_powertools`, and seeing this makes me happy because we wrote a small method that looks like the `get_secret` utility. So let me give my 2 cents on the choices we made on that as input for this. We have the following:
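A minimal sketch of such a helper, assuming the JSON-only `SecretString` handling described below:

```python
import json

import boto3

_client = boto3.client("secretsmanager")


def get_secret(secret_name: str) -> dict:
    """Fetch a secret and parse its SecretString as JSON."""
    response = _client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])
```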
It's not that fancy; I believe it's largely the sample code that Secrets Manager supplies, but the reason for us to put it in a separate file and method was so that we could use it like this:
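Something along these lines (the module name is hypothetical):

```python
from secrets_helper import get_secret  # hypothetical module name


def handler(event, context):
    # In tests, patching get_secret avoids moto/stubber entirely
    credentials = get_secret("my-app/database")  # returns a dict
    # ...use credentials["username"] / credentials["password"]...
```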
Previously we used `client.get_secret_value(SecretId=secret_name)`, but that requires you to use moto (which is slow) or the stubber (which makes your tests really complex, especially when you have multiple clients for different AWS services). We only implemented the `SecretString` value; in our use case it will always be a JSON payload, but that does not have to be the case, so you might want to consider doing something like:
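A sketch of what handling both value types could look like (following the AWS sample code's convention that `SecretBinary` is base64-encoded):

```python
import base64
import json

import boto3

_client = boto3.client("secretsmanager")


def get_secret(secret_name: str):
    """Return a dict for JSON string secrets, raw bytes for binary ones."""
    response = _client.get_secret_value(SecretId=secret_name)
    if "SecretString" in response:
        return json.loads(response["SecretString"])
    # Binary secrets come back base64-encoded
    return base64.b64decode(response["SecretBinary"])
```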
Because you typically will have one Lambda function doing one thing, I don't necessarily see the need to be able to fetch multiple secrets, but there could be a use case where a Lambda is invoked and needs credentials to read something from one API and then a different set of credentials to write to another API. (I think this is typically needed when you use a scheduled event rule to keep something in sync, for example.)

For the Parameter Store, however, I do see the need to fetch all values recursively. Let's say you would have a parameter store structure with entries under `/Services/<ApplicationId>/...`. So when you run `get_parameters("/Services", recursive=True)`, what would you expect back? A list of dicts for each ApplicationId? From a usage perspective it would be nice to be able to do something like:
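A hedged sketch of what that usage could look like (service and parameter names are hypothetical):

```python
from aws_lambda_powertools.utilities import get_parameters


def handler(event, context):
    # Hypothetical hierarchy: /Services/OrderService/Endpoint,
    # /Services/OrderService/ApiKey, /Services/BillingService/Endpoint, ...
    services = get_parameters("/Services", recursive=True)
    endpoint = services.OrderService.Endpoint
    api_key = services.OrderService.ApiKey
```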
nmoutschen commented on Jul 29, 2020
Hey @Nr18! Thanks a lot for the (very detailed) feedback here. 😄
On the return type, I have questions about what happens if someone is doing something like this:
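For instance (a hypothetical repeated-retrieval pattern):

```python
from aws_lambda_powertools.utilities import get_parameter


def handler(event, context):
    # Called on every invocation: should the parsed dict be cached,
    # or only the raw string (re-running json.loads each time)?
    config = get_parameter("my-parameter", format="json")
    return config["feature_flag"]  # key name is illustrative
```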
In this case, should we cache the output from `json.loads()` or should we re-compute it each time? Since we're already caching, and people will call this over and over, there might be some benefit to caching the results. We could also add an argument to `get_parameter` to specify the type, e.g.:
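A possible shape for that argument (hypothetical, not a settled API):

```python
from aws_lambda_powertools.utilities import get_parameter

# Hypothetical `return_type` argument: the provider parses once, caches the
# parsed value, and validates the type for the caller
config = get_parameter("my-parameter", format="json", return_type=dict)
```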
Agree on the `get_parameters`. However, with the properties (`params.Name`), how would it look for nested values? Here's how I'm thinking about it:
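One possible shape for nested values, mirroring the proposal's examples (a sketch, not a settled API):

```python
from aws_lambda_powertools.utilities import get_parameters

params = get_parameters("/param/path", recursive=True)
# Nested parameter paths map to nested attribute access
print(params.Subparam.Subsubparam)

params = get_parameters("/param/path", format="json")
# The JSON transform applies per parameter; values behave as dicts
print(params.Subparam["key"])
```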
I quite like using `params.Name` instead of `params["Name"]`, as we could use the `format="json"` from before with `get_parameters` too.

nmoutschen commented on Jul 29, 2020
Another question popped up in the PR (ping @jplock): should we decrypt data or not? I'm tempted to say yes for ease of use, but I'm worried about the security implications of having something that decrypts data without explicit action.
E.g. developers could forget that this is sensitive information and accidentally leak it because they didn't explicitly decrypt it themselves.
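For instance, an explicit opt-in could look like this (the `decrypt` flag is hypothetical):

```python
from aws_lambda_powertools.utilities import get_parameter

# SecureString values stay encrypted unless the caller explicitly opts in
param = get_parameter("my-encrypted-parameter", decrypt=True)
```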
keithrozario commented on Jul 29, 2020
I did something very similar to this a few months back: https://github.com/keithrozario/lambda-cache. It looks something like this:
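A hedged reconstruction of that pattern; the decorator name and signature here are illustrative, not the actual lambda-cache API:

```python
import time
from functools import wraps

import boto3

_ssm = boto3.client("ssm")
_cache = {}  # parameter name -> (fetched_at, value)


def inject_parameter(name, entry_name, max_age_in_seconds=60):
    """Fetch an SSM parameter, cache it across warm invocations, and
    attach it to the context object under `entry_name`."""
    def decorator(func):
        @wraps(func)
        def wrapper(event, context):
            entry = _cache.get(name)
            if entry is None or time.monotonic() - entry[0] > max_age_in_seconds:
                value = _ssm.get_parameter(Name=name)["Parameter"]["Value"]
                _cache[name] = (time.monotonic(), value)
            # Inject into context: unlike event, its shape is trigger-agnostic
            setattr(context, entry_name, _cache[name][1])
            return func(event, context)
        return wrapper
    return decorator


@inject_parameter("/prod/api-key", entry_name="api_key", max_age_in_seconds=300)
def handler(event, context):
    api_key = context.api_key  # injected and cached across warm starts
```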
I used a decorator with parameters to inject the parameter/secret into the context (because unlike event, context is the same regardless of what triggers the function). But I'm not sure it's 100% the best way to do this. It does allow us to do interesting things though, like decorator stacking and even caching multiple parameters.
My thoughts:
Nr18 commented on Jul 29, 2020
So, not sure if it should be part of this or if it's more a project-wide discussion, but since Python is moving towards a more type-hinted language with every release, it would help to provide typing that helps the developer, both in the actual functions and, maybe more importantly, when writing unit tests.

If a decorator changes the context object, it would be great if your IDE helps you figure out what it does (no expert on this area), instead of having to read through samples that are potentially outdated, if they exist at all.

Love the caching idea of a secret/parameter btw; it saves a few calls and would reduce invocation times.
nmoutschen commented on Jul 29, 2020
I'm not a huge fan of using the Lambda context for this. This has a few issues:
On making a generic caching system out of this, that could be a good idea! That'd drastically increase the scope and potentially help way more use cases. I just have a few concerns about expanding the scope too much. People could use DynamoDB or a relational DB to store parameters and want to retrieve them. However, when thinking about S3, some people might want to pull 100s of MBs and put that into /tmp, which I feel is out of scope for this.
We could make a generic parameter retrieval system, accepting parameter store providers. This way, people could make their own parameter store providers if they want. Then we could provide ones for common cases (e.g. SSM Parameter Store, Secrets Manager, DynamoDB, etc.). I'd keep it to things that we can pull from boto3, though, to not add dependencies on Postgres/MySQL/etc. libraries.
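A rough sketch of that pluggable design; only the overall shape is implied by the discussion, all names and the DynamoDB key schema are illustrative:

```python
import time
from abc import ABC, abstractmethod


class BaseProvider(ABC):
    """Caches values and delegates retrieval to a concrete provider."""

    def __init__(self, max_age: int = 5):
        self.max_age = max_age
        self._cache = {}  # name -> (fetched_at, value)

    def get(self, name: str):
        entry = self._cache.get(name)
        if entry is None or time.monotonic() - entry[0] > self.max_age:
            self._cache[name] = (time.monotonic(), self._get(name))
        return self._cache[name][1]

    @abstractmethod
    def _get(self, name: str):
        """Retrieve the raw value from the underlying store."""


class DynamoDBProvider(BaseProvider):
    """Example of a user-supplied provider; the key schema is assumed."""

    def __init__(self, table_name: str, **kwargs):
        super().__init__(**kwargs)
        import boto3
        self._table = boto3.resource("dynamodb").Table(table_name)

    def _get(self, name: str):
        return self._table.get_item(Key={"id": name})["Item"]["value"]
```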
By the way, on boto3, we are already pulling botocore indirectly through the X-Ray SDK. boto3 doesn't add much there compared to botocore so I think it's fine to have a direct dependency.
Nr18 commented on Jul 30, 2020
@nmoutschen I thought about it and realized that we typically use SQS in front of the Lambda function, and we have some plumbing (which might be a good candidate to be done by the Powertools 💪, so if you agree I can try to write an RFC for that) that looks something like this:
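A hedged sketch of that plumbing; `process_messages` and `sqs_callback` are the names from this comment, their signatures are assumed:

```python
import json


def process_messages(records, callback):
    """Shared plumbing: parse each record, run the business-logic callback,
    and delete handled messages from the queue (deletion elided here)."""
    for record in records:
        callback(json.loads(record["body"]))
        # ...on success, delete the message via sqs.delete_message(...)...


def sqs_callback(message: dict):
    """Actual business logic; the only part each function needs to test."""
    print(message)


def handler(event, context):
    process_messages(event["Records"], sqs_callback)
```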
So the reason why we built this is that we had a lot of duplicate code for handling the messages and making sure they get deleted, and that makes the tests of the function somewhat complex and harder to maintain. With ☝️ you typically test that, if the handler is called, the records are passed to the `sqs.process_messages` method, and you only need to test the `sqs_callback` method, which contains your actual business logic.

That solution would not work if the secrets came in through the `context`. And since you might get 10 messages, you also need caching, at least within the same invocation; but when you get 10 messages at the same time from a queue, you probably have a lot of invocations, so caching across warm starts would make sense.

I like the idea of the generic parameter retrieval system, as long as the Parameter Store and Secrets Manager are included so you don't need to write your own.
Regarding boto3 being included: I have not seen a clear use case where I do not include it. So Powertools might not use it, but the business logic in the function typically does, so I am definitely fine with pulling it in by default, especially when you consider botocore (47M) vs boto3 (1M). (We always include boto3 in a Lambda layer anyway.)
nmoutschen commented on Jul 30, 2020
@Nr18 By the way, for SQS, there's already an RFC open but your input would be much appreciated there! 😄 #92
nmoutschen commented on Jul 30, 2020
Following the discussion here, I've done a few things in the PR:

- Added a base provider (`aws_lambda_powertools.utilities.BaseProvider`) and classes for common use-cases (Parameter Store, Secrets Manager, DynamoDB).
- Added helper functions (`get_parameter` and `get_parameters` for the Parameter Store, and `get_secret` for Secrets Manager) for the common cases.

The implementation for a specific provider is fairly straightforward, and much of the caching/transformation logic is handled by the `BaseProvider`. For example, for the SSM Parameter Store (`SSMProvider().get()`):
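A hedged sketch of what that could look like, reusing the `BaseProvider` shape sketched in the earlier comment; the method split follows the description above, the body is illustrative:

```python
import boto3


class SSMProvider(BaseProvider):  # BaseProvider as sketched earlier
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._client = boto3.client("ssm")

    def _get(self, name: str) -> str:
        # Only the raw retrieval lives here; caching and transforms
        # (json/binary) are handled by BaseProvider.get()
        return self._client.get_parameter(Name=name)["Parameter"]["Value"]
```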
alexcasalboni commented on Aug 18, 2020

@nmoutschen great discussion and the PR looks great!
Have you considered adding an "automatic retry" decorator as well? I implemented that here: https://github.com/alexcasalboni/ssm-cache-python/blob/master/ssm_cache/cache.py#L143
The idea was to simplify the invalidation of parameters that might change at run-time without forcing a short cache TTL, based on a specific exception/error. So your cache is valid forever, until you get the expected error.
Something like this:
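A hedged sketch of the idea; the names are illustrative, not the actual ssm-cache-python API:

```python
from functools import wraps

import boto3

_ssm = boto3.client("ssm")


class CachedParameter:
    """Minimal cached parameter with manual invalidation (illustrative)."""

    def __init__(self, name: str):
        self.name = name
        self._value = None

    @property
    def value(self) -> str:
        if self._value is None:  # cache is valid forever until refreshed
            self._value = _ssm.get_parameter(Name=self.name)["Parameter"]["Value"]
        return self._value

    def refresh(self) -> None:
        self._value = None  # next access re-fetches


def refresh_on_error(param: CachedParameter, error_class: type):
    """On `error_class`, invalidate the cached parameter and retry once,
    instead of relying on a short TTL."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except error_class:
                param.refresh()
                return func(*args, **kwargs)
        return wrapper
    return decorator
```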
I know that for this specific use case we'd rather recommend using AWS Secrets Manager to handle db host/password & automatic rotation, but there are other cases where you want to be able to automatically re-fetch a parameter and retry (instead of a lot of try-catch and manual invalidation).
nmoutschen commented on Aug 21, 2020
Closing as we released this feature in 1.3.0.