
Feature request: Kinesis Firehose Response Record data class #2440

Closed
@troyswanson

Description


Use case

Constructing response objects for use in Kinesis Firehose transformation functions.

This is a continuation of #1059, which describes both the event object and the response object. That issue was implemented in #1540, but the implementation does not include the response object.

Solution/User Experience

A data class that can be populated while a function executes, and that serializes into a well-formed response to a KinesisFirehoseEvent invocation.

Rough idea

KinesisFirehoseResponse:
   records: list[KinesisFirehoseResponseRecord]
KinesisFirehoseResponseRecord:
   record_id: str
   result: Literal["Ok", "Dropped", "ProcessingFailed"]
   data: bytes
   metadata: KinesisFirehoseResponseRecordMetadata
KinesisFirehoseResponseRecordMetadata:
   partition_keys: dict

Note: ☝🏼 I'm not sure whether this is an exhaustive list of the options that can be returned.
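A minimal sketch of the rough idea above as Python dataclasses. It assumes the wire format uses the camelCase keys from the aws-lambda-go types (recordId, result, data, metadata.partitionKeys) and that data travels as base64-encoded text. The class names come from this issue, not from an existing Powertools API, and "Dropped" is included as a result value alongside "Ok" and "ProcessingFailed".

```python
import base64
from dataclasses import dataclass, field
from typing import Literal, Optional

# Result values a transformation function can report back to Firehose.
Result = Literal["Ok", "Dropped", "ProcessingFailed"]


@dataclass
class KinesisFirehoseResponseRecordMetadata:
    # Used by dynamic partitioning; keys and values are both strings.
    partition_keys: dict[str, str] = field(default_factory=dict)

    def to_dict(self) -> dict:
        return {"partitionKeys": dict(self.partition_keys)}


@dataclass
class KinesisFirehoseResponseRecord:
    record_id: str
    result: Result
    data: bytes
    metadata: Optional[KinesisFirehoseResponseRecordMetadata] = None

    def to_dict(self) -> dict:
        out = {
            "recordId": self.record_id,
            "result": self.result,
            # The payload travels as base64 text inside the JSON response.
            "data": base64.b64encode(self.data).decode("ascii"),
        }
        # Only emit metadata when the caller supplied some.
        if self.metadata is not None:
            out["metadata"] = self.metadata.to_dict()
        return out


@dataclass
class KinesisFirehoseResponse:
    records: list[KinesisFirehoseResponseRecord] = field(default_factory=list)

    def to_dict(self) -> dict:
        return {"records": [record.to_dict() for record in self.records]}
```

A handler would append one record per input record and return response.to_dict(). Keeping data as raw bytes and encoding only in to_dict() means callers never deal with base64 themselves.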

Alternative solutions

Previously, I've used basic dictionaries for this, but it would be nice to have a more structured data class to use.
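For comparison, the plain-dictionary approach looks roughly like the sketch below: a hypothetical lambda_handler that applies the same transformation as the test further down (it adds a len field to JSON records). It assumes the standard Firehose transformation contract, where each input record carries a recordId and base64-encoded data, and each output record returns the same recordId with a result and base64-encoded data. Echoing the original data back for failed records is this sketch's own choice, not something the issue prescribes.

```python
import base64
import json


def lambda_handler(event: dict, context) -> dict:
    """Firehose transformation built from plain dictionaries (no data classes)."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["len"] = len(payload["text"])
            encoded = json.dumps(payload, separators=(",", ":")).encode()
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "Ok",
                    "data": base64.b64encode(encoded).decode("ascii"),
                }
            )
        except (ValueError, KeyError, TypeError):
            # Bad base64/JSON or an unexpected shape: report the record as failed.
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}
```

Every key name and result string is a bare literal here, which is exactly the kind of typo-prone code a structured data class would remove.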

The Go example in the Dynamic Partitioning in Kinesis Data Firehose documentation uses the KinesisFirehoseResponse type from the events package of aws-lambda-go.
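With dynamic partitioning, the delivery stream's S3 prefix reads the metadata.partitionKeys map of each response record. A minimal sketch of building such a record follows. It assumes a JSON payload with a customer_id field (hypothetical) and derives date keys from the input record's approximateArrivalTimestamp, which the test fixture further down shows in epoch milliseconds. The helper name and key names are illustrative only.

```python
import base64
import json
from datetime import datetime, timezone


def to_partitioned_record(record_id: str, payload: dict, arrival_ms: int) -> dict:
    """Build an "Ok" response record that carries dynamic-partitioning keys."""
    arrived = datetime.fromtimestamp(arrival_ms / 1000, tz=timezone.utc)
    return {
        "recordId": record_id,
        "result": "Ok",
        "data": base64.b64encode(json.dumps(payload).encode()).decode("ascii"),
        # The S3 prefix references these keys,
        # e.g. !{partitionKeyFromLambda:customer_id}/!{partitionKeyFromLambda:year}/
        "metadata": {
            "partitionKeys": {
                "customer_id": str(payload["customer_id"]),
                "year": arrived.strftime("%Y"),
                "month": arrived.strftime("%m"),
                "day": arrived.strftime("%d"),
            }
        },
    }
```

Partition key values are converted to strings, mirroring the map[string]string type used for PartitionKeys on the Go side.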

I believe it would be possible to reuse the KinesisFirehoseEvent data class from the utilities.data_classes module, but it seems geared toward the event invocation object rather than the response object.

Acknowledgment

  • This feature request meets Powertools for AWS Lambda (Python) Tenets
    Should this be considered in other Powertools for AWS Lambda languages? i.e. Java, TypeScript, and .NET

Activity


boring-cyborg commented on Jun 12, 2023


Thanks for opening your first issue here! We'll come back to you as soon as we can.
In the meantime, check out the #python channel on our Powertools for AWS Lambda Discord.


troyswanson commented on Jun 13, 2023

Contributor, Author

Been playing with a custom implementation for my project. Thought I would share it here:

(This targets Python 3.9, so the type alias syntax differs slightly from current Python versions.)

myproj/dataclasses/kinesis_firehose.py

from typing import Union, Optional, Callable
from dataclasses import dataclass
from base64 import standard_b64encode

from aws_lambda_powertools.utilities.data_classes import KinesisFirehoseEvent


KinesisFirehoseResponseRecord = Union[
    "KinesisFirehoseResponseRecordOk",
    "KinesisFirehoseResponseRecordDropped",
    "KinesisFirehoseResponseRecordFailed",
]


@dataclass
class KinesisFirehoseEventProcessor:

    event: KinesisFirehoseEvent

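    def process(self, fn: Callable[..., "KinesisFirehoseProcessedRecord"]):
        """Apply `fn` to each record and map the outcome to a response record.

        `fn` returns a KinesisFirehoseProcessedRecord on success, or raises
        KinesisFirehoseRecordProcessingDropped or
        KinesisFirehoseRecordProcessingFailed to drop or fail the record.
        """
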
        response_records: list[KinesisFirehoseResponseRecord] = list()

        for record in self.event.records:
            try:
                processed_record = fn(
                    record=record,
                    invocation_id=self.event.invocation_id,
                    delivery_stream_arn=self.event.delivery_stream_arn,
                    source_kinesis_stream_arn=self.event.source_kinesis_stream_arn,
                    region=self.event.region,
                )
                response_record = KinesisFirehoseResponseRecordOk(
                    record_id=record.record_id,
                    data=processed_record.data,
                    metadata=processed_record.metadata,
                )
            except KinesisFirehoseRecordProcessingDropped:
                response_record = KinesisFirehoseResponseRecordDropped(
                    record_id=record.record_id
                )
            except KinesisFirehoseRecordProcessingFailed:
                response_record = KinesisFirehoseResponseRecordFailed(
                    record_id=record.record_id
                )

            response_records.append(response_record)

        return KinesisFirehoseResponse(records=response_records)


@dataclass
class KinesisFirehoseProcessedRecord:
    data: str
    metadata: Optional["KinesisFirehoseResponseRecordMetadata"] = None


@dataclass
class KinesisFirehoseResponse:
    records: list["KinesisFirehoseResponseRecord"]

    def to_dict(self):
        return {"records": [record.to_dict() for record in self.records]}


@dataclass
class KinesisFirehoseResponseRecordMetadata:
    partition_keys: Optional[dict[str, str]]

    def to_dict(self):

        r = dict()

        if self.partition_keys is not None:
            r["partitionKeys"] = self.partition_keys

        return r


@dataclass
class KinesisFirehoseResponseRecordOk:
    record_id: str
    data: str
    metadata: Optional[KinesisFirehoseResponseRecordMetadata] = None

    @property
    def data_b64encoded(self) -> str:
        # Decode to str: Lambda serializes the response to JSON, which rejects bytes.
        return standard_b64encode(self.data.encode()).decode("ascii")

    def to_dict(self):

        r = {
            "recordId": self.record_id,
            "result": "Ok",
            "data": self.data_b64encoded,
            "metadata": dict(),
        }

        if self.metadata is not None:
            r["metadata"] = self.metadata.to_dict()

        return r


@dataclass
class KinesisFirehoseResponseRecordFailed:
    record_id: str

    def to_dict(self):
        return {"recordId": self.record_id, "result": "ProcessingFailed"}


@dataclass
class KinesisFirehoseResponseRecordDropped:
    record_id: str

    def to_dict(self):
        return {"recordId": self.record_id, "result": "Dropped"}


class KinesisFirehoseRecordProcessingFailed(Exception):
    ...


class KinesisFirehoseRecordProcessingDropped(Exception):
    ...
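One detail worth checking in this implementation is the type of the encoded payload: standard_b64encode returns bytes, and Lambda serializes a Python handler's return value to JSON, which rejects bytes. A quick stdlib check of the difference (the record values are illustrative):

```python
import json
from base64 import standard_b64encode

payload = '{"text":"hello world","len":11}'
encoded = standard_b64encode(payload.encode())  # bytes, not str

# Serializing bytes fails the same way a handler's return value would.
try:
    json.dumps({"data": encoded})
    bytes_ok = True
except TypeError:
    bytes_ok = False

# Decoding the base64 output to ASCII gives a str that serializes cleanly.
as_text = encoded.decode("ascii")
serialized = json.dumps({"recordId": "record1", "result": "Ok", "data": as_text})
```

Base64 output is always plain ASCII, so decoding with "ascii" cannot fail here.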

Example usage (a pytest fixture and test):

tests/conftest.py

import pytest
from aws_lambda_powertools.utilities.data_classes import KinesisFirehoseEvent


@pytest.fixture
def kinesis_firehose_event() -> KinesisFirehoseEvent:
    """
    record1: {"text":"hello world"}
    record2: {"text":"foo bar"}
    """

    return KinesisFirehoseEvent(
        {
            "invocationId": "invoked123",
            "deliveryStreamArn": "aws:lambda:events",
            "region": "us-west-2",
            "records": [
                {
                    "data": "eyJ0ZXh0IjoiaGVsbG8gd29ybGQifQ==",
                    "recordId": "record1",
                    "approximateArrivalTimestamp": 1686589530000,
                    "kinesisRecordMetadata": {
                        "shardId": "shardId-000000000000",
                        "partitionKey": "4d1ad2b9-24f8-4b9d-a088-76e9947c317a",
                        "approximateArrivalTimestamp": "2023-06-12T17:05:30.000Z",
                        "sequenceNumber": "49546986683135544286507457936321625675700192471156785154",  # noqa: E501
                        "subsequenceNumber": "",
                    },
                },
                {
                    "data": "eyJ0ZXh0IjoiZm9vIGJhciJ9",
                    "recordId": "record2",
                    "approximateArrivalTimestamp": 1686589530000,
                    "kinesisRecordMetadata": {
                        "shardId": "shardId-000000000001",
                        "partitionKey": "4d1ad2b9-24f8-4b9d-a088-76e9947c318a",
                        "approximateArrivalTimestamp": "2023-06-12T17:05:30.000Z",
                        "sequenceNumber": "49546986683135544286507457936321625675700192471156785155",  # noqa: E501
                        "subsequenceNumber": "",
                    },
                },
            ],
        }
    )

tests/test_kinesis_firehose.py

from json import dumps

from aws_lambda_powertools.utilities.data_classes.kinesis_firehose_event import (
    KinesisFirehoseEvent,
    KinesisFirehoseRecord,
)

from myproj.dataclasses.kinesis_firehose import (
    KinesisFirehoseEventProcessor,
    KinesisFirehoseProcessedRecord,
    KinesisFirehoseResponseRecordOk,
    KinesisFirehoseResponseRecordFailed,
    KinesisFirehoseRecordProcessingFailed,
)


def test_kinesis_firehose_processor(kinesis_firehose_event: KinesisFirehoseEvent):
    def fn(record: KinesisFirehoseRecord, **kwargs) -> KinesisFirehoseProcessedRecord:
        data = record.data_as_json.copy()
        data["len"] = len(data["text"])
        data_as_json = dumps(data, separators=(",", ":"))
        return KinesisFirehoseProcessedRecord(data=data_as_json)

    processor = KinesisFirehoseEventProcessor(kinesis_firehose_event)
    response = processor.process(fn)

    assert isinstance(response.records[0], KinesisFirehoseResponseRecordOk)
    assert response.records[0].record_id == "record1"
    assert response.records[0].data == '{"text":"hello world","len":11}'
    assert isinstance(response.records[1], KinesisFirehoseResponseRecordOk)
    assert response.records[1].record_id == "record2"
    assert response.records[1].data == '{"text":"foo bar","len":7}'
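The test above only exercises the Ok path. A stripped-down, stdlib-only sketch of the same exception-to-result control flow makes the Dropped and ProcessingFailed branches easy to test in isolation. RecordDropped, RecordFailed, and process_records are hypothetical names standing in for the classes in myproj/dataclasses/kinesis_firehose.py; like that module, the sketch omits data for records that are not Ok.

```python
import base64
from typing import Callable


class RecordDropped(Exception):
    """Raised by a transform to drop a record (hypothetical name)."""


class RecordFailed(Exception):
    """Raised by a transform to mark a record as failed (hypothetical name)."""


def process_records(
    records: list[dict[str, str]], fn: Callable[[bytes], bytes]
) -> dict[str, list]:
    out = []
    for record in records:
        entry = {"recordId": record["recordId"]}
        try:
            transformed = fn(base64.b64decode(record["data"]))
            entry.update(result="Ok", data=base64.b64encode(transformed).decode("ascii"))
        except RecordDropped:
            entry["result"] = "Dropped"
        except RecordFailed:
            entry["result"] = "ProcessingFailed"
        # Any other exception propagates and fails the whole invocation.
        out.append(entry)
    return {"records": out}
```

Using exceptions keeps the transform's happy path free of result bookkeeping, at the cost of making "unexpected error" and "record failed" distinct cases that callers must choose between deliberately.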

rubenfonseca commented on Jun 15, 2023

Contributor

Hi @troyswanson, thank you for opening this! Since the response object can be quite complex, I agree that we would benefit from adding those classes to our data classes.

For reference, here are the Go types (https://github.com/aws/aws-lambda-go/blob/main/events/firehose.go#L28-L49)

I can see that you already have some code too. I would love it if you could submit a PR for this! What do you think?

added the event_sources label (Event Source Data Class utility) and removed the triage label (Pending triage from maintainers) on Jun 15, 2023
self-assigned this on Jun 15, 2023
moved this from Triage to Working on it in Powertools for AWS Lambda (Python) on Jun 19, 2023
moved this from Working on it to Pending customer in Powertools for AWS Lambda (Python) on Jun 20, 2023
moved this from Pending customer to Backlog in Powertools for AWS Lambda (Python) on Jul 10, 2023



Metadata

Type: No type
Project status: Shipped
Milestone: No milestone
Relationships: None yet
Participants: @rubenfonseca, @troyswanson, @heitorlessa, @leandrodamascena, @roger-zhangg
