multiplepart/from-data: AWS API Gateway + Lambda

How to handle binary & string data types using Python

Introduction

This article covers the steps to accept binary data via multipart/form-data with AWS API-Gateway (REST) & Lambda using Python.

There will be authentication, authorisation, and probably a WAF involved in production. Still, in this blog, we will cover the steps directly linked with the handling of binary data type.

multiform.jpg

We will consider a simple form with a text field and file upload option in our example. image.png

Lambda Setup

We will start with Lambda set up and then link this Lambda with the API Gateway during setup. Below is an example code for reference only, as I've to remove additional code to keep it simple and focused on handling multipart/form-data.

from streaming_form_data import StreamingFormDataParser
from streaming_form_data.targets import ValueTarget
import base64
import boto3

# Instantiate a table resource object for table name "app_data"
dynamodb = boto3.resource("dynamodb")
app_table = dynamodb.Table("app_data")

# Instantiate a S3 resource object
s3 = boto3.resource("s3")

def lambda_handler(event, context):
    # Initiate the parser with request header
    parser = StreamingFormDataParser(headers=event['params']['header'])
    # As the form has 2 fields: 1x Text & 1x File. 
    # Here we initiate two ValueTarget to hold values in memory.
    user_full_name = ValueTarget() 
    uploaded_file = ValueTarget()

    # Register function will link the multiplepart/from-data key name 
    # with the value targets created above.
    parser.register("name", user_full_name)
    parser.register("file", uploaded_file)

    #decode event body passed by the API
    mydata = base64.b64decode(event["body"])

    # parse the decoded body based on registers defined above.
    parser.data_received(mydata)

    # covert binary value to UTF-8 format. 
    # Optionally add the "USER#" tag (DynamoDB design requirement).
    data = "USER#" + user_full_name.value.decode("utf-8")

    # For S3 bucket 'mybucket' here we create a folder with value of 
    # 'data' and 'file.p12' inside.
    filename = data + "/file.p12"
    s3_file = s3.Object('mybucket', filename)

    # upload file on S3 bucket
    s3_file.put(Body=uploaded_file.value)

    response = app_table.put_item(
        Item={
        'p_id': data,
    # If usecase conforms with DynamoDB object size limit of 400KB
    # below line allow saving the file in DynamoDB instead of S3.
        'attachment': uploaded_file.value
        }
    )
    return response

The lambda uses the "streaming_form_data " pip package. I've used "Lambda Layer" to include this package. Please refer to my other post on Lambda layers for details on using this feature.

Its important to correctly refer to the header section in your event. This is done in the first line of the lambda function:
parser = StreamingFormDataParser(headers=event['params']['header']) If you are unsure, comment all the lines and print the whole event from Lambda. This will allow checking the actual event dictionary in the Cloudwatch logs. As detailed by the screenshot below, in my case, the header section was located at event[params][header] Screenshot 2021-12-31 at 23.10.04.png

The next set of data mapping is at the parser.register stage:

parser.register("name", user_full_name)
parser.register("file", uploaded_file)

The API key name should match the first argument of the parser.register, and the second argument is the valuetarget() name. Screenshot 2022-01-01 at 00.19.24.png

After this, we follow the standard process to decode and parse the body. This will add the values to the ValueTargets objects we created earlier.

user_full_name.value & uploaded_file.value represents the binary value of the respective form fields. For the text field, we further decode it as "UTF-8", and the file we save it as binary.

The code further provides examples to store the data on S3 or DynamoDB. Please make sure the Lambda function's assigned role has required permissions for S3 & DynamoDB read/write.

API Gateway Setup

We will create an API Gateway with Lambda integration type. Add

1) Create a regional REST API. Screenshot 2021-12-30 at 23.29.31.png

2) Under the "API Gateway" settings: Screenshot 2021-12-31 at 00.04.33.png

Add "multipart/form-data" under Binary Media Types. Screenshot 2021-12-31 at 00.06.22.png

3) Add a "resource" and enable "CORS". Screenshot 2021-12-30 at 23.31.11.png

4) Create a type "Post" method and add the Lambda we created earlier. Screenshot 2022-01-01 at 01.20.46.png

5) Click on the "Integration Request" Screenshot 2022-01-01 at 01.15.27.png

6) Under "HTTP Headers" add two entires as below:
Accept: '*/*'
Content-Type: 'method.request.header.Content-Type'

Screenshot 2021-12-31 at 18.16.38.png

7) Under "Mapping Templates", select "When there are no templates defined (recommended)" Click on "Add mapping template" and add "multipart/form-data".

Screenshot 2021-12-31 at 18.00.09.png

8) Select "multipart/form-data" and under "Generate template", select "Method Request passthrough". In my lambda example, I used the key of "body" instead of "body-json". Line 5 of the template. Screenshot 2021-12-31 at 18.00.09.png

9) Save the changes and deploy the API gateway by clicking on the "Deploy API". Screenshot 2022-01-01 at 01.57.46.png

At this stage, if all the mappings are correct, Lambda should write the data to DynamoDB and S3. The examples are not production ready and should be used as a reference for handling multipart/form-data via AWS serverless architecture.

I hope you've found this article useful. Follow me if you are interested in:

  • Python
  • AWS Architecture & Security.
  • AWS Serverless solutions.