How To Set Up a Firehose Delivery Stream in Localstack

AWS provides tons of services that reduce development and maintenance hassle. If you need to stream and store (and maybe analyze too!) lots of data in real time, you can use the Kinesis services. Of course, if you want to build your own pipeline, no one is stopping you… but as the saying goes:

If I have seen further than others, it is by standing upon the shoulders of giants

In my case, I just needed to put data into Kinesis Data Firehose, which periodically dumps the buffered data into an S3 bucket once a threshold is reached. A pretty basic need. But once I got my hands dirty I ran into some difficulties and couldn’t find a complete guide for my requirement, so I decided to keep this as a reference for my future work.

If you work with AWS services, you have probably heard of localstack. It is an amazing piece of software that lets you mock most of the AWS infrastructure on your own machine. Once you get a good grip on it, you can save quite a few dimes, because during development you won’t need to deploy any services on AWS. Trying localstack will also teach you about docker and awscli, if you haven’t had a chance to work with them in practice yet.

OK, so the idea is this: we’ll create our own AWS infrastructure using localstack and docker, create an S3 bucket on it, create a Firehose delivery stream, and wire them together. Finally, we’ll look at some sample code that puts data into Kinesis Firehose and check whether it gets dumped into S3.

Set up localstack and awscli

I assume that you already have docker installed on your machine. Now go to your project directory and create a docker-compose.yml file with the following content:

version: "3"

services:
  localstack:
    image: localstack/localstack
    container_name: localstack-firehose-s3
    restart: always
    ports:
      - "4567-4597:4567-4597"
      - "8010:8010"
    environment:
      - SERVICES=s3,firehose,iam
      - DEBUG=1
      - DATA_DIR=/tmp/localstack/data
      - PORT_WEB_UI=8010
      - LAMBDA_EXECUTOR=docker
      - KINESIS_ERROR_PROBABILITY=0.01
      - DOCKER_HOST=unix:///var/run/docker.sock
      - DEFAULT_REGION=ap-southeast-1

You can consult the localstack documentation to fully understand this file, but for now just focus on this line: - SERVICES=s3,firehose,iam

Here I’ve listed the services I need to run. You can add other services such as lambda, sqs, or dynamodb too!

Now close the file and run the following command in your terminal to fire up the infrastructure:

docker-compose up 

Make sure your terminal is in the same directory as the docker-compose.yml file. If you are running this for the first time, it might take a few minutes to download the localstack image, so be patient. Once it has finished, keep the terminal running.
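Before moving on, it can help to confirm that the container is actually listening. The sketch below uses only the Python standard library and assumes the ports used later in this article (4572 for s3, 4573 for firehose, 4593 for iam). The helper names are made up for illustration.

```python
import socket

# Ports as used in this article; other localstack versions may differ.
LOCALSTACK_PORTS = {"s3": 4572, "firehose": 4573, "iam": 4593}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolved hosts.
        return False


def report(host: str = "localhost") -> dict:
    """Map each service name to whether its port currently accepts connections."""
    return {name: is_port_open(host, port) for name, port in LOCALSTACK_PORTS.items()}
```

Calling report() in a Python shell returns a dictionary of service names and booleans. An open port only shows that something is listening there, not that the service is fully initialized.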

Now we are going to set up S3 and Firehose. For that you will first need to set up awscli. Installation differs from platform to platform, but it isn’t hard. After installation, create a .aws directory inside your home folder (on Linux and macOS). Inside it, create a file named credentials and put the following contents in it:

[default]
aws_access_key_id = 123456
aws_secret_access_key = 123456

If you want to use the CLI against a real AWS environment, you have to put real credentials there, but for localstack anything will do.

Create an S3 bucket

Localstack exposes each service on its own port: for S3 it is 4572, and for Firehose it is 4573. When we run a CLI command for a service, we need to specify the service URL with the --endpoint-url parameter. For example, to create a bucket:

aws --endpoint-url=http://localhost:4572 s3 mb s3://s3-firehose --region ap-southeast-1

Here s3 is the service name and mb means make bucket; the other terms are self-explanatory. Now we need to make the bucket public so that we can test this easily.

aws --endpoint-url=http://localhost:4572 s3api put-bucket-acl --bucket s3-firehose --acl public-read

Now if you go to localhost:4572 you will see an XML document, and hopefully your bucket will be listed there too!
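The same endpoint convention applies when talking to localstack from Python instead of the CLI. The sketch below assumes boto3 (the AWS SDK for Python, installable with pip install boto3), the ports mentioned in this article, and the dummy credentials from the credentials file above. endpoint_url and localstack_client are hypothetical helper names, not part of boto3.

```python
# Ports as listed in this article for the localstack version used here.
SERVICE_PORTS = {"s3": 4572, "firehose": 4573, "iam": 4593}


def endpoint_url(service: str, host: str = "localhost") -> str:
    """Build the localstack URL for a service, e.g. http://localhost:4572 for s3."""
    try:
        port = SERVICE_PORTS[service]
    except KeyError:
        raise ValueError(f"no port known for service {service!r}") from None
    return f"http://{host}:{port}"


def localstack_client(service: str, region: str = "ap-southeast-1"):
    """Return a boto3 client pointed at localstack instead of real AWS."""
    import boto3  # imported lazily so endpoint_url() works without boto3 installed

    return boto3.client(
        service,
        endpoint_url=endpoint_url(service),
        region_name=region,
        aws_access_key_id="123456",      # dummy values, as in the credentials file
        aws_secret_access_key="123456",
    )
```

With this in place, localstack_client("s3").list_buckets() should list the s3-firehose bucket created above, provided the container is running.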

Create an IAM role

To create a role, you need a policy. For now we will use a policy that allows a role to do anything on any AWS service. Create a file called iam_policy.json and put this content in it:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1572416334166",
      "Action": "*",
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

Now, to create the role using this policy, issue the following command in your terminal. Make sure the policy file exists in the directory you are running the command from.

aws --endpoint-url=http://localhost:4593 iam create-role --role-name super-role --assume-role-policy-document file://$PWD/iam_policy.json

Here iam is the service name and create-role is the command that creates the role. You can use any name you like instead of super-role. And yes, inside localstack the IAM service endpoint is localhost:4593.

If you run this correctly you will get an output like this:

{
  "Role": {
    "Path": "/",
    "RoleName": "super-role",
    "RoleId": "519cuuk49exij5eims36",
    "Arn": "arn:aws:iam::000000000000:role/super-role",
    "CreateDate": "2019-11-08T17:19:55.050Z",
    "AssumeRolePolicyDocument": {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "Stmt1572416334166",
          "Action": "*",
          "Effect": "Allow",
          "Resource": "*"
        }
      ]
    },
    "Description": "None"
  }
}

From this output, note down the Arn value, because it will be required shortly.

Create the Firehose delivery stream

Firehose is a sophisticated service with many configuration options for different use cases. But for today we will keep it simple. Just create a file named firehose_skeleton.json and write this in it:

{
  "DeliveryStreamName": "s3-stream",
  "DeliveryStreamType": "DirectPut",
  "S3DestinationConfiguration": {
    "RoleARN": "arn:aws:iam::000000000000:role/super-role",
    "BucketARN": "arn:aws:s3:::s3-firehose",
    "Prefix": "test-log",
    "ErrorOutputPrefix": "test-error-log",
    "BufferingHints": {
      "SizeInMBs": 1,
      "IntervalInSeconds": 60
    },
    "CompressionFormat": "UNCOMPRESSED",
    "CloudWatchLoggingOptions": {
      "Enabled": false,
      "LogGroupName": "",
      "LogStreamName": ""
    }
  },
  "Tags": [
    {
      "Key": "tagKey",
      "Value": "tagValue"
    }
  ]
}

Put the role ARN that you noted earlier in the RoleARN parameter of this file. If your bucket name is different, use it instead of s3-firehose at the end of the BucketARN value. Prefix means that the files generated by Firehose will have this prefix. Firehose will create a new file after 60 seconds, or when the data in its buffer grows larger than 1 megabyte; both thresholds are specified with BufferingHints. Now let’s create the stream:

aws --endpoint-url=http://localhost:4573 firehose create-delivery-stream --cli-input-json file://$PWD/firehose_skeleton.json

If you run this correctly you will get an output like this:

{
"DeliveryStreamARN": "arn:aws:firehose:ap-southeast-1:000000000000:deliverystream/s3-stream"
}
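To make the BufferingHints behaviour concrete, here is a small sketch of the flush rule described above: write a file when either the size threshold or the time threshold is reached. It is a simplified model for illustration, not how Firehose or localstack implement buffering. Treating 1 MB as 1024 * 1024 bytes is an assumption, and the names BufferingHints and should_flush are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferingHints:
    # Defaults mirror the values in firehose_skeleton.json.
    size_in_mbs: int = 1
    interval_in_seconds: int = 60


def should_flush(
    buffered_bytes: int,
    seconds_since_flush: float,
    hints: BufferingHints = BufferingHints(),
) -> bool:
    """Return True when either buffering threshold has been reached."""
    size_limit = hints.size_in_mbs * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
    too_big = buffered_bytes >= size_limit
    too_old = seconds_since_flush >= hints.interval_in_seconds
    return too_big or too_old
```

For a handful of small test records, the size threshold is never reached, so in this model the file shows up only once the 60 second interval has passed.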

Now let’s try to put some data in it!

Put a record into Firehose

For this I will use Python. First install boto3, which is the AWS SDK for Python.

pip install boto3

Now open a Python shell and run the commands one by one, or run the whole thing as a script:
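A minimal sketch of such a script, assuming the s3-stream delivery stream and the Firehose endpoint on port 4573 from the steps above. The helper names encode_record and put, and the sample data dictionary shown in the note below, are illustrative and not part of boto3.

```python
import json

STREAM_NAME = "s3-stream"  # DeliveryStreamName from firehose_skeleton.json
FIREHOSE_ENDPOINT = "http://localhost:4573"  # localstack firehose port used in this article
REGION = "ap-southeast-1"


def encode_record(data: dict) -> bytes:
    """Serialize a dictionary to the JSON bytes that Firehose will store."""
    return json.dumps(data).encode("utf-8")


def put(data: dict) -> dict:
    """Send one record to the delivery stream and return the service response."""
    import boto3  # imported here so encode_record() can be used without boto3

    client = boto3.client(
        "firehose",
        endpoint_url=FIREHOSE_ENDPOINT,
        region_name=REGION,
    )
    return client.put_record(
        DeliveryStreamName=STREAM_NAME,
        Record={"Data": encode_record(data)},
    )
```

For example, put({"id": 1, "message": "hello firehose"}) sends a single sample record. The credentials are picked up from the ~/.aws/credentials file created earlier.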

Here DeliveryStreamName is the name of our stream, as specified in the firehose_skeleton.json file. Let’s see if the data has been dumped into S3.

Open your browser and go to http://localhost:4572/s3-firehose, which is the bucket URL. There you will find something like this:

<Contents>
  <Key>test-log/f4b96380-000e-4442-aec0-c4f271d00c1d</Key>
  <LastModified>2019-11-08T18:05:47.324Z</LastModified>
  <ETag>"2dfdb4d8ce3768f4fb73e7991c59bbf8"</ETag>
  <Size>52</Size>
  <StorageClass>STANDARD</StorageClass>
  <Owner>
    <ID>75aa57f09aa0c8caeab4f8c24e99d10f8e7faeebf76c078efc7c6caea54ba06a</ID>
    <DisplayName>webfile</DisplayName>
  </Owner>
</Contents>

Look at the Key element here: it is the key of the S3 object created by Firehose. To view the object, just append the key to the bucket URL like this:

http://localhost:4572/s3-firehose/test-log/f4b96380-000e-4442-aec0-c4f271d00c1d

And the content will be the JSON representation of the data dictionary that we passed to put_record.
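The same check can be done without a browser. The sketch below uses only the standard library and assumes the public bucket URL from this article and an XML listing shaped like the one shown above. extract_keys and fetch_objects are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote
from urllib.request import urlopen

BUCKET_URL = "http://localhost:4572/s3-firehose"  # bucket URL used in this article


def extract_keys(listing_xml: str) -> list:
    """Return the text of every <Key> element in a bucket listing."""
    root = ET.fromstring(listing_xml)
    keys = []
    for element in root.iter():
        # Compare the local tag name so an XML namespace, if present, is ignored.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "Key" and element.text:
            keys.append(element.text.strip())
    return keys


def fetch_objects(bucket_url: str = BUCKET_URL) -> dict:
    """Download every object in the listing and map its key to its text content."""
    with urlopen(bucket_url) as response:
        listing = response.read().decode("utf-8")
    contents = {}
    for key in extract_keys(listing):
        with urlopen(f"{bucket_url}/{quote(key)}") as response:
            contents[key] = response.read().decode("utf-8")
    return contents
```

Calling fetch_objects() after the buffering interval has passed should return the test-log/... keys together with the JSON that was sent. This relies on the bucket having been made public earlier.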

Started(writing) with poetry, ended up with codes. Have a university degree on Biotechnology. Works and talks about Java, Python, JS. Have philophobia.