AWS provides a ton of services to reduce development and maintenance hassle. If you need to stream and store (and maybe analyze!) lots of data in real time, you can use the Kinesis family of services. Of course, if you want to build your own pipeline, no one is stopping you… but as the saying goes:
If I have seen further than others, it is by standing upon the shoulders of giants
In my case the need was basic: put data into Kinesis Data Firehose, which periodically dumps it into an s3 bucket once a threshold is reached. But once I got my hands dirty I ran into some difficulties and couldn’t find any complete guide for my requirement, so I’m keeping this as a reference for my future work.
If you are working with aws services then you must have heard of localstack. It is an amazing piece of software that mocks most of the aws infrastructure on your own machine. If you get a good grip on it you can save a lot of dimes: during development you won’t need to deploy any services on aws. Trying localstack will also make you learn docker and awscli, if you haven’t already had a chance to work with them in practice.
Ok, so the idea is: we’ll spin up our own aws infrastructure using localstack and docker, create an s3 bucket on it, create a firehose delivery stream, and wire them together. Finally we’ll see some sample code to put data into kinesis firehose and check whether it gets dumped into s3.
Set up localstack and awscli
I assume that you already have docker installed on your machine. Now go to your project directory and create a docker-compose.yml file with the following content:
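Something along these lines works (a sketch: the port range and environment variables match the older localstack releases that expose one port per service, which is what this article uses; check the localstack docs for your version):

```yaml
version: "2.1"

services:
  localstack:
    image: localstack/localstack
    ports:
      - "4567-4599:4567-4599"   # localstack's per-service ports
      - "8080:8080"             # web ui
    environment:
      - SERVICES=s3,firehose,iam
      - DEFAULT_REGION=ap-southeast-1
```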
You can consult localstack documentation to fully understand the file but for now just focus on this line:
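Assuming the compose file follows the usual localstack convention, the line in question is the SERVICES environment variable:

```yaml
- SERVICES=s3,firehose,iam
```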
Here I’ve mentioned which services I need to run; you can add other services like lambda, sqs, dynamodb etc. too!
Now close the file and run the following command in your terminal to fire up the infrastructure:
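With the standard compose workflow, that is simply:

```
docker-compose up
```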
Make sure your terminal is in the same directory as the docker-compose.yml file. If you are running this for the first time, it might take a few minutes to download the localstack image, so be patient. Once it’s finished, keep the terminal running.
Now we are going to set up s3 and firehose, but for that you will need to set up awscli first. Installation differs from platform to platform, but it won’t be that hard. After installation, create a .aws directory inside your home folder (on linux and mac). Inside it, create a file named credentials and put the following contents in it:
[default]
aws_access_key_id = 123456
aws_secret_access_key = 123456
If you want to work against a real aws environment with your cli then you have to put real credentials here, but for localstack anything will do.
Create an s3 bucket
Localstack has a fixed port for each service: for s3 it is 4572, for firehose it is 4573. When we run a cli command against a service we need to pass the service url with the --endpoint-url parameter. For example, creating a bucket looks like this:
aws --endpoint-url=http://localhost:4572 s3 mb s3://s3-firehose --region ap-southeast-1
Here s3 is the service name, mb means make bucket, and the other terms are self-explanatory. Now let’s make the bucket public so that we can test things easily:
aws --endpoint-url=http://localhost:4572 s3api put-bucket-acl --bucket s3-firehose --acl public-read
Now if you go to localhost:4572 you will see an xml listing, and hopefully your bucket will be there too!
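The listing is the standard s3 ListAllMyBuckets response; roughly like this (owner and dates will differ):

```xml
<ListAllMyBucketsResult>
  <Owner>...</Owner>
  <Buckets>
    <Bucket>
      <Name>s3-firehose</Name>
      <CreationDate>2020-01-01T00:00:00.000Z</CreationDate>
    </Bucket>
  </Buckets>
</ListAllMyBucketsResult>
```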
Create an IAM role
For creating a role, you need a policy. For now we will use a policy that allows a role to do anything on any aws service. Create a file called iam_policy.json and put this content in it:
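Since localstack doesn’t actually enforce IAM, a maximally permissive policy document is enough here (a sketch; any valid policy JSON will do for localstack):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*"
    }
  ]
}
```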
Now, to create the role using this policy, issue the following command in your terminal. Make sure the policy file exists in the same directory from where you are running the command.
aws --endpoint-url=http://localhost:4593 iam create-role --role-name super-role --assume-role-policy-document file://$PWD/iam_policy.json
Here iam is the service name and create-role is the command to create the role; you can use whatever name you like instead of super-role. And yes, inside localstack the iam service endpoint is localhost:4593.
If you run this correctly you will get an output like this:
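The response follows the standard iam create-role shape; the ids and dates will differ, but it looks roughly like this (localstack typically uses the dummy account id 000000000000):

```json
{
    "Role": {
        "Path": "/",
        "RoleName": "super-role",
        "RoleId": "AROA...",
        "Arn": "arn:aws:iam::000000000000:role/super-role",
        "CreateDate": "2020-01-01T00:00:00Z"
    }
}
```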
From this output, keep the Arn value somewhere safe because it will be required shortly.
Create the firehose delivery stream
Firehose is a sophisticated service with many configuration options for different use cases, but today we’ll keep it simple. Just create a file named firehose_skeleton.json and write this in it:
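A minimal skeleton looks like this (the stream name and prefix are my own choices; the RoleARN and BucketARN values are placeholders explained just below):

```json
{
  "DeliveryStreamName": "s3-firehose-stream",
  "DeliveryStreamType": "DirectPut",
  "S3DestinationConfiguration": {
    "RoleARN": "arn:aws:iam::000000000000:role/super-role",
    "BucketARN": "arn:aws:s3:::s3-firehose",
    "Prefix": "firehose/",
    "BufferingHints": {
      "IntervalInSeconds": 60,
      "SizeInMBs": 1
    },
    "CompressionFormat": "UNCOMPRESSED"
  }
}
```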
Put the role Arn you saved earlier in the RoleARN parameter of this file. In the BucketARN value, replace the suffix with your own bucket name if it differs. Prefix means the files generated by firehose will carry this prefix. Firehose will create a new file after 60 seconds, or when the data in its buffer grows larger than 1 megabyte; both thresholds are specified under BufferingHints. Now let’s create the stream:
aws --endpoint-url=http://localhost:4573 firehose create-delivery-stream --cli-input-json file://$PWD/firehose_skeleton.json
If you run this correctly you will get an output like this:
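The response contains the new stream’s arn, something like (the suffix matches whatever DeliveryStreamName you used in the skeleton):

```json
{
    "DeliveryStreamARN": "arn:aws:firehose:ap-southeast-1:000000000000:deliverystream/s3-firehose-stream"
}
```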
Now let’s try to put some data in it!
Put records into firehose
For this I will use python. First install boto3, the aws sdk for python:
pip install boto3
Now open a python shell and run the commands one by one, or run the whole thing as a script:
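A minimal sketch (the stream name must match the DeliveryStreamName from your firehose_skeleton.json; the data payload is just an example):

```python
import json

import boto3

# Point the firehose client at localstack instead of real aws.
firehose = boto3.client(
    "firehose",
    endpoint_url="http://localhost:4573",
    region_name="ap-southeast-1",
    aws_access_key_id="123456",
    aws_secret_access_key="123456",
)

data = {"user_id": 1, "event": "signup"}  # any json-serializable payload

# Firehose concatenates records as-is, so append a newline delimiter.
firehose.put_record(
    DeliveryStreamName="s3-firehose-stream",
    Record={"Data": json.dumps(data).encode("utf-8") + b"\n"},
)
```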
DeliveryStreamName is the name of our stream as mentioned in the firehose_skeleton.json file. Let’s see if the data was dumped into s3.
Open your browser and go to http://localhost:4572/s3-firehose, which is the bucket’s s3 url; there you will find something like this:
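It’s the standard ListBucket xml; the interesting part is the Contents entry, roughly like this (the actual key will contain your prefix plus a timestamped name):

```xml
<ListBucketResult>
  <Name>s3-firehose</Name>
  <Contents>
    <Key>firehose/2020/01/01/00/...</Key>
    <Size>42</Size>
  </Contents>
</ListBucketResult>
```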
Notice the key attribute here; it is the key of the s3 object created by firehose. To see the object, just append the key to the bucket url like this:
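For example (the key shown is hypothetical; use whatever key your listing shows):

```
http://localhost:4572/s3-firehose/firehose/2020/01/01/00/some-object-key
```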
And the content will be the json representation of the data dictionary that we passed to put_record.