AWS : Lambda + Athena and S3

5 Simple steps : Integrate Lambda , Athena and S3

Please follow the steps to load data from S3 to Athena from here

1. Create IAM Role

  • Provide access to S3 , CloudWatch and Athena
  • Create Role which will be attached to Lambda function

2. Go to Lambda and create function

3. Change the basic setting (Timeout to 5 min)

After create function - Put any value but atleast more than 1 min

4. Write the Python code

Sample Python code to extract data from Athena

#####################################################
# Code : To Integrate Lambda and Athena
# Developer : Debaditya Chakravorty
# Credits : Thanks to all AWS and other blogs
#####################################################

import json
import boto3
import time


def lambda_handler(event,context):
	client = boto3.client('athena')

	query_string='select * from debdata limit 5;'
	DATABASE_NAME = 'debtest'
	output_dir = 's3://deb-lambda-output/'

	query_id = client.start_query_execution(
			QueryString = query_string,
			QueryExecutionContext = {
				'Database': DATABASE_NAME
			},
			ResultConfiguration = {
				'OutputLocation': output_dir
			}
		)['QueryExecutionId']

	#queryId = queryStart['QueryExecutionId']


	query_status = None
	while query_status == 'QUEUED' or query_status == 'RUNNING' or query_status is None:
		query_status = client.get_query_execution(QueryExecutionId=query_id)['QueryExecution']['Status']['State']
		if query_status == 'FAILED' or query_status == 'CANCELLED':
			raise Exception('Athena query with the string "{}" failed or was cancelled'.format(query_string))
		time.sleep(10)
	results_paginator = client.get_paginator('get_query_results')
	results_iter = results_paginator.paginate(
		QueryExecutionId=query_id,
		PaginationConfig={
		'PageSize': 1000
		}
	)
	results = []
	data_list = []
	for results_page in results_iter:
		for row in results_page['ResultSet']['Rows']:
			data_list.append(row['Data'])
	for datum in data_list[1:]:
		results.append([x['VarCharValue'] for x in datum])
	return [tuple(x) for x in results]

5. Deploy the Python code

Test and Check the result

Result data in S3

Note :

  • The steps mentioned above is for POC.

  • In Production or any organization , CloudFormation template and proper IAM roles would be utilized (Concept of least privilege)