Serverless Streaming Analytics with S3 Tables & Firehose

Manu Mishra
Distinguished Solutions Architect, Author & Researcher in AI & Cloud
9 min read

S3 Tables Architecture

Introduction

Modern businesses need to analyze streaming data in real time to make faster decisions. Whether it's monitoring IoT sensors, tracking user behavior, or processing financial transactions, the ability to query fresh data immediately is critical. However, building a streaming analytics pipeline traditionally requires managing complex infrastructure and dealing with data format conversions.

This solution shows how to build a serverless real-time streaming analytics pipeline using Amazon S3 Tables and Amazon Kinesis Data Firehose. By combining streaming ingestion with Apache Iceberg's analytics-optimized format, you can query data within minutes of generation—without managing any servers or data transformation jobs.

GitHub Repository: https://github.com/manu-mishra/s3table-firehose-lambda-terraform-demo

Architecture Overview

The solution creates an end-to-end streaming analytics pipeline that generates IoT sensor data using AWS Lambda, simulating 10 sensors across multiple locations. Data streams continuously through Amazon Kinesis Data Firehose with automatic buffering and delivery, then gets stored in Apache Iceberg format using Amazon S3 Tables for optimized analytics performance. The solution integrates with AWS Lake Formation for centralized data governance and access control.

The solution generates approximately 600,000 records per hour from simulated IoT sensors, demonstrating real-world streaming data patterns for temperature, humidity, and pressure monitoring across warehouse and office locations.

Key Components

Data Generation Layer

AWS Lambda (512 MB memory) generates 10,000 IoT sensor records per invocation at a rate of 200 records per second. The function is triggered every minute by Amazon EventBridge, producing consistent data flow for the pipeline. An AWS Identity and Access Management (IAM) role grants the Lambda function permissions to write to Firehose and Amazon CloudWatch Logs.
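
As a rough illustration, the schedule-driven generator can be expressed in Terraform along these lines. This is a minimal sketch: the function name, handler, runtime, artifact path, and role name are assumptions, not the repository's actual values.

resource "aws_lambda_function" "data_generator" {
  function_name = "iot-data-generator"           # assumed name
  role          = aws_iam_role.lambda_role.arn   # role with Firehose + CloudWatch Logs permissions
  handler       = "generator.handler"            # assumed handler
  runtime       = "python3.12"                   # assumed runtime
  memory_size   = 512
  timeout       = 60                             # 10,000 records at 200/s takes roughly 50 seconds
  filename      = "lambda.zip"                   # assumed packaged artifact
}

resource "aws_cloudwatch_event_rule" "every_minute" {
  name                = "invoke-data-generator"
  schedule_expression = "rate(1 minute)"
}

resource "aws_cloudwatch_event_target" "generator" {
  rule = aws_cloudwatch_event_rule.every_minute.name
  arn  = aws_lambda_function.data_generator.arn
}

resource "aws_lambda_permission" "allow_eventbridge" {
  statement_id  = "AllowEventBridgeInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.data_generator.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.every_minute.arn
}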

Streaming Layer

Amazon Kinesis Data Firehose buffers incoming data for up to 5 minutes or 5 MB before writing to the destination. An IAM role provides Firehose with permissions to access S3 Tables via Lake Formation, write to the error bucket, and use AWS Key Management Service (AWS KMS) for encryption.
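
A simplified sketch of the delivery stream with an Iceberg destination, assuming Terraform AWS Provider 6.0+ (discussed later in this post). The resource names, the catalog ARN local, and the exact attribute layout are assumptions and should be checked against the provider documentation for aws_kinesis_firehose_delivery_stream.

resource "aws_kinesis_firehose_delivery_stream" "this" {
  name        = "${var.stack_name}-stream"   # assumed naming convention
  destination = "iceberg"

  iceberg_configuration {
    role_arn           = aws_iam_role.firehose.arn
    catalog_arn        = local.s3tables_catalog_arn   # catalog ARN exposed by the S3 Tables / Lake Formation integration
    buffering_interval = 300                           # seconds
    buffering_size     = 5                             # MB

    destination_table_configuration {
      database_name = "firehosetos3demo"   # the S3 Tables namespace
      table_name    = "firehosetos3demo"
    }

    s3_configuration {
      role_arn   = aws_iam_role.firehose.arn
      bucket_arn = aws_s3_bucket.errors.arn   # failed records land in the error bucket
    }
  }
}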

Storage Layer

Amazon S3 Tables stores data in Apache Iceberg format with automatic schema management and table optimization. The table bucket is encrypted using AWS KMS customer-managed keys.
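
The table bucket, namespace, and table can be declared with the provider's S3 Tables resources; a minimal sketch with assumed resource names (the KMS encryption configuration is omitted for brevity):

resource "aws_s3tables_table_bucket" "this" {
  name = var.stack_name
}

resource "aws_s3tables_namespace" "this" {
  namespace        = "firehosetos3demo"
  table_bucket_arn = aws_s3tables_table_bucket.this.arn
}

resource "aws_s3tables_table" "sensor_data" {
  name             = "firehosetos3demo"
  namespace        = aws_s3tables_namespace.this.namespace
  table_bucket_arn = aws_s3tables_table_bucket.this.arn
  format           = "ICEBERG"
  # the Iceberg schema is declared in the metadata block shown later in this post
}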

Amazon Simple Storage Service (Amazon S3) Error Bucket captures failed deliveries from Firehose when records cannot be written to S3 Tables. The error bucket is encrypted with AWS KMS customer-managed keys and has versioning enabled for audit trails. Failed records are written to the errors/ prefix with metadata about the failure reason, enabling troubleshooting and data recovery.
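
A sketch of the error bucket with versioning and KMS encryption, following the {stack_name}-errors naming used in the verification commands later in this post (resource names are assumed):

resource "aws_s3_bucket" "errors" {
  bucket = "${var.stack_name}-errors"
}

resource "aws_s3_bucket_versioning" "errors" {
  bucket = aws_s3_bucket.errors.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "errors" {
  bucket = aws_s3_bucket.errors.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.this.arn   # customer-managed key (assumed resource name)
    }
  }
}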

Governance Layer

AWS Lake Formation grants explicit permissions (ALL, ALTER, DELETE, DESCRIBE, DROP, INSERT, SELECT) to the Firehose role for database, table, and column access, ensuring secure and governed data access patterns.

Monitoring Layer

Amazon CloudWatch Dashboard provides real-time monitoring of Firehose metrics, Lambda performance, and S3 Tables storage with 9 pre-configured widgets tracking key operational metrics.

Data Schema

The solution generates IoT sensor data with the following schema:

  • sensor_id (string): Unique identifier for each sensor
  • timestamp (long): Unix timestamp in seconds
  • location (string): Physical location of the sensor
  • temperature (double): Temperature reading in Celsius
  • humidity (double): Humidity percentage
  • pressure (double): Atmospheric pressure in hPa
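
An individual record delivered to Firehose might look like the following (the values and location name are illustrative, not taken from the repository):

{
  "sensor_id": "sensor-03",
  "timestamp": 1718900000,
  "location": "warehouse-1",
  "temperature": 22.4,
  "humidity": 41.7,
  "pressure": 1012.6
}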

Data generation rate: 200 records per second during Lambda execution, producing approximately 10,000 records per invocation and 600,000 records per hour across all sensors.

Prerequisites

One-Time Account Configuration

  1. Enable S3 Tables Integration with Lake Formation

    • Navigate to the Amazon S3 console
    • Select Table buckets in the left navigation
    • Click Enable integration
    • This registers S3 Tables with AWS Lake Formation
  2. Configure Lake Formation Administrator Permissions

    • The IAM identity running Terraform requires Lake Formation administrator permissions
    • Navigate to the AWS Lake Formation console
    • Select Administrative roles and tasks, then choose Administrators
    • Add your IAM user or role

    Alternatively, use the AWS CLI:

    aws lakeformation put-data-lake-settings \
    --data-lake-settings '{"DataLakeAdmins":[{"DataLakePrincipalIdentifier":"arn:aws:iam::ACCOUNT:user/YOUR_USER"}]}'

Required Tools

  • Terraform v1.0 or later
  • AWS CLI v2.15 or later (with S3 Tables support)
  • AWS credentials configured for the target account

Validation Script

The solution includes a validation script to verify all prerequisites before deployment:

./validate-prerequisites.sh

This script checks:

  • AWS credentials are configured
  • S3 Tables integration with Lake Formation is enabled
  • The current IAM identity is a Lake Formation administrator
  • Required IAM permissions are available
  • The AWS CLI version supports S3 Tables (v2.15+)
  • Terraform is installed (v1.0+)

Deployment

Infrastructure as Code

The solution uses Terraform to provision all AWS resources. The deployment creates:

  • S3 Tables table bucket, namespace, and Iceberg table with schema (KMS encrypted)
  • Kinesis Data Firehose delivery stream with Iceberg destination
  • Amazon S3 error bucket for failed deliveries (KMS encrypted)
  • AWS Lambda data generator function (512 MB, triggered every minute)
  • Amazon EventBridge schedule rule
  • IAM roles and policies for Firehose and Lambda
  • AWS KMS customer-managed encryption key
  • Amazon CloudWatch dashboard with 9 monitoring widgets
  • AWS Lake Formation permissions (automated via Terraform)

Deployment Steps

  1. Configure terraform.tfvars with your stack name (must be globally unique):

    stack_name = "your-unique-stack-name"
  2. Deploy the infrastructure:

    terraform init
    terraform apply

All resources are created with Lake Formation permissions automatically granted during deployment.

Verification

Lambda begins sending data immediately. Wait 5-6 minutes for Firehose buffering, then verify data flow using the Amazon CloudWatch dashboard. The dashboard provides real-time visibility into:

  • Firehose incoming records and delivery metrics
  • Lambda invocations and performance
  • S3 Tables storage growth

Access the dashboard URL from Terraform outputs:

terraform output dashboard_url

Alternatively, check for delivery errors:

aws s3 ls s3://{stack_name}-errors/errors/ --recursive

An empty result indicates successful data delivery.

Querying Data

After deployment, query the streaming data stored in S3 Tables using any Apache Iceberg-compatible query engine, such as Amazon Athena or Apache Spark:

View All Sensor Data

SELECT * FROM firehosetos3demo.firehosetos3demo LIMIT 100;

Average Temperature by Location

SELECT
  location,
  AVG(temperature) as avg_temp,
  COUNT(*) as reading_count
FROM firehosetos3demo.firehosetos3demo
GROUP BY location
ORDER BY avg_temp DESC;

Recent High Temperature Alerts

SELECT
  sensor_id,
  temperature,
  humidity,
  timestamp,
  from_unixtime(timestamp) as reading_time
FROM firehosetos3demo.firehosetos3demo
WHERE temperature > 25
ORDER BY timestamp DESC
LIMIT 50;

Sensor Activity Summary

SELECT
  sensor_id,
  location,
  COUNT(*) as total_readings,
  AVG(temperature) as avg_temp,
  AVG(humidity) as avg_humidity,
  MIN(timestamp) as first_reading,
  MAX(timestamp) as last_reading
FROM firehosetos3demo.firehosetos3demo
GROUP BY sensor_id, location
ORDER BY sensor_id;

Monitoring and Operations

CloudWatch Dashboard

The solution includes a pre-configured Amazon CloudWatch dashboard with the following metrics:

Firehose Metrics:

  • Incoming records
  • Delivery success/failure counts
  • Data freshness (time from arrival to delivery)
  • Bytes delivered

Lambda Metrics:

  • Invocations
  • Errors
  • Duration

S3 Tables Metrics:

  • Storage size
  • File count

Access the dashboard URL from Terraform outputs:

terraform output dashboard_url

Error Monitoring

Check for delivery failures:

aws s3 ls s3://{stack_name}-errors/errors/iceberg-failed/ --recursive

Security Considerations

Encryption

  • At Rest: All data in S3 Tables and the error bucket is encrypted using AWS KMS customer-managed keys with automatic key rotation enabled (see the key sketch after this list)
  • In Transit: All data transfers use TLS encryption
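
A minimal Terraform sketch of the customer-managed key, matching the rotation and 7-day deletion window described in this post (resource name assumed):

resource "aws_kms_key" "this" {
  description             = "Encryption key for S3 Tables and the Firehose error bucket"
  enable_key_rotation     = true   # automatic annual key rotation
  deletion_window_in_days = 7      # shortest waiting period AWS allows; see Cleanup below
}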

Access Control

  • IAM Roles: Least-privilege IAM policies for Lambda and Firehose (an example policy sketch follows this list)
  • Lake Formation: Fine-grained access control at database, table, and column levels
  • KMS Key Policies: Explicit permissions for service principals and roles
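
As an illustration of the least-privilege pattern, the Lambda role's inline policy could be scoped roughly as follows (resource names are assumptions carried over from the earlier sketches):

resource "aws_iam_role_policy" "lambda_firehose" {
  name = "lambda-firehose-put"
  role = aws_iam_role.lambda_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["firehose:PutRecord", "firehose:PutRecordBatch"]
        Resource = aws_kinesis_firehose_delivery_stream.this.arn
      },
      {
        Effect   = "Allow"
        Action   = ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"]
        Resource = "arn:aws:logs:*:*:*"   # could be narrowed to the generator's log group
      }
    ]
  })
}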

Cost Optimization

The solution implements several cost optimization strategies:

  • Firehose Buffering: 5-minute or 5 MB buffering reduces the number of small files written to S3 Tables
  • CloudWatch Logs Retention: 1-day retention for Lambda logs reduces storage costs (see the log group sketch after this list)
  • S3 Tables Maintenance: Automatic compaction and optimization reduce storage costs over time
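
The short log retention can be expressed directly on the function's log group; a sketch assuming the generator function name used earlier:

resource "aws_cloudwatch_log_group" "lambda" {
  name              = "/aws/lambda/iot-data-generator"   # assumed function name
  retention_in_days = 1
}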

Technical Implementation Notes

Schema Definition

Amazon Kinesis Data Firehose requires S3 Tables to have a pre-defined schema. The solution uses Terraform AWS Provider 6.0+ which supports schema definition via the metadata block:

metadata {
  iceberg {
    schema {
      field {
        name     = "sensor_id"
        type     = "string"
        required = false
      }
      # Additional fields...
    }
  }
}

Lake Formation Permissions

The solution uses Terraform's null_resource with local-exec provisioner to automatically grant Lake Formation permissions using the AWS CLI during deployment. This approach is necessary because Terraform's aws_lakeformation_permissions resource does not support S3 Tables catalog ARNs (format: account:s3tablescatalog/bucket-name).

The null_resource executes after the foundational resources (S3 Tables, IAM roles) are created but before the Firehose delivery stream is provisioned. This ensures the Firehose role has the required Lake Formation permissions (ALL, ALTER, DELETE, DESCRIBE, DROP, INSERT, SELECT) for database, table, and column access before attempting to write data.

Without these permissions, Firehose would fail with Lakeformation.AccessDenied errors when attempting to write to S3 Tables. The automated approach eliminates manual permission configuration and ensures consistent deployments.
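
A sketch of this pattern is shown below; the resource names, account ID placeholder, and exact resource JSON are assumptions based on the description above, not the repository's literal code.

resource "null_resource" "lakeformation_grants" {
  # runs once the table and Firehose role exist; the delivery stream should depend_on this resource
  depends_on = [aws_s3tables_table.sensor_data, aws_iam_role.firehose]

  # similar grant-permissions calls cover database-level and column-level access
  provisioner "local-exec" {
    command = <<-EOT
      aws lakeformation grant-permissions \
        --principal DataLakePrincipalIdentifier=${aws_iam_role.firehose.arn} \
        --permissions ALL ALTER DELETE DESCRIBE DROP INSERT SELECT \
        --resource '{"Table":{"CatalogId":"ACCOUNT_ID:s3tablescatalog/${var.stack_name}","DatabaseName":"firehosetos3demo","Name":"firehosetos3demo"}}'
    EOT
  }
}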

Firehose Buffering

Amazon Kinesis Data Firehose buffers data for up to 5 minutes (300 seconds) or 5 MB before writing to S3 Tables. This buffering mechanism optimizes write performance and reduces the number of small files, which improves query performance and reduces costs.

Cleanup

To remove all resources:

terraform destroy

Note: The AWS KMS key will be scheduled for deletion with a 7-day waiting period (AWS minimum).

Conclusion

This solution demonstrates a production-ready pattern for streaming data ingestion into Amazon S3 Tables using Amazon Kinesis Data Firehose. The architecture provides automatic buffering, schema management, encryption, governance, and monitoring capabilities suitable for real-world analytics workloads. The use of Apache Iceberg format through S3 Tables enables efficient querying, ACID transactions, and time travel capabilities for your streaming data.

By combining serverless compute (Lambda), managed streaming (Firehose), and analytics-optimized storage (S3 Tables), you can build scalable real-time analytics pipelines without managing infrastructure. The solution ingests roughly 600,000 records per hour and makes fresh data queryable within minutes of generation, demonstrating the power of modern serverless data architectures on AWS.