
3 posts tagged with "serverless"


Serverless Streaming Analytics with S3 Tables & Firehose

Manu Mishra · 9 min read
Distinguished Solutions Architect, Author & Researcher in AI & Cloud

[Image: S3 Tables Architecture]

Introduction

Modern businesses need to analyze streaming data in real time to make faster decisions. Whether it's monitoring IoT sensors, tracking user behavior, or processing financial transactions, the ability to query fresh data immediately is critical. However, building a streaming analytics pipeline has traditionally required managing complex infrastructure and handling data format conversions.

This solution shows how to build a serverless real-time streaming analytics pipeline using Amazon S3 Tables and Amazon Kinesis Data Firehose. By combining streaming ingestion with Apache Iceberg's analytics-optimized format, you can query data within minutes of generation—without managing any servers or data transformation jobs.

GitHub Repository: https://github.com/manu-mishra/s3table-firehose-lambda-terraform-demo
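To make the ingestion side concrete, here is a minimal sketch of a producer pushing JSON events into a Firehose delivery stream that, following the post's architecture, lands in an S3 Table in Iceberg format, plus a query against the fresh data via Athena. The stream name, database, table, and output bucket are placeholders for illustration, not values from the demo repository.

```python
import json
import time
import boto3

firehose = boto3.client("firehose")
athena = boto3.client("athena")

# Placeholder name; in the demo, the stream is provisioned by Terraform.
STREAM_NAME = "sensor-events-stream"

def send_event(device_id: str, temperature: float) -> None:
    """Push one JSON record into Firehose, which delivers to the Iceberg table."""
    record = {"device_id": device_id, "temperature": temperature,
              "ts": int(time.time())}
    firehose.put_record(
        DeliveryStreamName=STREAM_NAME,
        Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
    )

def query_fresh_data() -> str:
    """Query recently ingested rows with Athena (hypothetical table/database names)."""
    response = athena.start_query_execution(
        QueryString=(
            "SELECT device_id, avg(temperature) AS avg_temp "
            "FROM sensor_events "
            "WHERE ts > to_unixtime(now() - interval '5' minute) "
            "GROUP BY device_id"
        ),
        QueryExecutionContext={"Database": "analytics"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )
    return response["QueryExecutionId"]
```

Because Firehose writes directly into the Iceberg table format, there is no separate transformation job between ingestion and query, which is what keeps the pipeline serverless end to end.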

Google's EmbeddingGemma on AWS Lambda - A Curiosity-Driven Experiment

Manu Mishra · 6 min read
Distinguished Solutions Architect, Author & Researcher in AI & Cloud

[Image: EmbeddingGemma on AWS Lambda]

Note: This is a curiosity-driven experiment, not a production recommendation. For real workloads, Amazon SageMaker is the right choice. This project explores what's possible when you push serverless boundaries.

1. The idea

After my BitNet Lambda experiment, I kept thinking about embeddings. Text generation was working on Lambda, but what about the other half of modern AI applications?

Google's EmbeddingGemma caught my attention—300M parameters, multilingual, designed for efficiency. Could it work on Lambda? Only one way to find out.

So I fired up Amazon Q Developer and started experimenting.
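For a sense of what such an experiment might look like, here is a minimal sketch of a Lambda handler serving embeddings with the sentence-transformers library. The model path and request shape are assumptions for illustration; the post's actual code lives in the linked project.

```python
import json
from sentence_transformers import SentenceTransformer

# Assumption: the EmbeddingGemma weights are baked into the Lambda container
# image at /opt/model (container images can be up to 10 GB, ample for ~300M
# parameters). Loading at module scope means only cold starts pay the cost;
# warm invocations reuse the loaded model.
model = SentenceTransformer("/opt/model")

def handler(event, context):
    # Expected (hypothetical) request body: {"texts": ["...", "..."]}
    texts = json.loads(event["body"])["texts"]
    # encode() returns an array of shape (len(texts), embedding_dim)
    embeddings = model.encode(texts)
    return {
        "statusCode": 200,
        "body": json.dumps({"embeddings": [vec.tolist() for vec in embeddings]}),
    }
```

Loading the model outside the handler is the key design choice: it trades a slower cold start for fast, amortized warm invocations.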

Running 1.58-bit LLMs on AWS Lambda with BitNet

Manu Mishra · 6 min read
Distinguished Solutions Architect, Author & Researcher in AI & Cloud

[Image: BitNet on AWS Lambda]

What you'll learn (tl;dr): In ~12 minutes you'll see how to deploy Microsoft's BitNet 1.58-bit quantized LLM on AWS Lambda, how the container-based architecture fits together, and how the microsoft/bitnet-b1.58-2B-4T model performs across different memory configurations.

Big idea: 1.58-bit quantization enables LLM deployment on Lambda's CPU infrastructure. At ~1.1GB, the model fits within Lambda's constraints for serverless AI inference.
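One plausible shape for the container-based architecture is a thin Lambda handler that shells out to a bundled bitnet.cpp (llama.cpp-based) inference binary. The binary path, model filename, and flags below are assumptions for illustration, not the post's exact setup.

```python
import json
import subprocess

# Assumptions: a bitnet.cpp CLI binary and the ~1.1 GB 1.58-bit GGUF weights
# are bundled into the Lambda container image at these paths.
BINARY = "/opt/bitnet/llama-cli"
MODEL = "/opt/bitnet/ggml-model-i2_s.gguf"

def handler(event, context):
    prompt = json.loads(event["body"])["prompt"]
    # CPU-only inference; Lambda allocates vCPUs in proportion to the memory
    # setting, which is why benchmarks vary across memory configurations.
    result = subprocess.run(
        [BINARY, "-m", MODEL, "-p", prompt, "-n", "128"],
        capture_output=True, text=True, timeout=120, check=True,
    )
    return {"statusCode": 200, "body": json.dumps({"completion": result.stdout})}
```

The subprocess approach keeps the handler trivial: all of the quantization and inference logic lives in the compiled binary, and Lambda only supplies CPU, memory, and the request/response plumbing.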