2 posts tagged with "aws lambda"

Google's EmbeddingGemma on AWS Lambda - A Curiosity-Driven Experiment

· 6 min read
Manu Mishra
Solutions Architect & Applied Software Engineer

EmbeddingGemma on AWS Lambda

Note: This is a curiosity-driven experiment, not a production recommendation. For real workloads, Amazon SageMaker is the right choice. This project explores what's possible when you push serverless boundaries.

1. The idea

After my BitNet Lambda experiment, I kept thinking about embeddings. I had text generation working on Lambda, but what about the other half of modern AI applications?

Google's EmbeddingGemma caught my attention—300M parameters, multilingual, designed for efficiency. Could it work on Lambda? Only one way to find out.

So I fired up Amazon Q Developer and started experimenting.

Running 1.58-bit LLMs on AWS Lambda - When Serverless Meets Extreme Quantization

· 6 min read
Manu Mishra
Solutions Architect & Applied Software Engineer

BitNet on AWS Lambda

What you'll learn (tl;dr): In ~12 minutes you'll see how to deploy Microsoft's BitNet 1.58-bit quantized LLM on AWS Lambda, how the container-based architecture works, and how the microsoft/bitnet-b1.58-2B-4T model performs across different memory configurations.
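The container-based deployment mentioned above can be sketched as a Lambda container image. This is an illustrative Dockerfile, not the post's exact setup: the base image, file names, and handler name are assumptions.

```dockerfile
# Hedged sketch of a Lambda container image for CPU inference.
# File names and handler are illustrative assumptions.
FROM public.ecr.aws/lambda/python:3.12

# Ship the quantized GGUF weights inside the image; Lambda container
# images can be up to 10 GB, so a ~1.1GB model fits comfortably.
COPY bitnet-b1.58-2B-4T.gguf ${LAMBDA_TASK_ROOT}/model.gguf

# Install whatever CPU inference runtime the function uses.
COPY requirements.txt .
RUN pip install -r requirements.txt

# app.py defines handler(event, context); Lambda invokes it via CMD.
COPY app.py ${LAMBDA_TASK_ROOT}
CMD ["app.handler"]
```

Baking the model into the image trades slower cold starts for zero runtime download, which is the usual choice when the weights fit under the image size limit.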

Big idea: 1.58-bit quantization enables LLM deployment on Lambda's CPU infrastructure. At ~1.1GB, the model fits within Lambda's constraints for serverless AI inference.
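The "1.58-bit" figure comes from ternary weights: each weight takes one of three values {-1, 0, +1}, so it carries log2(3) ≈ 1.58 bits of information. A quick sanity check on the numbers (the shipped ~1.1GB file is larger than the raw ternary payload, likely because embeddings, scales, and metadata are stored at higher precision):

```python
import math

# Ternary weights {-1, 0, +1} carry log2(3) bits of information each.
bits_per_weight = math.log2(3)
print(f"{bits_per_weight:.2f} bits per weight")  # -> 1.58

# Lower bound on the packed size of ~2B ternary parameters, in GB.
params = 2_000_000_000
packed_gb = params * bits_per_weight / 8 / 1e9
print(f"~{packed_gb:.2f} GB of ternary payload")  # -> ~0.40 GB
```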