Running 1.58-bit LLMs on AWS Lambda - When Serverless Meets Extreme Quantization
✨ What you'll learn (tl;dr): In ~12 minutes you'll see how to deploy Microsoft's BitNet 1.58-bit quantized LLM (microsoft/bitnet-b1.58-2B-4T) on AWS Lambda, walk through the container-based architecture, and review performance benchmarks across different memory configurations.
Big idea: 1.58-bit quantization shrinks the model enough that LLM inference becomes viable on Lambda's CPU-only infrastructure. At ~1.1 GB, the model fits comfortably within Lambda's 10 GB container image and memory limits, making serverless inference practical.
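
To make the end state concrete, here is a minimal sketch of what the Lambda entry point could look like, assuming the bitnet.cpp `llama-cli` binary and the quantized GGUF model are baked into the container image. The paths, file name, and token limit below are illustrative assumptions, not the exact setup from this post:

```python
import json
import subprocess

# Assumed locations inside the container image; adjust to your build.
BINARY = "/opt/bitnet/llama-cli"             # bitnet.cpp inference CLI
MODEL = "/opt/models/ggml-model-i2_s.gguf"   # 1.58-bit GGUF weights (~1.1 GB)

def handler(event, context):
    prompt = event.get("prompt", "Hello")
    # CPU-only inference inside the Lambda sandbox; -n caps generated tokens.
    result = subprocess.run(
        [BINARY, "-m", MODEL, "-p", prompt, "-n", "128"],
        capture_output=True,
        text=True,
        timeout=840,  # stay under Lambda's 900 s hard limit
    )
    return {
        "statusCode": 200,
        "body": json.dumps({"output": result.stdout}),
    }
```

Shelling out to the CLI keeps the handler trivially simple; the trade-off is that the model is reloaded on every cold start, which is exactly what the memory-configuration benchmarks later in the post measure.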