LLMs in Production

The work that starts after the prototype

Getting an LLM feature into production means solving problems a demo never surfaces: how do you know it’s good (evaluation), how do you keep it affordable and fast (cost and latency), and how do you keep it safe (security)? This section is about that work.

I cover three areas: practical evaluation harnesses you can run without an ML background; techniques for cutting LLM cost and latency in production (caching, model routing, prompt design); and LLM security, including prompt-injection defenses. The security material draws on my work on an LLM security tool built on Meta’s LLaMA.

Topics

A practical LLM eval harness for teams without an ML background
Cutting LLM cost and latency in production
LLM security: prompt injection and adversarial inputs