Do not be afraid of hallucinations. Use LLM calls to your advantage.

February 2, 2026

"You have to know how to optimize LLM calls to your advantage: do not be afraid of hallucinations. Build guardrails around those calls efficiently."

Last night, I ran into a bug that perfectly captured a problem I’ve been struggling with for a while. When building conversational agents, the guardrails you engineer (to make sure they say what they’re meant to say and do what they’re meant to do) can get very bulky.

Not because the model is bad, but because that’s the pill you have to swallow when working with LLMs: they’re stateless and prone to hallucination by nature. Every time a user went slightly off-script, my agent would fall into these rigid flows: clarification states, correction loops, edge-case handlers. The more I tried to “control” the agent, the less natural it became.

And the worst part? The code was getting insane. At some point I realized I had written a massive amount of guardrail logic just to handle people asking normal human questions.

The Original Problem: Guardrails Everywhere

My agent is built on a state machine, so the flow looked like this:

State: ask for shoe size
User: “Where is your shop?”

Instead of just answering like a normal person, the system would:

  1. Detect “clarification needed”
  2. Route to a clarification node
  3. Generate an answer
  4. Check if it blocks the flow
  5. Jump back to the size state

So a single human question triggered:

  • 5 functions
  • A sub-graph
  • Multiple condition checks
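To make the rigidity concrete, here’s a simplified sketch of what that old routing amounted to. The function names (`detect_clarification`, `clarification_node`, `blocks_flow`) are illustrative stand-ins for my actual helpers, not the real code:

```python
def detect_clarification(user_message: str) -> bool:
    # Crude stand-in: anything that isn't a plain shoe size
    # gets flagged as "clarification needed" (step 1).
    return not user_message.strip().isdigit()

def clarification_node(user_message: str) -> str:
    # Step 3: generate an answer to the off-script question.
    return f"(answering: {user_message!r})"

def blocks_flow(answer: str) -> bool:
    # Step 4: check whether the detour blocks the main flow.
    return False

def handle_turn(current_state: str, user_message: str) -> dict:
    if detect_clarification(user_message):
        # Step 2: route to a dedicated clarification node.
        answer = clarification_node(user_message)
        if not blocks_flow(answer):
            # Step 5: jump back to the original state.
            return {"state": current_state, "response": answer}
    return {"state": "next_state", "response": "Got it!"}
```

Five functions and a pile of condition checks, just so one human question doesn’t derail the form.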

It worked… but it felt completely wrong. I personally didn't like the architecture. This is meant to be agentic; it's not a freaking form.

The Realization

I decided to let the LLM handle clarification entirely. It was risky, but with proper prompt engineering and something I call Per-Node Prompting, I ended up with an agent that listens to its state machine without feeling robotic.

This is how I replaced hardcoded guards with a Per-Node Agentic Architecture.

1. The “Intent-Aware Naturalizer”

Instead of moving the user into a dedicated “Clarification State”, I keep them in the original state and let the LLM handle the detour in real time.

In my graph.py, I condensed all the complex routing checks into a single junction:

# The "Agentic Bridge"
if user_intent == "other" and current_state not in CRITICAL_DATA_GATES:
    response = generate_response(
        current_state, 
        user_message, 
        collected_data,
        history=messages, 
        summary=summary
    )
    return { "current_state": current_state, "agent_response": response }

The real trick is in the generate_response prompt. I feed the LLM two competing priorities: what the user just asked and what the current state’s goal is. Sarah (the bot) is then asked to bridge the two.
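A minimal sketch of how that prompt might be assembled. The wording and the function name are illustrative, not my exact prompt:

```python
def build_bridge_prompt(current_state_goal: str, user_message: str,
                        summary: str, recent_history: list[str]) -> str:
    # Give the model two competing priorities: the user's detour
    # and the current state's goal, and ask it to bridge both
    # in a single turn.
    return "\n".join([
        "You are Sarah, a sales assistant for Briggs Store.",
        f"Conversation summary: {summary}",
        "Recent messages:",
        *recent_history,
        f"The user just said: {user_message}",
        f"Current goal of this state: {current_state_goal}",
        "Answer the user's question naturally, then steer back "
        "to the current goal in the same reply.",
    ])
```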

For example:

User: “Is this shoe original?”
Sarah: “Of course, we only stock clean, fresh pairs at Briggs Store. By the way, what size should I set aside for you?”

One seamless turn. No extra nodes. No correction loops.

2. Refinement with Per-Node Guidance

Even with the Naturalizer, LLMs still hallucinate intents sometimes. For example, in the “Ask Size” state, if a user said “I’ve checked”, the model would sometimes classify that as a size intent instead of other.

Instead of adding more if/else logic, I moved those rules into the state machine itself. This is what I call Per-Node Prompting.

ask_size:
  text: "What shoe size do you wear?"
  expected_intents: ["size", "other"]
  intent_guidance: |
    - Use 'other' if the user asks a question or says "I've checked".
    - ONLY use 'size' if they provide a specific number (e.g., 42).

This guidance gets injected dynamically into the intent classifier. So instead of the model guessing globally, it now understands the local context of each state. That single change made intent classification way more accurate.
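Roughly, the injection looks like this. The node dict mirrors the YAML above; `build_classifier_prompt` is a sketch, not my exact classifier code:

```python
# Node definition loaded from the state-machine config (mirrors the YAML).
ask_size_node = {
    "expected_intents": ["size", "other"],
    "intent_guidance": (
        "- Use 'other' if the user asks a question or says \"I've checked\".\n"
        "- ONLY use 'size' if they provide a specific number (e.g., 42)."
    ),
}

def build_classifier_prompt(node: dict, user_message: str) -> str:
    # Splice the node-local rules into the classifier prompt, so the
    # model decides with the local context of this state instead of
    # guessing from one global prompt.
    return (
        f"Classify the user's intent as one of {node['expected_intents']}.\n"
        f"State-specific rules:\n{node['intent_guidance']}\n"
        f"User: {user_message}\n"
        "Intent:"
    )
```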

3. Solving the “Dementia” Problem (Agentic Memory)

The final issue was repetition. If a user went on a long detour, Sarah would sometimes hit the welcome message again because she forgot she had already introduced herself.

I couldn’t just dump the full chat history into the prompt (too expensive, too many tokens), so I built a Window + Summary memory strategy.

  • Short-term memory: The last 8 messages, raw transcript.
  • Long-term memory: A rolling summary of older messages, compressed into about 3 sentences.
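At prompt time the two tiers get stitched together into one context block. A minimal sketch, with illustrative names:

```python
WINDOW = 8  # raw messages kept verbatim (short-term memory)

def build_context(summary: str, messages: list[str]) -> str:
    # Long-term memory: the rolling summary of everything older.
    # Short-term memory: only the last WINDOW messages, verbatim.
    recent = messages[-WINDOW:]
    return "\n".join([
        f"Earlier conversation (summarized): {summary}",
        "Recent messages:",
        *recent,
    ])
```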

When the message count hits 20, I prune the old messages and update the summary:

if len(messages) >= 20: 
    new_summary = await summarize_messages(old_summary, messages[:12])
    db.update_summary(new_summary)
    db.prune_messages(count=12)

Now Sarah “remembers” she already gave you the price even if it happened 40 turns ago.
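`summarize_messages` itself is just one more LLM call that folds the pruned messages into a fresh ~3-sentence summary. A sketch of the shape, with the model call stubbed out:

```python
import asyncio

async def call_llm(prompt: str) -> str:
    # Stub standing in for the real model call.
    return "(3-sentence summary)"

async def summarize_messages(old_summary: str, old_messages: list[str]) -> str:
    prompt = "\n".join([
        "Compress the following into about 3 sentences, keeping any "
        "facts the bot has already shared (price, address, sizes discussed):",
        f"Previous summary: {old_summary}",
        "Messages to fold in:",
        *old_messages,
    ])
    return await call_llm(prompt)

# Usage: new_summary = asyncio.run(summarize_messages(old, messages[:12]))
```

Folding the *old* summary back into the prompt is what makes it rolling: facts survive even after the raw messages that carried them are pruned.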

The Result: 150 Lines Deleted

Once I stopped trying to micromanage every edge case and leaned into agentic flows, I deleted over 150 lines of hardcoded logic. I know 150 isn't much, but I knew if I continued with that rigid route the lines would increase exponentially.

  • The code is cleaner.
  • The bot is smarter.
  • The UX feels human.

It now handles Nigerian English and slang naturally, stays focused on the sale, and doesn’t feel like a form anymore.

The biggest lesson for me: you have to know how to optimize LLM calls to your advantage, and not be afraid of hallucinations. Build guardrails around those calls efficiently.