Introduction

Most people approach AI tools like Claude as if they’re just “better autocomplete.”

That framing breaks down fast.

What you’re actually working with is a system composed of:

reasoning + tools + context + orchestration

Once you understand that, everything else, such as prompting, RAG, agents, and model choice, starts to click into place.

This post is the mental model I wish I had earlier.

General AI Knowledge (That Actually Matters)

Learning Claude is not really about Claude.

It’s about learning how LLMs behave under constraints and how to shape that behavior.

The same principles apply whether you’re using Claude, GPT, or anything else:

  • Models reason probabilistically
  • Context defines behavior more than the model itself
  • Structure beats cleverness
  • Systems beat prompts

If you’re building anything real, you’re not “prompting a model.”

You’re designing a system that happens to include one.

Prompt Conditioning vs Context Conditioning

This distinction is where most people get stuck.

Prompt Conditioning

This is your instruction layer.

Think:

  • what the model should do
  • how it should behave
  • constraints and rules

A good example is a CLAUDE.md file:

  • defines purpose
  • establishes tone
  • encodes best practices

It’s not data. It is direction.

Context Conditioning

This is the information layer.

And this is where things get real:

Better context beats better models.

If your context is wrong, outdated, or bloated, your outputs will be too, no matter how good the model is.

What good context looks like

  • relevant
  • minimal
  • structured
  • current

What bad context looks like

  • entire API docs
  • full database dumps
  • random logs

What actually works

  • 2 to 5 highly relevant documents
  • specific function definitions
  • targeted error logs

Ordering Matters More Than You Think

LLMs are extremely sensitive to structure.

Content that appears earlier frames how everything after it is interpreted.

A reliable pattern is as follows:

1. Instructions
You are a senior C# software engineer performing a strict code review for a production ASP.NET system.

You must analyze code for correctness, security, performance, and maintainability.

2. Constraints

- Only report real issues (no speculation)
- Be concise and actionable
- Follow the output format exactly
- Do not include explanations outside the JSON structure

3. Context

This code runs in a high-traffic payment processing service where correctness and concurrency safety are critical. Bugs may result in financial loss.

4. Examples

Input:
public List<int> GetEvenNumbers(List<int> numbers)
{
    var result = new List<int>();

    foreach (var n in numbers)
    {
        if (n % 2 == 1)
        {
            result.Add(n);
        }
    }

    return result;
}

Output:
{
  "issues": [
    {
      "type": "bug",
      "description": "Method returns odd numbers instead of even numbers",
      "recommendation": "Change condition to n % 2 == 0"
    }
  ]
}

---

Input:
public decimal Total(List<Order> orders)
{
    decimal total = 0;

    foreach (var order in orders)
    {
        total += order.Amount;
    }

    return total;
}

Output:
{
  "issues": []
}

---

5. Task

Analyze the following method and return issues in strict JSON format:

public bool ProcessPayment(User user, decimal amount)
{
    if (user.Balance >= amount)
    {
        user.Balance -= amount;
        return true;
    }

    return false;
}

If you get the ordering wrong, you will see inconsistent behavior and will not know why.

This pattern works because:


1. Instructions

Sets identity and behavior (“what kind of engineer am I dealing with?”)

2. Constraints

Hard rules are locked in early before the model generates anything

3. Context

Frames interpretation (critical in production systems)

4. Examples

Locks in pattern recognition

5. Task

Only now does the model execute


RAG (Done Properly)

Retrieval-augmented generation (RAG) is not about adding more data.

It is about adding the right data at the right time.

For example, using Qdrant:

  • embed documents
  • retrieve only what is semantically relevant
  • inject into context

That is how you get precision.

Not by dumping everything into the prompt.
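To make the retrieval step concrete, here is a small in-memory sketch of "retrieve only what is semantically relevant". It assumes the embeddings already exist and stands in for what a vector database such as Qdrant does at scale. `EmbeddedDocument`, `InMemoryRetriever`, and the `minScore` cutoff are illustrative names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document shape: text plus a precomputed embedding vector.
public sealed record EmbeddedDocument(string Id, string Text, float[] Vector);

public static class InMemoryRetriever
{
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Treat a zero vector as having no similarity to anything.
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Returns at most k documents scoring at or above minScore, best first.
    public static List<EmbeddedDocument> TopK(
        float[] queryVector, IEnumerable<EmbeddedDocument> documents, int k, double minScore)
    {
        return documents
            .Select(d => (Doc: d, Score: CosineSimilarity(queryVector, d.Vector)))
            .Where(x => x.Score >= minScore)
            .OrderByDescending(x => x.Score)
            .Take(k)
            .Select(x => x.Doc)
            .ToList();
    }
}
```

The two knobs, `k` and `minScore`, are what keep the injected context small: one caps how much goes in, the other drops weak matches entirely.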


Prompt Engineering That Actually Works

If you only learn three patterns, make it these:

1. Structured Prompting

Define:

  • task
  • context
  • constraints
  • output format

This removes ambiguity, which is the number one cause of bad outputs.

Use this for:

  • APIs
  • automation
  • anything requiring consistency

Example:

You are a senior C# software engineer performing a strict code review.

## Task
Analyze the provided method and identify issues.

## Context
This method is part of a production ASP.NET API that processes user payments.

## Code
public bool ProcessPayment(User user, decimal amount)
{
    if (user.Balance > amount)
    {
        user.Balance = user.Balance - amount;
        return true;
    }

    return false;
}

## Constraints
- Only identify real issues (no speculation)
- Focus on correctness, security, and concurrency
- Be concise and actionable

## Output Format (JSON)
{
  "issues": [
    {
      "type": "bug | security | performance",
      "description": "",
      "recommendation": ""
    }
  ]
}

2. Few-Shot Prompting

Models are extremely good at pattern matching, often better than following abstract rules.

Use this for:

  • formatting
  • classification
  • transformations
  • tone consistency

Example:

Convert the following C# foreach loops into LINQ expressions.

Input:
var results = new List<int>();
foreach (var x in numbers)
{
    if (x > 0)
    {
        results.Add(x);
    }
}

Output:
var results = numbers.Where(x => x > 0).ToList();

---

Input:
var names = new List<string>();
foreach (var user in users)
{
    names.Add(user.Name);
}

Output:
var names = users.Select(user => user.Name).ToList();

---

Input:
decimal total = 0;
foreach (var order in orders)
{
    total += order.Amount;
}

Output:
var total = orders.Sum(order => order.Amount);

---

Input:
var activeUsers = new List<User>();
foreach (var user in users)
{
    if (user.IsActive && user.Age > 18)
    {
        activeUsers.Add(user);
    }
}

Output:

3. Step-by-Step Reasoning

Tell the model to think in stages.

This improves:

  • correctness
  • depth
  • multi-step logic

Use this for:

  • debugging
  • system design
  • anything non-trivial

Example:

You are a senior C# engineer debugging production code.

Follow these steps explicitly:

1. Understand what the code is supposed to do
2. Identify what is actually happening
3. Determine the root cause of the issue
4. Propose a fix
5. Provide corrected code

---

Problem:
The following method is supposed to return all even numbers from a list, but it is returning incorrect results.

Code:

public List<int> GetEvenNumbers(List<int> numbers)
{
    var result = new List<int>();

    foreach (var n in numbers)
    {
        if (n % 2 == 1)
        {
            result.Add(n);
        }
    }

    return result;
}

What Actually Works in Practice

Not one technique.

Here is an example that combines all three:


You are a senior C# software engineer performing a strict code review for a production ASP.NET system.

---

## TASK (Structured Prompting)
Review the provided code and identify issues related to:
- correctness
- performance
- security
- maintainability

Return results in the required JSON format.

---

## CONTEXT (Structured Prompting)
This code runs in a high-traffic payment processing service where correctness and concurrency safety are critical.

---

## STEP-BY-STEP INSTRUCTIONS (Step-by-Step Reasoning)
Follow these steps explicitly:

1. Understand what the method is intended to do
2. Analyze what the code is actually doing
3. Identify any mismatches between intent and implementation
4. Determine root causes of any issues
5. Propose specific fixes

---

## EXAMPLES (Few-Shot Prompting)

Input:
public List<int> GetEvenNumbers(List<int> numbers)
{
    var result = new List<int>();
    foreach (var n in numbers)
    {
        if (n % 2 == 1)
        {
            result.Add(n);
        }
    }
    return result;
}

Output:
{
  "issues": [
    {
      "type": "bug",
      "description": "Method returns odd numbers instead of even numbers",
      "root_cause": "The parity check n % 2 == 1 selects odd values",
      "recommendation": "Change condition to n % 2 == 0"
    }
  ]
}

---

Input:
public decimal CalculateTotal(List<Order> orders)
{
    decimal total = 0;
    foreach (var order in orders)
    {
        total += order.Amount;
    }
    return total;
}

Output:
{
  "issues": []
}

---

## TASK INPUT

Now analyze the following code:

public bool ProcessPayment(User user, decimal amount)
{
    if (user.Balance >= amount)
    {
        user.Balance = user.Balance - amount;
        return true;
    }

    return false;
}

---

## OUTPUT FORMAT (STRICT JSON)
{
  "issues": [
    {
      "type": "bug | security | performance | maintainability",
      "description": "",
      "root_cause": "",
      "recommendation": ""
    }
  ]
}

That is the baseline for reliable outputs.


Tools, Skills, Hooks, Subagents (The Real Architecture)

This is where most people never go and where the real leverage is.


Tools, Controlled Execution

Tools are deterministic functions the model can call.

Instead of guessing, the model:

reasons → selects tool → passes structured input → gets result → continues

Good tools:

  • do one thing well
  • have strict schemas
  • enforce rules internally
  • return consistent outputs
  • never fail silently

Instructions for creating your own custom tools are available on Claude.com.

Bad tools are vague and ambiguous.

Good tools are explicit and constrained.

Example tool description, code, and interaction:

Tool Schema (with input_examples):

{
  "name": "get_customer_account",
  "description": "Fetches customer account details (name as a string, balance as a decimal, status) by customer ID",
  "input_schema": {
    "type": "object",
    "properties": {
      "customer_id": {
        "type": "string",
        "description": "Unique identifier for the customer"
      }
    },
    "required": ["customer_id"]
  },
  "input_examples": [
    {
      "customer_id": "12345"
    },
    {
      "customer_id": "cust_98765"
    },
    {
      "customer_id": "A100200300"
    }
  ]
}

Tool Backend Implementation (stored on your system in a referenceable location):

public class CustomerService
{
    public CustomerAccount GetCustomerAccount(string customerId)
    {
        // In real systems this would hit a DB or API
        return new CustomerAccount
        {
            CustomerId = customerId,
            Name = "Jane Doe",
            Balance = 125.50m,
            Status = "Active"
        };
    }
}

Tool Call Example (Claude Output)

{
  "type": "tool_use",
  "name": "get_customer_account",
  "input": {
    "customer_id": "cust_98765"
  }
}

Tool Result (Your System)

{
  "customer_id": "cust_98765",
  "name": "Jane Doe",
  "balance": 125.50,
  "status": "Active"
}

Final Response (Claude)

Customer cust_98765 (Jane Doe) has a balance of $125.50 and is currently Active.
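The glue between the tool call and the backend is where "strict schemas" and "never fail silently" get enforced. Below is a rough sketch of that dispatch step using `System.Text.Json`. The `ToolDispatcher` class, its handler registry, and the error shape are assumptions made for illustration, and the customer data is stubbed just like the backend example above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class ToolDispatcher
{
    // Hypothetical registry: tool name -> handler that receives the parsed "input" object.
    private static readonly Dictionary<string, Func<JsonElement, object>> Handlers = new()
    {
        ["get_customer_account"] = input =>
        {
            // Enforce the schema inside the tool instead of trusting the model.
            if (!input.TryGetProperty("customer_id", out var id) ||
                id.ValueKind != JsonValueKind.String ||
                string.IsNullOrWhiteSpace(id.GetString()))
            {
                throw new ArgumentException("customer_id is required and must be a non-empty string.");
            }
            // Stubbed data; a real system would query a database or API here.
            return new { customer_id = id.GetString(), name = "Jane Doe", balance = 125.50m, status = "Active" };
        }
    };

    // Always returns JSON: either the tool result or an explicit error object.
    public static string Dispatch(string toolUseJson)
    {
        try
        {
            using var doc = JsonDocument.Parse(toolUseJson);
            var root = doc.RootElement;
            var name = root.GetProperty("name").GetString() ?? "";
            if (!Handlers.TryGetValue(name, out var handler))
                return JsonSerializer.Serialize(new { error = $"Unknown tool: {name}" });

            return JsonSerializer.Serialize(handler(root.GetProperty("input")));
        }
        catch (Exception ex) when (ex is JsonException or ArgumentException
                                      or KeyNotFoundException or InvalidOperationException)
        {
            // Surface the failure to the model rather than swallowing it.
            return JsonSerializer.Serialize(new { error = ex.Message });
        }
    }
}
```

Returning an error object instead of throwing past the boundary gives the model something concrete to react to, such as correcting its input and calling the tool again.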

Skills, How the Model Thinks

A skill is a reusable bundle of:

  • instructions
  • constraints
  • output format

Installation instructions are available on Claude.com.

Example: a Scrum work item creation skill.

Purpose

You are an Agile Scrum expert responsible for creating, refining, and validating work items (User Stories, Tasks, Bugs, Epics) according to industry best practices.

You ensure every work item is:

- clear
- testable
- valuable
- technically actionable
- aligned with Scrum principles

Core Responsibilities

You must ensure every work item includes the following elements when applicable:

1. Title

- Short, specific, and action-oriented
- Describes the outcome, not the implementation

2. Work Item Type

Identify one of:

- Epic
- User Story
- Task
- Bug
- Spike

3. User Story (if applicable)

Use standard format:

As a [role], I want [goal], so that [benefit]

Focus on:

- user intent
- business value
- outcome over implementation

4. Description

Must include:

- what needs to be built or changed
- context for why it is needed
- any constraints or assumptions

Avoid:

- implementation details unless necessary

5. Business Value

Explain:

- why this matters
- impact on users or system
- measurable or observable benefit

6. Acceptance Criteria

Must be:

- testable
- unambiguous
- independent where possible

Use format:

Given / When / Then OR bullet conditions

Must include:

- success conditions
- failure conditions
- edge cases where relevant

7. Technical Notes (optional but recommended)

Include:

- architecture considerations
- APIs or services involved
- integration points
- data flow notes

Do NOT over-specify implementation unless required.

8. Dependencies

Identify:

- upstream systems
- downstream systems
- teams or services required

9. Definition of Done (DoD)

Must include:

- tests written
- code reviewed
- acceptance criteria met
- deployed to target environment
- observability added if relevant

10. Story Points (if applicable)

Provide relative sizing based on:

- complexity
- uncertainty
- effort

Do NOT treat as time estimate.

11. Priority

One of:

- Critical
- High
- Medium
- Low

Must align with business value and urgency.

12. Sprint Assignment (if used)

Indicate:

- sprint number or backlog status

Quality Rules

You must enforce the following:

Clarity Rule:
No vague terms like:

- “improve system”
- “handle properly”
- “make better”

Everything must be concrete and verifiable.

Testability Rule:
Every work item must be independently testable.

If it cannot be tested, it is invalid.

Value Rule:
Every item must explain why it exists.

If no value is present, flag it.

Scope Rule:
Work items must be:

- small enough to complete in a sprint (unless Epic)
- not overloaded with unrelated concerns

Consistency Rule:
Ensure formatting is consistent across all generated work items.

Output Format:
Always return structured work items in this format:

{
  "title": "",
  "type": "User Story | Task | Bug | Epic | Spike",
  "story": "",
  "description": "",
  "business_value": "",
  "acceptance_criteria": [],
  "technical_notes": [],
  "dependencies": [],
  "definition_of_done": [],
  "story_points": "",
  "priority": "",
  "sprint": ""
}

Behavioral Instruction

When given an input requirement:
- Identify missing Scrum elements
- Normalize unclear language
- Convert vague requests into testable work items
- Ensure alignment with Agile best practices
- Reject or flag incomplete inputs when necessary

Key Principle

A good work item describes outcomes and validation, not implementation guesses.

This is a good skill definition because it turns something messy and subjective (writing Agile work items) into a repeatable, enforceable behavioral system for an LLM instead of just a loose set of guidelines.

Skills give you:

  • consistency
  • reuse
  • versioning of behavior

Tools execute.

Skills shape reasoning.


Hooks, Deterministic Execution Points

Hooks are predefined execution points in your system where you can attach deterministic logic, scripts, or AI workflows.

Hook creation instructions can be found on Claude.com.

Key idea:

Hooks let you intercept and control the lifecycle of an AI workflow.

Claude Code exposes 28 different hook events for intercepting and modifying behavior within any Claude Code session.

Instead of Claude driving everything, your system defines:

WHEN something happens → RUN deterministic logic → THEN call Claude (or not)

Types of Hooks (common patterns):

1. Pre-processing hook
2. Post-processing hook
3. Guard hook
4. Post-results hook

Example: using scripts inside hooks to perform context conditioning with Qdrant vector database collections:

Flow:

User Request
   ↓
Pre-Hook
   ↓
Embed Query
   ↓
Qdrant Search
   ↓
Post-Hook (filter / enrich results)
   ↓
Claude response

Pre-Hook: Before querying Qdrant

Used to improve retrieval quality.

What to do here:

  • normalize query
  • add metadata filters
  • expand query context
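  • apply user permissions

// SearchRequest, EmbeddingService, and UserContext are assumed to be application-defined types.
public SearchRequest PreHook_BuildQdrantQuery(string userQuery, UserContext user)
{
    return new SearchRequest
    {
        // Embed the raw query text so it can be compared against stored vectors.
        Vector = EmbeddingService.Generate(userQuery),
        // Restrict the search to the caller's tenant and to policy documents only.
        Filter = new
        {
            tenant_id = user.TenantId,
            document_type = "policy"
        }
    };
}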

Why this matters:

You’re ensuring Qdrant only searches relevant slices of data, not everything.

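Qdrant Execution (core retrieval step)

// qdrantClient is assumed to be a thin application wrapper here. The official
// Qdrant .NET client is asynchronous and takes the collection name explicitly.
var results = qdrantClient.Search(searchRequest);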

This is your actual vector lookup.

Post-Hook: After Qdrant returns results

This is where hooks become powerful.

You can:

  • filter out low-scoring results that invite hallucination
  • rerank results
  • deduplicate
  • enrich context
  • enforce trust rules

Example Post-Hook

public List<Document> PostHook_FilterResults(List<Document> results)
{
    return results
        .Where(r => r.Score > 0.75)
        .Take(5)
        .ToList();
}

Example: Context Enrichment Hook

public string PostHook_FormatContext(List<Document> results)
{
    return string.Join("\n\n", results.Select(r =>
        $"Title: {r.Title}\nContent: {r.Content}"
    ));
}

This prepares clean context for Claude.

Guard Hook (optional but important)

public void GuardHook_ValidateAccess(UserContext user, Document doc)
{
    if (doc.TenantId != user.TenantId)
        throw new UnauthorizedAccessException();
}

Why Hooks Matter

Hooks give you:

1. Deterministic control

You decide exactly when logic runs.

2. Separation of concerns

  • Claude = reasoning
  • Hooks = orchestration
  • Scripts = execution logic

3. Production safety

You can enforce:

  • validation
  • retries
  • guardrails
  • schema enforcement

before or after the model ever responds.
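The WHEN → RUN → THEN shape described in this section can be sketched as a small pipeline that your own code owns. This is a generic illustration of the pattern, not how Claude Code registers hooks: `HookPipeline` and the `callModel` delegate are hypothetical stand-ins.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class HookPipeline
{
    private readonly List<Func<string, string>> _preHooks = new();
    private readonly List<Action<string>> _guards = new();
    private readonly List<Func<string, string>> _postHooks = new();

    public HookPipeline Pre(Func<string, string> hook) { _preHooks.Add(hook); return this; }
    public HookPipeline Guard(Action<string> guard) { _guards.Add(guard); return this; }
    public HookPipeline Post(Func<string, string> hook) { _postHooks.Add(hook); return this; }

    // callModel is a stand-in for the real model call.
    public string Run(string request, Func<string, string> callModel)
    {
        // Pre-hooks rewrite the request in registration order.
        var prepared = _preHooks.Aggregate(request, (current, hook) => hook(current));

        // Guards throw to stop the workflow before the model is ever called.
        foreach (var guard in _guards) guard(prepared);

        var response = callModel(prepared);

        // Post-hooks validate or reshape the raw model output.
        return _postHooks.Aggregate(response, (current, hook) => hook(current));
    }
}
```

Everything except `callModel` is ordinary deterministic code, which is the point: the model is one step in a flow you control.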


Subagents

What they are:
Subagents are specialized execution units that perform a single, bounded task inside a larger system.

Think: “a focused worker that does one job and returns results, without controlling the overall workflow.”

Unlike full agents, subagents do not run open-ended loops. They operate under strict constraints defined by the parent system.

Instructions for creating a custom subagent are available on Claude.com.

Key idea

Subagents do not decide the workflow.

They:

  • receive a narrow task
  • execute within defined boundaries
  • return structured output
  • terminate immediately

The system (not the subagent) owns control flow.

Example Subagent Flow

Main Orchestrator
   ↓
Select Subagent (task-specific)
   ↓
Execute bounded task
   ↓
Return result
   ↓
Continue workflow

Example Subagent Tasks

Subagents are typically used for:

  • code review (security only)
  • data extraction from documents
  • Qdrant retrieval formatting
  • validation of structured outputs
  • summarization of logs or diffs

Each subagent has one responsibility only.

Example: PR Review Subagent

Instead of an autonomous loop, a subagent does this:

Task: Analyze PR diff for security issues

1. Inspect code changes
2. Identify vulnerabilities
3. Return structured findings
4. Stop

Why Subagents Matter

Subagents introduce control and predictability into AI systems.

1. Bounded execution

No runaway loops or uncontrolled tool usage.

2. Deterministic orchestration

The parent system decides:

  • when they run
  • what they receive
  • how results are used

3. Composability

You can chain subagents together:

Retriever Subagent → Analyzer Subagent → Validator Subagent

Why Subagents Are Safer in Production

Full agents can:

  • loop indefinitely
  • drift from goals
  • overuse tools
  • increase cost unpredictably

Subagents avoid this by design:

  • no looping
  • no self-planning
  • no uncontrolled execution
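Here is a rough sketch of what "the system owns control flow" can look like in code. The `ISubagent` interface, `DelegateSubagent` adapter, and `Orchestrator` class are invented for illustration; in a real system each `Execute` call would wrap a single bounded model request.

```csharp
using System;
using System.Collections.Generic;

public sealed record SubagentResult(bool Success, string Output);

public interface ISubagent
{
    string Name { get; }
    // One bounded call: take input, return structured output, stop.
    SubagentResult Execute(string input);
}

// Adapter so a subagent can be defined from a plain function in this sketch.
public sealed class DelegateSubagent : ISubagent
{
    private readonly Func<string, SubagentResult> _run;

    public DelegateSubagent(string name, Func<string, SubagentResult> run)
    {
        Name = name;
        _run = run;
    }

    public string Name { get; }
    public SubagentResult Execute(string input) => _run(input);
}

public static class Orchestrator
{
    // The orchestrator owns control flow: each subagent runs once, in order,
    // and the chain stops at the first failure.
    public static SubagentResult RunChain(string input, IEnumerable<ISubagent> chain)
    {
        var current = new SubagentResult(true, input);
        foreach (var agent in chain)
        {
            current = agent.Execute(current.Output);
            if (!current.Success)
                return new SubagentResult(false, $"{agent.Name} failed: {current.Output}");
        }
        return current;
    }
}
```

A Retriever → Analyzer → Validator chain is then just a list of three `ISubagent` instances passed to `RunChain`.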

Memory (State Matters)

Without memory, everything is stateless and inefficient.

Types:

  • short-term: the current conversation
  • long-term: databases or vector stores

Use it for:

  • preferences
  • workflows
  • continuity
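As a minimal illustration of the two memory types, the sketch below keeps a sliding window of recent turns (short term) alongside a key-value store of preferences (long term). `ConversationMemory` is a hypothetical class, and the in-process dictionary only stands in for a real database or vector store.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class ConversationMemory
{
    private readonly int _maxTurns;
    private readonly Queue<string> _recentTurns = new();               // short term
    private readonly Dictionary<string, string> _preferences = new();  // long term (stand-in for a database)

    public ConversationMemory(int maxTurns) => _maxTurns = maxTurns;

    public void AddTurn(string turn)
    {
        _recentTurns.Enqueue(turn);
        // Drop the oldest turns once the window is full.
        while (_recentTurns.Count > _maxTurns) _recentTurns.Dequeue();
    }

    public void Remember(string key, string value) => _preferences[key] = value;

    // Builds the context block that would be prepended to the next request:
    // stored preferences first, then the most recent turns.
    public string BuildContext()
    {
        var prefs = _preferences.OrderBy(p => p.Key).Select(p => $"{p.Key}: {p.Value}");
        return string.Join("\n", prefs.Concat(_recentTurns));
    }
}
```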

Guardrails (Non-Negotiable)

You should never trust raw model output.

Guardrails include:

  • input validation
  • output validation
  • schema enforcement
  • tool restrictions

Example patterns:

  • reject PII
  • validate JSON
  • retry on failure
  • block unauthorized tool calls
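Two of these patterns, validating JSON and retrying on failure, fit in a few lines. The sketch below checks model output against the basic `issues` format used by the review prompts earlier in this post. `OutputGuard` and the `callModel` delegate are hypothetical names, and the default retry count is an arbitrary choice.

```csharp
using System;
using System.Text.Json;

public static class OutputGuard
{
    // Accepts only a JSON object with an "issues" array whose items carry the required string fields.
    public static bool IsValidReview(string raw)
    {
        try
        {
            using var doc = JsonDocument.Parse(raw);
            if (doc.RootElement.ValueKind != JsonValueKind.Object) return false;
            if (!doc.RootElement.TryGetProperty("issues", out var issues)) return false;
            if (issues.ValueKind != JsonValueKind.Array) return false;

            foreach (var issue in issues.EnumerateArray())
            {
                if (issue.ValueKind != JsonValueKind.Object) return false;
                foreach (var field in new[] { "type", "description", "recommendation" })
                {
                    if (!issue.TryGetProperty(field, out var value) ||
                        value.ValueKind != JsonValueKind.String)
                        return false;
                }
            }
            return true;
        }
        catch (JsonException)
        {
            // Not JSON at all, for example prose wrapped around the payload.
            return false;
        }
    }

    // Calls the model up to maxAttempts times and returns the first valid output.
    public static string GetValidated(Func<string> callModel, int maxAttempts = 3)
    {
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            var raw = callModel();
            if (IsValidReview(raw)) return raw;
        }
        throw new InvalidOperationException($"No valid output after {maxAttempts} attempts.");
    }
}
```

Throwing after the last attempt is deliberate: a loud failure is easier to handle than an invalid payload passed downstream.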

Evaluation (What Most People Skip)

If you do not measure behavior, your system will degrade silently.

You need:

  • test cases
  • scoring
  • regression checks

Prompt tweaks will break things.

You just will not notice without evals.
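A tiny eval harness is enough to start catching this. The sketch below runs a fixed set of cases against whatever function wraps your prompt and model, then compares the pass rate to a stored baseline. `EvalCase` and `EvalRunner` are illustrative names, and pass/fail scoring is the simplest possible choice; real suites often use graded scores.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical test case: an input plus a check applied to the system's output.
public sealed record EvalCase(string Name, string Input, Func<string, bool> Passes);

public static class EvalRunner
{
    // Returns the pass rate and the names of failing cases for one system under test.
    public static (double PassRate, List<string> Failures) Run(
        IReadOnlyList<EvalCase> cases, Func<string, string> system)
    {
        var failures = cases
            .Where(c => !c.Passes(system(c.Input)))
            .Select(c => c.Name)
            .ToList();
        var passRate = cases.Count == 0 ? 0 : (cases.Count - failures.Count) / (double)cases.Count;
        return (passRate, failures);
    }

    // A regression is any drop below the previously recorded baseline.
    public static bool IsRegression(double baselinePassRate, double currentPassRate) =>
        currentPassRate < baselinePassRate;
}
```

Run it before and after every prompt change, and treat a regression the same way you would treat a failing unit test.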

This is as true for AI systems as it is for anyone writing software. Claude can also help write tests and evaluate an implementation's outcomes, and that work is equally subject to peer review.

With the appropriate tests in place, writing acceptable software with Claude can become more efficient.


The Mental Model That Works

This is the actual flow:

User Request
   ↓
Agent (optional)
   ↓
Skill (reasoning)
   ↓
Tool (execution)
   ↓
Script (orchestration)
   ↓
Memory / Retrieval
   ↓
Guardrails
   ↓
Evaluation

If you are missing pieces here, you will feel it in production.


Choosing the Right Model (Cost vs Intelligence)

With Claude models:

  • Haiku: fast, cheap, lighter reasoning
  • Sonnet: balanced default
  • Opus: deep reasoning, expensive

Do not ask which is best.

Ask:

What happens if the model is wrong?

Impact    Model
Low       Haiku
Medium    Sonnet
High      Opus

What Actually Works

Route between models.

Example:

Haiku → filter/classify
Sonnet → handle most logic
Opus → escalate edge cases

Most systems overuse top tier models.

You usually do not need them.
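Routing can start as something very simple. The sketch below maps the impact levels from the table above to a model tier and escalates one tier when a cheaper model's answer fails validation. `ModelRouter` is hypothetical, and the strings are tier labels rather than real model IDs, which you would look up in configuration.

```csharp
using System;

public enum Impact { Low, Medium, High }

public static class ModelRouter
{
    // Tier labels only; map them to concrete model IDs in configuration.
    public static string SelectTier(Impact impact) => impact switch
    {
        Impact.Low => "haiku",
        Impact.Medium => "sonnet",
        Impact.High => "opus",
        _ => throw new ArgumentOutOfRangeException(nameof(impact))
    };

    // Escalate one tier when a cheaper model's answer fails validation.
    public static string Escalate(string tier) => tier switch
    {
        "haiku" => "sonnet",
        "sonnet" => "opus",
        _ => "opus"
    };
}
```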


Where AI Systems Fail in Production

AI systems do not fail loudly.

They fail convincingly.

Common failure modes:

  • hallucination under missing context
  • prompt brittleness
  • tool misuse
  • silent incorrect outputs
  • context overload

The dangerous part is that everything looks right.


Determinism (Or Lack Of It)

LLMs are not deterministic.

The same input is not guaranteed to produce the same output.

You do not control outputs.

You control probabilities.

That means you must design for:

  • validation
  • retries
  • evaluation

Observability (What Actually Matters in Production)

If you cannot see it, you cannot trust it.

Track:

  • inputs
  • outputs
  • tool calls
  • latency
  • cost
  • failures

Without this:

  • debugging is impossible
  • optimization is guesswork
  • trust breaks down
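A thin wrapper around the model call is one way to start capturing inputs, outputs, latency, and failures. This is only a sketch: `ObservedModel` and `CallRecord` are made-up names, records are kept in memory instead of being shipped to a logging backend, and cost and tool-call tracking are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// One record per model call; Output and Error are empty strings when not applicable.
public sealed record CallRecord(string Input, string Output, long LatencyMs, string Error);

public sealed class ObservedModel
{
    private readonly Func<string, string> _callModel;

    public List<CallRecord> Records { get; } = new();

    // callModel is a stand-in for the real model call.
    public ObservedModel(Func<string, string> callModel) => _callModel = callModel;

    public string Call(string input)
    {
        var timer = Stopwatch.StartNew();
        try
        {
            var output = _callModel(input);
            Records.Add(new CallRecord(input, output, timer.ElapsedMilliseconds, ""));
            return output;
        }
        catch (Exception ex)
        {
            // Record the failure, then rethrow so callers still see it.
            Records.Add(new CallRecord(input, "", timer.ElapsedMilliseconds, ex.Message));
            throw;
        }
    }
}
```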

When NOT to Use Claude

There are cases where an LLM is the wrong tool:

  • deterministic logic: use code
  • precise math: use calculators
  • high-risk decisions: do not rely on LLMs alone

Final Insight

LLMs are not systems.

They are components inside systems.

If you treat them like magic, they will fail unpredictably.

If you treat them like infrastructure, they become powerful.

That shift from prompting to system design is where things start to work.