Introduction
Most people approach AI tools like Claude as if they’re just “better autocomplete.”
That framing breaks down fast.
What you’re actually working with is a system composed of:
reasoning + tools + context + orchestration
Once you understand that, everything else (prompting, RAG, agents, and model choice) starts to click into place.
This post is the mental model I wish I had earlier.
General AI Knowledge (That Actually Matters)
Learning Claude is not really about Claude.
It’s about learning how LLMs behave under constraints and how to shape that behavior.
The same principles apply whether you’re using Claude, GPT, or anything else:
- Models reason probabilistically
- Context defines behavior more than the model itself
- Structure beats cleverness
- Systems beat prompts
If you’re building anything real, you’re not “prompting a model.”
You’re designing a system that happens to include one.
Prompt Conditioning vs Context Conditioning
This distinction is where most people get stuck.
Prompt Conditioning
This is your instruction layer.
Think:
- what the model should do
- how it should behave
- constraints and rules
A good example is a CLAUDE.md file:
- defines purpose
- establishes tone
- encodes best practices
It’s not data. It is direction.
Context Conditioning
This is the information layer.
And this is where things get real:
Better context beats better models.
If your context is wrong, outdated, or bloated, your outputs will be too, no matter how good the model is.
What good context looks like
- relevant
- minimal
- structured
- current
What bad context looks like
- entire API docs
- full database dumps
- random logs
What actually works
- 2 to 5 highly relevant documents
- specific function definitions
- targeted error logs
Ordering Matters More Than You Think
LLMs are extremely sensitive to structure.
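Position in the prompt affects how reliably content is used: material near the start and end tends to be followed more consistently than material buried in the middle.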
A reliable pattern is as follows:
1. Instructions
You are a senior C# software engineer performing a strict code review for a production ASP.NET system.
You must analyze code for correctness, security, performance, and maintainability.
2. Constraints
- Only report real issues (no speculation)
- Be concise and actionable
- Follow the output format exactly
- Do not include explanations outside the JSON structure
3. Context
This code runs in a high-traffic payment processing service where correctness and concurrency safety are critical. Bugs may result in financial loss.
4. Examples
Input:
public List<int> GetEvenNumbers(List<int> numbers)
{
    var result = new List<int>();
    foreach (var n in numbers)
    {
        if (n % 2 == 1)
        {
            result.Add(n);
        }
    }
    return result;
}
Output:
{
  "issues": [
    {
      "type": "bug",
      "description": "Method returns odd numbers instead of even numbers",
      "recommendation": "Change condition to n % 2 == 0"
    }
  ]
}
---
Input:
public decimal Total(List<Order> orders)
{
    decimal total = 0;
    foreach (var order in orders)
    {
        total += order.Amount;
    }
    return total;
}
Output:
{
  "issues": []
}
---
5. Task
Analyze the following method and return issues in strict JSON format:
public bool ProcessPayment(User user, decimal amount)
{
    if (user.Balance >= amount)
    {
        user.Balance -= amount;
        return true;
    }
    return false;
}
If you get the ordering wrong, you will see inconsistent behavior and will not know why.
This pattern works because:
1. Instructions
Sets identity and behavior (“what kind of engineer am I dealing with?”)
2. Constraints
Hard rules are locked in early before the model generates anything
3. Context
Frames interpretation (critical in production systems)
4. Examples
Locks in pattern recognition
5. Task
Only now does the model execute
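One way to keep that ordering from drifting is to assemble prompts in code rather than by hand. The following is a minimal sketch, not a prescribed implementation: `PromptAssembler`, its section names, and the `## ` section headers are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: emits prompt sections in one fixed order so every
// request has the same layout, however the caller supplied the pieces.
public static class PromptAssembler
{
    // The five-part order described in the text.
    private static readonly string[] SectionOrder =
        { "Instructions", "Constraints", "Context", "Examples", "Task" };

    public static string Build(IReadOnlyDictionary<string, string> sections)
    {
        // Fail loudly on a missing or blank section instead of sending a partial prompt.
        var missing = SectionOrder
            .Where(name => !sections.TryGetValue(name, out var body)
                           || string.IsNullOrWhiteSpace(body))
            .ToList();
        if (missing.Count > 0)
            throw new ArgumentException("Missing prompt sections: " + string.Join(", ", missing));

        // Emit in the canonical order, not the order the caller used.
        return string.Join("\n\n",
            SectionOrder.Select(name => $"## {name}\n{sections[name].Trim()}"));
    }
}
```

Passing the sections as a dictionary means the caller cannot reorder them by accident, and a forgotten section fails before any model call is made.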
RAG (Done Properly)
Retrieval-augmented generation is not about adding more data.
It is about adding the right data at the right time.
For example, using Qdrant:
- embed documents
- retrieve only what is semantically relevant
- inject into context
That is how you get precision.
Not by dumping everything into the prompt.
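To make "retrieve only what is semantically relevant" concrete, here is a rough sketch of the selection step. It uses an in-memory list and cosine similarity as a stand-in for the vector store; in a real system Qdrant would run the search. `Chunk`, `Retriever`, the `topK` default, and the `minScore` threshold are all assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document chunk with a precomputed embedding.
public record Chunk(string Id, string Text, float[] Embedding);

// In-memory stand-in for a vector store such as Qdrant.
public static class Retriever
{
    public static double Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Embedding dimensions differ.");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // A zero vector has no direction, so treat it as unrelated.
        return normA == 0 || normB == 0 ? 0 : dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Returns at most topK chunks whose similarity clears minScore, best first.
    public static List<Chunk> TopK(
        float[] query, IEnumerable<Chunk> chunks, int topK = 3, double minScore = 0.75)
    {
        return chunks
            .Select(c => (Chunk: c, Score: Cosine(query, c.Embedding)))
            .Where(x => x.Score >= minScore)
            .OrderByDescending(x => x.Score)
            .Take(topK)
            .Select(x => x.Chunk)
            .ToList();
    }
}
```

The two limits do different jobs: `minScore` keeps irrelevant material out even when few documents match, and `topK` caps how much context is injected when many do.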
Prompt Engineering That Actually Works
If you only learn three patterns, make it these:
1. Structured Prompting
Define:
- task
- context
- constraints
- output format
This removes ambiguity, which is the number one cause of bad outputs.
Use this for:
- APIs
- automation
- anything requiring consistency
Example:
You are a senior C# software engineer performing a strict code review.
## Task
Analyze the provided method and identify issues.
## Context
This method is part of a production ASP.NET API that processes user payments.
## Code
public bool ProcessPayment(User user, decimal amount)
{
    if (user.Balance > amount)
    {
        user.Balance = user.Balance - amount;
        return true;
    }
    return false;
}
## Constraints
- Only identify real issues (no speculation)
- Focus on correctness, security, and concurrency
- Be concise and actionable
## Output Format (JSON)
{
  "issues": [
    {
      "type": "bug | security | performance",
      "description": "",
      "recommendation": ""
    }
  ]
}
2. Few-Shot Prompting
Models are extremely good at pattern matching, often better than following abstract rules.
Use this for:
- formatting
- classification
- transformations
- tone consistency
Example:
Convert the following C# foreach loops into LINQ expressions.
Input:
var results = new List<int>();
foreach (var x in numbers)
{
    if (x > 0)
    {
        results.Add(x);
    }
}
Output:
var results = numbers.Where(x => x > 0).ToList();
---
Input:
var names = new List<string>();
foreach (var user in users)
{
    names.Add(user.Name);
}
Output:
var names = users.Select(user => user.Name).ToList();
---
Input:
decimal total = 0;
foreach (var order in orders)
{
    total += order.Amount;
}
Output:
var total = orders.Sum(order => order.Amount);
---
Input:
var activeUsers = new List<User>();
foreach (var user in users)
{
    if (user.IsActive && user.Age > 18)
    {
        activeUsers.Add(user);
    }
}
Output:
3. Step-by-Step Reasoning
Tell the model to think in stages.
This improves:
- correctness
- depth
- multi-step logic
Use this for:
- debugging
- system design
- anything non-trivial
Example:
You are a senior C# engineer debugging production code.
Follow these steps explicitly:
1. Understand what the code is supposed to do
2. Identify what is actually happening
3. Determine the root cause of the issue
4. Propose a fix
5. Provide corrected code
---
Problem:
The following method is supposed to return all even numbers from a list, but it is returning incorrect results.
Code:
public List<int> GetEvenNumbers(List<int> numbers)
{
    var result = new List<int>();
    foreach (var n in numbers)
    {
        if (n % 2 == 1)
        {
            result.Add(n);
        }
    }
    return result;
}
What Actually Works in Practice
Not one technique, but all three combined.
For example:
You are a senior C# software engineer performing a strict code review for a production ASP.NET system.
---
## TASK (Structured Prompting)
Review the provided code and identify issues related to:
- correctness
- performance
- security
- maintainability
Return results in the required JSON format.
---
## CONTEXT (Structured Prompting)
This code runs in a high-traffic payment processing service where correctness and concurrency safety are critical.
---
## STEP-BY-STEP INSTRUCTIONS (Step-by-Step Reasoning)
Follow these steps explicitly:
1. Understand what the method is intended to do
2. Analyze what the code is actually doing
3. Identify any mismatches between intent and implementation
4. Determine root causes of any issues
5. Propose specific fixes
---
## EXAMPLES (Few-Shot Prompting)
Input:
public List<int> GetEvenNumbers(List<int> numbers)
{
    var result = new List<int>();
    foreach (var n in numbers)
    {
        if (n % 2 == 1)
        {
            result.Add(n);
        }
    }
    return result;
}
Output:
{
  "issues": [
    {
      "type": "bug",
      "description": "Method returns odd numbers instead of even numbers",
      "recommendation": "Change condition to n % 2 == 0"
    }
  ]
}
---
Input:
public decimal CalculateTotal(List<Order> orders)
{
    decimal total = 0;
    foreach (var order in orders)
    {
        total += order.Amount;
    }
    return total;
}
Output:
{
  "issues": []
}
---
## TASK INPUT
Now analyze the following code:
public bool ProcessPayment(User user, decimal amount)
{
    if (user.Balance >= amount)
    {
        user.Balance = user.Balance - amount;
        return true;
    }
    return false;
}
---
## OUTPUT FORMAT (STRICT JSON)
{
  "issues": [
    {
      "type": "bug | security | performance | maintainability",
      "description": "",
      "root_cause": "",
      "recommendation": ""
    }
  ]
}
That is the baseline for reliable outputs.
Tools, Skills, Hooks, Subagents (The Real Architecture)
This is where most people never go and where the real leverage is.
Tools, Controlled Execution
Tools are deterministic functions the model can call.
Instead of guessing, the model:
reasons → selects tool → passes structured input → gets result → continues
Good tools:
- do one thing well
- have strict schemas
- enforce rules internally
- return consistent outputs
- never fail silently
The custom tool creation instructions on Claude.com can help you build your own.
Bad tools are vague and ambiguous.
Good tools are explicit and constrained.
Example tool description, code, and interaction:
Tool Schema (with input_examples):
{
  "name": "get_customer_account",
  "description": "Fetches customer account details (name as a string, balance as a decimal, status) by customer ID",
  "input_schema": {
    "type": "object",
    "properties": {
      "customer_id": {
        "type": "string",
        "description": "Unique identifier for the customer"
      }
    },
    "required": ["customer_id"]
  },
  "input_examples": [
    { "customer_id": "12345" },
    { "customer_id": "cust_98765" },
    { "customer_id": "A100200300" }
  ]
}
Tool Backend Implementation (stored on your system in a referenceable location):
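// Shape returned to the model; mirrors the fields named in the tool description.
public class CustomerAccount
{
    public string CustomerId { get; set; }
    public string Name { get; set; }
    public decimal Balance { get; set; }
    public string Status { get; set; }
}

public class CustomerService
{
    public CustomerAccount GetCustomerAccount(string customerId)
    {
        // In real systems this would hit a DB or API
        return new CustomerAccount
        {
            CustomerId = customerId,
            Name = "Jane Doe",
            Balance = 125.50m,
            Status = "Active"
        };
    }
}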
Tool Call Example (Claude Output)
{
  "type": "tool_use",
  "name": "get_customer_account",
  "input": {
    "customer_id": "cust_98765"
  }
}
Tool Result (Your System)
{
  "customer_id": "cust_98765",
  "name": "Jane Doe",
  "balance": 125.50,
  "status": "Active"
}
Final Response (Claude)
Customer cust_98765 (Jane Doe) has a balance of $125.50 and is currently Active.
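The piece that sits between the tool call and the tool result is a dispatcher on your side. Below is a minimal sketch of one for the `get_customer_account` tool, assuming the tool call arrives as a JSON string. `ToolDispatcher`, the `ok`/`result`/`error` envelope, and the stubbed data are hypothetical choices for illustration, not part of any Claude API.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical dispatcher: maps a tool name to a handler, validates the
// input, and always returns an explicit result or an explicit error.
public static class ToolDispatcher
{
    private static readonly Dictionary<string, Func<JsonElement, object>> Handlers = new()
    {
        ["get_customer_account"] = input =>
        {
            // Enforce the schema's required field inside the tool itself.
            if (!input.TryGetProperty("customer_id", out var id)
                || id.ValueKind != JsonValueKind.String
                || string.IsNullOrWhiteSpace(id.GetString()))
                throw new ArgumentException("customer_id is required and must be a non-empty string.");

            // Stubbed data, as in the backend example; a real handler would query a DB or API.
            return new { customer_id = id.GetString(), name = "Jane Doe", balance = 125.50m, status = "Active" };
        }
    };

    // Takes the JSON of a tool call and returns the JSON to send back to the model.
    public static string Dispatch(string toolCallJson)
    {
        try
        {
            using var doc = JsonDocument.Parse(toolCallJson);
            var root = doc.RootElement;
            var name = root.GetProperty("name").GetString() ?? "";
            if (!Handlers.TryGetValue(name, out var handler))
                throw new InvalidOperationException($"Unknown tool: {name}");
            var result = handler(root.GetProperty("input"));
            return JsonSerializer.Serialize(new { ok = true, result });
        }
        catch (Exception ex)
        {
            // Never fail silently: surface the problem as structured output.
            return JsonSerializer.Serialize(new { ok = false, error = ex.Message });
        }
    }
}
```

Because unknown tools and malformed inputs come back as structured errors rather than exceptions or empty strings, the model has something concrete to react to on its next turn.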
Skills, How the Model Thinks
A skill is a reusable bundle of:
- instructions
- constraints
- output format
Installation instructions are available on Claude.com.
Example: a Scrum work item creation skill.
Purpose
You are an Agile Scrum expert responsible for creating, refining, and validating work items (User Stories, Tasks, Bugs, Epics) according to industry best practices.
You ensure every work item is:
- clear
- testable
- valuable
- technically actionable
- aligned with Scrum principles
Core Responsibilities
You must ensure every work item includes the following elements when applicable:
1. Title
- Short, specific, and action-oriented
- Describes the outcome, not the implementation
2. Work Item Type
Identify one of:
- Epic
- User Story
- Task
- Bug
- Spike
3. User Story (if applicable)
Use standard format:
As a [role], I want [goal], so that [benefit]
Focus on:
- user intent
- business value
- outcome over implementation
4. Description
Must include:
- what needs to be built or changed
- context for why it is needed
- any constraints or assumptions
Avoid:
- implementation details unless necessary
5. Business Value
Explain:
- why this matters
- impact on users or system
- measurable or observable benefit
6. Acceptance Criteria
Must be:
- testable
- unambiguous
- independent where possible
Use format:
Given / When / Then OR bullet conditions
Must include:
- success conditions
- failure conditions
- edge cases where relevant
7. Technical Notes (optional but recommended)
Include:
- architecture considerations
- APIs or services involved
- integration points
- data flow notes
Do NOT over-specify implementation unless required.
8. Dependencies
Identify:
- upstream systems
- downstream systems
- teams or services required
9. Definition of Done (DoD)
Must include:
- tests written
- code reviewed
- acceptance criteria met
- deployed to target environment
- observability added if relevant
10. Story Points (if applicable)
Provide relative sizing based on:
- complexity
- uncertainty
- effort
Do NOT treat as time estimate.
11. Priority
One of:
- Critical
- High
- Medium
- Low
Must align with business value and urgency.
12. Sprint Assignment (if used)
Indicate:
- sprint number or backlog status
Quality Rules
You must enforce the following:
Clarity Rule:
No vague terms like:
- “improve system”
- “handle properly”
- “make better”
Everything must be concrete and verifiable.
Testability Rule:
Every work item must be independently testable.
If it cannot be tested, it is invalid.
Value Rule:
Every item must explain why it exists.
If no value is present, flag it.
Scope Rule:
Work items must be:
- small enough to complete in a sprint (unless Epic)
- not overloaded with unrelated concerns
Consistency Rule:
Ensure formatting is consistent across all generated work items.
Output Format:
Always return structured work items in this format:
{
  "title": "",
  "type": "User Story | Task | Bug | Epic | Spike",
  "story": "",
  "description": "",
  "business_value": "",
  "acceptance_criteria": [],
  "technical_notes": [],
  "dependencies": [],
  "definition_of_done": [],
  "story_points": "",
  "priority": "",
  "sprint": ""
}
Behavioral Instruction
When given an input requirement:
- Identify missing Scrum elements
- Normalize unclear language
- Convert vague requests into testable work items
- Ensure alignment with Agile best practices
- Reject or flag incomplete inputs when necessary
Key Principle
A good work item describes outcomes and validation, not implementation guesses.
This is a good skill definition because it turns something messy and subjective (writing Agile work items) into a repeatable, enforceable behavioral system for an LLM instead of just a loose set of guidelines.
Skills give you:
- consistency
- reuse
- versioning of behavior
Tools execute.
Skills shape reasoning.
Hooks, Deterministic Execution Points
Hooks are predefined execution points in your system where you can attach deterministic logic, scripts, or AI workflows.
Hook creation instructions can be found on Claude.com.
Key idea:
Hooks let you intercept and control the lifecycle of an AI workflow.
Claude Code exposes a set of hook events (for example PreToolUse, PostToolUse, UserPromptSubmit, and Stop) that fire at fixed points in any Claude Code session.
Instead of Claude driving everything, your system defines:
WHEN something happens → RUN deterministic logic → THEN call Claude (or not)
Types of Hooks (common patterns):
1. Pre-processing hook
2. Post-processing hook
3. Guard hook
4. Post-results hook
Example: using scripts inside hooks to perform context conditioning with Qdrant vector database collections:
Flow:
User Request
↓
Pre-Hook
↓
Embed Query
↓
Qdrant Search
↓
Post-Hook (filter / enrich results)
↓
Claude response
Pre-Hook: Before querying Qdrant
Used to improve retrieval quality.
What to do here:
- normalize query
- add metadata filters
- expand query context
- apply user permissions
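// SearchRequest, EmbeddingService, and UserContext are illustrative types for this sketch;
// adapt them to the Qdrant client you actually use.
public SearchRequest PreHook_BuildQdrantQuery(string userQuery, UserContext user)
{
    return new SearchRequest
    {
        Vector = EmbeddingService.Generate(userQuery),
        // Restrict the search to this tenant's policy documents
        Filter = new
        {
            tenant_id = user.TenantId,
            document_type = "policy"
        }
    };
}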
Why this matters:
You’re ensuring Qdrant only searches relevant slices of data, not everything.
Qdrant Execution (core retrieval step)
var results = qdrantClient.Search(searchRequest);
This is your actual vector lookup.
Post-Hook: After Qdrant returns results
This is where hooks become powerful.
You can:
- filter out low-relevance results that invite hallucination
- rerank results
- deduplicate
- enrich context
- enforce trust rules
Example Post-Hook
public List<Document> PostHook_FilterResults(List<Document> results)
{
    return results
        .Where(r => r.Score > 0.75)
        .Take(5)
        .ToList();
}
Example: Context Enrichment Hook
public string PostHook_FormatContext(List<Document> results)
{
    return string.Join("\n\n", results.Select(r =>
        $"Title: {r.Title}\nContent: {r.Content}"
    ));
}
This prepares clean context for Claude.
Guard Hook (optional but important)
public void GuardHook_ValidateAccess(UserContext user, Document doc)
{
    if (doc.TenantId != user.TenantId)
        throw new UnauthorizedAccessException();
}
Why Hooks Matter
Hooks give you:
1. Deterministic control
You decide exactly when logic runs.
2. Separation of concerns
- Claude = reasoning
- Hooks = orchestration
- Scripts = execution logic
3. Production safety
You can enforce:
- validation
- retries
- guardrails
- schema enforcement
before or after the model ever responds.
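The WHEN → RUN → THEN pattern can be expressed as a small pipeline that wraps any core step. This is a sketch under assumptions: `HookPipeline` is a hypothetical application-side class, separate from the Claude Code hook mechanism, and it uses plain delegates in place of real retrieval or model calls.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pipeline: deterministic hooks wrap a core step (a retrieval
// or a model call). Any hook can throw to stop the request early.
public class HookPipeline<TIn, TOut>
{
    private readonly List<Func<TIn, TIn>> _preHooks = new();
    private readonly List<Func<TOut, TOut>> _postHooks = new();

    public HookPipeline<TIn, TOut> Before(Func<TIn, TIn> hook)
    {
        _preHooks.Add(hook);
        return this;
    }

    public HookPipeline<TIn, TOut> After(Func<TOut, TOut> hook)
    {
        _postHooks.Add(hook);
        return this;
    }

    public TOut Run(TIn input, Func<TIn, TOut> core)
    {
        // Pre-hooks run in registration order: normalize, filter, guard.
        foreach (var hook in _preHooks) input = hook(input);

        // The core step only ever sees hook-processed input.
        var output = core(input);

        // Post-hooks run in registration order: rerank, dedupe, validate.
        foreach (var hook in _postHooks) output = hook(output);
        return output;
    }
}
```

A guard hook in this design is just a pre-hook that throws, which means the core step never runs for a rejected request.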
Subagents
What they are:
Subagents are specialized execution units that perform a single, bounded task inside a larger system.
Think: “a focused worker that does one job and returns results, without controlling the overall workflow.”
Unlike full agents, subagents do not run open-ended loops. They operate under strict constraints defined by the parent system.
Instructions for creating a custom subagent are available on Claude.com.
Key idea
Subagents do not decide the workflow.
They:
- receive a narrow task
- execute within defined boundaries
- return structured output
- terminate immediately
The system (not the subagent) owns control flow.
Example Subagent Flow
Main Orchestrator
↓
Select Subagent (task-specific)
↓
Execute bounded task
↓
Return result
↓
Continue workflow
Example Subagent Tasks
Subagents are typically used for:
- code review (security only)
- data extraction from documents
- Qdrant retrieval formatting
- validation of structured outputs
- summarization of logs or diffs
Each subagent has one responsibility only.
Example: PR Review Subagent
Instead of an autonomous loop, a subagent does this:
Task: Analyze PR diff for security issues
1. Inspect code changes
2. Identify vulnerabilities
3. Return structured findings
4. Stop
Why Subagents Matter
Subagents introduce control and predictability into AI systems.
1. Bounded execution
No runaway loops or uncontrolled tool usage.
2. Deterministic orchestration
The parent system decides:
- when they run
- what they receive
- how results are used
3. Composability
You can chain subagents together:
Retriever Subagent → Analyzer Subagent → Validator Subagent
Why Subagents Are Safer in Production
Full agents can:
- loop indefinitely
- drift from goals
- overuse tools
- increase cost unpredictably
Subagents avoid this by design:
- no open-ended looping
- no self-planning
- no uncontrolled execution
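A chain such as Retriever → Analyzer → Validator can be sketched with a small contract and an orchestrator that owns the control flow. The names here (`ISubagent`, `SubagentResult`, `Orchestrator`, `FuncSubagent`) are hypothetical, and the subagents are plain functions standing in for model-backed workers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical contract: one bounded task in, one structured result out.
public interface ISubagent
{
    string Name { get; }
    SubagentResult Execute(string input);
}

public record SubagentResult(bool Success, string Output);

// The orchestrator, not the subagents, owns control flow.
public static class Orchestrator
{
    public static SubagentResult RunChain(string input, IReadOnlyList<ISubagent> chain)
    {
        var current = new SubagentResult(true, input);
        foreach (var agent in chain)
        {
            // Each subagent is invoked exactly once and cannot reschedule itself.
            current = agent.Execute(current.Output);
            if (!current.Success)
                return new SubagentResult(false, $"{agent.Name} failed: {current.Output}");
        }
        return current;
    }
}

// Trivial stand-in that wraps a function; a real subagent would call a model.
public class FuncSubagent : ISubagent
{
    private readonly Func<string, SubagentResult> _run;

    public FuncSubagent(string name, Func<string, SubagentResult> run)
    {
        Name = name;
        _run = run;
    }

    public string Name { get; }

    public SubagentResult Execute(string input) => _run(input);
}
```

The chain stops at the first failure and reports which subagent failed, so cost and behavior stay bounded by the length of the list.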
Memory (State Matters)
Without memory, everything is stateless and inefficient.
Types:
- short term: the current conversation
- long term: databases or vector stores
Use it for:
- preferences
- workflows
- continuity
Guardrails (Non-Negotiable)
You should never trust raw model output.
Guardrails include:
- input validation
- output validation
- schema enforcement
- tool restrictions
Example patterns:
- reject PII
- validate JSON
- retry on failure
- block unauthorized tool calls
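Two of these patterns, "validate JSON" and "retry on failure", fit in a few lines. The sketch below assumes the output format used in the code review prompts earlier (a JSON object with an `issues` array). `OutputGuard` and the `callModel` delegate are hypothetical stand-ins for your own client code.

```csharp
using System;
using System.Text.Json;

public static class OutputGuard
{
    // Returns true only if the text is a JSON object with a top-level "issues" array.
    public static bool IsValidReview(string raw)
    {
        try
        {
            using var doc = JsonDocument.Parse(raw);
            return doc.RootElement.ValueKind == JsonValueKind.Object
                && doc.RootElement.TryGetProperty("issues", out var issues)
                && issues.ValueKind == JsonValueKind.Array;
        }
        catch (JsonException)
        {
            // Not JSON at all, for example prose wrapped around the payload.
            return false;
        }
    }

    // callModel is a hypothetical delegate standing in for the real API call.
    public static string GetValidated(Func<string> callModel, int maxAttempts = 3)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            var raw = callModel();
            if (IsValidReview(raw)) return raw;
        }
        // Give up explicitly rather than passing bad output downstream.
        throw new InvalidOperationException($"No valid output after {maxAttempts} attempts.");
    }
}
```

Capping the attempts matters as much as the validation: an unbounded retry loop turns one bad prompt into an unbounded bill.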
Evaluation (What Most People Skip)
If you do not measure behavior, your system will degrade silently.
You need:
- test cases
- scoring
- regression checks
Prompt tweaks will break things.
You just will not notice without evals.
The same is true of any software, whether a person or an AI wrote it. Claude can also help write the tests and evaluate the results, and that work is equally subject to peer review.
With the appropriate tests in place, writing acceptable software with Claude becomes more efficient.
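A regression check does not need a framework to start with. As a rough sketch, the harness below runs a list of cases through the system under test and reports a pass rate plus the names of the failures. `EvalCase` and `EvalRunner` are hypothetical names, and the system is any function from input to output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical test case: an input and a predicate that scores the output.
public record EvalCase(string Name, string Input, Func<string, bool> Passes);

public static class EvalRunner
{
    // Runs every case through the system under test and returns
    // the pass rate together with the names of the failing cases.
    public static (double PassRate, List<string> Failures) Run(
        IReadOnlyList<EvalCase> cases, Func<string, string> system)
    {
        var failures = cases
            .Where(c => !c.Passes(system(c.Input)))
            .Select(c => c.Name)
            .ToList();

        double passRate = cases.Count == 0
            ? 0
            : (cases.Count - failures.Count) / (double)cases.Count;
        return (passRate, failures);
    }
}
```

Run it before and after every prompt change and compare the two failure lists; a case that newly appears in the second list is a regression.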
The Mental Model That Works
This is the actual flow:
User Request
↓
Agent (optional)
↓
Skill (reasoning)
↓
Tool (execution)
↓
Hook / Script (orchestration)
↓
Memory / Retrieval
↓
Guardrails
↓
Evaluation
If you are missing pieces here, you will feel it in production.
Choosing the Right Model (Cost vs Intelligence)
With Claude models:
- Haiku: fast, cheap, lighter reasoning
- Sonnet: balanced default
- Opus: deep reasoning, expensive
Do not ask which is best.
Ask:
What happens if the model is wrong?
| Impact | Model |
|---|---|
| Low | Haiku |
| Medium | Sonnet |
| High | Opus |
What Actually Works
Route between models.
Example:
Haiku → filter/classify
Sonnet → handle most logic
Opus → escalate edge cases
Most systems overuse top tier models.
You usually do not need them.
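The impact table translates directly into a routing function. This is a sketch only: `Impact` and `ModelRouter` are hypothetical, and the returned strings are tier labels, not real model IDs, so map them to whichever model versions you deploy.

```csharp
using System;

// How bad is it if the model gets this request wrong?
public enum Impact { Low, Medium, High }

public static class ModelRouter
{
    // Tier labels are placeholders for the model IDs you actually use.
    public static string Select(Impact impactIfWrong) => impactIfWrong switch
    {
        Impact.Low => "haiku",
        Impact.Medium => "sonnet",
        Impact.High => "opus",
        _ => throw new ArgumentOutOfRangeException(nameof(impactIfWrong))
    };

    // Move up one tier, for example after a cheaper model's output failed validation.
    public static Impact Escalate(Impact current) =>
        current == Impact.High ? Impact.High : current + 1;
}
```

Starting at the cheapest tier that fits the impact and escalating only on failure is one way to keep top-tier usage limited to the edge cases.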
Where AI Systems Fail in Production
AI systems do not fail loudly.
They fail convincingly.
Common failure modes:
- hallucination under missing context
- prompt brittleness
- tool misuse
- silent incorrect outputs
- context overload
The dangerous part is that everything looks right.
Determinism (Or Lack Of It)
LLMs are not deterministic.
The same input is not guaranteed to produce the same output, even at temperature 0.
You do not control outputs.
You control probabilities.
That means you must design for:
- validation
- retries
- evaluation
Observability (What Actually Matters in Production)
If you cannot see it, you cannot trust it.
Track:
- inputs
- outputs
- tool calls
- latency
- cost
- failures
Without this:
- debugging is impossible
- optimization is guesswork
- trust breaks down
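A thin wrapper around the model call is often enough to start collecting this data. The sketch below records input, output, latency, and failures in memory; tool calls and cost would need fields from your provider's response and are left out. `ObservedClient` and `CallRecord` are hypothetical names, and the inner delegate stands in for the real API call.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics;

// One row per model call; Output is null on failure, Error is null on success.
public record CallRecord(string Input, string? Output, long LatencyMs, string? Error);

public class ObservedClient
{
    private readonly Func<string, string> _inner;

    // In-memory log for the sketch; a real system would ship these records elsewhere.
    public List<CallRecord> Log { get; } = new();

    public ObservedClient(Func<string, string> inner) => _inner = inner;

    public string Call(string input)
    {
        var timer = Stopwatch.StartNew();
        try
        {
            var output = _inner(input);
            Log.Add(new CallRecord(input, output, timer.ElapsedMilliseconds, null));
            return output;
        }
        catch (Exception ex)
        {
            // Record the failure, then rethrow so callers still see it.
            Log.Add(new CallRecord(input, null, timer.ElapsedMilliseconds, ex.Message));
            throw;
        }
    }
}
```

Because failures are logged before being rethrown, the record of what went wrong survives even when the caller swallows the exception.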
When NOT to Use Claude
There are cases where an LLM is the wrong tool:
- deterministic logic: use code
- precise math: use calculators
- high-risk decisions: do not rely on LLMs alone
Final Insight
LLMs are not systems.
They are components inside systems.
If you treat them like magic, they will fail unpredictably.
If you treat them like infrastructure, they become powerful.
That shift from prompting to system design is where things start to work.
