If you’ve prepared for a UiPath automation interview using our 400-question guide, this is the LangGraph counterpart for the other side of the agentic AI stack — the pro-code, Python-first framework that shows up in interviews for AI engineer, automation architect, and agent-platform roles alike.
- Section 1: LangGraph Fundamentals & Core Concepts (Q1–25)
- Section 2: State Management & Reducers (Q26–50)
- Section 3: Control Flow — Conditional Edges, Command & Send (Q51–75)
- Section 4: Persistence, Durable Execution & Fault Tolerance (Q76–100)
- Section 5: Human-in-the-Loop, Interrupts & Time Travel (Q101–125)
- Section 6: Tools, Tool-Calling & Prebuilt Agents (Q126–150)
- Section 7: Memory & the Store API (Q151–175)
- Section 8: Multi-Agent Architectures (Q176–200)
- Section 9: Streaming & Observability (Q201–225)
- Section 10: Production, Deployment & Framework Comparisons (Q226–250)
- Key Takeaways
- FAQs
- References
Who this is for: developers moving from RPA or traditional backend work into agent frameworks, AI engineers who’ve used LangChain but not LangGraph specifically, and interviewers building a technical screen for either role.
How to use this guide: questions are grouped into 10 sections and arranged in increasing difficulty within each section — start at the top of a section if you’re new to that topic, skip to the later questions if you already know the basics. Every answer is grounded in LangGraph’s official documentation (linked inline), and every code snippet reflects the current Graph API and Functional API surface as of mid-2026. Where LangGraph’s API has changed recently (the v3 event-streaming API, the Command primitive, semantic search in BaseStore), the answer says so explicitly rather than presenting it as settled trivia.
Section 1: LangGraph Fundamentals & Core Concepts (Q1–25)
Start here if you haven’t built a graph before. These questions cover why LangGraph exists, the Pregel execution model underneath it, and the vocabulary (nodes, edges, state) every later section assumes you already know.
Q1. What is LangGraph, in one sentence? LangGraph is a low-level orchestration framework for building stateful, controllable agents and workflows as graphs of nodes and edges, where each node is a unit of computation and the graph’s state is threaded through and updated as execution proceeds. Docs: LangGraph Overview
Q2. How is LangGraph different from plain LangChain? LangChain provides the building blocks — chat models, prompts, tool abstractions, retrievers. LangGraph provides the orchestration layer on top: explicit state, branching and looping control flow, checkpointing, and human-in-the-loop primitives that a linear LangChain chain doesn’t have. You typically use LangChain components inside LangGraph nodes.
Q3. Why would you choose a graph over a simple chain of prompts? A chain executes in one fixed direction. The moment your logic needs to loop (an agent retrying a tool call), branch (route to different nodes based on model output), or pause for a human decision, a linear chain can’t represent that naturally — you end up hand-rolling control flow around it. A graph makes looping, branching, and pausing first-class.
Q4. What execution model does LangGraph use internally? LangGraph runs on Pregel, a bulk-synchronous-parallel graph processing model (originally from Google’s large-scale graph processing paper). Execution proceeds in discrete “super-steps”; all nodes scheduled to run in a given step execute (conceptually in parallel), their writes are applied, and then the next step’s nodes are determined. Docs: Runtime / Pregel
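If the super-step idea feels abstract, a toy scheduler can make it concrete. The sketch below is plain Python and is not LangGraph's runtime; the names EDGES, NODES, and run_supersteps are hypothetical and exist only for illustration. It shows the three phases described above: run every scheduled node against the same snapshot, apply the writes, then work out which nodes run next.

```python
from typing import Callable

State = dict[str, list[str]]

# Hypothetical toy graph: "a" fans out to "b" and "c", which both feed "d".
EDGES: dict[str, list[str]] = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}


def make_node(name: str) -> Callable[[State], dict]:
    # Each node returns a partial update: the single log entry it contributes.
    return lambda state: {"log": [name]}


NODES = {name: make_node(name) for name in EDGES}


def run_supersteps(start: str) -> tuple[State, list[list[str]]]:
    state: State = {"log": []}
    scheduled = [start]
    steps: list[list[str]] = []
    while scheduled:
        steps.append(scheduled)
        # 1) Every scheduled node reads the same snapshot of state.
        writes = [NODES[name](state) for name in scheduled]
        # 2) Writes are applied only after all nodes in the step have run.
        for update in writes:
            state["log"] = state["log"] + update["log"]
        # 3) The next step's nodes are the de-duplicated successors.
        successors: list[str] = []
        for name in scheduled:
            for nxt in EDGES[name]:
                if nxt not in successors:
                    successors.append(nxt)
        scheduled = successors
    return state, steps


final_state, steps = run_supersteps("a")
print(steps)        # [['a'], ['b', 'c'], ['d']]
print(final_state)  # {'log': ['a', 'b', 'c', 'd']}
```

Note that "b" and "c" share a step and "d" runs once, after both have written. The real runtime adds channels, reducers, and checkpointing on top of this loop shape.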
Q5. What are the three things you define to build a graph? A state schema (what data flows through the graph), one or more nodes (functions that read state and return updates), and edges (which connect nodes and determine execution order, either fixed or conditional).
Q6. What’s the minimal code to construct and compile a graph?
from typing import TypedDict
from langgraph.graph import StateGraph, START, END
class State(TypedDict):
    topic: str
    joke: str
def generate_joke(state: State):
    return {"joke": f"A joke about {state['topic']}"}
graph = (
    StateGraph(State)
    .add_node(generate_joke)
    .add_edge(START, "generate_joke")
    .add_edge("generate_joke", END)
    .compile()
)
compile() validates the graph structure and returns a runnable object with .invoke(), .stream(), and related methods. Docs: Graph API Overview
Q7. What are START and END? They’re special, reserved node names marking the graph’s entry and exit points. Every graph needs at least one edge from START to a real node, and every path should eventually reach END (or return a Command that ends the run). A cycle that never does keeps executing until it hits the recursion limit (Q63) and the run fails.
Q8. Can a node be an async function? Yes. LangGraph supports both sync and async node functions, and you invoke the corresponding sync or async method (.invoke()/.ainvoke(), .stream()/.astream()) to match. Mixing sync nodes into an async run generally works, but async nodes require the async invocation path.
Q9. What does add_node() actually register? It registers a callable (or Runnable) under a name in the graph, along with optional configuration like a retry_policy or cache_policy. The name defaults to the function’s __name__ unless you pass one explicitly — worth knowing because that name is what shows up in updates stream output and in metadata like langgraph_node.
Q10. What’s the difference between add_edge() and add_conditional_edges()? add_edge() creates a fixed, unconditional transition from one node to another (or to END). add_conditional_edges() takes a routing function that inspects the current state and returns the name (or names) of the node(s) to run next — this is how you implement branching. Docs: add_conditional_edges reference
Q11. Can multiple nodes run in the same super-step? Yes — if a node has edges fanning out to two or more nodes with no dependency between them, LangGraph runs them concurrently within the same step. This is the basis of parallel fan-out patterns like calling three tools at once and merging their results with a reducer.
Q12. What is “thinking in LangGraph” as opposed to thinking in a plain script? It means modeling your application as explicit state transitions rather than as an imperative sequence of function calls. Instead of asking “what function do I call next,” you ask “what does the state look like after this node runs, and which node(s) should see that state next.” This reframing is what makes persistence, replay, and human-in-the-loop possible without extra plumbing. Docs: Thinking in LangGraph
Q13. What happens if two nodes in the same super-step both write to the same state key without a reducer? Without a reducer defined for that key, LangGraph raises an InvalidUpdateError — concurrent, non-reducer-guarded writes to the same key are ambiguous and the framework refuses to silently pick a winner. Defining a reducer (see Section 2) is exactly how you tell LangGraph how to merge such writes.
Q14. What’s a “graph” versus a “subgraph” in LangGraph terms? A subgraph is just a compiled graph used as a node inside another (parent) graph. From the parent’s perspective it’s a single node; internally it runs its own multi-step execution, and its state can be fully separate from or partially overlapping with the parent’s state schema. Docs: Use subgraphs
Q15. When would you use a subgraph instead of just another node? When a chunk of logic is reusable across multiple parent graphs, when you want to encapsulate a multi-step process (like a research sub-workflow) behind a single interface, or when you’re building a multi-agent system where each agent is itself a small graph.
Q16. What is the “Graph API” versus the “Functional API”? The Graph API is the explicit StateGraph / nodes / edges model covered in this section. The Functional API (@entrypoint and @task decorators, covered in Section 4) lets you write workflows as regular Python functions with loops and conditionals, while still getting persistence, streaming, and human-in-the-loop for free. They’re two front ends over the same runtime. Docs: Functional API overview
Q17. Does LangGraph require you to use LangChain chat models? No. Nodes are plain Python functions — you can call any LLM client (OpenAI’s SDK directly, a self-hosted model, etc.) from inside a node. Using LangChain’s init_chat_model or a specific chat model integration gets you provider-agnostic streaming and tool-calling conventions for free, but it isn’t a hard requirement.
Q18. What’s the difference between LangGraph (Python) and LangGraph.js? They’re parallel implementations of the same core concepts — StateGraph, Pregel execution, checkpointers, Command, streaming — for Python and TypeScript/JavaScript respectively. API shapes are close but not identical (for example, Send and Command are exposed as classes in both, but idiomatic usage differs slightly per language). Pick based on your application’s runtime, not a capability gap.
Q19. Is LangGraph tied to LangSmith? No — LangGraph is open source and runs standalone. LangSmith is LangChain’s observability and deployment product; it adds tracing, evaluation, and (via LangSmith Deployment, formerly “LangGraph Platform”) managed hosting for graphs, but none of that is required to build or run a graph.
Q20. What is a “Send” object used for at a conceptual level, before the mechanics? It lets a conditional edge dispatch a variable number of parallel tasks to the same node, each with its own slice of input — the classic case being map-reduce, where you don’t know ahead of time how many items you’re mapping over. Full mechanics are in Section 3.
Q21. What’s the practical difference between a “workflow” and an “agent” in LangGraph’s own vocabulary? LangGraph’s docs distinguish workflows (predefined code paths — you decide the sequence of steps up front) from agents (the LLM dynamically decides its own steps, typically by choosing which tools to call and when to stop). Most production systems are a hybrid: an outer workflow with an agent as one or more of its nodes. Docs: Workflows and agents
Q22. Why does LangGraph favor explicit graphs over letting an LLM “just figure out” control flow entirely on its own? Full LLM-driven control flow (the LLM decides literally everything, unconstrained) is flexible but unpredictable and hard to debug, test, or put reliability guarantees around. An explicit graph lets you fix the parts of your logic that should be deterministic (validation, routing rules, approval gates) while still giving the LLM freedom where it adds value (reasoning, tool selection within a bounded node).
Q23. What does “low-level” mean in LangGraph’s own description of itself? It means LangGraph doesn’t prescribe a specific agent architecture or hide the state machine from you — you compose your own graph shape rather than configuring a fixed template. Prebuilt agents (Section 6) exist on top of this low-level core for common patterns, but you’re never forced to use them.
Q24. Can a compiled graph be visualized? Yes. Calling get_graph() on a compiled graph returns a drawable representation that can be rendered as Mermaid source (draw_mermaid()) or as a PNG (draw_mermaid_png()), which is useful in interviews and code review alike for showing the control flow you built rather than just describing it verbally.
Q25. What’s a common first mistake developers make when moving from a chain to a graph? Treating every node as if it must be a single LLM call, when a node is just “a function that takes state and returns a partial update” — plain deterministic Python (validation, formatting, an API call with no LLM involved) belongs in nodes just as much as model calls do. Overloading single nodes with multiple responsibilities is the second most common mistake, and it makes both debugging and reducer design harder.
Section 2: State Management & Reducers (Q26–50)
State is the one concept everything else in LangGraph builds on. This section covers how to define it, how updates merge into it, and the built-in patterns (MessagesState, add_messages) you’ll see in almost every real graph.
Q26. What can a graph’s state schema be defined as? A TypedDict, a Pydantic BaseModel, or a Python dataclass. All three work as the type argument to StateGraph(...); which one you pick affects validation behavior (Pydantic validates on construction) and whether you get attribute or dict-style access inside nodes.
Q27. What does a node return, and how does that relate to the full state? A node returns a partial update — a dict containing only the keys it’s changing, not the entire state object. LangGraph merges that partial update into the full state using each key’s reducer (or, absent a reducer, by overwriting the key’s previous value).
Q28. What is a reducer, precisely? A function attached to a state key (via Annotated[Type, reducer_fn]) that defines how a new value from a node’s update combines with the key’s existing value. Without one, LangGraph’s default behavior is simple overwrite — the new value replaces the old.
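The merge rule from Q27 and Q28 can be sketched in a few lines of plain Python. This is an illustration of the semantics only, not LangGraph's implementation; REDUCERS and apply_update are hypothetical names, and the real framework reads reducers from the state schema rather than from a lookup table.

```python
import operator
from typing import Any, Callable

# Hypothetical per-key reducer table; keys without an entry are overwritten.
REDUCERS: dict[str, Callable[[Any, Any], Any]] = {"items": operator.add}


def apply_update(state: dict, update: dict) -> dict:
    """Return a new state with `update` merged in, key by key."""
    merged = dict(state)
    for key, new_value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], new_value)
        else:
            merged[key] = new_value  # default: plain overwrite
    return merged


state = {"topic": "cats", "items": ["a"], "draft": "v1"}
state = apply_update(state, {"items": ["b"], "draft": "v2"})
print(state)  # {'topic': 'cats', 'items': ['a', 'b'], 'draft': 'v2'}
```

The update never mentioned topic, so it is untouched; items went through its reducer; draft had no reducer and was overwritten.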
Q29. Show the canonical example of a reducer.
import operator
from typing import Annotated, TypedDict
class State(TypedDict):
    # each node's update to `items` is appended, not overwritten
    items: Annotated[list[str], operator.add]
Every node that returns {"items": [...]} has its list concatenated onto the existing one via operator.add, rather than replacing it outright.
Q30. What is add_messages and why does almost every chat-oriented graph use it? add_messages is a built-in reducer for lists of chat messages. It appends new messages by default, but if an incoming message has the same id as an existing one, it replaces that message in place instead of duplicating it — which is exactly the semantics you want for streaming partial updates to an existing AI message rather than accumulating duplicates. Docs: Graph API — messages
Q31. What is MessagesState? A prebuilt TypedDict state schema with a single messages field already annotated with add_messages. It’s a convenience so you don’t redeclare the same chat-history pattern in every graph:
from langgraph.graph import MessagesState
class State(MessagesState):
    extra_field: str  # extend it with your own keys
Q32. Can you write a custom reducer instead of using operator.add? Yes — a reducer is just any two-argument function (existing, new) -> merged. A common custom reducer deduplicates a list, keeps only the last N items, or merges two dicts key-by-key rather than replacing one with the other wholesale.
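Two of the custom reducers mentioned above might look like the sketch below. The function names, the field names, and the window size of three are illustrative choices, not anything LangGraph prescribes; the only contract assumed is the (existing, new) -> merged signature.

```python
from typing import Annotated, TypedDict


def dedupe_keep_order(existing: list[str], new: list[str]) -> list[str]:
    """Append items from `new` that are not already present."""
    seen = set(existing)
    merged = list(existing)
    for item in new:
        if item not in seen:
            seen.add(item)
            merged.append(item)
    return merged


def keep_last_3(existing: list[str], new: list[str]) -> list[str]:
    """Concatenate, then keep only the three most recent items."""
    return (existing + new)[-3:]


class State(TypedDict):
    sources: Annotated[list[str], dedupe_keep_order]
    recent_events: Annotated[list[str], keep_last_3]


print(dedupe_keep_order(["a", "b"], ["b", "c", "c"]))  # ['a', 'b', 'c']
print(keep_last_3(["e1", "e2"], ["e3", "e4"]))         # ['e2', 'e3', 'e4']
```

Both reducers return a new list instead of mutating existing, which keeps them safe to call repeatedly and easy to unit test.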
Q33. If your state is a Pydantic model, does validation run on every node’s partial update? Validation behavior depends on the LangGraph version and streaming mode in use; broadly, the framework coerces/validates the full merged state at defined points rather than validating each node’s raw partial return in isolation, since a partial update alone would fail required-field validation on its own. Check current docs for your exact version before relying on this for input sanitization.
Q34. What’s the difference between “private” node-local state and the graph’s shared state? Shared state (declared on StateGraph) is visible to every node and persisted at checkpoints. If a node needs scratch variables that shouldn’t be part of that shared, persisted schema, it just uses regular local Python variables inside the function — those never touch the graph’s state object and disappear when the node returns.
Q35. How do you give two nodes access to different slices of state (an input schema versus output schema)? StateGraph accepts separate input and output schemas alongside the internal state schema, and these constrain what the graph accepts and returns. Individual nodes can also annotate their state parameter with a narrower schema, so each node sees only the subset of fields relevant to it while the internal schema carries everything the graph needs end-to-end.
Q36. What happens to state keys a node doesn’t mention in its return value? They’re left untouched — a node’s return is a partial update, so any key not present in the returned dict simply keeps its prior value going into the next step.
Q37. Why can a dataclass be a better fit than a TypedDict for some graph states? A dataclass gives you attribute access (state.topic) instead of dict access (state["topic"]), default values via field(default=...), and it’s often more ergonomic to extend and unit test in isolation, at the cost of the union-style flexibility TypedDicts offer for partial/optional keys.
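A side-by-side sketch of the ergonomic difference, using only the standard library. DictState, ClassState, and the two summarize functions are hypothetical names; the point is the access style and the default value, not a recommended schema.

```python
from dataclasses import dataclass, field
from typing import TypedDict


class DictState(TypedDict):
    topic: str
    notes: list[str]


@dataclass
class ClassState:
    topic: str
    # Mutable defaults need a factory; field(default=...) suits immutable values.
    notes: list[str] = field(default_factory=list)


def summarize_dict(state: DictState) -> dict:
    return {"summary": f"{state['topic']}: {len(state['notes'])} notes"}


def summarize_class(state: ClassState) -> dict:
    return {"summary": f"{state.topic}: {len(state.notes)} notes"}


print(summarize_dict({"topic": "cats", "notes": []}))
print(summarize_class(ClassState(topic="cats")))  # notes falls back to its default
```

The TypedDict version forces the caller to supply notes explicitly, while the dataclass version fills it in, which is often what makes dataclass fixtures shorter in unit tests.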
Q38. Give an example of a reducer that merges dictionaries instead of overwriting them.
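from typing import Annotated, TypedDict
def merge_dicts(existing: dict, new: dict) -> dict:
    # shallow merge: keys in `new` win, other existing keys are preserved
    return {**existing, **new}
class State(TypedDict):
    metadata: Annotated[dict, merge_dicts]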
Without this, a second node’s update to metadata would silently wipe out keys the first node had already set.
Q39. What’s the risk of using operator.add as a reducer on a list state key that multiple parallel branches write to? None inherently — that’s exactly the pattern parallel fan-out relies on, since each branch’s contribution gets appended rather than racing to overwrite. The risk shows up if you also need deterministic ordering of those appended items across runs; Pregel’s super-step model guarantees all writes in a step are applied, but not necessarily in a specific append order across concurrent branches unless you sort downstream.
Q40. Can state include non-serializable objects (like an open DB connection)? You can put arbitrary Python objects into state, but if you’re using a checkpointer for persistence, every value in state needs to survive whatever serialization the checkpointer uses. The first-party checkpointers default to JsonPlusSerializer, which encodes values with msgpack/JSON and only falls back to pickle if you opt in. Open connections, threads, and similar objects generally shouldn’t live in checkpointed state; pass them via config/context instead.
Q41. What’s the purpose of Annotated in Annotated[list[str], operator.add]? Annotated is standard Python typing machinery for attaching metadata to a type without changing the type itself. LangGraph specifically looks for a reducer function in that metadata slot; the type checker sees list[str], and LangGraph additionally sees “use operator.add to merge updates to this field.”
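You can see the metadata slot for yourself with the standard library alone. The sketch below uses typing.get_type_hints with include_extras=True to pull the reducer back out of the annotation; it illustrates the general mechanism and makes no claim about how LangGraph performs the lookup internally.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class State(TypedDict):
    topic: str
    items: Annotated[list[str], operator.add]


hints = get_type_hints(State, include_extras=True)

# Plain fields have no metadata; Annotated fields expose it via __metadata__.
for name, hint in hints.items():
    metadata = getattr(hint, "__metadata__", ())
    print(name, metadata)

reducer = hints["items"].__metadata__[0]
print(reducer(["a"], ["b"]))  # ['a', 'b']
```

Without include_extras=True, get_type_hints strips the Annotated wrapper and the reducer is no longer visible.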
Q42. How would you model a counter that increments across nodes?
class State(TypedDict):
    step_count: Annotated[int, operator.add]
def some_node(state: State):
    return {"step_count": 1}  # each call adds 1, doesn't set an absolute value
This is the same pattern as the list case — operator.add works on any type that supports +, not just lists.
Q43. What’s the difference between updating state via a node’s return value and updating it via graph.update_state()? A node’s return value is applied as part of normal graph execution, going through the same reducer logic as any other step. graph.update_state() is called outside normal execution (typically for human-in-the-loop editing or time travel) — it still goes through reducers, but it creates a new checkpoint directly rather than as the result of a node running.
Q44. Why might you deliberately avoid putting large blobs (like full document text) directly in graph state? Every checkpoint serializes the entire state; large objects bloat checkpoint storage and slow down persistence on every single step, even steps that don’t touch that field. A common pattern is storing a reference (file path, object-store key, or document ID) in state and fetching the actual content on demand inside the node that needs it.
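The store-a-reference pattern might be sketched like this. DOCUMENT_STORE is a hypothetical in-memory stand-in for an object store or database, and the word count is a placeholder for whatever real work the node does with the content.

```python
from typing import TypedDict

# Hypothetical stand-in for an object store or database.
DOCUMENT_STORE = {"doc-17": "LangGraph checkpoints state after every step. " * 200}


class State(TypedDict):
    doc_id: str   # small reference, cheap to checkpoint
    summary: str


def summarize(state: State) -> dict:
    # Fetch the large payload on demand; it stays a local variable
    # and never becomes part of the returned state update.
    text = DOCUMENT_STORE[state["doc_id"]]
    return {"summary": f"{len(text.split())} words"}


update = summarize({"doc_id": "doc-17", "summary": ""})
print(update)
```

Only doc_id and the short summary would ever reach a checkpoint, so checkpoint size stays flat no matter how large the document is.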
Q45. Is state scoped per-thread or global to the whole application? Per-thread. A thread_id (passed via config={"configurable": {"thread_id": ...}}) scopes a distinct conversation or run’s checkpoint history. Two different thread_ids never see each other’s state through the checkpointer — cross-thread data has to go through the separate Store API (Section 7).
Q46. What’s a practical reason to define an explicit output schema separate from your internal state? It lets you hide internal bookkeeping fields (retry counters, intermediate scratch results, raw tool outputs) from whatever’s consuming the graph’s final output — the caller only sees the fields you’ve declared as part of the output schema, keeping your public interface stable even as internal implementation details change.
Q47. Can a state field’s type differ from what a reducer is annotated for, e.g. can you reduce a set instead of a list? Yes — reducers aren’t restricted to lists. A set-typed field with a custom lambda existing, new: existing | new reducer merges via set union; the mechanism is the same regardless of the underlying collection type, as long as your reducer function’s signature matches (existing, new) -> merged.
Q48. What happens if a node raises an exception mid-execution — does partial state from that node get applied? No — a node’s writes are only committed to the checkpoint once the node returns successfully. If it raises, none of its partial update is applied for that attempt; whether the step retries depends on whether a RetryPolicy is attached (Section 4).
Q49. How would you test a single node in isolation without running the whole graph? Since a node is just a plain function taking a state dict/object and returning a partial update, you call it directly with a hand-constructed state fixture and assert on the returned update — no graph, checkpointer, or compilation needed for that level of unit test. Save integration-level assertions (does the graph behave correctly end-to-end) for a separate test tier.
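As a minimal sketch, here is the Q6 node under a direct unit test. The test function name is arbitrary, and it is written so it runs under pytest or as a plain script; no graph or checkpointer is involved.

```python
from typing import TypedDict


class State(TypedDict):
    topic: str
    joke: str


def generate_joke(state: State) -> dict:
    return {"joke": f"A joke about {state['topic']}"}


def test_generate_joke_returns_only_the_joke_key() -> None:
    fixture: State = {"topic": "cats", "joke": ""}
    update = generate_joke(fixture)
    assert update == {"joke": "A joke about cats"}
    # The node must not mutate the state it was given.
    assert fixture == {"topic": "cats", "joke": ""}


test_generate_joke_returns_only_the_joke_key()
```

Asserting on the exact returned dict also catches a node that accidentally returns the whole state instead of a partial update.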
Q50. What’s the tradeoff of putting too much logic into reducers versus into nodes? Reducers should stay small and mechanical — merge these two values this way. Once a reducer starts making decisions that feel like business logic (should this cancel that, should this override under some condition), it becomes hard to trace where behavior actually lives, since reducers run implicitly on every write rather than being called explicitly like a node. If a merge rule needs real judgment, that’s usually a sign that logic belongs in an explicit node instead.
Section 3: Control Flow — Conditional Edges, Command & Send (Q51–75)
This is the section that separates “I can build a linear pipeline” from “I can build an agent that actually branches, loops, and fans out work.” Expect at least a third of a real LangGraph interview to live here.
Q51. What does a conditional edge’s routing function receive and return? It receives the current state (after the preceding node has run) and returns either a single node name (as a string) or a list of node names to route to next. LangGraph then schedules exactly those nodes for the following super-step.
Q52. Write a minimal conditional edge that routes based on a boolean state field.
def route(state: State) -> str:
    return "human_review" if state["needs_approval"] else "finalize"
graph.add_conditional_edges("check", route, {"human_review": "human_review", "finalize": "finalize"})
The third argument (a mapping) is optional in many cases but makes the possible destinations explicit for graph visualization and validation.
Q53. What is the Command primitive and what four things can it carry? Command is a return type (or input) that combines a state update with control-flow instructions in one object. It can carry: update (a partial state update, same as a normal node return), goto (which node(s) to run next, like a conditional edge), graph (target the parent graph when returning from inside a subgraph), and resume (a value used as input to continue execution after an interrupt()). Docs: Graph API — Command
Q54. When would you use Command(goto=…) instead of add_conditional_edges()? When the routing decision and the state update naturally belong together in the same node — for example, a node that both writes a result and decides where to go next based on that result, without needing a separate routing function to re-derive the decision from state afterward. It collapses “update state” and “route” into a single return statement.
Q55. Show a node that both updates state and routes using Command.
from langgraph.types import Command
from typing import Literal
def classify(state: State) -> Command[Literal["urgent_path", "normal_path"]]:
    label = "urgent_path" if "asap" in state["text"].lower() else "normal_path"
    return Command(update={"label": label}, goto=label)
Typing the return as Command[Literal[...]] also lets LangGraph’s graph-drawing tooling render the possible destinations correctly.
Q56. What does Command(graph=Command.PARENT) do? It lets a node inside a subgraph route control back up to the parent graph rather than staying within the subgraph’s own nodes — necessary when a subgraph needs to hand control to a sibling node in the parent, not just to another node in its own scope.
Q57. What is the Send object, and what problem does it specifically solve? Send(node_name, state_for_that_invocation) lets a conditional edge dispatch a dynamic, runtime-determined number of parallel invocations of a node, each with its own custom input — solving map-reduce-style fan-out where you don’t know the count ahead of time (e.g., “run this node once per item in a list whose length depends on a previous step’s output”).
Q58. Show a Send-based map-reduce fan-out.
from langgraph.types import Send
def continue_to_map(state: State):
    return [Send("process_item", {"item": item}) for item in state["items"]]
graph.add_conditional_edges("split", continue_to_map)
Each Send triggers a separate execution of process_item with its own isolated {"item": item} input; their results are gathered back into the parent state via whatever reducer the receiving key uses.
Q59. How does the state passed via Send relate to the graph’s overall state schema? It can be a different shape entirely from the main graph state: Send’s second argument is whatever input the target node expects, which is commonly a narrower dict than the full graph state, precisely because each parallel invocation only needs its own slice of data.
Q60. What’s the difference between a conditional edge returning a list of node names versus using Send? Returning a plain list of node names fans out to a fixed, known set of specific nodes with the same graph state going into each. Send fans out to potentially many invocations of the same node, each with different, per-invocation state — the dynamic-count, per-item-input case a plain list can’t express.
Q61. Can a routing function raise an exception instead of returning a valid destination? It can, and if it does the run fails at that step just like any other unhandled exception in node execution — there’s nothing special protecting routing functions from normal Python error semantics, so validate whatever state field you’re branching on before trusting it blindly.
Q62. How do you implement a loop (e.g., “keep calling this tool-using node until the model stops requesting tools”) in LangGraph? With a conditional edge whose routing function checks the last message for tool calls: if present, route back to the tool-execution node; if absent, route to END (or the next node). This is exactly the shape of the prebuilt ReAct loop under the hood.
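The routing function for that loop could be sketched as follows. To keep it runnable without any dependencies, messages are modeled as plain dicts with an optional tool_calls key, and END is a local stand-in for langgraph.graph.END; both are simplifying assumptions, since real graphs usually carry LangChain message objects and read the tool calls as an attribute.

```python
END = "__end__"  # stand-in for langgraph.graph.END so the sketch runs standalone


def should_continue(state: dict) -> str:
    """Route to the tool node while the last message requests tools."""
    last_message = state["messages"][-1]
    if last_message.get("tool_calls"):
        return "tools"
    return END


wants_tool = {
    "messages": [{"role": "ai", "content": "", "tool_calls": [{"name": "search"}]}]
}
done = {"messages": [{"role": "ai", "content": "The answer is 4."}]}

print(should_continue(wants_tool))  # tools
print(should_continue(done))        # __end__
```

In a graph, this function would be attached to the model node with add_conditional_edges, and a fixed edge from the tool node back to the model node would close the loop.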
Q63. What guards against an infinite loop in a graph like that? LangGraph enforces a recursion limit (configurable via config={"recursion_limit": N}, default 25) on the number of super-steps a single run can execute — once exceeded, the run raises a GraphRecursionError rather than looping forever.
Q64. Why might two different nodes need to route to the same next node, and how do you express that cleanly? It’s common for both a “success” path and a “needs escalation” path to eventually converge on a shared “finalize” or “notify” node. You express it exactly like any other edge — multiple add_edge() calls (or conditional edges) can target the same destination node; LangGraph doesn’t require a tree shape, just a valid DAG-or-cycle-with-a-limit.
Q65. What is a “fan-in” and how does state merging make it safe? Fan-in is multiple parallel branches converging back into a single downstream node. It’s safe specifically because of reducers — if two branches both wrote to the same state key, the reducer defines how those concurrent writes combine (append, merge, etc.) rather than one silently clobbering the other.
Q66. Can you route to END conditionally from the middle of a graph? Yes — END is just another valid destination a conditional edge’s routing function can return, used whenever a branch’s logic determines the run is genuinely finished early rather than needing to reach some fixed “final” node.
Q67. What’s the difference between static graph structure and dynamic control flow at runtime? The graph’s nodes and possible edges are fixed at compile time — you can’t add a node that didn’t exist in the compiled graph. But which of those predefined edges actually get taken, and how many times a Send-based node runs, is entirely dynamic at runtime based on state. This is the core tension conditional edges and Command both resolve: fixed topology, dynamic traversal.
Q68. How would you implement “retry this node up to 3 times with different prompts if the output fails validation” using control flow alone (not RetryPolicy)? Track an attempt counter in state (Annotated[int, operator.add]), have the node’s routing function check both the validation result and the counter, and route back to the same node (adjusting the prompt based on the failure) while under the limit, or forward once valid or once the limit is hit. This is a business-logic retry, distinct from the infrastructure-level RetryPolicy covered in Section 4, which handles transient exceptions, not “the LLM’s output didn’t validate.”
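A sketch of that business-logic retry, with a hand-rolled loop standing in for the graph runtime so it runs on its own. The generate node, the rule that the third attempt succeeds, and the node names "generate" and "finalize" are all hypothetical; only the counter-plus-router shape is the point.

```python
import operator
from typing import Annotated, TypedDict

MAX_ATTEMPTS = 3


class State(TypedDict):
    draft: str
    is_valid: bool
    attempts: Annotated[int, operator.add]  # each run of the node adds 1


def generate(state: State) -> dict:
    # Hypothetical generator: pretend the third prompt variant succeeds.
    attempt = state["attempts"] + 1
    return {"draft": f"draft-{attempt}", "is_valid": attempt >= 3, "attempts": 1}


def route_after_generate(state: State) -> str:
    if state["is_valid"] or state["attempts"] >= MAX_ATTEMPTS:
        return "finalize"
    return "generate"  # loop back and try again


# Hand-rolled driver standing in for the graph runtime.
state: State = {"draft": "", "is_valid": False, "attempts": 0}
while True:
    update = generate(state)
    state = {
        "draft": update["draft"],
        "is_valid": update["is_valid"],
        "attempts": operator.add(state["attempts"], update["attempts"]),
    }
    if route_after_generate(state) == "finalize":
        break

print(state)  # {'draft': 'draft-3', 'is_valid': True, 'attempts': 3}
```

Because the router checks the counter as well as the validation flag, the loop exits after three attempts even if the output never validates.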
Q69. What happens if a Command’s update conflicts with a reducer expecting a specific shape? The same rules apply as any node return — Command(update={...}) is merged through the target keys’ reducers exactly like a plain dict return would be, so a malformed update fails in the same way (an InvalidUpdateError for un-reduced concurrent writes, or a type error inside a custom reducer that doesn’t defend against the shape it received).
Q70. Why can Command be returned from inside a tool, not just from a graph node? Because agentic tool-calling loops often need a tool’s execution to affect graph control flow directly — e.g., a tool that looks up an order and determines the conversation should jump straight to a “handle_refund” node rather than going back through the LLM for another round of reasoning. Letting tools return Command avoids forcing that decision to be re-derived by a separate router after the fact.
Q71. What’s the interaction between add_conditional_edges() and cycles in the graph? Conditional edges are exactly how cycles get created — a routing function that can return the name of a node “earlier” in the logical flow is what makes looping (retry, re-planning, continued tool use) possible; LangGraph doesn’t distinguish a “cycle edge” from any other conditional edge structurally.
Q72. Give a realistic branching example beyond toy code: routing a customer support graph.
def route_ticket(state: State) -> str:
    if state["sentiment"] == "angry" and state["order_value"] > 500:
        return "escalate_to_human"
    if state["category"] == "refund":
        return "refund_flow"
    return "auto_respond"
This combines two independent signals (sentiment, order value) with a category check — realistic routing logic is rarely a single flag, and interviewers often probe whether you’d centralize this in one router node or split it into staged conditional edges.
Q73. How do you unit-test a routing function without running the graph? Since it’s a plain function taking state and returning a string (or list of strings/Sends), call it directly with hand-built state fixtures covering each branch and assert on the returned destination — identical testing approach to testing a node in isolation (Q49).
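Applied to the Q72 router, a table-driven test might look like the sketch below. The fixture values and the DESTINATIONS set are illustrative; the second assertion is one way to check that every returned name has a destination to map to.

```python
def route_ticket(state: dict) -> str:
    if state["sentiment"] == "angry" and state["order_value"] > 500:
        return "escalate_to_human"
    if state["category"] == "refund":
        return "refund_flow"
    return "auto_respond"


# One fixture per branch, plus the boundary where order_value is exactly 500.
CASES = [
    ({"sentiment": "angry", "order_value": 900, "category": "refund"}, "escalate_to_human"),
    ({"sentiment": "angry", "order_value": 500, "category": "refund"}, "refund_flow"),
    ({"sentiment": "calm", "order_value": 900, "category": "refund"}, "refund_flow"),
    ({"sentiment": "calm", "order_value": 20, "category": "shipping"}, "auto_respond"),
]

DESTINATIONS = {"escalate_to_human", "refund_flow", "auto_respond"}

for fixture, expected in CASES:
    result = route_ticket(fixture)
    assert result == expected, (fixture, result)
    assert result in DESTINATIONS  # every return value has a mapped destination
```

The boundary case is the one most worth keeping: it documents that an order of exactly 500 does not escalate.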
Q74. What’s a common interview follow-up after explaining Send, and how do you answer it? “How do results from all the parallel Send invocations get back together?” — the answer is: through the state key(s) those invocations write to, combined via whatever reducer is attached (commonly operator.add to collect a list of per-item results), which the fan-in node then reads as a complete collection once all parallel branches for that super-step have finished.
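The gather step can be pictured with a plain-Python simulation. Nothing here uses LangGraph: fan_out returns simple dicts where a real graph would return Send objects, and functools.reduce plays the role of the runtime applying the operator.add reducer once per branch.

```python
import operator
from functools import reduce


def process_item(task_input: dict) -> dict:
    # Each invocation sees only its own slice, not the full graph state.
    return {"results": [task_input["item"].upper()]}


def fan_out(state: dict) -> list[dict]:
    # Stand-in for a list of Send("process_item", {...}) objects.
    return [{"item": item} for item in state["items"]]


state = {"items": ["a", "b", "c"], "results": []}
updates = [process_item(task_input) for task_input in fan_out(state)]

# Fan-in: fold every branch's write into the shared key with the reducer.
state["results"] = reduce(
    operator.add, (u["results"] for u in updates), state["results"]
)
print(state["results"])  # ['A', 'B', 'C']
```

The order is deterministic here only because the simulation is sequential; as Q39 notes, sort downstream if ordering matters in a real parallel run.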
Q75. What’s the single most common mistake developers make with conditional edges? Forgetting that the routing function runs after state has already been updated by the preceding node — trying to branch on a value the current step is still computing, rather than the value the previous step already committed. Related: not handling every possible return value of the routing function in the destinations mapping, which surfaces as a confusing runtime error rather than a compile-time one.
Section 4: Persistence, Durable Execution & Fault Tolerance (Q76–100)
Checkpointing is what makes memory, human-in-the-loop, and time travel possible — and it’s also where “toy graph” and “production graph” diverge the most sharply.
Q76. What is a checkpointer, in one sentence? A pluggable backend that saves a snapshot of the graph’s full state (a “checkpoint”) after every super-step, keyed by thread_id, so execution can be resumed, replayed, or inspected later. Docs: Persistence
Q77. Name the built-in checkpointer implementations and when you’d use each. InMemorySaver (formerly MemorySaver) for local development and tests — nothing persists past the process. SqliteSaver for lightweight local/single-instance persistence. PostgresSaver for production, multi-instance deployments needing a real durable store.
Q78. What’s the minimal code to compile a graph with a checkpointer and run it against a specific thread?
from langgraph.checkpoint.memory import InMemorySaver
graph = builder.compile(checkpointer=InMemorySaver())
config = {"configurable": {"thread_id": "conversation-42"}}
graph.invoke({"topic": "ice cream"}, config=config)
Every subsequent .invoke() or .stream() call using that same thread_id continues from wherever that thread’s last checkpoint left off, rather than starting fresh.
Q79. Without a checkpointer, can a graph still run at all? Yes — a checkpointer is optional for a single-shot .invoke() with no persistence needs. It becomes mandatory the moment you need any of: multi-turn memory across separate calls, interrupt()-based human-in-the-loop, or time travel — all three are built directly on the checkpoint history.
Q80. What exactly does a single checkpoint contain? The full graph state as of that step, plus metadata: which step number it is, which node(s) just ran, and a “pending writes” record used for retry safety. get_state() returns this same shape for the latest checkpoint on a thread.
Q81. What’s the difference between “checkpointing” and “durable execution” — are they the same thing? No, and this is a common interview trap. Checkpointing saves state between completed steps — if a node fails mid-execution, whatever it was doing when it crashed is lost and the step reruns from scratch on retry. True durable execution (as offered by systems like Temporal) additionally makes individual side effects within a step resumable/replayable, not just the state between steps. LangGraph’s checkpointing plus RetryPolicy gets you resilient step-level retries; it isn’t the same guarantee as a dedicated durable-execution engine for arbitrarily long, side-effect-heavy single steps.
Q82. Given the answer above, when would a team pair LangGraph with Temporal rather than relying on LangGraph alone? When individual nodes perform expensive, non-idempotent side effects (charging a customer, calling a non-retryable external API) inside a workflow that might span hours or days and must survive process crashes mid-step — Temporal’s durable-execution guarantees cover exactly that failure mode, with LangGraph handling the agent’s reasoning and state while Temporal handles the surrounding durable orchestration.
Q83. What is a RetryPolicy and where can it be attached? A configuration object controlling automatic retries for a node (or, in the Functional API, a @task) when it raises certain exception types. It can be attached per-node via add_node(..., retry_policy=...), or set as a default across the whole graph. Docs: Fault tolerance
Q84. What fields does a RetryPolicy support? max_attempts (including the first try), initial_interval and max_interval (backoff bounds), backoff_factor (exponential multiplier), jitter (randomize interval to avoid thundering-herd retries), and retry_on (which exception types or a custom predicate qualify for retry).
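To make the timing fields concrete, the sketch below computes the delay before each retry for a given set of values. It assumes the common exponential-backoff formula (initial interval multiplied by the factor raised to the retry number, capped at the maximum) and leaves jitter out so the numbers are deterministic. BackoffSettings is a hypothetical stand-in, not the library's RetryPolicy class, and its default values are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackoffSettings:
    """Illustrative mirror of the RetryPolicy timing fields; values are assumptions."""
    max_attempts: int = 3
    initial_interval: float = 0.5
    backoff_factor: float = 2.0
    max_interval: float = 128.0


def retry_delays(settings: BackoffSettings) -> list[float]:
    """Delay before each retry; max_attempts includes the first try, so there are max_attempts - 1 retries."""
    delays = []
    for retry_number in range(settings.max_attempts - 1):
        raw = settings.initial_interval * settings.backoff_factor ** retry_number
        delays.append(min(raw, settings.max_interval))  # cap at max_interval
    return delays


# Five attempts means four retries; the last delay is capped at 3.0 seconds.
schedule = retry_delays(BackoffSettings(max_attempts=5, max_interval=3.0))
```

With jitter enabled, each delay would have a random amount added, which spreads out retries from many clients that failed at the same moment.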
Q85. What does RetryPolicy retry by default, and why is that default deliberately narrow? By default it retries things that look like transient infrastructure failures — connection errors, 5xx-style responses — but not ValueError, TypeError, or RuntimeError, since those almost always indicate a genuine programming bug rather than a flaky dependency. Retrying a bug just re-triggers the same bug three times instead of surfacing it.
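The reasoning behind a narrow default can be shown with a hand-written predicate and retry loop. This is a simplified sketch, assuming only two transient exception types; should_retry and call_with_retries are hypothetical names, and the library's real default predicate covers more cases than this.

```python
def should_retry(exc: Exception) -> bool:
    """Retry transient failures only; programming errors surface immediately."""
    if isinstance(exc, (ValueError, TypeError, RuntimeError)):
        return False
    return isinstance(exc, (ConnectionError, TimeoutError))


def call_with_retries(fn, max_attempts: int = 3):
    """Call fn, retrying while the predicate allows and attempts remain."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_attempts or not should_retry(exc):
                raise


calls = {"flaky": 0, "buggy": 0}


def flaky() -> str:
    """Fails twice with a transient error, then succeeds."""
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise ConnectionError("temporary network failure")
    return "ok"


def buggy() -> str:
    """Always fails with an error that indicates a bug."""
    calls["buggy"] += 1
    raise ValueError("bad argument")


flaky_result = call_with_retries(flaky)
try:
    call_with_retries(buggy)
except ValueError:
    pass  # surfaced after a single attempt, with no retries
```

The flaky call succeeds on its third attempt, while the buggy call runs exactly once, which is the behavior the narrow default is designed to produce.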
Q86. What happens to a node’s partial writes if it fails partway through and then retries? Before each retry attempt, LangGraph clears any writes the failed attempt had already staged for that step, so a retried node starts clean rather than layering a second partial attempt’s writes on top of a first partial attempt’s leftovers.
Q87. Can you attach different retry policies to different nodes in the same graph? Yes — retry policy is a per-node (or per-task) setting, so a node calling a flaky third-party API can have an aggressive retry policy while a node doing pure local computation (where retrying a bug is pointless) has none at all.
Q88. What’s a CachePolicy and how is it different from retrying? A CachePolicy lets you cache a node’s output keyed by its input, so re-running the same input (common during development iteration, or when replaying from an earlier checkpoint) skips re-executing expensive or non-deterministic work like an LLM call — this is about avoiding redundant execution, not about recovering from failure.
Q89. Why does thread_id matter so much operationally, beyond just “which conversation is this”? It’s the unit of isolation for concurrency, persistence, and human-in-the-loop resumption all at once — two requests with the same thread_id racing against a database-backed checkpointer need application-level coordination to avoid interleaved writes, since the checkpointer itself doesn’t serialize concurrent access to a single thread for you.
Q90. How would you migrate a graph’s state schema (add a new required field) without breaking existing persisted threads? Add the new field with a sensible default (or make it Optional) rather than a hard requirement, since old checkpoints won’t have it populated; a node that depends on the new field should handle its absence gracefully for any thread whose checkpoint history predates the schema change, rather than assuming every thread was created after the migration.
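A node written to tolerate older checkpoints might look like the sketch below. It assumes a TypedDict state declared with total=False so the new key is optional; TicketState, the priority field, and the queue names are all hypothetical.

```python
from typing import TypedDict


class TicketState(TypedDict, total=False):
    # Fields present since the first version of the graph.
    ticket_id: str
    category: str
    # Added later: checkpoints written before the change will not contain this key.
    priority: str


DEFAULT_PRIORITY = "normal"


def triage(state: TicketState) -> dict:
    """Node that handles threads whose history predates the `priority` field."""
    priority = state.get("priority", DEFAULT_PRIORITY)
    queue = "fast_lane" if priority == "high" else "standard"
    # Writing priority back means later nodes on this thread always see it.
    return {"priority": priority, "queue": queue}


old_thread_state: TicketState = {"ticket_id": "T-1", "category": "refund"}
new_thread_state: TicketState = {"ticket_id": "T-2", "category": "refund", "priority": "high"}

old_update = triage(old_thread_state)
new_update = triage(new_thread_state)
```

Returning the defaulted value as part of the update backfills the field the first time an old thread passes through the node.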
Q91. What’s the PostgresSaver’s setup() method for? It creates the checkpointer’s required tables/schema in the target Postgres database — a one-time (or per-migration) step that has to run before the checkpointer can actually persist anything, distinct from constructing the PostgresSaver instance itself.
Q92. Can checkpoints be deleted, and why would you need to? Yes, most checkpointer implementations expose a way to delete a thread’s checkpoint history — needed for data-retention compliance (a user requests deletion of their conversation history) or simply to reclaim storage for threads that are no longer relevant.
Q93. What does “list of pending writes” in a checkpoint’s metadata actually protect against? It’s how LangGraph knows, if a process crashes after a node finishes but before the next step’s scheduling is fully recorded, which writes were already durably committed versus which need to be recomputed on resume — preventing either silently losing a completed node’s output or double-applying it.
Q94. How do checkpoints interact with parallel branches (fan-out) in terms of what gets saved? All writes from every node that ran in a given super-step are captured together in that step’s checkpoint — a fan-out of five parallel nodes produces one checkpoint reflecting all five nodes’ combined, reducer-merged writes, not five separate checkpoints.
Q95. What’s a realistic interview question testing whether you understand checkpoint frequency? “If a node takes 30 seconds and the graph has 10 sequential nodes, how many checkpoints does a single run produce, and what does that mean for storage cost at scale?” — the answer is one checkpoint per completed super-step (so up to 10 here, fewer if steps run in parallel within the same super-step), meaning checkpoint volume scales with graph depth × run volume, which is why teams often trim what’s stored in state (Q44) before scaling to production traffic.
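The scaling argument is simple arithmetic, sketched below. The run volume and snapshot size are hypothetical, and the estimate assumes every checkpoint stores a full snapshot of state, so treat it as a rough upper bound rather than a measurement of any particular checkpointer backend.

```python
def checkpoints_per_run(super_steps: int) -> int:
    """One checkpoint per completed super-step, per the answer above."""
    return super_steps


def estimated_storage_bytes(super_steps: int, runs: int, avg_checkpoint_bytes: int) -> int:
    """Rough upper bound: checkpoints per run, times runs, times snapshot size."""
    return checkpoints_per_run(super_steps) * runs * avg_checkpoint_bytes


# Hypothetical workload: 10 sequential nodes, 100,000 runs, 20 KB per snapshot.
total_bytes = estimated_storage_bytes(super_steps=10, runs=100_000, avg_checkpoint_bytes=20_000)
total_gb = total_bytes / 1_000_000_000
```

Under these assumed numbers the total comes to 20 GB, and halving the average state size halves it, which is why trimming state tends to be the first lever teams reach for.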
Q96. What’s the operational difference between SqliteSaver and PostgresSaver beyond “one’s a file and one’s a server”? SQLite’s single-writer model makes it a poor fit the moment you have more than one application instance writing concurrently — it’s fine for a single local process or a low-concurrency prototype, but a multi-instance production deployment needs Postgres’s proper concurrent-write support and connection pooling.
Q97. Does a checkpointer store your prompts and model outputs in plaintext by default? Yes, by default the entire state (which typically includes full message history) is serialized as-is. If that includes sensitive data, you’re responsible for encryption at rest (a database-level concern) or for redacting sensitive fields from state before they’re checkpointed; LangGraph doesn’t apply field-level encryption on your behalf.
Q98. What does “durable execution mode” configurability at the graph level actually toggle, conceptually? It controls how aggressively LangGraph treats already-completed work as replayable versus safe-to-recompute on resume after an interruption — different modes trade off strict exactly-once semantics against simplicity and performance, which is why LangGraph’s docs frame this as a spectrum rather than a single on/off durability flag.
Q99. How would you explain, to a non-engineer stakeholder, why checkpointing costs anything at all in latency? Every completed step now includes writing a snapshot to a database before the graph is “done” with that step — that’s an extra I/O round-trip per step compared to a purely in-memory, no-persistence run, which is the tradeoff you’re accepting in exchange for resumability, memory, and auditability.
Q100. What’s a good interview answer to “when would you deliberately not use a checkpointer”? Stateless, single-shot utility graphs with no need for memory, replay, or human-in-the-loop — e.g., a graph that classifies one piece of text and returns a label, called fresh each time with no conversational context to preserve. Adding persistence there is pure overhead with no corresponding benefit.
Section 5: Human-in-the-Loop, Interrupts & Time Travel (Q101–125)
If Section 4 was “how state survives,” this section is “how a human gets to change what happens next” — the piece every interviewer probing production-readiness will ask about.
Q101. What does interrupt() do, mechanically? Called inside a node, interrupt(value) pauses the graph’s execution at that exact point and surfaces value (any JSON-serializable payload — a question, a proposed action, a diff to review) to whatever’s driving the graph. The node function’s execution is suspended until the graph is invoked again with a Command(resume=...). Docs: Interrupts
Q102. Show the minimal interrupt-and-resume pattern.
from langgraph.types import interrupt, Command
def human_review(state: State):
    decision = interrupt({"action": state["proposed_action"]})
    return {"approved": decision == "approve"}
# first call pauses at interrupt():
graph.invoke(initial_input, config=config)
# resuming later, from the same thread_id:
graph.invoke(Command(resume="approve"), config=config)
The value passed to Command(resume=...) becomes interrupt()’s return value inside the node, and the node re-runs from the top with that value available, which has an important consequence covered next.
Q103. Why does that last point (“the node re-runs from the top”) matter for how you write nodes containing interrupt()? Because the node function resumes by re-executing from its beginning up to and past the interrupt() call, any code before the interrupt() call inside that node runs again on resume. That code needs to be idempotent (safe to run twice) — a side effect like sending an email before the interrupt would fire a second time on every resume unless you guard against it.
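One way to guard a pre-interrupt side effect is an idempotency key, sketched below without any LangGraph imports. The in-memory set stands in for a durable store, the outbox list stands in for a real email client, and every name here is hypothetical. The two calls at the bottom simulate the node body running once before the pause and once again on resume.

```python
sent_keys: set[str] = set()  # stand-in for a durable store (database table, cache)
outbox: list[str] = []       # records "sent" emails so the example stays offline


def send_email_once(idempotency_key: str, body: str) -> bool:
    """Send only if this key has not been seen; return True when a send happened."""
    if idempotency_key in sent_keys:
        return False
    outbox.append(body)
    sent_keys.add(idempotency_key)
    return True


def pre_interrupt_work(state: dict) -> None:
    """Code that sits before interrupt() in a node, so it runs again on every resume."""
    key = f"{state['thread_id']}:review-request"
    send_email_once(key, f"Please review: {state['proposed_action']}")


state = {"thread_id": "conversation-42", "proposed_action": "refund $120"}
pre_interrupt_work(state)  # first execution, before the pause
pre_interrupt_work(state)  # re-execution when the node restarts on resume
```

An alternative with the same effect is to move the side effect into its own node ahead of the review node, so it is never part of the code that re-runs.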
Q104. Does interrupt() require a checkpointer? Yes, unconditionally — pausing and later resuming exactly where execution left off is only possible because the checkpointer persisted the state at that point. A graph compiled without a checkpointer can’t use interrupt() meaningfully.
Q105. What’s the difference between interrupt() and a static “breakpoint” set via compile(interrupt_before=[…])? interrupt_before/interrupt_after are compile-time, node-name-based breakpoints that always pause before or after a named node runs, regardless of any runtime condition. interrupt() is called from inside node logic itself, so it can pause conditionally — e.g., only when a proposed action exceeds some risk threshold, not on every single run.
Q106. How do you inspect what a pending interrupt is asking for, from outside the graph? graph.get_state(config).next and the state snapshot’s tasks (or, on the event-streaming API, stream.interrupts/stream.interrupted) surface the pending interrupt’s payload — you read that to render a UI prompt (approve/reject, edit this field, etc.) before deciding what to pass to Command(resume=...).
Q107. Can a single graph run pause on more than one interrupt across its execution? Yes — a graph can hit multiple interrupt() calls across different nodes (or the same node called multiple times via a loop), each one pausing and later resuming independently as the run progresses; there’s no limit of “one interrupt per thread.”
Q108. What’s the difference between Command(resume=…) and Command(update=…) when resuming after a human review step? resume supplies the human’s answer to the specific interrupt() call that’s pending — it becomes that call’s return value. update separately lets you also patch other state fields at the same time you resume, if the human’s input should change more than just what the interrupt was directly asking about.
Q109. How would you implement “let the human edit the agent’s draft before it’s sent,” not just approve/reject it?
def review(state: State):
    edited = interrupt({"draft": state["draft"], "action": "edit_or_approve"})
    return {"draft": edited}  # human's edited text replaces the draft
The interrupt’s payload can carry the full draft for the human to see, and whatever they pass to resume= becomes the new draft — approve is just “resume with the same text unchanged.”
Q110. What’s get_state_history() and what does it return? It returns an iterator over every checkpoint ever recorded for a given thread_id, from most recent to oldest — the full audit trail of every super-step’s state, which is the basis for time travel. Docs: Use time travel
Q111. How do you “rewind” execution to an earlier point and try a different path from there? Find the target checkpoint via get_state_history(), then call graph.invoke(new_input, config={"configurable": {"thread_id": ..., "checkpoint_id": earlier_id}}) (or the equivalent config carrying that checkpoint’s identity) — execution resumes from that earlier point forward, effectively branching a new timeline while the original run’s history remains untouched.
Q112. What does graph.update_state() let you do that pure time-travel-and-replay doesn’t? It lets you edit a checkpoint’s state values before resuming from it, rather than just replaying exactly what happened. Combined with time travel, this is how you implement “the agent made a mistake at step 4 — let’s correct that one field and continue from there” instead of only being able to replay the original mistake verbatim.
Q113. Does update_state() overwrite the original checkpoint, or create a new one? It creates a new checkpoint at the same logical point in the thread’s history, with the altered values — the original checkpoint is preserved untouched, which is exactly what makes this safe to use for exploration without destroying your audit trail.
Q114. What’s a realistic production reason to combine interrupt() with a queue/notification system rather than blocking synchronously? A human reviewer might not be available for minutes or hours — the graph’s execution genuinely needs to sit paused (persisted via checkpoint) while a notification (Slack message, email, ticket) goes out, and only resume whenever the reviewer eventually acts, which could be a very different process invocation entirely from the one that hit the interrupt.
Q115. How does event streaming (Section 9) expose interrupts, versus the lower-level stream_mode API? On the v3 event-streaming API, stream.interrupted is a boolean you check after consuming a stream, and stream.interrupts gives you the structured interrupt payloads directly. On the older stream_mode API, the same information surfaces as a __interrupt__ key in the returned state dict (v1) or a dedicated interrupts field on values stream parts (v2).
Q116. What happens if you call Command(resume=…) on a thread that isn’t actually paused at an interrupt? This is generally a misuse — resuming implies there’s a pending interrupt waiting for that specific value. Behavior in that case depends on version and isn’t something to rely on; the correct pattern is always to check get_state(config).next (or stream.interrupted) first to confirm a run is actually paused before attempting to resume it.
Q117. Why is human-in-the-loop design something interviewers specifically probe for at senior levels? Because getting the mechanics of interrupt() right is necessary but not sufficient — the harder design question is which actions in a given workflow actually warrant a pause (irreversible, high-value, or ambiguous ones) versus which should run fully autonomously, and how escalation thresholds are decided and maintained over time. That’s a judgment/architecture question, not an API-syntax one.
Q118. What’s the relationship between interrupt() and the idea of “durable execution” from Section 4? They’re built on the same foundation — a paused interrupt() is only recoverable across arbitrary delays (including a full process restart) because the checkpointer already persisted everything needed to resume; without durable checkpointing, interrupt() would only be able to pause for as long as the current process stays alive in memory.
Q119. How would you let a human reject an action and provide a reason that feeds back into the agent’s next attempt?
def review(state: State):
    result = interrupt({"action": state["proposed_action"]})
    if result["decision"] == "reject":
        return {"rejected_reason": result["reason"], "attempt": state["attempt"] + 1}
    return {"approved": True}
The interrupt payload the human resumes with can be a structured object (not just a string), letting a single interrupt carry both the decision and supporting context back into state for a routing function to act on next.
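The routing function that acts on those fields could look like the sketch below. It assumes the state keys written by the review node above (approved, attempt); the destination node names and the MAX_ATTEMPTS cap are hypothetical.

```python
MAX_ATTEMPTS = 3  # hypothetical cap on how many times the agent may re-plan


def route_after_review(state: dict) -> str:
    """Pick the next node from the fields the review node wrote into state."""
    if state.get("approved"):
        return "execute_action"
    if state.get("attempt", 0) >= MAX_ATTEMPTS:
        # Repeated rejections: stop looping and hand off instead of re-planning again.
        return "escalate_to_human"
    return "replan"


approved_route = route_after_review({"approved": True})
rejected_route = route_after_review({"rejected_reason": "wrong amount", "attempt": 1})
exhausted_route = route_after_review({"rejected_reason": "still wrong", "attempt": 3})
```

Capping attempts keeps a reject-and-retry loop from cycling indefinitely when the agent cannot satisfy the reviewer.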
Q120. Can time travel be used for anything other than debugging or correcting mistakes? Yes — a common use is exploring “what if” alternatives for evaluation purposes: replay the same conversation from a fixed checkpoint with a different prompt or model, and compare the two resulting trajectories side by side, which is a cheap way to A/B test a change against a real historical scenario rather than only against synthetic test cases.
Q121. What’s the difference between “replay” (re-running from a checkpoint with the same input) and “branch” (re-running with different input)? Replay reproduces exactly what already happened, useful for verifying determinism or debugging with full visibility into a past run. Branching intentionally diverges from that point forward — same history up to the checkpoint, different path afterward — which is what both correcting a mistake and “what if” exploration actually rely on.
Q122. What would you check first if a human-in-the-loop workflow appears to “lose” state after a reviewer approves an action? Whether the node’s pre-interrupt code is accidentally non-idempotent (Q103) and is resetting or overwriting a field every time it re-runs on resume, and whether Command(resume=...)’s value is actually reaching the intended interrupt() call rather than a different pending interrupt earlier in a multi-interrupt thread.
Q123. How does interrupt() interact with parallel branches — if two parallel nodes both call interrupt(), what happens? Each call surfaces its own interrupt payload, and the run pauses until all pending interrupts for that step have been resumed — you generally need to resume each one (or provide resume values keyed appropriately) rather than assuming a single Command(resume=...) call satisfies every outstanding interrupt in a fan-out.
Q124. Why might a team build their own lightweight approval UI on top of get_state()/interrupt() rather than using a prebuilt tool? Because the approval UI’s requirements are almost always domain-specific (what fields to show a human reviewer, what edit affordances make sense, what audit trail format compliance requires), and interrupt()’s payload is intentionally a plain JSON-serializable value precisely so it can back whatever bespoke UI a team already has, rather than forcing a specific reviewer interface.
Q125. What’s a good closing answer to “what’s the single biggest risk of human-in-the-loop design done poorly”? Interrupt fatigue — routing so many low-stakes decisions to a human reviewer that they start rubber-stamping approvals without real scrutiny, which defeats the entire purpose of the checkpoint. The fix is the same judgment call as Q117: reserve interrupts for genuinely high-stakes or ambiguous decisions, and let everything else run autonomously with strong guardrails and after-the-fact monitoring instead.
Section 6: Tools, Tool-Calling & Prebuilt Agents (Q126–150)
Most real LangGraph graphs revolve around an LLM deciding to call tools. This section covers the mechanics of that loop and the prebuilt helpers that implement the common version of it for you.
Q126. What is create_react_agent and what does it build? A prebuilt function (from langgraph.prebuilt, split out as part of LangGraph 0.3’s move toward first-class prebuilt agents) that constructs a complete ReAct-style tool-calling agent graph — model call, conditional routing to tools if the model requested any, tool execution, and looping back to the model — from just a model and a list of tools, without you hand-wiring that graph yourself. Docs: create_react_agent reference
Q127. Show the minimal create_react_agent usage.
from langgraph.prebuilt import create_react_agent
agent = create_react_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=[search_tool, calculator_tool],
)
agent.invoke({"messages": [{"role": "user", "content": "What's 42 * 17?"}]})
The returned object is itself a compiled graph — you can still pass a checkpointer to create_react_agent(..., checkpointer=...) and get all the same persistence, streaming, and interrupt capabilities as a hand-built graph.
Q128. What is ToolNode and how does it relate to create_react_agent? ToolNode is the prebuilt node that actually executes tool calls requested by a model — given the last AI message’s tool_calls, it looks up and invokes each named tool with the provided arguments and returns the results as tool messages. create_react_agent uses a ToolNode internally; you can also use it directly if you’re hand-building a similar loop with custom routing around it.
Q129. How does a model “decide” to call a tool at the LangGraph level — what’s actually happening? The chat model is bound to the tool definitions (via .bind_tools([...]) or equivalently by passing tools to init_chat_model), which tells the underlying provider’s API about the available functions and their schemas. The model’s response then either contains normal text or one or more tool_calls entries; a conditional edge inspects the last message for the presence of tool_calls and routes to ToolNode if present, or ends/continues otherwise.
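The conditional-edge check described above reduces to a few lines. This sketch uses plain dicts for messages so it runs without any dependencies; in a real graph the last message would be a message object with a tool_calls attribute, and the "tools" and "end" destination names are assumptions.

```python
def route_after_model(state: dict) -> str:
    """Route to the tool node when the model's last message requested tool calls."""
    last_message = state["messages"][-1]
    if last_message.get("tool_calls"):
        return "tools"
    return "end"


wants_tool = {
    "messages": [
        {"role": "user", "content": "What's 42 * 17?"},
        {"role": "assistant", "tool_calls": [{"name": "calculator", "args": {"a": 42, "b": 17}}]},
    ]
}
plain_answer = {
    "messages": [
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi there.", "tool_calls": []},
    ]
}
```

An empty tool_calls list is falsy, so a reply that carries the field but requests nothing still ends the loop.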
Q130. What happens if a tool raises an exception during execution inside ToolNode? By default, ToolNode catches tool execution errors and returns them as a tool message content (so the model gets to see the error and can decide how to react — retry with different arguments, apologize to the user, try a different tool) rather than crashing the whole graph run. This behavior is configurable if you want errors to propagate instead.
Q131. How do you give a tool access to the graph’s current state, not just the arguments the model provided? By adding a parameter annotated with InjectedState to the tool function’s signature — LangGraph recognizes that annotation and populates it from the graph’s current state automatically, without exposing that parameter to the model (so the model never has to “guess” values that should come from state rather than from its own reasoning).
Q132. What’s InjectedToolCallId used for? It gives a tool access to its own tool_call_id — useful when the tool itself needs to construct a Command that includes a properly-formed tool message referencing the call that triggered it, since the tool message response has to carry that ID for the model to correctly associate the result with its request.
Q133. Can a tool return a Command instead of a plain string/dict result? Yes (referenced in Q70) — a tool can return Command(update={...}, goto=...) to both supply its result and influence graph control flow directly from inside the tool, rather than requiring a separate node afterward to inspect the tool’s output and decide what happens next.
Q134. What’s the difference between a “tool” in the LangChain/LangGraph sense and a plain Python function? A tool wraps a plain function with a name, description, and an argument schema (typically inferred from type hints and docstring, or defined explicitly via a Pydantic model) that gets serialized and sent to the model’s API so the model knows the tool exists and how to call it correctly — the @tool decorator is the common way to turn a plain function into that wrapped form.
Q135. Why does a tool’s docstring/description matter so much for reliability? The model chooses which tool to call and how to call it based entirely on the name, description, and parameter descriptions it’s given — a vague description (“gets data”) leads to the model guessing wrong or misusing the tool’s parameters, while a precise one (“returns order status, amount, and date for a given order ID; do not use for inventory lookups”) measurably improves correct tool selection, since that description functions as part of the prompt.
Q136. How would you limit which tools are available depending on graph state (e.g., a user’s permission level)? Rather than binding a fixed tool list once, build the tool list dynamically inside the node that calls the model — filtering the full tool set down to whichever subset the current state’s permission level allows — before binding that filtered list to the model call for that particular invocation.
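The filtering step can be sketched independently of any model call. The tool names, the numeric permission levels, and the permission_level state key below are all hypothetical; in a real node the resulting subset would be what gets bound to the model for that invocation.

```python
# Hypothetical registry: tool name mapped to the minimum permission level required.
TOOL_MIN_LEVEL = {
    "lookup_order": 1,
    "issue_refund": 2,
    "delete_account": 3,
}


def allowed_tools(state: dict) -> list[str]:
    """Return the tool names the current permission level may use, sorted by name."""
    level = state.get("permission_level", 0)  # missing level means no tools at all
    return sorted(name for name, minimum in TOOL_MIN_LEVEL.items() if level >= minimum)


agent_tools = allowed_tools({"permission_level": 2})
guest_tools = allowed_tools({})
```

Defaulting a missing level to zero makes the failure mode "no tools" rather than "all tools", which is the safer direction for a permission check.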
Q137. What’s the recommended way to handle a tool that needs human approval before executing (e.g., “send this email”)? Combine interrupt() with the tool-calling loop: route to a review node before the actual side-effecting tool executes, surface the proposed tool call’s arguments via interrupt(), and only invoke the real tool (or a Command reflecting the approved action) once a human has resumed with approval — never let a destructive tool execute unconditionally inside ToolNode if it needs a human gate first.
Q138. What is create_supervisor and how does it differ from create_react_agent? create_supervisor (from the separate langgraph-supervisor package) builds a multi-agent graph where a central supervisor agent decides, turn by turn, which of several specialized sub-agents (each often itself built with create_react_agent) should handle the current request — it’s a level up from a single tool-calling agent, orchestrating multiple agents rather than multiple tools. Docs: langgraph-supervisor reference
Q139. Can prebuilt agents like create_react_agent be customized, or are you stuck with the default loop shape? They expose meaningful customization points — a custom system prompt/state modifier, a custom state schema extending the default, hooks for pre-model and post-model processing — without requiring you to fork the whole implementation; if your needs go beyond what those hooks support, that’s the signal to hand-build the graph with the Graph API instead.
Q140. What’s a common reason a tool-calling loop never terminates in practice? The model keeps requesting tool calls indefinitely — often because a tool’s results aren’t actually resolving the model’s underlying question (bad tool design, Q135), or because there’s no explicit stopping instruction/limit in the system prompt or a recursion-limit-style safeguard, so the model has no signal that it should conclude and respond directly instead of calling yet another tool.
Q141. How would you stream just the tool-call arguments as they’re generated, not the final tool result? Using the v3 event-streaming API’s message.tool_calls projection (or, in raw content-block terms, filtering for tool-call-typed content blocks), you get the incrementally-generated tool-call arguments as the model produces them, distinct from stream.tool_calls (via ToolCallTransformer), which surfaces the correlated call and its eventual execution result together.
Q142. What’s the difference between binding tools to a model directly versus letting create_react_agent do it? Functionally similar — create_react_agent calls .bind_tools() (or the equivalent) for you as part of constructing its internal model-calling node. The difference is convenience versus control: binding tools yourself inside a hand-built node gives you a place to add custom logic (dynamic tool filtering, per-call configuration) that the prebuilt agent’s default construction doesn’t expose without using its customization hooks.
Q143. What’s a realistic multi-tool scenario interviewers use to test whether you understand tool-call routing, not just definition? “The agent has a search tool and a calculator tool. The user asks a question that needs both, in sequence, where the calculator’s input depends on the search result.” The correct answer walks through multiple loop iterations: model requests search → ToolNode executes it → result returns to model → model requests calculator with the search-derived number → ToolNode executes that → model finally responds with text and no further tool calls, ending the loop.
Q144. Why might you deliberately keep tool execution outside the graph’s checkpointed state (e.g., not storing full raw API responses in state)? Same reasoning as Q44 — a tool that returns a huge payload (a full document, a large dataset) bloats every subsequent checkpoint if stored verbatim in state; a common pattern is having the tool store the raw result externally (cache, object store) and return only a reference or a trimmed summary into state for the model to reason over.
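The reference-instead-of-payload pattern can be sketched as follows. The dict stands in for an external object store or cache, the content-hash key is one possible addressing scheme rather than a required one, and store_and_summarize is a hypothetical helper.

```python
import hashlib

object_store: dict[str, str] = {}  # stand-in for S3, a cache, or a database table


def store_and_summarize(raw: str, preview_chars: int = 80) -> dict:
    """Save the full payload externally; return only a small record for graph state."""
    key = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]
    object_store[key] = raw
    return {"ref": key, "preview": raw[:preview_chars], "size": len(raw)}


large_payload = "order-record;" * 10_000  # 130,000 characters of raw tool output
state_update = store_and_summarize(large_payload)
```

Only the short reference, preview, and size land in state, so each subsequent checkpoint carries a small record instead of the full payload.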
Q145. What does “middleware” mean in the context of LangChain’s newer agent-building surface, and how does it relate to LangGraph? Middleware refers to composable hooks that intercept and modify agent behavior at defined points (before/after a model call, before/after tool execution) without rewriting the underlying graph — it’s a higher-level convenience layered on top of the same LangGraph primitives (nodes, edges, state) covered throughout this guide, aimed at making common cross-cutting concerns (logging, guardrails, retries) reusable across agents.
Q146. How do you test a tool-calling agent’s behavior without hitting a real LLM API on every test run? Use a fake/stub chat model that returns pre-scripted tool-call requests for given inputs (LangChain ships test utilities for exactly this), so you can assert the graph routes correctly and ToolNode executes the right tool with the right arguments — deterministic, fast, and free of live-API flakiness — reserving real-model tests for a smaller integration-test tier.
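The stub-model idea can be illustrated with a hand-rolled fake and a minimal loop. This is not LangChain's own test utility and not a real ToolNode; ScriptedModel, run_agent, and the two lambda tools are hypothetical stand-ins, with messages as plain dicts. The script follows the search-then-calculate scenario from Q143.

```python
class ScriptedModel:
    """Hypothetical stub: returns pre-scripted replies in order, ignoring its input."""

    def __init__(self, replies: list[dict]) -> None:
        self._replies = iter(replies)

    def invoke(self, messages: list[dict]) -> dict:
        return next(self._replies)


# Offline stand-ins for real tools.
TOOLS = {
    "search": lambda args: "population: 1200",
    "calculator": lambda args: str(args["a"] * args["b"]),
}


def run_agent(model: ScriptedModel, user_text: str, max_turns: int = 5) -> list[dict]:
    """Minimal model/tool loop; stops when a reply carries no tool calls."""
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_turns):  # hard cap so a misbehaving model cannot loop forever
        reply = model.invoke(messages)
        messages.append(reply)
        if not reply.get("tool_calls"):
            break
        for call in reply["tool_calls"]:
            result = TOOLS[call["name"]](call["args"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    return messages


script = [
    {"role": "assistant", "tool_calls": [{"name": "search", "args": {"query": "town population"}}]},
    {"role": "assistant", "tool_calls": [{"name": "calculator", "args": {"a": 1200, "b": 2}}]},
    {"role": "assistant", "content": "Doubling the population gives 2400."},
]
transcript = run_agent(ScriptedModel(script), "What is double the town's population?")
tool_names = [message["name"] for message in transcript if message["role"] == "tool"]
```

Because the replies are scripted, the test can assert on the order of tool executions and their arguments without any network access.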
Q147. What’s a tool-design mistake that looks fine in isolation but breaks down in a multi-tool agent? Two tools with overlapping, ambiguous descriptions (e.g., both plausibly described as “look up customer information”) — the model can’t reliably pick the right one, and the failure mode isn’t a crash, it’s silently calling the wrong tool and returning a confidently wrong answer, which is much harder to catch in testing than an outright error.
Q148. Can create_react_agent’s default agent use structured output for its final response instead of free text? Yes, current versions support configuring a response format/structured output schema so the agent’s final answer (once it stops calling tools) is validated against and returned as a structured object rather than only free-form text — useful when the agent’s output feeds directly into downstream code rather than being shown to a human as-is.
Q149. Why is “the model decided not to call any tools when it should have” a harder bug to debug than “the tool call failed”? A failed tool call produces a visible error you can trace directly. A model that should have called a tool but didn’t leaves no error at all — it just answers from its own (possibly wrong or outdated) knowledge instead of using the available tool, and the only way to catch it is evaluation against known-correct expected behavior, not error-log inspection.
Q150. What’s the honest tradeoff of using create_react_agent versus hand-building the same loop with the Graph API? create_react_agent gets you a correct, maintained implementation of a very common pattern in a few lines, and you inherit fixes/improvements to that pattern for free. Hand-building gives you full control over every routing decision, state field, and intermediate node — worth it the moment your agent’s loop needs to deviate meaningfully from plain “call model, call tools if requested, repeat,” which happens more often in production systems than beginner tutorials suggest.
Section 7: Memory & the Store API (Q151–175)
Checkpointers (Section 4) give you memory within a thread. This section covers memory that needs to survive across threads — a returning user, a fact learned in one conversation that should inform another.
Q151. What’s the fundamental difference between short-term and long-term memory in LangGraph’s own terminology? Short-term memory is thread-scoped conversation history, handled by the checkpointer — it’s naturally tied to one ongoing interaction. Long-term memory is scoped across threads (and often across users or sessions entirely), handled by a separate BaseStore, because a checkpointer’s thread_id scoping is the wrong shape for “remember this fact about this user regardless of which conversation they start next.” Docs: Memory overview
Q152. What is BaseStore? An interface for persisting and retrieving arbitrary key-value data organized into namespaces, independent of any particular thread — the mechanism long-term memory is built on. Built-in implementations include InMemoryStore (development) and PostgresStore (production).
Q153. Show the minimal pattern for saving and retrieving a memory via the store.
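from langgraph.store.memory import InMemoryStore
store = InMemoryStore()
user_id = "user-123"  # example value; use your application's real user identifier
namespace = ("memories", user_id)
# Write a memory under (namespace, key), then read it back by exact key.
store.put(namespace, "preferences", {"likes": "concise answers"})
item = store.get(namespace, "preferences")
print(item.value)  # the stored dict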
Namespaces are tuples, letting you organize memories hierarchically (by user, by application area, by memory type) however your application needs.
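Q154. How does a node or tool access the store at runtime? The store is passed in at compile time (builder.compile(store=store, checkpointer=checkpointer)) and injected at call time. A node typically declares a store parameter typed as BaseStore in its signature; a tool uses the InjectedStore annotation, similar in spirit to how InjectedState (Q131) works for graph state. Newer releases also expose the store through the injected runtime object, so check which style your installed version documents.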
Q155. What is semantic search in the context of BaseStore, and when was it added? It’s the ability to query the store with a natural-language query and get back memories ranked by embedding similarity to that query, rather than only exact-key lookup — added as a capability across PostgresStore, InMemoryStore, LangGraph Studio, and LangGraph Platform deployments, letting an agent recall “something relevant to this topic” without knowing the exact key a past memory was stored under. Docs: Semantic search for LangGraph memory
Q156. How do you configure semantic search for a store? You specify an embedding provider and model (e.g., "openai:text-embedding-3-small"), a vector dimension size, and which fields of a stored item should be indexed for embedding. You can do this programmatically when constructing the store, or in langgraph.json’s store configuration block when deploying to LangGraph Platform.
Q157. What’s the difference between store.get() and store.search()? get() is an exact lookup by namespace and key — you already know precisely what you’re retrieving. search() queries across a namespace, optionally with a natural-language query for semantic ranking, returning the most relevant matches when you don’t know the exact key, which is the more common access pattern for an agent recalling relevant-but-not-exactly-known facts.
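The get-versus-search distinction can be illustrated with a toy store. This is a hedged stand-in, not the BaseStore implementation: ToyStore is hypothetical, and its search ranks by naive word overlap purely as a placeholder for embedding similarity.

```python
from typing import Optional

class ToyStore:
    """Toy stand-in for a namespaced key-value store; not the BaseStore implementation."""

    def __init__(self) -> None:
        self._items: dict = {}

    def put(self, namespace: tuple, key: str, value: dict) -> None:
        self._items[(namespace, key)] = value

    def get(self, namespace: tuple, key: str) -> Optional[dict]:
        # Exact lookup: both the namespace and the key must be known in advance.
        return self._items.get((namespace, key))

    def search(self, prefix: tuple, query: str = "") -> list:
        # Scan everything under a namespace prefix and rank by naive word overlap,
        # a placeholder for the embedding similarity a real store would use.
        words = set(query.lower().split())
        hits = []
        for (namespace, _key), value in self._items.items():
            if namespace[: len(prefix)] != prefix:
                continue
            text = " ".join(str(v) for v in value.values()).lower()
            hits.append((len(words & set(text.split())), value))
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [value for _score, value in hits]

toy = ToyStore()
ns = ("memories", "user-1")
toy.put(ns, "style", {"note": "prefers concise answers"})
toy.put(ns, "channel", {"note": "prefers email contact"})

exact = toy.get(ns, "style")                                # key known up front
ranked = toy.search(("memories",), query="email contact")   # key not known
print(exact)
print(ranked[0])
```

The point to make in an interview is the calling convention, not the ranking: get needs the key, search only needs a namespace prefix and a description of what you are looking for.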
Q158. What are the three integration patterns for adding long-term memory to an agent, at a high level? First, as a tool the model can explicitly call (“remember this,” “recall anything about X”). Second, as logic baked directly into a node, which automatically saves or loads relevant memories before or after a model call without the model having to ask. Third, through a dedicated memory-management subsystem that reads and writes the BaseStore on the agent’s behalf. The right choice depends on whether you want memory access to be an explicit, model-visible decision or an implicit, always-on background behavior.
Q159. Why would you choose “memory as an explicit tool” over “memory baked into every node automatically”? Explicit-tool memory gives the model (and your evaluation/observability tooling) visibility into exactly when memory was consulted or written, which matters for debugging and for cases where memory access has a cost (embedding calls, retrieval latency) you don’t want paid on every single turn regardless of relevance.
Q160. What does “cross-thread memory” actually solve that a very long single-thread conversation couldn’t? A single thread’s history grows unboundedly and eventually exceeds context-window-practical limits even with summarization; more importantly, real usage patterns are naturally multi-thread (a user starts a new conversation tomorrow, or interacts through a different channel) — cross-thread memory via the store lets facts learned in thread A be available in an entirely separate thread B, which checkpointing alone structurally cannot do.
Q161. How would you decide what’s worth saving to long-term memory versus what should just live in a given thread’s checkpoint history? Durable, user-level facts that should hold regardless of conversation context (stated preferences, profile details, standing instructions) belong in the store; conversational specifics relevant only to the current exchange (what the user just asked five messages ago) are exactly what thread-scoped checkpointing already handles well — saving everything to long-term memory indiscriminately just recreates unbounded-context problems one layer up.
Q162. What’s a namespace collision risk when designing your store’s namespace scheme, and how do you avoid it? Using something ambiguous (like just a username string) as a namespace segment risks collisions if two logically distinct memory types share that same segment structure — a more robust scheme includes an explicit type/category segment (("memories", "preferences", user_id) vs. ("memories", "facts", user_id)) so retrieval and writes for one category can’t accidentally clobber or leak into another.
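One way to make the category segment from Q162 hard to forget is to build namespaces through a single helper. This is a sketch under assumptions: memory_namespace and ALLOWED_CATEGORIES are hypothetical, and a plain dict keyed by (namespace, key) stands in for the store.

```python
ALLOWED_CATEGORIES = {"preferences", "facts"}

def memory_namespace(category: str, user_id: str) -> tuple:
    """Build a namespace with an explicit category segment (hypothetical scheme)."""
    if category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown memory category: {category}")
    return ("memories", category, user_id)

# A plain dict keyed by (namespace, key) stands in for the store here.
fake_store = {}
fake_store[(memory_namespace("preferences", "u1"), "contact")] = {"channel": "email"}
fake_store[(memory_namespace("facts", "u1"), "contact")] = {"last_contacted": "2024-05-01"}

# Same user, same key, two categories: neither write clobbers the other.
print(len(fake_store))
```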
Q163. Can memory stored via BaseStore be shared across multiple different graphs/applications, not just multiple threads of the same graph? Yes, in principle — since the store is a standalone key-value/semantic-search backend independent of any specific graph’s compilation, any application with a reference to the same underlying store (same Postgres instance, same connection config) can read and write the same namespaces, which is how organizations share a user-memory layer across multiple agent products.
Q164. What are the four “parallel strategies” sometimes described for memory retrieval, beyond plain semantic similarity? Semantic (embedding similarity to the query), keyword/BM25-style (exact or near-exact term overlap), graph traversal (finding memories connected through shared entities/relationships), and temporal (weighting more recent memories higher) — a production memory system often blends more than one of these rather than relying on embedding similarity alone, since pure semantic search can miss an exact-term match a keyword search would have caught immediately.
Q165. What’s a realistic failure mode of relying on semantic search alone for memory recall? A query using very specific, distinctive terminology (an exact product SKU, a precise error code) can retrieve worse results via pure embedding similarity than a simple exact/keyword match would, because semantically-similar-but-wrong memories can outscore the one memory with the literal exact term — which is the practical argument for blending keyword and semantic strategies rather than treating semantic search as a strict upgrade over keyword search.
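The failure mode in Q165 is easy to show with a toy hybrid ranker. This is an illustrative sketch only: the semantic scores below are hypothetical numbers standing in for an embedding model's output, and keyword_score is a simple term-overlap ratio rather than real BM25.

```python
def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear verbatim in the text."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    return len(query_terms & text_terms) / len(query_terms) if query_terms else 0.0

def hybrid_rank(query: str, memories: dict, semantic_scores: dict, alpha: float = 0.5) -> list:
    """Blend semantic and keyword scores; alpha=1.0 means semantic only."""
    scored = []
    for mem_id, text in memories.items():
        score = alpha * semantic_scores[mem_id] + (1 - alpha) * keyword_score(query, text)
        scored.append((score, mem_id))
    return [mem_id for _score, mem_id in sorted(scored, reverse=True)]

memories = {
    "m1": "customer reported error E4012 on checkout",
    "m2": "customer reported a payment failure during checkout",
}
# Hypothetical similarity values: the "wrong" memory scores slightly higher.
semantic_scores = {"m1": 0.70, "m2": 0.78}
query = "error E4012"

print(hybrid_rank(query, memories, semantic_scores, alpha=1.0))  # semantic only
print(hybrid_rank(query, memories, semantic_scores, alpha=0.5))  # blended
```

With semantic scoring alone the similar-but-wrong memory ranks first; once exact term overlap contributes, the memory containing the literal error code moves to the top.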
Q166. How do you test memory-dependent agent behavior without needing an actual persisted store between test runs? Instantiate a fresh InMemoryStore per test, pre-populate it with whatever memories the test scenario assumes already exist, and assert on the agent’s behavior given that fixture — same isolation principle as testing with InMemorySaver instead of a real database-backed checkpointer (Q77).
Q167. What’s the relationship between the store’s namespaces and multi-tenant applications (many separate customers/organizations)? Including a tenant or organization identifier as a namespace segment is the natural way to enforce memory isolation between tenants at the data-access level — critically, this needs to be enforced in your application’s access-control logic (which namespaces a given request is allowed to query), since BaseStore itself doesn’t inherently understand or enforce tenant boundaries on your behalf.
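Because the store will not enforce tenant boundaries for you, the check has to sit in your own code path before any read or write. A minimal sketch, assuming a convention where the first namespace segment is always the tenant ID; authorize_namespace is a hypothetical helper, not part of LangGraph.

```python
def authorize_namespace(request_tenant: str, namespace: tuple) -> tuple:
    """Reject any namespace whose first segment is not the caller's tenant.

    Assumed convention for this sketch: namespace[0] is always the tenant ID.
    """
    if not namespace or namespace[0] != request_tenant:
        raise PermissionError(f"tenant {request_tenant!r} may not access {namespace!r}")
    return namespace

# Allowed: the request's tenant matches the namespace's tenant segment.
allowed = authorize_namespace("acme", ("acme", "memories", "user-7"))
print(allowed)

# Denied: a request from one tenant targeting another tenant's namespace.
try:
    authorize_namespace("acme", ("globex", "memories", "user-7"))
except PermissionError as err:
    print("blocked:", err)
```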
Q168. Why might an interviewer ask “how would you expire old memories” and what’s a reasonable answer? Because unbounded memory accumulation eventually hurts both retrieval quality (more noise competing with genuinely relevant memories) and storage cost. A reasonable answer covers explicit TTL/expiration on writes if the store implementation supports it, or a periodic background job that prunes or archives memories past some age or below some usage/relevance threshold, rather than assuming memory should simply persist forever unmanaged.
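The background-job half of that answer can be sketched in a few lines. This is a plain-Python illustration with a dict standing in for the store; prune_expired, the saved_at field, and the 365-day cutoff are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

def prune_expired(memories: dict, now: datetime, max_age: timedelta) -> list:
    """Delete memories older than max_age and return the removed keys."""
    expired = [key for key, item in memories.items() if now - item["saved_at"] > max_age]
    for key in expired:
        del memories[key]
    return expired

# Fixed "now" so the example is deterministic.
now = datetime(2025, 6, 1, tzinfo=timezone.utc)
memories = {
    "old-preference": {"text": "likes phone calls", "saved_at": now - timedelta(days=400)},
    "recent-preference": {"text": "prefers email", "saved_at": now - timedelta(days=3)},
}

removed = prune_expired(memories, now=now, max_age=timedelta(days=365))
print(removed)
print(list(memories))
```

A production version would more likely archive than delete, and might prune on usage or relevance rather than age alone, but the shape of the job is the same.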
Q169. What’s the difference between “episodic” and “semantic” memory as sometimes discussed in agent-memory design (not to be confused with semantic search)? Episodic memory is memory of specific past events or interactions (“last Tuesday, the user asked about refunds and was frustrated”). Semantic memory (in this cognitive-science sense) is memory of general facts or knowledge, decoupled from any specific episode (“this user prefers email over phone contact”) — both can live in the same BaseStore, but conflating them under a single undifferentiated “memories” namespace makes both harder to retrieve precisely.
Q170. How would you evaluate whether an agent’s memory system is actually helping, rather than just adding latency and cost? Compare task success or user-satisfaction metrics between otherwise-identical agent runs with memory enabled versus disabled on a representative test set — memory that isn’t measurably improving outcomes on real scenarios is a cost (retrieval latency, embedding spend, occasional wrong-memory-retrieved errors) without a demonstrated benefit, and that’s a legitimate finding, not just an implementation bug to fix.
Q171. Can a subgraph (Section 1) have its own isolated store, separate from the parent graph’s store? The store, like the checkpointer, is typically configured at compile time and inherited by subgraphs run within the parent’s execution — but nothing structurally prevents compiling a subgraph independently with its own distinct store reference if a use case genuinely calls for isolated long-term memory scoped only to that subgraph’s concerns.
Q172. What’s a common mistake teams make when first adding long-term memory to an existing agent? Saving every single interaction indiscriminately to long-term memory “just in case,” rather than being deliberate about what’s actually worth remembering (Q161) — this bloats storage, slows retrieval, and often degrades answer quality because irrelevant memories compete with genuinely useful ones during semantic search.
Q173. How does long-term memory interact with the human-in-the-loop patterns from Section 5? A memory-write step is itself a reasonable candidate for a lighter-weight interrupt or at least an audit log — particularly for memories that will influence future automated decisions — since an incorrectly saved “fact” about a user can silently bias every future interaction that retrieves it, which is a harder-to-detect failure than a single bad response.
Q174. What’s the argument for keeping the store and the checkpointer as genuinely separate systems, rather than trying to unify them? They have different natural access patterns and lifecycles — checkpoint history is inherently sequential and thread-scoped (a linear chain of “what happened next”), while long-term memory is inherently associative and cross-cutting (arbitrary facts retrieved by relevance rather than by position in a sequence) — collapsing them into one system tends to produce a data model that’s awkward for both use cases rather than good at either.
Q175. What’s a strong closing answer to “how would you design the memory system for a customer support agent used by thousands of users”? Thread-scoped checkpointing for in-conversation context, a BaseStore namespaced by user ID (and organization ID if multi-tenant) for durable facts like preferences and known account details, semantic search for recalling relevant past interactions without needing exact keys, an explicit retention/expiration policy rather than unbounded accumulation, and — critically — an evaluation harness measuring whether memory retrieval is actually improving resolution quality rather than assuming it does by default.
Section 8: Multi-Agent Architectures (Q176–200)

Once a single agent’s tool-calling loop isn’t enough — because different tasks genuinely need different expertise, prompts, or tool access — you’re into multi-agent territory. This is also where LangGraph, CrewAI, and AutoGen get compared most directly in interviews.
Q176. What’s the core reason to split one agent into multiple agents, rather than giving one agent a very large tool list and a very long system prompt? Focus and reliability — a single agent juggling twenty tools and a sprawling system prompt covering unrelated domains tends to make worse tool-selection and reasoning decisions than several narrower agents, each with a small, coherent tool set and a prompt scoped to one job, coordinated by an explicit handoff mechanism.
Q177. What is the “supervisor” multi-agent pattern? A central supervisor agent receives each turn, decides which specialized sub-agent should handle it, delegates, and (in the common design) receives control back afterward to decide the next step — every handoff is mediated by that single decision-maker rather than sub-agents transferring control directly to each other. Docs: langgraph-supervisor
Q178. What is the “swarm” multi-agent pattern, and how does it differ from supervisor? In a swarm, agents hand off control directly to one another based on their own assessment of which specialist is now needed, and the system tracks which agent was last active so a follow-up turn resumes with the same agent rather than routing back through a central decision-maker — there’s no single supervisor node deciding every handoff. Docs: langgraph-swarm
Q179. Given Q177 and Q178, when would you choose swarm over supervisor? When agent-to-agent handoff is naturally peer-like rather than hierarchical — e.g., a billing agent that, mid-conversation, recognizes a question is actually a technical support issue and hands off directly, versus needing to report back up to a central router first. Choose supervisor when you want one place that owns and can audit every routing decision; choose swarm when direct, decentralized handoff better matches how the work actually flows.
Q180. How is a “handoff” between agents typically implemented at the LangGraph primitive level? Via Command(goto=target_agent_name, update={...}) returned from the active agent (or from a dedicated handoff tool it calls) — the same Command primitive from Section 3, just used at the granularity of “which whole agent runs next” rather than “which node within one agent’s internal graph runs next.”
Q181. What state is typically shared versus kept private when multiple agents operate in the same graph? Shared state commonly includes the conversation’s message history (so each agent has full context of what’s happened) and any task-level facts relevant across agents; agent-private state (an agent’s own scratch reasoning, its own tool-call intermediate results) is often kept out of the shared schema, sometimes by giving each agent its own subgraph with a narrower internal state.
Q182. What’s a network (as opposed to supervisor or swarm) multi-agent topology? A topology where any agent can potentially route to any other agent directly, without a strict hierarchy (supervisor) or a simple last-active-agent handoff convention (swarm) — more flexible, but correspondingly harder to reason about and debug, since there’s no single place enforcing which handoffs are actually sensible.
Q183. Why is “shared state is the default” (in both langgraph-supervisor and langgraph-swarm) worth calling out explicitly in an interview? Because it means every sub-agent, by default, sees the full conversation and shared context rather than operating in an isolated bubble — which is usually desirable for coherence, but it also means a sub-agent’s internal reasoning or scratch state can leak into what other agents see unless you’ve deliberately scoped what’s actually shared versus kept in a private subgraph state.
Q184. How would you prevent one agent’s tool-calling loop from interfering with another agent’s, if both share the same top-level state? Give each agent’s internal tool-calling logic (its own create_react_agent-built subgraph, for instance) its own internal state scope for intermediate tool-call bookkeeping, and only pass the narrower, agreed-upon shared fields (final results, conversation history) up into the parent multi-agent graph’s shared state — the same input/output schema separation from Q35, applied at the multi-agent level.
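The scoping idea in Q184 reduces to a simple rule: private bookkeeping stays inside the agent, and only agreed-upon fields come back out. The sketch below uses a plain function in place of a subgraph; SharedState, returns_agent, and the field names are hypothetical.

```python
from typing import TypedDict

class SharedState(TypedDict):
    messages: list
    result: str

def returns_agent(shared: SharedState) -> dict:
    """Hypothetical specialist: keeps tool bookkeeping private, returns shared fields only."""
    # Private scratch state lives only for the duration of this call.
    private = {"tool_calls": [], "scratch": []}
    private["tool_calls"].append({"name": "lookup_order", "args": {"id": "A1"}})
    private["scratch"].append("order A1 is within the return window")

    # Only the agreed-upon shared fields cross the boundary to the parent graph.
    return {
        "messages": shared["messages"] + ["returns: refund eligible"],
        "result": "eligible",
    }

shared: SharedState = {"messages": ["user: can I return order A1?"], "result": ""}
update = returns_agent(shared)
print(sorted(update))
```

Another agent reading the shared state afterwards sees the outcome and the conversation, but never this agent's intermediate tool calls.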
Q185. What’s a realistic multi-agent interview scenario, and how would you talk through designing it? “Design a multi-agent system for an e-commerce support bot: order status, returns, and general product questions.” A strong answer identifies three specialized agents (each with its own narrow tool set — order lookup API, returns/refund API, product catalog search), a supervisor (or swarm, justified by reasoning like Q179) deciding routing, shared conversation history, and an explicit human-in-the-loop gate on the returns agent specifically, since refunds are the one action here with real financial consequence.
Q186. How does multi-agent design change your approach to human-in-the-loop compared to a single-agent system? You need to decide not just whether a given action needs approval but which agent’s actions need it — a refund-approval gate belongs specifically on the returns agent’s side effects, not globally across every agent’s every action, since gating everything defeats the purpose of specialization and gating nothing misses the one agent whose mistakes are actually expensive.
Q187. What’s a common reliability failure mode specific to multi-agent systems that doesn’t show up in single-agent graphs? Handoff loops — Agent A hands off to Agent B, which (misjudging the situation) hands back to A, which hands back to B again, with no forward progress and no single agent clearly “stuck” in the way a single-agent tool-loop failure would be. Guarding against this usually means tracking handoff count/history in shared state and routing to a human or a fallback path if handoffs exceed a sane threshold.
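A handoff guard along the lines described above might look like the following sketch. It is plain Python rather than LangGraph routing code; guarded_handoff, the handoff_history field, and the threshold of 4 are assumptions chosen for illustration.

```python
MAX_HANDOFFS = 4  # hypothetical threshold; tune for your workload

def guarded_handoff(state: dict, requested_agent: str) -> str:
    """Record each handoff in shared state and divert to a fallback past the threshold."""
    history = state.setdefault("handoff_history", [])
    if len(history) >= MAX_HANDOFFS:
        return "human_fallback"
    history.append(requested_agent)
    return requested_agent

# Simulate two agents bouncing a conversation back and forth.
state = {}
requests = ["support", "billing", "support", "billing", "support", "billing"]
routes = [guarded_handoff(state, agent) for agent in requests]
print(routes)
```

In a real graph the returned name would feed whatever routing mechanism you use, and the history would typically reset once an agent makes real progress.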
Q188. How would you evaluate a multi-agent system’s routing quality specifically, separate from evaluating each agent’s individual task performance? Build a labeled test set of representative inputs with the correct target agent (or correct sequence of agents) for each, and measure the supervisor’s (or swarm’s handoff logic’s) routing accuracy against those labels independently of whether each individual specialist agent then performs its task well — routing and task execution are separate failure modes that need separate evaluation to debug effectively.
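Routing evaluation needs nothing more than labeled inputs and a comparison loop. In this sketch keyword_router is a deliberately naive, hypothetical stand-in for a supervisor's routing decision, and the four cases are invented examples; a real harness would call the actual router and use a much larger labeled set.

```python
def keyword_router(text: str) -> str:
    """Hypothetical router under test; a real one would be a model call."""
    lowered = text.lower()
    if "refund" in lowered or "return" in lowered:
        return "returns"
    if "order" in lowered:
        return "orders"
    return "general"

labeled_cases = [
    ("Where is my order?", "orders"),
    ("I'd like to return these shoes", "returns"),
    ("Does this laptop have HDMI?", "general"),
    ("I want my money back", "returns"),  # phrased without the obvious keywords
]

def routing_accuracy(router, cases):
    """Return accuracy plus the misrouted cases for inspection."""
    misses = [(text, expected, router(text)) for text, expected in cases if router(text) != expected]
    return 1 - len(misses) / len(cases), misses

accuracy, misses = routing_accuracy(keyword_router, labeled_cases)
print(accuracy)
print(misses)
```

The list of misses matters as much as the number: it tells you whether the fix belongs in the routing prompt, the agent descriptions, or the labels themselves.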
Q189. Can a subgraph-based agent in a multi-agent system have its own separate checkpointer or must it share the parent’s? In common usage each agent’s subgraph inherits the checkpointer configured on the overall compiled graph, since persistence is generally meant to capture the whole multi-agent run coherently as one thread — genuinely isolating one agent’s checkpointing from the rest is an advanced, less common configuration you’d only reach for with a specific isolation requirement.
Q190. What’s the tradeoff of a deeply hierarchical (supervisor-of-supervisors) multi-agent design versus a flatter one? Hierarchy scales your ability to reason about large numbers of specialized agents by grouping them under intermediate supervisors, but each additional layer adds latency (more round trips before a request reaches the agent that actually does the work) and makes end-to-end tracing harder — a flatter design is easier to debug and faster per-turn, at the cost of one supervisor’s routing prompt eventually growing unwieldy if it’s coordinating too many peers directly.
Q191. How do you decide the boundary of what counts as “one agent” versus “two agents that should be merged into one”? If two capabilities always get invoked together, need the same tools, and never operate independently of each other in practice, splitting them into separate agents usually just adds handoff overhead without a real specialization benefit — the split is worth it when the two capabilities have genuinely different tool sets, prompts, or failure characteristics that benefit from being reasoned about (and evaluated, and gated) separately.
Q192. What’s the relationship between multi-agent architectures in LangGraph and the “Deep Agents” pattern? Deep Agents (built on top of LangGraph) is specifically about a planning agent that manages subagents, a virtual file system, and long-running task decomposition — it’s one particular, opinionated multi-agent shape (a main agent spawning and coordinating scoped subagents for pieces of a larger task) rather than a general-purpose alternative to supervisor/swarm; you’d reach for it specifically when a task benefits from explicit planning and file-based state rather than direct conversational handoff.
Q193. Why might streaming (Section 9) be more complicated in a multi-agent graph than a single-agent one? Because you now need to attribute streamed tokens, tool calls, and lifecycle events to the specific agent (and, in a hierarchical design, the specific subgraph nesting level) producing them — which is exactly the problem stream.subgraphs (from the v3 event-streaming API) is designed to solve, surfacing each nested agent’s execution as its own object rather than a flat, unattributed event stream.
Q194. What’s a good way to explain, to a skeptical stakeholder, why a multi-agent system costs more to run than one agent handling everything? Every handoff typically involves at least one additional model call (the supervisor deciding where to route, or the current agent deciding to hand off) on top of whatever work the specialist agent itself does — multi-agent design trades some additional per-request cost and latency for better task-specific reliability and easier evaluation/maintenance of each specialist independently, and that tradeoff needs to be justified by the actual reliability gain, not assumed.
Q195. How would you test handoff logic in isolation, the way Q73 tested a single routing function? Treat the handoff decision (whichever agent or function produces it — a supervisor’s routing call, or a specific agent’s decision to hand off) as its own unit under test: feed it representative conversation states and assert on the resulting Command(goto=...) or equivalent routing output, independent of actually running the target agent’s full logic.
Q196. What’s a design smell suggesting a multi-agent system has been over-decomposed? If most conversations end up bouncing through three or four agent handoffs before reaching the one that actually resolves the user’s request, and those handoffs rarely change based on context (they’re nearly always the same fixed sequence), that fixed sequence is arguably just one agent’s internal steps that got needlessly split into separate agents with handoff overhead in between.
Q197. How does shared long-term memory (Section 7) interact with a multi-agent system where each agent has a different “personality” or role? The store is typically shared across all agents in the system (same user, same namespace scheme) since a fact learned by one specialist agent is usually relevant regardless of which agent handles the user’s next request — the alternative (siloed memory per agent) tends to produce a confusing experience where the user has to repeat context depending on which specialist happens to pick up their next message.
Q198. What’s a strong way to open an answer about “LangGraph vs CrewAI vs AutoGen” specifically for multi-agent design (full comparison in Section 10)? Frame it around control granularity: LangGraph gives you explicit graph-level control over every handoff, piece of shared state, and human-in-the-loop gate, at the cost of writing more of that structure yourself (or using langgraph-supervisor/langgraph-swarm as a starting point); CrewAI and AutoGen ship more opinionated, higher-level multi-agent abstractions out of the box, trading some of that fine-grained control for faster initial setup of common patterns.
Q199. What’s a realistic failure an interviewer might describe and ask you to diagnose: “our supervisor keeps routing refund requests to the general-question agent”? Start with the supervisor’s routing prompt and few-shot examples (is “refund” actually represented clearly enough for the model to distinguish it from a general question), then check whether the returns agent’s own description (what the supervisor sees when deciding where to route) is specific enough — this is fundamentally the same class of problem as Q135’s tool-description reliability issue, just applied to agent descriptions instead of tool descriptions.
Q200. What’s a strong closing statement on multi-agent design for a senior-level interview? Multi-agent architecture is a reliability and maintainability tool, not a default — the right number of agents is however many distinct, testable specializations your problem actually has, coordinated by whichever handoff pattern (supervisor for centralized auditability, swarm for direct peer handoff) matches how work naturally flows between them, with shared memory and state scoped deliberately rather than shared by default just because it’s the path of least resistance.
Section 9: Streaming & Observability (Q201–225)
Streaming is where a lot of candidates who understand the graph model conceptually still trip up on API specifics — the modes changed meaningfully between v1, v2, and v3. For the full walkthrough with a FastAPI + React implementation, see our dedicated LangGraph streaming guide.
Q201. What stream modes does LangGraph’s stream()/astream() API support? values (full state after each step), updates (only the changed keys per step), messages (LLM token chunks with metadata), custom (arbitrary data emitted via get_stream_writer()), checkpoints, tasks, and debug (checkpoints + tasks combined with extra metadata). Docs: Streaming
Q202. What’s the difference between stream_mode="values" and stream_mode="updates"? values yields the complete state snapshot after every step, whether or not that particular step changed a given field. updates yields only the keys a step actually changed, scoped by node name. It is more bandwidth-efficient and the natural choice for a “node X just finished” progress indicator, since it avoids re-sending the whole state repeatedly.
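The difference is easiest to see by producing both views from the same run. The sketch below is a simplified simulation in plain Python, not LangGraph's streaming code: run_steps and the two toy nodes are hypothetical, and the real API may also emit items (such as the initial state) that this simulation omits.

```python
def run_steps(initial: dict, steps: list):
    """Simulate a run and record what each stream mode would roughly look like."""
    state = dict(initial)
    values_stream, updates_stream = [], []
    for node_name, node_fn in steps:
        changed = node_fn(state)
        state.update(changed)
        values_stream.append(dict(state))            # full snapshot after the step
        updates_stream.append({node_name: changed})  # only what this node changed
    return values_stream, updates_stream

steps = [
    ("plan", lambda s: {"plan": "search then answer"}),
    ("answer", lambda s: {"answer": "42"}),
]
values_stream, updates_stream = run_steps({"question": "q"}, steps)

for snapshot in values_stream:
    print("values :", snapshot)
for update in updates_stream:
    print("updates:", update)
```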
Q203. How do you stream LLM tokens specifically, and what shape does that data come in? Via stream_mode="messages", which yields (message_chunk, metadata) tuples — the chunk being the incremental piece of the LLM’s response, and metadata including which node and which tagged model invocation produced it, letting you filter tokens by node or by tag if multiple models are involved in one graph.
Q204. What’s get_stream_writer() used for, and what’s the constraint on using it in async code on older Python? It lets a node or tool emit arbitrary custom data mid-execution (progress percentages, intermediate status) that surfaces via stream_mode="custom". On Python versions below 3.11, get_stream_writer() doesn’t work inside async functions because those Python versions don’t propagate context automatically across asyncio tasks — you pass a writer parameter explicitly to the node/tool instead.
Q205. What changed between stream_mode’s v1 and v2 output formats? v1’s output shape depends on your options (a single mode returns raw data, multiple modes return (mode, data) tuples, subgraph streaming returns (namespace, data) tuples) — three different shapes depending on configuration. v2 unifies all of that into one consistent StreamPart dict — {"type": ..., "ns": ..., "data": ...} — regardless of how many modes or whether subgraphs are involved.
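The practical payoff of a single shape is that consumer code needs only one branch point. The sketch below processes hand-written dicts in the {"type", "ns", "data"} shape described above; the specific values, and the assumption that ns is a tuple of strings, are illustrative rather than taken from the library.

```python
from collections import defaultdict

# Hand-written parts in the {"type", "ns", "data"} shape described above.
# The values, and ns being a tuple of strings, are assumptions for this sketch.
parts = [
    {"type": "updates", "ns": (), "data": {"planner": {"plan": "search first"}}},
    {"type": "custom", "ns": (), "data": {"progress": 0.5}},
    {"type": "updates", "ns": ("researcher",), "data": {"search": {"hits": 3}}},
]

def group_by_type(stream_parts: list) -> dict:
    """Group parts by type, labeling each with its namespace."""
    grouped = defaultdict(list)
    for part in stream_parts:
        # Every part carries its own type and namespace, so no shape-sniffing is needed.
        label = "/".join(part["ns"]) or "root"
        grouped[part["type"]].append((label, part["data"]))
    return dict(grouped)

grouped = group_by_type(parts)
print(grouped)
```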
Q206. What is the v3 event-streaming API, at a conceptual level, and how is it different from v1/v2 stream_mode? v3 (graph.stream_events(..., version="v3")) sits one layer above raw stream_mode output: instead of you branching on chunk shapes or StreamPart.type, it exposes typed projections — stream.messages, stream.values, stream.subgraphs, stream.output — built on a content-block protocol that gives text, reasoning, and tool-call boundaries explicit structure, so the framework does the correlation work v1/v2 leave to your consumer code. Docs: Event streaming
Q207. Show the v3 equivalent of streaming tokens, compared to the v2 pattern.
# v2
async for part in graph.astream(input, stream_mode="messages", version="v2"):
    if part["type"] == "messages":
        msg, meta = part["data"]
        print(msg.content, end="")
# v3
stream = graph.stream_events(input, version="v3")
for message in stream.messages:
    for token in message.text:
        print(token, end="")
v3 gives you one stream object per LLM call via stream.messages, which removes the need to track “which node is this token from” yourself to avoid concatenating unrelated model calls together.
Q208. What’s a content block, and why does the messages channel model output that way? A content block is a discrete unit of an LLM’s output — text, reasoning, or tool-call arguments — with explicit message-start / content-block-start / content-block-delta / content-block-finish / message-finish boundaries, so a consumer can tell unambiguously where one kind of content ends and another begins, rather than inferring it from provider-specific formatting.
Q209. Where do reasoning tokens surface in v3, and why is that a common source of confusion? On message.reasoning, separate from message.text. Reading only .text means you silently miss all reasoning-model “thinking” tokens — which also means a model that’s reasoning at length produces no visible .text output for a stretch, which without knowing to check .reasoning looks like the stream has stalled.
Q210. What is a StreamTransformer and when would you write a custom one? An interface (init(), process(event), finalize(), fail(err)) for building a custom projection over the raw event stream — write one when none of the built-in projections (stream.messages, stream.values, stream.subgraphs) give you the derived view you need, like aggregate token-usage tracking or a bespoke progress indicator. Docs: Event streaming — custom projections
Q211. What does required_stream_modes control on a StreamTransformer, and what’s the consequence of forgetting to declare a mode? It declares which raw Pregel channels ("messages", "custom", etc.) the graph must actually emit for that transformer to see anything — the runtime takes the union across every registered transformer’s declared modes. Forget to declare "custom", for example, and your transformer’s process() simply never receives custom events at all, silently, rather than raising an obvious error.
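The silent-miss behavior from Q210 and Q211 can be sketched in plain Python. The classes below are duck-typed stand-ins that only mirror the four method names and the required_stream_modes attribute mentioned above; they do not subclass the real StreamTransformer, and the run() helper and event dicts are hypothetical.

```python
class UsageTotals:
    required_stream_modes = ("messages",)

    def init(self):
        self.total_tokens = 0
        self.error = None

    def process(self, event):
        if event["mode"] == "messages":
            self.total_tokens += event.get("usage", {}).get("total_tokens", 0)

    def finalize(self):
        return self.total_tokens

    def fail(self, err):
        self.error = err


class ProgressNotes:
    required_stream_modes = ()  # bug: "custom" was never declared

    def init(self):
        self.notes = []

    def process(self, event):
        if event["mode"] == "custom":
            self.notes.append(event["data"])

    def finalize(self):
        return self.notes

    def fail(self, err):
        self.notes = []


class FixedProgressNotes(ProgressNotes):
    required_stream_modes = ("custom",)


def run(transformers, raw_events):
    # The union of declared modes decides which events are emitted at all.
    emitted = set().union(*(t.required_stream_modes for t in transformers))
    for t in transformers:
        t.init()
    for event in raw_events:
        if event["mode"] not in emitted:
            continue  # never emitted: no transformer sees it and nothing raises
        for t in transformers:
            t.process(event)
    return [t.finalize() for t in transformers]


RAW = [
    {"mode": "messages", "usage": {"total_tokens": 5}},
    {"mode": "custom", "data": "step 1"},
]
```

With ProgressNotes the custom event is dropped before dispatch and the result is an empty list; swapping in FixedProgressNotes is the entire fix.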
Q212. What’s the difference between a named and an unnamed StreamChannel? A named channel (StreamChannel("my_projection")) both exposes an iterable under stream.extensions and forwards each pushed value into the main event stream as a custom:<name> event — meaning its payload must be JSON-serializable. An unnamed channel (StreamChannel()) is side-channel only, the right choice for projections holding in-process objects (promises, class instances) that can’t be serialized.
Q213. How do you consume multiple projections in strict arrival order rather than picking just one? In synchronous code, stream.interleave("values", "messages", "subgraphs") yields items from all three projections in the order they actually occurred. In async code, the equivalent is to iterate the projections separately and consume them concurrently, for example with asyncio.gather.
Q214. Why does a real-time streaming UI need a keepalive mechanism, and what does that have to do with reasoning models specifically? A reasoning model can produce zero message.text output for tens of seconds while reasoning (those tokens are on .reasoning, not .text) — an idle SSE or WebSocket connection with no traffic for that long often gets dropped by an intermediate proxy that assumes the connection is dead, so you emit a periodic empty “keepalive” frame to hold the connection open regardless of whether the model has produced visible content yet.
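One way to implement the keepalive, sketched with asyncio only. The wrapper name with_keepalive, the 0.05-second interval, and the slow_model stand-in are all hypothetical; the keepalive frame uses an SSE comment line (a line starting with a colon), which EventSource clients ignore.

```python
import asyncio

KEEPALIVE = ": keepalive\n\n"  # SSE comment frame, invisible to the client app


async def with_keepalive(source, interval):
    """Yield items from source, plus a keepalive frame whenever it stays quiet."""
    iterator = source.__aiter__()
    pending = asyncio.ensure_future(iterator.__anext__())
    try:
        while True:
            # Wait without cancelling the in-flight read when the timeout hits.
            done, _ = await asyncio.wait({pending}, timeout=interval)
            if not done:
                yield KEEPALIVE
                continue
            try:
                item = pending.result()
            except StopAsyncIteration:
                return
            yield item
            pending = asyncio.ensure_future(iterator.__anext__())
    finally:
        pending.cancel()


async def slow_model():
    # Stand-in for a model that reasons silently before emitting visible text.
    await asyncio.sleep(0.3)
    yield "data: hello\n\n"


async def demo():
    return [frame async for frame in with_keepalive(slow_model(), interval=0.05)]
```

The design point is using asyncio.wait with a timeout rather than asyncio.wait_for, since the latter would cancel the pending read on every quiet interval.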
Q215. What HTTP response header commonly needs setting to prevent a reverse proxy from buffering an SSE stream, defeating the point of streaming? X-Accel-Buffering: no (for Nginx-style proxies) — without it, the proxy can buffer the entire response and deliver it all at once instead of passing chunks through as they arrive, which silently turns a “streaming” endpoint back into a blocking one from the client’s perspective.
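As a rough sketch, the headers and frame format for an SSE response might look like the following. The constant and function names are hypothetical, and how you attach the headers depends on your web framework.

```python
from typing import Optional

SSE_HEADERS = {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    "X-Accel-Buffering": "no",  # asks Nginx-style proxies not to buffer the body
}


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Encode one server-sent event; each payload line gets its own data: prefix."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.extend(f"data: {line}" for line in data.split("\n"))
    return "\n".join(lines) + "\n\n"  # a blank line terminates the event
```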
Q216. What’s the useStream() React hook, and what problem does it solve versus hand-rolling an EventSource client? A hook from @langchain/langgraph-sdk that handles message accumulation, loading state, interrupt detection, and conversation branching for a graph deployed behind an Agent Server — it removes the need to hand-write the token-accumulation and state-tracking logic a raw EventSource consumer would otherwise require, at the cost of expecting that specific deployment shape rather than an arbitrary custom backend. Docs: useStream React reference
Q217. What does stream.tool_calls (via the built-in ToolCallTransformer) give you that raw messages-channel parsing doesn’t? Tool calls already correlated by ID with their execution results — rather than separately tracking a tool-call content block from the messages channel and matching it up yourself with a later tools channel event carrying the result, the transformer has already joined them into one coherent object for you.
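For contrast, this is roughly the join you would otherwise write by hand. The event dicts ("kind", "id", "name", "args", "result") are an invented shape for illustration, not the real channel payloads.

```python
def join_tool_calls(events):
    """Pair each tool call with its later result, keyed by tool-call ID."""
    calls = {}
    for ev in events:
        if ev["kind"] == "tool_call":
            calls[ev["id"]] = {"name": ev["name"], "args": ev["args"], "result": None}
        elif ev["kind"] == "tool_result":
            # Keep a result even if its call was never seen, so nothing is lost.
            entry = calls.setdefault(ev["id"], {"name": None, "args": None, "result": None})
            entry["result"] = ev["result"]
    return calls


EXAMPLE = [
    {"kind": "tool_call", "id": "call_1", "name": "lookup_order", "args": {"order_id": 7}},
    {"kind": "tool_call", "id": "call_2", "name": "get_weather", "args": {"city": "Oslo"}},
    {"kind": "tool_result", "id": "call_2", "result": "rain"},
    {"kind": "tool_result", "id": "call_1", "result": "shipped"},
]
```

Results can arrive in a different order than the calls were issued, which is why matching on ID rather than position matters.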
Q218. Structured output (JSON mode) streaming — what’s the honest limitation, on any streaming version? The token stream for structured/JSON-mode output is characters — braces, quotes, partial field names — not readable prose, so streaming it token-by-token to a UI is rarely useful on its own. The authoritative, usable result is the final parsed state (stream.output or the equivalent values/updates payload), not something reassembled from the raw token stream.
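A quick stdlib illustration of the limitation: partial JSON is not parseable, so only the completed buffer yields a usable object. The chunk boundaries below are made up; real token boundaries depend on the model's tokenizer.

```python
import json


def parse_when_complete(chunks):
    """Return (number of failed partial parses, last successfully parsed object)."""
    buffer, failures, parsed = "", 0, None
    for chunk in chunks:
        buffer += chunk
        try:
            parsed = json.loads(buffer)
        except json.JSONDecodeError:
            failures += 1  # still a fragment such as '{"sta'
    return failures, parsed


CHUNKS = ['{"sta', 'tus": "ok', '", "count": 2}']
```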
Q219. What’s the difference between LangSmith tracing and LangGraph’s own streaming/checkpointing for observability purposes? Streaming and checkpoints give you real-time and historical visibility into a specific run’s state and outputs. LangSmith tracing is a separate observability product that captures spans across an entire run (and across runs) for aggregate analysis: latency percentiles, error rates, and prompt/response inspection across many executions. Individual-run streaming and checkpoint inspection alone don’t give you that aggregate view. Docs: LangSmith Observability
Q220. How would you compute time-to-first-token for a LangGraph-backed endpoint? Timestamp the moment the first item comes out of message.text (or the first content-block-delta text event on the raw channel) relative to when the request/run started — on v3 this is a single, clearly-defined event to timestamp; on v2 you’d reconstruct the same measurement from the first non-empty content chunk in a noisier raw delta stream.
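The measurement itself is small. A minimal sketch, assuming the text deltas arrive as a plain iterable of strings; the function name and the injectable clock parameter are hypothetical conveniences for testing.

```python
import time


def time_to_first_token(chunks, clock=time.perf_counter):
    """Seconds from the start of iteration to the first non-empty chunk, or None."""
    started = clock()
    for chunk in chunks:
        if chunk:  # skip empty deltas so they do not count as a first token
            return clock() - started
    return None
```

Passing a lazy generator as chunks means the run starts when iteration starts, so the start timestamp is taken just before the first item is requested.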
Q221. What’s the observability argument for run.lifecycle (or stream.lifecycle) beyond just knowing when a run finishes? It emits started/running/completed/failed/interrupted transitions per run, subgraph, and subagent — meaning per-node and per-subgraph latency and failure attribution become structured projections you can pipe straight into a metrics/tracing exporter, rather than something you’d otherwise have to reconstruct from application logs after the fact.
Q222. Why does client disconnect handling matter specifically for streaming endpoints backed by an LLM call? If a client disconnects mid-stream and your server keeps consuming the upstream graph’s stream to completion anyway, you’re paying for (and generating) tokens nobody will ever read — checking for disconnect (e.g., request.is_disconnected() in a FastAPI generator) and aborting the underlying run is both a cost-control and a resource-hygiene concern, not just a UX nicety.
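The pattern can be sketched without any web framework. Here is_disconnected is a hypothetical awaitable callable standing in for a check like the one mentioned above, and upstream is assumed to be an async generator so it can be closed explicitly.

```python
import asyncio


async def relay(upstream, is_disconnected):
    """Forward chunks until the client goes away, then stop the producer."""
    try:
        async for chunk in upstream:
            if await is_disconnected():
                break
            yield chunk
    finally:
        await upstream.aclose()  # close the producer instead of draining it


async def demo():
    produced = []

    async def upstream():
        for i in range(100):
            produced.append(i)  # stands in for a paid-for token
            yield f"tok{i}"

    checks = 0

    async def is_disconnected():
        nonlocal checks
        checks += 1
        return checks > 2  # client "leaves" on the third check

    sent = [c async for c in relay(upstream(), is_disconnected)]
    return sent, len(produced)
```

In this toy run only three of the hundred possible chunks are ever produced, which is the cost-control point of the answer above.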
Q223. What’s a message-finish error event, and why does it matter for error handling in a streaming UI? It’s how an unrecoverable failure during a specific LLM call surfaces on the messages channel — as a structured error attached to that message’s finish event, rather than as an exception that abruptly kills the whole stream mid-transmission. Handling it explicitly is what turns a mid-stream model failure into a clean, user-visible error state instead of a stream that just silently stops.
Q224. If you’re already running a production system on v2 stream_mode, what’s a reasonable, honest answer to “should you migrate to v3”? Migrate when you specifically need reasoning-delta streaming, tool-call-argument streaming, or clean per-call usage metadata — v2 can technically get you all three, just with meaningfully more hand-written bookkeeping (accumulators, tuple unpacking, manual correlation). If the current v2 consumer is working and none of those specific needs are pressing, the ergonomic improvement alone usually doesn’t justify reworking a shipping path immediately.
Q225. What’s a strong way to summarize LangGraph’s streaming story across all three versions in one interview answer? v1 exposes the rawest, least consistent shape. v2 unifies that into one consistent StreamPart dict you still branch on manually. v3 moves the branching logic into the framework itself via typed projections over an explicit content-block protocol — the throughline across all three is the same underlying Pregel event stream, just progressively more structured and less work for the consumer to parse correctly.
Section 10: Production, Deployment & Framework Comparisons (Q226–250)
The closing section — deployment mechanics, and the comparison questions (“why LangGraph and not X”) that senior and architect-level interviews lean on heavily.
Q226. What is LangGraph Platform, and what is it called now? A managed hosting and deployment layer for LangGraph applications — as of late 2025, it was renamed “LangSmith Deployment,” reflecting its integration into the broader LangSmith product rather than standing as a separately-branded platform. Docs: LangSmith Deployment
Q227. What are the three deployment options under LangSmith Deployment? Cloud (fully managed SaaS, fastest to get started, available on Plus and Enterprise plans), Hybrid (SaaS control plane with a self-hosted data plane, so sensitive data stays in your infrastructure while LangChain manages the control layer — Enterprise only), and fully self-hosted (the entire platform runs in your own infrastructure with no data leaving your VPC).
Q228. What is langgraph.json and what does it configure? The configuration file the LangGraph CLI reads by default to build and deploy an application — it declares things like which graphs to expose, dependencies, environment variables, and (for semantic memory) store/embedding configuration, functioning as the deployment manifest for the application.
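For orientation, a manifest along these lines could be generated as below. The file path, graph name, and embedding model are placeholders, and the exact set of supported keys should be checked against the current CLI documentation rather than taken from this sketch.

```python
import json

# Placeholder values throughout; only the general shape is the point here.
MANIFEST = {
    "dependencies": ["."],
    "graphs": {"agent": "./src/agent/graph.py:graph"},
    "env": ".env",
    "store": {
        "index": {
            "embed": "openai:text-embedding-3-small",
            "dims": 1536,
            "fields": ["$"],
        }
    },
}


def render_manifest() -> str:
    """Serialize the manifest the way it would be written to langgraph.json."""
    return json.dumps(MANIFEST, indent=2)
```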
Q229. What is an “assistant” in LangGraph Platform/LangSmith Deployment terms, and how does it differ from a graph? An assistant is a configured, named instance of a graph — the same underlying graph definition can back multiple assistants with different configuration (different prompts, different model choices) without duplicating the graph’s code, and assistants can be composed as “remote graphs” to build multi-agent systems across separately deployed services.
Q230. What do the disable_assistants, disable_runs, disable_threads, and disable_store configuration flags do? They selectively turn off groups of the platform’s built-in HTTP routes for a deployment — useful when you want to expose only a subset of the platform’s default API surface (for security, simplicity, or because your application handles that concern itself elsewhere) rather than the full default route set.
Q231. How would you decide between LangSmith Deployment (managed) and self-hosting your own FastAPI + checkpointer setup? Managed deployment trades some infrastructure control for faster setup, built-in scaling, and integrated tracing/observability out of the box — reasonable defaults for most teams. Self-hosting makes sense when you have specific infrastructure requirements (data residency, existing deployment tooling, cost optimization at very large scale) that the managed offering doesn’t accommodate, or when you need tighter control over the exact request/response surface than the platform’s default routes provide.
Q232. What’s a realistic production checklist item people forget: testing a graph’s behavior under retry (Section 4) specifically? Confirming that nodes performing side effects (an API call, a database write) are either naturally idempotent or explicitly guarded against duplicate execution. RetryPolicy’s automatic retries assume a node can safely re-run, and a node that charges a payment or sends a notification without idempotency protection will do so twice on a retried transient failure, which is a correctness bug that only shows up under real failure conditions, not in happy-path testing.
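One common guard is an idempotency key. The sketch below uses an in-memory dict where a real system would need a durable store, and the class name, key format, and receipt strings are all hypothetical.

```python
class IdempotentCharger:
    """Runs a side effect at most once per idempotency key."""

    def __init__(self):
        self._results = {}     # stand-in for a durable table keyed by idempotency key
        self.charges_made = 0  # counts how often the side effect really ran

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        if idempotency_key in self._results:
            # A retried node lands here and reuses the first outcome.
            return self._results[idempotency_key]
        self.charges_made += 1  # the real payment call would happen here
        receipt = f"receipt-{self.charges_made}"
        self._results[idempotency_key] = receipt
        return receipt
```

Deriving the key from stable run data (for example a thread ID plus a refund ID) is what makes a retried node hit the cached branch instead of charging again.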
Q233. How would you approach load-testing a LangGraph-backed API before a production launch? Separate the concerns: LLM-call latency and cost scale with concurrent request volume regardless of your graph’s structure, while checkpointer write throughput (especially on a shared Postgres instance under concurrent threads) is a distinct bottleneck worth testing independently — a load test that only measures end-to-end request latency can mask which of those two very different systems is actually the constraint under real traffic.
Q234. What’s the case for LangGraph over CrewAI, stated fairly? LangGraph gives you explicit, low-level control over state, control flow, persistence, and human-in-the-loop — which matters when your application’s requirements go beyond CrewAI’s more opinionated role-based crew abstraction, or when you need fine-grained checkpointing and time travel that CrewAI’s higher-level abstraction doesn’t expose as directly.
Q235. What’s the case for CrewAI over LangGraph, stated fairly (the other half of Q234)? CrewAI’s role-based abstraction (agents with defined roles, goals, and a crew that coordinates them) gets a working multi-agent system running faster for teams whose use case fits that model well, without needing to hand-design a graph’s nodes, edges, and state schema from scratch — the tradeoff is less low-level control in exchange for a faster path to a common pattern.
Q236. What’s the core architectural difference between LangGraph and AutoGen? AutoGen is built around conversational message-passing between agents as the primary abstraction — agents “talk” to each other in a structured conversation loop. LangGraph is built around explicit state and graph topology as the primary abstraction, with agent conversation being one possible pattern you can construct on top of that state machine, not the framework’s foundational unit.
Q237. When would LangGraph specifically be a stronger choice than either CrewAI or AutoGen? When you need fine-grained persistence and time travel, precise human-in-the-loop gating on specific actions (not just at agent boundaries), or a control-flow shape that doesn’t map cleanly onto either “crew of role-based agents” or “conversational message passing” — cases where the underlying graph model’s flexibility is worth the additional upfront design work.
Q238. What’s the honest limitation interviewers want you to name about LangGraph itself, not just its competitors? It’s genuinely lower-level than CrewAI or AutoGen for common multi-agent patterns — you (or a library like langgraph-supervisor) have to construct the routing and handoff logic that those frameworks provide more directly out of the box, which is more upfront work for standard cases even though it pays off in control for non-standard ones.
Q239. What’s the core difference between LangGraph and Temporal, restated for a production-deployment framing (deeper mechanics in Q81–82)? LangGraph models agent reasoning, state, and tool use with checkpoint-based resumability between steps. Temporal is a durable-execution engine for arbitrary long-running workflows with event-history-backed replay durability within a step, not just between steps, and no built-in concept of prompts, context windows, or LLM-specific state — the common production pattern pairs LangGraph for the agent’s reasoning layer with Temporal underneath for workflows with expensive, must-not-repeat side effects.
Q240. What’s a good interview framing for “why not just build this without any framework, in plain Python”? A hand-rolled state machine can absolutely work for a simple case, but you re-implement checkpointing, replay, human-in-the-loop pausing, streaming, and retry semantics yourself — LangGraph’s value isn’t “you couldn’t build this otherwise,” it’s that these cross-cutting production concerns are already solved and tested, letting your team’s effort go into the actual application logic rather than re-deriving durable state-machine infrastructure.
Q241. What’s a realistic system-design interview prompt combining multiple sections of this guide, and how would you structure an answer? “Design a production customer-support agent that can look up orders, process refunds under $200 autonomously, escalate larger refunds to a human, and remember customer preferences across conversations.” A strong answer touches: a supervisor or single agent with tool access (Section 6), interrupt() gating specifically on refunds above the threshold (Section 5), a PostgresSaver checkpointer for conversation persistence (Section 4) and a PostgresStore with semantic search for cross-conversation memory (Section 7), and RetryPolicy on the order-lookup and refund-processing nodes specifically, given they call external systems (Section 4).
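The threshold gate in that answer reduces to a small routing function, sketched here in plain Python. The node names and state key are hypothetical, and treating a refund of exactly $200 as needing review is an assumption, since the prompt only says "under $200".

```python
AUTO_REFUND_LIMIT_CENTS = 200_00  # $200, stored in cents to avoid float rounding


def route_refund(state: dict) -> str:
    """Return the name of the next node for a refund request."""
    if state["refund_amount_cents"] < AUTO_REFUND_LIMIT_CENTS:
        return "process_refund"
    # The human_review node is where an interrupt()-style pause would live.
    return "human_review"
```

Because the router is a pure function of state, it can be unit-tested with hand-built state dicts and no model calls.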
Q242. How would you explain LangGraph’s testing story across unit, integration, and evaluation tiers? Unit-test individual nodes and routing functions in isolation (Q49, Q73) with hand-built state fixtures and no LLM calls; integration-test the compiled graph’s end-to-end behavior with a stub/fake chat model producing scripted responses (Q146) to verify control flow without live-API cost or flakiness; and separately run evaluation (via LangSmith or a custom harness) against real or near-real model behavior to measure task success, which unit and integration tests deliberately don’t cover since they’re testing structure, not model quality.
Q243. What’s a reasonable answer to “how do you version an agent’s behavior in production without breaking existing conversations”? Assistants (Q229) let you version configuration (prompts, models) somewhat independently of the underlying graph’s code; for actual graph-structure changes, treat it like any schema migration (Q90) — new threads get the new graph shape, in-flight threads either need a migration path for their checkpoint history or continue running against the version they started on until they naturally conclude.
Q244. What’s a strong answer to “what would make you choose to NOT use LangGraph for a given project”? A genuinely simple, single-shot LLM call with no need for state across turns, no tool use, no human approval gate, and no persistence requirement — introducing a graph, checkpointer, and all the associated machinery for a task that’s really just “call the model once and return the result” is unnecessary overhead; LangGraph earns its complexity budget on multi-step, stateful, or human-gated workflows, not trivial ones.
Q245. How would you handle secrets/credentials (API keys for tools) in a LangGraph deployment, especially in a multi-tenant setup? Pass credentials via config (LangGraph’s RunnableConfig-style configuration passed at invocation time) rather than hardcoding them into graph or node definitions, and scope them per-tenant/per-request through that same config mechanism rather than through global environment state that every thread would otherwise share indiscriminately.
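A minimal sketch of the per-request pattern, assuming a node that takes (state, config) and reads from the "configurable" section of the config. The key names orders_api_key and tenant_id and the fake client are hypothetical.

```python
def fake_orders_client(api_key: str, order_id: int) -> dict:
    # Stand-in for a real API client; it never returns the key itself.
    return {"order_id": order_id, "key_suffix": api_key[-4:]}


def lookup_order(state: dict, config: dict) -> dict:
    """Node that uses credentials supplied at invocation time, not module globals."""
    settings = config["configurable"]
    result = fake_orders_client(settings["orders_api_key"], state["order_id"])
    return {"order": result, "tenant_id": settings["tenant_id"]}


TENANT_A = {"configurable": {"orders_api_key": "sk-aaaa-1111", "tenant_id": "a"}}
TENANT_B = {"configurable": {"orders_api_key": "sk-bbbb-2222", "tenant_id": "b"}}
```

The same node definition serves both tenants, and nothing tenant-specific lives in the graph or in shared environment state.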
Q246. What’s a good answer to “how do you monitor cost” for a LangGraph-backed production system? Token usage metadata is available per LLM call (message.output.usage_metadata on the v3 streaming API, or the equivalent field in a non-streaming response), so a StreamTransformer (Q210) or equivalent hook aggregating that usage per run — and tagging it by node, agent, or tenant — turns raw usage numbers into attributable cost, rather than only knowing an aggregate spend number with no way to trace which part of the system is driving it.
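Once usage records are tagged, attribution is a grouped sum. In this sketch the per-1K-token prices are invented numbers, and the record shape (tenant, node, input_tokens, output_tokens) is an assumed flattening of per-call usage metadata.

```python
from collections import defaultdict

# Invented prices in USD per 1,000 tokens; substitute your provider's real rates.
PRICE_PER_1K = {"input": 0.003, "output": 0.015}


def attribute_cost(records):
    """Sum estimated cost per (tenant, node) pair."""
    totals = defaultdict(float)
    for r in records:
        cost = (r["input_tokens"] / 1000) * PRICE_PER_1K["input"]
        cost += (r["output_tokens"] / 1000) * PRICE_PER_1K["output"]
        totals[(r["tenant"], r["node"])] += cost
    return dict(totals)


RECORDS = [
    {"tenant": "a", "node": "agent", "input_tokens": 1000, "output_tokens": 1000},
    {"tenant": "a", "node": "agent", "input_tokens": 1000, "output_tokens": 0},
    {"tenant": "b", "node": "summarize", "input_tokens": 2000, "output_tokens": 0},
]
```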
Q247. What’s the honest tradeoff of adopting prebuilt agents/multi-agent libraries (create_react_agent, langgraph-supervisor) versus building everything on the raw Graph API? Faster initial development and a maintained, tested implementation of common patterns, at the cost of being somewhat coupled to how those libraries have chosen to structure state and control flow internally — worth it until your requirements diverge enough from the common pattern that working around the prebuilt abstraction costs more than building the equivalent logic directly would have.
Q248. What’s a strong answer to “how would you decide when a project has outgrown a single graph and needs multi-agent architecture”? When a single agent’s system prompt and tool list have grown large enough that tool-selection and reasoning reliability are visibly degrading (Q176), or when genuinely distinct workflows (each needing separate evaluation, separate human-in-the-loop policies, or separate ownership within the team) are being forced into one undifferentiated agent — the signal is reliability and organizational friction, not simply “the codebase got big.”
Q249. What question would a strong candidate ask back, if given the chance, during a LangGraph system-design interview? Something clarifying the actual failure tolerance and latency budget of the system being designed — whether a refund workflow can tolerate a human-review delay of minutes versus needing to resolve in seconds, for instance — since nearly every design decision in this guide (whether to checkpoint, whether to gate with interrupt(), whether multi-agent is worth the overhead) depends on constraints that a well-posed interview question should surface rather than assume.
Q250. What’s the single idea, if a candidate remembers nothing else from this guide, that ties Sections 1 through 10 together? LangGraph’s entire value proposition is turning implicit, hand-rolled state-machine concerns — persistence, replay, human approval, streaming, retries — into explicit, first-class primitives (state + reducers, checkpointer, interrupt(), typed streaming projections, RetryPolicy) that you compose rather than reinvent; every section of this guide is really just a different one of those primitives, and a strong candidate can explain how they all sit on top of the same Pregel super-step execution model from Q4.
Key Takeaways
- LangGraph’s core mental model is Pregel-style super-step execution over an explicit state schema — nearly every advanced feature (Command, Send, checkpointing, interrupts, streaming projections) is a different lens on that same underlying model.
- Reducers, not manual merge logic, are how LangGraph resolves concurrent writes to shared state: understanding Annotated[Type, reducer_fn] unlocks parallel fan-out, add_messages, and custom merge semantics alike.
- Checkpointing enables memory, interrupt()-based human-in-the-loop, and time travel simultaneously. They’re three consumers of the same underlying persisted-state mechanism, not three separate systems.
- Command and Send are the two primitives that make dynamic, runtime-determined control flow possible: Command for “update state and route” in one return, Send for dynamic-count parallel fan-out.
- Long-term memory (BaseStore) and short-term memory (checkpointer) solve genuinely different problems (cross-thread durable facts versus in-thread conversation history), and conflating them is a common design mistake.
- Multi-agent architecture (supervisor, swarm) is a reliability and specialization tool, not a default — the right number of agents matches the number of genuinely distinct, separately-testable specializations a problem has.
- Streaming evolved from raw, inconsistent chunks (v1) to a unified dict format (v2) to typed projections over an explicit content-block protocol (v3) — know which version a codebase is on before writing streaming code for it.
- LangGraph’s honest tradeoff versus CrewAI, AutoGen, and Temporal is control versus convenience or durability guarantees — a strong candidate can name what LangGraph gives up, not just what it provides.
FAQs
Is LangGraph hard to learn if I already know LangChain? The core building blocks (chat models, tools, prompts) transfer directly — what’s new is the state-machine mental model (Section 1) and the persistence/streaming layer built on top of it, which is a few days to a few weeks of ramp-up depending on how deep the role requires.
Do I need to memorize exact function signatures for a LangGraph interview? No — interviewers are almost always more interested in whether you understand why a primitive exists (why Send versus a plain conditional edge, why interrupt() needs a checkpointer) than whether you can recite an exact parameter list from memory; understanding the reasoning lets you reconstruct approximately-correct syntax on demand.
What’s the most commonly under-prepared topic among LangGraph interview candidates? Human-in-the-loop design judgment (Section 5) — most candidates can explain interrupt() mechanically but far fewer can reason clearly about which actions in a given workflow actually warrant a pause, which is exactly the kind of question senior-level interviews probe for.
Should I prepare framework comparison questions (LangGraph vs CrewAI vs AutoGen vs Temporal) even for a mid-level role? Yes, at least at a basic level — even junior-to-mid interviews often ask “why did you pick LangGraph for this project” as a way to check you understand the tool’s tradeoffs rather than having used it by default or by hype.
Is this guide enough on its own, or should I also read LangGraph’s official docs directly? Use this guide to structure your review and check your understanding, but read the official docs (linked throughout every answer above) for anything you’re rusty on — LangGraph’s API surface moves quickly enough that the primary docs are the ground truth this guide is deliberately built to point back to, not replace.
References
All code patterns and API descriptions in this guide were verified against the following official sources (dated 2026 unless otherwise noted):
- LangGraph (OSS): Graph API Overview. docs.langchain.com/oss/python/langgraph/graph-api
- LangGraph (OSS): Thinking in LangGraph. docs.langchain.com/oss/python/langgraph/thinking-in-langgraph
- LangGraph (OSS): Persistence. docs.langchain.com/oss/python/langgraph/persistence
- LangGraph (OSS): Fault tolerance (RetryPolicy, CachePolicy). docs.langchain.com/oss/python/langgraph/fault-tolerance
- LangGraph (OSS): Interrupts. docs.langchain.com/oss/python/langgraph/interrupts
- LangGraph (OSS): Use time travel. docs.langchain.com/oss/python/langgraph/use-time-travel
- LangGraph (OSS): Functional API overview. docs.langchain.com/oss/python/langgraph/functional-api
- LangGraph (OSS): Streaming and Event streaming. docs.langchain.com/oss/python/langgraph/streaming / docs.langchain.com/oss/python/langgraph/event-streaming
- LangChain: Memory overview and Semantic search for LangGraph memory. docs.langchain.com/oss/python/concepts/memory / langchain.com/blog/semantic-search-for-langgraph-memory
- LangChain Reference: create_react_agent, langgraph-supervisor, langgraph-swarm. reference.langchain.com/python/langgraph.prebuilt / reference.langchain.com/python/langgraph-supervisor / reference.langchain.com/python/langgraph-swarm
- LangChain: LangSmith Deployment (formerly LangGraph Platform). docs.langchain.com/oss/python/langgraph/deploy
- LangChain: useStream() React reference. docs.langchain.com/langgraph-platform/use-stream-react









