Debugging Non-Deterministic LLM Agents: Implementing Checkpoint-Based State Replay with LangGraph Time Travel

#tooling #agents #tutorial #llm

Introduction

Large Language Models (LLMs) are inherently non-deterministic: even with identical inputs and parameters, they can generate different outputs from one execution to the next. This fundamental characteristic creates a critical challenge for production AI systems: how do you debug, audit, and reproduce agent behavior when the execution trace disappears after each run?

In this technical deep-dive, we explore LangGraph's Time Travel feature, a checkpoint-based state persistence system that turns ephemeral LLM agent executions into reproducible, debuggable workflows. We demonstrate the implementation through a real-world banking use case in which a loan rejection needed to be investigated, debugged, and corrected in production.

The Technical Problem: Non-Determinism Meets Production Requirements
Why LLM Non-Determinism Breaks Traditional Debugging

Traditional software debugging relies on a fundamental assumption: deterministic execution. Given the same input, a function returns the same output.

LLM agents break this assumption. Checkpoint-based state persistence restores three critical capabilities:

Reproducibility: Replay the exact sequence that led to a decision
Debuggability: Inspect intermediate states to find errors
Correctability: Fork from any checkpoint and explore alternatives
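
The three capabilities above can be sketched without any framework. What follows is a toy illustration in plain Python, not LangGraph's implementation: `Checkpoint`, `run_pipeline`, and the two step functions are hypothetical names invented for this example. It saves a snapshot after every step, inspects an intermediate state, and forks from it with a corrected value.

```python
from dataclasses import dataclass
from typing import Callable

State = dict
Step = Callable[[State], dict]  # a step returns a partial state update


@dataclass(frozen=True)
class Checkpoint:
    index: int   # number of steps already applied
    state: dict  # snapshot of the state at that point


def run_pipeline(steps: list[Step], state: State, start: int = 0) -> list[Checkpoint]:
    """Run steps[start:], saving a snapshot before the first step and after each one."""
    checkpoints = [Checkpoint(start, dict(state))]
    for i in range(start, len(steps)):
        state = {**state, **steps[i](state)}
        checkpoints.append(Checkpoint(i + 1, dict(state)))
    return checkpoints


# Hypothetical steps standing in for agent nodes.
def check_credit(state: State) -> dict:
    return {"credit": "PASS" if state["credit_score"] >= 700 else "FAIL"}


def decide(state: State) -> dict:
    return {"decision": "APPROVED" if state["credit"] == "PASS" else "REJECTED"}


steps = [check_credit, decide]
trail = run_pipeline(steps, {"credit_score": 640})

# Reproducibility and debuggability: every intermediate state is still available.
after_credit = trail[1]

# Correctability: fork from the checkpoint after check_credit with a corrected value.
forked = run_pipeline(
    steps, {**after_credit.state, "credit": "PASS"}, start=after_credit.index
)

print(trail[-1].state["decision"])   # REJECTED
print(forked[-1].state["decision"])  # APPROVED
```

The original trail is left untouched by the fork, which is the property that makes a checkpoint history usable as an audit record.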

Real World Implementation: The Loan Rejection Investigation
Let's see how this solves real production problems with a concrete scenario.

The Crisis: "Why Was My Loan Rejected? I Have Excellent Credit!"

Monday, 9:00 AM: A furious customer walks into your bank branch.

"I've been banking with you for 15 years. My credit score is 780. I have stable income, low debt. Why was my $50,000 home improvement loan rejected? This makes no sense!"

The branch manager checks the system. Status: REJECTED. Reason: "Application does not meet risk criteria."

That's it. No details. No explanation. Just a black box decision that's about to cost your bank a loyal customer and potentially a discrimination lawsuit.

The customer doesn't know (and doesn't care) whether it was AI, automated rules, or a human decision. They just know it's wrong and unexplainable.

Monday, 9:30 AM: The branch manager escalates to the Head of Lending.
Monday, 10:00 AM: The Head of Lending calls the IT Manager in a panic.

"Find out what happened. NOW. The customer is threatening legal action, and we have NO IDEA why the system rejected their application. The branch manager tried to explain, but our automated system just says 'does not meet criteria.' That's not good enough. If we can't explain this decision with concrete details, we're looking at regulatory fines, lawsuits, and a PR nightmare."

This is where LangGraph Time Travel becomes your lifeline.

The Problem: Automated Systems That Make Unexplainable Decisions

Your bank deployed an AI-powered loan approval agent to process applications faster and more consistently. It works behind the scenes; customers just see "approved" or "rejected." It works great... until it doesn't.

Here's what makes this crisis so difficult:

The Reproducibility Nightmare

LLMs are non-deterministic: Run the same application twice, get different results
Decision trails disappear: After the agent finishes, its reasoning evaporates
No audit trail: You have the final decision, but not the steps that led to it
Compliance risk: Regulators demand explanations you can't provide
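
To make the first point concrete, here is a toy sketch of why sampled outputs differ between runs. It is not how any particular model is implemented: `NEXT_TOKEN_PROBS` is a made-up probability table, and `sample_decision` and `greedy_decision` are hypothetical helpers standing in for sampling-based and greedy decoding.

```python
import random

# Hypothetical probabilities for one decoding step; a real model produces
# a distribution like this over its whole vocabulary.
NEXT_TOKEN_PROBS = {"APPROVED": 0.55, "REJECTED": 0.40, "REVIEW": 0.05}


def sample_decision(rng: random.Random) -> str:
    """Draw one outcome according to the probabilities, as sampling-based decoding does."""
    tokens = list(NEXT_TOKEN_PROBS)
    weights = list(NEXT_TOKEN_PROBS.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


def greedy_decision() -> str:
    """Always pick the most likely outcome (the idealized temperature-0 case)."""
    return max(NEXT_TOKEN_PROBS, key=NEXT_TOKEN_PROBS.get)


rng = random.Random()  # unseeded, so results differ between runs
runs = [sample_decision(rng) for _ in range(10)]
print(runs)               # varies from run to run
print(greedy_decision())  # APPROVED
```

Lowering the temperature narrows this variation, but hosted models may still return different outputs for identical requests, so the final decision alone is not enough to reconstruct what happened.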

The Business Impact

Lost customers: Unexplained rejections drive customers to competitors (they don't care whether it's "AI" or "automation"; they just see unfairness)
Legal liability: Can't prove decisions weren't discriminatory
Regulatory fines: CFPB, OCC, and state regulators penalize unexplainable automated decisions
Reputation damage: Social media explodes when customers share stories of unfair treatment
Operational chaos: Teams spend weeks trying to recreate what the system did

The Real Cost

Average cost per discrimination lawsuit: $500K - $2M
Regulatory fines for unexplainable automated decisions: $1M - $10M
Customer lifetime value lost: $50K - $100K per customer
Staff time debugging system issues: Weeks to months

The Solution: LangGraph Time Travel for Complete System Auditability

LangGraph's Time Travel feature addresses this crisis by recording every decision step your automated loan system takes, creating a complete, replayable audit trail that you can inspect, debug, and explain to anyone: customers, regulators, or executives. Think of it as a flight recorder for your automated decision system. When something goes wrong, you can replay exactly what happened, step by step.
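
One way to turn a recorded history into a step-by-step account is to diff consecutive snapshots and report what each step added or changed. The sketch below uses plain dictionaries as stand-ins for checkpoint state; `diff_trail` is a hypothetical helper and the sample values are invented for illustration. With LangGraph, the snapshots would come from the `.values` of each entry in `get_state_history(config)`, reversed so that the oldest comes first.

```python
_MISSING = object()  # sentinel so that a key holding None still counts as present


def diff_trail(snapshots: list[dict]) -> list[dict]:
    """For each consecutive pair of snapshots (oldest first), return the keys added or changed."""
    changes = []
    for before, after in zip(snapshots, snapshots[1:]):
        changes.append(
            {key: value for key, value in after.items()
             if before.get(key, _MISSING) != value}
        )
    return changes


# Invented snapshots, oldest first, standing in for a recorded checkpoint history.
snapshots = [
    {"applicant_id": "A-1", "credit_score": 780},
    {"applicant_id": "A-1", "credit_score": 780, "credit_check": "PASS"},
    {"applicant_id": "A-1", "credit_score": 780, "credit_check": "PASS",
     "income_check": "FAIL"},
    {"applicant_id": "A-1", "credit_score": 780, "credit_check": "PASS",
     "income_check": "FAIL", "decision": "REJECTED"},
]

for step, change in enumerate(diff_trail(snapshots), start=1):
    print(f"step {step}: {change}")
# step 1: {'credit_check': 'PASS'}
# step 2: {'income_check': 'FAIL'}
# step 3: {'decision': 'REJECTED'}
```

Reading the trail this way points directly at the step where the outcome turned, instead of leaving you to compare full state dumps by eye.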

Real Implementation: Loan Approval Agent with Full Auditability
Let's build a production-grade loan approval agent that the IT Manager can actually debug when the boss calls in a panic.

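The Agent Architecture

The agent is a linear four-node graph: START → assess_credit → verify_income → analyze_risk → make_decision → END. Each node writes its result into a shared state, and a checkpoint is saved after every step.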

Implementation

"""
LangGraph Time Travel: Loan Approval Agent with Full Auditability

This demonstrates how to use LangGraph's Time Travel feature to debug
AI loan rejections and provide complete audit trails for regulatory compliance.

Key Features:
- Complete checkpointing of every decision step
- Time Travel for debugging rejected applications
- State correction and re-execution
- Full audit trail for compliance
"""

import uuid
import json
import datetime
from typing import Optional
from typing_extensions import TypedDict, NotRequired

from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import InMemorySaver
from langchain_openai import ChatOpenAI


# =============================================================================
# State Definition
# =============================================================================

class LoanApplicationState(TypedDict):
    """Complete state for loan application processing."""

    # Application data
    applicant_id: str
    loan_amount: float
    loan_purpose: str

    # Collected information
    credit_score: NotRequired[int]
    annual_income: NotRequired[float]
    employment_status: NotRequired[str]
    debt_to_income_ratio: NotRequired[float]

    # Decision-making steps
    credit_assessment: NotRequired[dict]
    income_verification: NotRequired[dict]
    risk_analysis: NotRequired[dict]

    # Final decision
    decision: NotRequired[str]  # APPROVED or REJECTED
    decision_reasoning: NotRequired[str]
    confidence_score: NotRequired[float]

    # Audit metadata
    timestamp: NotRequired[str]
    processing_errors: NotRequired[list]


# =============================================================================
# Workflow Nodes
# =============================================================================

def create_loan_agent(api_key: str):
    """Create the loan approval agent with checkpointing."""

    # Initialize LLM
    model = ChatOpenAI(
        model="gpt-4o-mini",
        temperature=0,
        api_key=api_key
    )

    # Node 1: Credit Assessment
    def assess_credit(state: LoanApplicationState) -> dict:
        """Evaluate applicant's credit worthiness."""

        prompt = f"""
        Assess credit worthiness for this loan application:

        Credit Score: {state.get('credit_score', 'Unknown')}
        Loan Amount: ${state['loan_amount']:,.2f}
        Purpose: {state['loan_purpose']}

        Evaluate:
        1. Is credit score sufficient? (>700 excellent, 650-700 good, <650 risky)
        2. Is loan amount appropriate for credit level?
        3. Any red flags?

        Return ONLY valid JSON: {{"assessment": "PASS" or "FAIL" or "REVIEW", "reasoning": "your explanation", "concerns": []}}
        """

        response = model.invoke(prompt)

        try:
            # Strip an optional Markdown code fence, then parse the JSON body
            content = response.content.strip()
            if content.startswith("```"):
                content = content.split("```")[1]
                if content.startswith("json"):
                    content = content[4:]
            credit_assessment = json.loads(content)
        except Exception:
            # Fallback assessment based on score
            score = state.get('credit_score', 0)
            if score >= 700:
                credit_assessment = {
                    "assessment": "PASS",
                    "reasoning": f"Credit score {score} is excellent",
                    "concerns": []
                }
            elif score >= 650:
                credit_assessment = {
                    "assessment": "REVIEW",
                    "reasoning": f"Credit score {score} is good but requires additional review",
                    "concerns": ["Score near threshold"]
                }
            else:
                credit_assessment = {
                    "assessment": "FAIL",
                    "reasoning": f"Credit score {score} is below minimum threshold",
                    "concerns": ["Low credit score"]
                }

        return {"credit_assessment": credit_assessment}

    # Node 2: Income Verification
    def verify_income(state: LoanApplicationState) -> dict:
        """Verify income sufficiency and stability."""

        income = state.get('annual_income', 0)
        loan_amount = state['loan_amount']
        employment = state.get('employment_status', 'Unknown')

        prompt = f"""
        Verify income sufficiency for this loan:

        Annual Income: ${income:,.2f}
        Loan Amount: ${loan_amount:,.2f}
        Employment Status: {employment}

        Evaluate:
        1. Is income sufficient? (Loan should be < 50% of annual income for personal loans)
        2. Is employment stable?
        3. Income verification concerns?

        Return ONLY valid JSON: {{"verification": "PASS" or "FAIL" or "REVIEW", "reasoning": "your explanation", "concerns": []}}
        """

        response = model.invoke(prompt)

        try:
            # Strip an optional Markdown code fence, then parse the JSON body
            content = response.content.strip()
            if content.startswith("```"):
                content = content.split("```")[1]
                if content.startswith("json"):
                    content = content[4:]
            income_verification = json.loads(content)
        except Exception:
            # Fallback calculation
            loan_to_income_ratio = (loan_amount / income) if income > 0 else float('inf')

            if loan_to_income_ratio <= 0.50 and employment == "Full-time":
                income_verification = {
                    "verification": "PASS",
                    "reasoning": f"Income sufficient with {loan_to_income_ratio:.1%} loan-to-income ratio",
                    "concerns": []
                }
            elif loan_to_income_ratio <= 0.75:
                income_verification = {
                    "verification": "REVIEW",
                    "reasoning": f"Loan-to-income ratio {loan_to_income_ratio:.1%} requires additional review",
                    "concerns": ["Higher loan-to-income ratio"]
                }
            else:
                income_verification = {
                    "verification": "FAIL",
                    "reasoning": f"Loan-to-income ratio {loan_to_income_ratio:.1%} exceeds maximum threshold",
                    "concerns": ["Insufficient income for loan amount"]
                }

        return {"income_verification": income_verification}

    # Node 3: Risk Analysis
    def analyze_risk(state: LoanApplicationState) -> dict:
        """Comprehensive risk analysis combining all factors."""

        credit_assessment = state['credit_assessment']
        income_verification = state['income_verification']
        debt_to_income = state.get('debt_to_income_ratio', 0)

        prompt = f"""
        Perform comprehensive risk analysis:

        Credit Assessment: {json.dumps(credit_assessment, indent=2)}
        Income Verification: {json.dumps(income_verification, indent=2)}
        Debt-to-Income Ratio: {debt_to_income:.1%}

        Determine overall risk level: LOW, MEDIUM, or HIGH
        Provide detailed reasoning for the risk level.

        Return ONLY valid JSON: {{"risk_level": "LOW" or "MEDIUM" or "HIGH", "reasoning": "your explanation", "risk_factors": []}}
        """

        response = model.invoke(prompt)

        try:
            # Strip an optional Markdown code fence, then parse the JSON body
            content = response.content.strip()
            if content.startswith("```"):
                content = content.split("```")[1]
                if content.startswith("json"):
                    content = content[4:]
            risk_analysis = json.loads(content)
        except Exception:
            # Fallback risk calculation
            credit_status = credit_assessment['assessment']
            income_status = income_verification['verification']

            all_concerns = (credit_assessment.get('concerns', []) + 
                           income_verification.get('concerns', []))

            if credit_status == "PASS" and income_status == "PASS" and debt_to_income < 0.36:
                risk_analysis = {
                    "risk_level": "LOW",
                    "reasoning": "All criteria met with strong credit and income",
                    "risk_factors": []
                }
            elif credit_status == "FAIL" or income_status == "FAIL" or debt_to_income > 0.50:
                risk_analysis = {
                    "risk_level": "HIGH",
                    "reasoning": "One or more critical criteria failed",
                    "risk_factors": all_concerns
                }
            else:
                risk_analysis = {
                    "risk_level": "MEDIUM",
                    "reasoning": "Application requires additional review",
                    "risk_factors": all_concerns
                }

        return {"risk_analysis": risk_analysis}

    # Node 4: Final Decision
    def make_decision(state: LoanApplicationState) -> dict:
        """Make final approval/rejection decision."""

        risk_analysis = state['risk_analysis']
        credit_assessment = state['credit_assessment']
        income_verification = state['income_verification']

        prompt = f"""
        Make final loan decision based on complete analysis:

        Credit Assessment: {credit_assessment['assessment']}
        Income Verification: {income_verification['verification']}
        Risk Level: {risk_analysis['risk_level']}
        Risk Analysis: {risk_analysis['reasoning']}

        Make a decision: APPROVED or REJECTED
        Provide clear reasoning that can be explained to the customer.
        Include confidence score (0.0 to 1.0).

        Return ONLY valid JSON: {{"decision": "APPROVED" or "REJECTED", "reasoning": "your explanation", "confidence": 0.95}}
        """

        response = model.invoke(prompt)

        try:
            # Strip an optional Markdown code fence, then parse the JSON body
            content = response.content.strip()
            if content.startswith("```"):
                content = content.split("```")[1]
                if content.startswith("json"):
                    content = content[4:]
            decision_data = json.loads(content)
        except Exception:
            # Fallback decision logic
            risk_level = risk_analysis['risk_level']

            if risk_level == "LOW":
                decision_data = {
                    "decision": "APPROVED",
                    "reasoning": "Application meets all lending criteria with low risk",
                    "confidence": 0.95
                }
            elif risk_level == "HIGH":
                decision_data = {
                    "decision": "REJECTED",
                    "reasoning": f"Application rejected due to: {risk_analysis['reasoning']}",
                    "confidence": 0.90
                }
            else:
                # Medium risk - conservative approach
                decision_data = {
                    "decision": "REJECTED",
                    "reasoning": "Application requires manual underwriter review before approval",
                    "confidence": 0.60
                }

        return {
            "decision": decision_data['decision'],
            "decision_reasoning": decision_data['reasoning'],
            "confidence_score": decision_data.get('confidence', 0.75)
        }

    # Build the workflow
    workflow = StateGraph(LoanApplicationState)

    # Add processing nodes
    workflow.add_node("assess_credit", assess_credit)
    workflow.add_node("verify_income", verify_income)
    workflow.add_node("analyze_risk", analyze_risk)
    workflow.add_node("make_decision", make_decision)

    # Define the processing flow
    workflow.add_edge(START, "assess_credit")
    workflow.add_edge("assess_credit", "verify_income")
    workflow.add_edge("verify_income", "analyze_risk")
    workflow.add_edge("analyze_risk", "make_decision")
    workflow.add_edge("make_decision", END)

    # Compile with checkpointing for complete auditability
    checkpointer = InMemorySaver()
    loan_agent = workflow.compile(checkpointer=checkpointer)

    return loan_agent


# =============================================================================
# Time Travel Investigation Functions
# =============================================================================

def investigate_decision(loan_agent, config: dict) -> None:
    """Investigate a loan decision using Time Travel."""

    print("\n" + "=" * 70)
    print("🔍 TIME TRAVEL INVESTIGATION")
    print("=" * 70)

    # Retrieve complete execution history
    states = list(loan_agent.get_state_history(config))

    print(f"\nTotal checkpoints recorded: {len(states)}")
    print("\n" + "-" * 70)
    print("COMPLETE DECISION TRAIL")
    print("-" * 70)

    # Track what we've seen
    seen_credit = False
    seen_income = False
    seen_risk = False

    # Examine each decision point (oldest to newest)
    for i, checkpoint in enumerate(reversed(states)):
        step_num = i + 1
        checkpoint_id = checkpoint.config['configurable']['checkpoint_id'][:8]
        print(f"\n[STEP {step_num}] Checkpoint: {checkpoint_id}...")
        print(f"  Next Action: {checkpoint.next if checkpoint.next else 'COMPLETED'}")

        values = checkpoint.values

        # Show credit assessment when it first appears
        if 'credit_assessment' in values and not seen_credit:
            seen_credit = True
            print("\n  📊 CREDIT ASSESSMENT:")
            assessment = values['credit_assessment']
            print(f"     Result: {assessment.get('assessment', 'N/A')}")
            print(f"     Reasoning: {assessment.get('reasoning', 'N/A')}")
            concerns = assessment.get('concerns', [])
            if concerns:
                print(f"     Concerns: {', '.join(concerns)}")

        # Show income verification when it first appears
        if 'income_verification' in values and not seen_income:
            seen_income = True
            print("\n  💰 INCOME VERIFICATION:")
            verification = values['income_verification']
            print(f"     Result: {verification.get('verification', 'N/A')}")
            print(f"     Reasoning: {verification.get('reasoning', 'N/A')}")
            concerns = verification.get('concerns', [])
            if concerns:
                print(f"     Concerns: {', '.join(concerns)}")

        # Show risk analysis when it first appears
        if 'risk_analysis' in values and not seen_risk:
            seen_risk = True
            print("\n  ⚠️  RISK ANALYSIS:")
            risk = values['risk_analysis']
            print(f"     Risk Level: {risk.get('risk_level', 'N/A')}")
            print(f"     Reasoning: {risk.get('reasoning', 'N/A')}")
            factors = risk.get('risk_factors', [])
            if factors:
                print(f"     Risk Factors: {', '.join(factors)}")

        # Show final decision
        if 'decision' in values and checkpoint.next == ():
            print(f"\n  ✅ FINAL DECISION: {values['decision']}")
            print(f"     Reasoning: {values.get('decision_reasoning', 'N/A')}")
            print(f"     Confidence: {values.get('confidence_score', 0):.1%}")


def analyze_root_cause(final_state: dict) -> None:
    """Analyze root cause of a rejection."""

    print("\n" + "=" * 70)
    print("🔍 ROOT CAUSE ANALYSIS")
    print("=" * 70)

    print("\n📊 APPLICATION METRICS:")
    print(f"  • Credit Score: {final_state.get('credit_score', 'N/A')}", end="")
    if final_state.get('credit_score', 0) >= 700:
        print(" (Excellent)")
    elif final_state.get('credit_score', 0) >= 650:
        print(" (Good)")
    else:
        print(" (Needs Improvement)")

    income = final_state.get('annual_income', 0)
    loan = final_state.get('loan_amount', 0)
    print(f"  • Annual Income: ${income:,.2f}")
    print(f"  • Loan Amount: ${loan:,.2f}")

    if income > 0:
        ratio = loan / income
        print(f"  • Loan-to-Income Ratio: {ratio:.1%}", end="")
        if ratio <= 0.50:
            print(" (Good)")
        else:
            print(" (High)")

    dti = final_state.get('debt_to_income_ratio', 0)
    print(f"  • Debt-to-Income Ratio: {dti:.1%}", end="")
    if dti < 0.36:
        print(" (Excellent)")
    elif dti < 0.43:
        print(" (Good)")
    else:
        print(" (High)")

    print("\n📋 DECISION TRAIL:")
    if 'credit_assessment' in final_state:
        print(f"  1. Credit Assessment: {final_state['credit_assessment'].get('assessment', 'N/A')}")
    if 'income_verification' in final_state:
        print(f"  2. Income Verification: {final_state['income_verification'].get('verification', 'N/A')}")
    if 'risk_analysis' in final_state:
        print(f"  3. Risk Level: {final_state['risk_analysis'].get('risk_level', 'N/A')}")
    if 'decision' in final_state:
        print(f"  4. Final Decision: {final_state['decision']}")

    # Identify the problem
    if final_state.get('decision') == 'REJECTED':
        print("\n❌ REJECTION REASONS:")

        if 'credit_assessment' in final_state:
            ca = final_state['credit_assessment']
            if ca.get('assessment') in ['FAIL', 'REVIEW']:
                print(f"  • Credit: {ca.get('reasoning', 'N/A')}")

        if 'income_verification' in final_state:
            iv = final_state['income_verification']
            if iv.get('verification') in ['FAIL', 'REVIEW']:
                print(f"  • Income: {iv.get('reasoning', 'N/A')}")

        if 'risk_analysis' in final_state:
            ra = final_state['risk_analysis']
            if ra.get('risk_level') in ['HIGH', 'MEDIUM']:
                print(f"  • Risk: {ra.get('reasoning', 'N/A')}")


def correct_and_rerun(loan_agent, config: dict, correction: dict) -> dict:
    """Correct a state and re-run the decision."""

    print("\n" + "=" * 70)
    print("🔧 CORRECTING STATE AND RE-RUNNING DECISION")
    print("=" * 70)

    # Get state history
    states = list(loan_agent.get_state_history(config))

    # Find the appropriate checkpoint to correct
    target_checkpoint = None
    target_key = list(correction.keys())[0]

    for state in states:
        # Find checkpoint that has the key we want to correct
        # but doesn't have subsequent analysis
        if target_key == 'income_verification':
            if 'income_verification' in state.values and 'risk_analysis' not in state.values:
                target_checkpoint = state
                break
        elif target_key == 'credit_assessment':
            if 'credit_assessment' in state.values and 'income_verification' not in state.values:
                target_checkpoint = state
                break

    if not target_checkpoint:
        # Use the most recent state
        target_checkpoint = states[0]

    print(f"\n✓ Found checkpoint to correct")
    print(f"  Checkpoint ID: {target_checkpoint.config['configurable']['checkpoint_id'][:8]}...")

    print(f"\n📝 APPLYING CORRECTION:")
    for key, value in correction.items():
        print(f"  {key}: {value}")

    # Update the state
    new_config = loan_agent.update_state(
        target_checkpoint.config,
        values=correction
    )

    print(f"\n✓ New checkpoint created: {new_config['configurable']['checkpoint_id'][:8]}...")

    # Re-run from the corrected state
    print("\n🔄 Re-running decision from corrected state...")
    corrected_result = loan_agent.invoke(None, new_config)

    return corrected_result


def generate_customer_letter(result: dict, original_decision: str) -> str:
    """Generate a customer communication letter."""

    letter = f"""
{'='*70}
CUSTOMER COMMUNICATION
{'='*70}

Dear Valued Customer,

Thank you for bringing this to our attention. We have completed a thorough 
investigation of your loan application.

APPLICATION DETAILS:
  • Applicant ID: {result.get('applicant_id', 'N/A')}
  • Loan Amount: ${result.get('loan_amount', 0):,.2f}
  • Purpose: {result.get('loan_purpose', 'N/A')}

YOUR FINANCIAL METRICS:
  • Credit Score: {result.get('credit_score', 'N/A')} (Excellent)
  • Annual Income: ${result.get('annual_income', 0):,.2f}
  • Debt-to-Income Ratio: {result.get('debt_to_income_ratio', 0):.0%}

"""

    if original_decision == "REJECTED" and result.get('decision') == "APPROVED":
        letter += f"""FINDING:
After careful review, we identified an error in our automated assessment 
system that led to your application being incorrectly flagged.

RESOLUTION:
  ✅ Your application has been APPROVED
  ✅ We've corrected the system error
  ✅ We're auditing other applications for similar issues

DECISION: {result['decision']}
Reasoning: {result.get('decision_reasoning', 'Application approved after review')}
Confidence: {result.get('confidence_score', 0):.0%}

NEXT STEPS:
Your loan will be processed within 24-48 hours.
"""
    else:
        letter += f"""DECISION: {result['decision']}

Reasoning: {result.get('decision_reasoning', 'N/A')}
Confidence: {result.get('confidence_score', 0):.0%}
"""

    letter += """
We appreciate your patience and value your business.

Sincerely,
The Lending Team
"""

    return letter


if __name__ == "__main__":
    import os
    from dotenv import load_dotenv

    load_dotenv()

    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        print("❌ Error: OPENAI_API_KEY environment variable required")
        print("   Set it with: export OPENAI_API_KEY='your-key-here'")
        exit(1)

    # Create the loan agent
    print("Creating Loan Approval Agent with Time Travel capabilities...")
    loan_agent = create_loan_agent(api_key)
    print("✓ Agent created successfully!")
    print("\nRun 'python demo_time_travel.py' to see the full demonstration.")
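
All four nodes above repeat the same fence-stripping and JSON-parsing steps, and later nodes index keys such as `risk_analysis['risk_level']` directly, so a reply that parses but lacks a key would raise a `KeyError`. One possible refactor, sketched here under the assumption that each reply should be a single JSON object, is a shared helper that also checks for required keys. `strip_code_fence` and `parse_json_reply` are hypothetical names, not part of LangGraph or LangChain.

```python
import json

FENCE = "`" * 3  # three backticks, built this way to keep the listing readable


def strip_code_fence(text: str) -> str:
    """Return the text inside a Markdown code fence, or the stripped text if unfenced."""
    cleaned = text.strip()
    fenced = (
        len(cleaned) >= 2 * len(FENCE)
        and cleaned.startswith(FENCE)
        and cleaned.endswith(FENCE)
    )
    if not fenced:
        return cleaned
    body = cleaned[len(FENCE):-len(FENCE)]
    # Drop an optional language tag such as "json" on the opening line.
    first_line, newline, rest = body.partition("\n")
    if newline and (first_line.strip() == "" or first_line.strip().isalnum()):
        body = rest
    return body.strip()


def parse_json_reply(text: str, required: tuple[str, ...] = ()) -> dict:
    """Parse a model reply into a dict; raise ValueError if it is malformed or incomplete."""
    data = json.loads(strip_code_fence(text))  # JSONDecodeError is a ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


reply = FENCE + 'json\n{"risk_level": "LOW", "reasoning": "ok"}\n' + FENCE
print(parse_json_reply(reply, required=("risk_level",)))
# {'risk_level': 'LOW', 'reasoning': 'ok'}
```

Each node could then wrap a single `parse_json_reply(...)` call in `try` / `except ValueError` and keep its existing rule-based fallback in the `except` branch.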

Demo Application

#!/usr/bin/env python
"""
LangGraph Time Travel Demo - Clean Version

Demonstrates Time Travel features:
1. Process a loan application
2. View checkpoint history (Time Travel)
3. Modify state at any checkpoint
4. Re-run from modified checkpoint
"""

import os
import datetime
from dotenv import load_dotenv
from typing_extensions import TypedDict, NotRequired

from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import InMemorySaver
from langchain_openai import ChatOpenAI
import json

load_dotenv()


# =============================================================================
# State Definition
# =============================================================================

class LoanState(TypedDict):
    """Loan application state."""
    applicant_id: str
    loan_amount: float
    credit_score: NotRequired[int]
    annual_income: NotRequired[float]
    debt_to_income: NotRequired[float]

    # Each node adds its result
    credit_check: NotRequired[dict]
    income_check: NotRequired[dict]
    risk_level: NotRequired[str]
    decision: NotRequired[str]
    reason: NotRequired[str]


# =============================================================================
# Build the Workflow
# =============================================================================

def build_loan_workflow(api_key: str):
    """Build loan approval workflow with checkpointing."""

    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=api_key)

    # Node 1: Credit Check
    def check_credit(state: LoanState) -> dict:
        score = state.get('credit_score', 0)

        if score >= 700:
            result = {"status": "PASS", "note": f"Excellent credit ({score})"}
        elif score >= 650:
            result = {"status": "REVIEW", "note": f"Good credit ({score}), needs review"}
        else:
            result = {"status": "FAIL", "note": f"Poor credit ({score})"}

        return {"credit_check": result}

    # Node 2: Income Check
    def check_income(state: LoanState) -> dict:
        income = state.get('annual_income', 0)
        loan = state.get('loan_amount', 0)
        ratio = loan / income if income > 0 else 999

        if ratio <= 0.4:
            result = {"status": "PASS", "note": f"Loan is {ratio:.0%} of income - OK"}
        elif ratio <= 0.6:
            result = {"status": "REVIEW", "note": f"Loan is {ratio:.0%} of income - borderline"}
        else:
            result = {"status": "FAIL", "note": f"Loan is {ratio:.0%} of income - too high"}

        return {"income_check": result}

    # Node 3: Risk Assessment
    def assess_risk(state: LoanState) -> dict:
        credit = state['credit_check']['status']
        income = state['income_check']['status']
        dti = state.get('debt_to_income', 0)

        if credit == "PASS" and income == "PASS" and dti < 0.36:
            return {"risk_level": "LOW"}
        elif credit == "FAIL" or income == "FAIL" or dti > 0.50:
            return {"risk_level": "HIGH"}
        else:
            return {"risk_level": "MEDIUM"}

    # Node 4: Final Decision
    def make_decision(state: LoanState) -> dict:
        risk = state['risk_level']

        if risk == "LOW":
            return {"decision": "APPROVED", "reason": "All criteria met, low risk"}
        elif risk == "HIGH":
            return {"decision": "REJECTED", "reason": "High risk - criteria not met"}
        else:
            return {"decision": "REJECTED", "reason": "Medium risk - requires manual review"}

    # Build graph
    graph = StateGraph(LoanState)
    graph.add_node("check_credit", check_credit)
    graph.add_node("check_income", check_income)
    graph.add_node("assess_risk", assess_risk)
    graph.add_node("make_decision", make_decision)

    graph.add_edge(START, "check_credit")
    graph.add_edge("check_credit", "check_income")
    graph.add_edge("check_income", "assess_risk")
    graph.add_edge("assess_risk", "make_decision")
    graph.add_edge("make_decision", END)

    # Compile WITH checkpointing (enables Time Travel)
    checkpointer = InMemorySaver()
    return graph.compile(checkpointer=checkpointer)


# =============================================================================
# Time Travel Functions
# =============================================================================

def show_checkpoints(agent, config):
    """Display all checkpoints (Time Travel history)."""

    print("\n" + "=" * 60)
    print("🕐 TIME TRAVEL: Viewing All Checkpoints")
    print("=" * 60)

    states = list(agent.get_state_history(config))
    print(f"\n📍 Total checkpoints saved: {len(states)}")

    # Show each checkpoint (oldest first)
    for i, checkpoint in enumerate(reversed(states)):
        step = i + 1
        cp_id = checkpoint.config['configurable']['checkpoint_id'][:12]
        next_node = checkpoint.next[0] if checkpoint.next else "END"

        print(f"\n[Checkpoint {step}] ID: {cp_id}...")
        print(f"   Next node: {next_node}")

        # Show what's in state at this point
        vals = checkpoint.values
        if 'credit_check' in vals:
            cc = vals['credit_check']
            print(f"   ✓ credit_check: {cc['status']} - {cc['note']}")
        if 'income_check' in vals:
            ic = vals['income_check']
            print(f"   ✓ income_check: {ic['status']} - {ic['note']}")
        if 'risk_level' in vals:
            print(f"   ✓ risk_level: {vals['risk_level']}")
        if 'decision' in vals:
            print(f"   ✓ decision: {vals['decision']} ({vals['reason']})")

    return states


def modify_and_rerun(agent, config, states, node_to_fix: str, new_value: dict):
    """Modify a checkpoint and re-run from there.

    `node_to_fix` is the state key written by the node being corrected
    (e.g. 'income_check'). `states` is newest-first, as returned by
    get_state_history().
    """

    print("\n" + "=" * 60)
    print("🔧 TIME TRAVEL: Modifying State & Re-running")
    print("=" * 60)

    # Key written by the node that runs next; if it is missing from a
    # checkpoint, that next node has not run yet.
    next_keys = {'credit_check': 'income_check',
                 'income_check': 'risk_level',
                 'risk_level': 'decision'}
    next_key = next_keys.get(node_to_fix)

    # Find the checkpoint right after the node we want to fix
    target = None
    for state in states:
        if node_to_fix in state.values and next_key and next_key not in state.values:
            target = state
            break

    if target:
        print(f"\n📍 Found checkpoint after '{node_to_fix}'")
    else:
        target = states[0]  # Fall back to the most recent checkpoint
        print(f"\n⚠️ No checkpoint found right after '{node_to_fix}'; using the most recent one")

    print(f"   Original value: {target.values.get(node_to_fix)}")
    print(f"   New value: {new_value}")

    # Update state at this checkpoint; this returns the config of a new branch
    new_config = agent.update_state(
        target.config,
        values={node_to_fix: new_value}
    )

    print("\n✓ Created new branch from checkpoint")
    print("🔄 Re-running remaining nodes...")

    # Passing None as input resumes from the checkpoint instead of starting over
    result = agent.invoke(None, new_config)

    return result


# =============================================================================
# Main Demo
# =============================================================================

def run_demo():
    """Run the Time Travel demo."""

    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        print("❌ Set OPENAI_API_KEY first!")
        return

    # Build workflow
    print("\n🔧 Building Loan Workflow with Checkpointing...")
    agent = build_loan_workflow(api_key)
    print("✓ Ready!\n")

    # ==========================================================================
    # STEP 1: Process Application
    # ==========================================================================

    print("=" * 60)
    print("📝 STEP 1: Process Loan Application")
    print("=" * 60)

    application = {
        "applicant_id": "APP-001",
        "loan_amount": 50000,
        "credit_score": 720,
        "annual_income": 85000,
        "debt_to_income": 0.32
    }

    print("\nApplication:")
    for k, v in application.items():
        # Format by field name: the amounts are ints, so a float check would skip them
        if k == "debt_to_income":
            print(f"   {k}: {v:.0%}")
        elif k in ("loan_amount", "annual_income"):
            print(f"   {k}: ${v:,.0f}")
        else:
            print(f"   {k}: {v}")

    config = {"configurable": {"thread_id": "demo-001"}}

    print("\n⏳ Processing...")
    result = agent.invoke(application, config)

    print("\n" + "-" * 40)
    print(f"📊 RESULT: {result['decision']}")
    print(f"   Reason: {result['reason']}")
    print("-" * 40)

    # ==========================================================================
    # STEP 2: Time Travel - View History
    # ==========================================================================

    print("\n" + "=" * 60)
    print("📝 STEP 2: Time Travel - View Checkpoint History")
    print("=" * 60)

    states = show_checkpoints(agent, config)

    # ==========================================================================
    # STEP 3: Time Travel - Modify & Rerun
    # ==========================================================================

    if result['decision'] == 'REJECTED':
        print("\n" + "=" * 60)
        print("📝 STEP 3: Time Travel - Fix & Rerun")
        print("=" * 60)

        print("\n🔍 The application was REJECTED.")
        print("   Let's use Time Travel to see what we can fix...")

        # Find what failed
        final = states[0].values

        if final.get('income_check', {}).get('status') != 'PASS':
            print("\n💡 Income check was not PASS - let's override it")

            new_income = {
                "status": "PASS",
                "note": "Manual review: income verified sufficient"
            }

            new_result = modify_and_rerun(agent, config, states, "income_check", new_income)

            print("\n" + "-" * 40)
            print(f"📊 NEW RESULT: {new_result['decision']}")
            print(f"   Reason: {new_result['reason']}")
            print("-" * 40)

        elif final.get('credit_check', {}).get('status') != 'PASS':
            print("\n💡 Credit check was not PASS - let's override it")

            new_credit = {
                "status": "PASS", 
                "note": "Manual review: credit approved"
            }

            new_result = modify_and_rerun(agent, config, states, "credit_check", new_credit)

            print("\n" + "-" * 40)
            print(f"📊 NEW RESULT: {new_result['decision']}")
            print(f"   Reason: {new_result['reason']}")
            print("-" * 40)

    # ==========================================================================
    # Summary
    # ==========================================================================

    print("\n" + "=" * 60)
    print("🎯 TIME TRAVEL SUMMARY")
    print("=" * 60)
    print("""
    What we demonstrated:

    1. CHECKPOINTING
       - State is saved automatically after each node runs
       - Creates complete audit trail

    2. VIEW HISTORY
       - agent.get_state_history(config) 
       - See every step of execution

    3. MODIFY & RERUN
       - agent.update_state(checkpoint.config, values)
       - agent.invoke(None, new_config)
       - Branch from any point in history

    Key Methods:
    ┌────────────────────────────────────┬────────────────────────┐
    │ agent.invoke(input, config)        │ Run workflow           │
    │ agent.get_state_history(config)    │ Get all checkpoints    │
    │ agent.update_state(config, values) │ Modify checkpoint      │
    │ agent.invoke(None, new_config)     │ Resume from checkpoint │
    └────────────────────────────────────┴────────────────────────┘
    """)


if __name__ == "__main__":
    run_demo()

Output

Conclusion

LLM-based agents are inherently non-deterministic, which makes reproducibility, debugging, and post-execution analysis difficult in production systems. LangGraph Time Travel addresses this by adding checkpointed state persistence to agent workflows.

By capturing each state transition, Time Travel transforms a transient agent execution into a replayable and inspectable state machine. Engineers can trace failures to specific workflow nodes, inspect intermediate state, and re-execute from any checkpoint without rerunning the entire workflow.
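
As a minimal sketch of that tracing step, the helpers below walk a checkpoint history and report which state keys each step introduced or changed, then pick the checkpoint that sits just before a given node. They assume only that each snapshot exposes `.values` (a dict) and `.next` (a tuple of node names), which is how the demo above reads the objects returned by `get_state_history`. The names `trace_changes` and `find_checkpoint_before` are hypothetical, and the `SimpleNamespace` stand-ins with made-up data exist only so the example runs without LangGraph or an API key.

```python
from types import SimpleNamespace


def trace_changes(states):
    """Return [(next_nodes, changed_keys), ...] in oldest-first order.

    `states` is expected newest-first, matching the order used in the demo.
    """
    trace = []
    previous = {}
    for snap in reversed(states):
        changed = sorted(
            key for key, value in snap.values.items()
            if key not in previous or previous[key] != value
        )
        trace.append((tuple(snap.next), changed))
        previous = snap.values
    return trace


def find_checkpoint_before(states, node_name):
    """Return the most recent snapshot whose next node is `node_name`, or None."""
    for snap in states:
        if node_name in snap.next:
            return snap
    return None


# Stand-in snapshots (hypothetical data), newest-first
history = [
    SimpleNamespace(
        values={
            "credit_score": 720,
            "credit_check": {"status": "PASS"},
            "income_check": {"status": "FAIL"},
        },
        next=("assess_risk",),
    ),
    SimpleNamespace(
        values={"credit_score": 720, "credit_check": {"status": "PASS"}},
        next=("check_income",),
    ),
    SimpleNamespace(values={"credit_score": 720}, next=("check_credit",)),
]

for next_nodes, changed in trace_changes(history):
    print(next_nodes, changed)

target = find_checkpoint_before(history, "assess_risk")
print(target.values["income_check"])
```

With a real graph, the snapshot returned by `find_checkpoint_before(states, "assess_risk")` could supply the `target.config` that `modify_and_rerun` passes to `agent.update_state`, which avoids maintaining a separate map of state keys.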

This shifts agent development from prompt-centric experimentation to state-driven engineering. The LLM remains probabilistic, but the workflow around it becomes replayable, debuggable, and auditable.
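
A toy model can make the split between the probabilistic step and the recorded workflow concrete. The sketch below is not how LangGraph stores checkpoints; it uses only the standard library, `noisy_risk` stands in for an LLM call, and every name in it (`run_steps`, `noisy_risk`, `decide`) is hypothetical. Reading a recorded snapshot always gives the same answer, while forking from a snapshot re-executes only the remaining steps and leaves the original history untouched.

```python
import copy
import random


def run_steps(state, steps, rng, history, start=0):
    """Run steps[start:] on a copy of `state`, recording a snapshot after each."""
    state = copy.deepcopy(state)
    for index in range(start, len(steps)):
        name, fn = steps[index]
        state[name] = fn(state, rng)
        history.append(
            {"after": name, "index": index, "values": copy.deepcopy(state)}
        )
    return state


def noisy_risk(state, rng):
    # Stand-in for an LLM call: the output can differ from run to run.
    return rng.choice(["LOW", "MEDIUM", "HIGH"])


def decide(state, rng):
    # Deterministic rule applied on top of the noisy step.
    return "APPROVED" if state["risk_level"] != "HIGH" else "REJECTED"


steps = [("risk_level", noisy_risk), ("decision", decide)]

# Original run: a snapshot is recorded after every step.
history = []
final = run_steps({"credit_score": 720}, steps, random.Random(), history)

# Replay: read the recorded snapshot. No step runs again,
# so the result is the same on every read.
replayed = history[-1]["values"]

# Fork: copy the snapshot taken after "risk_level", override the value,
# and re-execute only the remaining steps into a separate history.
checkpoint = history[0]
forked_state = copy.deepcopy(checkpoint["values"])
forked_state["risk_level"] = "LOW"
branch = []
forked_final = run_steps(
    forked_state, steps, random.Random(), branch, start=checkpoint["index"] + 1
)

print(replayed == final)
print(forked_final["decision"])
print(len(history), len(branch))
```

The same distinction applies to the demo: viewing history reads stored state, whereas `agent.invoke(None, new_config)` runs the remaining nodes again, so any LLM call in those nodes may return a different answer than it did in the original run.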

LangGraph Time Travel does not make LLMs predictable; it makes agent workflows reliable.

Thanks
Sreeni Ramadorai