Overview

Systematic approaches to investigating and diagnosing bugs.

Core Principle

Understand before fixing. A proper diagnosis leads to a proper fix.

The Scientific Method for Debugging

1. Observe

Gather all the facts:

What's the symptom? (What's happening that shouldn't?)
When does it happen? (Always, sometimes, specific conditions?)
Who's affected? (All users, some users, specific scenarios?)
Error messages? (Exact text, stack traces, error codes?)
Recent changes? (What changed before this started?)

Evidence to collect:

Error messages and stack traces
Application logs
User reports
Reproduction steps
Environment details (browser, OS, versions)
Network requests/responses
Database query logs

2. Form Hypothesis

Based on symptoms, what could cause this?

Common categories:

Logic error: Code does the wrong thing
State management: State gets out of sync
Async/timing: Race condition, missing await, callbacks firing out of order
Data issue: Unexpected input format
Integration: API change, service down
Environment: Config, permissions, network
Resource: Memory leak, connection pool exhausted

Prioritize hypotheses:

Most likely causes first
Easiest to test first (when equally likely)
Most impactful if true

3. Test Hypothesis

Design an experiment to prove or disprove the hypothesis:

Add logging to see values
Add breakpoints to pause execution
Modify input to isolate a variable
Disable a feature to rule it out
Compare with working version

Keep notes:

**Hypothesis:** Database query timeout
**Test:** Add query timing logs
**Result:** Query completes in 50ms
**Conclusion:** Not the database ❌

**Hypothesis:** Network latency
**Test:** Check network tab, add timing
**Result:** API call takes 5 seconds
**Conclusion:** Found the issue ✅

4. Analyze Results

What did you learn?

Hypothesis confirmed or rejected?
New questions raised?
Unexpected findings?
Root cause identified?

5. Repeat or Conclude

If root cause found:

Document findings
Estimate impact
Plan fix

If not found:

Form new hypothesis
Repeat cycle

Debugging Strategies

Strategy 1: Add Logging

The most universally useful technique:

// Strategic console.log placement
function processOrder(order) {
  console.log('processOrder START:', { orderId: order.id })

  const items = order.items
  console.log('items:', items.length)

  const validated = validate(items)
  console.log('validation result:', validated)

  if (!validated.success) {
    console.log('validation failed:', validated.errors)
    throw new Error('Invalid order')
  }

  const total = calculateTotal(items)
  console.log('total calculated:', total)

  console.log('processOrder END')
  return total
}

Logging guidelines:

Log function entry/exit
Log branching decisions
Log external calls (API, database)
Log unexpected values
Include context (IDs, user info)
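
The entry/exit guideline can be applied without editing every function body. The following is a minimal sketch, assuming a hypothetical `withLogging` helper (not part of any library) that wraps a synchronous function and logs entry, exit, and thrown errors under a label:

```javascript
// Hypothetical helper: wraps fn so every call logs entry, exit, and errors.
// `log` defaults to console.log but can be swapped for any logger.
function withLogging(name, fn, log = console.log) {
  return function (...args) {
    log(`${name} START`, { args })
    try {
      const result = fn.apply(this, args)
      log(`${name} END`, { result })
      return result
    } catch (err) {
      log(`${name} THROW`, { message: err.message })
      throw err // Re-throw so the wrapped function behaves as before
    }
  }
}

// Usage: wrap a function temporarily while investigating
const add = withLogging('add', (a, b) => a + b)
add(2, 3) // logs "add START", then "add END"
```

This sketch handles synchronous functions only; for an async function the logged result would be a pending promise, so the wrapper would need to await it.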

Strategy 2: Use Debugger

Interactive debugging:

// Browser
function buggyFunction(input) {
  debugger;  // Execution pauses here
  const result = transform(input)
  debugger;  // And here
  return result
}

# Node.js (shell)
node --inspect app.js
# Then open chrome://inspect in Chrome
# Use --inspect-brk instead to pause before the first line runs

Debugger features:

Step over (next line)
Step into (into function call)
Step out (back to caller)
Watch expressions
Call stack inspection
Variable inspection

Strategy 3: Binary Search

Isolate the problem area:

// 100 lines of code, bug somewhere

// Comment out lines 51-100
// Bug still happens? It's in lines 1-50

// Comment out lines 26-50
// Bug disappears? It's in lines 26-50

// Restore lines 26-37, keep lines 38-50 commented out
// Bug still happens? It's in lines 26-37

// Continue until isolated to specific lines
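
The same halving idea also works on input data instead of source lines. The sketch below assumes a hypothetical predicate `isBroken(subset)` that returns true when the bug reproduces, and assumes the bug keeps reproducing once the triggering item is included. The function and variable names are illustrative only.

```javascript
// Finds the index of the item that introduces the bug.
// Assumes: isBroken([]) is false, isBroken(items) is true, and the bug
// keeps reproducing once the triggering item is part of the subset.
function findFirstBadIndex(items, isBroken) {
  let lo = 0            // longest prefix length known to be good
  let hi = items.length // shortest prefix length known to be bad
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2)
    if (isBroken(items.slice(0, mid))) {
      hi = mid // bug reproduces: trigger is within the first mid items
    } else {
      lo = mid // bug gone: trigger comes after the first mid items
    }
  }
  return hi - 1
}

// Hypothetical example: the record with a null name triggers the bug
const records = [{ name: 'a' }, { name: 'b' }, { name: null }, { name: 'd' }]
const crashes = (subset) => subset.some((r) => r.name === null)
console.log(findFirstBadIndex(records, crashes)) // 2
```

With 1,000 records this needs about 10 reproduction attempts instead of up to 1,000.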

Strategy 4: Rubber Duck Debugging

Explain the problem out loud:

"This function is supposed to calculate shipping cost"
"It takes the weight and destination"
"First it... wait, it's using price instead of weight!"
(Bug found)

Why this works: Forces you to examine assumptions.

Strategy 5: Compare Working vs Broken

What's different?

Version comparison:

# Find which commit broke it
git bisect start
git bisect bad HEAD
git bisect good v1.0.0
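# Git checks out a middle commit; test it
npm test
git bisect good   # or: git bisect bad, depending on the test result
# Repeat until git reports the first bad commit
git bisect reset  # Return to the original branch
# Or automate the loop: git bisect run npm test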

Environment comparison:

Works locally but not production?
Works for some users but not others?
Worked yesterday but not today?

What changed?

Strategy 6: Simplify

Reduce to minimal reproduction:

// Complex case with bug
processUserOrderWithDiscountsAndShipping(user, cart, promo, address)

// Simplify the inputs step by step
processUserOrderWithDiscountsAndShipping(user, [], null, null)
// Still breaks? Not the cart, promo, or address

processUserOrderWithDiscountsAndShipping(null, [], null, null)
// Works now? It's the user object

// What about the user object causes it?

Strategy 7: Check Assumptions

Question everything:

// Assumption: API returns array
const users = await api.getUsers()
users.forEach(user => console.log(user.name))  // Crashes: users is undefined

// Check assumption
console.log(typeof users)  // "undefined"
console.log(users)          // undefined

// Assumption was wrong!

Common wrong assumptions:

Function returns expected type
Variable is defined
Array is not empty
API will always respond
Async operation has completed
State is up to date
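
One way to surface a wrong assumption early is to check it explicitly at the boundary where the data enters. This is a minimal sketch, assuming a hypothetical `expectArray` helper; the label string is only there to make the error message point at the source of the value.

```javascript
// Hypothetical helper: fail loudly, with context, when an assumption is wrong.
function expectArray(value, label) {
  if (!Array.isArray(value)) {
    const actual = value === null ? 'null' : typeof value
    throw new TypeError(`${label}: expected an array, got ${actual}`)
  }
  return value
}

// Usage: the error names the broken assumption instead of failing
// later at some unrelated line
try {
  expectArray(undefined, 'api.getUsers()')
} catch (err) {
  console.log(err.message) // api.getUsers(): expected an array, got undefined
}
```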

Debugging by Symptom

"Intermittent failure"

Likely causes:

Race condition (timing-dependent)
Data-dependent (certain inputs trigger it)
Resource leak (happens after N operations)
External service flakiness

Investigation:

Add extensive logging
Look for async operations
Check timing between operations
Look for shared state
Run many times to see pattern
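
Running the suspect operation many times can be scripted. The sketch below assumes a hypothetical `runRepeatedly` helper and uses a stand-in `flakyOperation` that fails on every third call; in practice the operation would be the real code under investigation.

```javascript
// Hypothetical helper: run an async operation many times and summarize failures.
async function runRepeatedly(operation, times) {
  const failures = []
  for (let run = 1; run <= times; run++) {
    try {
      await operation(run)
    } catch (err) {
      failures.push({ run, message: err.message })
    }
  }
  return { times, failed: failures.length, failures }
}

// Stand-in for a flaky operation: fails on every third call
async function flakyOperation(run) {
  if (run % 3 === 0) throw new Error(`failed on run ${run}`)
}

runRepeatedly(flakyOperation, 10).then((summary) => {
  console.log(`${summary.failed}/${summary.times} runs failed`) // 3/10 runs failed
})
```

The recorded run numbers and messages are the raw material for spotting a pattern, such as failures that appear only after a certain number of operations.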

"Works locally, fails in production"

Check differences:

Environment variables
Data (production has different/more data)
Network (CORS, SSL, proxies)
Dependencies (versions, OS)
Resources (memory, connections)

"Slow performance"

Don't guess - profile:

Frontend:

Chrome DevTools > Performance tab
Look for long tasks (> 50ms)
Check for layout thrashing
Look for memory leaks

Backend:

Add timing logs around operations
Check database query time (EXPLAIN ANALYZE)
Check external API call time
Profile with APM tool
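
Timing logs around an operation can be added with a small wrapper. This is a sketch, assuming a hypothetical `timed` helper and a stand-in `slowCall`; `performance.now()` is available as a global in browsers and in recent Node.js versions.

```javascript
// Hypothetical helper: time an async operation and log the duration,
// whether it succeeds or throws.
async function timed(label, operation, log = console.log) {
  const start = performance.now()
  try {
    return await operation()
  } finally {
    const ms = performance.now() - start
    log(`${label} took ${ms.toFixed(1)}ms`)
  }
}

// Stand-in for a slow call; replace with a real query or request
const slowCall = () =>
  new Promise((resolve) => setTimeout(() => resolve('done'), 50))

// Logs the duration first, then "done"
timed('slowCall', slowCall).then((result) => console.log(result))
```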

"Memory leak"

Investigation:

// Take heap snapshot
// Do operation that leaks
// Take another heap snapshot
// Compare - what increased?

Common causes:

Event listeners not removed
Closures holding references
Global variables accumulating
Intervals not cleared
Cache growing unbounded
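
For the unbounded-cache case, one possible remedy is to cap the number of entries. The sketch below assumes a hypothetical `BoundedCache` class that evicts the oldest entry when full; it is an illustration of the idea, not a full LRU implementation (reads do not refresh an entry's age).

```javascript
// Sketch of a size-limited cache: evicts the oldest entry when full.
// A Map iterates in insertion order, so its first key is the oldest.
class BoundedCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries
    this.entries = new Map()
  }

  set(key, value) {
    this.entries.delete(key) // Re-inserting moves the key to the newest position
    this.entries.set(key, value)
    if (this.entries.size > this.maxEntries) {
      const oldestKey = this.entries.keys().next().value
      this.entries.delete(oldestKey)
    }
  }

  get(key) {
    return this.entries.get(key)
  }

  get size() {
    return this.entries.size
  }
}

const cache = new BoundedCache(3)
for (let i = 0; i < 1000; i++) cache.set(`key-${i}`, i)
console.log(cache.size) // 3, not 1000
```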

"Crash/Exception"

Read the stack trace:

TypeError: Cannot read property 'map' of undefined
    at processUsers (app.js:42:15)
    at handleRequest (app.js:23:3)
    at Server.<anonymous> (server.js:12:5)

Stack trace tells you:

app.js:42 (processUsers): Where it crashed
app.js:23 (handleRequest): Where it was called from
server.js:12: Origin of the request

Then:

Go to line 42 of app.js
Check what's undefined
Trace back why it's undefined

Common Bug Patterns

Null/Undefined

// Bug
function process(user) {
  return user.name.toUpperCase()  // Throws if user or user.name is null/undefined
}

// Investigation
console.log('user:', user)  // undefined - why?
// Trace back to where user comes from

Off-by-One

// Bug
for (let i = 0; i <= array.length; i++) {  // <= instead of <
  process(array[i])  // Receives undefined on the last iteration
}

// Investigation
console.log('i:', i, 'length:', array.length)
// Notice i === array.length causes array[i] === undefined

Async Timing

// Bug
let data
fetchData().then(result => {
  data = result
})
console.log(data)  // undefined - async not complete

// Investigation
console.log('1. Before fetch')
fetchData().then(result => {
  console.log('3. Got result')
  data = result
})
console.log('2. After fetch call')
// Output: 1, 2, 3 - async completes later
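
One way to fix this pattern is to use the value only after the promise has settled. The sketch below assumes a stand-in `fetchData` that resolves after a short delay, and a hypothetical `loadData` wrapper; the real function would perform the actual request.

```javascript
// Stand-in for the real fetchData: resolves on a later tick
function fetchData() {
  return new Promise((resolve) => setTimeout(() => resolve({ id: 1 }), 10))
}

// Fix: use the result only after the promise has settled
async function loadData() {
  const data = await fetchData()
  console.log(data) // { id: 1 }
  return data
}

loadData()
```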

State Mutation

// Bug
function addItem(cart, item) {
  cart.items.push(item)  // Mutates input!
  return cart
}

const originalCart = { items: [] }
const newCart = addItem(originalCart, item)
// originalCart was modified - unexpected!

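// Investigation: log snapshots, because browser consoles show the
// live object when a logged entry is expanded later
console.log('before:', JSON.stringify(originalCart))
addItem(originalCart, item)
console.log('after:', JSON.stringify(originalCart))  // Changed!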
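
A possible fix, sketched under the assumption that callers expect the original cart to stay unchanged, is to return a new object instead of mutating the input. The item shape used here is hypothetical.

```javascript
// Returns a new cart; the input cart and its items array are left untouched
function addItem(cart, item) {
  return { ...cart, items: [...cart.items, item] }
}

const originalCart = { items: [] }
const newCart = addItem(originalCart, { sku: 'A1' }) // hypothetical item
console.log(originalCart.items.length) // 0
console.log(newCart.items.length)      // 1
```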

Scope Issues

// Bug
for (var i = 0; i < 3; i++) {
  setTimeout(() => console.log(i), 100)
}
// Prints: 3, 3, 3 (expected 0, 1, 2)

// Investigation
// var is function-scoped, i is shared
// By the time the timeouts fire, the loop is done and i === 3

// Fix: Use let (block-scoped) or capture i
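
The difference between `var` and `let` can be shown without timers by collecting the callbacks and calling them after the loop. The helper names below are hypothetical and exist only for this illustration.

```javascript
// Collect callbacks instead of using timers so the result is easy to inspect
function makeCallbacksWithVar() {
  const callbacks = []
  for (var i = 0; i < 3; i++) {
    callbacks.push(() => i) // every closure shares the same i
  }
  return callbacks
}

function makeCallbacksWithLet() {
  const callbacks = []
  for (let i = 0; i < 3; i++) {
    callbacks.push(() => i) // each iteration gets its own i
  }
  return callbacks
}

console.log(makeCallbacksWithVar().map((fn) => fn())) // [ 3, 3, 3 ]
console.log(makeCallbacksWithLet().map((fn) => fn())) // [ 0, 1, 2 ]
```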

Debugging Tools

Browser Developer Tools

Console:

console.log() - Print values
console.table() - Display arrays/objects as table
console.trace() - Print stack trace
console.time() / console.timeEnd() - Measure duration

Debugger:

Set breakpoints
Step through code
Inspect variables
Watch expressions
Call stack

Network:

View all requests
See request/response headers and bodies
Measure timing
Replay requests

Performance:

Record profile
See function call tree
Identify bottlenecks
Check memory usage

Command Line Tools

# Search for text in files
grep -r "error" logs/

# Follow log file
tail -f logs/app.log

# Search with context
grep -B 5 -A 5 "ERROR" logs/app.log

# Show the size of each file and directory here (to spot large ones)
du -sh *

# Check disk space
df -h

# Check memory (Linux)
free -m

# Check running processes
ps aux | grep node

Database Debugging

-- PostgreSQL
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';

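-- Show slow queries (requires the pg_stat_statements extension;
-- the column is named total_time before PostgreSQL 13)
SELECT * FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;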

-- Check table size
SELECT pg_size_pretty(pg_total_relation_size('users'));

-- Check indexes (psql meta-command)
\d users

Debugging Checklist

Before Starting

Can reproduce the issue reliably?
Have error message or symptom description?
Know when it started happening?
Checked whether recent changes are related?
Checked logs for clues?

During Investigation

Formed clear hypothesis?
Testing hypothesis systematically?
Taking notes on findings?
Not making random changes in the hope of a fix?
Questioning assumptions?

After Finding Root Cause

Documented findings?
Estimated impact?
Planned the fix?
Planned a regression test?

Anti-Patterns

❌ Random Code Changes

BAD: "Maybe if I change this... nope, try this... nope, try this..."
GOOD: "Hypothesis: X causes Y. Test: Change X. Result: Y still happens.
       Conclusion: X is not the cause."

❌ Assuming Without Verifying

BAD: "The API must be returning valid data"
GOOD: "Let me log the API response to see what it actually returns"

❌ Stopping at Symptoms

BAD: "The page is blank. Fixed by adding a null check."
GOOD: "The page is blank because user is null. User is null because
       authentication token expired. Root cause: token not being refreshed."

❌ Debugging in Production

BAD: "Let me add console.log to production to see..."
GOOD: "Let me reproduce locally and debug there, or use proper logging"

❌ No Reproduction Steps

BAD: "It crashed once, let me guess why"
GOOD: "Let me find a reliable way to reproduce it first"

Integration with Other Skills

Use proof-of-work skill to document evidence
Use test-driven-development skill to add regression test after fix
Use explainer skill when explaining bug to others
Use boy-scout-rule skill while fixing (improve surrounding code)

Remember

Reproduce first - If you can't reproduce, you can't debug
Gather evidence - Don't guess, look at data
Form hypothesis - What do you think is wrong?
Test systematically - Prove or disprove hypothesis
Find root cause - Not just symptoms
Document - Help future you and others

Debugging is detective work. Be methodical, not random.
