ð debugging
Use when investigating bugs, diagnosing issues, or understanding unexpected behavior. Provides systematic approaches to finding root causes.
Overview
Systematic approaches to investigating and diagnosing bugs.
Core Principle
Understand before fixing. A proper diagnosis leads to a proper fix.
The Scientific Method for Debugging
1. Observe
Gather all the facts:
- What's the symptom? (What's happening that shouldn't?)
- When does it happen? (Always, sometimes, specific conditions?)
- Who's affected? (All users, some users, specific scenarios?)
- Error messages? (Exact text, stack traces, error codes?)
- Recent changes? (What changed before this started?)
Evidence to collect:
- Error messages and stack traces
- Application logs
- User reports
- Reproduction steps
- Environment details (browser, OS, versions)
- Network requests/responses
- Database query logs
2. Form Hypothesis
Based on symptoms, what could cause this?
Common categories:
- Logic error: Code does wrong thing
- State management: State gets out of sync
- Async/timing: Race condition, callback hell
- Data issue: Unexpected input format
- Integration: API change, service down
- Environment: Config, permissions, network
- Resource: Memory leak, connection pool exhausted
Prioritize hypotheses:
- Most likely causes first
- Easiest to test first (when equal likelihood)
- Most impactful if true
3. Test Hypothesis
Design experiment to prove/disprove:
- Add logging to see values
- Add breakpoints to pause execution
- Modify input to isolate variable
- Disable feature to rule out
- Compare with working version
Keep notes:
**Hypothesis:** Database query timeout
**Test:** Add query timing logs
**Result:** Query completes in 50ms
**Conclusion:** Not the database â
**Hypothesis:** Network latency
**Test:** Check network tab, add timing
**Result:** API call takes 5 seconds
**Conclusion:** Found the issue â
4. Analyze Results
What did you learn?
- Hypothesis confirmed or rejected?
- New questions raised?
- Unexpected findings?
- Root cause identified?
5. Repeat or Conclude
If root cause found:
- Document findings
- Estimate impact
- Plan fix
If not found:
- Form new hypothesis
- Repeat cycle
Debugging Strategies
Strategy 1: Add Logging
Most universally useful technique:
// Strategic console.log placement
function processOrder(order) {
console.log('processOrder START:', { orderId: order.id })
const items = order.items
console.log('items:', items.length)
const validated = validate(items)
console.log('validation result:', validated)
if (!validated.success) {
console.log('validation failed:', validated.errors)
throw new Error('Invalid order')
}
const total = calculateTotal(items)
console.log('total calculated:', total)
console.log('processOrder END')
return total
}
Logging guidelines:
- Log function entry/exit
- Log branching decisions
- Log external calls (API, database)
- Log unexpected values
- Include context (IDs, user info)
Strategy 2: Use Debugger
Interactive debugging:
// Browser
function buggyFunction(input) {
debugger; // Execution pauses here
const result = transform(input)
debugger; // And here
return result
}
// Node.js
node --inspect app.js
# Then open chrome://inspect in Chrome
Debugger features:
- Step over (next line)
- Step into (into function call)
- Step out (back to caller)
- Watch expressions
- Call stack inspection
- Variable inspection
Strategy 3: Binary Search
Isolate the problem area:
// 100 lines of code, bug somewhere
// Comment out lines 50-100
// Bug still happens? It's in lines 1-50
// Comment out lines 25-50
// Bug disappears? It's in lines 25-50
// Comment out lines 37-50
// Bug still happens? It's in lines 25-37
// Continue until isolated to specific lines
Strategy 4: Rubber Duck Debugging
Explain the problem out loud:
- "This function is supposed to calculate shipping cost"
- "It takes the weight and destination"
- "First it... wait, it's using price instead of weight!"
- (Bug found)
Why this works: Forces you to examine assumptions.
Strategy 5: Compare Working vs Broken
What's different?
Version comparison:
# Find which commit broke it
git bisect start
git bisect bad HEAD
git bisect good v1.0.0
# Git checks out middle commit
npm test
git bisect good/bad
# Repeat until found
Environment comparison:
- Works locally but not production?
- Works for some users but not others?
- Worked yesterday but not today?
What changed?
Strategy 6: Simplify
Reduce to minimal reproduction:
// Complex case with bug
processUserOrderWithDiscountsAndShipping(user, cart, promo, address)
// Simplify inputs one at a time
processUserOrderWithDiscountsAndShipping(user, [], null, null)
// Still breaks? Not discount or address
processUserOrderWithDiscountsAndShipping(null, [], null, null)
// Works now? It's the user object
// What about the user object causes it?
Strategy 7: Check Assumptions
Question everything:
// Assumption: API returns array
const users = await api.getUsers()
users.forEach(...) // Crashes
// Check assumption
console.log(typeof users) // "undefined"
console.log(users) // undefined
// Assumption was wrong!
Common wrong assumptions:
- Function returns expected type
- Variable is defined
- Array is not empty
- API will always respond
- Async operation has completed
- State is up to date
Debugging by Symptom
"Intermittent failure"
Likely causes:
- Race condition (timing-dependent)
- Data-dependent (certain inputs trigger it)
- Resource leak (happens after N operations)
- External service flakiness
Investigation:
- Add extensive logging
- Look for async operations
- Check timing between operations
- Look for shared state
- Run many times to see pattern
"Works locally, fails in production"
Check differences:
- Environment variables
- Data (production has different/more data)
- Network (CORS, SSL, proxies)
- Dependencies (versions, OS)
- Resources (memory, connections)
"Slow performance"
Don't guess - profile:
Frontend:
- Chrome DevTools > Performance tab
- Look for long tasks (> 50ms)
- Check for layout thrashing
- Look for memory leaks
Backend:
- Add timing logs around operations
- Check database query time (EXPLAIN ANALYZE)
- Check external API call time
- Profile with APM tool
"Memory leak"
Investigation:
// Take heap snapshot
// Do operation that leaks
// Take another heap snapshot
// Compare - what increased?
Common causes:
- Event listeners not removed
- Closures holding references
- Global variables accumulating
- Intervals not cleared
- Cache growing unbounded
"Crash/Exception"
Read the stack trace:
Error: Cannot read property 'map' of undefined
at processUsers (app.js:42:15)
at handleRequest (app.js:23:3)
at Server.<anonymous> (server.js:12:5)
Stack trace tells you:
- Line 42: Where it crashed
- Line 23: Where it was called from
- Line 12: Origin of the request
Then:
- Go to line 42
- Check what's undefined
- Trace back why it's undefined
Common Bug Patterns
Null/Undefined
// Bug
function process(user) {
return user.name.toUpperCase() // Crashes if user is null
}
// Investigation
console.log('user:', user) // undefined - why?
// Trace back to where user comes from
Off-by-One
// Bug
for (let i = 0; i <= array.length; i++) { // <= instead of <
process(array[i]) // Crashes on last iteration
}
// Investigation
console.log('i:', i, 'length:', array.length)
// Notice i === array.length causes array[i] === undefined
Async Timing
// Bug
let data
fetchData().then(result => {
data = result
})
console.log(data) // undefined - async not complete
// Investigation
console.log('1. Before fetch')
fetchData().then(result => {
console.log('3. Got result')
data = result
})
console.log('2. After fetch call')
// Output: 1, 2, 3 - async completes later
State Mutation
// Bug
function addItem(cart, item) {
cart.items.push(item) // Mutates input!
return cart
}
const originalCart = { items: [] }
const newCart = addItem(originalCart, item)
// originalCart was modified - unexpected!
// Investigation
console.log('before:', originalCart)
const newCart = addItem(originalCart, item)
console.log('after:', originalCart) // Changed!
Scope Issues
// Bug
for (var i = 0; i < 3; i++) {
setTimeout(() => console.log(i), 100)
}
// Prints: 3, 3, 3 (expected 0, 1, 2)
// Investigation
// var is function-scoped, i is shared
// By time timeout fires, loop is done, i === 3
// Fix: Use let (block-scoped) or capture i
Debugging Tools
Browser Developer Tools
Console:
console.log()- Print valuesconsole.table()- Display arrays/objects as tableconsole.trace()- Print stack traceconsole.time()/console.timeEnd()- Measure duration
Debugger:
- Set breakpoints
- Step through code
- Inspect variables
- Watch expressions
- Call stack
Network:
- View all requests
- See request/response headers and bodies
- Measure timing
- Replay requests
Performance:
- Record profile
- See function call tree
- Identify bottlenecks
- Check memory usage
Command Line Tools
# Search for text in files
grep -r "error" logs/
# Follow log file
tail -f logs/app.log
# Search with context
grep -B 5 -A 5 "ERROR" logs/app.log
# Find large files
du -sh *
# Check disk space
df -h
# Check memory
free -m
# Check running processes
ps aux | grep node
Database Debugging
-- PostgreSQL
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';
-- Show slow queries
SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10;
-- Check table size
SELECT pg_size_pretty(pg_total_relation_size('users'));
-- Check indexes
\d users
Debugging Checklist
Before Starting
- Can reproduce the issue reliably?
- Have error message or symptom description?
- Know when it started happening?
- Checked if recent changes related?
- Checked logs for clues?
During Investigation
- Formed clear hypothesis?
- Testing hypothesis systematically?
- Taking notes on findings?
- Not making random changes hoping to fix?
- Questioning assumptions?
After Finding Root Cause
- Understand WHY it happens?
- Can explain it to someone else?
- Documented findings?
- Estimated impact?
- Identified proper fix?
Anti-Patterns
â Random Code Changes
BAD: "Maybe if I change this... nope, try this... nope, try this..."
GOOD: "Hypothesis: X causes Y. Test: Change X. Result: Y still happens.
Conclusion: X is not the cause."
â Assuming Without Verifying
BAD: "The API must be returning valid data"
GOOD: "Let me log the API response to see what it actually returns"
â Stopping at Symptoms
BAD: "The page is blank. Fixed by adding a null check."
GOOD: "The page is blank because user is null. User is null because
authentication token expired. Root cause: token not being refreshed."
â Debugging in Production
BAD: "Let me add console.log to production to see..."
GOOD: "Let me reproduce locally and debug there, or use proper logging"
â No Reproduction Steps
BAD: "It crashed once, let me guess why"
GOOD: "Let me find reliable way to reproduce it first"
Integration with Other Skills
- Use proof-of-work skill to document evidence
- Use test-driven-development skill to add regression test after fix
- Use explainer skill when explaining bug to others
- Use boy-scout-rule skill while fixing (improve surrounding code)
Remember
- Reproduce first - If you can't reproduce, you can't debug
- Gather evidence - Don't guess, look at data
- Form hypothesis - What do you think is wrong?
- Test systematically - Prove or disprove hypothesis
- Find root cause - Not just symptoms
- Document - Help future you and others
Debugging is detective work. Be methodical, not random.