Early access · Teller’s Tech Labs

Practice production judgment before production teaches you the hard way.

Code Duck is an AI-driven incident simulator for DevOps, SRE, cloud, and platform engineers. Instead of memorizing commands or clicking through multiple-choice labs, learners work through realistic production-style incidents in natural language. No clusters to spin up, no environments to clean up — just open the browser and start.

Browser-based · No setup · Kubernetes · Terraform · AWS · CI/CD · Incident Response · On-Call · Debugging · Interview Prep

No setup. No cleanup. No surprise bills.

Most DevOps practice tools ask you to provision a cluster, configure tooling, and remember to tear it down. Code Duck does none of that.

Nothing to provision

No clusters to spin up. No state files. No cloud account. Scenarios load in your browser in seconds.

Nothing to break

Each scenario is a fresh, isolated environment. Roll back, blow it up, re-run from scratch as many times as you want.

Nothing to clean up

Close the tab when you’re done. No teardown, no orphaned resources, no surprise charges next month.

You investigate. Code Duck pushes back.

A short snippet from a real Kubernetes incident scenario. Notice the AI doesn't just give you the answer — it nudges your reasoning forward.

you: What does the on-call queue look like right now?
duck: Two active alerts on orders-api:
  - HighMemoryUsage (3 pods OOMKilling)
  - 5xxRate (spiked at 14:32)
Recent change: deploy at 14:25 by @backend.
you: Roll back the deploy.
duck: Before you do: what happens if the rollback also reverts the new schema migration that shipped with this deploy? Would orders-api still match the database state?
you: Oh. Let me check the migration status before I touch anything.
↳ that's the move. you just avoided a worse incident.

What You Can Type

command: kubectl get pods -n orders
question: What changed recently?
hypothesis: I think it’s a missing env var
action: Roll back the deployment
communication: I’d tell the team checkout success is down
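
The five kinds of input listed above could be modeled as a small enumeration. This is only an illustrative sketch: the names InputKind, EXAMPLES, and describe are hypothetical and say nothing about how Code Duck is actually built. The example strings are taken from the list above.

```python
from enum import Enum


class InputKind(Enum):
    """Hypothetical labels for the five kinds of input listed above."""
    COMMAND = "command"
    QUESTION = "question"
    HYPOTHESIS = "hypothesis"
    ACTION = "action"
    COMMUNICATION = "communication"


# Example inputs taken from the page, keyed by their label.
EXAMPLES = {
    InputKind.COMMAND: "kubectl get pods -n orders",
    InputKind.QUESTION: "What changed recently?",
    InputKind.HYPOTHESIS: "I think it's a missing env var",
    InputKind.ACTION: "Roll back the deployment",
    InputKind.COMMUNICATION: "I'd tell the team checkout success is down",
}


def describe(kind: InputKind) -> str:
    """Format one example as '<label>: <text>' for display."""
    return f"{kind.value}: {EXAMPLES[kind]}"
```

Calling describe(InputKind.QUESTION) on this sketch gives one labeled line per kind, which is enough to show that a single free-text box can carry very different intents.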

The hard skill isn't command recall.

Most DevOps training overvalues command recall and undervalues operational reasoning. In real incidents, you rarely get a clean prompt — you get symptoms, partial logs, noisy alerts, unclear ownership, recent changes, pressure, and incomplete context.

Code Duck is built to help engineers practice how to think through that mess. The AI plays the role of the system, the logs, the teammate, the interviewer, and the coach — while you investigate, ask questions, form hypotheses, choose safe next steps, and explain your reasoning.

One scenario, run the way real on-call works.

Open-ended, non-linear investigation. You can go in the "wrong" direction, recover, ask better questions, and improve.

Pick a scenario

Pre-built incidents based on real-world production failure patterns. Pick by stack or skill area.

Investigate in natural language

Ask for logs, metrics, configs, recent changes, or context. No rigid command-match required.

Form hypotheses & choose next steps

Propose what's wrong, what you'd check next, and what action you'd take — with awareness of risk.

Get a coaching debrief

Feedback on troubleshooting flow, hypothesis quality, risk awareness, assumptions, and gaps to practice.

Three modes for how deep you want to go.

The same scenarios, three different relationships with feedback. Pick what matches your moment.

Practice

Hints + coaching

Open-ended investigation with proactive hints during play and a full coaching debrief at the end. Good when you’re building muscle memory or learning a new stack.

Assessment

No hints, recap only

Open-ended, but the AI stays out of your way until you’re done. Recap arrives at the end. Closest to a real on-call shift — or a take-home interview.

Guided

Optional nudges

Open-ended with light nudges available on request. Useful when you want to go solo but keep a safety net for when you’re truly stuck.
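
One way to picture the difference between the three modes is as a small configuration table. The sketch below is an assumption for illustration only: HintPolicy, Mode, MODES, and hints_allowed are hypothetical names, and the end-of-session feedback for Guided is left as None because the description above does not state it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class HintPolicy(Enum):
    """Hypothetical hint behaviours matching the three mode descriptions."""
    PROACTIVE = "proactive"    # hints offered during play
    ON_REQUEST = "on_request"  # nudges only when the learner asks
    NONE = "none"              # no hints until the session ends


@dataclass(frozen=True)
class Mode:
    name: str
    hints: HintPolicy
    end_feedback: Optional[str]  # None where the page does not say


MODES = {
    "practice": Mode("Practice", HintPolicy.PROACTIVE, "coaching debrief"),
    "assessment": Mode("Assessment", HintPolicy.NONE, "recap"),
    "guided": Mode("Guided", HintPolicy.ON_REQUEST, None),
}


def hints_allowed(mode_key: str, learner_asked: bool) -> bool:
    """Return True if a hint may be shown right now in the given mode."""
    policy = MODES[mode_key].hints
    if policy is HintPolicy.PROACTIVE:
        return True
    if policy is HintPolicy.ON_REQUEST:
        return learner_asked
    return False
```

In this sketch, hints_allowed("guided", learner_asked=False) is False while hints_allowed("guided", learner_asked=True) is True, which is the "safety net on request" behaviour described for Guided.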

The rubric is built around judgment, not trivia.

Knowing every flag isn't the goal. Knowing what to investigate, why it matters, and what a safe next step looks like — that's the goal.

Troubleshooting flow

Symptoms → evidence → cause

Hypothesis quality

Reasoning from what you saw

Technical understanding

Concepts, even without exact syntax

Risk awareness

Avoiding "just restart everything"

Systems thinking

App / infra / deploy / cloud

Communication

Explaining what & why clearly

Assumptions

Spotting what you're inferring

Recovery from wrong turns

Backing out cleanly & learning

The six scoring categories
evidence_gathering · prioritization · action_safety · communication · missed_signals · stronger_next_steps
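
As a rough illustration, the six category names above could serve as the keys of a debrief record. The dictionary shape and the missing_categories helper below are assumptions made for this sketch, not Code Duck's actual debrief format; only the six names come from the page.

```python
# The six category names as listed on the page.
SCORING_CATEGORIES = (
    "evidence_gathering",
    "prioritization",
    "action_safety",
    "communication",
    "missed_signals",
    "stronger_next_steps",
)


def missing_categories(debrief: dict) -> list:
    """Return the categories a (hypothetical) debrief dict has not covered,
    in the same order as SCORING_CATEGORIES."""
    return [name for name in SCORING_CATEGORIES if name not in debrief]


# Hypothetical partial debrief: free-text notes keyed by category.
partial = {
    "prioritization": "Checked the recent deploy before restarting pods.",
    "communication": "Stated impact on checkout clearly.",
}
```

With the partial record above, missing_categories(partial) lists the four categories that still need feedback.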

The kind of incidents you actually get paged for.

Each scenario is built around a real production failure pattern — not a tutorial.

Kubernetes

App failing after a config change — OOMKilled in a loop, but the change looked harmless.

Terraform

Unexpected infrastructure drift after a routine apply, and state isn't where you'd expect.

CI/CD

GitHub Actions deployment failing after a secrets rotation — with confusing error output.

AWS

Service degraded due to an IAM or networking misconfiguration with no obvious smoking gun.

Kafka

Consumer lag spiking during broker pressure — and rebalances making everything worse.

RDS

Failover triggered, but the application is now in a connection-pool death spiral.

Join the Code Duck early access list.

Be one of the first to try Code Duck. No spam, just build updates and an invite when scenarios are ready for early testers.

Experience level (optional)
Scenarios you'd care about (pick any)
Would you use this individually or with a team?

