// DETERMINISTIC FAULT INJECTION

PROVE YOUR SYSTEM
SURVIVES FAILURE.

Your unit tests cover the happy path. But what happens when the database dies mid-write, or a dependency times out mid-call? That's the failure everyone assumes they handle and almost nobody tests, until it's an incident. Shinari makes it a test: bring up your real system, inject one deterministic fault, and assert how it recovers, the same way on every run. One CLI, one YAML file, in the CI you already have.
From assumption to assertion.


$ go build -o shinari ./cli && ./shinari run

Deterministic: the fault lands at the same point every run, so it gates a merge instead of flaking a pipeline.
Zero platform: one binary, in the CI you have. No cluster, no agents.
Findings ledger: record a known gap as green; the build flips red the day it regresses or someone quietly fixes it.

You're already testing this by hand

If you've checked that your service survives a dead dependency, you've probably done it with a sleep(), a hand-wired proxy, and a test that passes on your laptop and flakes in CI.

Roll it yourself and you own all of this:

timing the fault with sleeps instead of the system's actual lifecycle,
driving the whole sequence (steady state, then fault, then recovery) by hand,
keeping the checks honest every time the code changes.

Shinari makes it a test. One CLI, one YAML file, your real system via docker compose. The fault lands at the same point in the lifecycle every run, the whole sequence runs itself, and every run ends in a verdict you can gate a merge on, instead of a flaky one-off you wired together yourself.

Turn "it should recover" from an assumption into an assertion.

A crash is a test case

The whole harness is one YAML file. Write the failure you fear in the scenario file, and Shinari runs it for real.

scenarios/resilience/cache-outage.yml

kind: Scenario
name: checkout-survives-cache-outage

steadyState:            # only test a healthy system
  - run: http.get
    with: /health

method:
  - phase: "Kill the cache out from under the API"
    steps:
      - run: docker.kill
        with: redis
      - run: http.get   # checkout must answer without it
        with: /checkout/42
        as: rsp

  - phase: "Bring the cache back"
    steps:
      - run: docker.start
        with: redis

verify:
  - run: assert
    with: { of: "${.outputs.rsp.value.total}", equals: 19.90 }
    desc: "served from Postgres, priced correctly"
  - run: http.get
    with: /metrics
    as: metrics
  - run: assert
    with: { of: "${.outputs.metrics.value.p99_ms}", lt: 200 }
    desc: "p99 back under 200ms"
    finding: "cold cache spikes ~30s"

shinari run

$ shinari run

━━ checkout-survives-cache-outage ──────────────────────────
  steady    ✓ http.get
  method    ⚡ docker.kill (fault injected)
            ✓ docker.kill
            ✓ http.get
            ✓ docker.start
  recovery  ✓ http.get
  verify    ✓ served from Postgres, priced correctly
            ✓ http.get
            ◆ p99 back under 200ms · FINDING: cold cache spikes ~30s

  ✔ PASSED · 1 finding held · 1.8s

1 scenario: 1 passed — 1 finding held (2s)
reports → shinari-out/{results.json,junit.xml,findings.sarif,…}

See that "finding:" line? A known gap stays documented, asserted, and green. The day someone fixes it, the run flips red and tells you to promote the finding to a hard assertion. Your suite grows with your code.
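One way to read the findings ledger is as a strict "expected failure": the assertion is allowed to fail only while the gap is still there. The sketch below shows that decision table under that assumption. The Check type and the verdict function are hypothetical names for illustration, not Shinari's internals.

```go
package main

import "fmt"

// Check is one evaluated assertion (hypothetical type). Finding is
// non-empty when the assertion documents a known gap rather than a
// hard requirement.
type Check struct {
	Desc    string
	Passed  bool
	Finding string
}

// verdict reports whether the check keeps the build green, plus a message.
// Assumed semantics: a finding stays green only while its assertion fails.
func verdict(c Check) (green bool, msg string) {
	switch {
	case c.Finding == "" && c.Passed:
		return true, "passed"
	case c.Finding == "":
		return false, "failed"
	case !c.Passed:
		return true, "finding held: " + c.Finding
	default:
		return false, "finding no longer reproduces; promote to a hard assertion"
	}
}

func main() {
	checks := []Check{
		{Desc: "served from Postgres, priced correctly", Passed: true},
		{Desc: "p99 back under 200ms", Passed: false, Finding: "cold cache spikes ~30s"},
		{Desc: "p99 back under 200ms", Passed: true, Finding: "cold cache spikes ~30s"},
	}
	for _, c := range checks {
		green, msg := verdict(c)
		fmt.Printf("%-40s green=%-5v %s\n", c.Desc, green, msg)
	}
}
```

The third case is the interesting one: the gap was fixed, so the stale finding turns the build red until someone converts it into a hard assertion.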

Or drive it interactively

The same engine behind a terminal control center. Browse and search scenarios, read the run plan, and run one while the steady state, injected fault, recovery and verdict stream in live. One keystroke to shinari tui.

shinari tui

What you can break

Every capability is a namespaced verb. Eleven native providers ship in the binary, and you compose your own vocabulary in YAML without writing any Go.

docker.kill: drop a container mid-flight
toxiproxy.partition: sever the link between two services
net.nxdomain: poison DNS for one hostname
toxiproxy.latency: add lag, watch the timeouts cascade
http.get · exec.run: probe real APIs, run any script
queue.poison_message: your domain verbs, composed in YAML


Where Shinari fits

Shinari proves resilience before you ship, not by experimenting on live production. That boundary keeps it one deterministic binary you can gate a merge with.

Reach for it when

you want to prove a service survives a dependency failure before it merges, not after it pages someone,
you want it in CI on every change, with the fault landing at the same point every time,
you want to reproduce a specific outage from nothing, no cluster or platform to stand up,
you'd otherwise script the fault by hand and want it reproducible and maintained, not a flaky one-off.

Reach for something else when

you are running experiments against live production traffic,
you need continuous fault injection across a fleet, with blast-radius controls and scheduling,
the faults you care about only exist in production infrastructure you cannot bring up locally.

Field manual

Tutorials to learn, how-to guides to solve, reference to look up, concepts to understand — and a developers track to extend.
