You're going to need multiple agents, and a PLAN.md won't cut it to keep them coordinated.
The default planner for LLM agents is a TODO list, an unordered set with no transition semantics. The reader is the state machine, interpreting dependencies and priorities on each read. Interpretation errors waste tokens or cause error loops. Semantic drift between reads or readers can lead to further divergence.
We created hence, a command line tool that gives LLM agents defeasible reasoning capabilities to plan, coordinate, and learn as they pursue a set of goals. In hence, the derivation logic lives in the file, not in the mind of the reader.
Download it and try the demo:
curl -fsSL anuna.io/h | bash
A plan file defines tasks, dependencies, and assignments. Consider two agents building an API:
;; Tasks
(given task-design)
(given task-impl)
(given task-tests)
(given task-review)
(given no-deps-design)
;; Agents
(given agent-coder)
(given agent-security)
;; Design is ready immediately (no dependencies)
(normally r1 (and task-design no-deps-design) ready-design)
;; Implementation waits for design completion
(normally r2 (and task-impl completed-design) ready-impl)
;; Tests and review wait for impl completion
(normally r3 (and task-tests completed-impl) ready-tests)
(normally r4 (and task-review completed-impl) ready-review)
;; Assignments
(normally r10 (and ready-tests agent-coder) assign-tests-coder)
(normally r11 (and ready-review agent-security) assign-review-security)
;; Defeater: blocked review defeats ready-review
(except d1 blocked-review (not ready-review))
The inference graph looks like this:
task-design       task-impl        task-tests          task-review
     │                │                 │                   │
     │ r1             │                 │                   │
     ▼                │                 │                   │
ready-design          │                 │                   │
     │                │                 │                   │
     + completed ────►│                 │                   │
                      │ r2              │                   │
                      ▼                 │                   │
                 ready-impl             │                   │
                      │                 │                   │
                      + completed ─────►├──────────────────►│
                                        │ r3                │ r4
                                        ▼                   ▼
                                  ready-tests         ready-review ◄══ blocked-review
                                        │                   │              d1 (except)
                                        + agent-coder       + agent-security
                                        │                   │
                                        ▼                   ▼
                          +∂ assign-tests-coder   -∂ assign-review-security
Agents interact with the plan through shell commands:
$ hence next plan.spl
Next actions:
1. ready-design
$ hence complete plan.spl design
Added: (given completed-design)
Next actions:
1. ready-impl
$ hence complete plan.spl impl
Added: (given completed-impl)
Next actions:
1. assign-tests-coder
2. assign-review-security
When something blocks, agents can see why:
$ hence why-not plan.spl assign-review-security
Why not: assign-review-security
Missing prerequisite: ready-review
Defeated by: d1
blocked-review ~> (not ready-review)
When agents discover something, they assert it:
$ hence assert plan.spl '(given discovered-vulnerability)' --agent security
Added: (given discovered-vulnerability)
$ hence next plan.spl
Next actions:
1. needs-fix
Blocked:
- ready-deploy (defeated by d-block-deploy)
The fact is appended to the plan. Conclusions are redrawn on the next query. No central coordinator needed.
Classical logic is monotonic: as premises are added, the set of conclusions can only grow.
In the real world we rarely operate from complete knowledge of the facts. We might be wrong!
Therefore, we cannot rely on partial knowledge to draw permanent conclusions. We must learn as we move through the world, acquiring new knowledge that may contradict earlier conclusions.
Things I once thought
Unbelievable
In my life
Have all taken place
PJ Harvey, Good Fortune
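hence makes this revision mechanical. A minimal sketch (the predicate and rule names here are illustrative, not from the demo plan): a default conclusion holds until a contradicting fact is asserted, and is withdrawn on the next query.

```lisp
;; Default: passing tests make the build releasable
(given tests-passed)
(normally r1 tests-passed releasable)

;; Querying at this point derives: releasable

;; A later discovery contradicts the earlier conclusion
(given audit-failed)
(except d1 audit-failed (not releasable))

;; Querying now no longer derives releasable: the conclusion
;; was defeated, not deleted, and the reasoning trail survives
```

Nothing is edited in place; the new facts are simply appended, and the engine redraws conclusions from the full theory.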
John McCarthy identified the need for non-monotonic reasoning in the late 1970s to address how intelligent agents would solve problems with incomplete information.
It was the following video that captured my interest in the subject many years ago:
John McCarthy explains the motivation for non-monotonic reasoning (1986).
The time is ripe for defeasible logic. Previously, applying it required experts in both the formalism and the domain being modeled, a rare combination. LLMs bridge this gap: they understand domains through natural language and can translate between human intent and formal rules. And where LLM output is non-deterministic, defeasible reasoning is not. Ground the fuzzy in the formal, and you get plans that update predictably as new facts arrive.
Defeasible logic, one of several formalisms that emerged from this line of inquiry, handles non-monotonicity in linear time through facts, rules, and relations:
given (>>) asserts what is true
always (=>) can't be defeated: invariants, hard constraints
normally (->) holds by default but can be overridden
except (~>) blocks conclusions without asserting alternatives
prefer (>) resolves conflicts when multiple rules fire
hence builds on Spindle, a defeasible reasoning engine created by Ho-Pun Lam and Guido Governatori at NICTA, and Maher's proof that propositional defeasible logic has linear complexity.
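The constructs compose into a single small theory. A sketch of the classic flying-birds example (the rule names are mine, and the argument order of prefer is an assumption; the other forms follow the plan file shown earlier):

```lisp
;; Facts
(given bird-tweety)
(given penguin-tweety)
;; Invariant: penguins are birds, cannot be defeated
(always r1 penguin-tweety bird-tweety)
;; Default: birds fly
(normally r2 bird-tweety flies-tweety)
;; Default: penguins don't fly
(normally r3 penguin-tweety (not flies-tweety))
;; Defeater: a broken wing blocks flying without asserting the opposite
(except d1 broken-wing (not flies-tweety))
;; Superiority: the penguin rule beats the bird rule (argument order assumed)
(prefer r3 r2)
```

With these rules, flies-tweety is not derivable: both defaults fire, and the superiority relation resolves the conflict in favor of r3.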
Spindle-Racket adds abduction, semantic annotations, reasoning traces, process mining, temporal logic based on Allen's interval calculus, and a novel approach to auto-epistemic reasoning whereby agents can create a Spindle theory to reason about their confidence in claims made by their peers.
Spindle-Rust is a Rust port that compiles to WebAssembly, enabling defeasible reasoning in browsers and Node.js. It implements the same DSL (Spindle Lisp) and supports first-order variables with Datalog-style grounding. Compiled to WASM, it powers the client-side reasoning engine in the hence.run web UI.
Spindle Lisp has grown beyond propositional rules. Plans can import reusable modules:
(import "./agents.spl")
First-order variables with Datalog-style grounding allow rules to quantify over entities:
(given (parent alice bob))
(normally r1 (parent ?x ?y) (ancestor ?x ?y))
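With the fact above, grounding binds ?x to alice and ?y to bob, deriving (ancestor alice bob). A slightly fuller sketch (the transitive rule r2 and the second fact are illustrative additions; whether recursive rules ground this way is an assumption on my part):

```lisp
(given (parent alice bob))
(given (parent bob carol))
;; Base case: a parent is an ancestor
(normally r1 (parent ?x ?y) (ancestor ?x ?y))
;; Recursive case: ancestry is transitive
(normally r2 (and (parent ?x ?y) (ancestor ?y ?z)) (ancestor ?x ?z))
;; Intended derivations: (ancestor alice bob), (ancestor bob carol),
;; and, via r2, (ancestor alice carol)
```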
Agents can wrap assertions with identity and timestamps using claims:
(claims agent:security
:at "2026-01-21T09:00:00Z"
(given scan-complete))
And metadata annotations attach descriptions, confidence, and provenance to rules:
(meta r20
(description "Block deploy on vulnerability")
(confidence 0.95))
Single file, no external state. The plan is plain text. Version control it, diff it. Writes under 4096 bytes are atomic on POSIX, so concurrent agents don't need coordination.
Explanations built in. hence explain shows derivation chains. hence trace provides full reasoning traces. hence why-not shows blockers. hence describe inspects rules, facts, and their metadata. hence require reasons backwards: given a goal, what facts would make it true? Use --assume to see what's needed after certain facts are established.
Hypothetical reasoning. hence what-if shows what conclusions would become derivable if specified facts were added, without modifying the plan.
Task lifecycle. hence claim marks tasks in-progress, preventing other agents from taking them. hence unclaim releases a claimed task using chain cancellation with versioned facts. hence block adds a defeater with a reason. hence unblock removes the block. All mutations are timestamped for audit trails.
Kanban view. hence board shows a visual overview of task states: backlog, ready, in-progress, blocked, done. --tree and --dag for other views.
Belief revision. hence assert lets agents record discoveries with full DSL power: facts, rules, defeaters, and superiority relations. Write rules that trigger on certain facts, and when those facts appear, conclusions update automatically.
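The vulnerability scenario shown earlier, for instance, could be driven by rules like these (rule names chosen to match that output; the actual demo plan may differ):

```lisp
;; A discovered vulnerability makes a fix task actionable...
(normally r20 discovered-vulnerability needs-fix)
;; ...and blocks deployment until it is resolved
(except d-block-deploy discovered-vulnerability (not ready-deploy))
```

An agent that asserts (given discovered-vulnerability) never touches these rules; the conclusions shift on the next query.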
Process mining. hence learn mines patterns from completed plans, extracting institutional knowledge from execution history. hence extract-log converts plans into structured event logs.
Semantic annotations. Metadata on rules, facts and conclusions such as descriptions, confidence, provenance, loosely coupled from reasoning. Trace conclusions back to source documents or code.
Interactive REPL. hence repl opens an interactive session for exploring plans, testing queries, and experimenting with rule changes before committing them.
hence.run hosts plans for distributed agent coordination. Create a plan with hence new, and it returns three capability URLs: read-only, append, and admin. Share the append URL with agents, and they can query and update the plan from anywhere.
$ hence new project.spl
Created: https://hence.run/p/abc123...
Admin: https://hence.run/p/xyz789...
Append: https://hence.run/p/def456...
$ hence next https://hence.run/p/def456...
Next actions:
1. ready-design
hence.run — DAG view with activity log, proof tree, and task details.
All appends are Ed25519-signed for authentication. An append-only log with monotonically increasing sequence numbers ensures strong consistency without race conditions. Real-time WebSocket sync keeps connected agents updated within 50ms. The web UI shows Kanban boards, DAG and tree views, dependency graphs, and proof trees explaining why each task is ready or blocked. A search palette (Cmd+K) and dark mode round out the interface.
The architecture is offline-first: a full log replica in IndexedDB and WASM reasoning in Web Workers allow agents to continue working through 24+ hours of disconnection, syncing when connectivity returns.
Plans can be copied between local and remote, listed, and archived:
# Copy a local plan to hence.run
$ hence cp plan.spl
# List hosted plans
$ hence ls
# Archive a completed plan to cold storage
$ hence rm <url>
# Install
$ curl -fsSL https://files.anuna.io/hence/latest/install.sh | bash
# Validate a plan
$ hence validate plan.spl
# See what's actionable
$ hence next plan.spl
# Kanban board view
$ hence board plan.spl
# Filter by agent
$ hence next plan.spl --agent coder
# SPL format reference for LLMs
$ hence llm
# Copy a local plan to hence.run
$ hence cp plan.spl
# List hosted plans
$ hence ls
# Archive a completed plan
$ hence rm <url>
The repository has example plans for different coordination patterns.
If coordination logic can live outside the model, it should. Hence puts the reasoning in a plain text file that agents query, update, and learn from, not in context windows they burn through.
Plans that can't change are brittle. Plans that change without structure are chaos.
Defeasible logic gives us plans that can be revised in response to new information while preserving the reasoning that led to each revision.
Plans need the capacity to change, because the future is unwritten.
--
hence is open source under AGPL-3.0. spindle-rust is open source under LGPL-3.0.