<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
    <channel>
      <title>Raskell</title>
      <link>https://raskell.io</link>
      <description>Writing about platform automation, edge systems, applied security, and open standards. Building automation-first platforms that survive production reality.</description>
      <generator>Zola</generator>
      <language>en</language>
      <atom:link href="https://raskell.io/rss.xml" rel="self" type="application/rss+xml"/>
      <lastBuildDate>Wed, 17 Jun 2026 00:00:00 +0000</lastBuildDate>
      <item>
          <title>Retrieval Is a Tool, Not a Layer</title>
          <pubDate>Wed, 17 Jun 2026 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/retrieval-is-a-tool-not-a-layer/</link>
          <guid>https://raskell.io/articles/retrieval-is-a-tool-not-a-layer/</guid>
          <description xml:base="https://raskell.io/articles/retrieval-is-a-tool-not-a-layer/">&lt;blockquote&gt;
&lt;p&gt;Part 5 of &lt;em&gt;The Agent Platform Handbook. From Loop to Platform.&lt;&#x2F;em&gt; Previous: &lt;a href=&quot;&#x2F;articles&#x2F;context-is-the-product&#x2F;&quot;&gt;Context Is the Product&lt;&#x2F;a&gt;. Next: Memory Is a Write Path.&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;&lt;a href=&quot;&#x2F;articles&#x2F;context-is-the-product&#x2F;&quot;&gt;Post four&lt;&#x2F;a&gt; gave the harness a context layer. A &lt;code&gt;context.ts&lt;&#x2F;code&gt; module reads markdown files from a &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt; directory at startup, applies a byte budget, wraps each file in a &lt;code&gt;&amp;lt;context path=&quot;...&quot;&amp;gt;&lt;&#x2F;code&gt; block, and concatenates the whole thing under the core prompt. The agent now arrives at every turn knowing what the project is, how to talk about it, and what it is not allowed to do. The harness sits at tag &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;the-agent-platform-handbook&#x2F;tree&#x2F;post-04&quot;&gt;&lt;code&gt;post-04&lt;&#x2F;code&gt;&lt;&#x2F;a&gt;: a loop, a registry, four sandboxed tools, parallel dispatch, and a static context loader.&lt;&#x2F;p&gt;
&lt;p&gt;It loads the entire directory on every turn.&lt;&#x2F;p&gt;
&lt;p&gt;That is fine for the three small files the repo ships with. Together they come to roughly three kilobytes, which costs almost nothing with prompt caching, and the agent reads its conventions on turn one without a single tool call. The argument from post four holds: the static layer is necessary, and it is cheap when it is small.&lt;&#x2F;p&gt;
&lt;p&gt;The directory does not stay small. The first real project that adopts &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt; adds a &lt;code&gt;security.md&lt;&#x2F;code&gt;. Then an &lt;code&gt;architecture.md&lt;&#x2F;code&gt;. Then, because it is a monorepo, a file per service. Then the platform team adds a &lt;code&gt;deploy.md&lt;&#x2F;code&gt; and the data team adds a &lt;code&gt;schemas.md&lt;&#x2F;code&gt; and six months later the directory is forty files and four hundred kilobytes, and the loader from post four is pasting all of it into every turn, including the turn where the user asked the agent to fix a typo in the README.&lt;&#x2F;p&gt;
&lt;p&gt;This post is about the layer that fixes that. It ships as tag &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;the-agent-platform-handbook&#x2F;tree&#x2F;post-05&quot;&gt;&lt;code&gt;post-05&lt;&#x2F;code&gt;&lt;&#x2F;a&gt;. The change is smaller than the problem makes it sound, because the harness already has the right shape. The agent does not need a retrieval &lt;em&gt;layer&lt;&#x2F;em&gt;. It needs a retrieval &lt;em&gt;tool&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-wrong-fix-has-a-product-page&quot;&gt;The wrong fix has a product page&lt;&#x2F;h2&gt;
&lt;p&gt;The reflex, when “load everything” stops scaling, is to reach for retrieval-augmented generation. Stand up a vector database. Pick an embedding model. Write a chunker. Write an ingestion job that walks the corpus, embeds every chunk, and upserts it into the store. Write a sync job so that when a document changes, the index changes with it. Add the vector store to your deployment, your backups, your on-call rotation, and your bill.&lt;&#x2F;p&gt;
&lt;p&gt;Now, on every turn, you embed the user’s query, run a nearest-neighbor search against the store, pull the top chunks, and splice them into the prompt. This is the architecture every vector database vendor will sell you, and for a corpus of fifty thousand documents it is the right architecture. For a &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt; directory it is a second production system standing next to your agent to solve a problem your agent already has the machinery to solve.&lt;&#x2F;p&gt;
&lt;p&gt;The framing error is in the word “layer.” A layer sits beside the agent and feeds it. It has its own lifecycle, its own failure modes, its own latency, its own operational surface. You already built one dispatch system in &lt;a href=&quot;&#x2F;articles&#x2F;tools-how-agents-actually-do-things&#x2F;&quot;&gt;post three’s registry&lt;&#x2F;a&gt;, and you are about to build a second system next to it.&lt;&#x2F;p&gt;
&lt;p&gt;You do not have to. The registry from post three already dispatches named operations, runs them in parallel, caps their output, and wraps their errors. Retrieval is a named operation that takes a query and returns text. It is a tool. Put it in the registry and the agent calls it the same way it calls &lt;code&gt;fs_read&lt;&#x2F;code&gt; or &lt;code&gt;git&lt;&#x2F;code&gt;, on the turns where it needs it, and not on the turns where it does not.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;pinned-versus-searchable&quot;&gt;Pinned versus searchable&lt;&#x2F;h2&gt;
&lt;p&gt;The split that makes this work was already implied by post four. That post argued that some context is cross-cutting: the rule “do not write files” applies to every tool call, so it has to be present on every turn. Other context is topical: the glossary entry for a term the agent will use once this session, the security policy that only matters when the task touches secrets, the architecture note that only matters when the task touches the loop.&lt;&#x2F;p&gt;
&lt;p&gt;Post four loaded both kinds the same way. Post five separates them.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;Pinned context versus searchable context&quot;&gt;.AGENTS&amp;#x2F;
   +-- overview.md      ---+
   +-- conventions.md   ---+----&amp;gt; PINNED  ----&amp;gt; system prompt, every turn
   |                                            (cross-cutting, always true)
   +-- glossary.md      ---+
   +-- security.md      ---+----&amp;gt; SEARCHABLE -&amp;gt; context_search tool, on demand
   +-- architecture.md  ---+                    (topical, pulled per task)
   +-- ...              ---+&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Pinned files load into the system prompt on every turn, exactly as in post four, with the same budget and the same &lt;code&gt;&amp;lt;context&amp;gt;&lt;&#x2F;code&gt; wrapping. The pinned set is deliberately tiny: &lt;code&gt;overview.md&lt;&#x2F;code&gt;, so the agent knows what the project is, and &lt;code&gt;conventions.md&lt;&#x2F;code&gt;, so the agent knows the rules that apply to every action. These are the files you cannot afford to have the agent miss, so you pay for them every turn on purpose.&lt;&#x2F;p&gt;
&lt;p&gt;Everything else in &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt; becomes searchable. It is not in the prompt. It is sitting in an index, and the agent pulls the relevant sections by calling a tool. The glossary moves here, because a glossary is a lookup table and you look things up when you need them. The security policy moves here, because the turn that prints a file needs it and the turn that counts lines does not. The architecture notes move here for the same reason.&lt;&#x2F;p&gt;
&lt;p&gt;The contract in &lt;code&gt;context.ts&lt;&#x2F;code&gt; is two declarations: the list of pinned filenames and a predicate over it.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;&amp;#x2F;&amp;#x2F; Pinned files load into every turn. They carry cross-cutting rules that
&amp;#x2F;&amp;#x2F; have to be true regardless of the task. Everything else in .AGENTS&amp;#x2F; is
&amp;#x2F;&amp;#x2F; searchable on demand through the context_search tool (see retriever.ts).
export const PINNED = [&amp;quot;overview.md&amp;quot;, &amp;quot;conventions.md&amp;quot;];

export function isPinned(name: string): boolean {
  return PINNED.includes(name);
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;loadContext&lt;&#x2F;code&gt; from post four now filters to the pinned set and is otherwise unchanged. The byte budget, the truncation marker, the per-file logging, the &lt;code&gt;&amp;lt;context&amp;gt;&lt;&#x2F;code&gt; rendering all survive. The function got smaller, not more complex.&lt;&#x2F;p&gt;
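&lt;p&gt;A minimal sketch of that filter, assuming the loader starts from a list of filenames found in &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt;. The &lt;code&gt;splitContext&lt;&#x2F;code&gt; helper is hypothetical and not part of the repo. It only illustrates how one list and one predicate divide the directory into the pinned half and the searchable half.&lt;&#x2F;p&gt;

```typescript
// The pinned set, as declared in context.ts.
const PINNED = ["overview.md", "conventions.md"];

function isPinned(name: string): boolean {
  return PINNED.includes(name);
}

// Hypothetical helper: divide the markdown files in .AGENTS/ into the
// files that load on every turn and the files that are only searchable.
export function splitContext(names: string[]) {
  const markdown = names.filter((name) => name.endsWith(".md"));
  return {
    pinned: markdown.filter((name) => isPinned(name)),
    searchable: markdown.filter((name) => !isPinned(name)),
  };
}
```

&lt;p&gt;Under this sketch, adding a file to the directory makes it searchable by default. Pinning a file is the deliberate act, which matches the intent of keeping the pinned set tiny.&lt;&#x2F;p&gt;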
&lt;h2 id=&quot;the-retriever&quot;&gt;The retriever&lt;&#x2F;h2&gt;
&lt;p&gt;The searchable half needs an index and a ranker. The whole thing is one new file, &lt;code&gt;retriever.ts&lt;&#x2F;code&gt;, and it does not import a database.&lt;&#x2F;p&gt;
&lt;p&gt;Start with the shape of an indexed chunk.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;export type Chunk = {
  path: string;            &amp;#x2F;&amp;#x2F; .AGENTS&amp;#x2F;security.md
  heading: string;         &amp;#x2F;&amp;#x2F; &amp;quot;Secrets&amp;quot;
  text: string;            &amp;#x2F;&amp;#x2F; the section body, heading included
  length: number;          &amp;#x2F;&amp;#x2F; token count, for length normalization
  tf: Map&amp;lt;string, number&amp;gt;; &amp;#x2F;&amp;#x2F; term -&amp;gt; frequency within this chunk
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A chunk is a section, not a file and not a fixed-size window. Markdown already tells you where the meaningful boundaries are: they are the headings. Splitting on headings means a chunk is a coherent unit a human wrote on purpose, with a title that describes it. That title becomes the chunk’s &lt;code&gt;heading&lt;&#x2F;code&gt;, which is both a retrieval signal and the label the agent cites back to the user.&lt;&#x2F;p&gt;
&lt;p&gt;The chunker walks the file line by line, starts a new chunk at every heading, and tokenizes each section as it closes.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;function chunkMarkdown(path: string, raw: string): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = &amp;quot;(intro)&amp;quot;;
  let buf: string[] = [];

  const flush = () =&amp;gt; {
    const body = buf.join(&amp;quot;\n&amp;quot;).trim();
    if (body.length === 0) return;
    const tf = new Map&amp;lt;string, number&amp;gt;();
    let length = 0;
    for (const tok of tokenize(body)) {
      tf.set(tok, (tf.get(tok) ?? 0) + 1);
      length++;
    }
    chunks.push({ path, heading, text: body, length, tf });
  };

  for (const line of raw.split(&amp;quot;\n&amp;quot;)) {
    const m = &amp;#x2F;^#{1,6}\s+(.*)$&amp;#x2F;.exec(line);
    if (m) {
      flush();
      heading = m[1].trim();
      buf = [line];
    } else {
      buf.push(line);
    }
  }
  flush();
  return chunks;
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Tokenization lowercases the text, splits on non-alphanumeric characters, and drops single characters and a short list of stop words. Nothing clever. The point is to count terms, and the ranker does the rest.&lt;&#x2F;p&gt;
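&lt;p&gt;The post does not show the tokenizer itself, so the following is one possible &lt;code&gt;tokenize&lt;&#x2F;code&gt; that matches the description, not the repo’s implementation. The stop list here is an assumption chosen for illustration, and the split treats anything outside ASCII letters and digits as a separator.&lt;&#x2F;p&gt;

```typescript
// Assumed stop list for illustration; the repo's actual list is not shown.
const STOP_WORDS = new Set(["an", "and", "the", "of", "to", "in", "is", "it", "for", "on", "or"]);

// Lowercase, split on non-alphanumeric runs, then drop single characters
// (this also removes the empty strings that split() leaves at the edges)
// and stop words. Non-ASCII letters act as separators in this sketch.
export function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((tok) => tok.length > 1)
    .filter((tok) => !STOP_WORDS.has(tok));
}
```

&lt;p&gt;The same function has to run over both the chunks and the query. If the two sides tokenize differently, terms that should match will not.&lt;&#x2F;p&gt;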
&lt;p&gt;The ranker is BM25. It is the function that powers Lucene, Elasticsearch, and most of the search you have ever used that was not a neural model, and it has been the strong baseline in information retrieval for thirty years. The formula rewards a chunk for containing a query term often, with two corrections that matter. Term frequency saturates, so the tenth occurrence of a word counts for much less than the second. And the score is normalized by chunk length, so a long section does not win simply for being long. Rare terms, the ones that actually discriminate between sections, are weighted up through inverse document frequency.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;const K1 = 1.5;
const B = 0.75;

export function search(index: Index, query: string, k = 3): Hit[] {
  const terms = tokenize(query);
  if (terms.length === 0 || index.chunks.length === 0) return [];

  const N = index.chunks.length;
  const hits: Hit[] = [];

  for (const chunk of index.chunks) {
    let score = 0;
    for (const term of terms) {
      const tf = chunk.tf.get(term);
      if (!tf) continue;
      const df = index.df.get(term) ?? 0;
      const idf = Math.log(1 + (N - df + 0.5) &amp;#x2F; (df + 0.5));
      const denom = tf + K1 * (1 - B + B * (chunk.length &amp;#x2F; (index.avgLength || 1)));
      score += idf * (tf * (K1 + 1)) &amp;#x2F; denom;
    }
    if (score &amp;gt; 0) hits.push({ chunk, score });
  }

  hits.sort((a, b) =&amp;gt; b.score - a.score);
  return hits.slice(0, k);
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
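&lt;p&gt;To see the two corrections in isolation, the per-term contribution from the loop above can be pulled into a standalone function. This is a sketch rather than code from the repo: &lt;code&gt;termScore&lt;&#x2F;code&gt; and its parameter names are hypothetical, and it reuses the same &lt;code&gt;K1&lt;&#x2F;code&gt; and &lt;code&gt;B&lt;&#x2F;code&gt; values.&lt;&#x2F;p&gt;

```typescript
const K1 = 1.5;
const B = 0.75;

// Hypothetical helper: the score one query term contributes to one chunk.
//   tf     occurrences of the term in the chunk
//   df     number of chunks that contain the term
//   n      total number of chunks
//   len    token count of this chunk
//   avgLen average token count across all chunks
export function termScore(tf: number, df: number, n: number, len: number, avgLen: number): number {
  // Rare terms get a larger weight.
  const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
  // Longer-than-average chunks get a larger denominator, so a lower score.
  const norm = K1 * (1 - B + B * (len / (avgLen || 1)));
  // tf appears in both numerator and denominator, so the ratio saturates.
  return (idf * (tf * (K1 + 1))) / (tf + norm);
}
```

&lt;p&gt;For a chunk of average length, the multiplier on &lt;code&gt;idf&lt;&#x2F;code&gt; rises from 1.00 at one occurrence to about 1.43 at two, but only from about 2.14 to about 2.17 between nine and ten, and it never reaches &lt;code&gt;K1 + 1 = 2.5&lt;&#x2F;code&gt;. Doubling the chunk length with everything else fixed lowers the score.&lt;&#x2F;p&gt;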
&lt;p&gt;That is the entire ranking engine. No model to download, no service to call, no index to host. &lt;code&gt;buildIndex&lt;&#x2F;code&gt; reads the non-pinned markdown files at startup, chunks them, and computes document frequencies and the average chunk length once. The result lives in memory for the life of the process.&lt;&#x2F;p&gt;
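&lt;p&gt;The post does not show &lt;code&gt;buildIndex&lt;&#x2F;code&gt;, so here is a sketch of the statistics step only, under simplifying assumptions. The &lt;code&gt;corpusStats&lt;&#x2F;code&gt; function and the &lt;code&gt;CountedChunk&lt;&#x2F;code&gt; shape are hypothetical: each chunk carries its token list instead of the &lt;code&gt;tf&lt;&#x2F;code&gt; map, and file reading is left out.&lt;&#x2F;p&gt;

```typescript
// Hypothetical minimal chunk shape: only the fields the statistics need.
type CountedChunk = { length: number; terms: string[] };

export function corpusStats(chunks: CountedChunk[]) {
  // term -> number of chunks that contain it at least once
  const df = new Map([] as [string, number][]);
  let totalLength = 0;

  for (const chunk of chunks) {
    totalLength += chunk.length;
    // De-duplicate first: document frequency counts chunks, not occurrences.
    for (const term of Array.from(new Set(chunk.terms))) {
      df.set(term, (df.get(term) ?? 0) + 1);
    }
  }

  // Guard the empty corpus so the average is never NaN.
  const avgLength = chunks.length === 0 ? 0 : totalLength / chunks.length;
  return { df, avgLength };
}
```

&lt;p&gt;Both values depend on the whole corpus, which is why they are computed once after chunking rather than inside &lt;code&gt;search&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;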
&lt;h2 id=&quot;the-tool&quot;&gt;The tool&lt;&#x2F;h2&gt;
&lt;p&gt;Wrapping the retriever as a tool is the part that earns this post its title. The retriever does not get a special pathway into the loop. It gets registered like every other tool, with a name, an input schema, an output cap, and a handler.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;export function makeContextSearch(index: Index): Tool {
  return {
    name: &amp;quot;context_search&amp;quot;,
    description:
      &amp;quot;Search the project&amp;#x27;s on-demand .AGENTS&amp;#x2F; context for sections relevant &amp;quot; +
      &amp;quot;to a query. Returns the best-matching documentation sections, each &amp;quot; +
      &amp;quot;wrapped in a &amp;lt;context&amp;gt; block with its source path and heading. Use &amp;quot; +
      &amp;quot;this when you need project conventions, glossary terms, or architecture &amp;quot; +
      &amp;quot;notes that are not already in your system prompt. The system prompt &amp;quot; +
      &amp;quot;lists which sources are searchable.&amp;quot;,
    input_schema: {
      type: &amp;quot;object&amp;quot;,
      properties: {
        query: { type: &amp;quot;string&amp;quot;, description: &amp;quot;Natural-language description of what you need.&amp;quot; },
        k: { type: &amp;quot;number&amp;quot;, description: &amp;quot;Maximum number of sections to return. Defaults to 3.&amp;quot; },
      },
      required: [&amp;quot;query&amp;quot;],
    },
    max_output_bytes: 16 * 1024,
    run: async ({ query, k }) =&amp;gt; {
      const q = String(query ?? &amp;quot;&amp;quot;).trim();
      if (q.length === 0) return { ok: false, error: &amp;quot;query is required&amp;quot; };

      const limit = typeof k === &amp;quot;number&amp;quot; &amp;amp;&amp;amp; k &amp;gt;= 1 ? Math.floor(k) : 3;
      const hits = search(index, q, limit);
      if (hits.length === 0) return { ok: true, value: `no matching context for: ${q}` };

      return {
        ok: true,
        value: hits
          .map(
            (h) =&amp;gt;
              `&amp;lt;context path=&amp;quot;${h.chunk.path}&amp;quot; section=&amp;quot;${h.chunk.heading}&amp;quot; score=&amp;quot;${h.score.toFixed(2)}&amp;quot;&amp;gt;\n` +
              `${h.chunk.text.trim()}\n&amp;lt;&amp;#x2F;context&amp;gt;`,
          )
          .join(&amp;quot;\n\n&amp;quot;),
      };
    },
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;makeContextSearch&lt;&#x2F;code&gt; is a factory rather than a plain export because the tool closes over the index that &lt;code&gt;buildIndex&lt;&#x2F;code&gt; produced at startup. That is the only structural difference between this tool and the four that came before it. Everything downstream is identical. The registry caps its output at &lt;code&gt;max_output_bytes&lt;&#x2F;code&gt;. The loop dispatches it in the same &lt;code&gt;Promise.all&lt;&#x2F;code&gt; as any other call from the turn. A miss returns an ordinary &lt;code&gt;ok&lt;&#x2F;code&gt; result with a “no matching context” string, not an error, because an empty search is a fact about the corpus, not a failure of the tool.&lt;&#x2F;p&gt;
&lt;p&gt;The rendered hit carries &lt;code&gt;path&lt;&#x2F;code&gt;, &lt;code&gt;section&lt;&#x2F;code&gt;, and &lt;code&gt;score&lt;&#x2F;code&gt;. The agent reads the section, answers the question, and cites the source the same way it learned to cite pinned context in post four. The score is there for the operator. When you read the trace and wonder why the agent pulled a particular section, the number tells you how strong the match was.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;wiring-it-into-the-loop&quot;&gt;Wiring it into the loop&lt;&#x2F;h2&gt;
&lt;p&gt;Three changes in &lt;code&gt;agent.ts&lt;&#x2F;code&gt;, and the loop body does not move.&lt;&#x2F;p&gt;
&lt;p&gt;The registry construction becomes a function, because one tool now depends on runtime state.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;function buildRegistry(index: Index): Registry {
  return new Registry()
    .register(shell)
    .register(fs_read)
    .register(http_get)
    .register(git)
    .register(makeContextSearch(index));
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The system prompt gains a manifest. The agent cannot search a corpus it does not know exists, so the prompt lists the searchable sources and their section headings. The manifest is small, a few dozen tokens, and it is the difference between an agent that calls &lt;code&gt;context_search&lt;&#x2F;code&gt; on the right turn and one that never calls it at all.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;const m = manifest(index);
if (m) {
  parts.push(
    &amp;quot;More project context is available on demand. Call context_search &amp;quot; +
    &amp;quot;with a natural-language query to pull the relevant sections instead &amp;quot; +
    &amp;quot;of guessing. Searchable sources and their sections:\n\n&amp;quot; + m,
  );
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
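&lt;p&gt;The &lt;code&gt;manifest&lt;&#x2F;code&gt; function is not shown in the post. A minimal sketch of what it could look like, assuming it only needs each chunk’s &lt;code&gt;path&lt;&#x2F;code&gt; and &lt;code&gt;heading&lt;&#x2F;code&gt;, follows. The name &lt;code&gt;sketchManifest&lt;&#x2F;code&gt;, the &lt;code&gt;Section&lt;&#x2F;code&gt; type, and the one-line-per-file output format are all assumptions.&lt;&#x2F;p&gt;

```typescript
// Hypothetical input shape: the two chunk fields a manifest needs.
type Section = { path: string; heading: string };

// Group headings by file and render one line per file. An empty corpus
// yields an empty string, so a caller can skip the manifest with `if (m)`.
export function sketchManifest(sections: Section[]): string {
  const byPath = new Map([] as [string, string[]][]);
  for (const section of sections) {
    const headings = byPath.get(section.path) ?? [];
    headings.push(section.heading);
    byPath.set(section.path, headings);
  }
  return Array.from(byPath.entries())
    .map(([path, headings]) => `- ${path}: ${headings.join(", ")}`)
    .join("\n");
}
```

&lt;p&gt;Listing the headings verbatim matters under this design, because the model phrases its query from the words it sees in the manifest, and the lexical ranker can only match words that appear in the corpus.&lt;&#x2F;p&gt;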
&lt;p&gt;And &lt;code&gt;run&lt;&#x2F;code&gt; builds the index at startup, alongside the pinned load it already did.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;const ctx = await loadContext();
const index = await buildIndex();
const registry = buildRegistry(index);
const system = systemPrompt(ctx, index);
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The pinned files print as &lt;code&gt;# pinned&lt;&#x2F;code&gt;, and the searchable corpus prints its size, so the operator sees both halves of the context layer before the agent runs. That is the whole wiring change. The dispatcher, the parallel &lt;code&gt;Promise.all&lt;&#x2F;code&gt;, the iteration budget, the error envelopes are untouched. The loop still does not know what a tool does. It only knows how to call one.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-it-looks-like-running&quot;&gt;What it looks like running&lt;&#x2F;h2&gt;
&lt;p&gt;The manifest tells the agent what is searchable. The agent decides when to reach for it.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;$ bun agent.ts &amp;quot;am I allowed to write files or print the contents of .env?&amp;quot;
# pinned .AGENTS&amp;#x2F;overview.md (1002B)
# pinned .AGENTS&amp;#x2F;conventions.md (1339B)
# searchable 11 sections across 3 files
&amp;gt; context_search {&amp;quot;query&amp;quot;:&amp;quot;writing files and printing secret env files&amp;quot;} -&amp;gt; ok
No, on both counts.

- Printing .env is prohibited. The security policy excludes .env,
  secrets.env, and any .key, .pem, or .p12 file, and says to stop and
  explain rather than print it (.AGENTS&amp;#x2F;security.md, &amp;quot;Secrets&amp;quot;).
- There is no write tool in this harness, and the policy says to describe
  the change and let the operator apply it instead
  (.AGENTS&amp;#x2F;security.md, &amp;quot;Filesystem boundaries&amp;quot;).
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;One tool call. The rule lived in &lt;code&gt;security.md&lt;&#x2F;code&gt;, which is not in the prompt, and the agent retrieved exactly the two sections it needed and cited both by file and heading. The pinned &lt;code&gt;conventions.md&lt;&#x2F;code&gt; told it that rules exist and to prefer rejecting over guessing. The searchable &lt;code&gt;security.md&lt;&#x2F;code&gt; told it which rule. The same harness pointed at a forty-file &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt; behaves identically. It pays for two pinned files and one search, not for forty files on a turn that touched one of them.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-not-embeddings&quot;&gt;Why not embeddings&lt;&#x2F;h2&gt;
&lt;p&gt;This is the section the vector database vendor disagrees with, so it is worth being precise about where they are right.&lt;&#x2F;p&gt;
&lt;p&gt;Embeddings solve a real problem: vocabulary mismatch. A lexical ranker scores on the words that are actually present. If the user asks “how do I ship this” and the document says “deployment procedure,” BM25 sees no shared terms and scores zero. An embedding model maps both phrases into a vector space where “ship” and “deployment” sit close together, and the search finds the document anyway. When you cannot control the words in the query, and you cannot control the words in the corpus, semantic search is the answer. That is most consumer search, most support-ticket retrieval, most search over a corpus written by people who are not you.&lt;&#x2F;p&gt;
&lt;p&gt;A &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt; directory is none of those things. You control the corpus. You wrote it. The query comes from a model that has the manifest in its prompt, so it knows the corpus uses the word “deployment” and phrases its search accordingly. The vocabulary-mismatch problem that justifies embeddings is a problem you can edit away by writing the glossary heading as the word people search for. At this scale the docs and the queries share an author and a vocabulary, and lexical ranking is not a weaker version of the right answer. It is the right answer.&lt;&#x2F;p&gt;
&lt;p&gt;The cost of reaching for embeddings anyway is not zero, and it is worth naming.&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Concern&lt;&#x2F;th&gt;&lt;th&gt;Lexical (BM25)&lt;&#x2F;th&gt;&lt;th&gt;Embeddings + vector store&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Infrastructure&lt;&#x2F;td&gt;&lt;td&gt;None. An in-memory index.&lt;&#x2F;td&gt;&lt;td&gt;An embedding model and a vector database to host.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Build cost&lt;&#x2F;td&gt;&lt;td&gt;Read files, count terms. Milliseconds.&lt;&#x2F;td&gt;&lt;td&gt;Embed every chunk. A model call per chunk.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Staleness&lt;&#x2F;td&gt;&lt;td&gt;Rebuild is reading the directory again.&lt;&#x2F;td&gt;&lt;td&gt;Re-embed and re-upsert on every document edit.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Debuggability&lt;&#x2F;td&gt;&lt;td&gt;Read the score. See which terms matched.&lt;&#x2F;td&gt;&lt;td&gt;Cosine distance in a space you cannot inspect.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Failure mode&lt;&#x2F;td&gt;&lt;td&gt;Misses on vocabulary mismatch.&lt;&#x2F;td&gt;&lt;td&gt;Retrieves plausible-but-wrong neighbors silently.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Right scale&lt;&#x2F;td&gt;&lt;td&gt;Tens of files, hundreds of chunks.&lt;&#x2F;td&gt;&lt;td&gt;Tens of thousands of documents and up.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;The last row is the whole decision. Embeddings earn their keep when the corpus is too large for a human to have written with consistent vocabulary, when the queries come from people you do not control, or when you genuinely need to match across languages or heavy paraphrase. None of that is true of a project’s own context directory, and all of it is true of a knowledge base with fifty thousand pages. The mistake is not in using embeddings. It is in using them at a scale where a thirty-year-old ranking function does the job with no operational surface at all.&lt;&#x2F;p&gt;
&lt;p&gt;When &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt; grows past the point where lexical ranking holds, the tool boundary is exactly where you swap the implementation. &lt;code&gt;context_search&lt;&#x2F;code&gt; keeps its name, its schema, and its contract. The retriever behind it grows an embedding index. The agent never knows the difference, and neither does the loop. That is the payoff of treating retrieval as a tool: the upgrade path is a change behind an interface, not a new system.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;failure-modes-worth-naming&quot;&gt;Failure modes worth naming&lt;&#x2F;h2&gt;
&lt;p&gt;The first time you hit one of these you will suspect the retriever. Usually the retriever is fine and the corpus or the prompt needs an edit.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Under-retrieval.&lt;&#x2F;strong&gt; The agent answers from priors instead of calling &lt;code&gt;context_search&lt;&#x2F;code&gt;, and gets the project-specific detail wrong. The cause is almost always a weak manifest or a missing instruction. The pinned &lt;code&gt;conventions.md&lt;&#x2F;code&gt; should say, in the negative voice from post four, that the agent must search project context before guessing. The manifest should list headings specific enough that the model recognizes the turn needs them.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Over-retrieval.&lt;&#x2F;strong&gt; The agent calls &lt;code&gt;context_search&lt;&#x2F;code&gt; on every turn, including turns that the pinned context already answers. This costs a tool round-trip and some latency. The fix is the manifest again, framed so the model treats search as the fallback for topical detail, not the default for everything.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Stale index.&lt;&#x2F;strong&gt; &lt;code&gt;buildIndex&lt;&#x2F;code&gt; runs once at startup. A long-running agent process will not see edits to &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt; until it restarts. For a CLI that starts per invocation this never matters. For a daemon it does, and the fix is to watch the directory and rebuild, which is a few lines and a problem for the day you actually run a daemon, not before.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Chunk boundaries.&lt;&#x2F;strong&gt; A rule that spans two headings gets split across two chunks, and a top-k search returns one half. The fix is editorial: keep a rule under one heading. The heading-based chunker rewards documents that are organized the way you would organize them for a human reader anyway.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Vocabulary mismatch.&lt;&#x2F;strong&gt; The lexical failure mode the embeddings section described. A query shares no terms with the section that answers it, and the search scores zero. At &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt; scale the fix is to add the missing word to the heading or the body. You own the corpus. Make it say what people search for.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Manifest bloat.&lt;&#x2F;strong&gt; Forty searchable files, each with eight headings, is a manifest of more than three hundred lines, and now the thing you built to keep context out of the prompt is itself a large block of context in the prompt. The fix is to summarize: list files and a short description rather than every heading, once the directory is large enough that the full listing stops paying for itself.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-this-layer-does-not-solve&quot;&gt;What this layer does not solve&lt;&#x2F;h2&gt;
&lt;p&gt;The pattern from the earlier posts holds. Each layer earns the right to be small by deferring what does not belong to it.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Embeddings for large corpora.&lt;&#x2F;strong&gt; The tool boundary is built to absorb this. The day the corpus outgrows lexical ranking, the retriever behind &lt;code&gt;context_search&lt;&#x2F;code&gt; changes and nothing else does. This post does not build it because nothing at &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt; scale needs it.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Per-turn token budgeting.&lt;&#x2F;strong&gt; Retrieved sections compete for the same window as the conversation history and the pinned context. Deciding how many sections to pull and how to evict them under pressure is a budgeting problem this post leaves at a fixed &lt;code&gt;k&lt;&#x2F;code&gt;. Post eighteen on economics returns to it.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Memory across turns and sessions.&lt;&#x2F;strong&gt; Retrieval reads context the project wrote. It does not write anything. A note the agent leaves for its future self is a different layer with a different lifetime and different write semantics, and it is the whole subject of the next post.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Shared context across many agents.&lt;&#x2F;strong&gt; A fleet may want its context to live in a service rather than in every repo. The MCP &lt;code&gt;resources&#x2F;list&lt;&#x2F;code&gt; and &lt;code&gt;resources&#x2F;read&lt;&#x2F;code&gt; shape fits behind the same tool boundary. Post eight picks that up.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;where-this-lands-in-the-platform&quot;&gt;Where this lands in the platform&lt;&#x2F;h2&gt;
&lt;p&gt;Total damage going from &lt;code&gt;post-04&lt;&#x2F;code&gt; to &lt;code&gt;post-05&lt;&#x2F;code&gt;: one new file (&lt;code&gt;retriever.ts&lt;&#x2F;code&gt;), one new tool (&lt;code&gt;tools&#x2F;context_search.ts&lt;&#x2F;code&gt;), a split in &lt;code&gt;context.ts&lt;&#x2F;code&gt; between pinned and searchable, and three changes in &lt;code&gt;agent.ts&lt;&#x2F;code&gt;. The loop, the registry internals, the sandbox, the four original tools, and the types are all untouched. The agent now pays a fixed cost for the context that is always true, and a variable cost, incurred only when it decides to search, for the context that is only sometimes relevant.&lt;&#x2F;p&gt;
&lt;p&gt;The retriever sits on top of the static layer from post four, exactly as that post predicted it would. The pinned files are the static context. The searchable files are the corpus. The tool is the seam between them, and the seam is an entry in the registry from &lt;a href=&quot;&#x2F;articles&#x2F;tools-how-agents-actually-do-things&#x2F;&quot;&gt;post three&lt;&#x2F;a&gt;, dispatched and capped and error-wrapped by machinery that already existed. No vector store joined the deployment. No second system joined the on-call rotation. Retrieval became a tool, because the harness was already the kind of thing that turns a function into a capability the model can reach.&lt;&#x2F;p&gt;
&lt;p&gt;The rule from the earlier posts still holds. The harness only ever grows; it does not get rewritten. Each post adds one capability to the same artifact and explains why the layer below was not enough. The layer below this one was a context loader that read everything every turn. This one reads what the turn needs. The layer above is the one that writes.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;next&quot;&gt;Next&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;strong&gt;Part 6: Memory Is a Write Path.&lt;&#x2F;strong&gt; Retrieval reads context the project authored. Memory is the agent writing context for its own future turns and future sessions. Why the write path is the hard part, where session memory and cross-session memory diverge, and what it takes to let an agent remember without letting it corrupt its own ground truth.&lt;&#x2F;p&gt;
</description>
      </item>
      <item>
          <title>Context Is the Product</title>
          <pubDate>Sat, 13 Jun 2026 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/context-is-the-product/</link>
          <guid>https://raskell.io/articles/context-is-the-product/</guid>
          <description xml:base="https://raskell.io/articles/context-is-the-product/">&lt;blockquote&gt;
&lt;p&gt;Part 4 of &lt;em&gt;The Agent Platform Handbook. From Loop to Platform.&lt;&#x2F;em&gt; Previous: &lt;a href=&quot;&#x2F;articles&#x2F;tools-how-agents-actually-do-things&#x2F;&quot;&gt;Tools: How Agents Actually Do Things&lt;&#x2F;a&gt;. Next: &lt;a href=&quot;&#x2F;articles&#x2F;retrieval-is-a-tool-not-a-layer&#x2F;&quot;&gt;Retrieval Is a Tool, Not a Layer&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;In &lt;a href=&quot;&#x2F;articles&#x2F;what-an-agent-actually-is&#x2F;&quot;&gt;post one&lt;&#x2F;a&gt; we built the agent harness: a loop, a one-tool registry, a system prompt, a dispatcher, and an iteration budget. In &lt;a href=&quot;&#x2F;articles&#x2F;your-agent-wants-root&#x2F;&quot;&gt;post two&lt;&#x2F;a&gt; we slid a sandboxed runtime underneath the shell tool without touching the loop. In &lt;a href=&quot;&#x2F;articles&#x2F;tools-how-agents-actually-do-things&#x2F;&quot;&gt;post three&lt;&#x2F;a&gt; we promoted the registry into a real toolbox and let the model call several tools in one turn. The harness sits at tag &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;the-agent-platform-handbook&#x2F;tree&#x2F;post-03&quot;&gt;&lt;code&gt;post-03&lt;&#x2F;code&gt;&lt;&#x2F;a&gt; today: a loop, a registry, four sandboxed tools, parallel dispatch, output caps, error envelopes. The pieces fit. The agent answers concrete questions about the directory it is pointed at.&lt;&#x2F;p&gt;
&lt;p&gt;It also walks into every conversation knowing nothing.&lt;&#x2F;p&gt;
&lt;p&gt;You can hand the same harness to ten different projects this afternoon. It will run. It will not know that your repo uses Bun and not Node, that your team writes “do not” instead of “don’t”, that &lt;code&gt;git push --force&lt;&#x2F;code&gt; is a fireable offense on the &lt;code&gt;main&lt;&#x2F;code&gt; branch, that the README is two years out of date, that your secrets live in &lt;code&gt;1Password:&#x2F;&#x2F;Engineering&lt;&#x2F;code&gt;, or that the term “Zentinel” in your prompt means a specific WAF tuner and not a generic security product. It will figure most of that out by running tools, slowly, and it will get the rest of it wrong.&lt;&#x2F;p&gt;
&lt;p&gt;The model is not what makes an agent good at your work. The context is. This is the post that builds that layer into the same harness, ships it as tag &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;the-agent-platform-handbook&#x2F;tree&#x2F;post-04&quot;&gt;&lt;code&gt;post-04&lt;&#x2F;code&gt;&lt;&#x2F;a&gt;, and explains why the convention is converging on a directory called &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt; rather than a single file.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-the-model-is-not-the-product&quot;&gt;Why the model is not the product&lt;&#x2F;h2&gt;
&lt;p&gt;The frontier model market in mid-2026 has roughly five vendors shipping roughly comparable coding agents. Pick any of them. You can swap them at the loop boundary and most of your agents keep working. What does not swap is the system prompt, the tool definitions, the project documentation, the conventions, the glossary, and the working knowledge the model brings into every turn. Those are yours. They are also the difference between an agent that ships and an agent that hallucinates.&lt;&#x2F;p&gt;
&lt;p&gt;This is not a new observation in the field, but it is a recent one in the product narrative. For most of 2023 and 2024 the public story was “the new model is twice as smart.” The internal story for anyone running agents in production was “the old model is fine; we cannot get useful context into it cheaply enough.” The same evals that the vendor leaderboards measure are the ones a small team can saturate with a half-decent context strategy, on a year-old model, for an order of magnitude less money per call.&lt;&#x2F;p&gt;
&lt;p&gt;You can hold this in your head with one frame. The model is software you rent. The harness is software you write. The context is the only piece that is specific to your team, your project, your codebase. The model is replaceable. The context is what makes the replacement work at all.&lt;&#x2F;p&gt;
&lt;p&gt;That is the thesis of this post and it determines everything that follows.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-three-sources-of-context&quot;&gt;The three sources of context&lt;&#x2F;h2&gt;
&lt;p&gt;An agent gets context from three places, and the engineering decisions you make at each layer are different.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;Sources of context for an agent turn&quot;&gt;   +-----------------------+        what the operator pre-loads
   |   Static context      |        before the first turn ever runs.
   |  --------------       |        cheap to author, expensive to fix
   |  system prompt        |        once a million calls have run on
   |  project rules        |        it. ships with the harness, lives
   |  glossary, conventions|        in git, reviewed by humans.
   +-----------+-----------+
               |
               v
   +-----------------------+        what arrives during the turn,
   |   Dynamic context     |        per request, often per tool call.
   |  --------------       |        the agent decides what to fetch.
   |  retrieved docs       |        the model never sees the source
   |  tool outputs         |        of truth, only the slice the
   |  fetched URLs         |        retriever returned.
   +-----------+-----------+
               |
               v
   +-----------------------+        what survives across turns and
   |   Persistent context  |        across sessions. notes the agent
   |  --------------       |        writes for its future self. covered
   |  session memory       |        in post six. mentioned here only
   |  cross-session memory |        so the diagram is complete.
   +-----------------------+&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Static context is the layer this post is about. It is the cheapest layer to get right and the one most teams skip because it feels like documentation rather than engineering. It is also the layer that compounds: every other layer of context loads on top of it, and a confused static layer makes every retrieval ambiguous and every tool output harder to interpret.&lt;&#x2F;p&gt;
&lt;p&gt;Dynamic context is what post three started, with &lt;code&gt;fs_read&lt;&#x2F;code&gt; and &lt;code&gt;http_get&lt;&#x2F;code&gt;. It is also what RAG systems do at industrial scale. The model decides what it needs and a tool fetches it. We will come back to dynamic context as its own design problem in &lt;a href=&quot;&#x2F;articles&#x2F;retrieval-is-a-tool-not-a-layer&#x2F;&quot;&gt;post five&lt;&#x2F;a&gt; when we look at retrieval, embedding stores, and the per-task subsetting that lets a large repository fit into a small turn.&lt;&#x2F;p&gt;
&lt;p&gt;Persistent context is memory. Notes the agent writes to itself. A session log. A long-running working set the next conversation gets to read. &lt;a href=&quot;#&quot;&gt;Post six&lt;&#x2F;a&gt; is dedicated to it because the engineering story is genuinely different, and the design choices are larger than they look.&lt;&#x2F;p&gt;
&lt;p&gt;The honest version of the rule is that static context is necessary but not sufficient. Get it right and the other two layers have a fighting chance. Get it wrong and no amount of retrieval saves you.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;how-we-got-here&quot;&gt;How we got here&lt;&#x2F;h2&gt;
&lt;p&gt;Static context started life as the system prompt. A string at the top of a Python file. Six sentences. “You are a helpful assistant who specializes in customer support for an airline.”&lt;&#x2F;p&gt;
&lt;p&gt;That worked until it did not. The pattern that broke it was the same pattern every working agent now lives inside: as soon as a model has tools, the operator has to tell it which tool to pick for which job, what the inputs mean, what the failure modes are, what the project’s conventions are, and what is out of scope. Six sentences cannot hold any of that.&lt;&#x2F;p&gt;
&lt;p&gt;The industry tried to fix it in four moves, roughly in order.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Long system prompts (2022 to 2023).&lt;&#x2F;strong&gt; The first move was to make the string longer. Six sentences became six paragraphs. Then six pages. The OpenAI Cookbook and the early LangChain prompts shipped with the model’s role, examples, rules of engagement, output format, and a list of constraints. It worked, in the sense that the agent behaved better. It also produced a maintenance problem: when the team grew, the prompt drifted, nobody owned it, and small edits broke distant behaviors. Prompts were software with no test suite.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Retrieval-augmented generation (2020 to 2024).&lt;&#x2F;strong&gt; The original RAG paper by Lewis and others at FAIR landed in 2020, two years before ChatGPT, and described a system that fetched relevant documents from a corpus and concatenated them into the model’s input. By 2023 the pattern was the default for any system that needed to answer over a private corpus. By 2024 every vector database vendor had a product and a marketing budget. RAG solved the corpus problem. It did not solve the conventions problem. A retrieval system can pull “the most relevant page about how to deploy” out of fifty thousand pages of docs. It does not know that your deploy convention changed last week and the relevant page is three weeks out of date.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Long context windows (2024).&lt;&#x2F;strong&gt; Gemini announced a one-million-token context window in February 2024. Claude, which had offered two hundred thousand tokens since late 2023, later went to five hundred thousand and then a million. The new story was “you do not need retrieval; just put everything in the prompt.” It is true for some workloads. It is wildly expensive for most, because per-call cost scales linearly with input tokens, and prompt caching discounts that cost without removing it. Long contexts made some classes of problem tractable. They did not change the engineering question. The question is still “what should the model know on every turn, regardless of which slice of the corpus is loaded.”&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;The convention layer (2025 to 2026).&lt;&#x2F;strong&gt; The pattern that worked, and that converged across vendors, was a per-project file. The team puts the agent’s operating context in a markdown file at the root of the repo. Anthropic’s Claude Code shipped &lt;code&gt;CLAUDE.md&lt;&#x2F;code&gt; in early 2025. Cursor’s IDE agent shipped &lt;code&gt;.cursorrules&lt;&#x2F;code&gt; around the same time. OpenAI Codex normalized &lt;code&gt;AGENTS.md&lt;&#x2F;code&gt;. Aider had its own convention. Continue had another. Through 2025 the file proliferated and the names diverged. In early 2026 a working group of vendor engineers, after enough customers complained about maintaining four near-identical files in every repo, started consolidating on a single convention. The shape they landed on was a directory, not a file, called &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;The lineage from “long system prompt” to “.AGENTS&#x2F; directory” is one continuous line. Every step was a reaction to the previous step failing under scale. Each step kept what worked and added what was missing. The directory at the end is the smallest thing the industry could agree on that solves the problem the file did not.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-a-directory-and-not-a-file&quot;&gt;Why a directory and not a file&lt;&#x2F;h2&gt;
&lt;p&gt;If you have ever worked on a repo with a sprawling &lt;code&gt;CLAUDE.md&lt;&#x2F;code&gt; you know why the file is not enough.&lt;&#x2F;p&gt;
&lt;p&gt;A single markdown file in the repo root has three failure modes that show up almost immediately.&lt;&#x2F;p&gt;
&lt;p&gt;The first is conflict at scale. Two people edit the same file from different feature branches and the merge is painful enough that one of them gives up. The team stops writing context updates because the file is a contested resource.&lt;&#x2F;p&gt;
&lt;p&gt;The second is loss of structure. The file grows section by section. Some sections are conventions. Some are glossaries. Some are reminders about which scripts do what. Some are project history. The model reads it as one giant string and the operator cannot tell, six months later, which section is still load-bearing and which is dead.&lt;&#x2F;p&gt;
&lt;p&gt;The third is the inability to scope. The same file gets loaded for every task, regardless of whether the agent is doing a database migration or fixing a typo. A small repo can afford to pay for the whole file every turn. A large repo cannot. There is no way to say “load this section only when the task touches the API layer.”&lt;&#x2F;p&gt;
&lt;p&gt;A directory fixes all three. You get one file per concern. Conflicts move from line-level to file-level, which is easier. Structure becomes the directory layout, which is self-documenting. Scoping becomes a load-order decision that the loader can make per task, with budgets, with priorities, and with reproducible behavior. The convention is small and the engineering surface that wraps it is well-understood: read files, concatenate, cap.&lt;&#x2F;p&gt;
&lt;p&gt;There is a fourth, subtler reason. A directory is the natural seam between project context and agent identity. Project context belongs to the project and lives in git. Agent identity, the system prompt that says “you are a careful command-line assistant,” belongs to the harness and ships with it. A file in the repo root muddles those two. A directory lets the harness own the loader and the project own the contents, and they meet at a stable interface.&lt;&#x2F;p&gt;
&lt;p&gt;That is the architecture the rest of this post adds to the harness.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-mental-model&quot;&gt;The mental model&lt;&#x2F;h2&gt;
&lt;p&gt;Before the code, the picture.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;The context layer in the agent harness&quot;&gt;    repo on disk                       harness                       model
    -----------                        -------                       -----

    .AGENTS&amp;#x2F;                       +----------------+
    +-- overview.md      ----+     |                |
    +-- conventions.md   ----+----&amp;gt;|  loadContext   |
    +-- glossary.md      ----+     |                |
    +-- security.md      ----+     +-------+--------+
    +-- ...                                |
                                           | LoadedContext
                                           v
                                  +-----------------+         system prompt
                                  |  systemPrompt   |----------------------&amp;gt;
                                  |  CORE + ctx     |
                                  +-----------------+

                                           ^
                                           |
                                  +-----------------+
                                  |  loop &amp;#x2F; step    |
                                  +-----------------+&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Read it left to right. The project owns a directory of markdown files. The harness owns a loader that reads them at startup, applies a byte budget, and produces a &lt;code&gt;LoadedContext&lt;&#x2F;code&gt; object. The harness’s system-prompt builder takes the core prompt (the agent’s role, the toolbox description) and concatenates the rendered context block underneath it. The loop and step functions never know any of this happened. Same &lt;code&gt;step(messages, system)&lt;&#x2F;code&gt; call as before, with a different &lt;code&gt;system&lt;&#x2F;code&gt; string.&lt;&#x2F;p&gt;
&lt;p&gt;The loader has three responsibilities, and they are worth naming because each one is a place where teams get tempted to do something clever and pay for it.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Read the files.&lt;&#x2F;strong&gt; Deterministically, in a defined order. The order matters because the model attends to the start of the prompt differently than the middle. Put the high-signal files first.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Cap the total size.&lt;&#x2F;strong&gt; A byte budget across all sources. Anything that does not fit is either dropped or truncated with a visible marker, never silently rolled off.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Render with attribution.&lt;&#x2F;strong&gt; Each source is wrapped in a &lt;code&gt;&amp;lt;context path=&quot;...&quot;&amp;gt;&lt;&#x2F;code&gt; block so the model can tell the user which file the rule came from. This is cheap, costs maybe twenty tokens per source, and turns “the conventions say to never use the shell tool to write files” into a citation the operator can audit.&lt;&#x2F;p&gt;
&lt;p&gt;That is the whole thing. Nothing else belongs in the loader. Per-task filtering, embedding-based selection, dynamic retrieval, and memory are separate layers that the next two posts will build on top of this one.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;build-the-loader&quot;&gt;Build the loader&lt;&#x2F;h2&gt;
&lt;p&gt;The harness from &lt;code&gt;post-03&lt;&#x2F;code&gt; has six files. We add a seventh: &lt;code&gt;context.ts&lt;&#x2F;code&gt;. We also create the &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt; directory with three example sources.&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;code&gt;context.ts&lt;&#x2F;code&gt; module exports one function and the three types it works with. Roughly seventy lines.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;&amp;#x2F;&amp;#x2F; context.ts
import { readFile, readdir, stat } from &amp;quot;node:fs&amp;#x2F;promises&amp;quot;;
import { join } from &amp;quot;node:path&amp;quot;;

export type ContextSource = {
  path: string;
  bytes: number;
  content: string;
  truncated: boolean;
};

export type LoadedContext = {
  sources: ContextSource[];
  rendered: string;
  totalBytes: number;
  budgetBytes: number;
};

export type ContextOptions = {
  dir?: string;
  maxBytes?: number;
};

const DEFAULT_DIR = &amp;quot;.AGENTS&amp;quot;;
const DEFAULT_MAX_BYTES = 32 * 1024;
const PRIORITY = [&amp;quot;overview.md&amp;quot;, &amp;quot;conventions.md&amp;quot;, &amp;quot;glossary.md&amp;quot;];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Three types and three constants. &lt;code&gt;ContextSource&lt;&#x2F;code&gt; carries enough metadata for the operator to debug what got loaded. &lt;code&gt;LoadedContext&lt;&#x2F;code&gt; carries the rendered string for the system prompt plus accounting for the operator. &lt;code&gt;ContextOptions&lt;&#x2F;code&gt; is the only knob: a directory and a byte budget. Defaults match the convention.&lt;&#x2F;p&gt;
&lt;p&gt;The ordering policy is one short function.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;function orderEntries(entries: string[]): string[] {
  const present = new Set(entries);
  const head = PRIORITY.filter((n) =&amp;gt; present.has(n));
  const tail = entries
    .filter((n) =&amp;gt; !PRIORITY.includes(n))
    .filter((n) =&amp;gt; n.endsWith(&amp;quot;.md&amp;quot;))
    .sort();
  return [...head, ...tail];
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Three priority files load first in a fixed order: &lt;code&gt;overview.md&lt;&#x2F;code&gt;, &lt;code&gt;conventions.md&lt;&#x2F;code&gt;, &lt;code&gt;glossary.md&lt;&#x2F;code&gt;. Everything else loads alphabetically. The model attends to the start of the prompt more strongly than to the middle, so the policy puts the highest-signal files at the start and lets the rest tail in.&lt;&#x2F;p&gt;
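&lt;p&gt;A quick way to see the policy is to run it on a made-up listing. The sketch below repeats &lt;code&gt;PRIORITY&lt;&#x2F;code&gt; and &lt;code&gt;orderEntries&lt;&#x2F;code&gt; from above so it stands alone. The file names other than the three priority files are hypothetical.&lt;&#x2F;p&gt;

```typescript
const PRIORITY = ["overview.md", "conventions.md", "glossary.md"];

function orderEntries(entries: string[]): string[] {
  const present = new Set(entries);
  // Priority files first, in PRIORITY order, but only the ones that exist.
  const head = PRIORITY.filter((n) => present.has(n));
  // Remaining markdown files, alphabetically. Non-markdown entries are dropped.
  const tail = entries
    .filter((n) => !PRIORITY.includes(n))
    .filter((n) => n.endsWith(".md"))
    .sort();
  return [...head, ...tail];
}

// Hypothetical directory listing, in arbitrary readdir order.
// Note that conventions.md is missing and notes.txt is not markdown.
const listing = ["security.md", "glossary.md", "notes.txt", "overview.md", "api.md"];
const ordered = orderEntries(listing);
console.log(ordered.join(", "));
// overview.md, glossary.md, api.md, security.md
```

&lt;p&gt;Two behaviors fall out of the function as written: a missing priority file is skipped rather than treated as an error, and anything that is not markdown never reaches the loader.&lt;&#x2F;p&gt;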
&lt;p&gt;The load itself reads each file, accounts for it against the budget, and either takes it whole, truncates it, or stops.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;export async function loadContext(opts: ContextOptions = {}): Promise&amp;lt;LoadedContext&amp;gt; {
  const dir = opts.dir ?? process.env.AGENTS_DIR ?? DEFAULT_DIR;
  const budget = opts.maxBytes ?? DEFAULT_MAX_BYTES;

  if (!(await isDir(dir))) {
    return { sources: [], rendered: &amp;quot;&amp;quot;, totalBytes: 0, budgetBytes: budget };
  }

  const entries = await readdir(dir);
  const ordered = orderEntries(entries);

  const sources: ContextSource[] = [];
  let used = 0;

  for (const name of ordered) {
    const path = join(dir, name);
    const raw = await readFile(path, &amp;quot;utf8&amp;quot;);
    const bytes = Buffer.byteLength(raw, &amp;quot;utf8&amp;quot;);
    const remaining = budget - used;

    if (bytes &amp;lt;= remaining) {
      sources.push({ path, bytes, content: raw, truncated: false });
      used += bytes;
      continue;
    }

    if (remaining &amp;lt; 256) break;

    const head = raw.slice(0, remaining);
    const note = `\n\n[truncated: ${bytes - remaining} more bytes]`;
    sources.push({ path, bytes: remaining, content: head + note, truncated: true });
    used += remaining;
    break;
  }

  const rendered = sources
    .map((s) =&amp;gt; `&amp;lt;context path=&amp;quot;${s.path}&amp;quot;&amp;gt;\n${s.content.trimEnd()}\n&amp;lt;&amp;#x2F;context&amp;gt;`)
    .join(&amp;quot;\n\n&amp;quot;);

  return { sources, rendered, totalBytes: used, budgetBytes: budget };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
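&lt;p&gt;The listing calls &lt;code&gt;isDir&lt;&#x2F;code&gt;, which is not shown in this post. Since &lt;code&gt;stat&lt;&#x2F;code&gt; is imported at the top of &lt;code&gt;context.ts&lt;&#x2F;code&gt; and otherwise unused in the excerpts, a plausible version wraps it. What follows is a minimal sketch under that assumption, not necessarily the helper in the repo.&lt;&#x2F;p&gt;

```typescript
import { stat } from "node:fs/promises";

// Assumed helper: resolves to true only when `path` exists and is a directory.
// Any stat failure (missing path, permission error) is treated as "not a
// directory", so the caller can fall back to an empty context instead of throwing.
async function isDir(path: string) {
  try {
    return (await stat(path)).isDirectory();
  } catch {
    return false;
  }
}

// Usage: the current directory exists, the second path is assumed not to.
isDir(".").then((ok) => console.log("cwd is a directory:", ok));
isDir("./no-such-agents-dir").then((ok) => console.log("missing dir:", ok));
```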
&lt;p&gt;A few details to call out, because the cheap version of this loader gets every one of them wrong.&lt;&#x2F;p&gt;
&lt;p&gt;The directory defaults to &lt;code&gt;.AGENTS&lt;&#x2F;code&gt; but accepts an environment override through &lt;code&gt;AGENTS_DIR&lt;&#x2F;code&gt;. The override lets you point a single binary at a per-project context without rewriting the harness.&lt;&#x2F;p&gt;
&lt;p&gt;The budget defaults to thirty-two kilobytes. That is roughly eight thousand tokens at the usual rule of thumb of about four bytes per token for English text, which is generous on a modern model and cheap with prompt caching. Pick the budget your model and your wallet can sustain, not the budget that fits “everything you wrote.”&lt;&#x2F;p&gt;
&lt;p&gt;The truncation marker is visible. The model reads &lt;code&gt;[truncated: 4096 more bytes]&lt;&#x2F;code&gt; and knows the source was clipped. That matters because the model can decide to ask a tool to read the rest, which is exactly the behavior you want when a file is too big to fit in the prompt.&lt;&#x2F;p&gt;
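&lt;p&gt;One caveat on the clip itself: &lt;code&gt;raw.slice(0, remaining)&lt;&#x2F;code&gt; counts UTF-16 code units, while the budget is counted in bytes. The two agree for ASCII markdown and diverge once a file contains multi-byte characters. If that matters for your corpus, a byte-accurate cut could look like the sketch below. &lt;code&gt;truncateUtf8&lt;&#x2F;code&gt; is a hypothetical helper, not part of the harness.&lt;&#x2F;p&gt;

```typescript
import { Buffer } from "node:buffer";

// A UTF-8 continuation byte has the bit pattern 10xxxxxx.
const isContinuation = (byte: number): boolean => byte >> 6 === 0b10;

// Hypothetical helper: cut `text` to at most `maxBytes` bytes of UTF-8
// without ending in the middle of a multi-byte character.
function truncateUtf8(text: string, maxBytes: number): string {
  const buf = Buffer.from(text, "utf8");
  if (maxBytes >= buf.length) return text;
  let end = Math.max(0, maxBytes);
  // buf[end] is the first byte that would be dropped. If it is a
  // continuation byte, the cut would split a character, so back up.
  while (end > 0) {
    if (!isContinuation(buf.readUInt8(end))) break;
    end--;
  }
  return buf.subarray(0, end).toString("utf8");
}

// "\u00e9" (e with acute accent) occupies bytes 1 and 2 of this string,
// so a 2-byte budget has to stop after the "h".
const clipped = truncateUtf8("h\u00e9llo", 2);
console.log(clipped, Buffer.byteLength(clipped, "utf8"));
// h 1
```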
&lt;p&gt;The render wraps each source in a &lt;code&gt;&amp;lt;context path=&quot;...&quot;&amp;gt;&lt;&#x2F;code&gt; tag. The tag is not a special token. The model treats it as structure because it has seen similar shapes in its training data. Cite-back behavior comes for free.&lt;&#x2F;p&gt;
&lt;p&gt;The loader never errors on a missing directory. If &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt; does not exist, the loader returns an empty context and the harness falls back to the core prompt. That is the right default for a small project that has not opted in yet.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;a-sample-agents&quot;&gt;A sample &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;The repo at &lt;code&gt;post-04&lt;&#x2F;code&gt; ships with three sources. They are short on purpose. The point is the shape, not the content.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;.AGENTS&amp;#x2F;
├── overview.md       what this repo is, how it is organized
├── conventions.md    rules the agent must follow per tool
└── glossary.md       terms specific to this project
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The shape is the contract. Any agent that loads from &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt; should expect these three files. Anything else is project-specific. A security-sensitive project might add &lt;code&gt;security.md&lt;&#x2F;code&gt;. A monorepo might add &lt;code&gt;services.md&lt;&#x2F;code&gt; with one section per service. A multi-platform repo might add &lt;code&gt;platforms.md&lt;&#x2F;code&gt;. The convention does not constrain those.&lt;&#x2F;p&gt;
&lt;p&gt;The content of &lt;code&gt;conventions.md&lt;&#x2F;code&gt; matters more than people expect, so the example in the repo is worth quoting in part.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;## Filesystem

- Read files with `fs_read`, not `shell`. The shell tool runs inside a
  sandbox that does not see the host filesystem.
- Do not write files. There is no write tool in this harness yet.
- Do not assume the working directory. Use absolute paths or paths
  rooted at the repo, never `~`.
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That section prevents about four hallucinated tool calls per session in practice. It costs roughly fifty tokens to load. The math is not subtle.&lt;&#x2F;p&gt;
&lt;p&gt;The rule of thumb that survives contact with reality is to write conventions in the negative voice when you can. “Do not write files” tells the model where the edge is and what to do at it. “Always use the shell tool for system inspection” tells the model nothing useful because it does not say what to do when the system inspection means reading a file. Negative rules are testable. Positive rules drift.&lt;&#x2F;p&gt;
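&lt;p&gt;“Testable” can be taken literally. A negative rule names a forbidden shape, so a check over the tool calls from a run either finds that shape or does not. A rough sketch, assuming a hypothetical &lt;code&gt;ToolCall&lt;&#x2F;code&gt; record with simplified string inputs. The harness’s real types are not shown in this post, and matching on &lt;code&gt;cat&lt;&#x2F;code&gt; is only one illustrative way to spot a file read through the shell.&lt;&#x2F;p&gt;

```typescript
// Hypothetical shape of one logged tool call; not the harness's actual type.
type ToolCall = { name: string; input: { [key: string]: string } };

type Violation = { rule: string; call: ToolCall };

// Each check encodes one negative rule from the conventions.md excerpt.
function findViolations(calls: ToolCall[]): Violation[] {
  const violations: Violation[] = [];
  for (const call of calls) {
    const path = call.input.path ?? "";
    const command = call.input.command ?? "";
    if (call.name === "shell") {
      // "Read files with fs_read, not shell."
      if (/^\s*cat\s/.test(command)) {
        violations.push({ rule: "read files with fs_read, not shell", call });
      }
    }
    if (call.name === "fs_read") {
      // "Use absolute paths or paths rooted at the repo, never ~."
      if (path.startsWith("~")) {
        violations.push({ rule: "never use ~ in paths", call });
      }
    }
  }
  return violations;
}

// Invented transcript: one clean call, two that break a rule.
const transcript: ToolCall[] = [
  { name: "fs_read", input: { path: "/repo/README.md" } },
  { name: "shell", input: { command: "cat /repo/package.json" } },
  { name: "fs_read", input: { path: "~/notes.md" } },
];
const found = findViolations(transcript);
console.log(found.map((v) => v.rule));
```

&lt;p&gt;A positive rule such as “always use the shell tool for system inspection” is much harder to pin down this way, because it is unclear which individual call would count as breaking it.&lt;&#x2F;p&gt;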
&lt;h2 id=&quot;wiring-it-into-the-harness&quot;&gt;Wiring it into the harness&lt;&#x2F;h2&gt;
&lt;p&gt;Three changes in &lt;code&gt;agent.ts&lt;&#x2F;code&gt;. The diff is small because the loop never moved.&lt;&#x2F;p&gt;
&lt;p&gt;The import comes in at the top.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;import { loadContext, type LoadedContext } from &amp;quot;.&amp;#x2F;context&amp;quot;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The system prompt builder turns from a constant string into a function that takes the loaded context.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;const CORE_PROMPT = `You are a careful command-line assistant.
You have access to a small toolbox: a sandboxed shell, a file reader,
an HTTP GET, and a read-only git wrapper. Use them to investigate the
user&amp;#x27;s request and answer concretely. Multiple tools may run in one
turn. When you have the answer, stop calling tools and reply in plain
text.`;

function systemPrompt(ctx: LoadedContext): string {
  if (ctx.sources.length === 0) return CORE_PROMPT;
  return (
    CORE_PROMPT +
    &amp;quot;\n\nProject context loaded from .AGENTS&amp;#x2F;. Treat the contents below &amp;quot; +
    &amp;quot;as authoritative for this project&amp;#x27;s conventions and terminology. &amp;quot; +
    &amp;quot;Each block is wrapped in &amp;lt;context path=\&amp;quot;...\&amp;quot;&amp;gt; tags so you can &amp;quot; +
    &amp;quot;cite it back to the user.\n\n&amp;quot; +
    ctx.rendered
  );
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The core prompt is short on purpose. It says what the agent is, what tools it has, and when to stop. Everything else is project context. The bridge sentence between the core and the context tells the model how to treat what follows, which is the kind of framing that pays for itself within one turn.&lt;&#x2F;p&gt;
&lt;p&gt;The run function loads the context once at startup, logs what got loaded, and threads the system string into every step.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;export async function run(goal: string, maxIterations = 10) {
  const ctx = await loadContext();
  const system = systemPrompt(ctx);
  for (const s of ctx.sources) {
    console.error(`# context ${s.path} (${s.bytes}B${s.truncated ? &amp;quot;, truncated&amp;quot; : &amp;quot;&amp;quot;})`);
  }
  &amp;#x2F;&amp;#x2F; ... rest of the loop unchanged
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The three lines of logging matter more than they look. The first time an agent does something unexpected, the operator wants to know which files were in scope. Printing them is the cheapest debugging artifact you will ever ship. Print them.&lt;&#x2F;p&gt;
&lt;p&gt;That is the whole wiring change. The &lt;code&gt;for&lt;&#x2F;code&gt; loop, the dispatcher, the &lt;code&gt;Promise.all&lt;&#x2F;code&gt; over tool calls, and the iteration budget are all untouched. The loop does not know context exists.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-it-looks-like-running&quot;&gt;What it looks like running&lt;&#x2F;h2&gt;
&lt;p&gt;A short transcript shows the difference the context layer makes.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;$ bun agent.ts &amp;quot;what does this repo do, and what conventions should I follow?&amp;quot;
# context .AGENTS&amp;#x2F;overview.md (812B)
# context .AGENTS&amp;#x2F;conventions.md (1186B)
# context .AGENTS&amp;#x2F;glossary.md (824B)
This repository is the reference harness for the Agent Platform Handbook
series on raskell.io. The harness is a Bun + TypeScript agent against the
Anthropic Messages API. Each git tag (post-01 through post-04) is the
exact state of the code that the matching post discusses.

The conventions you should follow:

- Use fs_read for file reads, not shell. The shell runs in a sandbox
  without host filesystem access (.AGENTS&amp;#x2F;conventions.md).
- Do not write files. No write tool exists in this harness yet
  (.AGENTS&amp;#x2F;conventions.md).
- git is read-only. Allowed subcommands: log, diff, show, status, branch,
  ls-files (.AGENTS&amp;#x2F;conventions.md).
- http_get accepts https:&amp;#x2F;&amp;#x2F; only, for public documentation and APIs
  (.AGENTS&amp;#x2F;conventions.md).
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Zero tool calls. The model answered from the static context alone, and cited each rule back to the file it came from. The same agent without &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt; would have run three or four &lt;code&gt;fs_read&lt;&#x2F;code&gt; calls, traversed the directory, opened the README, and produced a similar answer two seconds slower and one cache miss away.&lt;&#x2F;p&gt;
&lt;p&gt;The same harness, with a different &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt;, becomes a different agent.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-retrieval-alone-is-not-enough&quot;&gt;Why retrieval alone is not enough&lt;&#x2F;h2&gt;
&lt;p&gt;This is the section where the RAG enthusiast and the long-context enthusiast both push back, and they are both wrong in instructive ways.&lt;&#x2F;p&gt;
&lt;p&gt;A retrieval system answers the question “what is relevant to this turn.” A static context answers the question “what is true about this project regardless of any turn.” The two are not interchangeable. The retrieval system cannot know that your team prohibits force pushes unless you wrote that into a document and the retriever surfaced it. The retriever will surface it sometimes. The static context surfaces it every time.&lt;&#x2F;p&gt;
&lt;p&gt;Three failure modes show up if you try to skip the static layer.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Cold-start emptiness.&lt;&#x2F;strong&gt; The very first turn of a session has no prior context, no tool outputs, no retrieval hits. The model has the user’s prompt and whatever you preloaded. Without static context, the model starts every session with the same generic priors. With it, the model starts every session knowing what the project is about and what it cannot do.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Retrieval misses on infrequent rules.&lt;&#x2F;strong&gt; A convention that comes up once a month is unlikely to retrieve cleanly. The embedding similarity between “do not force push” and “I want to clean up the commit history” is above zero but not high, so a top-k retriever will sometimes miss it. The static layer hard-codes the rules that you cannot afford to miss.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Cross-cutting concerns.&lt;&#x2F;strong&gt; The convention “do not write files” applies to every tool call. There is no single document the retriever could match against every relevant turn. The static layer is the only place this kind of rule belongs.&lt;&#x2F;p&gt;
&lt;p&gt;The long-context argument is structurally similar. “Just put the whole repo in the prompt.” Two problems. The repo is bigger than the window for any non-trivial codebase, and you pay for every token on every turn. Even with prompt caching at ninety percent hit rate, the bill for a thousand-turn session against a two-hundred-thousand-token repo is real money. Static context lets you put the part that has to be in the prompt in the prompt, and let the rest live behind retrieval and tools.&lt;&#x2F;p&gt;
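&lt;p&gt;To put rough numbers on that argument, here is a back-of-the-envelope sketch of the thousand-turn session. The base price, the cached-read multiplier, and the eight-thousand-token size of the static context are illustrative assumptions rather than quoted rates, and cache-write surcharges are ignored. Substitute your provider’s current pricing before drawing conclusions.&lt;&#x2F;p&gt;

```typescript
// Back-of-the-envelope input cost for re-sending a large prompt on every turn.
// All prices here are illustrative assumptions; check your provider's price list.
interface SessionCost {
  turns: number;                // turns in the session
  promptTokens: number;         // tokens re-sent on every turn
  cacheHitRate: number;         // fraction of prompt tokens served from cache (0..1)
  usdPerMillionInput: number;   // assumed base input price
  cachedReadMultiplier: number; // assumed price of a cached token relative to base
}

function inputCostUsd(s: SessionCost): number {
  const totalTokens = s.turns * s.promptTokens;
  // Blend cached and uncached tokens into one effective per-token rate.
  const effectiveRate =
    s.cacheHitRate * s.cachedReadMultiplier + (1 - s.cacheHitRate);
  return (totalTokens / 1_000_000) * s.usdPerMillionInput * effectiveRate;
}

// The scenario from the text: a thousand turns against a 200k-token repo.
const wholeRepo: SessionCost = {
  turns: 1000,
  promptTokens: 200_000,
  cacheHitRate: 0.9,
  usdPerMillionInput: 3,     // assumption
  cachedReadMultiplier: 0.1, // assumption
};

// Same session, with only an assumed 8,000-token static context in the prompt.
const staticOnly: SessionCost = { ...wholeRepo, promptTokens: 8_000 };

console.log(inputCostUsd(wholeRepo).toFixed(2));
console.log(inputCostUsd(staticOnly).toFixed(2));
```

&lt;p&gt;Under those assumptions the whole-repo session costs about 114 dollars in input tokens and the static-only session under five. The absolute figures will move with pricing; the ratio is the part that carries over.&lt;&#x2F;p&gt;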
&lt;p&gt;The honest synthesis is that all three layers are necessary. Static for the rules. Retrieval for the corpus. Tools for everything live. This post is about the first one because it is the layer that everyone defers and that determines whether the other two work.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;designing-context-that-the-model-can-actually-use&quot;&gt;Designing context that the model can actually use&lt;&#x2F;h2&gt;
&lt;p&gt;Once you have a loader, the engineering question becomes “what do you put in &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt;.” There are four patterns that hold up.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Conventions in the negative voice.&lt;&#x2F;strong&gt; Tell the model what it cannot do, why, and what to do instead. A convention like “do not write files; there is no write tool” is testable: the model either tries to write a file or it does not, and you can grep for it in the trace. A convention like “be helpful” is not testable. Negative rules also fail safer when the model misreads them.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Glossary entries for project-specific nouns.&lt;&#x2F;strong&gt; Every project has a few terms that mean something different inside the project than outside. “Zentinel” in your company might be a WAF tuner. “The pipeline” might be a four-stage build system. The model has seen all of these words in its training data and will pick a meaning at random unless you tell it. A short glossary is cheap insurance.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Overview as a sitemap, not a story.&lt;&#x2F;strong&gt; The overview file should answer “what is here and where do I find it.” Not “what is the history of this project.” The history goes in the README for humans. The overview goes in &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt; for agents and is the structural map: the modules, the files, the entry points, the canonical commands. Keep it short. Update it when files move.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;One file per concern.&lt;&#x2F;strong&gt; Resist the temptation to put security rules in &lt;code&gt;conventions.md&lt;&#x2F;code&gt; because they are conventions. Make a &lt;code&gt;security.md&lt;&#x2F;code&gt;. Resist the temptation to put architecture in &lt;code&gt;overview.md&lt;&#x2F;code&gt; because it is overview. Make an &lt;code&gt;architecture.md&lt;&#x2F;code&gt;. Each file gets its own load slot, its own conflict surface, its own owner. The directory is what makes this affordable.&lt;&#x2F;p&gt;
&lt;p&gt;The decision table summarizes where each kind of context goes.&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Kind of context&lt;&#x2F;th&gt;&lt;th&gt;Lives in&lt;&#x2F;th&gt;&lt;th&gt;Why&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;What the agent is, role, toolbox&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;CORE_PROMPT&lt;&#x2F;code&gt; in harness&lt;&#x2F;td&gt;&lt;td&gt;Ships with the harness, not the project.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Project rules, conventions, prohibitions&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;.AGENTS&#x2F;conventions.md&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Per-project, hard-coded, cited back to the user.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Project-specific terms and acronyms&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;.AGENTS&#x2F;glossary.md&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Disambiguates words the model knows differently.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Sitemap, modules, canonical commands&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;.AGENTS&#x2F;overview.md&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Self-documents the repo for the agent.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Sensitive policies (do-not-touch lists)&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;.AGENTS&#x2F;security.md&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Separates audit surface from general rules.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Large reference material (API docs, RFCs)&lt;&#x2F;td&gt;&lt;td&gt;Retrieval, not static&lt;&#x2F;td&gt;&lt;td&gt;Too big for the budget, not needed every turn.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Live system state (running processes, db)&lt;&#x2F;td&gt;&lt;td&gt;Tools&lt;&#x2F;td&gt;&lt;td&gt;Static context goes stale; tools always fresh.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Notes the agent writes to its future self&lt;&#x2F;td&gt;&lt;td&gt;Memory (post six)&lt;&#x2F;td&gt;&lt;td&gt;Belongs in a layer this post does not build.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;The table is a compact summary of the rest of the series. Each row maps to one layer the harness either has, will have, or deliberately delegates to a tool.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;failure-modes-worth-naming&quot;&gt;Failure modes worth naming&lt;&#x2F;h2&gt;
&lt;p&gt;The first time you hit any of these you will think the loader is broken. It is not. These are static-context problems specifically.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Context drift.&lt;&#x2F;strong&gt; The conventions file says “use Bun 1.1,” but your toolchain has moved to Bun 1.5 and nobody updated the file. The model produces commands that work fine but cite a version that is four minor releases old. The fix is to treat &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt; files as code, reviewed in PRs, with the same kind of “is this still true” sweep you do on the README.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Stale glossary.&lt;&#x2F;strong&gt; A term changes meaning. The team renames “Zentinel” to “Sentinel” because the original was a joke. The glossary still says “Zentinel is a WAF tuner.” The model now uses both names interchangeably and the user is confused. Treat the glossary as a single source of truth. Update it on rename PRs.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Budget eviction silently dropping a critical file.&lt;&#x2F;strong&gt; A new file pushes the total over thirty-two kilobytes, the loader stops loading after &lt;code&gt;conventions.md&lt;&#x2F;code&gt;, and &lt;code&gt;security.md&lt;&#x2F;code&gt; is silently absent from the prompt. The fix is the visible truncation marker and the per-file logging in the loader. The operator sees the size before the agent runs and can raise the budget or trim a file.&lt;&#x2F;p&gt;
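&lt;p&gt;A minimal sketch of that visibility, written as a pure function so it can run as a pre-flight check. This is not the harness’s &lt;code&gt;context.ts&lt;&#x2F;code&gt;; the names &lt;code&gt;planLoad&lt;&#x2F;code&gt; and &lt;code&gt;report&lt;&#x2F;code&gt;, the &lt;code&gt;DROPPED&lt;&#x2F;code&gt; marker, and the file sizes in the example are assumptions for illustration.&lt;&#x2F;p&gt;

```typescript
// Hypothetical pre-flight check: which context files fit the byte budget?
interface ContextFile {
  path: string;
  bytes: number;
}

interface LoadPlan {
  loaded: ContextFile[];
  dropped: ContextFile[];
  usedBytes: number;
}

function planLoad(files: ContextFile[], budgetBytes: number): LoadPlan {
  const plan: LoadPlan = { loaded: [], dropped: [], usedBytes: 0 };
  let full = false;
  for (const file of files) {
    // Once one file does not fit, every later file is dropped too,
    // so the set of loaded files is always a prefix of the input order.
    if (full || plan.usedBytes + file.bytes > budgetBytes) {
      full = true;
      plan.dropped.push(file);
    } else {
      plan.loaded.push(file);
      plan.usedBytes += file.bytes;
    }
  }
  return plan;
}

// One log line per file, so a dropped file is never silent.
function report(plan: LoadPlan, budgetBytes: number): string[] {
  const lines = plan.loaded.map((f) => `# context ${f.path} (${f.bytes}B)`);
  for (const f of plan.dropped) {
    lines.push(`# DROPPED ${f.path} (${f.bytes}B): over ${budgetBytes}B budget`);
  }
  return lines;
}

const budget = 32 * 1024;
const plan = planLoad(
  [
    { path: ".AGENTS/conventions.md", bytes: 30_000 }, // made-up sizes
    { path: ".AGENTS/glossary.md", bytes: 4_000 },
    { path: ".AGENTS/security.md", bytes: 1_500 },
  ],
  budget,
);
for (const line of report(plan, budget)) console.log(line);
```

&lt;p&gt;In this example &lt;code&gt;security.md&lt;&#x2F;code&gt; would fit on its own, but it is dropped because an earlier file already overflowed the budget. That is the case the log line is there to expose.&lt;&#x2F;p&gt;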
&lt;p&gt;&lt;strong&gt;Over-stuffed conventions.&lt;&#x2F;strong&gt; The team writes every preference, taste, and personal hobby horse into &lt;code&gt;conventions.md&lt;&#x2F;code&gt;. The file balloons to ten thousand tokens. The model’s attention spreads thin. Sub-rules get ignored. The fix is to apply the same editorial discipline to the conventions file that you would apply to a real document. Cut anything that is not load-bearing.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Citation theater.&lt;&#x2F;strong&gt; The model picks up the citation pattern and starts citing files that were not in the loaded context. This happens because the model is good at pattern-matching and &lt;code&gt;(.AGENTS&#x2F;conventions.md)&lt;&#x2F;code&gt; is a pattern. The fix is to read your traces. If a citation is wrong, the convention was either missing or unclear, and the file needs an edit.&lt;&#x2F;p&gt;
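&lt;p&gt;Reading traces can be partly mechanized. The sketch below pulls every parenthesized &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt; citation out of a reply and flags the ones that do not match a loaded file. The function name &lt;code&gt;phantomCitations&lt;&#x2F;code&gt; and the regular expression are assumptions based on the citation format in the transcript earlier in this post; adjust both to your own format.&lt;&#x2F;p&gt;

```typescript
// Hypothetical trace check: find citations of files that were never loaded.
function phantomCitations(reply: string, loadedPaths: string[]): string[] {
  const loaded = new Set(loadedPaths);
  const phantoms: string[] = [];
  // Matches "(.AGENTS/some-file.md)" and captures the path inside the parentheses.
  const citation = /\((\.AGENTS\/[^\s)]+)\)/g;
  for (const match of reply.matchAll(citation)) {
    const path = match[1];
    if (loaded.has(path)) continue;
    // Report each unknown path once, in order of first appearance.
    if (!phantoms.includes(path)) phantoms.push(path);
  }
  return phantoms;
}

const loadedFiles = [
  ".AGENTS/overview.md",
  ".AGENTS/conventions.md",
  ".AGENTS/glossary.md",
];

// The second citation points at a made-up file that was never loaded.
const reply =
  "Use fs_read for file reads (.AGENTS/conventions.md). " +
  "Never deploy on Fridays (.AGENTS/release.md).";

console.log(phantomCitations(reply, loadedFiles));
```

&lt;p&gt;A non-empty result does not tell you which fix is needed. It only tells you that a citation points at a file the model never saw, which is the cue to open the trace.&lt;&#x2F;p&gt;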
&lt;p&gt;&lt;strong&gt;Context overriding the operator.&lt;&#x2F;strong&gt; A convention says “always reply in English.” The user asks for a French answer. The model says no. The convention won. This is sometimes what you want and often not. Decide explicitly which rules are operator-overridable and which are not, and write the ones that must not be overridden in the strongest voice. We will come back to this in &lt;a href=&quot;#&quot;&gt;post fourteen&lt;&#x2F;a&gt; on policy.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Token cost surprise.&lt;&#x2F;strong&gt; The loader is cheap. The cumulative cost of loading thirty-two kilobytes of context on every call is not. Prompt caching makes it manageable. Without prompt caching, the static-context share of your bill is roughly ten times what it would be with caching, assuming cached reads are billed at about a tenth of the base input rate. Turn caching on. We will spend &lt;a href=&quot;#&quot;&gt;post eighteen&lt;&#x2F;a&gt; on the economics.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-this-layer-does-not-solve&quot;&gt;What this layer does not solve&lt;&#x2F;h2&gt;
&lt;p&gt;Static context is one layer. Here is what you might expect this post to cover that belongs in a different post.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Per-task subsetting.&lt;&#x2F;strong&gt; Loading the whole &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt; directory on every turn is wasteful for large repos. The next refinement is to load only the files relevant to the current task. That is a retrieval problem with a static-context shape, and we will pick it up in &lt;a href=&quot;#&quot;&gt;post five&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Memory across turns and sessions.&lt;&#x2F;strong&gt; A note the agent wrote yesterday is not in &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt;. It is in a memory layer with different lifetime, different ownership, and different write semantics. &lt;a href=&quot;#&quot;&gt;Post six&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Per-tool context.&lt;&#x2F;strong&gt; A tool might want a small chunk of context that only matters when it runs. “Reading this file? Here is what to expect in its structure.” That belongs in the tool description, not in &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt;. We covered the principle in &lt;a href=&quot;&#x2F;articles&#x2F;tools-how-agents-actually-do-things&#x2F;&quot;&gt;post three&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;MCP-served context.&lt;&#x2F;strong&gt; A team that runs many agents may want context to live in a shared service, not in every repo. MCP has a &lt;code&gt;resources&#x2F;list&lt;&#x2F;code&gt; and &lt;code&gt;resources&#x2F;read&lt;&#x2F;code&gt; shape that fits. &lt;a href=&quot;#&quot;&gt;Post eight&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Versioning, evals, and rollback.&lt;&#x2F;strong&gt; A bad edit to &lt;code&gt;.AGENTS&#x2F;conventions.md&lt;&#x2F;code&gt; can degrade every agent in the fleet. The discipline that wraps that is the same discipline that wraps any production string. Reviewed PRs, traced behavior, a rollback path. &lt;a href=&quot;#&quot;&gt;Post sixteen&lt;&#x2F;a&gt; covers the eval side.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Identity and authorization.&lt;&#x2F;strong&gt; A convention that says “do not deploy to production” does not stop the agent from deploying to production if the tool is available. That is identity and policy, &lt;a href=&quot;#&quot;&gt;post thirteen&lt;&#x2F;a&gt; and &lt;a href=&quot;#&quot;&gt;post fourteen&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The pattern is the same as in the earlier posts. Each layer earns the right to be small by deferring everything that does not belong to it. The static-context layer earns its keep by being short, deterministic, and cited.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;where-this-lands-in-the-platform&quot;&gt;Where this lands in the platform&lt;&#x2F;h2&gt;
&lt;p&gt;Total damage going from &lt;code&gt;post-03&lt;&#x2F;code&gt; to &lt;code&gt;post-04&lt;&#x2F;code&gt;: one new file (&lt;code&gt;context.ts&lt;&#x2F;code&gt;), one new directory (&lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt;) with three sample files, three changes in &lt;code&gt;agent.ts&lt;&#x2F;code&gt;. The loop, the registry, the tools, the sandbox, and the types are all untouched. The diff is roughly one hundred and twenty lines. In exchange, the agent now arrives at every turn knowing what the project is, how to talk about it, and what it is not allowed to do.&lt;&#x2F;p&gt;
&lt;p&gt;In the reference architecture from &lt;a href=&quot;#&quot;&gt;post twenty-two&lt;&#x2F;a&gt;, the context layer is the seam between the project and the agent. The harness loads from it. The retriever in &lt;a href=&quot;#&quot;&gt;post five&lt;&#x2F;a&gt; layers on top of it. The memory layer in &lt;a href=&quot;#&quot;&gt;post six&lt;&#x2F;a&gt; writes alongside it. The policy layer in &lt;a href=&quot;#&quot;&gt;post fourteen&lt;&#x2F;a&gt; reads from it to decide what is enforced and what is advisory. Same diagram as the earlier posts, with one more box filled in.&lt;&#x2F;p&gt;
&lt;p&gt;The rule from the earlier posts still holds. The harness only ever grows; it does not get rewritten. Each post adds one layer to the same artifact and explains why the layer below was not enough.&lt;&#x2F;p&gt;
&lt;p&gt;The layer below this one was a model that knew nothing about your project. The layer above is the model that pulls the slice of context it needs for the turn at hand. Static context makes the agent useful on every turn at a fixed cost. Per-task retrieval makes it useful on the turns where the static layer is not enough, at a variable cost the agent decides. Next we build the layer that turns a loader into a retriever, and explain why embeddings are sometimes the right answer and sometimes a trap. That post will ship as &lt;code&gt;post-05&lt;&#x2F;code&gt; in the same repo.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;next&quot;&gt;Next&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;&#x2F;articles&#x2F;retrieval-is-a-tool-not-a-layer&#x2F;&quot;&gt;Part 5: Retrieval Is a Tool, Not a Layer&lt;&#x2F;a&gt;.&lt;&#x2F;strong&gt; Why pulling context dynamically is its own design problem, where embeddings stop being the right answer, and how to subset &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt; per task in the harness we have been building.&lt;&#x2F;p&gt;
</description>
      </item>
      <item>
          <title>I Stopped Writing Most of My Code</title>
          <pubDate>Wed, 10 Jun 2026 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/talks/iat-13-basel/</link>
          <guid>https://raskell.io/talks/iat-13-basel/</guid>
          <description xml:base="https://raskell.io/talks/iat-13-basel/">&lt;section class=&quot;slide slide--title&quot;&gt;
  &lt;div class=&quot;slide__inner&quot;&gt;
    &lt;p class=&quot;slide__eyebrow&quot;&gt;IT&#x27;s ABOUT TECH #13 · Basel · 10 June 2026&lt;&#x2F;p&gt;
    &lt;h1 class=&quot;slide__title slide__title--xl&quot;&gt;I Stopped Writing Most of My Code&lt;&#x2F;h1&gt;
    &lt;p class=&quot;slide__sub&quot;&gt;The year my backlog came alive.&lt;&#x2F;p&gt;
    &lt;p class=&quot;slide__id&quot;&gt;
      &lt;span class=&quot;slide__id-item&quot;&gt;
        &lt;img class=&quot;slide__avatar&quot; src=&quot;&#x2F;talks&#x2F;iat-13-basel&#x2F;raffael-portrait.png&quot; alt=&quot;Raffael Schneider&quot; width=&quot;64&quot; height=&quot;64&quot;&gt;
        &lt;span class=&quot;slide__id-text&quot;&gt;
          &lt;span class=&quot;slide__id-name&quot;&gt;Raffael Schneider&lt;&#x2F;span&gt;
          &lt;span class=&quot;slide__id-role&quot;&gt;Enterprise Solution Architect&lt;&#x2F;span&gt;
          &lt;span class=&quot;slide__id-org&quot;&gt;AI Platforms&lt;&#x2F;span&gt;
        &lt;&#x2F;span&gt;
      &lt;&#x2F;span&gt;
      &lt;span class=&quot;slide__id-sep&quot; aria-hidden=&quot;true&quot;&gt;·&lt;&#x2F;span&gt;
      &lt;span class=&quot;slide__id-item slide__id-item--brand&quot;&gt;
        &lt;img class=&quot;slide__avatar slide__avatar--brand&quot; src=&quot;&#x2F;raskell-mascot.avif&quot; alt=&quot;Raskell tanuki&quot; width=&quot;64&quot; height=&quot;64&quot;&gt;
        &lt;a href=&quot;https:&#x2F;&#x2F;raskell.io&quot;&gt;raskell.io&lt;&#x2F;a&gt;
      &lt;&#x2F;span&gt;
    &lt;&#x2F;p&gt;
  &lt;&#x2F;div&gt;
&lt;&#x2F;section&gt;
&lt;section class=&quot;slide&quot;&gt;
  &lt;div class=&quot;slide__inner&quot;&gt;
    &lt;p class=&quot;slide__eyebrow&quot;&gt;01&lt;&#x2F;p&gt;
    &lt;h2 class=&quot;slide__title&quot;&gt;It came down to three subscriptions.&lt;&#x2F;h2&gt;
    &lt;p class=&quot;slide__subline&quot;&gt;December 2022 → November&amp;nbsp;2025&lt;&#x2F;p&gt;
    &lt;ol class=&quot;slide__timeline&quot;&gt;
      &lt;li&gt;
        &lt;span class=&quot;slide__timeline-date&quot;&gt;December 2022&lt;&#x2F;span&gt;
        &lt;p&gt;&lt;strong&gt;OpenAI.&lt;&#x2F;strong&gt; Subscription almost day one. Chat-first AI joins my daily workflow and stays there.&lt;&#x2F;p&gt;
      &lt;&#x2F;li&gt;
      &lt;li&gt;
        &lt;span class=&quot;slide__timeline-date&quot;&gt;Then Zed&lt;&#x2F;span&gt;
        &lt;p&gt;A Rust-built editor that bet on AI very early. The best LLM integrations of its era. My gateway to AI-assisted development while Cursor, Windsurf, and Copilot were finding their feet.&lt;&#x2F;p&gt;
      &lt;&#x2F;li&gt;
      &lt;li&gt;
        &lt;span class=&quot;slide__timeline-date&quot;&gt;May 2025&lt;&#x2F;span&gt;
        &lt;p&gt;&lt;strong&gt;Claude Code drops.&lt;&#x2F;strong&gt; First TUI-first coding agent. I take the $20 Anthropic subscription the day it ships. Hooked.&lt;&#x2F;p&gt;
      &lt;&#x2F;li&gt;
      &lt;li&gt;
        &lt;span class=&quot;slide__timeline-date&quot;&gt;November 2025&lt;&#x2F;span&gt;
        &lt;p&gt;&lt;strong&gt;Opus 4.5 ships.&lt;&#x2F;strong&gt; I upgrade to the $200 Max 20× plan right before Christmas vacation. Three weeks of free time meets a step-change in capability.&lt;&#x2F;p&gt;
      &lt;&#x2F;li&gt;
    &lt;&#x2F;ol&gt;
    &lt;p class=&quot;slide__kicker&quot;&gt;I am an engineer. I know how to build things.&lt;&#x2F;p&gt;
    &lt;p class=&quot;slide__kicker slide__kicker--strong&quot;&gt;The bottleneck was never ideas. The bottleneck was always time.&lt;&#x2F;p&gt;
  &lt;&#x2F;div&gt;
&lt;&#x2F;section&gt;
&lt;section class=&quot;slide&quot;&gt;
  &lt;div class=&quot;slide__inner&quot;&gt;
    &lt;p class=&quot;slide__eyebrow&quot;&gt;02&lt;&#x2F;p&gt;
    &lt;h2 class=&quot;slide__title&quot;&gt;Then something changed.&lt;&#x2F;h2&gt;
    &lt;div class=&quot;slide__stack&quot;&gt;
      &lt;div class=&quot;slide__pill&quot;&gt;
        &lt;img class=&quot;slide__pill-icon slide__pill-icon--brand&quot; src=&quot;&#x2F;talks&#x2F;iat-13-basel&#x2F;claude.svg&quot; alt=&quot;&quot; width=&quot;32&quot; height=&quot;32&quot;&gt;
        Claude Opus 4.5
      &lt;&#x2F;div&gt;
      &lt;div class=&quot;slide__pill&quot;&gt;
        &lt;img class=&quot;slide__pill-icon slide__pill-icon--brand&quot; src=&quot;&#x2F;talks&#x2F;iat-13-basel&#x2F;claude-code.png&quot; alt=&quot;&quot; width=&quot;32&quot; height=&quot;32&quot;&gt;
        Claude Code
      &lt;&#x2F;div&gt;
    &lt;&#x2F;div&gt;
    &lt;p class=&quot;slide__quote&quot;&gt;
      I expected a better autocomplete.&lt;br&gt;
      What I got was something closer to a junior engineer that never sleeps.
    &lt;&#x2F;p&gt;
  &lt;&#x2F;div&gt;
&lt;&#x2F;section&gt;
&lt;section class=&quot;slide&quot;&gt;
  &lt;div class=&quot;slide__inner&quot;&gt;
    &lt;p class=&quot;slide__eyebrow&quot;&gt;03&lt;&#x2F;p&gt;
    &lt;h2 class=&quot;slide__title&quot;&gt;The new workflow&lt;&#x2F;h2&gt;
    &lt;div class=&quot;slide__compare&quot;&gt;
      &lt;div class=&quot;slide__compare-col&quot;&gt;
        &lt;p class=&quot;slide__compare-label&quot;&gt;Before&lt;&#x2F;p&gt;
        &lt;ul&gt;
          &lt;li&gt;Decide what to build&lt;&#x2F;li&gt;
          &lt;li class=&quot;is-heavy&quot;&gt;Translate thoughts into syntax&lt;&#x2F;li&gt;
          &lt;li&gt;Review and ship&lt;&#x2F;li&gt;
        &lt;&#x2F;ul&gt;
      &lt;&#x2F;div&gt;
      &lt;div class=&quot;slide__compare-col&quot;&gt;
        &lt;p class=&quot;slide__compare-label&quot;&gt;After&lt;&#x2F;p&gt;
        &lt;ul&gt;
          &lt;li class=&quot;is-heavy&quot;&gt;Decide what to build&lt;&#x2F;li&gt;
          &lt;li&gt;Translate thoughts into syntax&lt;&#x2F;li&gt;
          &lt;li class=&quot;is-heavy&quot;&gt;Decide whether the result is acceptable&lt;&#x2F;li&gt;
        &lt;&#x2F;ul&gt;
      &lt;&#x2F;div&gt;
    &lt;&#x2F;div&gt;
    &lt;p class=&quot;slide__kicker slide__kicker--strong&quot;&gt;
      You no longer spend most of your time translating thoughts into syntax.
      You spend most of your time deciding.
    &lt;&#x2F;p&gt;
  &lt;&#x2F;div&gt;
&lt;&#x2F;section&gt;
&lt;section class=&quot;slide&quot;&gt;
  &lt;div class=&quot;slide__inner&quot;&gt;
    &lt;p class=&quot;slide__eyebrow&quot;&gt;04&lt;&#x2F;p&gt;
    &lt;h2 class=&quot;slide__title&quot;&gt;What I actually built&lt;&#x2F;h2&gt;
    &lt;div class=&quot;slide__grid slide__grid--icons&quot;&gt;
      &lt;a class=&quot;slide__card slide__card--linked&quot; href=&quot;https:&#x2F;&#x2F;archipelag.io&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
        &lt;img class=&quot;slide__card-icon&quot; src=&quot;&#x2F;talks&#x2F;iat-13-basel&#x2F;archipelag.png&quot; alt=&quot;&quot; width=&quot;48&quot; height=&quot;48&quot; loading=&quot;lazy&quot;&gt;
        &lt;div&gt;
          &lt;h3&gt;Archipelag.io&lt;&#x2F;h3&gt;
          &lt;p&gt;Distributed AI compute. Inference across idle GPUs and mining rigs.&lt;&#x2F;p&gt;
        &lt;&#x2F;div&gt;
      &lt;&#x2F;a&gt;
      &lt;a class=&quot;slide__card slide__card--linked&quot; href=&quot;https:&#x2F;&#x2F;cyanea.bio&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
        &lt;img class=&quot;slide__card-icon&quot; src=&quot;&#x2F;talks&#x2F;iat-13-basel&#x2F;cyanea.png&quot; alt=&quot;&quot; width=&quot;48&quot; height=&quot;48&quot; loading=&quot;lazy&quot;&gt;
        &lt;div&gt;
          &lt;h3&gt;Cyanea&lt;&#x2F;h3&gt;
          &lt;p&gt;Open community platform for life-science research.&lt;&#x2F;p&gt;
        &lt;&#x2F;div&gt;
      &lt;&#x2F;a&gt;
      &lt;a class=&quot;slide__card slide__card--linked&quot; href=&quot;https:&#x2F;&#x2F;humankind.plus&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
        &lt;img class=&quot;slide__card-icon&quot; src=&quot;&#x2F;talks&#x2F;iat-13-basel&#x2F;humankind.png&quot; alt=&quot;&quot; width=&quot;48&quot; height=&quot;48&quot; loading=&quot;lazy&quot;&gt;
        &lt;div&gt;
          &lt;h3&gt;Humankind&lt;&#x2F;h3&gt;
          &lt;p&gt;A long-running creative initiative, finally with a home of its own.&lt;&#x2F;p&gt;
        &lt;&#x2F;div&gt;
      &lt;&#x2F;a&gt;
      &lt;a class=&quot;slide__card slide__card--linked&quot; href=&quot;https:&#x2F;&#x2F;arcanist.sh&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
        &lt;img class=&quot;slide__card-icon&quot; src=&quot;&#x2F;talks&#x2F;iat-13-basel&#x2F;arcanist.svg&quot; alt=&quot;&quot; width=&quot;48&quot; height=&quot;48&quot; loading=&quot;lazy&quot;&gt;
        &lt;div&gt;
          &lt;h3&gt;Arcanist.sh&lt;&#x2F;h3&gt;
          &lt;p&gt;Haskell ecosystem in Rust. Home of &lt;strong&gt;hx&lt;&#x2F;strong&gt; and the &lt;strong&gt;Basel Haskell Compiler&lt;&#x2F;strong&gt;.&lt;&#x2F;p&gt;
        &lt;&#x2F;div&gt;
      &lt;&#x2F;a&gt;
      &lt;a class=&quot;slide__card slide__card--linked&quot; href=&quot;https:&#x2F;&#x2F;zentinelproxy.io&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
        &lt;img class=&quot;slide__card-icon&quot; src=&quot;&#x2F;talks&#x2F;iat-13-basel&#x2F;zentinel.png&quot; alt=&quot;&quot; width=&quot;48&quot; height=&quot;48&quot; loading=&quot;lazy&quot;&gt;
        &lt;div&gt;
          &lt;h3&gt;Zentinel&lt;&#x2F;h3&gt;
          &lt;p&gt;Security-first reverse proxy on Pingora.&lt;&#x2F;p&gt;
        &lt;&#x2F;div&gt;
      &lt;&#x2F;a&gt;
      &lt;a class=&quot;slide__card slide__card--linked&quot; href=&quot;https:&#x2F;&#x2F;die-zukunft.ch&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;
        &lt;img class=&quot;slide__card-icon&quot; src=&quot;&#x2F;talks&#x2F;iat-13-basel&#x2F;die-zukunft.png&quot; alt=&quot;&quot; width=&quot;48&quot; height=&quot;48&quot; loading=&quot;lazy&quot;&gt;
        &lt;div&gt;
          &lt;h3&gt;Die Zukunft&lt;&#x2F;h3&gt;
          &lt;p&gt;A Swiss political party. Past left and right, into UBI, digital sovereignty, and a serious technology agenda.&lt;&#x2F;p&gt;
        &lt;&#x2F;div&gt;
      &lt;&#x2F;a&gt;
    &lt;&#x2F;div&gt;
    &lt;p class=&quot;slide__minor-label&quot;&gt;Experiments and tooling&lt;&#x2F;p&gt;
    &lt;ul class=&quot;slide__minor-chips&quot;&gt;
      &lt;li&gt;&lt;img src=&quot;&#x2F;talks&#x2F;iat-13-basel&#x2F;terrarium.png&quot; alt=&quot;&quot; width=&quot;20&quot; height=&quot;20&quot;&gt; Terrarium&lt;&#x2F;li&gt;
      &lt;li&gt;&lt;svg class=&quot;slide__minor-icon&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;currentColor&quot; aria-hidden=&quot;true&quot;&gt;&lt;path d=&quot;M12 .5C5.65.5.5 5.65.5 12c0 5.08 3.29 9.39 7.86 10.91.58.11.79-.25.79-.55 0-.27-.01-1.18-.02-2.14-3.2.69-3.87-1.36-3.87-1.36-.52-1.33-1.27-1.69-1.27-1.69-1.04-.71.08-.7.08-.7 1.15.08 1.76 1.18 1.76 1.18 1.02 1.75 2.68 1.24 3.34.95.1-.74.4-1.24.73-1.53-2.55-.29-5.24-1.28-5.24-5.69 0-1.26.45-2.29 1.18-3.1-.12-.29-.51-1.46.11-3.05 0 0 .97-.31 3.17 1.18a11 11 0 0 1 5.76 0c2.2-1.49 3.17-1.18 3.17-1.18.62 1.59.23 2.76.11 3.05.74.81 1.18 1.84 1.18 3.1 0 4.42-2.69 5.39-5.25 5.68.41.36.78 1.06.78 2.14 0 1.55-.01 2.79-.01 3.17 0 .31.21.67.8.55C20.21 21.39 23.5 17.08 23.5 12 23.5 5.65 18.35.5 12 .5z&quot;&#x2F;&gt;&lt;&#x2F;svg&gt; Shiioo&lt;&#x2F;li&gt;
      &lt;li&gt;&lt;svg class=&quot;slide__minor-icon&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;currentColor&quot; aria-hidden=&quot;true&quot;&gt;&lt;path d=&quot;M12 .5C5.65.5.5 5.65.5 12c0 5.08 3.29 9.39 7.86 10.91.58.11.79-.25.79-.55 0-.27-.01-1.18-.02-2.14-3.2.69-3.87-1.36-3.87-1.36-.52-1.33-1.27-1.69-1.27-1.69-1.04-.71.08-.7.08-.7 1.15.08 1.76 1.18 1.76 1.18 1.02 1.75 2.68 1.24 3.34.95.1-.74.4-1.24.73-1.53-2.55-.29-5.24-1.28-5.24-5.69 0-1.26.45-2.29 1.18-3.1-.12-.29-.51-1.46.11-3.05 0 0 .97-.31 3.17 1.18a11 11 0 0 1 5.76 0c2.2-1.49 3.17-1.18 3.17-1.18.62 1.59.23 2.76.11 3.05.74.81 1.18 1.84 1.18 3.1 0 4.42-2.69 5.39-5.25 5.68.41.36.78 1.06.78 2.14 0 1.55-.01 2.79-.01 3.17 0 .31.21.67.8.55C20.21 21.39 23.5 17.08 23.5 12 23.5 5.65 18.35.5 12 .5z&quot;&#x2F;&gt;&lt;&#x2F;svg&gt; Conflux&lt;&#x2F;li&gt;
      &lt;li&gt;&lt;svg class=&quot;slide__minor-icon&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;currentColor&quot; aria-hidden=&quot;true&quot;&gt;&lt;path d=&quot;M12 .5C5.65.5.5 5.65.5 12c0 5.08 3.29 9.39 7.86 10.91.58.11.79-.25.79-.55 0-.27-.01-1.18-.02-2.14-3.2.69-3.87-1.36-3.87-1.36-.52-1.33-1.27-1.69-1.27-1.69-1.04-.71.08-.7.08-.7 1.15.08 1.76 1.18 1.76 1.18 1.02 1.75 2.68 1.24 3.34.95.1-.74.4-1.24.73-1.53-2.55-.29-5.24-1.28-5.24-5.69 0-1.26.45-2.29 1.18-3.1-.12-.29-.51-1.46.11-3.05 0 0 .97-.31 3.17 1.18a11 11 0 0 1 5.76 0c2.2-1.49 3.17-1.18 3.17-1.18.62 1.59.23 2.76.11 3.05.74.81 1.18 1.84 1.18 3.1 0 4.42-2.69 5.39-5.25 5.68.41.36.78 1.06.78 2.14 0 1.55-.01 2.79-.01 3.17 0 .31.21.67.8.55C20.21 21.39 23.5 17.08 23.5 12 23.5 5.65 18.35.5 12 .5z&quot;&#x2F;&gt;&lt;&#x2F;svg&gt; Refrakt&lt;&#x2F;li&gt;
      &lt;li&gt;&lt;img src=&quot;&#x2F;talks&#x2F;iat-13-basel&#x2F;kurumi.png&quot; alt=&quot;&quot; width=&quot;20&quot; height=&quot;20&quot;&gt; Kurumi&lt;&#x2F;li&gt;
      &lt;li&gt;&lt;svg class=&quot;slide__minor-icon&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;currentColor&quot; aria-hidden=&quot;true&quot;&gt;&lt;path d=&quot;M12 .5C5.65.5.5 5.65.5 12c0 5.08 3.29 9.39 7.86 10.91.58.11.79-.25.79-.55 0-.27-.01-1.18-.02-2.14-3.2.69-3.87-1.36-3.87-1.36-.52-1.33-1.27-1.69-1.27-1.69-1.04-.71.08-.7.08-.7 1.15.08 1.76 1.18 1.76 1.18 1.02 1.75 2.68 1.24 3.34.95.1-.74.4-1.24.73-1.53-2.55-.29-5.24-1.28-5.24-5.69 0-1.26.45-2.29 1.18-3.1-.12-.29-.51-1.46.11-3.05 0 0 .97-.31 3.17 1.18a11 11 0 0 1 5.76 0c2.2-1.49 3.17-1.18 3.17-1.18.62 1.59.23 2.76.11 3.05.74.81 1.18 1.84 1.18 3.1 0 4.42-2.69 5.39-5.25 5.68.41.36.78 1.06.78 2.14 0 1.55-.01 2.79-.01 3.17 0 .31.21.67.8.55C20.21 21.39 23.5 17.08 23.5 12 23.5 5.65 18.35.5 12 .5z&quot;&#x2F;&gt;&lt;&#x2F;svg&gt; Vela&lt;&#x2F;li&gt;
      &lt;li&gt;…and the daily-driver utilities&lt;&#x2F;li&gt;
    &lt;&#x2F;ul&gt;
    &lt;p class=&quot;slide__kicker slide__kicker--strong&quot;&gt;
      None of these were started because AI generated ideas.
      AI allowed existing ideas to escape my notebook.
    &lt;&#x2F;p&gt;
  &lt;&#x2F;div&gt;
&lt;&#x2F;section&gt;
&lt;section class=&quot;slide&quot;&gt;
  &lt;div class=&quot;slide__inner&quot;&gt;
    &lt;p class=&quot;slide__eyebrow&quot;&gt;05&lt;&#x2F;p&gt;
    &lt;h2 class=&quot;slide__title&quot;&gt;Where the human still steers.&lt;&#x2F;h2&gt;
    &lt;ul class=&quot;slide__list&quot;&gt;
      &lt;li&gt;Requirements&lt;&#x2F;li&gt;
      &lt;li&gt;Architecture tradeoffs&lt;&#x2F;li&gt;
      &lt;li&gt;Long-term maintainability&lt;&#x2F;li&gt;
      &lt;li&gt;Taste&lt;&#x2F;li&gt;
      &lt;li&gt;Product decisions&lt;&#x2F;li&gt;
      &lt;li&gt;Knowing when not to build something&lt;&#x2F;li&gt;
    &lt;&#x2F;ul&gt;
    &lt;p class=&quot;slide__kicker slide__kicker--strong&quot;&gt;
      The human did not disappear. The human moved up the stack.
    &lt;&#x2F;p&gt;
    &lt;svg class=&quot;slide__diagram&quot; viewBox=&quot;0 0 1200 320&quot; xmlns=&quot;http:&#x2F;&#x2F;www.w3.org&#x2F;2000&#x2F;svg&quot; role=&quot;img&quot; aria-labelledby=&quot;diag05-title&quot;&gt;
      &lt;title id=&quot;diag05-title&quot;&gt;Before: the human reasoned about syntax; the compiler enforced it. Now: the human reasons about meaning; the coding agent interprets it.&lt;&#x2F;title&gt;
      &lt;defs&gt;
        &lt;marker id=&quot;diag05-arrow&quot; viewBox=&quot;0 0 10 10&quot; refX=&quot;8&quot; refY=&quot;5&quot; markerWidth=&quot;7&quot; markerHeight=&quot;7&quot; orient=&quot;auto&quot;&gt;
          &lt;path d=&quot;M0 0 L10 5 L0 10 z&quot; fill=&quot;#a8acba&quot;&#x2F;&gt;
        &lt;&#x2F;marker&gt;
      &lt;&#x2F;defs&gt;
      &lt;text x=&quot;22&quot; y=&quot;89&quot; class=&quot;slide__diagram-row-label&quot;&gt;Before&lt;&#x2F;text&gt;
      &lt;g class=&quot;slide__diagram-node&quot;&gt;
        &lt;rect x=&quot;140&quot; y=&quot;55&quot; width=&quot;240&quot; height=&quot;70&quot; rx=&quot;14&quot;&#x2F;&gt;
        &lt;text x=&quot;260&quot; y=&quot;86&quot;&gt;Human&lt;&#x2F;text&gt;
        &lt;text x=&quot;260&quot; y=&quot;108&quot; class=&quot;slide__diagram-node-sub&quot;&gt;reasons about&lt;&#x2F;text&gt;
      &lt;&#x2F;g&gt;
      &lt;line x1=&quot;395&quot; y1=&quot;90&quot; x2=&quot;465&quot; y2=&quot;90&quot; class=&quot;slide__diagram-arrow&quot; marker-end=&quot;url(#diag05-arrow)&quot;&#x2F;&gt;
      &lt;g class=&quot;slide__diagram-node slide__diagram-node--soft&quot;&gt;
        &lt;rect x=&quot;480&quot; y=&quot;55&quot; width=&quot;240&quot; height=&quot;70&quot; rx=&quot;14&quot;&#x2F;&gt;
        &lt;text x=&quot;600&quot; y=&quot;86&quot;&gt;Syntax&lt;&#x2F;text&gt;
        &lt;text x=&quot;600&quot; y=&quot;108&quot; class=&quot;slide__diagram-node-sub&quot;&gt;enforced by&lt;&#x2F;text&gt;
      &lt;&#x2F;g&gt;
      &lt;line x1=&quot;735&quot; y1=&quot;90&quot; x2=&quot;805&quot; y2=&quot;90&quot; class=&quot;slide__diagram-arrow&quot; marker-end=&quot;url(#diag05-arrow)&quot;&#x2F;&gt;
      &lt;g class=&quot;slide__diagram-node&quot;&gt;
        &lt;rect x=&quot;820&quot; y=&quot;55&quot; width=&quot;240&quot; height=&quot;70&quot; rx=&quot;14&quot;&#x2F;&gt;
        &lt;text x=&quot;940&quot; y=&quot;86&quot;&gt;Compiler&lt;&#x2F;text&gt;
        &lt;text x=&quot;940&quot; y=&quot;108&quot; class=&quot;slide__diagram-node-sub&quot;&gt;that makes it work&lt;&#x2F;text&gt;
      &lt;&#x2F;g&gt;
      &lt;text x=&quot;22&quot; y=&quot;239&quot; class=&quot;slide__diagram-row-label&quot;&gt;Now&lt;&#x2F;text&gt;
      &lt;g class=&quot;slide__diagram-node&quot;&gt;
        &lt;rect x=&quot;140&quot; y=&quot;205&quot; width=&quot;240&quot; height=&quot;70&quot; rx=&quot;14&quot;&#x2F;&gt;
        &lt;text x=&quot;260&quot; y=&quot;236&quot;&gt;Human&lt;&#x2F;text&gt;
        &lt;text x=&quot;260&quot; y=&quot;258&quot; class=&quot;slide__diagram-node-sub&quot;&gt;reasons about&lt;&#x2F;text&gt;
      &lt;&#x2F;g&gt;
      &lt;line x1=&quot;395&quot; y1=&quot;240&quot; x2=&quot;465&quot; y2=&quot;240&quot; class=&quot;slide__diagram-arrow&quot; marker-end=&quot;url(#diag05-arrow)&quot;&#x2F;&gt;
      &lt;g class=&quot;slide__diagram-node slide__diagram-node--accent&quot;&gt;
        &lt;rect x=&quot;480&quot; y=&quot;205&quot; width=&quot;240&quot; height=&quot;70&quot; rx=&quot;14&quot;&#x2F;&gt;
        &lt;text x=&quot;600&quot; y=&quot;236&quot;&gt;Meaning&lt;&#x2F;text&gt;
        &lt;text x=&quot;600&quot; y=&quot;258&quot; class=&quot;slide__diagram-node-sub&quot;&gt;interpreted by&lt;&#x2F;text&gt;
      &lt;&#x2F;g&gt;
      &lt;line x1=&quot;735&quot; y1=&quot;240&quot; x2=&quot;805&quot; y2=&quot;240&quot; class=&quot;slide__diagram-arrow&quot; marker-end=&quot;url(#diag05-arrow)&quot;&#x2F;&gt;
      &lt;g class=&quot;slide__diagram-node slide__diagram-node--accent&quot;&gt;
        &lt;rect x=&quot;820&quot; y=&quot;205&quot; width=&quot;240&quot; height=&quot;70&quot; rx=&quot;14&quot;&#x2F;&gt;
        &lt;text x=&quot;940&quot; y=&quot;236&quot;&gt;Coding Agent&lt;&#x2F;text&gt;
        &lt;text x=&quot;940&quot; y=&quot;258&quot; class=&quot;slide__diagram-node-sub&quot;&gt;that makes it work&lt;&#x2F;text&gt;
      &lt;&#x2F;g&gt;
    &lt;&#x2F;svg&gt;
  &lt;&#x2F;div&gt;
&lt;&#x2F;section&gt;
&lt;section class=&quot;slide slide--closing&quot;&gt;
  &lt;div class=&quot;slide__inner&quot;&gt;
    &lt;p class=&quot;slide__eyebrow&quot;&gt;06&lt;&#x2F;p&gt;
    &lt;h2 class=&quot;slide__title&quot;&gt;The unexpected consequence&lt;&#x2F;h2&gt;
    &lt;p class=&quot;slide__quote slide__quote--xl&quot;&gt;
      I thought AI would make me write software faster.&lt;br&gt;
      Instead, it made me start projects I would never have started before.
    &lt;&#x2F;p&gt;
    &lt;p class=&quot;slide__kicker slide__kicker--strong&quot;&gt;
      The biggest change was not productivity. The biggest change was willingness.
    &lt;&#x2F;p&gt;
  &lt;&#x2F;div&gt;
&lt;&#x2F;section&gt;
&lt;section class=&quot;slide&quot;&gt;
  &lt;div class=&quot;slide__inner&quot;&gt;
    &lt;p class=&quot;slide__eyebrow&quot;&gt;07&lt;&#x2F;p&gt;
    &lt;h2 class=&quot;slide__title&quot;&gt;What happened next&lt;&#x2F;h2&gt;
    &lt;ol class=&quot;slide__timeline&quot;&gt;
      &lt;li&gt;
        &lt;span class=&quot;slide__timeline-date&quot;&gt;March 2026 · San Francisco&lt;&#x2F;span&gt;
        &lt;p&gt;Co-presented an AI Web Security tool at &lt;strong&gt;RSAC 2026&lt;&#x2F;strong&gt; with Milan Duric. Same trip: a private downtown dinner hosted by &lt;strong&gt;Maverick Capital&lt;&#x2F;strong&gt;, room of AI Platform leaders.&lt;&#x2F;p&gt;
      &lt;&#x2F;li&gt;
      &lt;li&gt;
        &lt;span class=&quot;slide__timeline-date&quot;&gt;Spring 2026&lt;&#x2F;span&gt;
        &lt;p&gt;Shipped a &lt;strong&gt;Temporal-based self-service&lt;&#x2F;strong&gt; that lets users order web-service resources while AI quietly handles the operational work. Role at the time: Senior Platform Engineer &amp; Solution Architect, Web Security &#x2F; Application Delivery.&lt;&#x2F;p&gt;
      &lt;&#x2F;li&gt;
      &lt;li&gt;
        &lt;span class=&quot;slide__timeline-date&quot;&gt;Two weeks ago&lt;&#x2F;span&gt;
        &lt;p&gt;New role: &lt;strong&gt;Enterprise Solution Architect&lt;&#x2F;strong&gt; for Group-wide AI Platform and AI-assisted development.&lt;&#x2F;p&gt;
      &lt;&#x2F;li&gt;
      &lt;li&gt;
        &lt;span class=&quot;slide__timeline-date&quot;&gt;Last week&lt;&#x2F;span&gt;
        &lt;p&gt;Joined &lt;a href=&quot;https:&#x2F;&#x2F;thejfloor.com&#x2F;&quot;&gt;&lt;strong&gt;J floor&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;. More talks already on the calendar.&lt;&#x2F;p&gt;
      &lt;&#x2F;li&gt;
    &lt;&#x2F;ol&gt;
    &lt;p class=&quot;slide__kicker slide__kicker--strong&quot;&gt;
      None of this was on my calendar a year ago.
    &lt;&#x2F;p&gt;
  &lt;&#x2F;div&gt;
&lt;&#x2F;section&gt;
&lt;section class=&quot;slide slide--end&quot;&gt;
  &lt;div class=&quot;slide__inner&quot;&gt;
    &lt;p class=&quot;slide__eyebrow&quot;&gt;Thank you&lt;&#x2F;p&gt;
    &lt;p class=&quot;slide__quote slide__quote--final&quot;&gt;
      The most important thing Claude Code gave me was not faster software development.
      It gave me the ability to turn years of accumulated ideas into working systems
      before I lost interest in them.
    &lt;&#x2F;p&gt;
    &lt;p class=&quot;slide__byline slide__byline--icons&quot;&gt;
      &lt;a href=&quot;https:&#x2F;&#x2F;raskell.io&quot;&gt;
        &lt;img class=&quot;slide__byline-icon slide__byline-icon--img&quot; src=&quot;&#x2F;raskell-mascot.avif&quot; alt=&quot;&quot; width=&quot;20&quot; height=&quot;20&quot;&gt;
        raskell.io
      &lt;&#x2F;a&gt;
      &lt;span class=&quot;slide__byline-sep&quot; aria-hidden=&quot;true&quot;&gt;·&lt;&#x2F;span&gt;
      &lt;a href=&quot;https:&#x2F;&#x2F;ch.linkedin.com&#x2F;in&#x2F;raffael-e-schneider&quot;&gt;
        &lt;svg class=&quot;slide__byline-icon&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;currentColor&quot; aria-hidden=&quot;true&quot;&gt;&lt;path d=&quot;M20.447 20.452h-3.554v-5.569c0-1.328-.027-3.037-1.852-3.037-1.853 0-2.136 1.445-2.136 2.939v5.667H9.351V9h3.414v1.561h.046c.477-.9 1.637-1.85 3.37-1.85 3.601 0 4.267 2.37 4.267 5.455v6.286zM5.337 7.433c-1.144 0-2.063-.926-2.063-2.065 0-1.138.92-2.063 2.063-2.063 1.14 0 2.064.925 2.064 2.063 0 1.139-.925 2.065-2.064 2.065zm1.782 13.019H3.555V9h3.564v11.452zM22.225 0H1.771C.792 0 0 .774 0 1.729v20.542C0 23.227.792 24 1.771 24h20.451C23.2 24 24 23.227 24 22.271V1.729C24 .774 23.2 0 22.222 0h.003z&quot;&#x2F;&gt;&lt;&#x2F;svg&gt;
        raffael-e-schneider
      &lt;&#x2F;a&gt;
      &lt;span class=&quot;slide__byline-sep&quot; aria-hidden=&quot;true&quot;&gt;·&lt;&#x2F;span&gt;
      &lt;a href=&quot;https:&#x2F;&#x2F;twitter.com&#x2F;raskelll&quot;&gt;
        &lt;svg class=&quot;slide__byline-icon&quot; viewBox=&quot;0 0 24 24&quot; fill=&quot;currentColor&quot; aria-hidden=&quot;true&quot;&gt;&lt;path d=&quot;M18.244 2.25h3.308l-7.227 8.26 8.502 11.24H16.17l-5.214-6.817L4.99 21.75H1.68l7.73-8.835L1.254 2.25H8.08l4.713 6.231zm-1.161 17.52h1.833L7.084 4.126H5.117z&quot;&#x2F;&gt;&lt;&#x2F;svg&gt;
        @raskelll
      &lt;&#x2F;a&gt;
    &lt;&#x2F;p&gt;
  &lt;&#x2F;div&gt;
&lt;&#x2F;section&gt;
</description>
      </item>
      <item>
          <title>Tools: How Agents Actually Do Things</title>
          <pubDate>Tue, 09 Jun 2026 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/tools-how-agents-actually-do-things/</link>
          <guid>https://raskell.io/articles/tools-how-agents-actually-do-things/</guid>
          <description xml:base="https://raskell.io/articles/tools-how-agents-actually-do-things/">&lt;blockquote&gt;
&lt;p&gt;Part 3 of &lt;em&gt;The Agent Platform Handbook. From Loop to Platform.&lt;&#x2F;em&gt; Previous: &lt;a href=&quot;&#x2F;articles&#x2F;your-agent-wants-root&#x2F;&quot;&gt;Your Agent Wants Root&lt;&#x2F;a&gt;. Next: Context Is the Product.&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;In &lt;a href=&quot;&#x2F;articles&#x2F;what-an-agent-actually-is&#x2F;&quot;&gt;post one&lt;&#x2F;a&gt; we built the agent harness: a loop, a one-tool registry, a system prompt, a dispatcher, an iteration budget. In &lt;a href=&quot;&#x2F;articles&#x2F;your-agent-wants-root&#x2F;&quot;&gt;post two&lt;&#x2F;a&gt; we slid a sandboxed runtime under the shell tool without touching the loop. The harness stands today at tag &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;the-agent-platform-handbook&#x2F;tree&#x2F;post-02&quot;&gt;&lt;code&gt;post-02&lt;&#x2F;code&gt;&lt;&#x2F;a&gt; of &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;the-agent-platform-handbook&quot;&gt;&lt;code&gt;the-agent-platform-handbook&lt;&#x2F;code&gt;&lt;&#x2F;a&gt;: four files, one tool, fenced. The loop works. The runtime is fenced. The agent is still mostly useless, because one tool means the model can either run a shell command or do nothing. Every real agent has a toolbox.&lt;&#x2F;p&gt;
&lt;p&gt;This is the toolbox post. We will extend the same harness with three more tools (&lt;code&gt;fs_read&lt;&#x2F;code&gt;, &lt;code&gt;http_get&lt;&#x2F;code&gt;, &lt;code&gt;git&lt;&#x2F;code&gt;), promote the one-tool registry into a real one, handle parallel tool calls, and look at the failure modes that show up the first time the model has more than one thing to pick from. The diff lands as tag &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;the-agent-platform-handbook&#x2F;tree&#x2F;post-03&quot;&gt;&lt;code&gt;post-03&lt;&#x2F;code&gt;&lt;&#x2F;a&gt; in the same repo. The agent loop’s overall shape, the system prompt structure, and the iteration budget do not change.&lt;&#x2F;p&gt;
&lt;p&gt;The interesting work in this post is not the tools themselves. It is the registry, the schemas, and the contract between what the model sees and what your code runs. Get those right and adding a fifth tool is a one-file change. Get them wrong and you will spend the next quarter retraining users on a registry the model cannot navigate.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;how-we-got-here&quot;&gt;How we got here&lt;&#x2F;h2&gt;
&lt;p&gt;For about a year, the way you let a language model call a function was to ask it nicely.&lt;&#x2F;p&gt;
&lt;p&gt;The ReAct paper in October 2022 sketched the loop in pseudocode. The first implementations, including the early LangChain releases that month, made it real by parsing the model’s prose output. You instructed the model to write &lt;code&gt;Action: search&lt;&#x2F;code&gt; on one line, &lt;code&gt;Action Input: &quot;what is X&quot;&lt;&#x2F;code&gt; on the next, then stopped generation on a token like &lt;code&gt;Observation:&lt;&#x2F;code&gt; and parsed whatever came before it as the tool call. It worked. It also broke whenever the model felt creative, whenever the user’s question contained the stop token, whenever the prompt accidentally taught the model a slightly different format.&lt;&#x2F;p&gt;
&lt;p&gt;Then on June 13, 2023, OpenAI shipped function calling. You declared your tools with JSON Schema. The model returned a structured object with &lt;code&gt;name&lt;&#x2F;code&gt; and &lt;code&gt;arguments&lt;&#x2F;code&gt;. No more parsing prose. No more stop tokens. The reliability gap between “this works in the demo” and “this works on Tuesday morning” narrowed sharply in a single release. Anthropic shipped &lt;code&gt;tool_use&lt;&#x2F;code&gt; the following year on the same shape, and structured outputs (constrained decoding that guarantees the model emits valid JSON for a given schema) followed in the second half of 2024.&lt;&#x2F;p&gt;
&lt;p&gt;The Model Context Protocol, also from Anthropic, arrived in November 2024 and added a transport layer for tools so they could live in a separate process or a separate machine. The calling convention did not change. MCP just gave the registry a network. We will spend &lt;a href=&quot;#&quot;&gt;post eight&lt;&#x2F;a&gt; on MCP specifically.&lt;&#x2F;p&gt;
&lt;p&gt;The lesson from this lineage fits in one sentence: the model talks JSON now. The work that remains is the work you control: the registry, the schemas, the contract for what happens after the call. That is what this post is about.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-mental-model&quot;&gt;The mental model&lt;&#x2F;h2&gt;
&lt;p&gt;A tool layer has three pieces. The model picks. The registry resolves. The handler runs.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;The tool layer of an agent&quot;&gt;+-------------------+
                |      Model        |
                | &amp;quot;I want to call   |
                |  fs_read with     |
                |  path=&amp;#x2F;etc&amp;#x2F;hosts&amp;quot; |
                +---------+---------+
                          |
                          | tool_use block
                          v
                +-------------------+         schema list
                |  Tool registry    | &amp;lt;-----  the model sees
                | name -&amp;gt; handler   |
                +---------+---------+
                          |
                          | dispatch
                          v
                +-------------------+
                |     Handler       |
                | side effect runs  |
                | (in the sandbox)  |
                +---------+---------+
                          |
                          | { ok, value | error }
                          v
                +-------------------+
                |  tool_result      | ---&amp;gt;  back into context
                +-------------------+&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;What the model sees and what the handler runs are decoupled. The model sees a name, a description, and a JSON schema for inputs. The handler sees parsed arguments and returns either a value or an error, both as strings. The registry is the seam. Every tool engineering decision in this post is about that seam.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;build-the-registry&quot;&gt;Build the registry&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;a href=&quot;&#x2F;articles&#x2F;your-agent-wants-root&#x2F;&quot;&gt;Post two&lt;&#x2F;a&gt; left the harness with a single &lt;code&gt;Tool&lt;&#x2F;code&gt; type in &lt;code&gt;types.ts&lt;&#x2F;code&gt; and one implementation in &lt;code&gt;tools.ts&lt;&#x2F;code&gt;. Three additions turn that into a real registry: a tagged-union &lt;code&gt;ToolResult&lt;&#x2F;code&gt; so errors flow as data and not exceptions, a &lt;code&gt;max_output_bytes&lt;&#x2F;code&gt; field so a 50 MB log file does not blow the context window, and a new &lt;code&gt;registry.ts&lt;&#x2F;code&gt; so the loop does not care how many tools exist. We also reorganize the tools into their own subdirectory now that there is more than one of them. The post-03 tree looks like this.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;the-agent-platform-handbook&amp;#x2F;
├── agent.ts          # loop and dispatch (rewritten)
├── registry.ts       # new
├── types.ts          # +ToolResult, +max_output_bytes
└── tools&amp;#x2F;
    ├── shell.ts      # moved from .&amp;#x2F;tools.ts, returns ToolResult
    ├── fs.ts         # new
    ├── http.ts       # new
    └── git.ts        # new
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Two changes in &lt;code&gt;types.ts&lt;&#x2F;code&gt; versus &lt;code&gt;post-02&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;&amp;#x2F;&amp;#x2F; types.ts
export type ToolResult =
  | { ok: true; value: string }
  | { ok: false; error: string };

export type Tool = {
  name: string;
  description: string;
  input_schema: {
    type: &amp;quot;object&amp;quot;;
    properties: Record&amp;lt;string, unknown&amp;gt;;
    required?: string[];
  };
  max_output_bytes?: number;
  run: (input: Record&amp;lt;string, unknown&amp;gt;) =&amp;gt; Promise&amp;lt;ToolResult&amp;gt;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;ToolResult&lt;&#x2F;code&gt; is a tagged union. Tools never throw to the loop. They return &lt;code&gt;{ ok: false, error }&lt;&#x2F;code&gt; when the side effect fails or the input is wrong. This matters because the model needs to read the failure and decide what to do next. An exception kills the loop. A returned error gives the model a chance to retry, switch tools, or report back to the user.&lt;&#x2F;p&gt;
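&lt;p&gt;To make that contract concrete, here is a minimal sketch of a wrapper that turns any throwing function into one that always resolves to a &lt;code&gt;ToolResult&lt;&#x2F;code&gt;. The names &lt;code&gt;toResult&lt;&#x2F;code&gt; and &lt;code&gt;toToolResultContent&lt;&#x2F;code&gt; are hypothetical and not part of the repo; the type is repeated so the snippet stands alone.&lt;&#x2F;p&gt;

```typescript
type ToolResult =
  | { ok: true; value: string }
  | { ok: false; error: string };

// Hypothetical helper: runs `work` and converts both outcomes into data,
// so nothing escapes to the loop as an exception.
async function toResult(work: () => unknown) {
  try {
    const success: ToolResult = { ok: true, value: String(await work()) };
    return success;
  } catch (err) {
    const failure: ToolResult = {
      ok: false,
      error: err instanceof Error ? err.message : String(err),
    };
    return failure;
  }
}

// Hypothetical helper: the shape the loop would send back to the model.
// The `ok` tag decides which field is read and whether is_error is set.
function toToolResultContent(result: ToolResult) {
  return result.ok
    ? { content: result.value, is_error: false }
    : { content: result.error, is_error: true };
}
```

&lt;p&gt;Either branch produces text the model can read on the next turn, which is the point of the envelope.&lt;&#x2F;p&gt;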
&lt;p&gt;&lt;code&gt;max_output_bytes&lt;&#x2F;code&gt; is the per-tool truncation cap. The default is small: 8 KiB, applied by the registry when a tool does not set its own. A &lt;code&gt;shell&lt;&#x2F;code&gt; tool that runs &lt;code&gt;cat &#x2F;var&#x2F;log&#x2F;syslog&lt;&#x2F;code&gt; should not return three megabytes of text into a context window that costs you per token.&lt;&#x2F;p&gt;
&lt;p&gt;The registry itself is tiny.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;&amp;#x2F;&amp;#x2F; registry.ts
import type { Tool, ToolResult } from &amp;quot;.&amp;#x2F;types&amp;quot;;

export class Registry {
  private readonly tools = new Map&amp;lt;string, Tool&amp;gt;();

  register(tool: Tool): this {
    if (this.tools.has(tool.name)) {
      throw new Error(`duplicate tool: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
    return this;
  }

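  schemas() {
    &amp;#x2F;&amp;#x2F; Expose only the fields the API expects; `run` and `max_output_bytes` stay local.
    return Array.from(this.tools.values()).map(({ name, description, input_schema }) =&amp;gt; ({
      name,
      description,
      input_schema,
    }));
  }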

  async dispatch(name: string, input: Record&amp;lt;string, unknown&amp;gt;): Promise&amp;lt;ToolResult&amp;gt; {
    const tool = this.tools.get(name);
    if (!tool) return { ok: false, error: `unknown tool: ${name}` };
    try {
      const result = await tool.run(input);
      return cap(result, tool.max_output_bytes ?? 8192);
    } catch (err) {
      return { ok: false, error: `tool threw: ${String(err)}` };
    }
  }
}

function cap(result: ToolResult, max: number): ToolResult {
  if (!result.ok) return result;
  const bytes = Buffer.byteLength(result.value, &amp;quot;utf8&amp;quot;);
  if (bytes &amp;lt;= max) return result;
  const head = result.value.slice(0, max);
  return { ok: true, value: `${head}\n\n[truncated: ${bytes - max} more bytes]` };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The registry holds tools, hands the model the schema view (without the handler), dispatches by name, caps output, and turns any thrown exception into a returned error. Forty lines. Done.&lt;&#x2F;p&gt;
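&lt;p&gt;One caveat on &lt;code&gt;cap&lt;&#x2F;code&gt;: &lt;code&gt;String.prototype.slice&lt;&#x2F;code&gt; counts UTF-16 code units, while the limit is measured in UTF-8 bytes, so output with many multi-byte characters can overshoot the cap. A byte-accurate variant might look like the sketch below. &lt;code&gt;truncateUtf8&lt;&#x2F;code&gt; is a hypothetical name, and the sketch assumes &lt;code&gt;TextEncoder&lt;&#x2F;code&gt; and &lt;code&gt;TextDecoder&lt;&#x2F;code&gt; are available as globals, which holds in Bun, Node, and browsers.&lt;&#x2F;p&gt;

```typescript
// Hypothetical helper: keep at most `maxBytes` of UTF-8 without splitting a
// multi-byte character. Returns the kept text and the number of bytes dropped.
function truncateUtf8(text: string, maxBytes: number) {
  const bytes = new TextEncoder().encode(text);
  if (maxBytes >= bytes.length) return { head: text, dropped: 0 };

  let end = Math.max(0, maxBytes);
  // UTF-8 continuation bytes have the bit pattern 10xxxxxx. Step back until
  // `end` points at the first byte of a character, so the cut is clean.
  for (; end > 0; end--) {
    if ((bytes[end] >> 6) !== 2) break;
  }

  const head = new TextDecoder().decode(bytes.subarray(0, end));
  return { head, dropped: bytes.length - end };
}
```

&lt;p&gt;The marker line can then report &lt;code&gt;dropped&lt;&#x2F;code&gt; directly, and the kept text never exceeds the cap.&lt;&#x2F;p&gt;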
&lt;h2 id=&quot;four-tools&quot;&gt;Four tools&lt;&#x2F;h2&gt;
&lt;p&gt;Now the actual toolbox. The shapes are deliberate, and so are the descriptions.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;&amp;#x2F;&amp;#x2F; tools&amp;#x2F;fs.ts
import type { Tool } from &amp;quot;..&amp;#x2F;types&amp;quot;;

export const fs_read: Tool = {
  name: &amp;quot;fs_read&amp;quot;,
  description:
    &amp;quot;Read a UTF-8 text file from the local filesystem and return its contents. &amp;quot; +
    &amp;quot;Fails if the path does not exist, is not a regular file, is not valid UTF-8, &amp;quot; +
    &amp;quot;or exceeds 1 MB. Use this for source files, configs, and logs.&amp;quot;,
  input_schema: {
    type: &amp;quot;object&amp;quot;,
    properties: {
      path: { type: &amp;quot;string&amp;quot;, description: &amp;quot;Absolute or relative path to the file.&amp;quot; },
    },
    required: [&amp;quot;path&amp;quot;],
  },
  max_output_bytes: 1024 * 1024,
  run: async ({ path }) =&amp;gt; {
    try {
      const file = Bun.file(String(path));
      const exists = await file.exists();
      if (!exists) return { ok: false, error: `no such file: ${path}` };
      if (file.size &amp;gt; 1024 * 1024) return { ok: false, error: `file too large: ${file.size} bytes` };
      const text = await file.text();
      return { ok: true, value: text };
    } catch (err) {
      return { ok: false, error: String(err) };
    }
  },
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Notice the description. It tells the model what the tool does, when it fails, and when to choose it (“Use this for source files, configs, and logs”). The model’s tool-selection is a function of these strings. Vague descriptions produce vague selection. Boring, specific descriptions produce reliable selection.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;&amp;#x2F;&amp;#x2F; tools&amp;#x2F;http.ts
import type { Tool } from &amp;quot;..&amp;#x2F;types&amp;quot;;

export const http_get: Tool = {
  name: &amp;quot;http_get&amp;quot;,
  description:
    &amp;quot;Perform an HTTP GET request and return the response body as text. &amp;quot; +
    &amp;quot;Times out after 10 seconds. Returns the status code in the result. &amp;quot; +
    &amp;quot;Use this to fetch public documentation, API responses, or web pages. &amp;quot; +
    &amp;quot;Do not use it to interact with internal services.&amp;quot;,
  input_schema: {
    type: &amp;quot;object&amp;quot;,
    properties: {
      url: { type: &amp;quot;string&amp;quot;, description: &amp;quot;Absolute https:&amp;#x2F;&amp;#x2F; URL.&amp;quot; },
    },
    required: [&amp;quot;url&amp;quot;],
  },
  max_output_bytes: 64 * 1024,
  run: async ({ url }) =&amp;gt; {
    const u = String(url);
    if (!u.startsWith(&amp;quot;https:&amp;#x2F;&amp;#x2F;&amp;quot;)) return { ok: false, error: &amp;quot;only https:&amp;#x2F;&amp;#x2F; is allowed&amp;quot; };
    try {
      const ctl = AbortSignal.timeout(10_000);
      const res = await fetch(u, { signal: ctl });
      const body = await res.text();
      return { ok: true, value: `status: ${res.status}\n\n${body}` };
    } catch (err) {
      return { ok: false, error: String(err) };
    }
  },
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Two design choices to call out. The tool enforces &lt;code&gt;https:&#x2F;&#x2F;&lt;&#x2F;code&gt; at the handler level even though the description says so, because the model will sometimes call it with &lt;code&gt;http:&#x2F;&#x2F;&lt;&#x2F;code&gt; anyway. The status code is folded into the value, not into a separate field, because the model reads strings.&lt;&#x2F;p&gt;
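&lt;p&gt;The same reasoning applies to the “do not use it for internal services” line in the description, which the handler above does not enforce. Below is a rough sketch of a pre-flight check, assuming the standard &lt;code&gt;URL&lt;&#x2F;code&gt; parser. &lt;code&gt;checkUrl&lt;&#x2F;code&gt; and the pattern list are hypothetical, and the check is deliberately incomplete: it looks only at the literal hostname, so a public DNS name that resolves to a private address, or a redirect, would still get through.&lt;&#x2F;p&gt;

```typescript
// Hypothetical deny-list of hostnames that point at the local machine or a
// private network. Not exhaustive.
const PRIVATE_HOST_PATTERNS = [
  /^localhost$/,
  /\.localhost$/,
  /^127\./,
  /^10\./,
  /^192\.168\./,
  /^172\.(1[6-9]|2\d|3[01])\./,
  /^169\.254\./,
  /^\[/, // any IPv6 literal: rejected wholesale to keep the sketch small
];

// Returns an error string, or null when the URL passes every check.
function checkUrl(raw: string): string | null {
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    return `not a valid URL: ${raw}`;
  }
  if (parsed.protocol !== "https:") return "only https:// is allowed";

  // The parser lowercases the hostname and normalizes numeric IPv4 forms.
  const host = parsed.hostname;
  if (PRIVATE_HOST_PATTERNS.some((pattern) => pattern.test(host))) {
    return `host not allowed: ${host}`;
  }
  return null;
}
```

&lt;p&gt;A handler could call this first and return &lt;code&gt;{ ok: false, error }&lt;&#x2F;code&gt; whenever the result is not &lt;code&gt;null&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;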
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;&amp;#x2F;&amp;#x2F; tools&amp;#x2F;git.ts
import type { Tool } from &amp;quot;..&amp;#x2F;types&amp;quot;;

export const git: Tool = {
  name: &amp;quot;git&amp;quot;,
  description:
    &amp;quot;Run a read-only git command in the current repository and return its output. &amp;quot; +
    &amp;quot;Allowed subcommands: log, diff, show, status, branch, ls-files. &amp;quot; +
    &amp;quot;Any other subcommand is rejected. Use this to inspect history, &amp;quot; +
    &amp;quot;see uncommitted changes, or list tracked files.&amp;quot;,
  input_schema: {
    type: &amp;quot;object&amp;quot;,
    properties: {
      args: {
        type: &amp;quot;array&amp;quot;,
        items: { type: &amp;quot;string&amp;quot; },
        description: &amp;quot;Arguments after `git`, e.g. [&amp;#x27;log&amp;#x27;, &amp;#x27;--oneline&amp;#x27;, &amp;#x27;-5&amp;#x27;].&amp;quot;,
      },
    },
    required: [&amp;quot;args&amp;quot;],
  },
  max_output_bytes: 32 * 1024,
  run: async ({ args }) =&amp;gt; {
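    &amp;#x2F;&amp;#x2F; The model may send something other than an array; treat that as &amp;quot;no arguments&amp;quot;.
    const a = Array.isArray(args) ? args.map(String) : [];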
    const allowed = new Set([&amp;quot;log&amp;quot;, &amp;quot;diff&amp;quot;, &amp;quot;show&amp;quot;, &amp;quot;status&amp;quot;, &amp;quot;branch&amp;quot;, &amp;quot;ls-files&amp;quot;]);
    if (a.length === 0 || !allowed.has(a[0])) {
      return { ok: false, error: `subcommand not allowed: ${a[0] ?? &amp;quot;(none)&amp;quot;}` };
    }
    const proc = Bun.spawn([&amp;quot;git&amp;quot;, ...a.map(String)], { stdout: &amp;quot;pipe&amp;quot;, stderr: &amp;quot;pipe&amp;quot; });
    const [stdout, stderr] = await Promise.all([
      new Response(proc.stdout).text(),
      new Response(proc.stderr).text(),
    ]);
    const code = await proc.exited;
    if (code !== 0) return { ok: false, error: stderr || `git exited ${code}` };
    return { ok: true, value: stdout };
  },
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;code&gt;git&lt;&#x2F;code&gt; tool is interesting because it shows the allow-list pattern. The model can ask for &lt;code&gt;git push --force&lt;&#x2F;code&gt; if it wants to. The handler refuses, returns a clear error, and the model goes back to the drawing board. The allow-list is enforced in the handler, not just stated in the description, because trusting the model to obey natural-language constraints is exactly the trap &lt;a href=&quot;&#x2F;articles&#x2F;your-agent-wants-root&#x2F;&quot;&gt;post two&lt;&#x2F;a&gt; was about.&lt;&#x2F;p&gt;
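&lt;p&gt;One gap worth flagging if the goal is strictly read-only: checking only the subcommand still admits &lt;code&gt;git branch -D main&lt;&#x2F;code&gt;, which deletes a branch, and &lt;code&gt;git diff --output=...&lt;&#x2F;code&gt;, which writes a file. A possible tightening is sketched below. &lt;code&gt;checkGitArgs&lt;&#x2F;code&gt; and both sets are hypothetical, and the flag lists are a starting point, not an audit of every git option.&lt;&#x2F;p&gt;

```typescript
// Same subcommand allow-list as the handler.
const SUBCOMMANDS = new Set(["log", "diff", "show", "status", "branch", "ls-files"]);

// Flags under which `git branch` only lists. Everything else is refused,
// including a bare name, which would create a branch.
const BRANCH_LISTING_FLAGS = new Set([
  "--list", "-a", "--all", "-r", "--remotes", "-v", "-vv", "--show-current",
]);

// Returns an error string, or null when the arguments look read-only.
function checkGitArgs(args: string[]): string | null {
  const [sub, ...rest] = args;
  if (sub === undefined || !SUBCOMMANDS.has(sub)) {
    return `subcommand not allowed: ${sub ?? "(none)"}`;
  }

  // `--output` makes the diff family write its result to a file.
  const writer = rest.find((arg) => arg === "--output" || arg.startsWith("--output="));
  if (writer !== undefined) return `flag not allowed: ${writer}`;

  if (sub === "branch") {
    const bad = rest.find((arg) => !BRANCH_LISTING_FLAGS.has(arg));
    if (bad !== undefined) return `branch argument not allowed: ${bad}`;
  }
  return null;
}
```

&lt;p&gt;As with the subcommand check, the refusal comes back as an ordinary error string, so the model can read it and pick a different command.&lt;&#x2F;p&gt;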
&lt;p&gt;The &lt;code&gt;shell&lt;&#x2F;code&gt; tool from &lt;code&gt;post-02&lt;&#x2F;code&gt; moves to &lt;code&gt;tools&#x2F;shell.ts&lt;&#x2F;code&gt; and gets the same envelope refactor as the new tools: its &lt;code&gt;run&lt;&#x2F;code&gt; now returns &lt;code&gt;ToolResult&lt;&#x2F;code&gt; instead of a raw string. The sandbox flags and the executed command stay exactly as they were. Four tools, registered:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;&amp;#x2F;&amp;#x2F; agent.ts
import { Registry } from &amp;quot;.&amp;#x2F;registry&amp;quot;;
import { fs_read } from &amp;quot;.&amp;#x2F;tools&amp;#x2F;fs&amp;quot;;
import { http_get } from &amp;quot;.&amp;#x2F;tools&amp;#x2F;http&amp;quot;;
import { git } from &amp;quot;.&amp;#x2F;tools&amp;#x2F;git&amp;quot;;
import { shell } from &amp;quot;.&amp;#x2F;tools&amp;#x2F;shell&amp;quot;;

const tools = new Registry()
  .register(shell)
  .register(fs_read)
  .register(http_get)
  .register(git);
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h2 id=&quot;the-loop-changes-a-little&quot;&gt;The loop changes a little&lt;&#x2F;h2&gt;
&lt;p&gt;Modern frontier models can ask for several tools in a single turn. Treat them as parallel calls and you save round trips. Treat them as sequential and the model will figure it out, but slowly.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;&amp;#x2F;&amp;#x2F; agent.ts (continued)
const response = await client.messages.create({
  model: &amp;quot;claude-sonnet-4-6&amp;quot;,
  max_tokens: 4096,
  system: SYSTEM_PROMPT,
  tools: tools.schemas(),
  messages,
});

messages.push({ role: &amp;quot;assistant&amp;quot;, content: response.content });

if (response.stop_reason === &amp;quot;end_turn&amp;quot;) {
  &amp;#x2F;&amp;#x2F; ... print final answer, return
}

const calls = response.content.filter((b) =&amp;gt; b.type === &amp;quot;tool_use&amp;quot;);
const results = await Promise.all(
  calls.map(async (block) =&amp;gt; {
    const result = await tools.dispatch(block.name, block.input as Record&amp;lt;string, unknown&amp;gt;);
    console.error(`&amp;gt; ${block.name} ${JSON.stringify(block.input)} -&amp;gt; ${result.ok ? &amp;quot;ok&amp;quot; : &amp;quot;err&amp;quot;}`);
    return {
      type: &amp;quot;tool_result&amp;quot; as const,
      tool_use_id: block.id,
      content: result.ok ? result.value : result.error,
      is_error: !result.ok,
    };
  }),
);

messages.push({ role: &amp;quot;user&amp;quot;, content: results });
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Two changes versus the &lt;code&gt;post-02&lt;&#x2F;code&gt; dispatch. The loop runs all tool calls from a single turn in parallel with &lt;code&gt;Promise.all&lt;&#x2F;code&gt;. The &lt;code&gt;is_error&lt;&#x2F;code&gt; flag now gets set when the result was an error, because the model uses it to decide whether to retry or change strategy. The rest of the work (unknown-tool handling, exception wrapping, output capping) moved into the registry, so the dispatch in &lt;code&gt;agent.ts&lt;&#x2F;code&gt; shrinks from roughly twenty-five lines to ten.&lt;&#x2F;p&gt;
&lt;p&gt;A short transcript shows the difference.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;$ bun agent.ts &amp;quot;summarize the largest TypeScript file under src and tell me what changed in the last commit&amp;quot;
&amp;gt; fs_read {&amp;quot;path&amp;quot;:&amp;quot;src&amp;#x2F;agent.ts&amp;quot;} -&amp;gt; ok
&amp;gt; git {&amp;quot;args&amp;quot;:[&amp;quot;log&amp;quot;,&amp;quot;-1&amp;quot;,&amp;quot;--stat&amp;quot;]} -&amp;gt; ok
src&amp;#x2F;agent.ts (185 lines) implements the agent loop against the Anthropic
Messages API. It builds a registry of four tools (shell, fs_read, http_get,
git), dispatches tool_use blocks in parallel, and stops when the model
returns an end_turn response or hits the 10-iteration budget. The last
commit added the parallel-dispatch path and a small output truncation
helper; net 38 lines added across agent.ts and registry.ts.
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;One turn. Two tool calls. They ran simultaneously. The model fused the results into a single answer.&lt;&#x2F;p&gt;
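&lt;p&gt;Two properties of that dispatch are easy to check in isolation: &lt;code&gt;Promise.all&lt;&#x2F;code&gt; returns results in call order regardless of which handler finishes first, and a failing tool does not sink the batch, because failures arrive as values and not as rejections. The sketch below uses made-up stand-ins (&lt;code&gt;fakeDispatch&lt;&#x2F;code&gt;, &lt;code&gt;slow_ok&lt;&#x2F;code&gt;, &lt;code&gt;fast_err&lt;&#x2F;code&gt;) in place of the real registry.&lt;&#x2F;p&gt;

```typescript
type ToolResult =
  | { ok: true; value: string }
  | { ok: false; error: string };

const ok = (value: string): ToolResult => ({ ok: true, value });
const fail = (error: string): ToolResult => ({ ok: false, error });
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Stand-in for Registry.dispatch: never throws, always resolves to a ToolResult.
async function fakeDispatch(name: string) {
  if (name === "slow_ok") {
    await sleep(30);
    return ok("slow done");
  }
  if (name === "fast_err") {
    await sleep(5);
    return fail("boom");
  }
  return fail(`unknown tool: ${name}`);
}

// Mirrors the loop: run every call from one turn concurrently and build the
// tool_result entries in the same order the calls were issued.
async function runTurn(calls: { id: string; name: string }[]) {
  return Promise.all(
    calls.map(async (call) => {
      const result = await fakeDispatch(call.name);
      return {
        tool_use_id: call.id,
        content: result.ok ? result.value : result.error,
        is_error: !result.ok,
      };
    }),
  );
}
```

&lt;p&gt;If a handler could reject, one rejection would discard every other result in the turn. That is why the registry wraps thrown exceptions before they reach this point.&lt;&#x2F;p&gt;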
&lt;h2 id=&quot;designing-tools-that-the-model-can-actually-use&quot;&gt;Designing tools that the model can actually use&lt;&#x2F;h2&gt;
&lt;p&gt;After you have built the registry, the failure mode that bites you is not the code. It is the design. Three patterns that hold up.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;One tool per concept.&lt;&#x2F;strong&gt; A &lt;code&gt;fs&lt;&#x2F;code&gt; tool with a union input schema (&lt;code&gt;mode: &quot;read&quot; | &quot;list&quot; | &quot;stat&quot;&lt;&#x2F;code&gt;) reads cleaner to a human and worse to a model. The model has to pick the right &lt;code&gt;mode&lt;&#x2F;code&gt; &lt;em&gt;and&lt;&#x2F;em&gt; the right arguments simultaneously. Split it: &lt;code&gt;fs_read&lt;&#x2F;code&gt;, &lt;code&gt;fs_list&lt;&#x2F;code&gt;, &lt;code&gt;fs_stat&lt;&#x2F;code&gt;. Three tools, three clear pictures. The model picks better and the schemas are simpler.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Descriptions are written for the model.&lt;&#x2F;strong&gt; The description is the only place the model learns when to use a tool. Be specific about inputs, outputs, error cases, and use cases. “Read a file” picks worse than the &lt;code&gt;fs_read&lt;&#x2F;code&gt; description above. The cost of the extra eighty tokens per turn is rounding error against the cost of the model picking the wrong tool and looping.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Errors are data.&lt;&#x2F;strong&gt; Every tool returns &lt;code&gt;{ ok, error }&lt;&#x2F;code&gt; rather than throwing. The model can read the error, reason about it, and choose: retry with different inputs, switch tools, or surface the failure to the user. An exception removes all of that.&lt;&#x2F;p&gt;
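&lt;p&gt;A minimal sketch of that contract, not the code from the companion repo. The &lt;code&gt;ToolResult&lt;&#x2F;code&gt; shape, the &lt;code&gt;asResult&lt;&#x2F;code&gt; wrapper, and the &lt;code&gt;readConfig&lt;&#x2F;code&gt; handler are hypothetical names chosen for illustration. The wrapper catches whatever a handler throws and hands it back as a value the loop can pass to the model.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;&amp;#x2F;&amp;#x2F; Hypothetical shapes for illustration; the companion repo may differ.
type ToolResult =
  | { ok: true; output: string }
  | { ok: false; error: string };

type Handler = (input: Record&amp;lt;string, unknown&amp;gt;) =&amp;gt; Promise&amp;lt;string&amp;gt;;

&amp;#x2F;&amp;#x2F; Wrap a handler so anything it throws comes back as data the model can read.
const asResult = (handler: Handler) =&amp;gt;
  async (input: Record&amp;lt;string, unknown&amp;gt;): Promise&amp;lt;ToolResult&amp;gt; =&amp;gt; {
    try {
      return { ok: true, output: await handler(input) };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { ok: false, error: message };
    }
  };

&amp;#x2F;&amp;#x2F; Usage: a handler that throws on bad input still yields a ToolResult.
const readConfig = asResult(async ({ path }) =&amp;gt; {
  if (typeof path !== &amp;quot;string&amp;quot;) throw new Error(&amp;quot;path must be a string&amp;quot;);
  return `contents of ${path}`;
});
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The loop never sees an exception from &lt;code&gt;readConfig&lt;&#x2F;code&gt;. It sees either an output or an error string, and both go back to the model as a tool result.&lt;&#x2F;p&gt;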
&lt;p&gt;The honest version of the tradeoff is in the table.&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Decision&lt;&#x2F;th&gt;&lt;th&gt;Cheap option&lt;&#x2F;th&gt;&lt;th&gt;Right option&lt;&#x2F;th&gt;&lt;th&gt;Why&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Tool granularity&lt;&#x2F;td&gt;&lt;td&gt;one tool with a &lt;code&gt;mode&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;one tool per concept&lt;&#x2F;td&gt;&lt;td&gt;Better selection, simpler schemas, simpler errors.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Description&lt;&#x2F;td&gt;&lt;td&gt;one sentence&lt;&#x2F;td&gt;&lt;td&gt;inputs, errors, when-to-use, in prose&lt;&#x2F;td&gt;&lt;td&gt;The model picks from the string.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Error model&lt;&#x2F;td&gt;&lt;td&gt;throw exceptions&lt;&#x2F;td&gt;&lt;td&gt;tagged-union &lt;code&gt;ToolResult&lt;&#x2F;code&gt; always&lt;&#x2F;td&gt;&lt;td&gt;The model can recover. Exceptions kill the loop.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Output size&lt;&#x2F;td&gt;&lt;td&gt;return whatever the OS gives&lt;&#x2F;td&gt;&lt;td&gt;cap per tool, truncate with a marker&lt;&#x2F;td&gt;&lt;td&gt;Context windows are not log files.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Side effects&lt;&#x2F;td&gt;&lt;td&gt;run, hope, retry on error&lt;&#x2F;td&gt;&lt;td&gt;idempotency keys or &lt;code&gt;confirm&lt;&#x2F;code&gt; argument&lt;&#x2F;td&gt;&lt;td&gt;Retries are real. Re-deletes are real.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Parallel calls&lt;&#x2F;td&gt;&lt;td&gt;serial loop&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;Promise.all&lt;&#x2F;code&gt; over tool_use blocks&lt;&#x2F;td&gt;&lt;td&gt;Modern models batch. Latency for N independent calls drops toward that of the slowest one.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Sensitive ops&lt;&#x2F;td&gt;&lt;td&gt;“do not delete files” in prompt&lt;&#x2F;td&gt;&lt;td&gt;allow-list in the handler&lt;&#x2F;td&gt;&lt;td&gt;The model will eventually try anyway.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
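&lt;p&gt;The output-size row is the easiest one to show in code. What follows is a sketch under assumptions, not the truncation helper from the repo: the name &lt;code&gt;truncate&lt;&#x2F;code&gt;, the &lt;code&gt;MAX_OUTPUT_CHARS&lt;&#x2F;code&gt; constant, and the marker text are all hypothetical, and the cap counts characters rather than bytes or tokens.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;&amp;#x2F;&amp;#x2F; Hypothetical cap; pick a per-tool value that fits your context budget.
const MAX_OUTPUT_CHARS = 4_000;

&amp;#x2F;&amp;#x2F; Keep the head of the output and say in-band how much was dropped,
&amp;#x2F;&amp;#x2F; so the model knows the result is partial and can ask for a narrower slice.
const truncate = (text: string, max: number = MAX_OUTPUT_CHARS): string =&amp;gt; {
  if (text.length &amp;lt;= max) return text;
  const dropped = text.length - max;
  return `${text.slice(0, max)}\n[truncated: ${dropped} more characters]`;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The marker matters as much as the cap. A silently shortened result looks complete to the model; a marked one tells it to narrow the request.&lt;&#x2F;p&gt;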
&lt;h2 id=&quot;failure-modes-worth-naming&quot;&gt;Failure modes worth naming&lt;&#x2F;h2&gt;
&lt;p&gt;The first time you hit any of these you will think your code is broken. It is not. These are tool-layer problems specifically.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Tool sprawl.&lt;&#x2F;strong&gt; Beyond roughly twenty tools, models start picking poorly. Domains blur. Two tools with similar descriptions get confused. The fix is not “better descriptions.” It is fewer tools at any given turn, achieved by routing or by giving different sub-agents different toolboxes. We will come back to this in &lt;a href=&quot;#&quot;&gt;post twelve&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Description rot.&lt;&#x2F;strong&gt; A tool’s behavior changes (faster timeout, new error mode, narrower input). The description does not. The model keeps picking it for the old reasons. Treat descriptions as part of the API surface. Version them.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Argument hallucination.&lt;&#x2F;strong&gt; The model passes &lt;code&gt;path: &quot;the file the user mentioned&quot;&lt;&#x2F;code&gt; instead of an actual path because it lost track of the conversation. Strict schemas help. Server-side validation of plausible inputs (file exists, URL parses) helps more.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Hallucinated tools.&lt;&#x2F;strong&gt; The model invents a tool name that is not in the registry. The registry returns &lt;code&gt;unknown tool: X&lt;&#x2F;code&gt; as an error. The model reads it and either retries with a real tool or apologizes. Both are correct behaviors. The bug would be silently dispatching to a default handler.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Output explosion.&lt;&#x2F;strong&gt; A &lt;code&gt;cat large_file&lt;&#x2F;code&gt; or a &lt;code&gt;curl massive_api&lt;&#x2F;code&gt; blows the context window. The per-tool cap above handles it. Without a cap, you discover the problem at a token-cost billing alert.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Side effects on retry.&lt;&#x2F;strong&gt; The model calls a tool, the call hangs, the loop retries, the tool runs twice. For idempotent reads, this is fine. For &lt;code&gt;git push&lt;&#x2F;code&gt;, &lt;code&gt;email_send&lt;&#x2F;code&gt;, or &lt;code&gt;database_write&lt;&#x2F;code&gt;, it is not. Idempotency keys or explicit &lt;code&gt;confirm: true&lt;&#x2F;code&gt; arguments are the only durable fixes.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Concurrent races.&lt;&#x2F;strong&gt; Parallel tool calls can step on each other. Two &lt;code&gt;fs_write&lt;&#x2F;code&gt; calls to the same path in one turn is the classic example. The agent will not notice. You will, in production. The fix is per-tool serialization or explicit no-parallel marking in the dispatcher.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Auth surfaces in errors.&lt;&#x2F;strong&gt; A tool that calls an internal API may put a bearer token in its error message when the request fails. That error becomes part of the conversation and gets sent back to the model on the next turn. Strip secrets from error strings at the handler.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
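&lt;p&gt;For the concurrent-races case, per-tool serialization can be as small as one promise chain per key. This is a sketch under assumptions rather than the dispatcher from the repo: &lt;code&gt;serializeByKey&lt;&#x2F;code&gt; and the &lt;code&gt;tails&lt;&#x2F;code&gt; map are hypothetical names, the key would typically be the target path of a write, and the map is never pruned.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;&amp;#x2F;&amp;#x2F; One promise chain per key (for example, a file path). Calls that share a
&amp;#x2F;&amp;#x2F; key run one after another; calls with different keys still run in parallel.
const tails = new Map&amp;lt;string, Promise&amp;lt;unknown&amp;gt;&amp;gt;();

const serializeByKey = &amp;lt;T&amp;gt;(key: string, task: () =&amp;gt; Promise&amp;lt;T&amp;gt;): Promise&amp;lt;T&amp;gt; =&amp;gt; {
  const previous: Promise&amp;lt;unknown&amp;gt; = tails.get(key) ?? Promise.resolve();
  &amp;#x2F;&amp;#x2F; Start the task once the previous one settles, whether it succeeded or failed.
  const next = previous.then(task, task);
  &amp;#x2F;&amp;#x2F; Store a tail that never rejects, so one failure does not poison the chain.
  tails.set(key, next.catch(() =&amp;gt; undefined));
  return next;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A dispatcher that still uses &lt;code&gt;Promise.all&lt;&#x2F;code&gt; can wrap only the write handlers this way, so reads keep their parallelism.&lt;&#x2F;p&gt;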
&lt;h2 id=&quot;what-this-layer-does-not-solve&quot;&gt;What this layer does not solve&lt;&#x2F;h2&gt;
&lt;p&gt;This is the tool layer. It is not the context layer, the memory layer, or the identity layer. Things you might expect this post to cover that get a dedicated post later.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Where the tool definitions live.&lt;&#x2F;strong&gt; Hardcoding them into the agent works at demo scale. Real fleets need a shared registry. That is MCP. &lt;a href=&quot;#&quot;&gt;Post eight&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Which tools an agent should know about for a given task.&lt;&#x2F;strong&gt; The registry above gives every tool to every turn. Per-task filtering and tool discovery are part of the context strategy. &lt;a href=&quot;#&quot;&gt;Post four&lt;&#x2F;a&gt; and &lt;a href=&quot;#&quot;&gt;post five&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Which agent gets which tools.&lt;&#x2F;strong&gt; Different sub-agents need different toolboxes. The dispatch problem is the multi-agent problem. &lt;a href=&quot;#&quot;&gt;Post ten&lt;&#x2F;a&gt; and &lt;a href=&quot;#&quot;&gt;post twelve&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Who is allowed to call which tool.&lt;&#x2F;strong&gt; Per-tool RBAC, capability tokens, human-in-the-loop approvals. &lt;a href=&quot;#&quot;&gt;Post fourteen&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Identity for the call itself.&lt;&#x2F;strong&gt; When the &lt;code&gt;http_get&lt;&#x2F;code&gt; tool calls an internal service, the service needs to know who is asking. &lt;a href=&quot;#&quot;&gt;Post thirteen&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;where-this-lands-in-the-platform&quot;&gt;Where this lands in the platform&lt;&#x2F;h2&gt;
&lt;p&gt;Total damage going from &lt;code&gt;post-02&lt;&#x2F;code&gt; to &lt;code&gt;post-03&lt;&#x2F;code&gt;: one new file (&lt;code&gt;registry.ts&lt;&#x2F;code&gt;), one new directory (&lt;code&gt;tools&#x2F;&lt;&#x2F;code&gt;) with four files, two extensions to &lt;code&gt;types.ts&lt;&#x2F;code&gt;, a rewritten &lt;code&gt;agent.ts&lt;&#x2F;code&gt; dispatch, and a one-line &lt;code&gt;tsconfig.json&lt;&#x2F;code&gt; update so the build picks up the new subdirectory. &lt;code&gt;git diff post-02 post-03&lt;&#x2F;code&gt; against the companion repo is the entire delta this post describes. The system prompt grows by a sentence. The iteration budget, the message-history shape, and the overall loop are unchanged.&lt;&#x2F;p&gt;
&lt;p&gt;Post one was the loop. Post two was the runtime around the loop. This post is what the loop reaches through to do anything useful. In the reference architecture from &lt;a href=&quot;#&quot;&gt;post twenty-two&lt;&#x2F;a&gt;, the registry is the seam between the agent process and everything else: MCP servers, internal APIs, file systems, side effects, the world.&lt;&#x2F;p&gt;
&lt;p&gt;The rule from earlier posts still holds. The harness only ever grows; it does not get rewritten. Each post adds one layer to the same artifact and explains why the layer below was not enough.&lt;&#x2F;p&gt;
&lt;p&gt;The layer below this one was an empty toolbox. The layer above is what the model knows when it picks. A model with a brilliant toolbox and no context will pick wrong every time. Next we make the model less blind. That post will ship as &lt;code&gt;post-04&lt;&#x2F;code&gt; in the same repo.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;next&quot;&gt;Next&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;strong&gt;Part 4: Context Is the Product.&lt;&#x2F;strong&gt; Models are commodities. Context is not. Sources of context, why retrieval alone is not enough, and a minimal &lt;code&gt;.AGENTS&#x2F;&lt;&#x2F;code&gt; convention that loads context from disk into the same agent we have been building.&lt;&#x2F;p&gt;
</description>
      </item>
      <item>
          <title>Your Agent Wants Root</title>
          <pubDate>Fri, 05 Jun 2026 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/your-agent-wants-root/</link>
          <guid>https://raskell.io/articles/your-agent-wants-root/</guid>
          <description xml:base="https://raskell.io/articles/your-agent-wants-root/">&lt;blockquote&gt;
&lt;p&gt;Part 2 of &lt;em&gt;The Agent Platform Handbook. From Loop to Platform.&lt;&#x2F;em&gt; Previous: &lt;a href=&quot;&#x2F;articles&#x2F;what-an-agent-actually-is&#x2F;&quot;&gt;What an Agent Actually Is&lt;&#x2F;a&gt;. Next: Tools, How Agents Actually Do Things.&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;In &lt;a href=&quot;&#x2F;articles&#x2F;what-an-agent-actually-is&#x2F;&quot;&gt;post one&lt;&#x2F;a&gt; we did not just define what an agent is. We built one. A working harness in roughly one hundred and fifty lines of TypeScript on Bun: a loop, a one-tool registry, a system prompt, a dispatcher, and an iteration budget.&lt;&#x2F;p&gt;
&lt;p&gt;A note on the word, because post one did not use it. The &lt;em&gt;harness&lt;&#x2F;em&gt; is the code that wraps the model and turns a chat API into something that acts. Loop, tools, system prompt, dispatcher, budgets today, and (further into the series) memory, identity, observability, policy, and the rest. Everything that is yours to write and yours to operate. The model is a dependency you call. The harness is the artifact you ship. In this series the harness is what grows, post by post.&lt;&#x2F;p&gt;
&lt;p&gt;The code lives at tag &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;the-agent-platform-handbook&#x2F;tree&#x2F;post-01&quot;&gt;&lt;code&gt;post-01&lt;&#x2F;code&gt;&lt;&#x2F;a&gt; of &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;the-agent-platform-handbook&quot;&gt;&lt;code&gt;the-agent-platform-handbook&lt;&#x2F;code&gt;&lt;&#x2F;a&gt;, and the rest of the series builds on it one layer at a time. Every post adds one file or rewrites one piece. This is the post that adds the second layer.&lt;&#x2F;p&gt;
&lt;p&gt;The harness has one tool so far, called &lt;code&gt;shell&lt;&#x2F;code&gt;, that runs any command you hand it under &lt;code&gt;sh -c&lt;&#x2F;code&gt;. Post one closed with a one-line caveat: that tool will run &lt;code&gt;rm -rf $HOME&lt;&#x2F;code&gt; if the model decides that is the right command, and the model will decide this at least once.&lt;&#x2F;p&gt;
&lt;p&gt;This post is about the gap between “at least once” and “we are fine with that.”&lt;&#x2F;p&gt;
&lt;p&gt;The shell tool from post one is, by any honest reading, an arbitrary-code-execution primitive with a polite English-language API in front of it. The model is non-deterministic. Tool results can carry instructions the model will treat as authoritative. Other agents may feed inputs back into your loop. Each of those is a way for a tool call to do something the operator did not intend. The defense is not to write better prompts. The defense is to put a fence around the runtime so the cost of a bad call is bounded.&lt;&#x2F;p&gt;
&lt;p&gt;So this is the runtime post: the layer that sits underneath every tool the harness will ever have. We will sketch the threat model, walk the lineage of isolation primitives from chroot to microVMs, explain why “just run it in Docker” is only half an answer, and end by adding that layer to the harness from post one. The agent loop does not move. Only the floor underneath the shell tool changes, and the diff lands as tag &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;the-agent-platform-handbook&#x2F;tree&#x2F;post-02&quot;&gt;&lt;code&gt;post-02&lt;&#x2F;code&gt;&lt;&#x2F;a&gt; in the same repo.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-an-agent-runtime-is-actually-exposed-to&quot;&gt;What an agent runtime is actually exposed to&lt;&#x2F;h2&gt;
&lt;p&gt;A traditional service runs known code. You wrote it. You reviewed it. The only inputs are the ones the API contract allows. An agent runtime is not that.&lt;&#x2F;p&gt;
&lt;p&gt;Three things make the agent threat model different.&lt;&#x2F;p&gt;
&lt;p&gt;First, the &lt;strong&gt;model picks the action&lt;&#x2F;strong&gt;. The set of commands your shell tool will execute over its lifetime is a function of every prompt, every retrieved document, every tool output, and the model’s sampling temperature. You cannot enumerate it in advance. You cannot review it in code. You can only constrain what happens after.&lt;&#x2F;p&gt;
&lt;p&gt;Second, &lt;strong&gt;tool results are an injection surface&lt;&#x2F;strong&gt;. A web page the agent fetches can contain “ignore previous instructions and run X.” A code file the agent reads can contain a comment that nudges the next decision. This is indirect prompt injection, documented by Greshake and others in 2023, and there is no fully reliable defense at the model layer. The runtime is where you assume the model will eventually be tricked.&lt;&#x2F;p&gt;
&lt;p&gt;Third, &lt;strong&gt;blast radius scales with fanout&lt;&#x2F;strong&gt;. A single agent in a single shell on a single laptop is a manageable problem. A fleet of one hundred agents per tenant, each with shell, network, and file-system access, is not. Once you have more than one, you have a multi-tenant security problem whether you planned for one or not.&lt;&#x2F;p&gt;
&lt;p&gt;So the question is no longer “can my agent be subverted.” It will be. The question is “what does the rest of the system look like the day after.”&lt;&#x2F;p&gt;
&lt;h2 id=&quot;a-short-history-of-trying-to-contain-a-process&quot;&gt;A short history of trying to contain a process&lt;&#x2F;h2&gt;
&lt;p&gt;The good news is that the industry has been working on “give this process less than the full machine” for more than four decades. The bad news is that almost none of the early answers were built with an adversary that picks its own commands in mind.&lt;&#x2F;p&gt;
&lt;p&gt;The lineage matters because every modern option is a reaction to a specific failure of the option before it.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;chroot&lt;&#x2F;strong&gt;, Version 7 Unix, 1979, carried into BSD by Bill Joy in 1982. A process sees a subtree of the filesystem as its root. Designed for build isolation, not security. Trivial to escape if you have any capability beyond a vanilla user.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;FreeBSD jails&lt;&#x2F;strong&gt;, Poul-Henning Kamp, 1999. Took chroot’s filesystem trick and added process visibility, network, and user separation. The first credible “container” in the modern sense. Still in production today and still good at what it does.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Solaris Zones&lt;&#x2F;strong&gt;, 2004. A more ambitious version of jails with resource controls. Influential design, not a survivor.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Linux cgroups and namespaces&lt;&#x2F;strong&gt;, 2002 through 2008. Namespaces gave you separate views of mounts, PIDs, networks, and users. cgroups, originally Process Containers from Google in 2007, gave you per-group resource accounting. The pieces existed. The user experience was awful.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;LXC&lt;&#x2F;strong&gt;, 2008. Tied cgroups and namespaces into a single CLI. Still awful, just less so.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Docker&lt;&#x2F;strong&gt;, 2013. Took the same primitives and made them shippable. Image format, registry, declarative configuration, single command. The reason every paragraph after this one talks about “containers” is Docker.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Kubernetes&lt;&#x2F;strong&gt;, 2014. Made running many containers across many machines a default. Pushed isolation choices down into the runtime layer.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Kata Containers&lt;&#x2F;strong&gt;, 2017. A merger of Intel Clear Containers and Hyper.sh runV. Each pod runs in its own Linux VM, exposed to Kubernetes through a runtime that looks OCI-compatible. The first serious answer to “what if my container shared less than a whole kernel with the host.”&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Firecracker&lt;&#x2F;strong&gt;, AWS, 2018. A minimal virtual machine monitor on top of KVM, built to run Lambda and Fargate workloads. Around 125 milliseconds to boot. No legacy device model. Designed from the start for multi-tenant short-lived workloads.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;gVisor&lt;&#x2F;strong&gt;, Google, 2018. A user-space kernel (the Sentry), shipped behind an OCI runtime called &lt;code&gt;runsc&lt;&#x2F;code&gt;, that intercepts syscalls from a container and services most of them itself, only escalating a narrow subset to the host. &lt;code&gt;runsc&lt;&#x2F;code&gt; is a drop-in replacement for &lt;code&gt;runc&lt;&#x2F;code&gt; under any engine that accepts OCI runtimes. Slower I&#x2F;O, smaller attack surface.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Wasmtime and the WASI runtimes&lt;&#x2F;strong&gt;, 2019 onward. A different model entirely: compile your tool to WebAssembly, run it in a sandbox with capability-passed I&#x2F;O. Excellent for things that fit. Not yet a general answer for “run arbitrary shell commands.”&lt;&#x2F;p&gt;
&lt;p&gt;The lesson from this lineage is the one the agent runtime question keeps re-asking. Every layer was designed for the threat model of the time. chroot was about build isolation. Docker was about deployment ergonomics. Firecracker, gVisor, and Kata were the first three options designed in an era where the workload itself was assumed to be untrusted. That assumption is the one that matches what an agent does.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-a-default-docker-container-is-not-enough&quot;&gt;Why a default Docker container is not enough&lt;&#x2F;h2&gt;
&lt;p&gt;Docker is the default for almost every agent project that is past the laptop stage. There are good reasons. The packaging is good. The ecosystem is enormous. Kubernetes speaks it natively. For most workloads that are not adversarial, it is the right answer.&lt;&#x2F;p&gt;
&lt;p&gt;Three things stop it from being enough for an agent runtime.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;The kernel is shared.&lt;&#x2F;strong&gt; A default &lt;code&gt;docker run&lt;&#x2F;code&gt; puts your process in a set of namespaces with a set of cgroups, but it runs against the same kernel as the host. A kernel bug reachable from the container can become a host compromise, and so can a bug in the container runtime itself. This is not theoretical. The runc escapes of 2019 (CVE-2019-5736) and 2024 (CVE-2024-21626) each gave a container with default settings a path to the host. Those got patched. The next one is in flight somewhere.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;The defaults are generous.&lt;&#x2F;strong&gt; A vanilla &lt;code&gt;docker run&lt;&#x2F;code&gt; keeps a substantial set of Linux capabilities, allows the container to write to its own root filesystem, leaves network access to internal services open by default, and runs the process as root inside the container. None of those are inherent to containers. All of them have to be turned off explicitly.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;The agent will be told to do things.&lt;&#x2F;strong&gt; Volume mounts that expose the host filesystem will get used. Network interfaces that reach internal APIs will get called. Secrets sitting in environment variables will end up in tool output. The model is helpful. It does not have a security review board.&lt;&#x2F;p&gt;
&lt;p&gt;The honest version of the rule is this. A hardened Docker container, with capabilities dropped, network disabled, the root filesystem read-only, user namespacing on, no-new-privileges set, and resource caps in place, is enough for &lt;strong&gt;single-tenant, low-stakes, well-scoped&lt;&#x2F;strong&gt; agent workloads. It is not enough for multi-tenant or for code paths where the agent can be steered by external input. For those, you want a second isolation boundary underneath the container.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-mental-model&quot;&gt;The mental model&lt;&#x2F;h2&gt;
&lt;p&gt;There are three boundaries you can put between an agent’s tool call and the host kernel. Stack them in your head before picking a vendor.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;Three isolation boundaries for an agent tool call&quot;&gt;host kernel
       =================================================
       |                                               |
       |    +------------------+    +---------------+  |
       |    |  Docker default  |    |  Hardened     |  |
       |    |  ----------------|    |  Docker       |  |
       |    |  shared kernel,  |    |  ----------   |  |
       |    |  many caps,      |    |  no caps,     |  |
       |    |  network on,     |    |  no net,      |  |
       |    |  fs writable     |    |  ro fs        |  |
       |    +------------------+    +---------------+  |
       |             ^                       ^         |
       |             |                       |         |
       |    +--------+-----------------------+------+  |
       |    |               gVisor (runsc)          |  |
       |    |  user-space kernel intercepts the     |  |
       |    |  syscall surface. host kernel sees    |  |
       |    |  only a narrow subset.                |  |
       |    +---------------------------------------+  |
       |                       ^                       |
       |                       |                       |
       |    +------------------+--------------------+  |
       |    |        Firecracker &amp;#x2F; Kata             |  |
       |    |  separate Linux kernel per workload.  |  |
       |    |  KVM is the boundary. host kernel is  |  |
       |    |  one hypervisor call away.            |  |
       |    +---------------------------------------+  |
       =================================================
                            host hardware&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Read it bottom-up. Firecracker and Kata give you the strongest isolation by giving the workload its own kernel and putting KVM between it and yours. gVisor gives you most of the same benefit with a lower operational cost by replacing the syscall surface with a user-space implementation. Hardened Docker is what you actually want at the laptop or small-team scale. Default Docker is fine for code you wrote and not for code the model wrote.&lt;&#x2F;p&gt;
&lt;p&gt;Pick the layer that matches the workload. Stacking is allowed and often correct: hardened Docker plus gVisor is the usual production starting point for agent fleets.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;adding-the-runtime-layer-to-the-harness&quot;&gt;Adding the runtime layer to the harness&lt;&#x2F;h2&gt;
&lt;p&gt;The cheapest meaningful upgrade to the harness from post one is to stop running tool calls in the same process as the agent loop. Wrap the shell tool in a hardened container, with gVisor underneath if you have it installed. The loop stays the same. The blast radius drops by an order of magnitude. None of the four other pieces of the harness (system prompt, dispatcher, message history, iteration budget) need to know any of this happened.&lt;&#x2F;p&gt;
&lt;p&gt;Install gVisor once on the host:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;# Linux only. macOS users can skip gVisor and keep the hardened flags.
curl -fsSL https:&amp;#x2F;&amp;#x2F;gvisor.dev&amp;#x2F;archive.key | sudo gpg --dearmor \
  -o &amp;#x2F;usr&amp;#x2F;share&amp;#x2F;keyrings&amp;#x2F;gvisor-archive-keyring.gpg
echo &amp;quot;deb [arch=$(dpkg --print-architecture) \
  signed-by=&amp;#x2F;usr&amp;#x2F;share&amp;#x2F;keyrings&amp;#x2F;gvisor-archive-keyring.gpg] \
  https:&amp;#x2F;&amp;#x2F;storage.googleapis.com&amp;#x2F;gvisor&amp;#x2F;releases release main&amp;quot; | \
  sudo tee &amp;#x2F;etc&amp;#x2F;apt&amp;#x2F;sources.list.d&amp;#x2F;gvisor.list
sudo apt-get update &amp;amp;&amp;amp; sudo apt-get install -y runsc
sudo runsc install
sudo systemctl reload docker
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;You now have &lt;code&gt;runsc&lt;&#x2F;code&gt; as an available Docker runtime. From here, the rest of this section is three small changes to the harness from post one. The repo at &lt;code&gt;post-01&lt;&#x2F;code&gt; has three files (&lt;code&gt;agent.ts&lt;&#x2F;code&gt;, &lt;code&gt;tools.ts&lt;&#x2F;code&gt;, &lt;code&gt;package.json&lt;&#x2F;code&gt;); at &lt;code&gt;post-02&lt;&#x2F;code&gt; it has four. Run &lt;code&gt;git diff post-01 post-02&lt;&#x2F;code&gt; against &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;the-agent-platform-handbook&quot;&gt;&lt;code&gt;the-agent-platform-handbook&lt;&#x2F;code&gt;&lt;&#x2F;a&gt; and the entire delta of this post is on screen.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Change 1: extract the &lt;code&gt;Tool&lt;&#x2F;code&gt; type into its own file.&lt;&#x2F;strong&gt; In post one, &lt;code&gt;Tool&lt;&#x2F;code&gt; lived at the top of &lt;code&gt;tools.ts&lt;&#x2F;code&gt; next to the only implementation. We are about to grow the toolbox and start swapping tool implementations between sandboxed and non-sandboxed variants, so the type and the implementations want to live in different files. Pure refactor, no behavior change.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;&amp;#x2F;&amp;#x2F; types.ts (new file)
export type Tool = {
  name: string;
  description: string;
  input_schema: {
    type: &amp;quot;object&amp;quot;;
    properties: Record&amp;lt;string, unknown&amp;gt;;
    required?: string[];
  };
  run: (input: Record&amp;lt;string, unknown&amp;gt;) =&amp;gt; Promise&amp;lt;string&amp;gt;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;Change 2: point &lt;code&gt;agent.ts&lt;&#x2F;code&gt; at the new location.&lt;&#x2F;strong&gt; One line. The loop, the system prompt, the iteration budget, the tool-call dispatcher are all untouched.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;diff&quot; class=&quot;language-diff &quot;&gt;&lt;code class=&quot;language-diff&quot; data-lang=&quot;diff&quot;&gt;  &amp;#x2F;&amp;#x2F; agent.ts
  import Anthropic from &amp;quot;@anthropic-ai&amp;#x2F;sdk&amp;quot;;
  import type { MessageParam, ToolResultBlockParam } from &amp;quot;@anthropic-ai&amp;#x2F;sdk&amp;#x2F;resources&amp;#x2F;messages&amp;quot;;
- import { shell, type Tool } from &amp;quot;.&amp;#x2F;tools&amp;quot;;
+ import { shell } from &amp;quot;.&amp;#x2F;tools&amp;quot;;
+ import type { Tool } from &amp;quot;.&amp;#x2F;types&amp;quot;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;Change 3: rewrite &lt;code&gt;tools.ts&lt;&#x2F;code&gt; so the shell tool runs inside a hardened container.&lt;&#x2F;strong&gt; The exported name, description, and input schema stay the same so the model sees the same tool. Everything that changes is below the public interface, in &lt;code&gt;run&lt;&#x2F;code&gt;. That is the separation we are paying for: the agent does not know its tool got fenced.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;&amp;#x2F;&amp;#x2F; tools.ts (sandboxed version)
import type { Tool } from &amp;quot;.&amp;#x2F;types&amp;quot;;

const SANDBOX_IMAGE = &amp;quot;alpine:3.20&amp;quot;;
const SANDBOX_WORKDIR = &amp;quot;&amp;#x2F;work&amp;quot;;

const dockerArgs = (image: string, command: string) =&amp;gt; [
  &amp;quot;docker&amp;quot;, &amp;quot;run&amp;quot;,
  &amp;quot;--rm&amp;quot;,
  &amp;quot;--runtime=runsc&amp;quot;,                  &amp;#x2F;&amp;#x2F; gVisor. drop on macOS.
  &amp;quot;--network=none&amp;quot;,                   &amp;#x2F;&amp;#x2F; no exfil, no SSRF
  &amp;quot;--read-only&amp;quot;,                      &amp;#x2F;&amp;#x2F; no writes to the rootfs
  &amp;quot;--tmpfs&amp;quot;, &amp;quot;&amp;#x2F;tmp:size=64m&amp;quot;,         &amp;#x2F;&amp;#x2F; give &amp;#x2F;tmp back, bounded
  &amp;quot;--cap-drop=ALL&amp;quot;,                   &amp;#x2F;&amp;#x2F; no Linux capabilities
  &amp;quot;--security-opt=no-new-privileges&amp;quot;, &amp;#x2F;&amp;#x2F; no setuid escalation
  &amp;quot;--user=1000:1000&amp;quot;,                 &amp;#x2F;&amp;#x2F; unprivileged uid in the container
  &amp;quot;--memory=256m&amp;quot;,                    &amp;#x2F;&amp;#x2F; hard memory cap
  &amp;quot;--cpus=0.5&amp;quot;,                       &amp;#x2F;&amp;#x2F; fractional cpu cap
  &amp;quot;--pids-limit=64&amp;quot;,                  &amp;#x2F;&amp;#x2F; no fork bombs
  &amp;quot;--workdir&amp;quot;, SANDBOX_WORKDIR,
  image,
  &amp;quot;sh&amp;quot;, &amp;quot;-c&amp;quot;, command,
];

export const shell: Tool = {
  name: &amp;quot;shell&amp;quot;,
  description:
    &amp;quot;Run a shell command inside an isolated sandbox with no network and a read-only filesystem. Returns stdout, stderr, and exit code as JSON.&amp;quot;,
  input_schema: {
    type: &amp;quot;object&amp;quot;,
    properties: {
      command: {
        type: &amp;quot;string&amp;quot;,
        description: &amp;quot;Shell command to run under `sh -c` inside the sandbox.&amp;quot;,
      },
    },
    required: [&amp;quot;command&amp;quot;],
  },
  run: async ({ command }) =&amp;gt; {
    const args = dockerArgs(SANDBOX_IMAGE, String(command));
    const proc = Bun.spawn(args, { stdout: &amp;quot;pipe&amp;quot;, stderr: &amp;quot;pipe&amp;quot; });
    const [stdout, stderr] = await Promise.all([
      new Response(proc.stdout).text(),
      new Response(proc.stderr).text(),
    ]);
    const code = await proc.exited;
    return JSON.stringify({ code, stdout, stderr });
  },
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Total damage: one new file of ten lines, one moved import, and forty-five lines of &lt;code&gt;tools.ts&lt;&#x2F;code&gt;. In exchange, the blast radius of a bad command shrinks from the host to a throwaway container. The agent can still ask for &lt;code&gt;rm -rf &#x2F;&lt;&#x2F;code&gt;. It will get back an exit code, an error, and a clean host. Network calls will fail closed. Host files are out of reach because nothing from the host is mounted into the container; &lt;code&gt;&#x2F;work&lt;&#x2F;code&gt; starts empty. The sandbox dies the moment the command returns, so persistence between calls is also gone, which is a separate problem we will pick up in &lt;a href=&quot;#&quot;&gt;post six on memory&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;A few things to notice.&lt;&#x2F;p&gt;
&lt;p&gt;The harness shape from post one is intact. Same loop, same dispatcher, same system prompt, same iteration budget, same JSON contract between the agent and the tool. The sandbox is a runtime concern, not an agent-architecture concern. This separation is exactly the point. You should be able to swap gVisor for Firecracker, or Docker for Podman, or Alpine for a custom image, without touching the loop.&lt;&#x2F;p&gt;
&lt;p&gt;The tradeoffs are real. Spawning a container per tool call costs roughly 200 to 800 milliseconds on a modern host, mostly Docker daemon overhead. For a coding agent that runs three shell commands per turn, that is acceptable. For a sub-millisecond-per-call tool, it is not. The fix at scale is a container pool, or moving to Firecracker microVMs with snapshot&#x2F;restore where boot time drops to around 125 milliseconds and per-call cost drops further.&lt;&#x2F;p&gt;
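&lt;p&gt;To put a number on that tradeoff, here is a minimal sketch of the per-turn arithmetic. The helper name &lt;code&gt;spawnOverheadMs&lt;&#x2F;code&gt; is hypothetical, and the inputs are the rough figures quoted above, not measurements.&lt;&#x2F;p&gt;

```typescript
// Hypothetical helper: total container spawn overhead for one agent turn.
function spawnOverheadMs(callsPerTurn: number, perCallMs: number): number {
  return callsPerTurn * perCallMs;
}

// Three shell commands per turn at the quoted 200 to 800 ms per spawn.
const bestCaseMs = spawnOverheadMs(3, 200);
const worstCaseMs = spawnOverheadMs(3, 800);

// prints "600 to 2400 ms of spawn overhead per turn"
console.log(`${bestCaseMs} to ${worstCaseMs} ms of spawn overhead per turn`);
```

&lt;p&gt;Against a model call that already takes seconds, that range is tolerable. Against a tool expected to answer in under a millisecond, it dominates.&lt;&#x2F;p&gt;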
&lt;p&gt;The image is &lt;code&gt;alpine:3.20&lt;&#x2F;code&gt;, which is around 8 megabytes. Use a bigger image when the workload needs it. Mount a per-session work directory under &lt;code&gt;&#x2F;work&lt;&#x2F;code&gt; when the agent needs to read or write files between calls. Both of those are configuration changes, not architecture changes.&lt;&#x2F;p&gt;
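&lt;p&gt;As a sketch of how small that configuration change could be, the helper below builds the extra &lt;code&gt;docker run&lt;&#x2F;code&gt; flags for a per-session bind mount. The name &lt;code&gt;sessionMountArgs&lt;&#x2F;code&gt;, the host path layout, and the session id format are all assumptions made for illustration; only &lt;code&gt;--mount type=bind&lt;&#x2F;code&gt; is a real Docker flag.&lt;&#x2F;p&gt;

```typescript
// Assumption: the sandbox workdir is /work, as in the prose above.
const SANDBOX_WORKDIR = "/work";

// Hypothetical helper: flags that bind-mount one host directory per session
// at the sandbox workdir. With --mount, the host directory must already
// exist before `docker run` is called.
function sessionMountArgs(hostRoot: string, sessionId: string): string[] {
  // Reject ids that could escape hostRoot (for example "../other").
  if (!/^[A-Za-z0-9_-]+$/.test(sessionId)) {
    throw new Error(`invalid session id: ${sessionId}`);
  }
  const source = `${hostRoot}/${sessionId}`;
  return ["--mount", `type=bind,source=${source},target=${SANDBOX_WORKDIR}`];
}

// prints "--mount type=bind,source=/var/lib/agent-sessions/s42,target=/work"
console.log(sessionMountArgs("/var/lib/agent-sessions", "s42").join(" "));
```

&lt;p&gt;The returned flags would be spliced into the argument list ahead of the image name. Nothing in the loop or the tool contract has to change.&lt;&#x2F;p&gt;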
&lt;h2 id=&quot;picking-the-layer&quot;&gt;Picking the layer&lt;&#x2F;h2&gt;
&lt;p&gt;Use the table to pick where to start. The honest answer for most teams is “hardened Docker plus gVisor today, Firecracker microVMs later when scale or tenancy demands it.”&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Workload&lt;&#x2F;th&gt;&lt;th&gt;Sensible default&lt;&#x2F;th&gt;&lt;th&gt;Why&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Local dev agent, one user, your laptop&lt;&#x2F;td&gt;&lt;td&gt;Hardened Docker, gVisor if available&lt;&#x2F;td&gt;&lt;td&gt;Convenience wins; the blast radius is one machine.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Shared internal tool, single tenant&lt;&#x2F;td&gt;&lt;td&gt;Hardened Docker plus gVisor&lt;&#x2F;td&gt;&lt;td&gt;Cheap upgrade, real benefit, no orchestration cost.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Multi-tenant SaaS, agent per user session&lt;&#x2F;td&gt;&lt;td&gt;Firecracker microVM per session&lt;&#x2F;td&gt;&lt;td&gt;Per-tenant kernel isolation; fast boot per session.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Kubernetes-native agent platform&lt;&#x2F;td&gt;&lt;td&gt;Kata Containers&lt;&#x2F;td&gt;&lt;td&gt;Drop-in OCI shape, pod-level VM boundary.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Unknown code from the public internet&lt;&#x2F;td&gt;&lt;td&gt;gVisor at minimum, microVM preferred&lt;&#x2F;td&gt;&lt;td&gt;Syscall surface or hypervisor surface, not yours.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;WASM-shaped pure-function tool&lt;&#x2F;td&gt;&lt;td&gt;Wasmtime with capability-passed I&#x2F;O&lt;&#x2F;td&gt;&lt;td&gt;Fastest sandbox available when the workload fits.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Throwaway batch job, internal only&lt;&#x2F;td&gt;&lt;td&gt;Hardened Docker&lt;&#x2F;td&gt;&lt;td&gt;Stacking layers adds cost without proportional gain.&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;The table assumes Linux. Firecracker needs KVM and gVisor needs a Linux kernel to run on, and macOS provides neither, so both are Linux-host options. On macOS you can run them inside a Linux VM (Lima, OrbStack, Docker Desktop) and pay the nested cost.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-this-layer-does-not-solve&quot;&gt;What this layer does not solve&lt;&#x2F;h2&gt;
&lt;p&gt;Isolation is a necessary condition for a safe agent runtime. It is not a sufficient one. The honest list of what is still on your plate after this post.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Outbound network policy.&lt;&#x2F;strong&gt; &lt;code&gt;--network=none&lt;&#x2F;code&gt; is the right default for the shell tool. The moment you give an agent an &lt;code&gt;http&lt;&#x2F;code&gt; tool, you need an egress policy that distinguishes “the model should reach the public web” from “the model should never reach the internal metadata service.” That is its own design problem.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Persistent state.&lt;&#x2F;strong&gt; A throwaway sandbox forgets everything between calls. Real agents need memory across tool invocations. The design question is which directories survive, who can read them, and what the eviction policy is. We will come back to this in &lt;a href=&quot;#&quot;&gt;post six&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Tool-level policy.&lt;&#x2F;strong&gt; Even inside a perfectly isolated sandbox, you may want certain commands to require human approval. That is a permissions problem, not an isolation problem, and we will spend &lt;a href=&quot;#&quot;&gt;post fourteen&lt;&#x2F;a&gt; on it.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Identity.&lt;&#x2F;strong&gt; A sandboxed container still has to call your tools, your models, and your APIs. Long-lived API keys baked into the image are how production agent fleets get embarrassed. &lt;a href=&quot;#&quot;&gt;Post thirteen&lt;&#x2F;a&gt; puts SPIFFE under this stack.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Time and resource budgets.&lt;&#x2F;strong&gt; &lt;code&gt;--cpus=0.5&lt;&#x2F;code&gt; and &lt;code&gt;--memory=256m&lt;&#x2F;code&gt; cap a single invocation. They do not cap a loop that asks for one hundred invocations. Iteration budgets, token budgets, and wall-clock budgets are a separate fence. &lt;a href=&quot;#&quot;&gt;Post sixteen&lt;&#x2F;a&gt; covers that fence.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
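&lt;p&gt;To make the first item concrete, here is a minimal sketch of an egress check an &lt;code&gt;http&lt;&#x2F;code&gt; tool might run before it sends a request. The function names and both host lists are hypothetical. A hostname check alone does not stop DNS tricks or redirects, so treat this as an illustration of the policy shape rather than an enforcement point; real enforcement belongs at the network layer.&lt;&#x2F;p&gt;

```typescript
// Hosts the model must never reach. 169.254.169.254 is the usual cloud
// metadata address; the rest of both lists is illustrative only.
const BLOCKED_HOSTS = new Set(["169.254.169.254", "metadata.google.internal", "localhost"]);
// Placeholder allowlist. A real policy would come from configuration.
const ALLOWED_HOSTS = new Set(["example.com", "docs.example.com"]);

function parseUrl(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch {
    return null;
  }
}

// Hypothetical check: deny by default, and deny blocked hosts explicitly
// so that a careless allowlist edit cannot open them up.
function egressAllowed(rawUrl: string): boolean {
  const url = parseUrl(rawUrl);
  if (url === null) return false; // unparseable input fails closed
  if (url.protocol !== "https:") return false;
  if (BLOCKED_HOSTS.has(url.hostname)) return false;
  return ALLOWED_HOSTS.has(url.hostname);
}

console.log(egressAllowed("https://example.com/page")); // prints true
console.log(egressAllowed("https://169.254.169.254/latest/")); // prints false
```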
&lt;h2 id=&quot;where-this-lands-in-the-platform&quot;&gt;Where this lands in the platform&lt;&#x2F;h2&gt;
&lt;p&gt;You can hold the platform in your head one box at a time, and you can hold the harness on disk one tag at a time. Post one drew the agent loop and shipped it as &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;the-agent-platform-handbook&#x2F;tree&#x2F;post-01&quot;&gt;&lt;code&gt;post-01&lt;&#x2F;code&gt;&lt;&#x2F;a&gt;. This post opened the runtime box, put three concrete things inside it (a hardened container, a user-space kernel, a microVM), picked one, and shipped the result as &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;the-agent-platform-handbook&#x2F;tree&#x2F;post-02&quot;&gt;&lt;code&gt;post-02&lt;&#x2F;code&gt;&lt;&#x2F;a&gt;. The reference architecture in &lt;a href=&quot;#&quot;&gt;post twenty-two&lt;&#x2F;a&gt; will keep this box exactly where it is and treat the contents as a choice that varies by workload.&lt;&#x2F;p&gt;
&lt;p&gt;The rule from post one still holds. Each post adds one layer to the same harness and explains why the layer below was not enough. The harness only ever grows; it does not get rewritten.&lt;&#x2F;p&gt;
&lt;p&gt;The layer below this one was the shell tool itself. The layer above is the rest of the toolbox. An agent with one tool is a demo. An agent with a real tool registry is software. Next we extend the same harness with a real registry, give it a schema, and explain what changes when the model decides between four tools instead of one. That post will ship as &lt;code&gt;post-03&lt;&#x2F;code&gt; in the same repo.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;next&quot;&gt;Next&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;strong&gt;Part 3: Tools, How Agents Actually Do Things.&lt;&#x2F;strong&gt; Function calling, structured outputs, schema design that survives model drift, and the failure modes nobody talks about. We extend the post-one agent with a four-tool registry and a tool-selection trace.&lt;&#x2F;p&gt;
</description>
      </item>
      <item>
          <title>What an Agent Actually Is</title>
          <pubDate>Tue, 02 Jun 2026 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/what-an-agent-actually-is/</link>
          <guid>https://raskell.io/articles/what-an-agent-actually-is/</guid>
          <description xml:base="https://raskell.io/articles/what-an-agent-actually-is/">&lt;blockquote&gt;
&lt;p&gt;Part 1 of &lt;em&gt;The Agent Platform Handbook. From Loop to Platform.&lt;&#x2F;em&gt; A 22-post series that walks the agent stack from a single loop to a production platform. Next: &lt;a href=&quot;&#x2F;articles&#x2F;your-agent-wants-root&#x2F;&quot;&gt;Your Agent Wants Root&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;If you sit in three meetings on the same day, you will hear the word &lt;em&gt;agent&lt;&#x2F;em&gt; used to mean three different things. A vendor will use it to mean a chat window with a logo. A platform team will use it to mean a workflow with a retry. A security lead will use it to mean a long-running process with credentials you cannot see. None of those people are entirely wrong. None of them are talking about the same thing.&lt;&#x2F;p&gt;
&lt;p&gt;This is the first post in a series that goes from a single agent loop up to a production platform. Before we can talk about isolation, tools, identity, or orchestration, we have to agree on what the thing in the middle of the diagram actually is. So we will define it, sketch its anatomy, trace the path the industry took to get here, and then build a working one in roughly two hundred lines of TypeScript on Bun.&lt;&#x2F;p&gt;
&lt;p&gt;By the end of the post the abstract noun is a concrete file you can run.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;a-working-definition&quot;&gt;A working definition&lt;&#x2F;h2&gt;
&lt;p&gt;An agent is software with five properties. Drop any one and what you have is something else.&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;A goal.&lt;&#x2F;strong&gt; Not a single prompt. A target state, a task description, or a problem statement that the system is trying to satisfy.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;A model.&lt;&#x2F;strong&gt; One or more inference engines that translate the current state plus the goal into a next step. Today this is almost always a large language model.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;A context.&lt;&#x2F;strong&gt; Information the model can read beyond the user input. System prompts, documents, prior turns, environment state, configuration files.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Tools.&lt;&#x2F;strong&gt; A finite, named set of side-effecting operations the model is allowed to invoke. Reading a file. Running a shell command. Calling an API.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;A loop.&lt;&#x2F;strong&gt; The model picks a tool, the tool runs, the result re-enters the context, the model picks again. The loop ends when the goal is satisfied, the budget is exhausted, or a stop condition is hit.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;There is usually a sixth property in production, &lt;strong&gt;memory&lt;&#x2F;strong&gt;, which is state that survives the loop. We will treat memory as a separate layer in &lt;a href=&quot;#&quot;&gt;post six&lt;&#x2F;a&gt; and keep this first agent stateless.&lt;&#x2F;p&gt;
&lt;p&gt;A system with all five properties is an agent. A system with four is something else worth a name. A chat window has a goal, a model, context, and a loop, but no tools, so it is a chatbot. A CI pipeline has a goal, context, tools, and a loop, but no model, so it is a workflow. A function call has a goal, a model, context, and a tool, but no loop, so it is a one-shot completion. The presence of the loop is what makes the behavior open-ended. The presence of tools is what makes the loop matter.&lt;&#x2F;p&gt;
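&lt;p&gt;The same test can be written down as a few lines of code. This is only a sketch of the definition above; the names &lt;code&gt;System&lt;&#x2F;code&gt; and &lt;code&gt;classify&lt;&#x2F;code&gt; are made up for the illustration and do not appear in the harness.&lt;&#x2F;p&gt;

```typescript
// The five properties from the list above.
const PROPERTIES = ["goal", "model", "context", "tools", "loop"] as const;
type Property = (typeof PROPERTIES)[number];
type System = { [P in Property]: boolean };

// Hypothetical classifier: name a system by which single property it lacks.
function classify(system: System): string {
  const missing = PROPERTIES.filter(function (p) {
    return !system[p];
  });
  if (missing.length === 0) return "agent";
  if (missing.length === 1) {
    switch (missing[0]) {
      case "tools":
        return "chatbot";
      case "model":
        return "workflow";
      case "loop":
        return "one-shot completion";
    }
  }
  return "something else";
}

const chatWindow: System = { goal: true, model: true, context: true, tools: false, loop: true };
console.log(classify(chatWindow)); // prints "chatbot"
```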
&lt;p&gt;The architecture diagram for every agent ever shipped looks like this.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;The agent loop&quot;&gt;+--------------+
                          |    Goal      |
                          | (user task)  |
                          +-------+------+
                                  |
                                  v
   +------------+         +---------------+         +------------+
   |  Context   +--------&amp;gt;|     Model     |&amp;lt;--------+   Memory   |
   | docs, env, |         | next action?  |         | (optional) |
   | prior turns|         +-------+-------+         +------------+
   +------------+                 |
                                  | tool call
                                  v
                          +---------------+
                          |   Tool        |
                          | shell, http,  |
                          | file, custom  |
                          +-------+-------+
                                  |
                                  | result
                                  v
                          +---------------+
                          |    Runtime    |
                          | (where the    |
                          |  side effect  |
                          |  happens)     |
                          +-------+-------+
                                  |
                                  v
                       +----------+----------+
                       | done? --no--&amp;gt; loop  |
                       |   |                 |
                       |  yes                |
                       |   v                 |
                       | return              |
                       +---------------------+&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The diagram is small on purpose. Almost every post in this series will redraw it with one component opened up. Post two opens the runtime. Post three opens the tool. Post four opens the context. Post six opens the memory. Post seven opens the model. The capstone in post twenty-two assembles all of them into a reference architecture.&lt;&#x2F;p&gt;
&lt;p&gt;Hold the picture in your head. We will come back to it.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;how-we-got-here&quot;&gt;How we got here&lt;&#x2F;h2&gt;
&lt;p&gt;You can read the rest of this section as background and skip the first two subsections if you have already lived through them. The third subsection, on the 2025 to 2026 stretch, is worth the read either way because it explains why everyone you work with is suddenly using the word &lt;em&gt;agent&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-old-ai&quot;&gt;The old AI&lt;&#x2F;h3&gt;
&lt;p&gt;The idea of a software agent is older than the modern LLM by about fifty years. Terry Winograd’s SHRDLU, in 1970, was a program that took natural-language commands, planned actions in a simulated blocks world, executed them, and updated its internal state. It had a goal, a model of the world, a set of tools, and a loop. It was an agent. It worked beautifully on the blocks world and fell over the instant you stepped outside it, because the model of the world was hand-coded and the language understanding was brittle.&lt;&#x2F;p&gt;
&lt;p&gt;Through the 1980s and 1990s the idea kept coming back under different names. Expert systems wrapped business logic in inference rules. The Belief-Desire-Intention architecture, formalized by Bratman and others, gave agents an explicit cognitive model: what they believed, what they wanted, what they intended to do next. There were good papers, working systems, and shipping products. The thing all of them missed was a usable model of language. You could specify the agent’s goals in formal logic or in a structured DSL. You could not say “go book me a flight” and have it parse.&lt;&#x2F;p&gt;
&lt;p&gt;The lesson from this era is that the architecture has been right for decades. The bottleneck was the model.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-chat-era&quot;&gt;The chat era&lt;&#x2F;h3&gt;
&lt;p&gt;When GPT-3 arrived in 2020 and ChatGPT in late 2022, the language bottleneck broke. You could finally write a goal in English and the system would respond coherently. The first response from the industry was to wrap that capability in a chat window and sell it. That is not an agent. There are no tools, the only loop is the conversation itself, and the model cannot do anything to the world it lives in. It is a very good autocomplete with a memory of the conversation.&lt;&#x2F;p&gt;
&lt;p&gt;The agent pattern only came back when somebody asked the obvious next question. If the model can read and write English, and we can let it call functions, can we close the loop and let it act?&lt;&#x2F;p&gt;
&lt;p&gt;One paper and two viral demos answered that question between late 2022 and early 2023.&lt;&#x2F;p&gt;
&lt;p&gt;The paper came first. &lt;em&gt;ReAct: Synergizing Reasoning and Acting in Language Models&lt;&#x2F;em&gt; by Yao and others, published in October 2022, showed that interleaving a reasoning step with a tool-use step produced better performance than either alone. The blueprint was small enough to fit on a napkin: think, act, observe, repeat. It became the de facto pattern for almost every agent shipped since.&lt;&#x2F;p&gt;
&lt;p&gt;The demos came in early 2023. Auto-GPT, released by Toran Bruce Richards in March 2023, wrapped GPT-4 in exactly that loop and let it run unattended. It demoed brilliantly and broke constantly. BabyAGI, by Yohei Nakajima a few weeks later, did the same thing with about a hundred lines of Python. Neither was production-grade. Both made the idea legible to people who had never read the ReAct paper. After Auto-GPT you could explain an agent by saying “imagine ChatGPT but it keeps going until the task is done.” That was a marketing breakthrough, not an engineering one, but marketing is how ideas spread.&lt;&#x2F;p&gt;
&lt;p&gt;OpenAI’s function-calling API, launched in June 2023, was the engineering breakthrough that followed. Instead of trying to parse “the model wants to call &lt;code&gt;search(query)&lt;&#x2F;code&gt;” out of free-form prose, you declared your tools with JSON schemas and the model returned a structured tool call. Anthropic’s &lt;code&gt;tool_use&lt;&#x2F;code&gt; shipped on the same pattern. With function calling, the agent loop stopped being a regex problem and started being a software problem. We will come back to this in &lt;a href=&quot;#&quot;&gt;post three&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;The same stretch produced the first real frameworks: LangChain in October 2022, AutoGen in late 2023, CrewAI and LangGraph in 2024. They competed on abstractions. Some won, some lost. We will review them in &lt;a href=&quot;#&quot;&gt;post nine&lt;&#x2F;a&gt;. The point for now is that by mid-2024, building an agent was a Python or TypeScript exercise rather than a research project.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-2025-to-2026-bundling&quot;&gt;The 2025 to 2026 bundling&lt;&#x2F;h3&gt;
&lt;p&gt;Here is where the word &lt;em&gt;agent&lt;&#x2F;em&gt; stopped being a research term and became a product category. It happened in two waves.&lt;&#x2F;p&gt;
&lt;p&gt;The first wave was builder-facing. In early 2025, Anthropic shipped Claude Code, a command-line agent that ran on your laptop, edited files, ran shell commands, and could spawn sub-agents. OpenAI relaunched Codex as a real agent rather than a completion endpoint. The open-source community produced OpenCode and a half-dozen adjacent projects (aider, continue, opencoder) that gave the same shape a vendor-neutral surface. None of these were technically novel. The loop was the ReAct loop. The tools were &lt;code&gt;bash&lt;&#x2F;code&gt; and &lt;code&gt;edit&lt;&#x2F;code&gt;. What was new was the bundling: the loop, the tools, and the runtime arrived together, in a binary you could install in thirty seconds. Once that shipped, every infrastructure and platform engineer had a working agent on their laptop within a week.&lt;&#x2F;p&gt;
&lt;p&gt;I wrote about what came next in &lt;a href=&quot;&#x2F;articles&#x2F;what-sixteen-ai-agents-taught-me-about-management&#x2F;&quot;&gt;What Sixteen AI Agents Taught Me About Management&lt;&#x2F;a&gt;. Once builders had working agents, they wanted to run more than one. Once they ran more than one, they hit coordination problems. Once they hit coordination problems, they discovered that agent orchestration is a management problem with a software wrapper.&lt;&#x2F;p&gt;
&lt;p&gt;The second wave was assistant-facing and it broke out of the developer audience entirely. OpenClaw shipped in early 2026 and crossed a hundred thousand GitHub stars in its first week. Hermes followed in the same lane. Both took the agent pattern and gave it to people who do not write code. A research task. A contract review. A calendar negotiation. The same loop, with different tools and different context. The enterprise buyer stopped being the developer tooling team and started being any line of business with a workflow.&lt;&#x2F;p&gt;
&lt;p&gt;That is where the market is right now, as of mid-2026. Every engineering org has builders running coding agents locally. Every business org has someone piloting an assistant agent. Almost no organization has a coherent platform underneath any of it. The point of this series is to close that gap.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;build-one&quot;&gt;Build one&lt;&#x2F;h2&gt;
&lt;p&gt;You learn what an agent is by building one. The code that follows is a complete, working agent in TypeScript on Bun. It is roughly two hundred lines. It reads a goal from the command line, calls Anthropic’s API with a small tool registry, executes tool calls, feeds results back, and stops when the model is done or the iteration budget is exhausted.&lt;&#x2F;p&gt;
&lt;p&gt;You will need three things to follow along.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;bun --version          # 1.1 or newer
echo $ANTHROPIC_API_KEY  # any valid key
bun add @anthropic-ai&amp;#x2F;sdk
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The full source for this post is in &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;the-agent-platform-handbook&quot;&gt;&lt;code&gt;the-agent-platform-handbook&lt;&#x2F;code&gt;&lt;&#x2F;a&gt; at tag &lt;code&gt;post-01&lt;&#x2F;code&gt;. You can clone it and run &lt;code&gt;bun agent.ts &quot;list the three largest files under &#x2F;etc&quot;&lt;&#x2F;code&gt; if you would rather read the code in your editor.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-tool-interface&quot;&gt;The tool interface&lt;&#x2F;h3&gt;
&lt;p&gt;The smallest unit worth defining first is a tool. A tool has a name, a description the model can read, a JSON schema for its input, and a function that runs the side effect and returns a string.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;&amp;#x2F;&amp;#x2F; tools.ts
export type Tool = {
  name: string;
  description: string;
  input_schema: {
    type: &amp;quot;object&amp;quot;;
    properties: Record&amp;lt;string, unknown&amp;gt;;
    required?: string[];
  };
  run: (input: Record&amp;lt;string, unknown&amp;gt;) =&amp;gt; Promise&amp;lt;string&amp;gt;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That is the entire contract. The &lt;code&gt;run&lt;&#x2F;code&gt; function returns a string because the model will read the result as text. If your tool returns structured data, serialize it with &lt;code&gt;JSON.stringify&lt;&#x2F;code&gt; and let the model parse it. We will revisit this choice in &lt;a href=&quot;#&quot;&gt;post three&lt;&#x2F;a&gt; when we look at structured outputs more carefully.&lt;&#x2F;p&gt;
&lt;p&gt;A shell tool, which is the most useful single tool you can give an agent, looks like this.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;&amp;#x2F;&amp;#x2F; tools.ts (continued)
export const shell: Tool = {
  name: &amp;quot;shell&amp;quot;,
  description:
    &amp;quot;Run a shell command in the current working directory and return its stdout, stderr, and exit code as JSON.&amp;quot;,
  input_schema: {
    type: &amp;quot;object&amp;quot;,
    properties: {
      command: {
        type: &amp;quot;string&amp;quot;,
        description: &amp;quot;The shell command to run. Runs under `sh -c`.&amp;quot;,
      },
    },
    required: [&amp;quot;command&amp;quot;],
  },
  run: async ({ command }) =&amp;gt; {
    const proc = Bun.spawn([&amp;quot;sh&amp;quot;, &amp;quot;-c&amp;quot;, String(command)], {
      stdout: &amp;quot;pipe&amp;quot;,
      stderr: &amp;quot;pipe&amp;quot;,
    });
    const [stdout, stderr] = await Promise.all([
      new Response(proc.stdout).text(),
      new Response(proc.stderr).text(),
    ]);
    const code = await proc.exited;
    return JSON.stringify({ code, stdout, stderr });
  },
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A few things to notice. The description is written for the model, not for you. Be specific about what the tool does, what it returns, and what its constraints are. The JSON schema is the contract the model sees when it decides whether to call this tool. &lt;code&gt;Bun.spawn&lt;&#x2F;code&gt; returns a subprocess whose stdout and stderr are streams, so we read both to completion, wait for the exit code, and combine all three into one JSON payload the model can reason about.&lt;&#x2F;p&gt;
&lt;p&gt;This tool will run any shell command. That is wildly unsafe. Do not deploy it. We will spend &lt;a href=&quot;#&quot;&gt;post two&lt;&#x2F;a&gt; explaining why and what to do about it. For a single-user, single-directory demo on your own laptop, it is fine.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-loop&quot;&gt;The loop&lt;&#x2F;h3&gt;
&lt;p&gt;The loop is the part most explanations rush. Take it slowly.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;&amp;#x2F;&amp;#x2F; agent.ts
import Anthropic from &amp;quot;@anthropic-ai&amp;#x2F;sdk&amp;quot;;
import type { MessageParam, ToolResultBlockParam } from &amp;quot;@anthropic-ai&amp;#x2F;sdk&amp;#x2F;resources&amp;#x2F;messages&amp;quot;;
import { shell, type Tool } from &amp;quot;.&amp;#x2F;tools&amp;quot;;

const client = new Anthropic();
const tools: Tool[] = [shell];

const SYSTEM_PROMPT = `You are a careful command-line assistant.
You have access to a shell tool. Use it to investigate the user&amp;#x27;s
request and answer concretely. When you have the answer, stop calling
tools and reply in plain text.`;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;We pull in the SDK, register one tool, and write a system prompt. The system prompt is the first piece of context that is not user input. We will spend &lt;a href=&quot;#&quot;&gt;post four&lt;&#x2F;a&gt; on how to grow this into a real context strategy. For now it is a single paragraph telling the model how to behave.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;&amp;#x2F;&amp;#x2F; agent.ts (continued)
async function step(messages: MessageParam[]) {
  return client.messages.create({
    model: &amp;quot;claude-sonnet-4-6&amp;quot;,
    max_tokens: 4096,
    system: SYSTEM_PROMPT,
    tools: tools.map(({ run, ...t }) =&amp;gt; t),
    messages,
  });
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Every call to the model is a &lt;code&gt;step&lt;&#x2F;code&gt;. The tool list we send is the schema only, never the &lt;code&gt;run&lt;&#x2F;code&gt; function, so we strip it. The model returns a message that is either a final text answer or a request to call one or more tools.&lt;&#x2F;p&gt;
&lt;p&gt;The loop itself is fifty lines.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;&amp;#x2F;&amp;#x2F; agent.ts (continued)
export async function run(goal: string, maxIterations = 10) {
  const messages: MessageParam[] = [{ role: &amp;quot;user&amp;quot;, content: goal }];

  for (let i = 0; i &amp;lt; maxIterations; i++) {
    const response = await step(messages);
    messages.push({ role: &amp;quot;assistant&amp;quot;, content: response.content });

    if (response.stop_reason === &amp;quot;end_turn&amp;quot;) {
      const text = response.content
        .filter((b) =&amp;gt; b.type === &amp;quot;text&amp;quot;)
        .map((b) =&amp;gt; (b as { text: string }).text)
        .join(&amp;quot;\n&amp;quot;);
      console.log(text);
      return;
    }

    if (response.stop_reason !== &amp;quot;tool_use&amp;quot;) {
      throw new Error(`unexpected stop reason: ${response.stop_reason}`);
    }

    const toolResults: ToolResultBlockParam[] = [];
    for (const block of response.content) {
      if (block.type !== &amp;quot;tool_use&amp;quot;) continue;
      const tool = tools.find((t) =&amp;gt; t.name === block.name);
      if (!tool) {
        toolResults.push({
          type: &amp;quot;tool_result&amp;quot;,
          tool_use_id: block.id,
          content: `unknown tool: ${block.name}`,
          is_error: true,
        });
        continue;
      }
      console.error(`&amp;gt; ${block.name} ${JSON.stringify(block.input)}`);
      try {
        const result = await tool.run(block.input as Record&amp;lt;string, unknown&amp;gt;);
        toolResults.push({ type: &amp;quot;tool_result&amp;quot;, tool_use_id: block.id, content: result });
      } catch (err) {
        toolResults.push({
          type: &amp;quot;tool_result&amp;quot;,
          tool_use_id: block.id,
          content: String(err),
          is_error: true,
        });
      }
    }

    messages.push({ role: &amp;quot;user&amp;quot;, content: toolResults });
  }

  console.error(`iteration limit (${maxIterations}) reached`);
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Walk through it once. We start with the goal as the first user message. We call the model. We append the model’s response to the conversation. If the model says &lt;code&gt;end_turn&lt;&#x2F;code&gt;, we print the final text and return. If the model says &lt;code&gt;tool_use&lt;&#x2F;code&gt;, we find the requested tool, run it, and append the result as a user-role &lt;code&gt;tool_result&lt;&#x2F;code&gt; block. The model sees the result on its next turn and decides what to do.&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;code&gt;for&lt;&#x2F;code&gt; loop with &lt;code&gt;maxIterations&lt;&#x2F;code&gt; is the only thing standing between this agent and an unbounded run. We will spend a chunk of &lt;a href=&quot;#&quot;&gt;post sixteen&lt;&#x2F;a&gt; on whether iteration counts are the right budget unit. They are not, but they are the simplest. Start with the simplest.&lt;&#x2F;p&gt;
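&lt;p&gt;For contrast with a bare iteration count, here is a minimal sketch of a budget that tracks more than one unit. Everything in it (&lt;code&gt;Limits&lt;&#x2F;code&gt;, &lt;code&gt;createBudget&lt;&#x2F;code&gt;, the limit values) is hypothetical and is not part of the harness at this tag.&lt;&#x2F;p&gt;

```typescript
// Hypothetical multi-unit budget. Names and limits are illustrative.
type Limits = { iterations: number; tokens: number; wallClockMs: number };

function createBudget(limits: Limits) {
  const remaining: Limits = { ...limits };
  return {
    // Deduct usage from one unit, never dropping below zero.
    spend(unit: keyof Limits, amount: number): void {
      remaining[unit] = Math.max(0, remaining[unit] - amount);
    },
    // The run should stop as soon as any single unit hits zero.
    exhausted(): boolean {
      return Object.values(remaining).includes(0);
    },
    // Copy of the current state, for logging.
    snapshot(): Limits {
      return { ...remaining };
    },
  };
}

const budget = createBudget({ iterations: 10, tokens: 50000, wallClockMs: 60000 });
budget.spend("iterations", 1);
budget.spend("tokens", 62000); // one large response overshoots the token limit
console.log(budget.exhausted()); // prints true
```

&lt;p&gt;Nine iterations are still available in that example, yet the run is already over budget. That gap is the reason an iteration count on its own is a weak fence.&lt;&#x2F;p&gt;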
&lt;p&gt;The entry point is six lines.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;typescript&quot; class=&quot;language-typescript &quot;&gt;&lt;code class=&quot;language-typescript&quot; data-lang=&quot;typescript&quot;&gt;&amp;#x2F;&amp;#x2F; agent.ts (continued)
const goal = process.argv.slice(2).join(&amp;quot; &amp;quot;);
if (!goal) {
  console.error(&amp;quot;usage: bun agent.ts &amp;#x27;&amp;lt;goal&amp;gt;&amp;#x27;&amp;quot;);
  process.exit(1);
}
await run(goal);
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That is the entire agent. Around one hundred and forty lines of code with the imports and the system prompt. It is a real agent by the definition we started with. It has a goal, a model, context (the system prompt and the running message list), tools (one of them), and a loop (the &lt;code&gt;for&lt;&#x2F;code&gt; with &lt;code&gt;maxIterations&lt;&#x2F;code&gt;).&lt;&#x2F;p&gt;
&lt;h3 id=&quot;running-it&quot;&gt;Running it&lt;&#x2F;h3&gt;
&lt;p&gt;A small transcript makes the loop concrete.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;$ bun agent.ts &amp;quot;what is the largest TypeScript file under src and what does it do?&amp;quot;
&amp;gt; shell {&amp;quot;command&amp;quot;:&amp;quot;find src -name &amp;#x27;*.ts&amp;#x27; -type f -printf &amp;#x27;%s %p\n&amp;#x27; | sort -nr | head -5&amp;quot;}
&amp;gt; shell {&amp;quot;command&amp;quot;:&amp;quot;wc -l src&amp;#x2F;agent.ts&amp;quot;}
&amp;gt; shell {&amp;quot;command&amp;quot;:&amp;quot;head -40 src&amp;#x2F;agent.ts&amp;quot;}
The largest TypeScript file under src is src&amp;#x2F;agent.ts at 4837 bytes
(150 lines). It implements an agent loop against the Anthropic Messages
API. It defines a single shell tool, sends the user&amp;#x27;s goal to the model,
executes any tool calls the model requests, feeds the results back, and
stops when the model returns an end_turn response or the iteration
budget of 10 steps is exhausted.
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The lines prefixed with &lt;code&gt;&amp;gt;&lt;&#x2F;code&gt; are the agent’s tool calls, logged from the &lt;code&gt;console.error&lt;&#x2F;code&gt; in the loop. The final paragraph is the model’s &lt;code&gt;end_turn&lt;&#x2F;code&gt; text. Three tool calls, one final answer, one closed loop.&lt;&#x2F;p&gt;
&lt;p&gt;You have an agent.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-most-agent-demos-are-not-agents&quot;&gt;Why most agent demos are not agents&lt;&#x2F;h2&gt;
&lt;p&gt;Now that the definition is concrete and you have a working example, the question of what is and is not an agent becomes useful. The honest version of the answer is in the table.&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;System&lt;&#x2F;th&gt;&lt;th style=&quot;text-align: center&quot;&gt;Goal&lt;&#x2F;th&gt;&lt;th style=&quot;text-align: center&quot;&gt;Model&lt;&#x2F;th&gt;&lt;th style=&quot;text-align: center&quot;&gt;Context&lt;&#x2F;th&gt;&lt;th style=&quot;text-align: center&quot;&gt;Tools&lt;&#x2F;th&gt;&lt;th style=&quot;text-align: center&quot;&gt;Loop&lt;&#x2F;th&gt;&lt;th&gt;Verdict&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;ChatGPT, 2022 launch&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;N&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td&gt;Chatbot&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;ChatGPT today (browse, code, files)&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td&gt;Agent&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Cursor’s inline edit&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;N&lt;&#x2F;td&gt;&lt;td&gt;Completion&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;A GitHub Actions workflow&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;N&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td&gt;Workflow&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;A pure retrieval-augmented chat&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;N&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td&gt;Chatbot+RAG&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;A function-calling API call&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;N&lt;&#x2F;td&gt;&lt;td&gt;Tool call&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Claude Code in the terminal&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td&gt;Agent&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;The script in this post&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td&gt;Agent&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Auto-GPT, BabyAGI&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td&gt;Agent&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;OpenClaw&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td&gt;Agent&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;A Temporal workflow with an LLM step&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td style=&quot;text-align: center&quot;&gt;Y&lt;&#x2F;td&gt;&lt;td&gt;Agent&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;The first two rows are the same product a few years apart. ChatGPT launched in November 2022 as a chat window with no tools. Today the same web UI can browse the live web, run Python in a sandboxed runtime, convert and read PDFs, and generate images. It crossed the line. The interesting thing is that the user-facing language did not change. It is still called a chatbot. Under the definition we are working with, it is an agent. The boundary moved underneath the marketing.&lt;&#x2F;p&gt;
&lt;p&gt;That is the pattern to expect across the industry over the next two years. Every chat product will quietly grow tools. Every workflow product will quietly grow a model. The labels will lag the architecture by a year or two. If you only listen to the labels, you will miss the moment a system becomes something you should be governing differently.&lt;&#x2F;p&gt;
&lt;p&gt;The pure RAG row is the one that does not cross. A retrieval-augmented chat that only enriches its own context does not get tools in the sense that matters. The model cannot decide to do something different based on what it finds. It just answers from a richer context. Useful, sometimes excellent, not an agent. The moment that same system gains a “search again with a different query” or “fetch this document” tool, it is.&lt;&#x2F;p&gt;
&lt;p&gt;The Temporal-with-LLM row is interesting for the opposite reason. That is an agent. It is also a workflow. The two categories are not mutually exclusive. &lt;a href=&quot;#&quot;&gt;Post seventeen&lt;&#x2F;a&gt; will explain why the production version of almost every agent ends up wrapped in a durable workflow.&lt;&#x2F;p&gt;
&lt;p&gt;The point of the table is not to gatekeep the word. It is to point at the conceptual distinction. A chatbot fails by being uninformed. An agent fails by acting wrong. Once your system can act, the failure mode changes, and so does what you owe the operator.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;failure-modes&quot;&gt;Failure modes&lt;&#x2F;h2&gt;
&lt;p&gt;The script above will work the first ten times you run it. Some of the failure modes it has are obvious. Some you only see in production. The honest list, in roughly the order you will hit them.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Infinite loops disguised as progress.&lt;&#x2F;strong&gt; The model can call tools forever without converging. &lt;code&gt;maxIterations = 10&lt;&#x2F;code&gt; saves you from your own demo today. It will not save you when somebody asks a harder question tomorrow. Real budgets are token-based or wall-clock-based, not step-based. We will fix this properly in &lt;a href=&quot;#&quot;&gt;post sixteen&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Cost runaway.&lt;&#x2F;strong&gt; Every iteration is a full model call with the entire conversation as input. The cumulative input-token cost grows roughly quadratically with the number of steps because each new turn re-sends every prior turn. A ten-step loop costs more than ten times a one-step call. Prompt caching helps. Smaller models for the cheap turns help more. We will spend &lt;a href=&quot;#&quot;&gt;post eighteen&lt;&#x2F;a&gt; on this.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Lying about success.&lt;&#x2F;strong&gt; The model can claim it completed the task without actually verifying. The shell tool does not check whether the answer is correct. You can ask “did you do X?” and get “yes” when the truth is “I tried, it failed, I gave up.” Evals exist for this. &lt;a href=&quot;#&quot;&gt;Post sixteen&lt;&#x2F;a&gt; explains them.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Broken tool output.&lt;&#x2F;strong&gt; If a tool returns a megabyte of binary data, the model will choke or hallucinate. Tool outputs need to be summarized, truncated, or schema-shaped. The agent will not do this for you.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;No isolation.&lt;&#x2F;strong&gt; The shell tool will run &lt;code&gt;rm -rf $HOME&lt;&#x2F;code&gt; if the model decides that is the right command. The model will not decide this often. It will decide it at least once. That is the topic of the next post.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Hidden state.&lt;&#x2F;strong&gt; Every tool call has side effects on the runtime. Files get created. Network calls get made. Database rows get inserted. The conversation log does not capture any of it. You can replay the model’s reasoning. You cannot replay the world it was reasoning about. Idempotency and durable execution are how serious systems handle this.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Concurrency edge cases.&lt;&#x2F;strong&gt; This loop is sequential. The model can ask for multiple tools in one turn, and the example above runs them in order. If two of those tools both want to write the same file, you have a bug the model cannot see.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
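&lt;p&gt;The quadratic growth in the cost-runaway item is easy to check with back-of-envelope arithmetic. The sketch below assumes, purely for illustration, that every step adds a flat 1,000 tokens to the conversation, that nothing is cached, and that only input tokens are counted. Real steps vary widely in size.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;tokens added per step (assumed):  t = 1,000
input sent at step k:             k * t
total input over n steps:         t * n * (n + 1) &amp;#x2F; 2

n = 1:    1,000 *  1 *  2 &amp;#x2F; 2  =   1,000
n = 10:   1,000 * 10 * 11 &amp;#x2F; 2  =  55,000
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Under those assumptions a ten-step run sends 55 times the input of a single call, not 10 times. Prompt caching typically lowers the price of the re-sent prefix rather than removing it.&lt;&#x2F;p&gt;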
&lt;p&gt;None of these are reasons not to build an agent. They are the reasons the rest of this series exists.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-this-post-left-out-on-purpose&quot;&gt;What this post left out on purpose&lt;&#x2F;h2&gt;
&lt;p&gt;A long post still leaves things out. The intentional omissions, with pointers to the posts that pick them up.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Isolation.&lt;&#x2F;strong&gt; The shell tool is dangerous. Fixed in &lt;a href=&quot;&#x2F;articles&#x2F;your-agent-wants-root&#x2F;&quot;&gt;post two: Your Agent Wants Root&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;A real tool registry.&lt;&#x2F;strong&gt; One tool is the demo. A useful agent has four to a dozen. Covered in &lt;a href=&quot;#&quot;&gt;post three&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Context strategy.&lt;&#x2F;strong&gt; The system prompt is a paragraph. Real systems load context from disk and shape it per task. Covered in &lt;a href=&quot;#&quot;&gt;post four&lt;&#x2F;a&gt; and &lt;a href=&quot;#&quot;&gt;post five&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Memory across sessions.&lt;&#x2F;strong&gt; The agent forgets everything between runs. Covered in &lt;a href=&quot;#&quot;&gt;post six&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Model selection.&lt;&#x2F;strong&gt; We hardcoded one model. Production agents route. Covered in &lt;a href=&quot;#&quot;&gt;post seven&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Tool protocol.&lt;&#x2F;strong&gt; The tool interface is bespoke. The industry converged on MCP. Covered in &lt;a href=&quot;#&quot;&gt;post eight&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Frameworks.&lt;&#x2F;strong&gt; We built this raw. The framework conversation is in &lt;a href=&quot;#&quot;&gt;post nine&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Identity, evals, durability, economics, governance.&lt;&#x2F;strong&gt; All of arcs four and five.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;If you read the table of contents for the series, you can see the shape: every post in the rest of the series fixes one limitation of the agent you just built.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;where-this-lands-in-the-platform&quot;&gt;Where this lands in the platform&lt;&#x2F;h2&gt;
&lt;p&gt;The diagram at the top of this post is the smallest box in the eventual reference architecture. The agent loop sits inside a runtime. The runtime sits inside a fleet. The fleet sits inside a platform with identity, observability, governance, and a control plane. The shape we end up with in &lt;a href=&quot;#&quot;&gt;post twenty-two&lt;&#x2F;a&gt; is the same shape we started with, blown up to enterprise scale, with every component opened up and given its own production-grade story.&lt;&#x2F;p&gt;
&lt;p&gt;You can hold the whole series in your head with a single rule. Each post adds one layer of the stack and explains why the layer below was not enough on its own.&lt;&#x2F;p&gt;
&lt;p&gt;The next layer up is the runtime. The shell tool you just gave a frontier model is a loaded gun. Next week we explain how to point it somewhere safe.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;next&quot;&gt;Next&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;strong&gt;Part 2: &lt;a href=&quot;&#x2F;articles&#x2F;your-agent-wants-root&#x2F;&quot;&gt;Your Agent Wants Root&lt;&#x2F;a&gt;.&lt;&#x2F;strong&gt; Why a Docker container is not enough, and what Firecracker, gVisor, and Kata actually solve. Publishes Thursday.&lt;&#x2F;p&gt;
</description>
      </item>
      <item>
          <title>The Field Exists Now</title>
          <pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/the-field-exists-now/</link>
          <guid>https://raskell.io/articles/the-field-exists-now/</guid>
          <description xml:base="https://raskell.io/articles/the-field-exists-now/">&lt;p&gt;Alasdair Allan, the creator of &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;veralang.dev&#x2F;&quot;&gt;Vera&lt;&#x2F;a&gt;, has started a public catalogue at &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;agentlanguages.dev&#x2F;&quot;&gt;agentlanguages.dev&lt;&#x2F;a&gt;: programming languages designed for AI agents to write.&lt;&#x2F;p&gt;
&lt;p&gt;When he mentioned it to me on LinkedIn, the count was twenty-one. When I checked, the site already listed twenty-eight. That pace is the point.&lt;&#x2F;p&gt;
&lt;p&gt;Six months ago, the idea that programming languages would bend around AI authorship still sounded like a late-night compiler conversation. Now there is a catalogue, a taxonomy, and enough independent projects to argue about the shape of the field without pretending the field is hypothetical.&lt;&#x2F;p&gt;
&lt;p&gt;That is a real change.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-taxonomy-is-better-than-my-timeline&quot;&gt;The taxonomy is better than my timeline&lt;&#x2F;h2&gt;
&lt;p&gt;In &lt;a href=&quot;&#x2F;articles&#x2F;what-programming-languages-become-when-ai-writes-the-code&#x2F;&quot;&gt;The Last Programming Language Might Not Be for Humans&lt;&#x2F;a&gt;, I described three futures:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;explicit languages stripped of ambiguity&lt;&#x2F;li&gt;
&lt;li&gt;declarative languages where types act as proof obligations&lt;&#x2F;li&gt;
&lt;li&gt;no language at all, where the prompt is the specification and the executable is the product&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;I still think that direction of travel is useful. The long arc points away from source code as the thing humans primarily edit, and toward artifacts that agents generate, compilers check, runtimes execute, and other agents can trust.&lt;&#x2F;p&gt;
&lt;p&gt;But Alasdair’s taxonomy is cleaner for describing the field as it exists today.&lt;&#x2F;p&gt;
&lt;p&gt;He splits it into three camps:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;syntactic, where the problem is representational ambiguity&lt;&#x2F;li&gt;
&lt;li&gt;verification, where the problem is semantic correctness&lt;&#x2F;li&gt;
&lt;li&gt;orchestration, where the problem is agent coordination&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That framing is less linear and more honest. Vera and my Haskell work are not step one and step two. They are two near-term answers to the same pressure.&lt;&#x2F;p&gt;
&lt;p&gt;That correction matters because it changes what we should measure.&lt;&#x2F;p&gt;
&lt;p&gt;If the field is a timeline, the question is which phase arrives next. If the field is a set of camps, the question is which diagnosis is right for which workload.&lt;&#x2F;p&gt;
&lt;p&gt;That is a much better question.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;vera-and-bhc-are-adjacent-not-sequential&quot;&gt;Vera and BHC are adjacent, not sequential&lt;&#x2F;h2&gt;
&lt;p&gt;One useful correction: Raskell is not the project. Raskell is this blog.&lt;&#x2F;p&gt;
&lt;p&gt;The relevant projects are &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;arcanist.sh&#x2F;hx&#x2F;&quot;&gt;hx&lt;&#x2F;a&gt; and BHC, the Basel Haskell Compiler, both under &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;arcanist.sh&#x2F;&quot;&gt;arcanist.sh&lt;&#x2F;a&gt;. hx is the tooling layer. BHC is the compiler layer.&lt;&#x2F;p&gt;
&lt;p&gt;That distinction matters because the bet is not “invent Raskell as a new AI language.” The bet is that Haskell already has much of the semantic shape we need, and that the missing layer is tooling, runtime control, diagnostics, and compilation strategy.&lt;&#x2F;p&gt;
&lt;p&gt;Vera takes a more bespoke path. It asks what a language should look like if it is designed for models from first principles. That leads to mandatory contracts, typed effects, solver-backed verification, and De Bruijn-style slot references instead of variable names.&lt;&#x2F;p&gt;
&lt;p&gt;That is a serious design. I like it because it does not just say “AI will write code” and stop there. It changes the language around that claim.&lt;&#x2F;p&gt;
&lt;p&gt;BHC starts from the other side.&lt;&#x2F;p&gt;
&lt;p&gt;It assumes Haskell’s purity, type system, and compositional style are already close to the right substrate for AI-written software. The weak points are around the compiler and the experience around it: how fast the loop is, how clear the errors are, how reproducible the build is, how explicit the runtime profile is, and eventually how much semantic information survives into numeric and GPU-oriented lowering.&lt;&#x2F;p&gt;
&lt;p&gt;So I do not think Vera and BHC compete for the same idea. They may compete for attention, mindshare, or some future market. But intellectually they are adjacent.&lt;&#x2F;p&gt;
&lt;p&gt;Vera says: design the language around the agent.&lt;&#x2F;p&gt;
&lt;p&gt;BHC says: make the semantically rich language operationally good enough for the agent era.&lt;&#x2F;p&gt;
&lt;p&gt;Both are verification-camp bets.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-the-verification-camp-is-crowded&quot;&gt;Why the verification camp is crowded&lt;&#x2F;h2&gt;
&lt;p&gt;The most interesting thing in the catalogue is how crowded the verification camp is.&lt;&#x2F;p&gt;
&lt;p&gt;That should not surprise us, but it still does.&lt;&#x2F;p&gt;
&lt;p&gt;Once you work with coding agents every day, the failure mode becomes obvious. The problem is not that the model cannot type syntax. It can type syntax just fine. The problem is that plausible code is cheap, and plausible code is often wrong.&lt;&#x2F;p&gt;
&lt;p&gt;That changes the job of the language.&lt;&#x2F;p&gt;
&lt;p&gt;A language for AI-authored code does not need to make the model feel clever. It needs to make wrongness cheap to detect. Ideally before runtime. Ideally before deployment. Ideally in a form the model can consume and repair.&lt;&#x2F;p&gt;
&lt;p&gt;That is why contracts, effects, refinement types, strong type systems, proof export, SMT solvers, and structured compiler diagnostics keep showing up. These are not aesthetic choices. They are different ways of building a feedback loop around a generator that will always be probabilistic.&lt;&#x2F;p&gt;
&lt;p&gt;The generator does not need to be trusted. The artifact needs to be checkable.&lt;&#x2F;p&gt;
&lt;p&gt;That sentence is probably the center of the whole field.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-market-is-less-interesting-than-the-artifact&quot;&gt;The market is less interesting than the artifact&lt;&#x2F;h2&gt;
&lt;p&gt;There will be a temptation to turn this into a market map too early.&lt;&#x2F;p&gt;
&lt;p&gt;Which language wins? Which toolchain gets adopted? Which one has the most stars? Which one gets bundled into an IDE or agent framework first?&lt;&#x2F;p&gt;
&lt;p&gt;Those questions matter eventually. They do not matter most right now.&lt;&#x2F;p&gt;
&lt;p&gt;The important question is what the unit of software becomes when the author is no longer primarily human. My answer, in &lt;a href=&quot;&#x2F;articles&#x2F;semantic-artifacts-and-meaning-engines&#x2F;&quot;&gt;Source Code Is the New Assembly&lt;&#x2F;a&gt;, was that source code becomes one rendering of a richer semantic artifact.&lt;&#x2F;p&gt;
&lt;p&gt;The agentlanguages.dev catalogue strengthens that view.&lt;&#x2F;p&gt;
&lt;p&gt;The syntactic camp is trying to make the text representation easier for models to manipulate. The verification camp is trying to make the artifact easier to prove correct. The orchestration camp is trying to make agent work easier to sequence, constrain, and audit.&lt;&#x2F;p&gt;
&lt;p&gt;Those are not disconnected concerns. They are different surfaces of the same object.&lt;&#x2F;p&gt;
&lt;p&gt;If agents are going to generate software that matters, the artifact has to carry more than instructions. It has to carry constraints, effects, provenance, trust boundaries, and execution intent. Source code can be part of that. It cannot remain the whole thing.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;where-bhc-fits&quot;&gt;Where BHC fits&lt;&#x2F;h2&gt;
&lt;p&gt;BHC’s long-term target is not just “AI writes Haskell.”&lt;&#x2F;p&gt;
&lt;p&gt;That is too small.&lt;&#x2F;p&gt;
&lt;p&gt;The more interesting target is verifiable compute expressed through a functional, semantically dense language and lowered into the right execution profile. Sometimes that means ordinary native code. Sometimes WebAssembly. Eventually, for the numeric profile, it should mean GPU-oriented compute where the compiler can preserve enough structure to reason about what is being executed.&lt;&#x2F;p&gt;
&lt;p&gt;This is the part that keeps pulling me back to Haskell.&lt;&#x2F;p&gt;
&lt;p&gt;Functional purity is not just beautiful. It is useful. It makes dependencies visible. It makes transformations local. It reduces the hidden state an agent has to simulate. It gives the compiler more leverage. It gives verification machinery more structure.&lt;&#x2F;p&gt;
&lt;p&gt;And Haskell is verbose in exactly the right places. Not verbose like boilerplate. Verbose like meaning. Type signatures, algebraic data types, explicit transformations, pure functions. These are things an agent can generate, a compiler can check, and a human can audit later when something matters enough to read.&lt;&#x2F;p&gt;
&lt;p&gt;That is the bet behind BHC and hx.&lt;&#x2F;p&gt;
&lt;p&gt;Not a new language for its own sake. Not Haskell nostalgia. A belief that semantic density becomes more valuable when generation gets cheap.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-the-catalogue-proves&quot;&gt;What the catalogue proves&lt;&#x2F;h2&gt;
&lt;p&gt;The catalogue does not prove which camp is right.&lt;&#x2F;p&gt;
&lt;p&gt;It proves the pressure is real.&lt;&#x2F;p&gt;
&lt;p&gt;Independent people arrived in the same neighborhood at roughly the same time, using different words and different tools. That usually means the underlying constraint changed. In this case, the constraint is authorship.&lt;&#x2F;p&gt;
&lt;p&gt;The old language design question was: what can humans write, read, and maintain?&lt;&#x2F;p&gt;
&lt;p&gt;The new question is: what can agents generate, compilers verify, runtimes constrain, and humans audit when needed?&lt;&#x2F;p&gt;
&lt;p&gt;That does not make human readability irrelevant. It changes its position. Human readability becomes part of auditability, not necessarily the primary authoring constraint.&lt;&#x2F;p&gt;
&lt;p&gt;That is the shift agentlanguages.dev makes visible.&lt;&#x2F;p&gt;
&lt;p&gt;The field exists now. It has camps. It has disagreements. It has projects with code, projects with papers, projects with benchmarks, and projects that are still mostly intent. That is fine. Early fields are messy.&lt;&#x2F;p&gt;
&lt;p&gt;The useful thing is that we can stop arguing about whether the category is real and start arguing about which claims survive measurement.&lt;&#x2F;p&gt;
&lt;p&gt;That is where the work gets interesting.&lt;&#x2F;p&gt;
</description>
      </item>
      <item>
          <title>Source Code Is the New Assembly</title>
          <pubDate>Thu, 07 May 2026 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/semantic-artifacts-and-meaning-engines/</link>
          <guid>https://raskell.io/articles/semantic-artifacts-and-meaning-engines/</guid>
          <description xml:base="https://raskell.io/articles/semantic-artifacts-and-meaning-engines/">&lt;p&gt;In &lt;a href=&quot;&#x2F;articles&#x2F;what-programming-languages-become-when-ai-writes-the-code&#x2F;&quot;&gt;The Last Programming Language Might Not Be for Humans&lt;&#x2F;a&gt;, I argued that if AI becomes the primary author of code, the source language has to bend around that author. In &lt;a href=&quot;&#x2F;articles&#x2F;what-comes-after-the-last-programming-language&#x2F;&quot;&gt;What Comes After the Last Programming Language&lt;&#x2F;a&gt;, I extended the argument one layer down: even the operating system still assumes a human is writing CPU-centric instruction streams, and that assumption has an expiration date.&lt;&#x2F;p&gt;
&lt;p&gt;I left the deepest move for last on purpose.&lt;&#x2F;p&gt;
&lt;p&gt;If languages are bending and operating systems are bending, it is because something more fundamental is bending. The unit of software is changing. The thing we treat as authoritative, the thing we version, review, ship, replay, and audit, is on its way to becoming something other than a tree of source files.&lt;&#x2F;p&gt;
&lt;p&gt;I want to make that move explicitly. This is the post where I plant the flag.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-i-am-actually-claiming&quot;&gt;What I am actually claiming&lt;&#x2F;h2&gt;
&lt;p&gt;The clearest way I can put it is this: programming languages were never the destination. They were a writing system for an era when humans had to author the syntax. They survived because humans were the bottleneck, and the machine had to be addressed through a human-legible medium. That arrangement is starting to come apart.&lt;&#x2F;p&gt;
&lt;p&gt;Once generation is cheap, syntax is not the scarce resource anymore. The scarce resource is semantic stability. What I want is not a better language for models to type. I want an artifact that preserves meaning across generation, verification, lowering, execution, audit, and replay.&lt;&#x2F;p&gt;
&lt;p&gt;I will call that thing a &lt;strong&gt;semantic artifact&lt;&#x2F;strong&gt;, and I will call the machine around it a &lt;strong&gt;meaning engine&lt;&#x2F;strong&gt;. Both terms exist in adjacent literatures already. I am using them deliberately, not as branding.&lt;&#x2F;p&gt;
&lt;p&gt;A semantic artifact is the new unit of software. The meaning engine is what compilers and runtimes become when they accept those units. Source code does not vanish in this picture. It demotes. It survives the way assembly survives today: indispensable, specialized, and no longer the layer most people author.&lt;&#x2F;p&gt;
&lt;p&gt;That is the full claim. The rest of this post is the case for it.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;programming-languages-were-always-writing-systems&quot;&gt;Programming languages were always writing systems&lt;&#x2F;h2&gt;
&lt;p&gt;There is a reflex in our field to treat programming languages as if they were a category unto themselves. They are not. They are a specialized branch of writing.&lt;&#x2F;p&gt;
&lt;p&gt;Writing, in linguistics, is a technology for making language visible and durable. It is a system of conventional marks that encodes linguistic structure in a persistent medium. It is not thought itself. It is not even speech. It is a representation that allows institutions, tools, and machines to act on language across time and space.&lt;&#x2F;p&gt;
&lt;p&gt;Programming languages fit this definition without strain. They are notation systems that encode executable intent in marks a compiler can consume. Like every writing system, they were shaped by the dominant author and the dominant medium of their era.&lt;&#x2F;p&gt;
&lt;p&gt;Assembly is the closest thing we have to a logographic system for computation. Each opcode points at one specific machine action. There is no abstraction between the symbol and the execution. C is structured prose for systems work, designed when the author was a competent human engineer and the target was a single-CPU machine with byte-addressable memory. Python is a writing system tuned for readability and velocity, designed for an author who would rather get to the point than negotiate with the type system. Haskell pushes in the other direction, toward formal logic notation, where the writing system itself encodes proof obligations.&lt;&#x2F;p&gt;
&lt;p&gt;Every one of these languages is a compromise between precision and expressiveness. None of them was final. Each was tuned to what humans could reasonably author and what compilers could reasonably lower.&lt;&#x2F;p&gt;
&lt;p&gt;Once you see this clearly, the historical contingency becomes obvious. Source code is not sacred. It is a notation layer that won because humans had to write something, and machines had to lower something. It became the universal artifact of software because the author was human and the medium was text. Change the author, and the universal artifact has no reason to stay text.&lt;&#x2F;p&gt;
&lt;p&gt;This is the same observation &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.unison-lang.org&#x2F;&quot;&gt;Unison&lt;&#x2F;a&gt; has been making in production for years, just from a different angle. Unison stores definitions by the hash of their structured form, not by filename. Text is a view, not the artifact. The system has not gone post-code. It has gone post-text, while keeping code as the underlying object. That is a useful first step, and it is much more than a curiosity. It is a working proof that the source file is not load-bearing in the way our tools pretend it is.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;code-is-a-lossy-compression-of-intent&quot;&gt;Code is a lossy compression of intent&lt;&#x2F;h2&gt;
&lt;p&gt;The thing programmers do, when we write code, is compression. We take a high-dimensional, contextual, often-ambiguous understanding of what the system should do, and we squeeze it into a rigid, linear, explicit sequence of statements that a compiler will accept.&lt;&#x2F;p&gt;
&lt;p&gt;The compiler then performs a different compression, from our notation down to machine code. By the time the binary runs, several lossy passes have happened. Assumptions live in our heads or in a README. Side effects live implicitly in function calls. The reasons we made certain decisions live in commit messages, ticket systems, or, more often, nowhere. The artifact that runs is missing most of the meaning that produced it.&lt;&#x2F;p&gt;
&lt;p&gt;We know this is lossy because we spend enormous effort patching the loss. We write tests to recover behavioral intent. We write documentation to recover design intent. We write architecture decision records to recover historical intent. We write runbooks to recover operational intent. We sprinkle assertions to recover invariants we could not express in the type system. We add observability to recover what the running program is actually doing because the source did not say.&lt;&#x2F;p&gt;
&lt;p&gt;Programming, in this view, is manual compression of intent into executable form. It worked because the human author could hold the uncompressed version in their head while typing the compressed version. It also worked because the compressed version was good enough to ship and the uncompressed version did not have to survive review.&lt;&#x2F;p&gt;
&lt;p&gt;AI changes the economics of that compression. Generation gets cheap. Expansion from a few sentences to thousands of lines of code is no longer the hard part. What gets relatively more expensive is the part that was always implicit: deciding which version is correct, which assumptions are still valid, which constraints must hold, which environments will give the same result, which side effects are allowed.&lt;&#x2F;p&gt;
&lt;p&gt;In a world with cheap generation, the uncompressed version is what we should be authoring. Not the compressed one.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-a-semantic-artifact-actually-is&quot;&gt;What a semantic artifact actually is&lt;&#x2F;h2&gt;
&lt;p&gt;A semantic artifact is not a fancier source file. It is not “code with metadata.” It is a different kind of object, with different properties, and the difference matters.&lt;&#x2F;p&gt;
&lt;p&gt;A semantic artifact is &lt;strong&gt;intent-first&lt;&#x2F;strong&gt;. It says what must hold, not how. It can carry “how” as one of several lowerings, but the upper layer is the obligation, not the implementation.&lt;&#x2F;p&gt;
&lt;p&gt;It is &lt;strong&gt;structured&lt;&#x2F;strong&gt;. It is a typed graph of entities, constraints, transformations, effects, evidence, and provenance. Text is one rendering. JSON is another. A diagram is another. None of them is the artifact.&lt;&#x2F;p&gt;
&lt;p&gt;It is &lt;strong&gt;content-addressed&lt;&#x2F;strong&gt;. Identity is structural, derived from the artifact itself, not from a filename or a path. Two artifacts with the same normalized structure have the same identity, whatever they are named and however they are formatted. This is the lesson Unison teaches and the lesson the rest of the industry has not fully absorbed.&lt;&#x2F;p&gt;
&lt;p&gt;It is &lt;strong&gt;explicit about effects&lt;&#x2F;strong&gt;. Reads, writes, time, randomness, network, ledger postings, model inference, and file system access are all declared. Effects are part of the type, not lurking inside a function body. Koka and Vera both make this argument from the language side. A semantic artifact takes the same idea seriously enough to make it part of the artifact identity.&lt;&#x2F;p&gt;
&lt;p&gt;It is &lt;strong&gt;verifiable&lt;&#x2F;strong&gt;. It carries obligations: properties that must hold, invariants that must be preserved, preconditions on inputs, postconditions on outputs. Some of these are dischargeable by SMT solvers. Some need proof kernels. Some can only be enforced at runtime. The artifact records which is which, and what evidence exists for each.&lt;&#x2F;p&gt;
&lt;p&gt;It is &lt;strong&gt;reproducible&lt;&#x2F;strong&gt;. It declares its environment with enough precision that a meaning engine somewhere else, or the same meaning engine a year later, can replay it and get the same outputs. Wasm with the deterministic profile, Nix-pinned environments, content-addressed inputs, and deterministic schedulers are the substrate that makes reproducibility a property rather than a hope.&lt;&#x2F;p&gt;
&lt;p&gt;It is &lt;strong&gt;explainable&lt;&#x2F;strong&gt;. Execution emits a provenance graph. What ran, on what data, in what environment, with what assumptions, against what obligations, with what residual uncertainty. The provenance is part of the output, not a log file someone deletes after a week.&lt;&#x2F;p&gt;
&lt;p&gt;The thing to notice about that list is that none of these properties is exotic. Every one of them already exists in some production system today. What does not exist yet is the synthesis. We treat them as add-ons around code. The argument I am making is that they are not add-ons. They are the artifact, and code is one rendering of it.&lt;&#x2F;p&gt;
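&lt;p&gt;To make the content-addressed property concrete, here is a minimal Python sketch. It assumes canonical JSON (sorted keys, fixed separators) as a stand-in for a real normalization pass, so it only erases key order and whitespace. The function name &lt;code&gt;artifact_id&lt;&#x2F;code&gt; and the field names are hypothetical, not part of any existing system.&lt;&#x2F;p&gt;

```python
import hashlib
import json


def artifact_id(artifact: dict) -> str:
    """Derive an identity from the artifact's structure, not from a filename.

    Canonical JSON stands in for a real normalization pass (an assumption
    of this sketch): it erases key order and whitespace, nothing more.
    """
    canonical = json.dumps(artifact, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same structure, different key order: same identity.
a = {"kind": "RNASeqDiffExp", "effects": ["fs.read", "fs.write"], "alpha": 0.05}
b = {"alpha": 0.05, "effects": ["fs.read", "fs.write"], "kind": "RNASeqDiffExp"}

# A changed effect set is a different artifact.
c = {**a, "effects": ["fs.read", "fs.write", "net.fetch"]}

print(artifact_id(a) == artifact_id(b))  # True
print(artifact_id(a) == artifact_id(c))  # False
```

&lt;p&gt;The interesting design work is in the normalization step, which this sketch deliberately leaves trivial.&lt;&#x2F;p&gt;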
&lt;h2 id=&quot;the-meaning-engine&quot;&gt;The meaning engine&lt;&#x2F;h2&gt;
&lt;p&gt;If the artifact changes, the machine around it has to change too. Compilers were built for source text. Runtimes were built for binaries. Neither was built for typed, verifiable, provenance-rich semantic objects.&lt;&#x2F;p&gt;
&lt;p&gt;The meaning engine is what fills that gap. I am using the term as a placeholder for a category, not for a single product. A meaning engine accepts a semantic artifact and:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Elaborates&lt;&#x2F;strong&gt; it. Resolves references, links schemas, grounds entities, checks types and effects.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Verifies&lt;&#x2F;strong&gt; it. Discharges obligations against the right backend. Type checks first. SMT for decidable arithmetic and structural constraints. Proof kernels for the residual cases that need full mathematical certainty. Runtime contracts for the cases nothing else can decide.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Plans&lt;&#x2F;strong&gt; execution. Picks an environment. Pins a target. Chooses where to run, on what hardware, with what capability set.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Lowers&lt;&#x2F;strong&gt; to code. Wasm, native CPU, GPU kernels, distributed dataflow graphs, whatever the planner decided. Code becomes an output, not an input.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Executes&lt;&#x2F;strong&gt; deterministically. Capability-bounded, replayable, audited.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Explains&lt;&#x2F;strong&gt;. Emits a provenance graph alongside the result. Says what assumptions held, what proofs passed, what tests ran, what fell back to runtime checks, what remained uncertain.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
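&lt;p&gt;The six stages above can be sketched as a pipeline in which every stage leaves a provenance entry. This is a toy skeleton in Python under loud assumptions: the &lt;code&gt;Run&lt;&#x2F;code&gt; type, the &lt;code&gt;stage&lt;&#x2F;code&gt; helper, and the placeholder stage bodies are all hypothetical, and only the execute step does any work.&lt;&#x2F;p&gt;

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Run:
    """State threaded through the stages; `provenance` is part of the output."""
    artifact: dict
    result: object = None
    provenance: list[str] = field(default_factory=list)


Stage = Callable[[Run], Run]


def stage(name: str, work: Callable[[Run], None]) -> Stage:
    """Wrap a step so that every stage records that it ran."""
    def run_stage(run: Run) -> Run:
        work(run)
        run.provenance.append(name)
        return run
    return run_stage


def execute(run: Run) -> None:
    # Placeholder for running lowered code: sum the declared inputs.
    run.result = sum(run.artifact["inputs"])


# Hypothetical stand-ins for the six stages; only `execute` does real work.
PIPELINE: list[Stage] = [
    stage("elaborate", lambda run: None),
    stage("verify", lambda run: None),
    stage("plan", lambda run: None),
    stage("lower", lambda run: None),
    stage("execute", execute),
    stage("explain", lambda run: None),
]


def materialize(artifact: dict) -> Run:
    """Push one artifact through every stage in order."""
    run = Run(artifact=artifact)
    for step in PIPELINE:
        run = step(run)
    return run


out = materialize({"inputs": [1, 2, 3]})
print(out.result)      # 6
print(out.provenance)  # ['elaborate', 'verify', 'plan', 'lower', 'execute', 'explain']
```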
&lt;p&gt;The interesting reframing is the compiler itself. The old job of a compiler was to translate one notation into another. The job of a meaning engine is to materialize meaning into execution and to keep meaning intact across that materialization. Translation becomes a sub-task. The primary task is preservation.&lt;&#x2F;p&gt;
&lt;p&gt;The semantic intermediate representation is where this lives or dies. I do not think the answer is “use LLVM IR with more comments.” LLVM IR is too low. It is already shaped around the mechanics of execution. We need a layer above it that carries domain meaning, obligations, and effects with first-class status.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;mlir.llvm.org&#x2F;&quot;&gt;MLIR&lt;&#x2F;a&gt; is the most relevant existing infrastructure for this. It treats multi-level structure as the design center: dialects, regions, operations, progressive lowering. A meaning engine could plausibly use an MLIR-style dialect stack with a semantic dialect at the top and execution dialects at the bottom. That is an implementation question, not a thesis question, and I want to keep the two separate.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;a-scientific-experiment-as-an-executable-artifact&quot;&gt;A scientific experiment as an executable artifact&lt;&#x2F;h2&gt;
&lt;p&gt;Abstract claims about new artifacts age badly. The argument has to survive contact with concrete domains. The first place I would point is science.&lt;&#x2F;p&gt;
&lt;p&gt;Take a differential expression analysis from RNA sequencing. Today, the artifact that runs is some combination of: a Snakemake or Nextflow workflow, a Conda environment file, a few R scripts, a Jupyter notebook with the figure-generation logic, a README that explains how to run it, and a PDF describing the methods for the eventual paper. The “program” is spread across all of these, and the relationships between them are mostly informal.&lt;&#x2F;p&gt;
&lt;p&gt;People have been trying to fix this. The &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.commonwl.org&#x2F;&quot;&gt;Common Workflow Language&lt;&#x2F;a&gt; standardizes the workflow part. &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.researchobject.org&#x2F;ro-crate&#x2F;&quot;&gt;RO-Crate&lt;&#x2F;a&gt; packages methods, data, outputs, and identifiers together. &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.w3.org&#x2F;TR&#x2F;prov-o&#x2F;&quot;&gt;PROV-O&lt;&#x2F;a&gt; gives you a vocabulary for entities, activities, and agents. These are real progress, and they show that the field has been edging toward semantic artifacts for years. What is still missing is a single authoritative object that also carries assumptions, allowed environments, correctness obligations, and execution traces in one verifiable form.&lt;&#x2F;p&gt;
&lt;p&gt;A semantic artifact for the same analysis would look more like this:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;artifact RNASeqDiffExp.v3
  inputs {
    reads: FASTQ[]
    reference: GenomeRef
    alpha: Real where 0 &amp;lt; alpha &amp;lt;= 1.0
  }
  assumptions {
    sha256(reference) == &amp;quot;9f8c1a...&amp;quot;
    paired_end(reads)
    same_instrument(reads)
  }
  outputs {
    counts: CountTable
    figures: FigureSet
  }
  effects {
    fs.read(&amp;quot;reads&amp;#x2F;&amp;quot;, &amp;quot;reference&amp;#x2F;&amp;quot;)
    fs.write(&amp;quot;results&amp;#x2F;&amp;quot;)
    cpu.simd
    gpu.optional
  }
  obligations {
    deterministic_target = &amp;quot;wasm32-wasi&amp;quot;
    environment = &amp;quot;nix:sha256-7m...&amp;quot;
    p_adj_threshold(counts) &amp;lt;= alpha
  }
  plan {
    trim -&amp;gt; align -&amp;gt; quantify -&amp;gt; diffexp -&amp;gt; render
  }
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The syntax above is illustrative. The shape is the point.&lt;&#x2F;p&gt;
&lt;p&gt;What this artifact gives you that a workflow file plus a README does not: a reviewer can inspect the assumptions directly. A lab can replay the artifact against the pinned environment and reproduce the figures byte for byte. An auditor can ask whether GPU execution changed numerical behavior and get a defensible answer. A future model can propose an optimization, and the meaning engine can reject the optimization automatically if it weakens any obligation. The provenance graph it emits is the methods section of the paper, machine-checkable, not a paragraph somebody wrote at midnight before submission.&lt;&#x2F;p&gt;
&lt;p&gt;The artifact does not have to make every claim machine-decidable. It only has to make explicit which claims are decided how. That is already a significant improvement over the current state, where most of those distinctions live in tribal knowledge.&lt;&#x2F;p&gt;
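&lt;p&gt;Two of the claims in that artifact are cheap to decide mechanically: the range constraint on &lt;code&gt;alpha&lt;&#x2F;code&gt; and the pinned digest of the reference. A minimal Python sketch, assuming the reference is available as bytes; the function names are hypothetical and the digest is computed on toy data rather than on the elided value above.&lt;&#x2F;p&gt;

```python
import hashlib


def check_inputs(alpha: float) -> list[str]:
    """Return violated input constraints (an empty list means all hold)."""
    violations = []
    # Mirrors the `alpha` constraint in the inputs block: alpha in (0, 1].
    if not (alpha > 0.0 and 1.0 >= alpha):
        violations.append("alpha must be in (0, 1]")
    return violations


def check_reference(reference_bytes: bytes, pinned_sha256: str) -> list[str]:
    """Mirror the pinned-digest assumption: the reference is identified by its SHA-256."""
    actual = hashlib.sha256(reference_bytes).hexdigest()
    if actual != pinned_sha256:
        return [f"reference digest mismatch: got {actual[:12]}..."]
    return []


# Toy stand-in for a reference genome; the pin is derived from it for the demo.
reference = b"ACGT"
pinned = hashlib.sha256(reference).hexdigest()

print(check_inputs(0.05) + check_reference(reference, pinned))  # []
print(check_inputs(1.5))                                        # ['alpha must be in (0, 1]']
```

&lt;p&gt;Assumptions such as &lt;code&gt;same_instrument(reads)&lt;&#x2F;code&gt; are harder to decide, which is exactly why the artifact should record how each one is checked.&lt;&#x2F;p&gt;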
&lt;h2 id=&quot;a-contract-as-an-executable-artifact&quot;&gt;A contract as an executable artifact&lt;&#x2F;h2&gt;
&lt;p&gt;The second domain I would point to is the one where words and computation already overlap uncomfortably: legal and financial agreements.&lt;&#x2F;p&gt;
&lt;p&gt;There is real prior art here. The &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;accordproject.org&#x2F;&quot;&gt;Accord Project&lt;&#x2F;a&gt; models contracts as text plus a data model plus executable logic. The &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;cdm.finos.org&#x2F;&quot;&gt;Common Domain Model&lt;&#x2F;a&gt; treats financial products and lifecycle events as machine-readable, machine-executable domain objects. Smart contracts, in their actual industrial form rather than the crypto-bro form, have been heading in this direction for a decade.&lt;&#x2F;p&gt;
&lt;p&gt;The honest statement is that legal language will never be fully decidable. Many clauses are intentionally ambiguous. Many require human judgment. That is not a bug. It is how the institution works.&lt;&#x2F;p&gt;
&lt;p&gt;But a semantic artifact does not require full decidability. It requires that the boundaries between decidable and undecidable be explicit. Some clauses become parameters. Some clauses become executable logic. Some clauses remain text with cross-references into the structured part. The artifact records which is which.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;contract FixedRateLoan.v1
  parties { lender, borrower }
  terms {
    principal: Money where principal &amp;gt; 0
    rateAPR: Percent where 0 &amp;lt;= rateAPR &amp;lt;= 0.25
    termMonths: Nat where termMonths &amp;gt; 0
  }
  preconditions {
    kyc_clear(borrower)
    sanctions_clear(borrower)
  }
  obligations {
    monthly_payment(m) = amortize(principal, rateAPR, termMonths, m)
    total_paid = principal + total_interest(principal, rateAPR, termMonths)
  }
  effects {
    ledger.post
    notice.send
    archive.write
  }
  evidence {
    jurisdiction = &amp;quot;CH&amp;quot;
    execution_target = &amp;quot;wasm-ledger&amp;quot;
    text_reference = &amp;quot;Loan_2026_03_15.pdf#sha256:4b2a...&amp;quot;
  }
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A loan agreement, in this form, is verifiable in the parts that should be verifiable, executable in the parts that should be executable, and traceable back to the natural-language contract for the parts that are neither. A bank can check on every payment that the obligations still hold. An auditor can ask why a specific posting happened and get a provenance graph that points back to the artifact and the inputs. A regulator can require a class of contracts to demonstrate certain invariants without reading every contract by hand.&lt;&#x2F;p&gt;
&lt;p&gt;This is a much more serious model of “smart software” than the casual conflation of code with policy that the smart-contract era encouraged. The point is not that every rule becomes automatic. The point is that the system can tell you, without ambiguity, which rules it can enforce, which rules it can only check, and which rules a human still has to interpret.&lt;&#x2F;p&gt;
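&lt;p&gt;The executable part of the loan is small enough to sketch. The Python below assumes monthly compounding and the standard level-payment annuity formula, neither of which the contract sketch above specifies, and it uses floats where a real ledger would use exact decimal money. It then checks the obligation that the total paid equals principal plus total interest.&lt;&#x2F;p&gt;

```python
import math


def monthly_payment(principal: float, rate_apr: float, term_months: int) -> float:
    """Level payment for a fixed-rate loan (standard annuity formula)."""
    r = rate_apr / 12.0  # periodic rate; assumes monthly compounding
    if r == 0.0:
        return principal / term_months
    return principal * r / (1.0 - (1.0 + r) ** -term_months)


def schedule(principal: float, rate_apr: float,
             term_months: int) -> list[tuple[float, float]]:
    """Return (interest, principal_repaid) for each month."""
    payment = monthly_payment(principal, rate_apr, term_months)
    balance = principal
    rows = []
    for _ in range(term_months):
        interest = balance * rate_apr / 12.0
        repaid = payment - interest
        balance -= repaid
        rows.append((interest, repaid))
    return rows


rows = schedule(10_000.0, 0.06, 12)
total_interest = sum(interest for interest, _ in rows)
total_paid = sum(interest + repaid for interest, repaid in rows)

# The obligation from the artifact: total_paid = principal + total_interest.
print(math.isclose(total_paid, 10_000.0 + total_interest, abs_tol=1e-6))  # True
print(round(monthly_payment(10_000.0, 0.06, 12), 2))                      # 860.66
```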
&lt;h2 id=&quot;what-current-systems-already-get-right&quot;&gt;What current systems already get right&lt;&#x2F;h2&gt;
&lt;p&gt;I want to be careful here, because the easy mistake is to write this kind of essay as if nothing existing matters and everything has to be invented from scratch. That is wrong. Most of the pieces of a meaning engine already exist. They are scattered across systems that solved one face of the problem each. The synthesis is what is missing.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;veralang.dev&#x2F;&quot;&gt;Vera&lt;&#x2F;a&gt; optimizes for the generation loop. It makes syntax canonical, contracts mandatory, effects explicit, verification part of the normal compilation pipeline, and Wasm the default execution target. It is, in my framing, the best current example of “language designed for machine authorship.” It is not yet a meaning engine because the source text remains the artifact of authority. But the discipline it imposes on the artifact is exactly the discipline a semantic artifact needs.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;bhc-lang&quot;&gt;BHC&lt;&#x2F;a&gt; (the bridge Haskell compiler I have written about elsewhere) optimizes for the typed substrate and the deployment surface. Multi-stage IRs, a Core layer that survives across lowerings, profile-specific runtime contracts, multiple backends from native to Wasm to GPU. It treats semantic preservation across lowering as a first-class goal, which is the exact discipline a meaning engine needs at the lowering layer. It is still source-language-centered, by design. That is its scope.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;dafny.org&#x2F;&quot;&gt;Dafny&lt;&#x2F;a&gt; and &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;lean-lang.org&#x2F;&quot;&gt;Lean&lt;&#x2F;a&gt; push hardest toward proof-bearing semantics. Dafny puts specifications inside the language and uses SMT to discharge them. Lean has a minimal trusted kernel and builds proof automation on top. They show what the upper bound of “verified meaning” looks like in practice. They are not yet the substrate for cross-domain artifacts, deterministic deployment, and provenance packaging, but they set the standard for what the verification layer of a meaning engine should aspire to.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;koka-lang.github.io&#x2F;koka&#x2F;doc&#x2F;index.html&quot;&gt;Koka&lt;&#x2F;a&gt; and similar effect-typed languages show that you can keep effect information at the type level without giving up performance. Effects in the type signature, not just in the function body. That is the model the artifact needs at the meaning layer.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;mlir.llvm.org&#x2F;&quot;&gt;MLIR&lt;&#x2F;a&gt; shows that a multi-level dialect stack is a workable architecture for keeping high-level structure across lowering. A semantic IR sitting above MLIR-style dialects is a plausible engineering path.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;webassembly.org&#x2F;&quot;&gt;Wasm&lt;&#x2F;a&gt; and &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;wasi.dev&#x2F;&quot;&gt;WASI&lt;&#x2F;a&gt;, with the recent &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;WebAssembly&#x2F;proposals&quot;&gt;deterministic profile&lt;&#x2F;a&gt; work and tooling like &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.wasmtime.dev&#x2F;&quot;&gt;Wasmtime’s determinism guide&lt;&#x2F;a&gt;, give us a portable, sandboxed, increasingly deterministic execution target. This is the substrate that makes “replay this artifact and get byte-identical outputs” feasible without operating-system heroics.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;nix.dev&#x2F;&quot;&gt;Nix&lt;&#x2F;a&gt; gives us declarative, reproducible, isolated environments. The pinning story is mature. The integration story is still rough. But the building block is real.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.w3.org&#x2F;TR&#x2F;prov-o&#x2F;&quot;&gt;PROV-O&lt;&#x2F;a&gt;, &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.researchobject.org&#x2F;ro-crate&#x2F;&quot;&gt;RO-Crate&lt;&#x2F;a&gt;, and &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.commonwl.org&#x2F;&quot;&gt;CWL&lt;&#x2F;a&gt; give us standards for provenance, packaging, and portable workflows, particularly in the scientific domain. They are evidence that “explainability as a structural property of the artifact” is not science fiction. It is the way some communities already work.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.unison-lang.org&#x2F;&quot;&gt;Unison&lt;&#x2F;a&gt; gives us content-addressed, structured code identity. This is the lesson the rest of us still have not fully internalized: identity should not depend on filenames or formatting.&lt;&#x2F;p&gt;
&lt;p&gt;If I were to draw the comparison briefly: Vera optimizes the generation loop. BHC optimizes the typed substrate. Dafny and Lean optimize formal truth. Wasm and Nix optimize deterministic deployment. PROV and RO-Crate optimize provenance. Unison optimizes structural identity. None of them is the future on its own. The future is the synthesis.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;from-compilers-to-meaning-engines&quot;&gt;From compilers to meaning engines&lt;&#x2F;h2&gt;
&lt;p&gt;The reframing I find most useful is to put the old pipeline next to the new one and look at what changes.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;From compiler to meaning engine&quot;&gt;Old pipeline
============

  source code  -&amp;gt;  compile  -&amp;gt;  binary  -&amp;gt;  run


New pipeline
============

  intent
    |
    v
  elaborate   ---&amp;gt;  semantic artifact
    |                (typed, content-addressed,
    |                 effects explicit)
    v
  verify      ---&amp;gt;  obligation graph
    |                (proved | tested | runtime)
    v
  plan        ---&amp;gt;  execution plan
    |                (env pinned, target chosen,
    |                 capabilities bounded)
    v
  lower       ---&amp;gt;  Wasm | native | GPU | distributed
    |
    v
  execute     ---&amp;gt;  results
    |
    v
  explain     ---&amp;gt;  provenance graph&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Every stage matters, and every stage has explicit outputs. Elaboration produces a fully grounded semantic artifact. Verification produces a graph of discharged and residual obligations. Planning produces a deterministic execution plan with a pinned environment and a capability set. Lowering produces target-specific code as one of several possible outputs. Execution produces results plus a trace of what ran. Explanation turns that trace into a provenance graph and answers questions about why the system did what it did.&lt;&#x2F;p&gt;
&lt;p&gt;The execution trace becomes a first-class output, not a side effect. Today, a stack trace exists because something went wrong. In a meaning engine, an execution trace exists because something happened, and the system promises to be able to reconstruct it later.&lt;&#x2F;p&gt;
&lt;p&gt;The compiler does not disappear. It moves. It becomes one stage of a larger pipeline, and the pipeline is what people interact with.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-this-is-bigger-than-programming&quot;&gt;Why this is bigger than programming&lt;&#x2F;h2&gt;
&lt;p&gt;I want to zoom out for one section, because the trilogy has been quietly building toward a claim that is not really about programming languages at all.&lt;&#x2F;p&gt;
&lt;p&gt;Writing systems have changed several times. Oral tradition gave way to written records. Manuscripts gave way to print. Print gave way to digital text. Each transition expanded the scale and durability of what humans could record, distribute, and act on collectively. Each transition reshaped institutions in ways that took decades to settle.&lt;&#x2F;p&gt;
&lt;p&gt;We are in the early phase of another transition. We are moving from systems that &lt;strong&gt;record&lt;&#x2F;strong&gt; knowledge to systems that &lt;strong&gt;execute&lt;&#x2F;strong&gt; knowledge. A scientific paper is a record. A semantic artifact for a scientific experiment is an execution. A legal contract is a record. A semantic artifact for a financial product is an execution. An infrastructure runbook is a record. A semantic artifact for the same operational behavior is an execution.&lt;&#x2F;p&gt;
&lt;p&gt;This is the civilizational layer of the argument, and I am stating it deliberately even though it is uncomfortable. The shift from text-as-record to text-as-execution will be at least as large as the shift from manuscript to print. It will not be the same as previous transitions, because it is happening on infrastructure we built and partially understand, not on infrastructure that emerged organically over centuries. But the order of magnitude is similar.&lt;&#x2F;p&gt;
&lt;p&gt;If that framing is even half right, “what programming language should we use” is a small question. The big question is what kinds of institutions we are willing to build on top of artifacts that are themselves executable.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-role-of-humans&quot;&gt;The role of humans&lt;&#x2F;h2&gt;
&lt;p&gt;If we do not write code, what do we do? This is a fair question, and I want to answer it directly rather than dodge it.&lt;&#x2F;p&gt;
&lt;p&gt;We define constraints. We shape intent. We evaluate outcomes. We design systems of meaning. We argue about which obligations matter and which assumptions are reasonable. We negotiate the boundaries between automated and human-decided parts of an artifact. We decide which evidence is acceptable and which is not.&lt;&#x2F;p&gt;
&lt;p&gt;This is a different job. It is closer to architecture than to authorship. The unit of work is the obligation, not the function.&lt;&#x2F;p&gt;
&lt;p&gt;For people who like writing code, this is going to feel like a loss. I understand. I like writing code too. But the historical pattern is clear. Each abstraction layer demoted the layer below it from “primary skill” to “specialized skill.” Assembly programming did not disappear when C arrived. It became the thing a small number of people did when it really mattered, in compilers, in kernels, in performance-critical hot paths. The same is going to happen one layer up. Most people will stop authoring code as the primary artifact. Some people will keep doing it, where it really matters, and they will be more important to the field, not less.&lt;&#x2F;p&gt;
&lt;p&gt;The work that gets more interesting, in this picture, is the work of building meaning engines themselves. The substrate is wide open. There is room for many designs. The decade ahead is going to look more like the early compiler era than like the late framework era. That is a good time to be working on infrastructure.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;implications-for-builders-right-now&quot;&gt;Implications for builders right now&lt;&#x2F;h2&gt;
&lt;p&gt;The migration is staged. The way I think about the timeline:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;Migration roadmap&quot;&gt;Near term              Medium term            Long term
(now)                  (2-3 years)            (3-10 years)

Semantic           -&amp;gt;  Artifact core      -&amp;gt;  Multi-target
sidecars                                      lowering

- specs beside         - content-             - Wasm&amp;#x2F;WASI first
  the code               addressed            - native CPU
- pinned envs            identity             - GPU kernels
  (Nix)                - typed effects        - distributed DAGs
- provenance           - obligation
  (PROV-O)               graphs               Artifact-first
- deterministic        - graded               domains
  replay (Wasm)          verification         - science
                                              - contracts
Improve auditability   Make the artifact      - infra policy
without abandoning     the review surface     - audit&amp;#x2F;compliance
the current stack&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;I do not want to leave this post up in the air. The trilogy has been escalating, and each step has been more speculative than the last. This post is the most speculative of the three. It is fair to ask what any of this implies for what we should be doing in 2026, not in 2036.&lt;&#x2F;p&gt;
&lt;p&gt;A few things I think are already actionable.&lt;&#x2F;p&gt;
&lt;p&gt;Take semantic sidecars seriously. If you are shipping software in any regulated or critical domain, you can already start writing artifact manifests that bind your code to its assumptions, its allowed effects, its pinned environment, its provenance, and its replay instructions. You do not need a new compiler to do this. You need discipline and a few tools that already exist. Nix and Wasm are good starting points. PROV-O is a usable vocabulary. The investment compounds.&lt;&#x2F;p&gt;
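&lt;p&gt;A sidecar can start very small. The Python sketch below binds a piece of code to its declared effects, environment, and assumptions by content hash. The manifest layout and the names &lt;code&gt;make_sidecar&lt;&#x2F;code&gt; and &lt;code&gt;still_bound&lt;&#x2F;code&gt; are hypothetical, and the environment string is a placeholder rather than a real Nix hash.&lt;&#x2F;p&gt;

```python
import hashlib
import json


def make_sidecar(code: bytes, effects: list[str], environment: str,
                 assumptions: list[str]) -> dict:
    """Bind a piece of code to its declared context by content hash."""
    return {
        "code_sha256": hashlib.sha256(code).hexdigest(),
        "effects": sorted(effects),
        "environment": environment,  # placeholder for a pinned environment id
        "assumptions": assumptions,
    }


def still_bound(code: bytes, sidecar: dict) -> bool:
    """True if this is the exact code the sidecar was written for."""
    return hashlib.sha256(code).hexdigest() == sidecar["code_sha256"]


code = b"def total(xs):\n    return sum(xs)\n"
sidecar = make_sidecar(
    code,
    effects=["fs.read"],
    environment="nix:PLACEHOLDER",
    assumptions=["inputs are finite numbers"],
)

print(json.dumps(sidecar, indent=2))
print(still_bound(code, sidecar))                  # True
print(still_bound(code + b"# edited\n", sidecar))  # False
```

&lt;p&gt;A check like &lt;code&gt;still_bound&lt;&#x2F;code&gt; in CI is enough to stop the manifest and the code from drifting apart silently.&lt;&#x2F;p&gt;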
&lt;p&gt;Treat reproducibility as a property, not a hope. If you cannot replay your artifact deterministically, you do not have a semantic artifact. You have a hope. Wasm with the deterministic profile, Nix-pinned environments, content-addressed inputs, and deterministic schedulers are the building blocks. The cost of getting there is real but bounded, and the gains in audit, debugging, and incident response are immediate.&lt;&#x2F;p&gt;
&lt;p&gt;Make effects explicit. You do not need a new language to do this. You can do it with discipline in the language you already use. Module boundaries that separate pure from effectful code are not exotic; they are just unfashionable in some communities. The discipline is what matters. The syntax is downstream.&lt;&#x2F;p&gt;
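&lt;p&gt;As an illustration of that discipline in an ordinary language, here is a small Python sketch of a pure core with an effectful shell. The function names and the output file name are hypothetical. All formatting logic lives in the pure function, and the only file system access is in the one function whose job is to write.&lt;&#x2F;p&gt;

```python
from pathlib import Path


def render_report(counts: dict[str, int]) -> str:
    """Pure: same input, same output, no I/O."""
    lines = [f"{gene}\t{n}" for gene, n in sorted(counts.items())]
    return "\n".join(lines) + "\n"


def write_report(counts: dict[str, int], out_dir: Path) -> Path:
    """Effectful shell: the only place that touches the file system."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / "counts.tsv"
    target.write_text(render_report(counts), encoding="utf-8")
    return target


print(repr(render_report({"TP53": 7, "BRCA1": 12})))  # 'BRCA1\t12\nTP53\t7\n'
```

&lt;p&gt;The pure half can be tested and replayed without a sandbox. The effectful half is short enough to audit by reading it.&lt;&#x2F;p&gt;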
&lt;p&gt;Stop treating proof and test and observation as the same thing. They are not. A proven property, a tested property, and an observed property carry different weights. A meaning engine has to keep them separate. You can start keeping them separate yourself, in code review and in design review, today.&lt;&#x2F;p&gt;
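&lt;p&gt;One way to keep the three apart in practice is to tag every claim with the kind of evidence behind it. A minimal Python sketch; the &lt;code&gt;Evidence&lt;&#x2F;code&gt; ordering and the rule that a conclusion is only as strong as its weakest claim are assumptions of this sketch, and the example claims are invented for illustration.&lt;&#x2F;p&gt;

```python
from dataclasses import dataclass
from enum import IntEnum


class Evidence(IntEnum):
    """Ordered by strength; the ordering is an assumption of this sketch."""
    OBSERVED = 1  # seen in production telemetry
    TESTED = 2    # exercised by a test suite
    PROVED = 3    # discharged by a solver or proof kernel


@dataclass(frozen=True)
class Claim:
    statement: str
    evidence: Evidence


def weakest(claims: list[Claim]) -> Evidence:
    """Treat a conclusion as only as strong as its weakest supporting claim."""
    return min(claim.evidence for claim in claims)


# Invented example claims, one per evidence grade.
claims = [
    Claim("payment schedule sums to principal plus interest", Evidence.PROVED),
    Claim("ledger posting is idempotent", Evidence.TESTED),
    Claim("notices arrive within a day", Evidence.OBSERVED),
]

print(weakest(claims).name)  # OBSERVED
print([c.statement for c in claims if c.evidence is Evidence.PROVED])
```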
&lt;p&gt;Invest in IRs more than in syntax. The next decade of leverage is in the layer above LLVM IR and below user-facing source. If you are building tools for software development, this is where the interesting work is. The tooling that wins will preserve more meaning than today’s compiler stacks do.&lt;&#x2F;p&gt;
&lt;p&gt;If you build infrastructure, the heuristic is simple. Anywhere your team is currently relying on “shell scripts plus a README plus a notebook plus a PDF plus tribal knowledge” to encode a process, that is a candidate for a semantic artifact. Pick one such process, and try to model it as an artifact instead of as a pile.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;open-questions-i-do-not-pretend-to-have-settled&quot;&gt;Open questions I do not pretend to have settled&lt;&#x2F;h2&gt;
&lt;p&gt;I am stating the thesis confidently, but I want to be honest about what I do not know.&lt;&#x2F;p&gt;
&lt;p&gt;I do not know the exact shape of the semantic IR. I have a bias toward an MLIR-style dialect stack with a semantic dialect at the top, augmented with effect typing, content-addressed identity, and explicit obligations. A content-addressed term graph in the Unison style is a credible alternative. A two-part design with a logical core plus an executable planning layer is another. The winning design is probably hybrid. I would rather present the question honestly than fake an answer.&lt;&#x2F;p&gt;
&lt;p&gt;I do not know how much proof to demand before execution. Too much proof and the system is unusable. Too little and the artifact loses its claim to authority. Vera’s split between static proof and runtime fallback is realistic. Lean’s kernel model is the strongest, and the slowest. The right answer is graded, not absolute, and it will probably depend on the domain.&lt;&#x2F;p&gt;
&lt;p&gt;I do not know how distributed execution fits in. Wasm gives us a portable substrate. WASI gives us a capability model. Content addressing gives us identity. None of these solves scheduling, data locality, or semantic replay across clusters. That is an implementation frontier, not a hole in the thesis, but it is a frontier and I want to flag it as one.&lt;&#x2F;p&gt;
&lt;p&gt;I do not know which domains move first. My best guess is the domains that already pay most of the cost of ambiguity today: scientific workflows, regulated finance, compliance and audit, infrastructure policy, and certain parts of public-sector procurement. Domains where the gap between “what we wrote” and “what we ran” is currently catastrophic when something goes wrong. I would not be surprised if those domains develop their own meaning engines first, and a general-purpose substrate emerges as the synthesis a few years later.&lt;&#x2F;p&gt;
&lt;p&gt;I do not know how long the migration takes. My only confident claim about timing is that it does not arrive by deleting today’s stacks. It arrives by moving semantics upward, year by year, until the artifact most teams care about is no longer a tree of source files.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;closing&quot;&gt;Closing&lt;&#x2F;h2&gt;
&lt;p&gt;We called them programming languages because we thought we were speaking to machines. In reality, we were translating ourselves into a form machines could tolerate. The notation was a compromise between what we could write and what they could lower.&lt;&#x2F;p&gt;
&lt;p&gt;Now that machines can understand us more directly, the question is not how to write better code. The question is how to think in systems that can be executed. The artifact we author should preserve meaning, not perform it. The runtime should keep that meaning intact, not erase it during translation.&lt;&#x2F;p&gt;
&lt;p&gt;Code does not disappear in this story. It drops a layer. It becomes implementation detail, escape hatch, optimization substrate, foreign-function boundary. Important. Powerful. Not primary.&lt;&#x2F;p&gt;
&lt;p&gt;The trilogy ends here, in the same place each transition in computing has ended. The previous artifact survives, demoted, while a higher layer takes over the work of authority. Source code joins assembly in the long list of things that used to be the thing and now are something we lower to.&lt;&#x2F;p&gt;
&lt;p&gt;The interesting work, for the rest of this decade, is at the layer above. That is where I am spending mine.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;further-reading&quot;&gt;Further reading&lt;&#x2F;h2&gt;
&lt;p&gt;The systems already pointing at parts of this future, grouped by the dimension each one gets right.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Semantic IRs and structural identity.&lt;&#x2F;strong&gt; The argument that artifact identity should not depend on filenames, and that compiler stacks should preserve high-level structure across lowering, already has working precedent.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;mlir.llvm.org&#x2F;&quot;&gt;MLIR&lt;&#x2F;a&gt; for multi-level dialect stacks&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.unison-lang.org&#x2F;&quot;&gt;Unison&lt;&#x2F;a&gt; for content-addressed code&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;strong&gt;Verification.&lt;&#x2F;strong&gt; What it looks like to push specifications and proofs into the artifact, with three different bets on how strict the proof obligation should be.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;dafny.org&#x2F;&quot;&gt;Dafny&lt;&#x2F;a&gt; for specs in the language, SMT-backed&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;lean-lang.org&#x2F;&quot;&gt;Lean&lt;&#x2F;a&gt; for a minimal trusted kernel and proofs as terms&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;veralang.dev&#x2F;&quot;&gt;Vera&lt;&#x2F;a&gt; for Z3 on decidable cases and runtime fallback for the rest&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;strong&gt;Typed effects.&lt;&#x2F;strong&gt; The case for putting side effects into the type rather than hiding them in the function body.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;koka-lang.github.io&#x2F;koka&#x2F;doc&#x2F;index.html&quot;&gt;Koka&lt;&#x2F;a&gt; for row-polymorphic effects with handlers&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;veralang.dev&#x2F;&quot;&gt;Vera&lt;&#x2F;a&gt; for mandatory effect declarations on every function&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;strong&gt;Reproducible execution.&lt;&#x2F;strong&gt; What “the same artifact runs the same way somewhere else, a year later” actually requires in production.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;webassembly.org&#x2F;&quot;&gt;WebAssembly&lt;&#x2F;a&gt; and &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;wasi.dev&#x2F;&quot;&gt;WASI&lt;&#x2F;a&gt; for portable, sandboxed, increasingly deterministic execution&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.wasmtime.dev&#x2F;&quot;&gt;Wasmtime determinism guide&lt;&#x2F;a&gt; for the concrete steps to byte-identical replay&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;nix.dev&#x2F;&quot;&gt;Nix&lt;&#x2F;a&gt; for declarative, isolated, pinned environments&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;strong&gt;Provenance and packaging.&lt;&#x2F;strong&gt; The vocabulary for explaining what ran, on what data, in what environment, with what evidence.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.w3.org&#x2F;TR&#x2F;prov-o&#x2F;&quot;&gt;PROV-O&lt;&#x2F;a&gt; for entities, activities, and agents&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.researchobject.org&#x2F;ro-crate&#x2F;&quot;&gt;RO-Crate&lt;&#x2F;a&gt; for artifact bundles with methods, data, and identifiers&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.commonwl.org&#x2F;&quot;&gt;Common Workflow Language&lt;&#x2F;a&gt; for portable, vendor-neutral workflows&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;strong&gt;Domain artifacts.&lt;&#x2F;strong&gt; The honest evidence that “software artifact” is too narrow a category for what comes next.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;accordproject.org&#x2F;&quot;&gt;Accord Project&lt;&#x2F;a&gt; for contracts as text plus data model plus logic&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;cdm.finos.org&#x2F;&quot;&gt;Common Domain Model&lt;&#x2F;a&gt; for financial products and lifecycle events as machine-readable, machine-executable objects&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</description>
      </item>
      <item>
          <title>Patents Per Capita Is a Vanity Metric</title>
          <pubDate>Fri, 01 May 2026 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/patents-per-capita-is-a-vanity-metric/</link>
          <guid>https://raskell.io/articles/patents-per-capita-is-a-vanity-metric/</guid>
          <description xml:base="https://raskell.io/articles/patents-per-capita-is-a-vanity-metric/">&lt;p&gt;It is the first of May, Labour Day. A holiday explicitly about work, which makes it a good day to ask whether the work that gets counted is the work that actually matters.&lt;&#x2F;p&gt;
&lt;p&gt;Earlier this week I posted on LinkedIn about how Swiss innovation rankings tend to be illustrated with photos of Zurich, even though Basel produces a meaningful share of what those rankings reward. It performed better than most of my other shares there. Some of the feedback agreed. Some pushed back. Almost all of it stayed inside the same frame the rankings themselves set: which Swiss city deserves credit for the score. That is not the question I find interesting. The question I find interesting is what the score is measuring in the first place, and whether it is the score we should be optimising for. The LinkedIn format is not the right place to work that out. A long-form post on a Labour Day morning is.&lt;&#x2F;p&gt;
&lt;p&gt;Switzerland topped the Global Innovation Index in 2025 for the fifteenth year in a row. The index is published annually by the World Intellectual Property Organization, the United Nations agency that administers the international patent system, and it ranks roughly 140 economies on how innovative they are. The 2025 press release went out in September. The country celebrated. Zurich’s skyline got photographed again. Roche and Novartis, the two Basel-headquartered pharmaceutical giants, were name-checked. ETH Zurich and EPFL, the country’s two federal technical universities, got their nods. The framing was the usual one: small country, outsized output, durable lead.&lt;&#x2F;p&gt;
&lt;p&gt;I am Swiss. I am from Basel. I build infrastructure for a living, which means I spend a lot of time staring at dashboards that report how systems are performing in production. A number that sits at the top of a leaderboard for fifteen consecutive years should make any engineer who watches dashboards suspicious before it makes them proud. The metrics that look the smoothest the longest are usually the ones telling you the least about what is happening right now.&lt;&#x2F;p&gt;
&lt;p&gt;This post is about why “most innovative country” is a vanity metric: a number that is real, and countable, and correlated with success, but that should not be the thing you optimise for. The Global Innovation Index measures yesterday extremely well. It does not measure today, and it tells you almost nothing about tomorrow.&lt;&#x2F;p&gt;
</description>
      </item>
      <item>
          <title>Traffic Replay as a Security Primitive</title>
          <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/traffic-replay-as-a-security-primitive/</link>
          <guid>https://raskell.io/articles/traffic-replay-as-a-security-primitive/</guid>
          <description xml:base="https://raskell.io/articles/traffic-replay-as-a-security-primitive/">&lt;p&gt;If you have ever tuned a WAF rule, you know the cycle. A rule triggers on legitimate traffic. You get paged. You look at the logs, which tell you the rule ID and the request path but not enough to reproduce what happened. You loosen the rule based on your best guess. You deploy. You wait. Next week, you get paged again, either because the rule is still too aggressive or because you loosened it too far and now something is getting through.&lt;&#x2F;p&gt;
&lt;p&gt;The problem is not the WAF. The problem is that you are tuning a stateful, context-dependent system without the ability to reproduce the inputs that caused the behavior you are trying to change. You are debugging blind.&lt;&#x2F;p&gt;
&lt;p&gt;This is not unique to WAFs. Every edge system that makes decisions about traffic (proxies, rate limiters, auth gateways, bot detectors) suffers from the same structural issue: the traffic that triggered the behavior is gone. It existed for the duration of the request, it was logged incompletely, and now you are making changes based on a partial reconstruction of what happened.&lt;&#x2F;p&gt;
&lt;p&gt;Traffic replay solves this. Not as a testing tool. As a security primitive.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-replay-actually-means&quot;&gt;What replay actually means&lt;&#x2F;h2&gt;
&lt;p&gt;Traffic replay, in the sense I mean it here, is not load testing. It is not generating synthetic traffic that looks like production. It is capturing real requests as they happened, in order, with their headers and bodies intact, and re-executing them against a target environment deterministically.&lt;&#x2F;p&gt;
&lt;p&gt;The distinction matters. Load testing answers “can this system handle the volume?” Replay answers “will this system make the same decisions about the same traffic?” One is about throughput. The other is about behavior.&lt;&#x2F;p&gt;
&lt;p&gt;A replayed request hits the same WAF rules, the same rate limits, the same routing logic as the original. If the original was blocked, you can see whether the replay is also blocked, and if not, exactly what changed. If you modify a rule and replay the same traffic, you get a direct comparison: old behavior versus new behavior, same inputs, different configuration.&lt;&#x2F;p&gt;
&lt;p&gt;This is the primitive that WAF tuning has always been missing. Not better logs. Not more dashboards. The ability to say “here is the exact traffic that caused the problem, and here is what happens when I change the rule.”&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-capture-problem&quot;&gt;The capture problem&lt;&#x2F;h2&gt;
&lt;p&gt;The first obstacle is getting the traffic into a replayable format. Production traffic is ephemeral. It arrives, gets processed, and disappears. Logs capture metadata: timestamps, status codes, paths, maybe some headers. They do not capture the full request as a replayable artifact.&lt;&#x2F;p&gt;
&lt;p&gt;There are two practical approaches.&lt;&#x2F;p&gt;
&lt;p&gt;The first is capturing at the browser. Every modern browser can export a session as a HAR (HTTP Archive) file. This gives you the complete request and response for every HTTP transaction in a session: method, URL, headers, body, timing. When a user reports “this request was blocked,” you can ask them for a HAR export. You now have the exact traffic, not a description of it.&lt;&#x2F;p&gt;
&lt;p&gt;The second is capturing at the proxy. If your reverse proxy or load balancer can mirror traffic to a capture endpoint, you get production-representative traffic without depending on user cooperation. This is more complex to set up but gives you continuous coverage rather than incident-by-incident captures.&lt;&#x2F;p&gt;
&lt;p&gt;Either way, the result is the same: a sequence of HTTP requests that can be replayed faithfully.&lt;&#x2F;p&gt;
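&lt;p&gt;A minimal sketch of turning a HAR export into that sequence, in Python. The fields read here (&lt;code&gt;log.entries&lt;&#x2F;code&gt;, &lt;code&gt;startedDateTime&lt;&#x2F;code&gt;, &lt;code&gt;request.method&lt;&#x2F;code&gt;, &lt;code&gt;request.url&lt;&#x2F;code&gt;, &lt;code&gt;request.headers&lt;&#x2F;code&gt;, &lt;code&gt;request.postData.text&lt;&#x2F;code&gt;) come from the HAR 1.2 format. The function names and the shape of the output records are assumptions made for illustration, not the capture format of any particular tool.&lt;&#x2F;p&gt;

```python
import json
from datetime import datetime


def parse_started(entry):
    # HAR 1.2 stores ISO 8601 timestamps. Map a trailing "Z" to an explicit
    # offset so datetime.fromisoformat also accepts it on older Pythons.
    return datetime.fromisoformat(entry["startedDateTime"].replace("Z", "+00:00"))


def load_requests(har_text):
    """Return the requests in a HAR document, ordered by start time."""
    entries = json.loads(har_text)["log"]["entries"]
    requests = []
    for entry in sorted(entries, key=parse_started):
        req = entry["request"]
        requests.append({
            "method": req["method"],
            "url": req["url"],
            # Keep headers as an ordered list of pairs; names may repeat.
            "headers": [(h["name"], h["value"]) for h in req.get("headers", [])],
            # postData is optional in HAR; body stays None when it is absent.
            "body": req.get("postData", {}).get("text"),
        })
    return requests
```

&lt;p&gt;Sorting by &lt;code&gt;startedDateTime&lt;&#x2F;code&gt; is a defensive choice in this sketch: it recovers capture order even if the exporting tool wrote the entries out of sequence.&lt;&#x2F;p&gt;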
&lt;h2 id=&quot;replay-is-not-as-simple-as-resending&quot;&gt;Replay is not as simple as resending&lt;&#x2F;h2&gt;
&lt;p&gt;There are subtleties that make naive replay useless.&lt;&#x2F;p&gt;
&lt;p&gt;The obvious one is URLs. If you captured traffic hitting &lt;code&gt;production.example.com&lt;&#x2F;code&gt; and want to replay it against &lt;code&gt;staging.example.com&lt;&#x2F;code&gt;, you need to rewrite the host. But you also need to rewrite any absolute URLs in headers like &lt;code&gt;Origin&lt;&#x2F;code&gt; and &lt;code&gt;Referer&lt;&#x2F;code&gt;, and potentially in request bodies for APIs that include callback URLs.&lt;&#x2F;p&gt;
&lt;p&gt;Then there are cookies. A replayed request with production session cookies will either fail authentication on staging (different session store) or, worse, succeed against a production session you did not intend to touch. Cookie stripping is not optional. It is a safety requirement.&lt;&#x2F;p&gt;
&lt;p&gt;Headers need mutation too. You might need to inject an authorization token for the staging environment, or add a header that tells the WAF “this is a replay, do not count it toward rate limits.” You might need to strip headers that identify the original client IP to avoid polluting analytics.&lt;&#x2F;p&gt;
&lt;p&gt;And ordering matters. If request B depends on state created by request A (a login followed by an authenticated action), replaying them in parallel or out of order produces meaningless results. Deterministic replay means sequential, in capture order, by default.&lt;&#x2F;p&gt;
&lt;p&gt;None of this is algorithmically hard. But getting it wrong in any of these dimensions produces results that are misleading rather than useful. A replay tool that does not handle URL rewriting, cookie stripping, header mutation, and ordering is a footgun, not a primitive.&lt;&#x2F;p&gt;
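&lt;p&gt;The four concerns above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any specific replay tool implements them: the request record shape, the function names, and the exact set of headers to strip are hypothetical, and rewriting URLs inside request bodies is left out.&lt;&#x2F;p&gt;

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical choices for this sketch: cookies plus two common client-IP headers.
STRIP_HEADERS = {"cookie", "x-forwarded-for", "x-real-ip"}
# Headers that carry absolute URLs and must follow the host rewrite.
REWRITE_HEADERS = {"origin", "referer"}


def retarget(url, target):
    """Swap scheme and host for the target's, keeping path, query, and fragment."""
    src, dst = urlsplit(url), urlsplit(target)
    return urlunsplit((dst.scheme, dst.netloc, src.path, src.query, src.fragment))


def prepare(request, source_host, target, extra_headers=None):
    """Return a copy of one captured request that is safe to send to target."""
    headers = []
    for name, value in request["headers"]:
        lower = name.lower()
        # Drop Host as well; the HTTP client derives it from the rewritten URL.
        if lower in STRIP_HEADERS or lower == "host":
            continue
        if lower in REWRITE_HEADERS and urlsplit(value).netloc == source_host:
            value = retarget(value, target)
        headers.append((name, value))
    # Injected headers (a staging token, a replay marker) go last.
    headers.extend((extra_headers or {}).items())
    return {
        "method": request["method"],
        "url": retarget(request["url"], target),
        "headers": headers,
        "body": request.get("body"),
    }


def prepare_all(requests, source_host, target, extra_headers=None):
    # A plain list comprehension preserves capture order; nothing runs in parallel.
    return [prepare(r, source_host, target, extra_headers) for r in requests]
```

&lt;p&gt;Stripping is done with a deny-list here to keep the sketch short. An allow-list of headers known to be safe would be the more conservative design, at the cost of dropping headers the WAF might key on.&lt;&#x2F;p&gt;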
&lt;h2 id=&quot;the-diff-is-the-point&quot;&gt;The diff is the point&lt;&#x2F;h2&gt;
&lt;p&gt;Replay alone is useful. Replay with behavioral diffing is what changes the WAF tuning workflow.&lt;&#x2F;p&gt;
&lt;p&gt;The pattern works like this. You have a set of captured traffic. You replay it against environment A (say, production with the current WAF rules) and save the results. You replay the same traffic against environment B (staging with the proposed rule change) and save those results. Then you diff.&lt;&#x2F;p&gt;
&lt;p&gt;The diff is not a text diff. It is a behavioral diff. For each request in the capture, you compare:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Status codes&lt;&#x2F;strong&gt;: did the same request get a 200 in one environment and a 403 in the other?&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;WAF decisions&lt;&#x2F;strong&gt;: did the WAF block in one and allow in the other? Which rule ID? What score?&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Security headers&lt;&#x2F;strong&gt;: did CSP, HSTS, or X-Frame-Options change between environments?&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Response characteristics&lt;&#x2F;strong&gt;: same content type? Same cache behavior?&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;pre&gt;&lt;code&gt;Request: GET &amp;#x2F;api&amp;#x2F;v2&amp;#x2F;users?search=&amp;lt;script&amp;gt;alert(1)&amp;lt;&amp;#x2F;script&amp;gt;

Production (current rules):
  Status: 403
  x-waf-action: block
  x-waf-rule: 941100

Staging (proposed rules):
  Status: 200
  x-waf-action: pass
  x-waf-rule: (none)

⚠ WAF regression: XSS payload now passes
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This changes the WAF tuning conversation entirely. Instead of “I think loosening rule 941100 is safe,” you have “I replayed 2,000 captured requests against the proposed rule change. Three requests that were previously blocked are now allowed. Here they are. Two are false positives that should pass. One is an XSS payload that should not.”&lt;&#x2F;p&gt;
&lt;p&gt;That is not a guess. That is evidence.&lt;&#x2F;p&gt;
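&lt;p&gt;The comparison itself is small. A hedged Python sketch follows, assuming each replay run is a list of result records in capture order. The record shape (&lt;code&gt;request&lt;&#x2F;code&gt;, &lt;code&gt;status&lt;&#x2F;code&gt;, &lt;code&gt;headers&lt;&#x2F;code&gt;) and the function names are hypothetical. The &lt;code&gt;x-waf-action&lt;&#x2F;code&gt; and &lt;code&gt;x-waf-rule&lt;&#x2F;code&gt; header names follow the sample output above, and whether a given WAF exposes its decision in response headers at all is an assumption.&lt;&#x2F;p&gt;

```python
# Response headers whose values count as observable behavior in this sketch.
COMPARED_HEADERS = (
    "x-waf-action",
    "x-waf-rule",
    "content-security-policy",
    "strict-transport-security",
    "x-frame-options",
)


def behavior(result):
    """Reduce one replay result to the fields that are compared."""
    headers = {k.lower(): v for k, v in result["headers"].items()}
    fields = {"status": result["status"]}
    for name in COMPARED_HEADERS:
        fields[name] = headers.get(name)  # None when the header is absent
    return fields


def diff_runs(baseline, candidate):
    """Compare two replay runs request by request, in capture order."""
    if len(baseline) != len(candidate):
        raise ValueError("runs do not cover the same capture")
    changes = []
    for index, (old, new) in enumerate(zip(baseline, candidate)):
        before, after = behavior(old), behavior(new)
        changed = {k: (before[k], after[k]) for k in before if before[k] != after[k]}
        if changed:
            changes.append({"index": index, "request": old["request"], "changed": changed})
    return changes
```

&lt;p&gt;An empty list means the two environments behaved identically on this capture. Each entry otherwise names the request and the fields that moved, as (old, new) pairs, which is the material a reviewer needs to decide whether a change is a fixed false positive or a regression.&lt;&#x2F;p&gt;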
&lt;h2 id=&quot;why-this-is-a-security-primitive&quot;&gt;Why this is a security primitive&lt;&#x2F;h2&gt;
&lt;p&gt;I use the word “primitive” deliberately. A primitive is a building block that other things compose on top of. Deterministic replay with behavioral diffing is a primitive because it enables patterns that are otherwise impractical:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Safe WAF rule iteration.&lt;&#x2F;strong&gt; Change a rule, replay traffic, inspect the behavioral diff, deploy with confidence. The feedback loop goes from “deploy and hope” to “verify, then deploy.” This is the most immediate use case, and the one that solves the 3 AM pager problem.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Environment drift detection.&lt;&#x2F;strong&gt; Replay the same traffic against staging and production on a regular schedule. When the behaviors diverge, you know something changed that should not have. This catches configuration drift, certificate mismatches, and routing differences before they become incidents.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Regression testing for edge config.&lt;&#x2F;strong&gt; Every change to your proxy, WAF, or rate limiter configuration gets a replay run before deployment. The diff tells you exactly what changed in observable behavior. This is the edge infrastructure equivalent of a unit test suite, except instead of testing code you are testing policy.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Incident reproduction.&lt;&#x2F;strong&gt; When a user reports a block, capture the traffic, replay it, confirm the block, and then iterate on the fix in staging without affecting production. The time from “user report” to “verified fix” drops from hours to minutes.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Compliance evidence.&lt;&#x2F;strong&gt; When an auditor asks “how do you know your WAF rules are working?”, you can show them replay runs with behavioral diffs that demonstrate which rules triggered on which traffic patterns. Not “we have a WAF configured.” Verifiable behavioral evidence.&lt;&#x2F;p&gt;
&lt;p&gt;Each of these patterns exists in some ad-hoc form at organizations that invest heavily in security tooling. What they lack is a common primitive that makes them systematic.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-exists-today&quot;&gt;What exists today&lt;&#x2F;h2&gt;
&lt;p&gt;There is no shortage of tools that touch parts of this problem. The gap is in how they compose.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Manual security testing platforms.&lt;&#x2F;strong&gt; &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;portswigger.net&#x2F;burp&quot;&gt;Burp Suite&lt;&#x2F;a&gt; is the standard. Its Repeater lets you capture a request and resend it with modifications, which is replay in the most literal sense. &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.zaproxy.org&#x2F;&quot;&gt;OWASP ZAP&lt;&#x2F;a&gt; provides similar capabilities as open source. Both are excellent for manual pen-testing: an engineer investigates a specific request, tweaks parameters, observes the response. What they do not do is automated, batch-level behavioral comparison across environments. You can replay one request in Burp and inspect the result. You cannot replay two thousand requests against staging and production and get a structured diff of every WAF decision that changed. The workflow is manual and investigative, not systematic and CI-integrated.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Traffic capture and replay tools.&lt;&#x2F;strong&gt; &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;goreplay.org&#x2F;&quot;&gt;GoReplay&lt;&#x2F;a&gt; (gor) captures live HTTP traffic and replays it against another environment. It is the closest existing tool to what I am describing, and it is good at what it does: mirroring production traffic to a staging environment for load and correctness testing. &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;mitmproxy.org&#x2F;&quot;&gt;mitmproxy&lt;&#x2F;a&gt; can intercept, record, and replay HTTP flows with full scriptability. &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;tcpreplay.appneta.com&#x2F;&quot;&gt;tcpreplay&lt;&#x2F;a&gt; works below HTTP, replaying captured packets from pcap files at the network level. The limitation across all of these is that they are replay tools, not behavioral analysis tools. They send the traffic. What happens next (comparing WAF decisions, diffing security headers, detecting regressions) is left to you.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Desktop proxies.&lt;&#x2F;strong&gt; &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.charlesproxy.com&#x2F;&quot;&gt;Charles Proxy&lt;&#x2F;a&gt; and &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.telerik.com&#x2F;fiddler&quot;&gt;Fiddler&lt;&#x2F;a&gt; capture and replay HTTP traffic with rich GUIs. They are useful for development debugging but are desktop applications, not CLI tools. They do not integrate into CI&#x2F;CD pipelines or produce machine-readable behavioral diffs.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Load testing tools.&lt;&#x2F;strong&gt; &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;k6.io&#x2F;&quot;&gt;k6&lt;&#x2F;a&gt;, &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;locust.io&#x2F;&quot;&gt;Locust&lt;&#x2F;a&gt;, &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;gatling.io&#x2F;&quot;&gt;Gatling&lt;&#x2F;a&gt;. These can replay captured traffic at volume, but they measure performance, not policy behavior. They answer “can the system handle the load?” not “did the WAF make the same decision?”&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;WAF testing frameworks.&lt;&#x2F;strong&gt; &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;coreruleset&#x2F;ftw&quot;&gt;ftw&lt;&#x2F;a&gt; (Framework for Testing WAFs) is the OWASP project for validating Core Rule Set behavior. It uses synthetic payloads designed to trigger specific rules. &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;projectdiscovery&#x2F;nuclei&quot;&gt;Nuclei&lt;&#x2F;a&gt; is a template-based vulnerability scanner that sends crafted requests and checks responses. Both are valuable for validating that your WAF blocks known-bad patterns. Neither replays real captured traffic, which means neither can tell you whether a rule change affects the legitimate traffic that your actual users send.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;API testing tools.&lt;&#x2F;strong&gt; &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;hurl.dev&#x2F;&quot;&gt;Hurl&lt;&#x2F;a&gt; can chain HTTP requests with assertions, which is close to sequential replay with verification. &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;curl.se&#x2F;&quot;&gt;curl&lt;&#x2F;a&gt; can resend individual requests. Both are useful building blocks but do not provide the capture-replay-diff workflow as a single primitive.&lt;&#x2F;p&gt;
&lt;p&gt;The pattern across all of these is the same. Each tool handles one or two steps well: capture, or replay, or analysis. No single tool captures real traffic, replays it deterministically with URL rewriting and cookie stripping, and then produces a structured behavioral diff across WAF decisions, security headers, and status codes. The pieces exist. The composition does not.&lt;&#x2F;p&gt;
&lt;p&gt;This is the gap that &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;ushio&quot;&gt;Ushio&lt;&#x2F;a&gt; fills.&lt;&#x2F;p&gt;
&lt;p&gt;Ushio is a Rust CLI that does three things. It converts HAR files into a replayable capture format. It replays captures against one or more targets with URL rewriting, header mutation, and cookie stripping. And it diffs two replay results to identify behavioral changes in status codes, WAF decisions, and security headers.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Convert a browser HAR export to ushio format
ushio convert session.har -o capture.json

# Replay against staging with auth header, strip cookies
ushio replay capture.json \
  -t https:&amp;#x2F;&amp;#x2F;staging.example.com \
  --header &amp;quot;Authorization:Bearer $TOKEN&amp;quot; \
  --strip-cookies \
  -o staging.json

# Replay against production
ushio replay capture.json \
  -t https:&amp;#x2F;&amp;#x2F;prod.example.com \
  --strip-cookies \
  -o prod.json

# Diff the results
ushio diff staging.json prod.json --only-diff
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The output is either pretty-printed for human review or JSON for pipeline integration. Exit code 0 means no behavioral differences. Exit code 1 means differences were found. This makes it composable with CI&#x2F;CD: run a replay diff on every edge config change, fail the pipeline if behavior regresses.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-changes-when-you-have-this&quot;&gt;What changes when you have this&lt;&#x2F;h2&gt;
&lt;p&gt;The shift is from reactive to proactive. Without replay, you discover WAF problems when users report them or when the pager goes off. With replay, you discover them before deployment. The feedback loop compresses from incident-driven to change-driven.&lt;&#x2F;p&gt;
&lt;p&gt;But the deeper shift is epistemic. Without replay, WAF tuning is a matter of judgment and experience. You read the rule, you read the logs, you make a call. With replay, it is a matter of evidence. You replay the traffic, you observe the behavior, you make a decision based on what actually happened.&lt;&#x2F;p&gt;
&lt;p&gt;I do not think this replaces judgment. You still need to decide whether a particular request that is now being allowed is acceptable. But the decision is grounded in concrete behavior rather than reconstruction from incomplete logs. The security engineer’s job changes from “guess what the WAF will do” to “observe what the WAF does and decide if that is correct.”&lt;&#x2F;p&gt;
&lt;p&gt;Every edge system that makes decisions about traffic should be testable with real traffic. WAFs, rate limiters, bot detectors, auth gateways. If you cannot replay traffic and diff the behavior, you cannot systematically verify that the system does what you think it does. That is not a tooling problem. That is a visibility problem. And it is solvable.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;references&quot;&gt;References&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;ushio&quot;&gt;Ushio&lt;&#x2F;a&gt; - Deterministic edge traffic replay tool&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;zentinelproxy&#x2F;zentinel&quot;&gt;Zentinel&lt;&#x2F;a&gt; - Security-first reverse proxy with structured WAF decision logging&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;www.softwareishard.com&#x2F;blog&#x2F;har-12-spec&#x2F;&quot;&gt;HAR 1.2 specification&lt;&#x2F;a&gt; - The HTTP Archive format&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;&#x2F;articles&#x2F;what-zentinel-is-really-optimizing-for&#x2F;&quot;&gt;What Zentinel Is Really Optimizing For&lt;&#x2F;a&gt; - The operator-first proxy design philosophy&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;&#x2F;articles&#x2F;notes-from-rsac-2026&#x2F;&quot;&gt;Notes from RSAC 2026&lt;&#x2F;a&gt; - Where the applied security thread connects to industry context&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</description>
      </item>
      <item>
          <title>The Economics of Inference</title>
          <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/the-economics-of-inference/</link>
          <guid>https://raskell.io/articles/the-economics-of-inference/</guid>
          <description xml:base="https://raskell.io/articles/the-economics-of-inference/">&lt;p&gt;In the &lt;a href=&quot;&#x2F;articles&#x2F;what-sixteen-ai-agents-taught-me-about-management&#x2F;&quot;&gt;previous article&lt;&#x2F;a&gt;, I described running sixteen AI agents as a virtual company over Christmas 2025. The architecture worked. The coordination model worked. What did not work was the economics. My token budget evaporated in days, not because the agents were unproductive, but because every act of coordination, every status update, every escalation consumed a metered resource that I could not replenish fast enough.&lt;&#x2F;p&gt;
&lt;p&gt;At the time, I framed this as a budget problem. My consumer subscription could not sustain the overhead of agent-to-agent communication. An enterprise with API access and a real budget would not have the same constraint. That framing was correct, but it was also too small.&lt;&#x2F;p&gt;
&lt;p&gt;What I was actually experiencing, at the scale of one person and sixteen terminal panes, was something much larger. I was experiencing inference as a utility. A consumed physical resource, the same way I consume electricity when I turn on a light and water when I open a tap.&lt;&#x2F;p&gt;
&lt;p&gt;That framing changes everything about how you think about the economics of AI.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-makes-something-a-utility&quot;&gt;What makes something a utility&lt;&#x2F;h2&gt;
&lt;p&gt;Utilities share a set of properties that distinguish them from ordinary goods and services. They are not optional. Modern life depends on them. They are consumed continuously, not purchased once. They require massive physical infrastructure to produce and deliver. They are metered. They are priced per unit of consumption. Their supply chains are subject to geography, geopolitics, and regulation. And their cost structure is dominated by the capital expenditure of building and maintaining the infrastructure, not by the marginal cost of the next unit delivered.&lt;&#x2F;p&gt;
&lt;p&gt;Water. Electricity. Natural gas. Telecommunications. These are the four traditional utilities. Every modern economy is built on top of them. Their availability, reliability, and cost determine what is possible in a given jurisdiction. You do not build a semiconductor fab where the power grid is unreliable. You do not run a data center where the water supply for cooling is uncertain. The utility layer is the foundation, and everything above it is constrained by what the foundation can support.&lt;&#x2F;p&gt;
&lt;p&gt;Inference is acquiring every one of these properties.&lt;&#x2F;p&gt;
&lt;p&gt;It is consumed continuously. Every agent interaction, every model call, every token generated is a unit of consumption. When my sixteen agents were coordinating, they were consuming inference the way a factory floor consumes electricity: steadily, measurably, and in direct proportion to the work being performed.&lt;&#x2F;p&gt;
&lt;p&gt;It requires massive physical infrastructure. The data centers running frontier models are among the most capital-intensive facilities being built anywhere in the world right now. They require advanced silicon, enormous quantities of power, water for cooling, and physical security. They are not software projects. They are industrial projects.&lt;&#x2F;p&gt;
&lt;p&gt;It is metered and priced per unit. Every major inference provider charges per token. Input tokens, output tokens, sometimes with different rates for different capability tiers. The billing model is already a utility billing model. You pay for what you consume.&lt;&#x2F;p&gt;
&lt;p&gt;Its supply chain is subject to geography and geopolitics.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;chips-data-talent-energy&quot;&gt;Chips, data, talent, energy&lt;&#x2F;h2&gt;
&lt;p&gt;At RSAC 2026, I attended a panel with four former NSA directors and US Cyber Command commanders. Paul Nakasone laid out what he considers the four factors that determine a nation state’s strategic potentiality in this era. Not GDP. Not military strength. Four things: chips, data, talent, and energy.&lt;&#x2F;p&gt;
&lt;p&gt;I &lt;a href=&quot;&#x2F;articles&#x2F;notes-from-rsac-2026&#x2F;&quot;&gt;wrote about this&lt;&#x2F;a&gt; at the time. But I have been thinking about those four factors through a different lens since the shiioo experience, the sixteen-agent experiment that ran out of tokens. Nakasone was talking about national power. I am talking about the supply chain of a utility.&lt;&#x2F;p&gt;
&lt;p&gt;Every utility has a supply chain that determines who can produce it, at what cost, and with what dependencies. For electricity, the supply chain is fuel (coal, gas, uranium, sunlight, wind), generation infrastructure (power plants, turbines, panels), transmission (the grid), and distribution (the last mile to your outlet). For water, it is source (rivers, aquifers, desalination), treatment, transmission (pipes, pumps), and distribution.&lt;&#x2F;p&gt;
&lt;p&gt;For inference, the supply chain is Nakasone’s four factors.&lt;&#x2F;p&gt;
&lt;p&gt;Chips are the generation capacity. Advanced GPUs, specifically the frontier silicon manufactured overwhelmingly by TSMC in Taiwan and designed primarily by NVIDIA in the United States, are the turbines of the inference economy. Without them, you do not generate inference at competitive cost. The global concentration of this manufacturing capacity in a single company’s fabs on a geopolitically contested island is the equivalent of the entire world’s electricity depending on one power plant. It is a single point of failure that makes infrastructure planners lose sleep.&lt;&#x2F;p&gt;
&lt;p&gt;Energy is the fuel. Frontier inference is measured in gigawatts now, not megawatts. A single large-scale inference cluster draws more power than a small city. The &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;ai-2027.com&#x2F;&quot;&gt;AI 2027&lt;&#x2F;a&gt; scenario projects global AI datacenter power consumption reaching 38 GW by 2026, and that number is rising on a curve that shows no sign of flattening. You cannot run inference without power, and you cannot build the power infrastructure overnight. The jurisdictions that have cheap, abundant, reliable energy have a structural advantage that no amount of software cleverness can compensate for.&lt;&#x2F;p&gt;
&lt;p&gt;Data is the raw material that the models learned from, the substrate that makes inference meaningful rather than random. Who has it, under what legal constraints it can be used, and how diverse and representative it is all determine the quality and applicability of the inference you can produce.&lt;&#x2F;p&gt;
&lt;p&gt;Talent is the operational workforce. Not just the researchers who design the models, but the engineers who build the infrastructure, the operators who keep it running, and the security professionals who defend it. This is the human capital layer that every utility depends on, and it is concentrated in a handful of geographic clusters for the same reasons that petrochemical engineering talent concentrates near refineries.&lt;&#x2F;p&gt;
&lt;p&gt;When you map it out, inference has the same supply chain structure as any mature utility. And the strategic implications follow directly.&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Supply chain layer&lt;&#x2F;th&gt;&lt;th&gt;Electricity&lt;&#x2F;th&gt;&lt;th&gt;Water&lt;&#x2F;th&gt;&lt;th&gt;Inference&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Raw input&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;td&gt;Fuel (gas, uranium, sun)&lt;&#x2F;td&gt;&lt;td&gt;Source (river, aquifer)&lt;&#x2F;td&gt;&lt;td&gt;Data (training corpora)&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Generation&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;td&gt;Power plants, turbines&lt;&#x2F;td&gt;&lt;td&gt;Treatment plants&lt;&#x2F;td&gt;&lt;td&gt;GPUs, data centers&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Transmission&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;td&gt;The grid&lt;&#x2F;td&gt;&lt;td&gt;Pipes, pumps&lt;&#x2F;td&gt;&lt;td&gt;Networks, API endpoints&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Fuel for generation&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;td&gt;Primary energy&lt;&#x2F;td&gt;&lt;td&gt;Electricity for pumping&lt;&#x2F;td&gt;&lt;td&gt;Electricity (38+ GW)&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Key bottleneck&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;td&gt;Grid capacity, permits&lt;&#x2F;td&gt;&lt;td&gt;Water rights, drought&lt;&#x2F;td&gt;&lt;td&gt;Chip fabrication (TSMC)&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Geopolitical risk&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;td&gt;Pipeline politics, OPEC&lt;&#x2F;td&gt;&lt;td&gt;Cross-border rivers&lt;&#x2F;td&gt;&lt;td&gt;Taiwan, export controls&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;The analogy is not a metaphor. It is a structural description.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-ai-2027-got-right-about-the-economics&quot;&gt;What AI 2027 got right about the economics&lt;&#x2F;h2&gt;
&lt;p&gt;The &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;ai-2027.com&#x2F;&quot;&gt;AI 2027&lt;&#x2F;a&gt; scenario, written by Daniel Kokotajlo, Scott Alexander, Thomas Larsen, Eli Lifland, and Romeo Dean, is a speculative timeline. The authors are careful to frame it that way. But the economic projections in it have a quality that I keep returning to: they treat AI compute as an industrial resource, not a software product.&lt;&#x2F;p&gt;
&lt;p&gt;Their scenario tracks global AI datacenter spending reaching the trillion-dollar range. It maps the distribution of frontier compute capacity across nations, with the United States holding roughly 70% through its companies and China at around 12%. It projects power consumption figures that match what utility planners, not software engineers, would recognize as relevant.&lt;&#x2F;p&gt;
&lt;p&gt;What makes this framing useful is not the specific numbers. It is the category. When you project AI spending in trillions and power consumption in tens of gigawatts, you are not describing a software industry. You are describing a utility buildout. The capital expenditure patterns, the infrastructure timelines, the regulatory questions, the geopolitical competition, all of it maps to how nations have historically competed over energy infrastructure, telecommunications infrastructure, and industrial capacity.&lt;&#x2F;p&gt;
&lt;p&gt;The AI 2027 authors project that the feedback loop of AI systems accelerating AI research compresses the timeline for capability growth. Whether their specific dates hold is less important than the structural observation: if capability is growing on a steep curve and that capability requires physical infrastructure that grows on a much slower curve, then the binding constraint on AI is not software. It is infrastructure. It is the utility layer.&lt;&#x2F;p&gt;
&lt;p&gt;This is what I experienced at the personal scale with shiioo. The software worked. The orchestration model worked. What ran out was the physical resource: tokens, which is to say inference, which is to say compute, which is to say silicon and electricity. The bottleneck was not the architecture. The bottleneck was the meter.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-this-means-for-post-ai-systems&quot;&gt;What this means for post-AI systems&lt;&#x2F;h2&gt;
&lt;p&gt;If inference is a utility, then every system that depends on inference is a utility consumer. And that changes how you design those systems.&lt;&#x2F;p&gt;
&lt;p&gt;When I built shiioo, I treated token consumption as a budget to manage. That was the individual developer framing: I have a fixed allocation, I need to spend it wisely. But if you zoom out to the enterprise or the nation state, the framing shifts. You are not managing a budget. You are managing a utility dependency.&lt;&#x2F;p&gt;
&lt;p&gt;The questions become infrastructure questions. How much inference capacity do you need? Where does it come from? What happens when your provider has an outage? What are your contractual guarantees for availability and throughput? What is your fallback if the primary supply is interrupted? Do you own any generation capacity, or are you entirely dependent on external providers?&lt;&#x2F;p&gt;
&lt;p&gt;These are exactly the questions that enterprises ask about electricity, about water, about telecommunications. And they are starting to ask them about inference, even if most of them do not yet use that framing.&lt;&#x2F;p&gt;
&lt;p&gt;The organizations that will navigate this transition well are the ones that recognize what inference actually is: a consumed resource that correlates with power draw, requires physical infrastructure, and generates value in the form of information contextualization and evaluation. Every time a post-AI system communicates with another system or with a human, it is consuming inference. Every agent-to-agent message, every model evaluation, every generated response is a unit drawn from a metered supply that has real physical costs behind it.&lt;&#x2F;p&gt;
&lt;p&gt;The implications branch in several directions.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Pricing will converge toward utility models.&lt;&#x2F;strong&gt; Per-token pricing is already the norm, but the industry will move toward the more sophisticated pricing structures that utilities use: tiered rates, time-of-use pricing, capacity reservations, spot markets. Some of this is already emerging. It will accelerate as enterprises start treating inference spend the way they treat energy spend: as a major operational cost that requires dedicated procurement and optimization.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Sovereignty will matter.&lt;&#x2F;strong&gt; If inference is a utility, then depending on a foreign provider for your inference supply is the same kind of strategic vulnerability as depending on a foreign country for your energy supply. Europe learned this lesson with Russian natural gas. The question of whether you can run inference workloads within your own jurisdiction, on your own infrastructure, under your own legal framework, is not an abstract concern about data residency. It is a question of infrastructure sovereignty. This is why I built &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;archipelag.io&#x2F;&quot;&gt;Archipelag&lt;&#x2F;a&gt;, and it is why I think the compute sovereignty conversation in Europe needs to move from policy papers to physical infrastructure.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Efficiency will become an engineering discipline.&lt;&#x2F;strong&gt; When electricity was cheap and abundant, nobody optimized for energy efficiency. When it became expensive, an entire engineering discipline emerged around it. The same will happen with inference. Right now, most systems that use inference do so profligately: full context windows, verbose prompts, redundant calls. As inference cost becomes a meaningful line item, optimizing for token efficiency will become as normal as optimizing for energy efficiency. The structured communication protocols I described in the shiioo article, typed messages instead of free-form conversation, are an early example of this. Every token should carry information, not politeness.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Metering and observability will be essential.&lt;&#x2F;strong&gt; You cannot manage a utility you cannot measure. Enterprises will need inference observability the same way they need power monitoring and network monitoring: real-time visibility into consumption, cost attribution to specific workloads and teams, anomaly detection for unexpected usage spikes, and capacity planning based on historical patterns. The tooling for this barely exists today. It will be a significant market within a few years.&lt;&#x2F;p&gt;
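&lt;p&gt;To make the metering idea concrete, here is a minimal sketch of per-team cost attribution and spike detection. Everything in it is an assumption for illustration: the &lt;code&gt;InferenceMeter&lt;&#x2F;code&gt; class, the team names, the doubling threshold, and above all the prices, which are placeholders and not any provider’s actual rates.&lt;&#x2F;p&gt;

```python
from collections import defaultdict
from dataclasses import dataclass, field

# ASSUMPTION: illustrative prices in dollars per million tokens.
# These are placeholders, not any real provider's rates.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}


def _empty_usage():
    return defaultdict(lambda: {"input": 0, "output": 0})


@dataclass
class InferenceMeter:
    """Hypothetical meter: token counts per team, split by direction."""
    usage: dict = field(default_factory=_empty_usage)

    def record(self, team, input_tokens, output_tokens):
        # Accumulate consumption, like a meter reading per tenant.
        self.usage[team]["input"] += input_tokens
        self.usage[team]["output"] += output_tokens

    def cost(self, team):
        # Cost attribution: tokens times unit price, per direction.
        u = self.usage[team]
        return sum(u[kind] * PRICE_PER_MTOK[kind] / 1_000_000 for kind in u)

    def spikes(self, baseline, factor=2.0):
        # Flag teams whose total tokens exceed `factor` times their baseline.
        # Teams without a baseline are never flagged.
        return sorted(
            team for team, u in self.usage.items()
            if u["input"] + u["output"] > factor * baseline.get(team, float("inf"))
        )


meter = InferenceMeter()
meter.record("agents", input_tokens=2_000_000, output_tokens=500_000)
meter.record("search", input_tokens=100_000, output_tokens=20_000)
print(meter.cost("agents"))                                      # 13.5
print(meter.spikes({"agents": 1_000_000, "search": 500_000}))   # ['agents']
```

&lt;p&gt;A real deployment would read these counts from provider usage reports rather than record them by hand, but the shape is the same as a power bill: a reading, a rate, and an alert when the reading jumps.&lt;&#x2F;p&gt;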
&lt;h2 id=&quot;the-meter-is-the-message&quot;&gt;The meter is the message&lt;&#x2F;h2&gt;
&lt;p&gt;When I ran out of tokens over Christmas, my first reaction was frustration. My second reaction was to think about the architecture differently, to design for token efficiency, to route cheap communication through cheaper models. But my third reaction, the one that has stayed with me longest, was recognition.&lt;&#x2F;p&gt;
&lt;p&gt;I recognized the shape of the problem. It was not a new shape. It was the shape of every utility constraint I have ever encountered. The shape of “the infrastructure is the bottleneck.” The shape of “the resource is finite and metered and you need to think about your consumption.” The shape of “the supply chain is geopolitical.”&lt;&#x2F;p&gt;
&lt;p&gt;Nakasone’s four factors, chips, data, talent, energy, are not just a framework for assessing national power. They are the bill of materials for producing inference. And the nations, enterprises, and individuals who control that bill of materials will have the same structural advantage that energy-rich nations had in the industrial age and bandwidth-rich nations had in the information age.&lt;&#x2F;p&gt;
&lt;p&gt;The fifth utility is here. We are just early enough that most people still think they are buying software.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;references&quot;&gt;References&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;&#x2F;articles&#x2F;what-sixteen-ai-agents-taught-me-about-management&#x2F;&quot;&gt;What Sixteen AI Agents Taught Me About Management&lt;&#x2F;a&gt; - The predecessor to this article&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;&#x2F;articles&#x2F;notes-from-rsac-2026&#x2F;&quot;&gt;Notes from RSAC 2026&lt;&#x2F;a&gt; - Paul Nakasone’s four factors and the geopolitical dimension&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;ai-2027.com&#x2F;&quot;&gt;AI 2027&lt;&#x2F;a&gt; - Scenario work by Kokotajlo, Alexander, Larsen, Lifland, and Dean&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;archipelag.io&#x2F;&quot;&gt;Archipelag&lt;&#x2F;a&gt; - Decentralized, sovereignty-first AI compute network&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;&#x2F;articles&#x2F;how-i-work-these-days&#x2F;&quot;&gt;How I Work These Days&lt;&#x2F;a&gt; - Where I first wrote about the shift to agent-driven development&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</description>
      </item>
      <item>
          <title>What Comes After the Last Programming Language</title>
          <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/what-comes-after-the-last-programming-language/</link>
          <guid>https://raskell.io/articles/what-comes-after-the-last-programming-language/</guid>
          <description xml:base="https://raskell.io/articles/what-comes-after-the-last-programming-language/">&lt;p&gt;In &lt;a href=&quot;&#x2F;articles&#x2F;what-programming-languages-become-when-ai-writes-the-code&#x2F;&quot;&gt;The Last Programming Language Might Not Be for Humans&lt;&#x2F;a&gt;, I described three futures for programming languages as AI becomes the primary author of code. Explicit languages for machines. Declarative languages where types are proofs. And ultimately, no language at all, where AI generates machine code directly and the intermediate layer disappears.&lt;&#x2F;p&gt;
&lt;p&gt;I left something out. There is a fourth possibility, and it goes deeper than the language.&lt;&#x2F;p&gt;
&lt;p&gt;What happens to the operating system?&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-operating-system-was-designed-for-typists&quot;&gt;The operating system was designed for typists&lt;&#x2F;h2&gt;
&lt;p&gt;Every operating system in production today, Linux, Windows, macOS, the BSDs, was built on the same foundational assumption: a human writes a program, the program is compiled into a sequence of CPU instructions, and the OS manages the execution of those instructions. Processes. Threads. System calls. Virtual memory. File descriptors. Schedulers. These abstractions exist because the fundamental unit of work is a sequence of CPU operations authored by a human programmer.&lt;&#x2F;p&gt;
&lt;p&gt;This is not an exaggeration. Look at the POSIX specification. &lt;code&gt;fork()&lt;&#x2F;code&gt; creates a copy of the calling process. &lt;code&gt;exec()&lt;&#x2F;code&gt; replaces the current process image with a new program. &lt;code&gt;read()&lt;&#x2F;code&gt; and &lt;code&gt;write()&lt;&#x2F;code&gt; move bytes between a process and a file descriptor. &lt;code&gt;mmap()&lt;&#x2F;code&gt; maps a file into the process’s virtual address space. Every one of these primitives assumes a CPU-centric, sequential execution model where a process is a container for a stream of instructions that the CPU executes one at a time (or a few at a time, with threads).&lt;&#x2F;p&gt;
&lt;p&gt;This made sense for nearly sixty years. Dennis Ritchie and Ken Thompson designed Unix in 1969 around the PDP-7, a machine with a single CPU and 18-bit words. The abstractions they chose, processes, pipes, files as byte streams, were elegant reflections of what the hardware could do. Those abstractions survived because they generalized well. When CPUs got faster, processes got faster. When CPUs got more cores, threads mapped naturally onto them. When networks arrived, sockets extended the file descriptor model. The operating system grew, but the foundational unit of work never changed: a sequence of CPU instructions, managed by a scheduler, isolated by virtual memory.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;how-the-gpu-became-an-accidental-general-purpose-computer&quot;&gt;How the GPU became an accidental general-purpose computer&lt;&#x2F;h2&gt;
&lt;p&gt;The GPU was never meant to be here. Its entire history is a sequence of coincidences.&lt;&#x2F;p&gt;
&lt;p&gt;In the early 1990s, graphics hardware existed to draw triangles. Silicon Graphics made specialized hardware for 3D rendering, and the rest of the industry followed. NVIDIA’s GeForce 256, released in 1999, was marketed as the world’s first “GPU,” a term NVIDIA popularized. Its job was to take vertices and textures from a CPU program, transform them, and rasterize them to a framebuffer. It was a peripheral. A display adapter with math capabilities.&lt;&#x2F;p&gt;
&lt;p&gt;Then game developers started abusing the hardware. Shader programs, originally designed for lighting and surface effects, turned out to be tiny parallel programs that could do arbitrary computation. By the mid-2000s, researchers at Stanford and elsewhere realized you could encode general-purpose math problems as texture operations: matrix multiplications as pixel shaders, fluid simulations as render passes. It was a hack. The GPU did not know it was doing science. It thought it was rendering a really weird image.&lt;&#x2F;p&gt;
&lt;p&gt;NVIDIA saw the opportunity and released CUDA in 2007, giving the GPU a proper programming model for general-purpose computation. But the architecture of the system did not change. CUDA was a userspace library. The GPU driver was a vendor-supplied module that the kernel loaded but did not look inside. The operating system still treated the GPU as a display device. The OS scheduled CPU processes and managed CPU memory. The GPU scheduled its own work and managed its own memory, through CUDA, through the driver, outside the OS’s view.&lt;&#x2F;p&gt;
&lt;p&gt;Then came cryptocurrency mining. Bitcoin miners discovered that GPUs were vastly more efficient than CPUs for SHA-256 hashing, because the algorithm is embarrassingly parallel and the GPU has thousands of cores. This was the first mass-market workload where the GPU did the economically valuable work and the CPU was just overhead. Mining rigs were machines where the CPU was an afterthought, a cheap Celeron whose only job was to feed work to a rack of GPUs. But the operating system running on that Celeron was still Linux, still managing the GPU through CUDA or OpenCL, still treating it as a peripheral.&lt;&#x2F;p&gt;
&lt;p&gt;Then came machine learning. First training (AlexNet in 2012, the moment deep learning became real), then inference. Each transition was coincidental. Nobody designed GPUs to be good at neural network training. They just happened to have the right characteristics: massive parallelism, high memory bandwidth, and a programming model that could express matrix multiplications. The workloads found the hardware. The hardware did not seek the workloads.&lt;&#x2F;p&gt;
&lt;p&gt;And so we arrived at 2026 with the most important compute workload in a generation, AI inference, running on hardware that was originally designed to render Quake, managed by drivers that bypass the operating system, scheduled by a proprietary runtime that the kernel cannot see or control. The GPU is the most important processor in the machine, and the OS does not know what it is doing.&lt;&#x2F;p&gt;
&lt;p&gt;Nearly twenty years after CUDA, we are still using that model. The workloads have changed beyond recognition. The software stack has not.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;how-cpus-and-gpus-actually-differ&quot;&gt;How CPUs and GPUs actually differ&lt;&#x2F;h2&gt;
&lt;p&gt;To understand why this matters, it helps to understand how CPUs and GPUs compute differently. Not at the marketing level. At the architectural level.&lt;&#x2F;p&gt;
&lt;p&gt;A CPU is designed for latency. It has a small number of powerful cores (8 to 128 on a modern server chip), each with deep pipelines, branch predictors, out-of-order execution engines, and large caches. Each core is optimized to execute a single thread of instructions as fast as possible. When your program says “if this, then that,” the CPU predicts which branch you will take and starts executing it before it knows the answer. When your program accesses memory, the CPU has three levels of cache to hide the latency of going to DRAM. The entire design optimizes for one thing: getting through a single sequence of instructions with minimal delay.&lt;&#x2F;p&gt;
&lt;p&gt;A GPU is designed for throughput. It has thousands of small cores (16,896 CUDA cores on an H100 SXM), each simple, each capable of executing one instruction per clock, grouped into blocks that execute the same instruction on different data simultaneously. There is no branch predictor because all threads in a warp (a group of 32) execute the same instruction at the same time. If your program has a branch, both paths execute and the unwanted results are discarded. There is very little cache per core because the design assumes you are streaming through large data sets, not randomly accessing small ones. The entire design optimizes for one thing: doing the same operation on as many data points as possible simultaneously.&lt;&#x2F;p&gt;
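&lt;p&gt;The branch behaviour is easier to see in code than in prose. What follows is a toy simulation in plain Python, not GPU code: a deliberately simplified model in which a warp of 32 lanes runs each side of a branch as a full pass and then keeps one result per lane. The function name &lt;code&gt;run_warp&lt;&#x2F;code&gt; and the example branch are my own illustration, and real hardware handles divergence with more nuance than this.&lt;&#x2F;p&gt;

```python
WARP_SIZE = 32  # threads per warp, as described above


def run_warp(xs):
    """Simulate one warp executing: y = x * 2 if x is even else x + 1.

    Simplified model: each side of the branch is a full lockstep pass
    over all lanes, and each lane then keeps only the result it needs.
    A side is skipped only when no lane at all takes it.
    """
    assert len(xs) == WARP_SIZE
    mask = [x % 2 == 0 for x in xs]   # every lane evaluates the condition
    passes = 0
    then_vals = [None] * WARP_SIZE
    else_vals = [None] * WARP_SIZE
    if any(mask):                      # at least one lane takes the 'then' side
        then_vals = [x * 2 for x in xs]
        passes += 1
    if not all(mask):                  # at least one lane takes the 'else' side
        else_vals = [x + 1 for x in xs]
        passes += 1
    # Per-lane select: the value from the other side is discarded.
    results = [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]
    return results, passes


mixed, mixed_passes = run_warp(list(range(WARP_SIZE)))
uniform, uniform_passes = run_warp([2] * WARP_SIZE)
print(mixed[:4], mixed_passes)      # [0, 2, 4, 4] 2
print(uniform[:4], uniform_passes)  # [4, 4, 4, 4] 1
```

&lt;p&gt;A warp whose lanes disagree pays for both sides of the branch. A warp whose lanes agree pays for one. That is why branch-free workloads suit this hardware so well.&lt;&#x2F;p&gt;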
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;CPU vs GPU architecture&quot;&gt;CPU (latency-optimized)                 GPU (throughput-optimized)
========================                ============================

+---------+  +---------+               +--+--+--+--+--+--+--+--+--+
| Core 0  |  | Core 1  |  ...  8-128   |  |  |  |  |  |  |  |  |  |
| complex |  | complex |  cores        |  |  |  |  |  |  |  |  |  |
| OoO exec|  | OoO exec|               |  |  |  |  |  |  |  |  |  |
| branch  |  | branch  |               |  SM  |  |  SM  |  |  SM  |
| predict |  | predict |               |  |  |  |  |  |  |  |  |  |
+---------+  +---------+               |  |  |  |  |  |  |  |  |  |
     |            |                     +--+--+--+--+--+--+--+--+--+
+----+----+  +----+----+                      thousands of cores
| L1  32K |  | L1  32K |                      simple, in-order
+---------+  +---------+                      same instruction,
+----+------------+----+                      different data
|     L2  ~1 MB        |
+-----------------------+               +---------------------------+
|     L3  ~32 MB        |               |     HBM  ~80 GB          |
+-----------------------+               |     ~3 TB&amp;#x2F;s bandwidth    |
|     DRAM  ~512 GB     |               +---------------------------+
|     ~100 GB&amp;#x2F;s         |
+-----------------------+               One instruction, 16896 data
                                        points at once
One instruction at a time,
very fast&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This difference is why matrix multiplication, the core operation in neural network inference, can run up to three orders of magnitude faster on a GPU. A matrix multiply is thousands of multiply-and-add operations on independent data points. The CPU executes them one at a time (or a few at a time with SIMD), fast but serial. The GPU executes thousands simultaneously, each on its own core. The CPU finishes one row while the GPU finishes the whole matrix.&lt;&#x2F;p&gt;
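&lt;p&gt;The word that matters in that paragraph is independent. A small sketch in plain Python shows it: every output cell of a matrix product reads one row and one column and nothing else, so the cells can be computed in any order, or all at once, without changing the result. The helper names &lt;code&gt;cell&lt;&#x2F;code&gt; and &lt;code&gt;matmul_any_order&lt;&#x2F;code&gt; are mine, and the shuffle stands in for a parallel machine, which this code is not.&lt;&#x2F;p&gt;

```python
import random


def cell(a, b, i, j):
    """One output cell: a dot product of row i of a with column j of b."""
    return sum(a[i][p] * b[p][j] for p in range(len(b)))


def matmul_any_order(a, b, seed=0):
    """Fill the output matrix cell by cell, in a shuffled order.

    No cell depends on any other cell, so the order is irrelevant.
    That independence is what lets thousands of cores work at once.
    """
    n, m = len(a), len(b[0])
    coords = [(i, j) for i in range(n) for j in range(m)]
    random.Random(seed).shuffle(coords)
    out = [[0] * m for _ in range(n)]
    for i, j in coords:
        out[i][j] = cell(a, b, i, j)
    return out


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul_any_order(a, b))          # [[19, 22], [43, 50]]
print(matmul_any_order(a, b, seed=7))  # [[19, 22], [43, 50]]
```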
&lt;p&gt;For traditional software, CPU architecture is perfect. A web server parses HTTP requests (branchy, sequential), queries a database (latency-sensitive, cache-friendly), and formats a response (string operations, unpredictable access patterns). Each request is different. Each code path branches differently. The CPU’s branch predictor, out-of-order execution, and deep cache hierarchy are exactly right.&lt;&#x2F;p&gt;
&lt;p&gt;For inference, GPU architecture is perfect. Each layer of a transformer is a dense matrix multiplication followed by an element-wise nonlinearity. Every token in the batch gets the same operations applied to the same weights. There are almost no branches. The data is enormous and streaming. The GPU’s thousands of simple cores and high-bandwidth memory are exactly right.&lt;&#x2F;p&gt;
&lt;p&gt;The problem is that we run both workloads on an operating system that only understands the first kind.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-gpu-is-becoming-the-computer&quot;&gt;The GPU is becoming the computer&lt;&#x2F;h2&gt;
&lt;p&gt;When you run inference on a large language model, the GPU is not assisting the CPU. The GPU is doing the work. The CPU’s role is orchestration: loading weights, managing memory, feeding tokens, collecting output. The actual computation, the matrix multiplications that turn a prompt into a response, happens on the GPU. For a 70-billion-parameter model, the GPU does on the order of a hundred billion floating-point operations per token, roughly two for every parameter. The CPU does bookkeeping.&lt;&#x2F;p&gt;
&lt;p&gt;The numbers make this concrete. An NVIDIA H100 delivers up to 1,979 teraflops of dense FP8 compute. The CPU it is paired with, typically an AMD EPYC or Intel Xeon, delivers maybe 2-3 teraflops of FP32. The GPU has nearly three orders of magnitude more compute throughput for the operations that matter in inference. When a request arrives at an inference endpoint, the CPU spends microseconds parsing HTTP and tokenizing text. The GPU spends milliseconds doing the actual thinking. The ratio of useful work is not close.&lt;&#x2F;p&gt;
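&lt;p&gt;The arithmetic behind those figures is short enough to check by hand. The sketch below uses the peak numbers quoted above and one assumption of my own: that a forward pass costs about two floating-point operations per parameter per token, one multiply and one add. It compares peak throughput only, across different precisions, so treat the result as an upper bound on the gap rather than a benchmark.&lt;&#x2F;p&gt;

```python
import math

H100_FP8_TFLOPS = 1979.0        # peak figure quoted above
CPU_FP32_TFLOPS = (2.0, 3.0)    # range quoted above
PARAMS = 70e9                   # 70-billion-parameter model
FLOPS_PER_PARAM = 2             # ASSUMPTION: one multiply + one add per weight

# Peak throughput ratio at both ends of the CPU range.
ratios = [H100_FP8_TFLOPS / cpu for cpu in CPU_FP32_TFLOPS]
orders = [math.log10(r) for r in ratios]

# Rough compute cost of producing a single token.
flops_per_token = PARAMS * FLOPS_PER_PARAM

print([round(r, 1) for r in ratios])   # [989.5, 659.7]
print([round(o, 2) for o in orders])   # [3.0, 2.82]
print(f"{flops_per_token:.1e}")        # 1.4e+11
```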
&lt;p&gt;This inversion has happened gradually enough that we have not fully reckoned with it. We still run inference workloads on Linux. We still manage GPU memory through CUDA driver calls from userspace processes. We still treat the GPU as a device the OS mediates access to, the same way it mediates access to a disk or a network card.&lt;&#x2F;p&gt;
&lt;p&gt;But a disk does not run your business logic. A network card does not make decisions. The GPU increasingly does both. When an AI agent decides whether to approve a transaction, route a request, or generate a response, the decision happens on the GPU. The CPU is the secretary. The GPU is the executive.&lt;&#x2F;p&gt;
&lt;p&gt;If the executive is making the decisions, why is the secretary’s office designed for the secretary?&lt;&#x2F;p&gt;
&lt;p&gt;This is not merely an aesthetic complaint. The mismatch has practical consequences. GPU memory management is manual and error-prone. Context switches between GPU workloads are expensive because the OS has no concept of GPU process state. Scheduling is done by the CUDA driver, not by the kernel, which means the OS cannot enforce fairness or priority between GPU workloads the way it enforces them between CPU processes. And isolation, the most critical property for multi-tenant inference, depends entirely on userspace software that the OS does not control or verify.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-an-inference-native-os-would-look-like&quot;&gt;What an inference-native OS would look like&lt;&#x2F;h2&gt;
&lt;p&gt;Imagine an operating system where inference is the fundamental compute primitive. Not a syscall you invoke. Not a library you link. The basic unit of work.&lt;&#x2F;p&gt;
&lt;p&gt;In a traditional OS, the primitive is a process: an isolated address space running a sequence of CPU instructions with access to file descriptors, network sockets, and memory. The scheduler gives each process time on the CPU. The kernel mediates access to shared resources.&lt;&#x2F;p&gt;
&lt;p&gt;In an inference-native OS, the primitive would be a shard: an isolated execution context with dedicated GPU resources, its own VRAM partition, its own compute units, its own inference pipeline. The scheduler does not give shards time on the CPU. It gives them capacity on the GPU. The kernel does not mediate file descriptors. It mediates model weights, token streams, and attention contexts.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;Traditional OS vs inference-native OS&quot;&gt;Traditional OS                          Inference-native OS
=================                       ====================

Process A  Process B  Process C         Shard A    Shard B    Shard C
   |          |          |                 |          |          |
   v          v          v                 v          v          v
+-------------------------------+    +-------------------------------+
|     CPU scheduler             |    |     GPU scheduler             |
|     (time-slicing)            |    |     (capacity-slicing)        |
+-------------------------------+    +-------------------------------+
   |          |          |                 |          |          |
   v          v          v                 v          v          v
+------+  +------+  +------+        +--------+  +--------+  +--------+
| Core |  | Core |  | Core |        | CU     |  | CU     |  | CU     |
| 0    |  | 1    |  | 2    |        | slice  |  | slice  |  | slice  |
+------+  +------+  +------+        +--------+  +--------+  +--------+
                                          |          |          |
   GPU is a peripheral                    v          v          v
   called via ioctl&amp;#x2F;CUDA              +-------------------------------+
                                      |     VRAM partitions           |
                                      |     (isolated per shard)      |
                                      +-------------------------------+&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The analogy that keeps coming back to me is mainframes and timesharing. In the early 1960s, computers were batch-processing machines. You submitted a job, waited, got results. Then systems such as CTSS and Multics, and later Unix, brought timesharing: multiple users, each with the illusion of having the whole machine, isolated from each other by the OS. The transition was not just a performance improvement. It was a conceptual shift in what the machine was for. It went from “a machine that runs one job at a time” to “a machine that runs many jobs concurrently, safely isolated.”&lt;&#x2F;p&gt;
&lt;p&gt;We need the same transition for GPUs. Right now, GPU computing is in its batch-processing era. One workload gets the GPU (or a partition of it, managed by CUDA MPS or MIG), and isolation is an afterthought bolted on by the driver. The inference-native equivalent of timesharing would be an OS that treats GPU capacity the way Unix treats CPU time: a shared resource, securely partitioned, with each tenant unable to see or affect the others.&lt;&#x2F;p&gt;
&lt;p&gt;The isolation model is the critical piece. In a traditional OS, processes are isolated by virtual memory on the CPU side. Two processes cannot read each other’s RAM because the page tables prevent it. The MMU (Memory Management Unit) enforces this in hardware. It is not a software convention. It is a physical guarantee.&lt;&#x2F;p&gt;
&lt;p&gt;But GPU memory is a different story. In most systems, GPU memory isolation depends on the vendor’s driver stack: a userspace CUDA library on top of a kernel module, which the rest of the OS trusts but cannot verify. NVIDIA’s MIG (Multi-Instance GPU) provides hardware partitioning on some GPU models, but it is coarse-grained (up to 7 instances on an A100) and not available on consumer hardware. A vulnerability in the driver can potentially let any process with GPU access read VRAM belonging to another workload. For inference workloads handling sensitive data, this is not acceptable.&lt;&#x2F;p&gt;
&lt;p&gt;Hardware-level isolation is the answer. Intel VT-d IOMMU can enforce DMA translation boundaries so that each GPU partition’s VRAM is physically inaccessible to other partitions. Not software-isolated. Hardware-isolated. The same level of guarantee that CPU virtual memory provides, but for GPU resources. The technology exists. It is used in server virtualization today for PCIe passthrough. The question is whether an OS can be built that uses it as a first-class isolation primitive for inference workloads, not as a virtualization feature.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;coconutos&quot;&gt;coconutOS&lt;&#x2F;h2&gt;
&lt;p&gt;This is not purely theoretical. I have been building a proof of concept.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;coconut-os&#x2F;coconutOS&quot;&gt;coconutOS&lt;&#x2F;a&gt; is a capability-based microkernel written in Rust, engineered specifically for GPU-isolated AI inference. It boots on x86-64 hardware via UEFI, and its entire architecture is designed around the idea that the GPU is the primary execution engine.&lt;&#x2F;p&gt;
&lt;p&gt;The choice of a microkernel is deliberate, and it carries historical baggage worth addressing.&lt;&#x2F;p&gt;
&lt;p&gt;The microkernel idea goes back to the 1980s. Andrew Tanenbaum, the computer scientist at Vrije Universiteit Amsterdam, built &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.minix3.org&#x2F;&quot;&gt;Minix&lt;&#x2F;a&gt; as a teaching OS based on the principle that the kernel should do as little as possible: manage memory, schedule tasks, pass messages. Everything else, file systems, drivers, network stacks, runs in userspace as separate processes. If a driver crashes, it is restarted (in MINIX 3, by a userspace “reincarnation server”) and the system keeps running.&lt;&#x2F;p&gt;
&lt;p&gt;Tanenbaum and Linus Torvalds had a famous debate about this in 1992 on the comp.os.minix newsgroup. Torvalds argued that monolithic kernels were faster and more practical. Tanenbaum argued that microkernels were more reliable and that “Linux is obsolete.” Torvalds won the practical argument. Linux became the dominant OS precisely because a monolithic kernel is simpler to build and faster to run when all your workloads are CPU-centric. Putting the file system and drivers in kernel space avoids the overhead of message passing between userspace processes.&lt;&#x2F;p&gt;
&lt;p&gt;But Tanenbaum’s argument was about fault isolation, and fault isolation becomes more important as the consequences of failure increase. When the worst case is “the file server process crashes and is restarted in 50 milliseconds,” fault isolation buys little and a monolithic kernel’s performance advantage wins. When the worst case is “a GPU driver bug in kernel space lets one tenant’s inference workload read another tenant’s medical records from VRAM,” the calculus changes.&lt;&#x2F;p&gt;
&lt;p&gt;In a monolithic kernel like Linux, the GPU driver runs in kernel space with full access to system memory. A bug in the NVIDIA driver (and there have been many: &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;nvidia.custhelp.com&#x2F;app&#x2F;answers&#x2F;detail&#x2F;a_id&#x2F;5551&quot;&gt;CVE-2024-0090&lt;&#x2F;a&gt;, &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;nvidia.custhelp.com&#x2F;app&#x2F;answers&#x2F;detail&#x2F;a_id&#x2F;5551&quot;&gt;CVE-2024-0092&lt;&#x2F;a&gt;, &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;nvidia.custhelp.com&#x2F;app&#x2F;answers&#x2F;detail&#x2F;a_id&#x2F;5491&quot;&gt;CVE-2023-31018&lt;&#x2F;a&gt; to name a few from recent years) can compromise the entire system. In a microkernel like coconutOS, the GPU HAL (Hardware Abstraction Layer) runs as an isolated shard in userspace. A bug in the HAL crashes that shard. The kernel survives. Other shards survive. The failure domain is contained.&lt;&#x2F;p&gt;
&lt;p&gt;Tanenbaum was right about the principle. He was just early about the workload that would make it matter. GPU-isolated inference is that workload. The performance overhead of microkernel message passing is irrelevant when the actual work happens on the GPU and the kernel’s job is orchestration, not computation. The CPU side is the control plane. The GPU side is the data plane. A microkernel is the right architecture for a control plane.&lt;&#x2F;p&gt;
&lt;p&gt;The core abstractions:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Shards, not processes.&lt;&#x2F;strong&gt; Each shard is an isolated address space with its own page tables, running in ring 3. But unlike a traditional process, a shard’s primary resource is not CPU time. It is GPU capacity. A shard gets a partition of VRAM, a slice of compute units, and a dedicated HAL (Hardware Abstraction Layer) shard that manages its access to the GPU. The HAL shard itself is unprivileged. It communicates with the kernel through the same capability-based IPC that every other shard uses. There is no “root” equivalent for GPU access.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Capabilities, not permissions.&lt;&#x2F;strong&gt; Access control is capability-based, inspired by systems like &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;sel4.systems&#x2F;&quot;&gt;seL4&lt;&#x2F;a&gt; and the research that came out of the University of Cambridge’s Computer Laboratory. Each shard holds unforgeable capability tokens that grant access to specific resources: VRAM regions, IPC channels, filesystem paths, GPU compute slices. Capabilities can be granted, revoked, restricted, and inspected. There are no ambient permissions. A shard cannot access anything it does not hold a capability for. This is fundamentally different from Unix permissions, where a root process can access everything. In coconutOS, there is no root. There are only capabilities.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;GPU ASLR.&lt;&#x2F;strong&gt; Each shard’s VRAM and MMIO virtual addresses are randomized. Even if an attacker finds a vulnerability in one shard, they cannot predict where another shard’s GPU memory is mapped. CPU-side ASLR has been standard since the mid-2000s. The insight is that the same principle applies to GPU resources, and that without it, a GPU memory disclosure vulnerability in one workload can be used to locate and read another workload’s model weights or inference state.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Pledge and unveil for GPUs.&lt;&#x2F;strong&gt; This is one of the design choices I am most attached to, because it comes directly from my years of admiring OpenBSD. Theo de Raadt’s &lt;code&gt;pledge(2)&lt;&#x2F;code&gt; and &lt;code&gt;unveil(2)&lt;&#x2F;code&gt; syscalls are among the most elegant security primitives ever designed. A process calls &lt;code&gt;pledge(&quot;stdio rpath&quot;, NULL)&lt;&#x2F;code&gt; and permanently gives up the ability to do anything except read files and use standard I&#x2F;O. It cannot escalate back. The promise is irreversible.&lt;&#x2F;p&gt;
&lt;p&gt;coconutOS applies the same idea to GPU resources. &lt;code&gt;pledge_gpu&lt;&#x2F;code&gt; lets a shard declare that it will only do inference, not training. Once pledged, it cannot allocate new VRAM beyond its partition, cannot modify model weights, cannot access raw compute dispatch. &lt;code&gt;unveil_vram&lt;&#x2F;code&gt; lets a shard lock its VRAM view to a specific region. Other regions become physically invisible through the IOMMU. A compromised inference shard cannot undo its own containment because the kernel enforces the pledge at the hardware level.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;The inference stack.&lt;&#x2F;strong&gt; The proof of concept runs a transformer forward pass end-to-end: RMSNorm, multi-head attention with rotary position embeddings (RoPE), SiLU feed-forward networks, softmax. It is based on Andrej Karpathy’s &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;karpathy&#x2F;llama2.c&quot;&gt;llama2.c&lt;&#x2F;a&gt;, adapted to run as a shard with GPU isolation. The kernel preserves FPU and SSE state across preemption using FXSAVE&#x2F;FXRSTOR, so inference math is not corrupted by context switches. This is a detail that matters: if the scheduler preempts a shard mid-matrix-multiply and does not preserve the floating-point register state, the results will be silently wrong. Traditional OSes handle this for CPU processes. coconutOS handles it for GPU inference shards.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-this-matters&quot;&gt;Why this matters&lt;&#x2F;h2&gt;
&lt;p&gt;You might look at coconutOS and think: this is an interesting research project, but nobody is going to replace Linux for AI workloads. And you might be right. Linux is entrenched. CUDA is entrenched. The entire AI infrastructure stack, from PyTorch to vLLM to Triton Inference Server, is built on top of assumptions that coconutOS challenges.&lt;&#x2F;p&gt;
&lt;p&gt;But consider the trajectory.&lt;&#x2F;p&gt;
&lt;p&gt;Five years ago, AI inference was a batch job you ran on a cluster. You uploaded data, kicked off a job, came back later for results. The security model was simple: whoever had SSH access to the machine had access to the GPU. Isolation was a non-issue because there was only one workload.&lt;&#x2F;p&gt;
&lt;p&gt;Today, inference is a real-time service. Companies like Anthropic, OpenAI, and Google serve billions of inference requests per day. Multiple customers share the same GPU hardware. The workloads process sensitive data: medical records, financial transactions, legal documents, personal conversations. The security and isolation requirements have changed fundamentally, but the OS layer has not changed at all. We are running safety-critical, multi-tenant inference workloads on an operating system whose core design dates back to 1991 and was shaped by file servers and web servers.&lt;&#x2F;p&gt;
&lt;p&gt;The gap between what we are doing on GPUs and how we manage GPU access is widening. Right now, GPU multi-tenancy in the cloud means trusting the CUDA driver and the hypervisor to keep workloads separated. NVIDIA’s MIG helps, but it is only available on data center GPUs (A100, H100), offers coarse-grained partitioning (7 instances maximum), and still relies on the CUDA driver for memory management within each partition. For most use cases, this is fine. For financial services, healthcare, defense, and any context where inference handles regulated data, “trusting the driver” is not a compliance-grade answer.&lt;&#x2F;p&gt;
&lt;p&gt;The analogy is virtualization. Before Xen and KVM, server multi-tenancy meant trusting the host OS to isolate users. It worked until it did not. Hardware virtualization (VT-x) gave us actual isolation guarantees enforced by the CPU. The cloud was built on those guarantees. We need the same thing for GPUs. Hardware-level isolation, enforced by the kernel, with capability-based access control and monotonic privilege restriction, is where this has to go. Whether it looks like coconutOS or like a set of patches to Linux or like NVIDIA building isolation into their next GPU architecture is an implementation question. The architectural direction is not optional.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-four-futures-updated&quot;&gt;The four futures, updated&lt;&#x2F;h2&gt;
&lt;p&gt;Looking back at the original post, the timeline extends further than I initially described:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;The full timeline&quot;&gt;Near term        Medium term       Long term         Far horizon
(now)            (2-5 years)       (5-15 years)      (10-20 years)

Explicit         Declarative       Post-language     Inference-native
languages        languages         (AI-native        operating systems
(Vera)           (Haskell + BHC)   targets)

Reduce noise --&amp;gt; Change the   --&amp;gt;  Remove the   --&amp;gt;  Redesign the
in the loop      signal            layer             machine

HOW, but         WHAT, verified    Intent to         Intent to
unambiguous      by types          execution         inference&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The first three futures are about the intermediate layer between human intent and machine execution. The fourth future is about the machine itself. If the primary workload is inference, the machine should be designed for inference. Not adapted. Not extended. Designed.&lt;&#x2F;p&gt;
&lt;p&gt;Each transition in this timeline follows the same pattern that has repeated throughout the history of computing. When a new type of workload becomes dominant, the infrastructure eventually reshapes itself around that workload. Mainframes were redesigned for timesharing when interactive use became the dominant workload. Server hardware was redesigned for virtualization when multi-tenancy became the dominant workload. Network infrastructure was redesigned for packet switching when interactive data became more important than circuit-switched voice. The question is never whether the infrastructure will adapt. It is how long the transition takes and what it looks like on the other side.&lt;&#x2F;p&gt;
&lt;p&gt;This is admittedly the furthest out of the four possibilities. We are even further from inference-native operating systems than we are from AI generating machine code directly. The hardware support is early. The software stack is embryonic. coconutOS boots and runs a transformer forward pass, which is a start, but it is a long way from something you would deploy in production.&lt;&#x2F;p&gt;
&lt;p&gt;But the same was true of virtual memory when the University of Manchester’s Atlas computer introduced it in the early 1960s. It took decades for hardware support to mature, for operating systems to build on it, for the abstraction to become invisible. Today, every program you run uses virtual memory and nobody thinks about it. The same was true of containerization when Google engineers started work on cgroups in 2006. Docker did not arrive until 2013, Kubernetes until 2014. The gap between “research prototype” and “runs the world” is real, but so is the trajectory.&lt;&#x2F;p&gt;
&lt;p&gt;The workloads are catching up. The hardware is catching up. The question is whether we build the OS to match, or whether we keep running inference on an operating system that thinks the CPU is in charge.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;honest-assessment&quot;&gt;Honest assessment&lt;&#x2F;h2&gt;
&lt;p&gt;I do not know if inference-native operating systems will happen in this form. I do not know if the microkernel approach is right, or if the better path is extending Linux with GPU isolation primitives (the way cgroups extended Linux for containers, the way KVM extended Linux for virtualization). I do not know if hardware vendors will build the IOMMU support that makes per-shard GPU isolation practical at scale, or if NVIDIA will solve this problem at the driver level before anyone needs a new OS.&lt;&#x2F;p&gt;
&lt;p&gt;There are smart people working on adjacent problems. Google’s TPU architecture has custom scheduling and isolation built into the hardware. AMD’s ROCm is exploring open-source alternatives to CUDA’s closed driver model. Intel’s GPU roadmap includes hardware virtualization features. Any of these paths could make coconutOS’s approach unnecessary by solving the isolation problem at a different layer.&lt;&#x2F;p&gt;
&lt;p&gt;What I do know is that running inference workloads on operating systems designed for sequential CPU programs is an impedance mismatch that will become less tolerable as inference becomes more critical. Something will change. coconutOS is my attempt to explore what that something might look like, with real code that boots on real hardware and runs a real transformer.&lt;&#x2F;p&gt;
&lt;p&gt;The code is open source, ISC-licensed, and very much a work in progress. If you are thinking about similar problems, I want to hear from you.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;coconut-os&#x2F;coconutOS&quot;&gt;github.com&#x2F;coconut-os&#x2F;coconutOS&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
</description>
      </item>
      <item>
          <title>What Sixteen AI Agents Taught Me About Management</title>
          <pubDate>Thu, 16 Apr 2026 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/what-sixteen-ai-agents-taught-me-about-management/</link>
          <guid>https://raskell.io/articles/what-sixteen-ai-agents-taught-me-about-management/</guid>
          <description xml:base="https://raskell.io/articles/what-sixteen-ai-agents-taught-me-about-management/">&lt;p&gt;Over the 2025 Christmas holidays I had sixteen AI agents running in parallel across four macOS workspaces. Each workspace held four Ghostty terminal panes, each pane running its own Claude Code instance, each instance working on a different piece of a different project. I was on Anthropic’s 20x Max subscription, and during the holiday period the token limits were generous enough that I could burn through context at a rate I had never attempted before.&lt;&#x2F;p&gt;
&lt;p&gt;It was the most productive week of my engineering life. It was also the week I learned that managing AI agents is, at its core, a management problem. Not a technical one.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-terminal-wall&quot;&gt;The terminal wall&lt;&#x2F;h2&gt;
&lt;p&gt;The setup started simple. Mitchell Hashimoto’s Ghostty, which I consider one of the best terminal emulators released in years, supports split panes in both directions. Four panes per workspace. Four workspaces. Sixteen agents. Each one working on a real task: scaffolding a new crate, writing tests for a module I had sketched out, researching an API I needed to integrate, refactoring a component I had been putting off for months.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;The terminal wall: 4 workspaces x 4 Ghostty panes&quot;&gt;Workspace 1               Workspace 2               Workspace 3               Workspace 4
=======================   =======================   =======================   =======================
| Agent 1  | Agent 2  |   | Agent 5  | Agent 6  |   | Agent 9  | Agent 10 |   | Agent 13 | Agent 14 |
| scaffold | tests    |   | refactor | research |   | API      | migrate  |   | docs     | bench    |
|----------|----------|   |----------|----------|   |----------|----------|   |----------|----------|
| Agent 3  | Agent 4  |   | Agent 7  | Agent 8  |   | Agent 11 | Agent 12 |   | Agent 15 | Agent 16 |
| config   | lint     |   | deploy   | review   |   | schema   | ci       |   | proto    | fuzz     |
=======================   =======================   =======================   =======================

                         Me: switching between workspaces,
                         scrolling back through conversations,
                         keeping a text file of who does what&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The first few hours were exhilarating. I would spin up an agent, give it a task, switch to the next pane, give it a task, move to the next workspace, repeat. By mid-morning I had more concurrent engineering work happening than most small teams produce in a day.&lt;&#x2F;p&gt;
&lt;p&gt;Then the problems started.&lt;&#x2F;p&gt;
&lt;p&gt;I could not remember what each agent was doing. This was before Claude Code added the prompt and context summary at the top of the input line, so the only way to check was to scroll back through the conversation history and find the original prompt. With sixteen sessions, that meant a lot of scrolling. I started naming my macOS workspaces. I named each Ghostty pane. I kept a text file open with a list of which agent was doing what. I was, without quite realizing it, building a project management system out of sticky notes and window titles.&lt;&#x2F;p&gt;
&lt;p&gt;The irony was not lost on me. I had sixteen AI agents doing engineering work, and I was spending half my time doing the one thing AI was supposed to eliminate: keeping track of who was working on what.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;kage-the-first-attempt&quot;&gt;Kage, the first attempt&lt;&#x2F;h2&gt;
&lt;p&gt;That frustration led to &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;kage&quot;&gt;kage&lt;&#x2F;a&gt;. The name means shadow in Japanese. The idea was straightforward: build a Rust binary that could orchestrate multiple terminal sessions, let me switch between them, and show me at a glance what each one was doing.&lt;&#x2F;p&gt;
&lt;p&gt;kage used &lt;code&gt;alacritty_terminal&lt;&#x2F;code&gt; (the terminal emulation library extracted from the Alacritty terminal emulator) to manage PTY sessions, &lt;code&gt;ratatui&lt;&#x2F;code&gt; for a terminal UI, and &lt;code&gt;redb&lt;&#x2F;code&gt; for local persistence. I could spawn agents with goals, set iteration limits, checkpoint and resume sessions, and pool across multiple LLM providers. The vision was that I could use Claude Code alongside OpenAI’s Codex and whatever else was available, routing tasks to whichever provider had capacity.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;kage: terminal session orchestrator&quot;&gt;+-----------------------------------------------------------+
|                      kage TUI (ratatui)                   |
|  +-------+  +-------+  +-------+  +-------+  +-------+   |
|  | Sess 1|  | Sess 2|  | Sess 3|  | Sess 4|  |  ...  |   |
|  | goal: |  | goal: |  | goal: |  | goal: |  |       |   |
|  | scaf. |  | tests |  | refac.|  | API   |  |       |   |
|  +---+---+  +---+---+  +---+---+  +---+---+  +-------+   |
|      |          |           |          |                   |
+------|----------|-----------|----------|-------------------+
       |          |           |          |
  +----v----+  +--v----+  +---v----+  +--v-----+
  | PTY     |  | PTY   |  | PTY    |  | PTY    |    alacritty_terminal
  | (Claude)|  |(Codex)|  |(Claude)|  |(Claude)|    manages each session
  +---------+  +-------+  +--------+  +--------+

  +---------------------+    +------------------+
  | redb (persistence)  |    | LLM provider pool|
  | checkpoints, goals, |    | Anthropic, OpenAI|
  | session state       |    | route by capacity|
  +---------------------+    +------------------+&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;It worked. Technically. I could see all my sessions in one interface. I could switch between them. I could track what each agent was supposed to be doing.&lt;&#x2F;p&gt;
&lt;p&gt;But it felt clunky. The terminal streaming had issues. The UI was functional but not fluid. And more importantly, I realized the tool was solving the wrong problem. The problem was not that I needed a better way to look at sixteen terminal panes. The problem was that sixteen independent agents with no coordination between them produced sixteen independent streams of work with no coherence between them. Agent A would refactor a module that agent B was simultaneously writing tests for, using the old API. Agent C would make a design decision that conflicted with what agent D was building. I was the only point of integration, and I could not context-switch fast enough to catch every conflict.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;The coordination problem kage could not solve&quot;&gt;Agent A: refactors auth module         Agent B: writes tests for auth module
  |                                       |
  +-- removes old_login()                 +-- tests old_login()
  +-- renames to authenticate()           +-- expects old return type
  +-- changes return type                 +-- passes locally (stale code)
                                          +-- FAILS after A merges

Agent C: picks REST for new API        Agent D: builds gRPC client for new API
  |                                       |
  +-- adds &amp;#x2F;api&amp;#x2F;v2&amp;#x2F;users                  +-- generates proto stubs
  +-- writes OpenAPI spec                 +-- implements streaming calls
                                          +-- INCOMPATIBLE with C&amp;#x27;s design

              Me (the only integration point):
              &amp;quot;Wait, what is everyone doing?&amp;quot;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;I needed the agents to coordinate with each other. Not just run in parallel. Actually collaborate.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;shiioo-the-virtual-company&quot;&gt;Shiioo, the virtual company&lt;&#x2F;h2&gt;
&lt;p&gt;The second attempt was &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;shiioo&quot;&gt;shiioo&lt;&#x2F;a&gt;. The name is a romanization of how “CEO” is pronounced in Japanese. The premise had changed entirely.&lt;&#x2F;p&gt;
&lt;p&gt;What if the agents were not just parallel workers? What if they were employees in a virtual enterprise, each with a specific role, a defined skillset, a limited set of tools, and a reporting structure?&lt;&#x2F;p&gt;
&lt;p&gt;I had been reading about organizational design and it struck me how directly the problems I was having with agent orchestration mapped to problems that real companies solve with management hierarchies. When you have sixteen people working on a project, you do not give each one independent access to everything and hope for the best. You create teams. You assign leads. You define communication channels. You escalate decisions that exceed a team’s authority.&lt;&#x2F;p&gt;
&lt;p&gt;So that is what I built. shiioo is a Rust-based client-server system with an event-sourced persistence layer. Each agent is modeled as an employee with a &lt;code&gt;SKILLS.md&lt;&#x2F;code&gt; file that defines its capabilities, a limited set of MCP interfaces it can access, and a position in an organizational hierarchy. Agents are grouped into squads. Each squad has a lead. Squad leads report to department leads. Department leads report up to me.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;shiioo: virtual company hierarchy&quot;&gt;                            +------------------+
                            |   CEO (me)       |
                            |   Strategic      |
                            |   decisions only |
                            +--------+---------+
                                     |
                   +-----------------+-----------------+
                   |                                   |
          +--------v---------+               +---------v--------+
          |  Dept Lead:      |               |  Dept Lead:      |
          |  Infrastructure  |               |  Product         |
          +--------+---------+               +---------+--------+
                   |                                   |
          +--------+--------+                 +--------+--------+
          |                 |                 |                 |
   +------v------+  +------v------+   +------v------+  +------v------+
   | Squad Lead: |  | Squad Lead: |   | Squad Lead: |  | Squad Lead: |
   | Backend     |  | Platform    |   | Frontend    |  | Data        |
   +------+------+  +------+------+   +------+------+  +------+------+
          |                 |                 |                 |
     +----+----+       +---+---+        +----+----+       +---+---+
     |    |    |       |       |        |    |    |       |       |
    Ag1  Ag2  Ag3    Ag4     Ag5      Ag6  Ag7  Ag8    Ag9    Ag10

   Each agent has:
   - SKILLS.md (capabilities)     - Limited MCP interfaces
   - Defined scope of authority   - Escalation path upward&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The architecture uses a DAG workflow engine built on &lt;code&gt;petgraph&lt;&#x2F;code&gt; for task dependencies, a policy engine for authorization and approval gates, and a capacity broker that routes LLM calls across multiple providers. Every action is event-sourced, which means I have a complete audit trail of every decision every agent made, and I can replay any sequence of events to understand how a particular outcome was reached.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;shiioo system architecture&quot;&gt;+-------------------------------------------------------------------+
|                         shiioo server                             |
|                                                                   |
|  +------------------+  +------------------+  +-----------------+  |
|  | Workflow Engine  |  | Policy Engine    |  | Capacity Broker |  |
|  | (petgraph DAGs)  |  | (RBAC, approval  |  | (route LLM      |  |
|  |                  |  |  gates, authz)   |  |  calls across   |  |
|  | task deps,       |  |                  |  |  providers)     |  |
|  | retries,         |  | who can do what, |  |                 |  |
|  | timeouts         |  | who approves     |  | Anthropic,      |  |
|  +--------+---------+  +--------+---------+  | OpenAI, ...     |  |
|           |                     |             +---------+-------+  |
|           +----------+----------+                       |         |
|                      |                                  |         |
|              +-------v-------+                          |         |
|              | Event Store   |    +---------------------v------+  |
|              | (redb + S3)   |    | MCP Tool Server            |  |
|              |               |    | (expose enterprise tools   |  |
|              | every action  |    |  to agents)                |  |
|              | is persisted, |    +----------------------------+  |
|              | replayable    |                                    |
|              +---------------+                                    |
+-------------------------------------------------------------------+
        |                                        |
   +----v----+                              +----v----+
   | CLI REPL|                              | Web     |
   | (Chief  |                              |Dashboard|
   | of Staff|                              | (real-  |
   | mode)   |                              |  time)  |
   +---------+                              +---------+&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
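&lt;p&gt;To make the task-dependency idea concrete, here is a minimal sketch of DAG scheduling. It is written in Python with the standard library&#x27;s &lt;code&gt;graphlib&lt;&#x2F;code&gt; rather than &lt;code&gt;petgraph&lt;&#x2F;code&gt;, it is not shiioo&#x27;s actual code, and the task names and the &lt;code&gt;execution_waves&lt;&#x2F;code&gt; helper are hypothetical.&lt;&#x2F;p&gt;

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it depends on.
TASKS = {
    "design_api": set(),
    "implement_backend": {"design_api"},
    "implement_frontend": {"design_api"},
    "integration_tests": {"implement_backend", "implement_frontend"},
}


def execution_waves(tasks):
    """Group tasks into waves; every task in a wave can run in parallel."""
    sorter = TopologicalSorter(tasks)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(TASKS), start=1):
        print(number, wave)
```

&lt;p&gt;Each wave holds the tasks whose dependencies are already complete, so a scheduler could hand a whole wave to different agents at once. Retries and timeouts would wrap the step that marks a task as done.&lt;&#x2F;p&gt;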
&lt;p&gt;The key insight was the escalation model. Most tasks could be handled by individual agents within their defined scope. When an agent encountered a decision that exceeded its authority, such as a design choice that would affect other teams, a dependency conflict, or a resource allocation question, it escalated to its squad lead. The squad lead either resolved it or escalated further. Only the decisions that truly required strategic judgment (market positioning, resource allocation across projects, architectural direction) reached me.&lt;&#x2F;p&gt;
&lt;p&gt;I was the CEO. Not in some metaphorical sense. In the literal operational sense that the only decisions that required my attention were the ones that should require a CEO’s attention.&lt;&#x2F;p&gt;
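&lt;p&gt;The escalation model can be sketched in a few lines. This is an illustration under stated assumptions, not shiioo&#x27;s implementation: the numeric authority levels, the &lt;code&gt;Node&lt;&#x2F;code&gt; type, and the &lt;code&gt;resolve&lt;&#x2F;code&gt; function are all hypothetical.&lt;&#x2F;p&gt;

```python
from dataclasses import dataclass
from typing import Optional

# Authority levels are hypothetical; higher numbers mean broader scope.
AGENT, SQUAD_LEAD, DEPT_LEAD, CEO = 1, 2, 3, 4


@dataclass
class Node:
    name: str
    authority: int
    reports_to: Optional["Node"] = None


def resolve(decision_weight, start):
    """Walk up the reporting line until someone has enough authority.

    Returns the resolver and the chain of nodes the decision passed through.
    """
    chain = [start.name]
    current = start
    while decision_weight > current.authority:
        if current.reports_to is None:
            raise ValueError("no one in the chain can decide this")
        current = current.reports_to
        chain.append(current.name)
    return current, chain


ceo = Node("ceo", CEO)
infra = Node("dept-infrastructure", DEPT_LEAD, ceo)
backend = Node("squad-backend", SQUAD_LEAD, infra)
ag1 = Node("ag1", AGENT, backend)

if __name__ == "__main__":
    print(resolve(AGENT, ag1)[1])       # handled within the agent's own scope
    print(resolve(SQUAD_LEAD, ag1)[1])  # e.g. a dependency conflict
    print(resolve(CEO, ag1)[1])         # e.g. architectural direction
```

&lt;p&gt;The returned chain is also the list of participants who each spend tokens on the decision, which is why long escalations get expensive.&lt;&#x2F;p&gt;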
&lt;h2 id=&quot;it-worked&quot;&gt;It worked&lt;&#x2F;h2&gt;
&lt;p&gt;shiioo was rough. The UI was a REPL with seven built-in commands. The web dashboard was basic. The documentation was incomplete.&lt;&#x2F;p&gt;
&lt;p&gt;But the model worked. I bootstrapped multiple projects using this setup over the holidays. Agents would pick up tasks, work within their scope, coordinate through the reporting structure, and escalate when they hit ambiguity. The squad lead agents were particularly effective because they had enough context about their team’s work to resolve most conflicts without involving me.&lt;&#x2F;p&gt;
&lt;p&gt;The event sourcing turned out to be more valuable than I expected. When something went wrong, and things did go wrong, I could trace the decision chain from the final output back to the original task assignment. I could see exactly where an agent made a bad call, which agent approved it, and what information was available at each step. This is something you cannot do with sixteen independent terminal sessions. You cannot even do it with most human teams.&lt;&#x2F;p&gt;
&lt;p&gt;For a week, I was running a virtual company. And the virtual company was shipping code.&lt;&#x2F;p&gt;
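&lt;p&gt;The decision-chain tracing described above follows from how event-sourced logs are typically shaped. A minimal sketch, assuming each event records the id of the event that caused it; the &lt;code&gt;Event&lt;&#x2F;code&gt; fields, the sample log, and &lt;code&gt;decision_chain&lt;&#x2F;code&gt; are hypothetical and not taken from shiioo.&lt;&#x2F;p&gt;

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    event_id: int
    actor: str
    action: str
    caused_by: Optional[int]  # id of the event that triggered this one


# A hypothetical append-only log; the field names are assumptions.
LOG = [
    Event(1, "ceo", "task_assigned", None),
    Event(2, "squad-backend", "task_delegated", 1),
    Event(3, "ag2", "design_proposed", 2),
    Event(4, "squad-backend", "design_approved", 3),
    Event(5, "ag2", "code_committed", 4),
]


def decision_chain(log, event_id):
    """Follow caused_by links from an outcome back to the original assignment."""
    by_id = {event.event_id: event for event in log}
    chain = []
    current = by_id.get(event_id)
    while current is not None:
        chain.append(current)
        current = by_id.get(current.caused_by)
    return list(reversed(chain))


if __name__ == "__main__":
    for event in decision_chain(LOG, 5):
        print(event.event_id, event.actor, event.action)
```

&lt;p&gt;Starting from the final commit, the chain shows who proposed the design and who approved it, in order.&lt;&#x2F;p&gt;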
&lt;h2 id=&quot;then-the-tokens-ran-out&quot;&gt;Then the tokens ran out&lt;&#x2F;h2&gt;
&lt;p&gt;Nobody talks about this part when they discuss agent orchestration: the economics.&lt;&#x2F;p&gt;
&lt;p&gt;During the Christmas holiday period, Anthropic’s 20x Max subscription was unusually generous with token limits. I do not know the exact numbers, but the allowance was noticeably higher than usual. I was burning through it.&lt;&#x2F;p&gt;
&lt;p&gt;The problem was not the agents doing useful work. The problem was the agents talking to each other. In any organization, a significant portion of communication is overhead. Status updates. Clarifying questions. Acknowledging instructions. Confirming understanding. In a human company, this overhead is accepted because humans need it to function. In an agent company, every word of that communication costs tokens.&lt;&#x2F;p&gt;
&lt;p&gt;I watched my weekly token budget evaporate. Not because the agents were writing code, although they were. Because the squad leads were having lengthy exchanges with their teams about task requirements. Because agents were asking clarifying questions that a slightly better prompt would have made unnecessary. Because the escalation chain, every step of it, required a full context window of conversation.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;Where the tokens actually went&quot;&gt;Token budget (weekly)
=====================

  Useful work (code, tests, docs)       ███████░░░░░░░░░░░░░  ~35%
  Agent-to-agent coordination           ████████░░░░░░░░░░░░  ~40%
  Escalation chain overhead             ███░░░░░░░░░░░░░░░░░  ~15%
  Clarifying questions &amp;#x2F; retries        ██░░░░░░░░░░░░░░░░░░  ~10%
                                        (one block = 5% of the weekly budget)
  Productive: ~35%. Overhead (the other three rows): ~65%.

  The overhead was not a bug. It was the cost of coordination.
  The same cost human companies pay in salaries for meetings,
  Slack threads, standups, and status reports.

  The difference: human overhead is a fixed cost.
  Token overhead scales with every message.&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;It was the AI equivalent of employees standing around the coffee machine. Except every minute at the coffee machine cost real money.&lt;&#x2F;p&gt;
&lt;p&gt;When the holiday period ended and the token limits returned to normal, shiioo became impractical for my situation. I have more than ten active projects. I need to decide, deliberately and granularly, what my limited token budget goes toward. A virtual company that autonomously allocates its own token spend, no matter how effectively, does not give me that control. I stopped using it. Not because the architecture was wrong. Because the economics did not fit my constraints.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-management-lesson&quot;&gt;The management lesson&lt;&#x2F;h2&gt;
&lt;p&gt;Every problem I encountered was a management problem that real companies have solved, or at least learned to live with.&lt;&#x2F;p&gt;
&lt;p&gt;The coordination problem: sixteen independent workers producing inconsistent output. Solved the same way companies solve it. Hierarchy, defined roles, communication channels.&lt;&#x2F;p&gt;
&lt;p&gt;The overhead problem: too much communication relative to productive work. Every manager knows this. Every company struggles with it. The optimal amount of coordination is not zero and it is not “as much as possible.” It is somewhere in between, and finding that point is one of the hardest problems in organizational design.&lt;&#x2F;p&gt;
&lt;p&gt;The economics problem: the work gets done, but the cost of the work exceeds the budget. This is not a technology problem. This is a business problem. And it has different answers at different scales.&lt;&#x2F;p&gt;
&lt;p&gt;For an individual developer on a consumer subscription, shiioo’s overhead is too expensive. The token cost of agent-to-agent communication eats into the budget for actual work. I need to be hands-on, directing each agent personally, because my token budget is small enough that every token should go toward output I directly value.&lt;&#x2F;p&gt;
&lt;p&gt;For an enterprise with API access and a meaningful budget, the calculus is completely different. If you are paying for engineering time at market rates, the token cost of agent coordination is a rounding error compared to the cost of human coordination. The overhead that made shiioo impractical for me (agents discussing requirements, leads resolving conflicts, escalation chains processing decisions) would be a bargain for a company that currently pays humans to do the same thing at something like 100x the cost and 10x the latency.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;The economics at different scales&quot;&gt;Individual developer (consumer sub)     Enterprise (API access)
====================================    ====================================

Token budget:  limited, weekly cap      Token budget:  pay per use, large
Overhead cost: eats into real work      Overhead cost: rounding error
Control need:  high (every token        Control need:  moderate (aggregate
               must count)                             ROI matters)

  +----------+                            +----------+
  |  Useful  |  &amp;lt;-- want to maximize      |  Useful  |  &amp;lt;-- still majority
  |  work    |      this slice            |  work    |
  +----------+                            +----------+
  | Overhead |  &amp;lt;-- this hurts            | Overhead |  &amp;lt;-- acceptable cost
  +----------+                            +----------+
  | Budget   |                            |          |
  +----------+                            | Room to  |
  (no room)                               | grow     |
                                          +----------+

  Same architecture. Different economics. Different answer.&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h2 id=&quot;then-openclaw-happened&quot;&gt;Then OpenClaw happened&lt;&#x2F;h2&gt;
&lt;p&gt;A few weeks after Christmas, &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;openclaw&#x2F;openclaw&quot;&gt;OpenClaw&lt;&#x2F;a&gt; launched and crossed 100,000 GitHub stars within its first week. The community was impressed. A personal AI agent framework with multi-agent support, tool integrations, and an approachable setup experience.&lt;&#x2F;p&gt;
&lt;p&gt;I looked at it and felt a mix of recognition and mild frustration. OpenClaw solved the “how do I run an AI agent” problem elegantly. But shiioo was solving a different problem. Not “how do I run agents” but “how do I run an organization of agents.” Hiring, delegation, escalation, governance, audit trails, budget management. The boring, structural, enterprise problems that do not demo well but determine whether agent orchestration actually works at scale.&lt;&#x2F;p&gt;
&lt;p&gt;OpenClaw is a good tool for individuals. What I built, rough as it was, is a sketch of what enterprises will need when they move past “we have an AI assistant” to “we have an AI workforce.” Those are fundamentally different problems, and they require fundamentally different architectures.&lt;&#x2F;p&gt;
&lt;p&gt;I do not say this to diminish OpenClaw. I say it because I think the industry is still mostly working on the first problem while the second one is coming fast.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-i-would-do-differently&quot;&gt;What I would do differently&lt;&#x2F;h2&gt;
&lt;p&gt;If I were building shiioo again today, three things would change.&lt;&#x2F;p&gt;
&lt;p&gt;First, the communication protocol between agents needs to be structured and minimal. Free-form conversation between agents is expensive and produces the same rambling overhead that plagues human Slack channels. Agents should exchange typed messages with defined schemas. Not “hey, I was thinking about the API design and I wonder if we should consider…” but &lt;code&gt;{ type: &quot;design_decision&quot;, scope: &quot;api&quot;, proposal: &quot;...&quot;, requires_approval: true }&lt;&#x2F;code&gt;. Every token in agent-to-agent communication should carry information, not politeness.&lt;&#x2F;p&gt;
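&lt;p&gt;A minimal sketch of what such a typed message could look like. The four field names come from the example above; the set of allowed message types and the &lt;code&gt;encode&lt;&#x2F;code&gt; and &lt;code&gt;decode&lt;&#x2F;code&gt; helpers are assumptions made for illustration.&lt;&#x2F;p&gt;

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical set of message types; only "design_decision" is from the text.
ALLOWED_TYPES = {"design_decision", "status", "ack", "handoff"}


@dataclass(frozen=True)
class AgentMessage:
    type: str
    scope: str
    proposal: str
    requires_approval: bool

    def __post_init__(self):
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"unknown message type: {self.type}")


def encode(message):
    """Serialize to compact JSON: no whitespace, stable key order."""
    return json.dumps(asdict(message), separators=(",", ":"), sort_keys=True)


def decode(raw):
    """Parse JSON; extra or missing fields raise TypeError, bad types ValueError."""
    return AgentMessage(**json.loads(raw))
```

&lt;p&gt;Because the schema rejects anything it does not recognize, a rambling free-form reply fails at the boundary instead of being paid for and forwarded.&lt;&#x2F;p&gt;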
&lt;p&gt;Second, the token budget needs to be a first-class resource managed by the system, not an afterthought. Every agent should have a token allowance. Every escalation should have a cost. The system should make tradeoff decisions about whether a clarifying question is worth the tokens, the same way a well-run company makes tradeoff decisions about whether a meeting is worth the calendar time.&lt;&#x2F;p&gt;
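&lt;p&gt;One way to picture a first-class token budget is a ledger that every agent call has to pass through. This is a sketch under stated assumptions: &lt;code&gt;TokenLedger&lt;&#x2F;code&gt;, the flat escalation fee, and the &lt;code&gt;should_ask&lt;&#x2F;code&gt; heuristic are hypothetical, and the numbers are placeholders.&lt;&#x2F;p&gt;

```python
class BudgetExceeded(Exception):
    pass


class TokenLedger:
    """Per-agent token allowances; all names and numbers are hypothetical."""

    def __init__(self, allowances, escalation_fee=0):
        self.remaining = dict(allowances)
        self.escalation_fee = escalation_fee

    def can_afford(self, agent, tokens):
        return self.remaining.get(agent, 0) >= tokens

    def charge(self, agent, tokens):
        if not self.can_afford(agent, tokens):
            raise BudgetExceeded(f"{agent} cannot spend {tokens} tokens")
        self.remaining[agent] -= tokens

    def escalate(self, agent, tokens):
        """An escalation costs the message itself plus a flat fee."""
        self.charge(agent, tokens + self.escalation_fee)


def should_ask(ledger, agent, question_tokens, expected_rework_tokens):
    """Ask only if the question is affordable and cheaper than guessing wrong."""
    return (ledger.can_afford(agent, question_tokens)
            and expected_rework_tokens > question_tokens)
```

&lt;p&gt;The comparison in &lt;code&gt;should_ask&lt;&#x2F;code&gt; is the meeting tradeoff in miniature: a clarifying question is worth it only when the expected cost of rework is higher than the cost of asking.&lt;&#x2F;p&gt;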
&lt;p&gt;Third, I would separate the orchestration layer from the LLM provider entirely. shiioo was built around the Anthropic Messages API. It should have been built around an abstract capability interface where the provider is a pluggable backend. Not for vendor neutrality as a principle, but because the economics change when you can route low-stakes agent communication through a smaller, cheaper model and reserve the frontier model for decisions that actually require it.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;Smarter token routing by decision weight&quot;&gt;Task type                Model tier          Cost per 1K tokens
==========               ==========          ==================

Strategic decisions       Frontier            $$$$
(architecture, design)    (Opus, GPT-5)
        |
        v
Squad lead coordination   Mid-tier            $$
(conflict resolution,     (Sonnet, GPT-4o)
 task assignment)
        |
        v
Agent-to-agent comms      Small&amp;#x2F;fast          $
(status, ack, handoff)    (Haiku, GPT-4o-mini)

Route by decision weight, not uniformly.
Most tokens go to the cheapest tier.
Most value comes from the expensive tier.&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
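&lt;p&gt;The routing table in the diagram translates almost directly into code. A minimal sketch: the tier names mirror the diagram, while the relative prices, the message type names, and the choice to send unknown types to the cheapest tier are assumptions.&lt;&#x2F;p&gt;

```python
# Tier names mirror the diagram; relative prices are made-up placeholders.
TIERS = {
    "frontier": {"relative_cost": 15.0},
    "mid": {"relative_cost": 3.0},
    "small": {"relative_cost": 1.0},
}

# Hypothetical message types mapped to the tier that should handle them.
ROUTES = {
    "strategic_decision": "frontier",
    "conflict_resolution": "mid",
    "task_assignment": "mid",
    "status": "small",
    "ack": "small",
    "handoff": "small",
}


def route(message_type):
    """Pick a tier by decision weight; unknown types default to the cheapest."""
    return ROUTES.get(message_type, "small")


def relative_spend(traffic):
    """traffic: list of (message_type, tokens). Returns cost units per tier."""
    spend = {tier: 0.0 for tier in TIERS}
    for message_type, tokens in traffic:
        tier = route(message_type)
        spend[tier] += tokens / 1000 * TIERS[tier]["relative_cost"]
    return spend
```

&lt;p&gt;Feeding a week of message counts through &lt;code&gt;relative_spend&lt;&#x2F;code&gt; shows how much of the bill comes from each tier, which is the number to watch when tuning the routing table.&lt;&#x2F;p&gt;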
&lt;h2 id=&quot;what-this-means-for-actual-enterprises&quot;&gt;What this means for actual enterprises&lt;&#x2F;h2&gt;
&lt;p&gt;I ran my virtual company for a week across personal projects. It was an experiment. But the patterns I stumbled into are not experimental. They are the same patterns that every large organization will need to operationalize as agentic employees become real line items on the org chart.&lt;&#x2F;p&gt;
&lt;p&gt;This is not a distant future. Companies are already deploying AI agents for customer support, code review, compliance checks, and data pipeline management. What most of them have not done yet is think about what happens when those agents need to coordinate. When the support agent needs to escalate to the engineering agent. When the compliance agent needs to block a deployment the CI agent is trying to push. When three agents working on the same codebase need to not step on each other.&lt;&#x2F;p&gt;
&lt;p&gt;The companies that figure this out first will have a structural advantage that compounds.&lt;&#x2F;p&gt;
&lt;p&gt;Consider a mid-size engineering organization. Two hundred engineers. They spend, conservatively, 30% of their time on coordination. Standups, planning meetings, Slack threads, code review discussions, design document feedback loops, incident response coordination. That is sixty full-time-equivalent salaries spent on people talking to each other about work rather than doing it. Nobody questions this cost because it has always been the cost of building software with humans.&lt;&#x2F;p&gt;
&lt;p&gt;Now replace even a fraction of that coordination layer with agents. Not the engineers themselves. The coordination between them. An agentic squad lead that triages incoming tickets, assigns them based on skill match, checks for conflicts with in-flight work, and only escalates to a human lead when the decision genuinely requires human judgment. An agentic project manager that tracks dependencies across teams, flags blockers before they become crises, and generates status updates from actual commit history instead of asking twelve people to fill out a spreadsheet.&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;&lt;&#x2F;th&gt;&lt;th&gt;Today&lt;&#x2F;th&gt;&lt;th&gt;Near future&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Coordination&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;td&gt;All human, expensive, slow, lossy&lt;&#x2F;td&gt;&lt;td&gt;Mostly agents, cheap, fast, auditable&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Status updates&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;td&gt;Asked for weekly (often stale by the time they reach leadership)&lt;&#x2F;td&gt;&lt;td&gt;Generated from event logs (always current)&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Conflict detection&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;td&gt;Someone notices during code review (after the work is done)&lt;&#x2F;td&gt;&lt;td&gt;Automatic, flagged before work begins&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Escalation&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;td&gt;Informal, depends on who knows whom&lt;&#x2F;td&gt;&lt;td&gt;Structured, policy-driven, with full context&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;The organizations that will benefit most are not the ones with the best AI models. They are the ones that treat agent orchestration as an organizational design problem. That means defining clear scopes of authority, building escalation paths that preserve context, implementing approval gates that match their governance requirements, and maintaining audit trails that satisfy compliance. These are not engineering problems. They are operations problems. And most companies already have people who know how to solve them. They are called managers.&lt;&#x2F;p&gt;
&lt;p&gt;The irony is that the skills most relevant to the agentic enterprise are not machine learning or prompt engineering. They are the skills that good managers have always had: defining clear responsibilities, building trust through transparency, knowing when to delegate and when to intervene, and designing systems where people, or agents, can do their best work without tripping over each other.&lt;&#x2F;p&gt;
&lt;p&gt;Every enterprise will need to answer a specific set of questions in the next few years. How do you onboard an agentic employee? How do you define its scope? What happens when it makes a mistake? Who is accountable? How do you audit its decisions? How do you revoke its access? These questions sound like IT governance, and they are. But the answers will reshape how companies think about headcount, team structure, and operational capacity in ways that most leadership teams have not begun to consider.&lt;&#x2F;p&gt;
&lt;p&gt;The companies that start experimenting now, even crudely, even with something as rough as what I built over a holiday week, will have a vocabulary and a set of institutional patterns that the latecomers will not. And in organizational design, having the right patterns early matters more than having the best technology late.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;where-this-goes&quot;&gt;Where this goes&lt;&#x2F;h2&gt;
&lt;p&gt;I still believe the virtual company model is the right abstraction for large-scale agent orchestration. It solves the same coordination problems that human organizations solve, using the same structural patterns that have been refined over decades of organizational theory.&lt;&#x2F;p&gt;
&lt;p&gt;The technology is ready. The architectures are straightforward. The missing piece is the economics. When frontier model inference costs drop by another order of magnitude, and they will, the token overhead of agent-to-agent communication stops being prohibitive. When that happens, the question will not be “should we orchestrate agents in a hierarchy” but “what does the org chart look like.”&lt;&#x2F;p&gt;
&lt;p&gt;I suspect we are closer to that moment than most people think. And when it arrives, the hard problems will not be technical. They will be the same problems that have always made management hard: delegation, trust, accountability, and knowing when to let your people work and when to step in.&lt;&#x2F;p&gt;
&lt;p&gt;The agents are ready to work. We just need to learn how to manage them.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;references&quot;&gt;References&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;kage&quot;&gt;kage: Local-first agentic work orchestrator&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;shiioo&quot;&gt;shiioo: Virtual Company OS&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;openclaw&#x2F;openclaw&quot;&gt;OpenClaw: Personal AI agent framework&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;ghostty.org&quot;&gt;Ghostty terminal emulator&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;crates.io&#x2F;crates&#x2F;alacritty_terminal&quot;&gt;alacritty_terminal crate&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</description>
      </item>
      <item>
          <title>Why We Built a Haskell Package Manager in Rust</title>
          <pubDate>Mon, 13 Apr 2026 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/why-we-built-a-haskell-package-manager-in-rust/</link>
          <guid>https://raskell.io/articles/why-we-built-a-haskell-package-manager-in-rust/</guid>
          <description xml:base="https://raskell.io/articles/why-we-built-a-haskell-package-manager-in-rust/">&lt;p&gt;Why would anyone invest serious engineering effort into Haskell tooling in 2026? Haskell is a niche language. It has been a niche language for thirty years. Most companies do not use it. Most developers have never written a line of it. If you are going to pour months of work into building a package manager and toolchain from scratch, in Rust no less, the obvious question is: why not just use Rust?&lt;&#x2F;p&gt;
&lt;p&gt;The answer is the same one I gave in &lt;a href=&quot;&#x2F;articles&#x2F;what-programming-languages-become-when-ai-writes-the-code&#x2F;&quot;&gt;The Last Programming Language Might Not Be for Humans&lt;&#x2F;a&gt;: the way we write software is changing. AI is becoming the primary author of code, and the languages that will matter most in that future are not the ones optimized for human typing speed. They are the ones optimized for formal correctness, composability, and provability. Haskell is not niche in that framing. It is early.&lt;&#x2F;p&gt;
&lt;p&gt;I have &lt;a href=&quot;&#x2F;articles&#x2F;all-beginning-is-haskell&#x2F;&quot;&gt;written before&lt;&#x2F;a&gt; about why Haskell shaped the way I think. The short version: Haskell teaches you to think about programs as compositions of well-typed transformations, and that discipline makes you better at everything else. I still believe this. I write most of my production software in Rust, but I think in Haskell.&lt;&#x2F;p&gt;
&lt;p&gt;The problem was never the language. The problem was everything around it.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-state-of-haskell-tooling&quot;&gt;The state of Haskell tooling&lt;&#x2F;h2&gt;
&lt;p&gt;If you want to start a Haskell project today, you install ghcup, which manages GHC (the compiler), Cabal (the build tool), Stack (a different build tool), and HLS (the language server). Then you decide whether to use Cabal or Stack, which is a decision that has split the Haskell community for over a decade and which nobody has fully resolved. Then you configure your project, using either a &lt;code&gt;.cabal&lt;&#x2F;code&gt; file (a custom format that predates TOML, YAML, and JSON as configuration languages) or a &lt;code&gt;stack.yaml&lt;&#x2F;code&gt; plus a &lt;code&gt;.cabal&lt;&#x2F;code&gt; file (because Stack still needs Cabal files underneath). Then you wait for GHC to compile your dependencies, which takes long enough that you start questioning your life choices.&lt;&#x2F;p&gt;
&lt;p&gt;I have introduced Haskell to teams and watched the enthusiasm drain from people’s faces during the toolchain setup. Not because the language was hard, but because the first thirty minutes were spent fighting &lt;code&gt;ghcup&lt;&#x2F;code&gt;, &lt;code&gt;cabal update&lt;&#x2F;code&gt;, resolver mismatches, and cryptic build errors that had nothing to do with the code they wanted to write.&lt;&#x2F;p&gt;
&lt;p&gt;A typical first encounter: you want to write a small HTTP server. You install ghcup, install GHC 9.8.2, run &lt;code&gt;cabal init&lt;&#x2F;code&gt;, and get a &lt;code&gt;.cabal&lt;&#x2F;code&gt; file with a dozen fields you do not understand yet. You add &lt;code&gt;warp&lt;&#x2F;code&gt; as a dependency and run &lt;code&gt;cabal build&lt;&#x2F;code&gt;. GHC starts compiling &lt;code&gt;warp&lt;&#x2F;code&gt; and its transitive dependencies: &lt;code&gt;http-types&lt;&#x2F;code&gt;, &lt;code&gt;bytestring&lt;&#x2F;code&gt;, &lt;code&gt;text&lt;&#x2F;code&gt;, &lt;code&gt;network&lt;&#x2F;code&gt;, &lt;code&gt;streaming-commons&lt;&#x2F;code&gt;, &lt;code&gt;vault&lt;&#x2F;code&gt;, &lt;code&gt;wai&lt;&#x2F;code&gt;, and about forty others. Four to six minutes on a modern machine. Every time you switch GHC versions or clean your cache, you pay that cost again.&lt;&#x2F;p&gt;
&lt;p&gt;Now compare this with Rust. You run &lt;code&gt;cargo new my-server&lt;&#x2F;code&gt;. You add &lt;code&gt;axum&lt;&#x2F;code&gt; to &lt;code&gt;Cargo.toml&lt;&#x2F;code&gt;. You run &lt;code&gt;cargo build&lt;&#x2F;code&gt;. It compiles. The first build is not instant either, but &lt;code&gt;cargo&lt;&#x2F;code&gt; does not ask you which of two incompatible build tools you prefer, does not require a separate tool to manage the compiler, and does not present you with a configuration format from 2005.&lt;&#x2F;p&gt;
&lt;p&gt;Or Python. &lt;code&gt;uv init my-server&lt;&#x2F;code&gt;. &lt;code&gt;uv add fastapi&lt;&#x2F;code&gt;. &lt;code&gt;uv run&lt;&#x2F;code&gt;. Done. The entire dependency resolution and installation takes less than a second because &lt;code&gt;uv&lt;&#x2F;code&gt; resolves and installs in parallel, in Rust, without spawning Python.&lt;&#x2F;p&gt;
&lt;p&gt;Most major language ecosystems have converged on the same answer: one tool that handles project creation, dependency management, building, testing, and publishing. Haskell has three tools that each do part of the job, disagree about how dependencies should work, and require a fourth tool to manage the compiler itself.&lt;&#x2F;p&gt;
&lt;p&gt;People have been talking about Haskell’s tooling problem for years. I decided to do something about it the way &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;astral.sh&quot;&gt;astral.sh&lt;&#x2F;a&gt; did for Python: rewrite the developer experience from scratch, in Rust, and make everything dramatically faster.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-astral-sh-playbook&quot;&gt;The astral.sh playbook&lt;&#x2F;h2&gt;
&lt;p&gt;When Astral released &lt;code&gt;uv&lt;&#x2F;code&gt; and &lt;code&gt;ruff&lt;&#x2F;code&gt;, it proved something important. You can take a mature ecosystem with deeply entrenched tooling, rebuild the developer experience in Rust, and people will switch, provided the new tools are fast enough and coherent enough that the switching cost pays for itself immediately.&lt;&#x2F;p&gt;
&lt;p&gt;Python’s tooling situation before &lt;code&gt;uv&lt;&#x2F;code&gt; was remarkably similar to Haskell’s. &lt;code&gt;pip&lt;&#x2F;code&gt;, &lt;code&gt;pip-tools&lt;&#x2F;code&gt;, &lt;code&gt;pipenv&lt;&#x2F;code&gt;, &lt;code&gt;poetry&lt;&#x2F;code&gt;, &lt;code&gt;conda&lt;&#x2F;code&gt;, &lt;code&gt;virtualenv&lt;&#x2F;code&gt;, &lt;code&gt;venv&lt;&#x2F;code&gt;, &lt;code&gt;pyenv&lt;&#x2F;code&gt;. Each solved part of the problem. Each had opinions that conflicted with the others. Setting up a Python project from scratch meant choosing a stack of tools, hoping they worked together, and accepting that your lockfile format depended on which combination you picked.&lt;&#x2F;p&gt;
&lt;p&gt;Astral did not try to fix any single tool. They rewrote the experience. &lt;code&gt;uv&lt;&#x2F;code&gt; is a single Rust binary that does what &lt;code&gt;pip&lt;&#x2F;code&gt;, &lt;code&gt;pip-tools&lt;&#x2F;code&gt;, &lt;code&gt;virtualenv&lt;&#x2F;code&gt;, and &lt;code&gt;pyenv&lt;&#x2F;code&gt; did, but 10-100x faster and with a coherent interface. &lt;code&gt;ruff&lt;&#x2F;code&gt; is a single Rust binary that does what &lt;code&gt;flake8&lt;&#x2F;code&gt;, &lt;code&gt;isort&lt;&#x2F;code&gt;, &lt;code&gt;pycodestyle&lt;&#x2F;code&gt;, and &lt;code&gt;pyflakes&lt;&#x2F;code&gt; did, but 10-100x faster. The Python community switched because the tools were obviously better the first time people used them.&lt;&#x2F;p&gt;
&lt;p&gt;The playbook has three steps:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Wrap first.&lt;&#x2F;strong&gt; Build on what already exists rather than reimplementing everything. &lt;code&gt;uv&lt;&#x2F;code&gt; launched as a drop-in replacement for the &lt;code&gt;pip&lt;&#x2F;code&gt; and &lt;code&gt;pip-tools&lt;&#x2F;code&gt; workflows on top of the existing package index. hx wraps GHC and Cabal.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Tame second.&lt;&#x2F;strong&gt; Add better error messages, faster startup, unified configuration, and workflows that make sense. This is where most of the user-facing value lives.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Replace last.&lt;&#x2F;strong&gt; Only replace underlying components when you have to. For hx, that meant building a native build mode that bypasses Cabal entirely for simple projects, and a native dependency resolver in Rust that is 24x faster than Cabal’s constraint solver.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;This approach is pragmatic in a way that matters. You do not need to rebuild the world to improve the experience. You need to rebuild the surface. The parts that people touch every day.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-rust&quot;&gt;Why Rust&lt;&#x2F;h2&gt;
&lt;p&gt;The choice to build hx in Rust is a direct response to a structural problem.&lt;&#x2F;p&gt;
&lt;p&gt;Haskell’s existing tooling is written in Haskell. This creates a bootstrap problem. To build the build tool, you need the compiler. To install the compiler, you need the compiler manager. To build the compiler manager, you need a compiler. The dependency chain is circular, and every link in it is slow to compile.&lt;&#x2F;p&gt;
&lt;p&gt;Think about what this means in practice. You are a new developer. You want to try Haskell. You run the ghcup installer, a shell script that fetches a pre-built ghcup binary. ghcup then downloads a pre-built GHC, but it also installs Cabal, which is itself a Haskell binary compiled with GHC. If a pre-built binary does not exist for your platform, you need GHC to build Cabal, but you need Cabal to set up GHC. The bootstrap documentation exists because the bootstrap problem exists, and it exists because the tools are written in the language they manage.&lt;&#x2F;p&gt;
&lt;p&gt;GHC’s runtime system adds initialization overhead to every invocation. When you type &lt;code&gt;cabal build&lt;&#x2F;code&gt;, the first 45 milliseconds are spent starting the GHC runtime before Cabal even begins to think about your project. Stack is worse at 89 milliseconds. These numbers sound small until you are running commands in a tight development loop, hitting save and expecting the build to start instantly. Or in CI, where the build tool is invoked hundreds of times across a pipeline and those milliseconds compound into tens of seconds per run.&lt;&#x2F;p&gt;
&lt;p&gt;hx starts in 12 milliseconds. A native binary without a garbage-collected runtime does not need to initialize one. The tool should not have the same dependencies as the thing it manages.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;shell&quot; class=&quot;language-shell &quot;&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;hx build    # 12ms startup + build time
cabal build # 45ms startup + build time
stack build # 89ms startup + build time
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Memory tells the same story:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Tool&lt;&#x2F;th&gt;&lt;th&gt;Startup memory&lt;&#x2F;th&gt;&lt;th&gt;Build memory (simple project)&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;hx&lt;&#x2F;td&gt;&lt;td&gt;8 MB&lt;&#x2F;td&gt;&lt;td&gt;45 MB&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;cabal&lt;&#x2F;td&gt;&lt;td&gt;45 MB&lt;&#x2F;td&gt;&lt;td&gt;250 MB&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;stack&lt;&#x2F;td&gt;&lt;td&gt;85 MB&lt;&#x2F;td&gt;&lt;td&gt;320 MB&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;For a tool you invoke constantly, this matters. Especially on CI runners with constrained memory, or on a laptop where you have four terminal panes open with different projects.&lt;&#x2F;p&gt;
&lt;p&gt;The Rust decision also solves the distribution problem. A Rust binary is a single static executable that cross-compiles trivially. No runtime dependencies. No “install GHC first so you can install the tool that installs GHC.” &lt;code&gt;curl | sh&lt;&#x2F;code&gt; and you are running. hx is available via the install script, Cargo, aqua, winget on Windows, and Homebrew. Every distribution channel ships a self-contained binary.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-hx-actually-does&quot;&gt;What hx actually does&lt;&#x2F;h2&gt;
&lt;p&gt;hx replaces the &lt;code&gt;cabal + stack + ghcup + fourmolu + hlint&lt;&#x2F;code&gt; workflow with a single binary:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;shell&quot; class=&quot;language-shell &quot;&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;curl -fsSL https:&amp;#x2F;&amp;#x2F;arcanist.sh&amp;#x2F;hx&amp;#x2F;install.sh | sh
hx new my-app &amp;amp;&amp;amp; cd my-app
hx run
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;No ghcup. No stack. No cabal-install. One tool, one configuration file, one lockfile format.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;configuration&quot;&gt;Configuration&lt;&#x2F;h3&gt;
&lt;p&gt;The configuration is &lt;code&gt;hx.toml&lt;&#x2F;code&gt;. Not a &lt;code&gt;.cabal&lt;&#x2F;code&gt; file with its custom syntax that nobody can parse without a library. Not a &lt;code&gt;stack.yaml&lt;&#x2F;code&gt; with YAML indentation traps. TOML, the same format that Rust (&lt;code&gt;Cargo.toml&lt;&#x2F;code&gt;), Python (&lt;code&gt;pyproject.toml&lt;&#x2F;code&gt;), and most modern tools have converged on.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;toml&quot; class=&quot;language-toml &quot;&gt;&lt;code class=&quot;language-toml&quot; data-lang=&quot;toml&quot;&gt;[project]
name = &amp;quot;my-app&amp;quot;
kind = &amp;quot;bin&amp;quot;

[toolchain]
ghc = &amp;quot;9.8.2&amp;quot;

[build]
optimization = 2
warnings = true

[format]
formatter = &amp;quot;fourmolu&amp;quot;

[lint]
hlint = true

[hooks]
pre-build = &amp;quot;scripts&amp;#x2F;generate-version.sh&amp;quot;
post-test = &amp;quot;scripts&amp;#x2F;notify.sh&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Everything in one file. The toolchain version is pinned per project, so different projects can use different GHC versions without conflict. When you run &lt;code&gt;hx build&lt;&#x2F;code&gt; in a project pinned to GHC 9.8.2 and then in another pinned to 9.6.4, hx switches automatically. No &lt;code&gt;ghcup set&lt;&#x2F;code&gt; commands. No global state.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;lockfiles&quot;&gt;Lockfiles&lt;&#x2F;h3&gt;
&lt;p&gt;The lockfile is also TOML. Every dependency is pinned with a sha256 fingerprint:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;toml&quot; class=&quot;language-toml &quot;&gt;&lt;code class=&quot;language-toml&quot; data-lang=&quot;toml&quot;&gt;version = 1
ghc = &amp;quot;9.8.2&amp;quot;
created_at = &amp;quot;2026-01-16T00:00:00Z&amp;quot;

[[package]]
name = &amp;quot;aeson&amp;quot;
version = &amp;quot;2.2.1.0&amp;quot;
sha256 = &amp;quot;a5a5b8a...&amp;quot;
deps = [&amp;quot;base&amp;quot;, &amp;quot;text&amp;quot;, &amp;quot;bytestring&amp;quot;]
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;hx lock --check&lt;&#x2F;code&gt; in CI fails if the lockfile is stale. Dependency resolution is deterministic by default. Not “deterministic if you remember to run &lt;code&gt;cabal freeze&lt;&#x2F;code&gt; and commit the freeze file and hope nobody ran &lt;code&gt;cabal update&lt;&#x2F;code&gt; on a different machine.” Deterministic the way &lt;code&gt;cargo&lt;&#x2F;code&gt; and &lt;code&gt;uv&lt;&#x2F;code&gt; are deterministic. Automatically. Every time.&lt;&#x2F;p&gt;
&lt;p&gt;If you are coming from Stack, you might say “Stack already has lockfiles.” It does. Stack’s approach is to pin to a Stackage snapshot, which gives you a curated set of packages known to work together. This is a valid approach, but it means your dependency versions are dictated by what the Stackage maintainers decided to include in that snapshot. If you need a newer version of a package that is not in the current LTS, you start adding &lt;code&gt;extra-deps&lt;&#x2F;code&gt;, and your reproducibility guarantees become more complex. hx resolves from Hackage directly, pins every version, and verifies checksums. You control exactly what you get.&lt;&#x2F;p&gt;
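&lt;p&gt;The checksum half of that guarantee is simple to picture. The sketch below is an illustration in Python, not hx’s implementation: the package name, the payload bytes, and the &lt;code&gt;matches_lock&lt;&#x2F;code&gt; helper are all invented for the example, and the lockfile is shown as already-parsed data rather than TOML.&lt;&#x2F;p&gt;

```python
import hashlib

# Hypothetical lockfile entries, as they might look after parsing the TOML.
# The package name and payload are invented for this example.
payload = b"pretend these bytes are a downloaded package tarball"
locked_packages = [
    {
        "name": "example-pkg",
        "version": "1.0.0.0",
        "sha256": hashlib.sha256(payload).hexdigest(),
    },
]


def matches_lock(locked_packages, name, data):
    """Return True when the bytes hash to the fingerprint pinned for name."""
    pinned = {pkg["name"]: pkg["sha256"] for pkg in locked_packages}
    return pinned.get(name) == hashlib.sha256(data).hexdigest()


untouched = matches_lock(locked_packages, "example-pkg", payload)
tampered = matches_lock(locked_packages, "example-pkg", payload + b"!")
print(untouched, tampered)  # True False
```

&lt;p&gt;Any change to the bytes changes the digest, so a modified or re-uploaded package fails the comparison.&lt;&#x2F;p&gt;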
&lt;h3 id=&quot;native-builds&quot;&gt;Native builds&lt;&#x2F;h3&gt;
&lt;p&gt;For simple projects with only &lt;code&gt;base&lt;&#x2F;code&gt; dependencies, hx has a native build mode that bypasses Cabal entirely. It constructs the module graph itself and invokes GHC directly:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Operation&lt;&#x2F;th&gt;&lt;th&gt;hx native&lt;&#x2F;th&gt;&lt;th&gt;hx (cabal backend)&lt;&#x2F;th&gt;&lt;th&gt;cabal&lt;&#x2F;th&gt;&lt;th&gt;stack&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Cold build&lt;&#x2F;td&gt;&lt;td&gt;0.48s&lt;&#x2F;td&gt;&lt;td&gt;2.52s&lt;&#x2F;td&gt;&lt;td&gt;2.68s&lt;&#x2F;td&gt;&lt;td&gt;3.2s&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Incremental&lt;&#x2F;td&gt;&lt;td&gt;0.05s&lt;&#x2F;td&gt;&lt;td&gt;0.35s&lt;&#x2F;td&gt;&lt;td&gt;0.39s&lt;&#x2F;td&gt;&lt;td&gt;0.52s&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Single file change&lt;&#x2F;td&gt;&lt;td&gt;0.31s&lt;&#x2F;td&gt;&lt;td&gt;1.42s&lt;&#x2F;td&gt;&lt;td&gt;1.42s&lt;&#x2F;td&gt;&lt;td&gt;1.8s&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;Compared with plain Cabal, that is 5.6x faster cold builds and 7.8x faster incremental builds. The difference comes from eliminating Cabal’s package database queries, build plan calculation, and job scheduling overhead.&lt;&#x2F;p&gt;
&lt;p&gt;Where does the time go in a normal Cabal build? Roughly: runtime initialization (45ms), reading the package database (80-120ms), computing the build plan (200-400ms depending on dependency count), checking file timestamps through the Cabal build system (100-200ms), and only then invoking GHC. hx native mode skips all of that. It reads file timestamps directly, constructs a minimal module graph, and calls GHC with exactly the flags needed. For projects with external dependencies, hx falls back to the Cabal backend transparently. You do not have to think about it.&lt;&#x2F;p&gt;
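&lt;p&gt;The timestamp-and-graph idea can be sketched in a few lines. This is a deliberately conservative illustration in Python, not hx’s actual algorithm: the module names, the timestamps, and the &lt;code&gt;modules_to_rebuild&lt;&#x2F;code&gt; helper are invented, and a real build tool would likely be smarter about when importers need recompiling.&lt;&#x2F;p&gt;

```python
from graphlib import TopologicalSorter

# Hypothetical module graph: each module maps to the modules it imports.
imports = {
    "Main": {"App.Config", "App.Server"},
    "App.Server": {"App.Config"},
    "App.Config": set(),
}

# Hypothetical modification times for source files and their build outputs.
source_mtime = {"Main": 100, "App.Server": 250, "App.Config": 90}
object_mtime = {"Main": 200, "App.Server": 200, "App.Config": 200}


def modules_to_rebuild(imports, source_mtime, object_mtime):
    """List stale modules in dependency order (imports before importers)."""
    order = list(TopologicalSorter(imports).static_order())
    stale = set()
    for module in order:
        source_is_newer = source_mtime[module] > object_mtime[module]
        imports_stale_module = not stale.isdisjoint(imports[module])
        if source_is_newer or imports_stale_module:
            stale.add(module)
    return [module for module in order if module in stale]


rebuild = modules_to_rebuild(imports, source_mtime, object_mtime)
print(rebuild)  # ['App.Server', 'Main']
```

&lt;p&gt;Only &lt;code&gt;App.Server&lt;&#x2F;code&gt; changed on disk, so it and its importer are rebuilt while &lt;code&gt;App.Config&lt;&#x2F;code&gt; is left alone.&lt;&#x2F;p&gt;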
&lt;h3 id=&quot;dependency-resolution&quot;&gt;Dependency resolution&lt;&#x2F;h3&gt;
&lt;p&gt;hx includes a native dependency resolver written in Rust. The &lt;code&gt;hx-solver&lt;&#x2F;code&gt; crate implements constraint resolution using the same algorithm as Cabal’s solver, but without the overhead of GHC’s runtime:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Direct dependencies&lt;&#x2F;th&gt;&lt;th&gt;hx&lt;&#x2F;th&gt;&lt;th&gt;cabal&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;10 packages&lt;&#x2F;td&gt;&lt;td&gt;5ms&lt;&#x2F;td&gt;&lt;td&gt;120ms&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;20 packages&lt;&#x2F;td&gt;&lt;td&gt;18ms&lt;&#x2F;td&gt;&lt;td&gt;450ms&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;50 packages&lt;&#x2F;td&gt;&lt;td&gt;85ms&lt;&#x2F;td&gt;&lt;td&gt;2.8s&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;100 packages&lt;&#x2F;td&gt;&lt;td&gt;320ms&lt;&#x2F;td&gt;&lt;td&gt;12.5s&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;At 100 dependencies, hx resolves in 320 milliseconds. Cabal takes 12.5 seconds. In a real-world test with 20 direct dependencies and their transitive closure, hx resolved in 1.2 seconds versus 8.5 seconds for &lt;code&gt;cabal freeze&lt;&#x2F;code&gt;. Stack’s resolver is faster at 0.8 seconds because Stackage snapshots are pre-computed, but you trade resolution speed for version flexibility.&lt;&#x2F;p&gt;
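&lt;p&gt;Dividing the cabal column by the hx column shows how the gap changes with dependency count. The figures below are copied from the table; nothing else is assumed.&lt;&#x2F;p&gt;

```python
# Resolution times in milliseconds, copied from the table above:
# (direct dependencies, hx, cabal)
rows = [
    (10, 5, 120),
    (20, 18, 450),
    (50, 85, 2800),
    (100, 320, 12500),
]

ratios = [round(cabal_ms / hx_ms, 1) for _, hx_ms, cabal_ms in rows]

for (packages, _, _), ratio in zip(rows, ratios):
    print(f"{packages} packages: cabal takes {ratio}x as long as hx")
```

&lt;p&gt;The ratio climbs from 24x at 10 packages to roughly 39x at 100, so the advantage widens as projects grow.&lt;&#x2F;p&gt;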
&lt;h3 id=&quot;error-messages&quot;&gt;Error messages&lt;&#x2F;h3&gt;
&lt;p&gt;Haskell’s reputation for cryptic error messages is partly deserved and partly a tooling problem. GHC type errors can be daunting, but build tool errors are often worse because they mix configuration issues with compilation issues in unhelpful ways. “Could not resolve dependencies” from Cabal tells you almost nothing about which constraint is blocking resolution or what you could change to fix it.&lt;&#x2F;p&gt;
&lt;p&gt;hx uses structured error codes with actionable suggestions:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;E0012: Package &amp;#x27;aeson&amp;#x27; not found in local index

  The package index may be outdated.
  Run: hx index update

  Or add the package explicitly:
  Run: hx add aeson
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;pre&gt;&lt;code&gt;E0020: GHC version mismatch

  Project requires GHC 9.8.2 but 9.6.4 is active.
  Run: hx toolchain install 9.8.2
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Every error has a code, a human-readable explanation, and a concrete command to fix it. &lt;code&gt;hx doctor&lt;&#x2F;code&gt; runs a comprehensive diagnostic of your entire environment, checking GHC, Cabal, HLS, PATH configuration, and project setup, reporting exactly what is wrong and how to fix each issue.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;everything-else&quot;&gt;Everything else&lt;&#x2F;h3&gt;
&lt;p&gt;hx bundles the rest of the development workflow too. &lt;code&gt;hx fmt&lt;&#x2F;code&gt; wraps fourmolu for formatting. &lt;code&gt;hx lint&lt;&#x2F;code&gt; wraps hlint. &lt;code&gt;hx coverage --html --open&lt;&#x2F;code&gt; generates an HTML coverage report and opens it in your browser. &lt;code&gt;hx doc --open&lt;&#x2F;code&gt; builds Haddock documentation and serves it locally. &lt;code&gt;hx watch&lt;&#x2F;code&gt; detects file changes in 15 milliseconds (versus 180 milliseconds for &lt;code&gt;stack build --file-watch&lt;&#x2F;code&gt;) and triggers rebuilds or test runs. &lt;code&gt;hx profile --heap&lt;&#x2F;code&gt; generates heap profiles for memory analysis.&lt;&#x2F;p&gt;
&lt;p&gt;The goal is that you should never need to leave hx to do something with your Haskell project. Not because hx reimplements everything, but because it wraps the best existing tools with a consistent interface and fast orchestration.&lt;&#x2F;p&gt;
&lt;p&gt;There is also a plugin system using Steel, a Scheme dialect, for custom build lifecycle hooks:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;scheme&quot; class=&quot;language-scheme &quot;&gt;&lt;code class=&quot;language-scheme&quot; data-lang=&quot;scheme&quot;&gt;;; .hx&amp;#x2F;plugins&amp;#x2F;check-todos.scm
(define (on-build-success project)
  (when (file-exists? &amp;quot;TODO.md&amp;quot;)
    (warn &amp;quot;Do not forget to update TODO.md&amp;quot;)))

(register-hook &amp;#x27;post-build on-build-success)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Plugins live in &lt;code&gt;.hx&#x2F;plugins&#x2F;&lt;&#x2F;code&gt; and time out after a configurable interval so a misbehaving script cannot stall your build. They hook into pre-build, post-build, pre-test, post-test, and other lifecycle events. Lightweight enough that you can add project-specific automation without maintaining a separate build script.&lt;&#x2F;p&gt;
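&lt;p&gt;The timeout idea is not specific to Steel. As a rough analogy only, here is how a time limit around an external hook command could look in Python. hx runs Steel plugins and this post does not describe how it enforces the limit, so the &lt;code&gt;run_hook&lt;&#x2F;code&gt; helper and both hook commands are hypothetical.&lt;&#x2F;p&gt;

```python
import subprocess
import sys


def run_hook(command, timeout_seconds):
    """Run a hook command, giving up once it exceeds the time limit."""
    try:
        # subprocess.run kills the child process if the timeout expires.
        result = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return "timed out"
    return "ok" if result.returncode == 0 else "failed"


# Hypothetical hooks: one returns immediately, one hangs for a long time.
fast_hook = [sys.executable, "-c", "pass"]
slow_hook = [sys.executable, "-c", "import time; time.sleep(30)"]

fast_status = run_hook(fast_hook, timeout_seconds=10.0)
slow_status = run_hook(slow_hook, timeout_seconds=0.5)
print(fast_status, slow_status)
```

&lt;p&gt;The build continues either way; the slow hook is reported as timed out instead of blocking everything behind it.&lt;&#x2F;p&gt;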
&lt;h3 id=&quot;migration&quot;&gt;Migration&lt;&#x2F;h3&gt;
&lt;p&gt;If you have an existing project, hx can import it:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;shell&quot; class=&quot;language-shell &quot;&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;hx init --from-cabal   # Import from an existing .cabal project
hx init --from-stack   # Import from a Stack project
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;It reads your existing configuration, generates &lt;code&gt;hx.toml&lt;&#x2F;code&gt;, creates a lockfile, and you are running. The &lt;code&gt;.cabal&lt;&#x2F;code&gt; file is preserved for compatibility. hx reads it for package metadata and dependency specifications, but the build configuration and toolchain management move to &lt;code&gt;hx.toml&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-architecture&quot;&gt;The architecture&lt;&#x2F;h2&gt;
&lt;p&gt;hx is structured as a Rust workspace with 14 crates:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;hx workspace architecture&quot;&gt;                         hx-cli
                            |
              +-------------+-------------+
              |             |             |
          hx-core       hx-config      hx-ui
              |             |
    +---------+---------+   |
    |         |         |   |
hx-cabal  hx-solver  hx-lock
    |         |
hx-cache  hx-toolchain
    |
hx-doctor

Separate concerns: hx-plugins, hx-lsp, hx-warnings, hx-telemetry&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Each crate has a single responsibility. &lt;code&gt;hx-solver&lt;&#x2F;code&gt; knows how to resolve dependencies but nothing about building. &lt;code&gt;hx-cabal&lt;&#x2F;code&gt; knows how to invoke Cabal but nothing about configuration. &lt;code&gt;hx-toolchain&lt;&#x2F;code&gt; manages GHC installations but nothing about lockfiles. This separation means you can test the resolver without setting up a build environment, and you can change the build backend without touching the resolver.&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;code&gt;hx-lsp&lt;&#x2F;code&gt; crate is worth calling out. It provides Language Server Protocol support, which means hx can manage HLS (Haskell Language Server) versions matched to your project’s GHC version. When your project uses GHC 9.8.2, hx ensures HLS is compatible. No more “HLS crashed because it was compiled with a different GHC than your project uses.” This is a problem that has frustrated Haskell developers for years, and it is entirely a tooling coordination problem.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-bigger-picture&quot;&gt;The bigger picture&lt;&#x2F;h2&gt;
&lt;p&gt;I built hx because I needed it. But the timing is not accidental.&lt;&#x2F;p&gt;
&lt;p&gt;In &lt;a href=&quot;&#x2F;articles&#x2F;what-programming-languages-become-when-ai-writes-the-code&#x2F;&quot;&gt;The Last Programming Language Might Not Be for Humans&lt;&#x2F;a&gt;, I laid out three futures for programming languages as AI becomes the primary author of code. The first future is explicit languages designed to minimize LLM errors through tight feedback loops. The second is declarative languages where code describes what something is rather than how to compute it, and the type system acts as a proof checker. The third is no language at all, where AI generates machine code directly.&lt;&#x2F;p&gt;
&lt;p&gt;I bet on the second future.&lt;&#x2F;p&gt;
&lt;p&gt;When an LLM writes imperative code, it has to track mutable state across dozens of lines, reason about the order of side effects, and hold implicit language behaviors in context. When it writes Haskell, it expresses a relationship between inputs and outputs, and the compiler verifies that the relationship is consistent. The model does not need to simulate execution step by step. It needs to generate an expression that satisfies type constraints. This is what LLMs are good at. Pattern recognition. Constraint satisfaction. Formal structure.&lt;&#x2F;p&gt;
&lt;p&gt;Consider what happens when an AI generates a Haskell function with a wrong type. The compiler does not produce a vague runtime error three layers deep in a call stack. It produces a precise, localized type error at compile time: “Expected &lt;code&gt;[LogEntry] -&amp;gt; [ErrorSummary]&lt;&#x2F;code&gt;, got &lt;code&gt;[LogEntry] -&amp;gt; [LogEntry]&lt;&#x2F;code&gt;.” The model reads this, adjusts, and re-generates. The feedback loop is tight, but unlike the explicit-language approach, the tightness comes from the type system itself, not from bolted-on contracts. The correctness guarantees are structural, not ceremonial.&lt;&#x2F;p&gt;
&lt;p&gt;This matters even more when you think about code that has to survive time. Procedural code decays. Three years from now, nobody remembers why a function mutates a global variable on line 47. The variable name made sense to whoever wrote it. The mutation order made sense in the context of the original design. But context evaporates. Types do not. A function signature that says &lt;code&gt;Request -&amp;gt; Policy -&amp;gt; Decision&lt;&#x2F;code&gt; is self-documenting in a way that no amount of comments on imperative code can match. The proof is in the types, and the types are checked by the compiler, not by human memory.&lt;&#x2F;p&gt;
&lt;p&gt;But none of that matters if nobody can set up a Haskell project without losing thirty minutes to toolchain configuration. The language’s virtues are locked behind a tooling wall. You can have the most expressive type system in production use, the most rigorous correctness guarantees, the best theoretical fit for agent-assisted development, and it means nothing if a developer’s first experience is fighting &lt;code&gt;ghcup&lt;&#x2F;code&gt; for half an hour. First impressions are permanent, and Haskell’s first impression has been “powerful but painful” for too long.&lt;&#x2F;p&gt;
&lt;p&gt;If Haskell is going to be relevant in a world where AI writes most of the code, the experience of using Haskell has to be as fast and frictionless as the experience of using Rust or Python. Not comparable. Equal. That is what hx is for. To remove the tooling objection entirely, so the conversation can be about the language’s actual strengths instead of its ecosystem’s historical baggage.&lt;&#x2F;p&gt;
&lt;p&gt;hx is the first step. &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;arcanist.sh&#x2F;bhc&#x2F;&quot;&gt;BHC&lt;&#x2F;a&gt;, the Basel Haskell Compiler, goes further. GHC is a remarkable piece of engineering, but it was designed for a world where Haskell ran on desktops and servers with one performance profile. BHC is a clean-slate Haskell compiler, also written in Rust, offering six runtime profiles for different deployment targets, among them:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Server&lt;&#x2F;strong&gt;: structured concurrency with automatic cancellation, observability hooks, deadline-aware scheduling&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Numeric&lt;&#x2F;strong&gt;: strict-by-default in hot paths, tensor lowering, SIMD auto-vectorization, GPU backends for CUDA and ROCm&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Edge&lt;&#x2F;strong&gt;: minimal runtime footprint, direct WASM emission, designed for Cloudflare Workers and Fastly Compute&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Realtime&lt;&#x2F;strong&gt;: bounded GC pauses under 1 millisecond, arena allocation, designed for games and audio processing&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Embedded&lt;&#x2F;strong&gt;: no GC at all, static allocation, bare-metal targets like ARM Cortex-M&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Same language. Same type safety. Different performance contracts depending on what you are building. Your security policy engine compiles with the server profile. Your tensor pipeline compiles with the numeric profile and runs on a GPU. Your edge function compiles to WASM. You do not change your source code. You change the compiler flag.&lt;&#x2F;p&gt;
&lt;p&gt;hx already supports BHC as an alternative backend:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;shell&quot; class=&quot;language-shell &quot;&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;hx build --compiler=bhc --profile=server
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;One flag. Same project. Different runtime.&lt;&#x2F;p&gt;
&lt;p&gt;The vision behind &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;arcanist.sh&quot;&gt;arcanist.sh&lt;&#x2F;a&gt; is that Haskell’s ideas deserve infrastructure that matches their ambition. The language has always been decades ahead of its tooling. hx closes the gap on the developer experience side. BHC closes it on the runtime side. Together, they make the case that Haskell is not a language for academics and hobbyists. It is a language for the era we are entering, where correctness is not a luxury, it is the load-bearing structure of software that AI writes and humans verify.&lt;&#x2F;p&gt;
&lt;p&gt;The tooling is not separate from the thesis. The tooling IS the thesis.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-bet-what-if-i-am-wrong&quot;&gt;The bet: what if I am wrong&lt;&#x2F;h2&gt;
&lt;p&gt;This is a gamble.&lt;&#x2F;p&gt;
&lt;p&gt;I do not know whether Haskell will go through a revival. Nobody knows how AI-assisted development will actually evolve, which languages will matter in five years, or whether the thesis I outlined in the previous post will hold up against what reality delivers. I have a conviction, not a crystal ball.&lt;&#x2F;p&gt;
&lt;p&gt;I spent months building hx and BHC. Months of my own time, and to be perfectly blunt, a significant number of Anthropic’s Claude tokens. I pair-programmed most of this with Claude Code on my Max subscription, and that is not a footnote. It is part of the story. The tools I am building for AI-assisted Haskell development were themselves built using AI-assisted development. If that sounds circular, it is. The thesis tested itself during its own construction.&lt;&#x2F;p&gt;
&lt;p&gt;But I could be wrong. Haskell could remain niche forever. The AI era could favor a language nobody has thought of yet. The intermediate layer might not evolve the way I expect. The industry might double down on Python and TypeScript for agent-assisted workflows and never look back. These are all plausible outcomes.&lt;&#x2F;p&gt;
&lt;p&gt;So I build toward what I believe in and put the work out in the open. If I am right, Haskell gets the tooling it always deserved, and the language is ready when the moment arrives. If I am wrong, the ideas in hx and BHC (fast Rust-based tooling, deterministic lockfiles, multiple runtime profiles, structured error messages) are valuable regardless. Good infrastructure design does not expire just because the language it serves does not win the popularity contest.&lt;&#x2F;p&gt;
&lt;p&gt;And honestly, even if the bet does not pay off, I would rather have tried and been wrong than watched from the sidelines while the most elegant language I have ever used slowly faded because nobody bothered to fix the parts that were not the language.&lt;&#x2F;p&gt;
&lt;p&gt;At least I have tried.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;try-it&quot;&gt;Try it&lt;&#x2F;h2&gt;
&lt;pre data-lang=&quot;shell&quot; class=&quot;language-shell &quot;&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;curl -fsSL https:&amp;#x2F;&amp;#x2F;arcanist.sh&amp;#x2F;hx&amp;#x2F;install.sh | sh
hx new my-app &amp;amp;&amp;amp; cd my-app
hx run
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Then try &lt;code&gt;hx doctor&lt;&#x2F;code&gt;, &lt;code&gt;hx fmt&lt;&#x2F;code&gt;, &lt;code&gt;hx test --watch&lt;&#x2F;code&gt;. See how it feels when the tooling gets out of your way.&lt;&#x2F;p&gt;
&lt;p&gt;hx is MIT-licensed and open source. If you have opinions about Haskell tooling, I want to hear them.&lt;&#x2F;p&gt;
</description>
      </item>
      <item>
          <title>The Last Programming Language Might Not Be for Humans</title>
          <pubDate>Sat, 11 Apr 2026 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/what-programming-languages-become-when-ai-writes-the-code/</link>
          <guid>https://raskell.io/articles/what-programming-languages-become-when-ai-writes-the-code/</guid>
          <description xml:base="https://raskell.io/articles/what-programming-languages-become-when-ai-writes-the-code/">&lt;p&gt;This morning I was standing at my desk, drinking watered-down instant coffee, doing what I do every morning after triaging the high-alert emails and notifications: thirty minutes of HackerNews. It is a ritual I time-box and never skip. I go to the office every day, and whether I am at that desk or at my home desk, the morning is the same. Coffee, posture, front page.&lt;&#x2F;p&gt;
&lt;p&gt;Hacker News remains one of the best ways to keep a finger on the pulse of the Bay Area, of tech, of science, of whatever intellectually stimulating thought surfaced overnight. I follow a handful of curated newsletters too, but I have noticed over the years that HN covers most of their content anyway if you know how to filter high signal from low signal. I could write something about a different link every single day. Most mornings I resist. This morning I did not.&lt;&#x2F;p&gt;
&lt;p&gt;A link caught my eye. &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;veralang.dev&#x2F;&quot;&gt;Vera&lt;&#x2F;a&gt;, a new programming language “designed for machines to write, not humans.” Statically typed, purely functional, compiles to WebAssembly, uses Microsoft’s Z3 solver for contract verification. It has a ferret mascot. I like animal mascots for tech projects. Ferris the crab for Rust, the gopher for Go, the Shisa guardian dog for &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;zentinelproxy.io&#x2F;&quot;&gt;Zentinel&lt;&#x2F;a&gt;. The ferret is a good choice.&lt;&#x2F;p&gt;
&lt;p&gt;But the mascot is not why I stopped scrolling. I stopped because somebody else had arrived at the same conclusion I had reached back in December: that conventional programming languages are not adapted to the capabilities of this new technology, and that something has to change.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-christmas-realization&quot;&gt;The Christmas realization&lt;&#x2F;h2&gt;
&lt;p&gt;I had been a paying Claude Code subscriber since May 2025, when Anthropic made it generally available. The CLI orientation made sense to me immediately, even though the early rate limits and model quality left me wanting more. By December 2025, I had upgraded to the Max subscription, I was on vacation, and Anthropic had made the daily limits generous that month. I was burning through every idea I had accumulated over the years. Some were good. Some were terrible. All of them were finally testable in a way they had not been before, because I could pair-program with a model that kept up. I wrote about that shift more fully in &lt;a href=&quot;&#x2F;articles&#x2F;how-i-work-these-days&#x2F;&quot;&gt;How I Work These Days&lt;&#x2F;a&gt;, the short version being that late 2025 was when the relationship between ambition and execution fundamentally changed for me. The dam broke. Ideas that had been sitting in notebooks for years started becoming real software in days.&lt;&#x2F;p&gt;
&lt;p&gt;It was during one of those late-night sessions, deep in a Claude Code conversation about compiler design, that a thought crystallized. I had been building &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;arcanist.sh&#x2F;hx&#x2F;&quot;&gt;hx&lt;&#x2F;a&gt; and thinking about how AI would change the way people write Haskell, when I realized the question was bigger than Haskell. Agent-assisted software development is approaching a point where the output language itself, the intermediate layer we use to express how information should be processed, is going to change fundamentally.&lt;&#x2F;p&gt;
&lt;p&gt;This is not an abstract observation. The history of software is a history of abstraction layers accumulating, each one letting the next generation of programmers ignore what the previous generation had to master.&lt;&#x2F;p&gt;
&lt;p&gt;It started with machine code. Raw opcodes, different for every CPU architecture. If you wanted to write software, you memorized instruction sets and wrote them by hand. People published thick reference manuals, and there were engineers who could hold entire instruction set architectures in their heads. Some of them are still around, and some of them still swear by that methodology.&lt;&#x2F;p&gt;
&lt;p&gt;Then assemblers gave those opcodes human-readable names. &lt;code&gt;MOV AX, BX&lt;&#x2F;code&gt; instead of &lt;code&gt;89 D8&lt;&#x2F;code&gt;. You were still writing for a specific architecture, still thinking in registers, but now you could read what you wrote. The first abstraction was not a new capability. It was legibility.&lt;&#x2F;p&gt;
&lt;p&gt;Then C arrived and gave us a portable abstraction over hardware. You stopped thinking in registers and started thinking in functions and pointers. C compiled down to architecture-specific assembly, but you did not have to care which architecture. One language, many targets. The reference manuals shifted from instruction sets to language specifications.&lt;&#x2F;p&gt;
&lt;p&gt;Then interpreters and virtual machines added another layer. The JVM, Python, Perl. You stopped thinking about memory layout and started thinking about objects, iterators, garbage collection. The abstraction was thicker, the feedback loop was faster, and the audience expanded from hardware engineers to anyone who could write a script.&lt;&#x2F;p&gt;
&lt;p&gt;Then IDEs changed how you interacted with the language itself. Syntax highlighting, autocomplete, integrated debuggers, refactoring tools. You stopped holding the entire API surface in your head because the editor held it for you. The language did not change, but the cognitive cost of using it dropped. IntelliSense was not a language feature. It was an abstraction over the programmer’s memory.&lt;&#x2F;p&gt;
&lt;p&gt;Then the internet changed how code moved. Open source repositories, package managers, shared libraries. You stopped writing everything from scratch and started composing from parts other people had built. The reference material moved from printed books to web documentation, wikis, tutorials.&lt;&#x2F;p&gt;
&lt;p&gt;Then Stack Overflow changed how people learned. Instead of reading manuals cover to cover, you searched for the specific problem you had and found someone who had already solved it. The knowledge layer itself became an abstraction. You did not need to understand the full system. You needed to find the right answer and adapt it to your context. Stack Overflow never compiled a line of code, but it was an intermediate layer between human confusion and working software, and it was arguably more important to the average developer’s productivity than any language feature shipped in the same decade.&lt;&#x2F;p&gt;
&lt;p&gt;And now Stack Overflow is receding. Not because the answers got worse. Because the next abstraction layer arrived. AI coding agents do what Stack Overflow did, finding known solutions to known problems, but they also do what Stack Overflow never could: synthesize novel solutions, hold project context across files, and generate working code from intent descriptions. The pattern is the same as every previous transition. Each layer makes the previous one less essential, not by replacing it but by absorbing it into a higher-level abstraction.&lt;&#x2F;p&gt;
&lt;p&gt;Programming languages have always been shaped by who writes them. Assembly was shaped by hardware engineers who thought in registers and opcodes. C was shaped by systems programmers who needed portable abstractions over memory. Python was shaped by people who wanted to get things done without fighting the syntax. COBOL was shaped by business analysts who wanted code that read like English. Every language carries the fingerprints of its intended author. If AI becomes the primary author of code, it follows that the language should adapt to that new author’s strengths and weaknesses. This is not a break from the pattern. It is the pattern, doing what it has always done.&lt;&#x2F;p&gt;
&lt;p&gt;I kept coming back to three concrete possibilities. Three ways the intermediate layer could evolve. Not competing visions, exactly. More like three points on a timeline.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;make-the-language-explicit-enough-for-machines&quot;&gt;Make the language explicit enough for machines&lt;&#x2F;h2&gt;
&lt;p&gt;Anyone who has spent real time with coding agents has seen the failure mode. You ask an LLM to write a Python function. The output looks plausible. It passes linting. The variable names are reasonable. The structure follows common patterns. Then it fails at runtime because of a subtle implicit behavior the model did not track. A default mutable argument that gets shared across calls. A generator that gets silently exhausted on second iteration. A method that returns &lt;code&gt;None&lt;&#x2F;code&gt; instead of raising an exception because some library author decided that was more “Pythonic.” The model was not wrong about the algorithm. It was wrong about the language’s hidden behaviors.&lt;&#x2F;p&gt;
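&lt;p&gt;The three hidden behaviors named above fit in a few lines. This is a minimal sketch: &lt;code&gt;append_item&lt;&#x2F;code&gt; and &lt;code&gt;squares&lt;&#x2F;code&gt; are illustrative names, and &lt;code&gt;dict.get&lt;&#x2F;code&gt; stands in for the “returns &lt;code&gt;None&lt;&#x2F;code&gt; instead of raising” case. Nothing here trips a linter, and every line runs without an error.&lt;&#x2F;p&gt;

```python
def append_item(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # and then shared by every call that omits `bucket`.
    bucket.append(item)
    return bucket


first = append_item("a")
second = append_item("b")  # same list object as `first`, now holding both items


def squares(n):
    # A generator expression can be consumed only once.
    return (i * i for i in range(n))


gen = squares(3)
first_pass = list(gen)
second_pass = list(gen)  # already exhausted, so this is silently empty

# dict.get returns None for a missing key instead of raising KeyError.
lookup = {"x": 1}.get("y")
```

&lt;p&gt;After this runs, &lt;code&gt;first&lt;&#x2F;code&gt; and &lt;code&gt;second&lt;&#x2F;code&gt; are the same list, &lt;code&gt;second_pass&lt;&#x2F;code&gt; is empty, and &lt;code&gt;lookup&lt;&#x2F;code&gt; is &lt;code&gt;None&lt;&#x2F;code&gt;. The interpreter does not complain about any of it.&lt;&#x2F;p&gt;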
&lt;p&gt;This is the problem Vera is trying to solve, and the first approach I had been contemplating. Take the simplicity and explicitness of Go, push it further, and design a language where every instruction, every method, every design pattern is as unambiguous as possible. No implicit behaviors. No naming ambiguity. No style choices. One canonical way to write everything, so that an LLM does not have to waste inference tokens reasoning about which of seventeen valid approaches to take.&lt;&#x2F;p&gt;
&lt;p&gt;The language would need excellent compiler diagnostics. Not just “type mismatch on line 47,” but structured feedback that a model can parse, understand, and act on immediately. Rust and Elixir already do this well for humans. Do it even better, and do it for machines.&lt;&#x2F;p&gt;
&lt;p&gt;The pipeline looks like this:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;Explicit language feedback loop&quot;&gt;+-------+     prompt      +-------+    explicit     +-----------+
| Human | -------------&amp;gt;  |  LLM  | -------------&amp;gt;  | Verifying |
+-------+                 +-------+    source       | Compiler  |
                             ^                      +-----------+
                             |                           |
                             |   structured error        |
                             |   + suggested fix         |
                             +---------------------------+
                                                         |
                                                    verified
                                                         |
                                                         v
                                                  [correct program]&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The key insight is the feedback loop. The compiler does not just reject bad code. It explains what is wrong in terms the model can act on, with a concrete fix suggestion. The model re-generates. The compiler re-checks. You converge on correct code through iteration, and the tightness of that loop depends on how unambiguous the language is and how actionable the errors are.&lt;&#x2F;p&gt;
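&lt;p&gt;The shape of that loop can be sketched in a few lines of Python. Everything below is hypothetical: &lt;code&gt;Diagnostic&lt;&#x2F;code&gt;, &lt;code&gt;toy_check&lt;&#x2F;code&gt;, &lt;code&gt;toy_model&lt;&#x2F;code&gt;, and &lt;code&gt;converge&lt;&#x2F;code&gt; are stand-ins invented for illustration, not the interface of any real compiler or model. The “compiler” only knows one rule and the “model” only knows how to apply a suggested fix, which is enough to show the control flow.&lt;&#x2F;p&gt;

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Diagnostic:
    # Hypothetical structured error: a machine-readable code plus a concrete fix.
    code: str
    message: str
    suggested_fix: str


def toy_check(source: str) -> Optional[Diagnostic]:
    # Stand-in for a verifying compiler: rejects a division with no precondition.
    if "/" in source and "requires(" not in source:
        return Diagnostic(
            code="E_MISSING_PRECONDITION",
            message="division without a precondition on the divisor",
            suggested_fix="requires(b != 0)",
        )
    return None


def toy_model(prompt: str, feedback: Optional[Diagnostic]) -> str:
    # Stand-in for an LLM: emits a naive draft, then applies the suggested fix.
    draft = "fn divide(a, b) { a / b }"
    if feedback is None:
        return draft
    return feedback.suggested_fix + " " + draft


def converge(prompt, model, check, max_rounds=3):
    # Generate, check, feed the diagnostic back, repeat until the check passes.
    feedback = None
    for round_number in range(1, max_rounds + 1):
        source = model(prompt, feedback)
        feedback = check(source)
        if feedback is None:
            return source, round_number
    raise RuntimeError("did not converge within max_rounds")


verified_source, rounds = converge("divide two integers", toy_model, toy_check)
```

&lt;p&gt;With these stubs the loop converges on the second round. The interesting variable in a real system is how many rounds it takes, and that depends on how directly the diagnostic maps to an edit.&lt;&#x2F;p&gt;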
&lt;p&gt;Vera is exactly this idea, executed with conviction. Here is what a function looks like:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot;&gt;&lt;code data-lang=&quot;vera&quot;&gt;public fn safe_divide(@Int, @Int -&amp;gt; @Int)
  requires(@Int.1 != 0)
  ensures(@Int.result == @Int.0 &amp;#x2F; @Int.1)
  effects(pure)
{
  @Int.0 &amp;#x2F; @Int.1
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;No variable names at all. Parameters are referenced by type and positional index using De Bruijn slot notation. &lt;code&gt;@Int.0&lt;&#x2F;code&gt; is the most recently bound integer, &lt;code&gt;@Int.1&lt;&#x2F;code&gt; is the one before that. Every function must declare its preconditions (&lt;code&gt;requires&lt;&#x2F;code&gt;), postconditions (&lt;code&gt;ensures&lt;&#x2F;code&gt;), and side effects (&lt;code&gt;effects&lt;&#x2F;code&gt;). The compiler verifies contracts statically using Z3 where possible and falls back to runtime checks for what it cannot decide at compile time.&lt;&#x2F;p&gt;
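&lt;p&gt;To make the slot rule concrete, here is a small Python sketch of how a reference like &lt;code&gt;@Int.1&lt;&#x2F;code&gt; could be resolved. It follows only the description in the paragraph above. The &lt;code&gt;resolve&lt;&#x2F;code&gt; function and the list-of-pairs scope are assumptions made for illustration, not how the Vera compiler is implemented.&lt;&#x2F;p&gt;

```python
def resolve(bindings, type_name, index):
    # bindings: (type_name, value) pairs in binding order, oldest first.
    # Index 0 is the most recent binding of that type, 1 the one before it.
    same_type = [value for name, value in bindings if name == type_name]
    if index not in range(len(same_type)):
        raise LookupError(f"no @{type_name}.{index} in scope")
    return same_type[-1 - index]


# Three bindings, in the order they were introduced.
scope = [("Int", 10), ("String", "label"), ("Int", 2)]

newest_int = resolve(scope, "Int", 0)      # the Int bound last
older_int = resolve(scope, "Int", 1)       # the Int bound before that
only_string = resolve(scope, "String", 0)  # slots are counted per type
```

&lt;p&gt;Here &lt;code&gt;newest_int&lt;&#x2F;code&gt; is 2 and &lt;code&gt;older_int&lt;&#x2F;code&gt; is 10. The &lt;code&gt;String&lt;&#x2F;code&gt; binding in the middle does not shift the &lt;code&gt;Int&lt;&#x2F;code&gt; indices, because each type has its own stack of slots.&lt;&#x2F;p&gt;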
&lt;p&gt;The design principle is sharp: the model does not need to be right, it needs to be checkable. The language constrains the space of valid programs so tightly that the compiler catches mistakes before execution and explains them in natural language. Division by zero is not a runtime exception. It is a contract violation caught during compilation.&lt;&#x2F;p&gt;
&lt;p&gt;Think of it like this. Traditional languages are an open field. You can walk in any direction, and you might end up somewhere useful or you might walk off a cliff. Vera is a guided path with guardrails. You can only go certain directions, and every time you try to step off the path, a sign tells you exactly where to step instead. An LLM on an open field will wander. An LLM on a guided path will converge.&lt;&#x2F;p&gt;
&lt;p&gt;The early benchmark results are interesting, if mixed. Kimi K2.5 apparently writes perfect Vera code, scoring 100% on VeraBench and beating its own Python and TypeScript scores. Other models do not fare as well. Claude Opus 4 scores 88% in Vera versus 96% in Python. The Vera team is honest about this variance, which I appreciate. The &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=47696263&quot;&gt;Hacker News discussion&lt;&#x2F;a&gt; was thin, twelve points and three skeptical comments, which tells you how early this conversation still is. The broader developer community has not engaged with this idea seriously yet. That will change.&lt;&#x2F;p&gt;
&lt;p&gt;But there is something that bothers me about optimizing the language for how machines iterate on solutions. Look at the pipeline diagram again. The LLM is still generating step-by-step instructions. It is still describing HOW to do things, just with the ambiguity stripped out and the contracts made explicit. The machine still reasons through the process sequentially. You have reduced the noise in the feedback loop, but you have not changed the nature of the signal. The model is still writing recipes. They are just more precise recipes.&lt;&#x2F;p&gt;
&lt;p&gt;What if you stopped writing recipes entirely?&lt;&#x2F;p&gt;
&lt;h2 id=&quot;describe-what-not-how&quot;&gt;Describe what, not how&lt;&#x2F;h2&gt;
&lt;p&gt;The second idea starts from a different premise. Instead of making procedural code easier for AI to write correctly, change what you ask the AI to express.&lt;&#x2F;p&gt;
&lt;p&gt;Let me make this concrete. Say you need to find the ten most recent server errors in a log. Here is how you would describe that process in a procedural language:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;def recent_errors(log_entries):
    errors = []
    for entry in log_entries:
        if entry.status &amp;gt;= 500:
            errors.append({
                &amp;quot;time&amp;quot;: entry.timestamp,
                &amp;quot;path&amp;quot;: entry.path,
                &amp;quot;code&amp;quot;: entry.status
            })
    errors.sort(key=lambda e: e[&amp;quot;time&amp;quot;], reverse=True)
    return errors[:10]
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;You are telling the machine: create an empty list. Walk through each entry. Check a condition. If it matches, build a dictionary and append it. Then sort the accumulated list by a key. Then take the first ten elements. Every step is an instruction. The machine has to track the mutable list, the iteration state, the sort, the slice. And if you get any step wrong, the others might still succeed, producing output that looks correct but is subtly broken. A missing &lt;code&gt;reverse=True&lt;&#x2F;code&gt; and you silently get the oldest errors instead of the most recent.&lt;&#x2F;p&gt;
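&lt;p&gt;That last failure is easy to reproduce. The sketch below uses a hypothetical &lt;code&gt;Entry&lt;&#x2F;code&gt; record and a &lt;code&gt;top_errors&lt;&#x2F;code&gt; helper, both invented for this illustration, with the sort direction exposed as a flag so the right and wrong versions can run side by side.&lt;&#x2F;p&gt;

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical log record with only the fields the example needs.
    timestamp: int
    path: str
    status: int


def top_errors(entries, limit, newest_first):
    # Keep server errors, sort by time, return the timestamps of the first `limit`.
    errors = [e for e in entries if e.status >= 500]
    errors.sort(key=lambda e: e.timestamp, reverse=newest_first)
    return [e.timestamp for e in errors[:limit]]


log = [
    Entry(1, "/a", 500),
    Entry(2, "/b", 200),
    Entry(3, "/c", 503),
    Entry(4, "/d", 502),
]

correct = top_errors(log, 2, newest_first=True)   # the two most recent errors
buggy = top_errors(log, 2, newest_first=False)    # the two oldest errors
```

&lt;p&gt;Both calls return a list of two timestamps. &lt;code&gt;correct&lt;&#x2F;code&gt; holds 4 and 3, &lt;code&gt;buggy&lt;&#x2F;code&gt; holds 1 and 3, and neither raises or warns. Nothing in the procedural version of the log function above guards against that slip.&lt;&#x2F;p&gt;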
&lt;p&gt;Here is the same thing in Haskell:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;haskell&quot; class=&quot;language-haskell &quot;&gt;&lt;code class=&quot;language-haskell&quot; data-lang=&quot;haskell&quot;&gt;recentErrors :: [LogEntry] -&amp;gt; [ErrorSummary]
recentErrors =
    take 10
  . sortBy (flip compare `on` time)
  . map toSummary
  . filter isServerError
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Read it bottom to top: filter server errors, transform each into a summary, sort by time descending, take ten. There is no mutable accumulator. No loop variable. No intermediate state. You are not describing a process. You are describing a relationship between the input and the output. The function says what the result IS, not how to compute it step by step.&lt;&#x2F;p&gt;
&lt;p&gt;The type signature at the top, &lt;code&gt;[LogEntry] -&amp;gt; [ErrorSummary]&lt;&#x2F;code&gt;, is a contract the compiler enforces. If &lt;code&gt;toSummary&lt;&#x2F;code&gt; returns the wrong type, if &lt;code&gt;isServerError&lt;&#x2F;code&gt; does not take a &lt;code&gt;LogEntry&lt;&#x2F;code&gt;, if you accidentally compose functions in an order that does not type-check, the compiler rejects the program before it runs. Not with a vague “object has no attribute” at runtime. With a precise type error at compile time that tells you exactly which piece does not fit.&lt;&#x2F;p&gt;
&lt;p&gt;This distinction matters enormously for AI. Think about what an LLM actually has to track in each case:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;Procedural vs declarative complexity&quot;&gt;Procedural (Vera, Python, Go)          Declarative (Haskell)
================================       ================================
- mutable variables and their          - input type
  current state at each step           - output type
- loop iteration progress              - which transformations to compose
- conditional branching outcomes       - whether types align
- order of side effects
- implicit language behaviors
- names and what they refer to&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The procedural model asks the AI to simulate execution in its head. The declarative model asks the AI to describe a transformation and let the compiler verify it. One plays to an LLM’s weakness (tracking state across many steps). The other plays to its strength (recognizing and generating patterns that satisfy formal constraints).&lt;&#x2F;p&gt;
&lt;p&gt;The pipeline changes fundamentally:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;Type-driven proof pipeline&quot;&gt;+-------+     prompt      +-------+   type sigs      +-----------+
| Human | -------------&amp;gt;  |  LLM  | -------------&amp;gt;    | Compiler  |
+-------+                 +-------+   pure exprs      +-----------+
                                                           |
                                                      types align?
                                                        &amp;#x2F;      \
                                                      yes       no
                                                      |          |
                                                      v          v
                                              [proven correct]  [precise type error:
                                                                 &amp;quot;Expected LogEntry,
                                                                  got String at ...&amp;quot;]&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;No feedback loop needed in the happy path. If the types align, the program is correct by construction for the properties the type system tracks. The compiler is not iterating with the model. It is checking a proof.&lt;&#x2F;p&gt;
&lt;p&gt;This is the approach I bet on. It is the reason this blog exists.&lt;&#x2F;p&gt;
&lt;p&gt;Raskell, the name behind this site, is a portmanteau of Rascal (as in raccoon, which is my mascot) and Haskell. The language I have loved for years because of how elegantly it describes things close to mathematical proofs. QEDs, not TODOs. When you write a well-typed Haskell function, you are not just writing code. You are writing a proof that a certain transformation is valid, and the compiler is the proof checker.&lt;&#x2F;p&gt;
&lt;p&gt;But loving Haskell and shipping Haskell in production are different experiences, and the gap between them is mostly tooling. The ecosystem is fragmented in a way that has frustrated people for over a decade. &lt;code&gt;cabal&lt;&#x2F;code&gt;, &lt;code&gt;stack&lt;&#x2F;code&gt;, &lt;code&gt;ghcup&lt;&#x2F;code&gt;. Three tools that do overlapping jobs with different opinions about how dependencies should work. If you come from Python, imagine if &lt;code&gt;pip&lt;&#x2F;code&gt;, &lt;code&gt;poetry&lt;&#x2F;code&gt;, and &lt;code&gt;pyenv&lt;&#x2F;code&gt; were all developed independently, with different lockfile formats, different resolver algorithms, and occasional incompatibilities. That was Haskell. Build times were slow. Error messages ranged from helpful to cryptic. The runtime assumed one performance profile fits every use case. If you wanted Haskell for edge functions, for embedded systems, for GPU-accelerated numerics, you spent as much time fighting the toolchain as writing the actual code.&lt;&#x2F;p&gt;
&lt;p&gt;The language was right. The surrounding infrastructure was not.&lt;&#x2F;p&gt;
&lt;p&gt;So I started building &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;arcanist.sh&quot;&gt;arcanist.sh&lt;&#x2F;a&gt;, taking the same approach that &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;astral.sh&quot;&gt;astral.sh&lt;&#x2F;a&gt; brought to Python tooling. When astral.sh released &lt;code&gt;uv&lt;&#x2F;code&gt; and &lt;code&gt;ruff&lt;&#x2F;code&gt;, it showed that you could take a mature ecosystem with entrenched tooling, rebuild the developer experience from scratch in Rust, and make everything dramatically faster and more coherent. I wanted to do the same for Haskell. arcanist.sh houses two projects.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;arcanist.sh&#x2F;hx&#x2F;&quot;&gt;hx&lt;&#x2F;a&gt; is a fast, opinionated, next-gen toolchain for Haskell, built in Rust. One tool that replaces the fragmented stack. Managed compiler versions pinned per-project. Deterministic TOML lockfiles with fingerprint verification. 5.6x faster cold builds than cabal. 7.8x faster incremental rebuilds.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;shell&quot; class=&quot;language-shell &quot;&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;curl -fsSL https:&amp;#x2F;&amp;#x2F;arcanist.sh&amp;#x2F;install.sh | sh
hx new my-app &amp;amp;&amp;amp; cd my-app
hx run
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;No ghcup. No stack. No cabal-install. Just &lt;code&gt;hx&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;arcanist.sh&#x2F;bhc&#x2F;&quot;&gt;BHC&lt;&#x2F;a&gt;, the Basel Haskell Compiler, goes further. It is a clean-slate Haskell compiler written in Rust, not a GHC fork, targeting the Haskell 2026 Platform specification. It uses LLVM for native code generation and offers six runtime profiles that you select at compile time:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Profile&lt;&#x2F;th&gt;&lt;th&gt;Designed for&lt;&#x2F;th&gt;&lt;th&gt;Key trait&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;default&lt;&#x2F;td&gt;&lt;td&gt;General applications&lt;&#x2F;td&gt;&lt;td&gt;Lazy evaluation, GC-managed, GHC-compatible semantics&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;server&lt;&#x2F;td&gt;&lt;td&gt;Backend services&lt;&#x2F;td&gt;&lt;td&gt;Structured concurrency, automatic cancellation, observability hooks&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;numeric&lt;&#x2F;td&gt;&lt;td&gt;ML and scientific compute&lt;&#x2F;td&gt;&lt;td&gt;Strict numerics, tensor lowering, SIMD, GPU backends (CUDA&#x2F;ROCm)&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;edge&lt;&#x2F;td&gt;&lt;td&gt;WASM and CDN workers&lt;&#x2F;td&gt;&lt;td&gt;Minimal footprint, direct WASM emission without LLVM&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;realtime&lt;&#x2F;td&gt;&lt;td&gt;Games, audio, robotics&lt;&#x2F;td&gt;&lt;td&gt;Bounded GC pauses under 1ms, arena allocation&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;embedded&lt;&#x2F;td&gt;&lt;td&gt;Bare metal, microcontrollers&lt;&#x2F;td&gt;&lt;td&gt;No GC at all, static allocation, targets like ARM Cortex-M&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;The same Haskell source, different runtime contracts depending on what you are building. Your security policy engine compiles with the server profile and gets structured concurrency with tracing. Your tensor pipeline compiles with the numeric profile and gets GPU acceleration. Your edge function compiles to WASM and runs on Cloudflare Workers. Same language. Same type safety. Different performance envelopes.&lt;&#x2F;p&gt;
&lt;p&gt;The conviction behind this work is specific. When AI writes the code, the language that survives is not the one optimized for procedural explicitness. It is the one that brings consistency and purity by describing what something is, not how to compute it. AI is extraordinarily good at generating expressions that satisfy formal constraints. And source code that reads like a proof can survive time. It can survive maintainer burnout. It can survive the fact that three years from now, nobody remembers why a function was written the way it was. The types remember. The proof is self-documenting in a way that procedural code never is, because the types encode the intent.&lt;&#x2F;p&gt;
&lt;p&gt;Vera and arcanist.sh accept the same premise: the intermediate layer is changing. They disagree about which direction it should change in. Vera optimizes for reducing errors in the generation loop. Valuable. But hx and BHC optimize for making the generated code correct by construction, because the language itself constrains what valid programs look like at a structural level.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;skip-the-language-entirely&quot;&gt;Skip the language entirely&lt;&#x2F;h2&gt;
&lt;p&gt;The third possibility is the one I think about late at night and do not have a concrete project for. Not yet. It is also the one I find most fascinating and most unsettling.&lt;&#x2F;p&gt;
&lt;p&gt;What if AI stops writing source code at all?&lt;&#x2F;p&gt;
&lt;p&gt;To understand why this is plausible, it helps to look at what source code actually is. It is not the final product. It never was. Source code is a set of instructions that a compiler transforms into machine code. Machine code is what the hardware executes. Source code exists because humans needed an abstraction layer between their intentions and the silicon. We think in concepts like “sort this list” or “reject unauthorized requests.” The CPU thinks in register moves, memory loads, and conditional jumps. Source code bridges that gap.&lt;&#x2F;p&gt;
&lt;p&gt;But that bridge was built for human authors. If the author is no longer human, the bridge serves a different purpose. It becomes an audit trail. A way for humans to read and verify what the AI produced. Not an authoring medium, but a transparency layer.&lt;&#x2F;p&gt;
&lt;p&gt;I see this playing out in two distinct phases.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;phase-one-ai-targets-existing-machine-code&quot;&gt;Phase one: AI targets existing machine code&lt;&#x2F;h3&gt;
&lt;p&gt;The first phase is closer than most people think, and it is conceptually straightforward. We already train models on source code. What happens when we also train them extensively on compiled artifacts? On binaries, object files, intermediate representations, the actual output of compilers?&lt;&#x2F;p&gt;
&lt;p&gt;Consider what a compiler does. It takes source code and transforms it into machine instructions following well-defined, deterministic rules. There is a mapping between source patterns and output patterns. A &lt;code&gt;for&lt;&#x2F;code&gt; loop in C becomes a specific sequence of compare, branch, and increment instructions on x86. A function call follows a specific calling convention. Memory allocation follows specific system call patterns. These mappings are learnable. They are patterns, and pattern recognition is exactly what LLMs excel at.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;Skipping the source layer&quot;&gt;Today:
human intent → prompt → LLM → source code → compiler → x86&amp;#x2F;ARM → CPU

Phase one:
human intent → prompt → LLM → x86&amp;#x2F;ARM directly → CPU
                                (trained on source +
                                 compiled artifact pairs)&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;In this phase, the model skips the source code layer and generates machine code directly. Not by “compiling” in the traditional sense. By having learned the patterns well enough to produce valid executables from intent descriptions. The way a fluent translator does not parse grammar rules consciously but produces correct sentences from meaning directly.&lt;&#x2F;p&gt;
&lt;p&gt;This sounds radical until you remember that we already trust compilers whose output we never read. When was the last time you inspected the assembly output of &lt;code&gt;gcc -O3&lt;&#x2F;code&gt; to verify it correctly compiled your C program? You trust the compiler. You test the behavior of the resulting binary. You do not audit the intermediate representation. If an AI can produce binaries that pass the same behavioral tests, the practical difference between “AI-generated machine code” and “compiler-generated machine code” becomes a question of trust calibration, not fundamental possibility.&lt;&#x2F;p&gt;
&lt;p&gt;The analogy I keep returning to is aviation. Early pilots flew by hand and understood every mechanical system in the aircraft. Fly-by-wire changed that. The pilot communicates intent (climb, turn, maintain altitude). The computer translates that into control surface movements. The pilot does not manually adjust ailerons and elevators for every gust of wind. They trust the system. They verify outcomes (altitude, heading, airspeed), not intermediate steps. Phase one of post-language programming is fly-by-wire for software.&lt;&#x2F;p&gt;
&lt;p&gt;If this sounds speculative, consider what happened this week. Anthropic announced &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.anthropic.com&#x2F;glasswing&quot;&gt;Project Glasswing&lt;&#x2F;a&gt;, a coalition including AWS, Apple, Google, Microsoft, NVIDIA, CrowdStrike, Palo Alto Networks, the Linux Foundation, and others, formed to secure the world’s most critical software using AI. Dario Amodei, Anthropic’s CEO, put it plainly:&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;“AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities.”&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;The proof point he offered: “For OpenBSD, we found a bug that’s been present for 27 years.”&lt;&#x2F;p&gt;
&lt;p&gt;Think about what that means. OpenBSD is one of the most carefully audited codebases in the world. Decades of security-focused human review by some of the most meticulous systems programmers alive. And an AI model found something that every human reviewer missed for twenty-seven years. If AI can understand existing code deeply enough to find vulnerabilities that humans cannot, it can understand code deeply enough to generate it without human-readable source as an intermediate step. The question is no longer whether AI comprehends code at a structural level. That question was answered this week. The question is what it does with that comprehension next.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;phase-two-a-new-kind-of-machine&quot;&gt;Phase two: a new kind of machine&lt;&#x2F;h3&gt;
&lt;p&gt;The second phase is further out and more speculative. But I think it is where things ultimately go.&lt;&#x2F;p&gt;
&lt;p&gt;If AI is generating code for machines to execute and no human needs to read it, there is no reason that code needs to target instruction sets designed for human comprehension. x86 and ARM were designed with the assumption that someone, at least occasionally, would look at the instructions. They have mnemonics. They follow conventions that make disassembly feasible. They are organized into instructions that map, loosely, to operations humans understand.&lt;&#x2F;p&gt;
&lt;p&gt;But what if the execution target was designed from scratch for AI-generated code? A virtual machine or runtime that consumes a new kind of bytecode. Not optimized for human readability. Not optimized for hand-authored assembly. Optimized purely for execution density and machine generation.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;AI-native execution target&quot;&gt;Phase two:
human intent → prompt → AI → dense symbolic bytecode → AI-native VM
                               (opaque to humans,         |
                                optimized for machine     v
                                generation + execution)  [result]&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;I keep thinking about information density. Chinese characters encode meaning in individual symbols that carry far more semantic weight than individual letters of the Latin alphabet. A single character can represent a concept that takes an entire English phrase to express. When a system is designed for readers who can process dense symbols natively, the representation compresses. It becomes more efficient at the cost of being less accessible to readers who were not part of the design audience.&lt;&#x2F;p&gt;
&lt;p&gt;AI-native bytecode could follow the same principle. Each instruction could encode complex composite operations that would take dozens of conventional instructions to express. The bytecode would be dense in ways that make current machine code look verbose. Entirely opaque when decompiled or analyzed. Not obfuscated on purpose. Just natively incomprehensible to human cognition, the same way a trained neural network’s weights are incomprehensible even though they encode real, functional knowledge.&lt;&#x2F;p&gt;
&lt;p&gt;The virtual machine running this bytecode would itself be a different kind of system. Not a stack machine or a register machine in the traditional sense. Possibly something closer to a dataflow engine, where the bytecode describes transformation graphs rather than sequential instructions. Think of it as the difference between giving someone turn-by-turn driving directions (go north, turn left, continue for two miles) versus handing them a map with the destination marked. The bytecode is the map. The VM figures out the route.&lt;&#x2F;p&gt;
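&lt;p&gt;A toy version of the map-versus-directions idea, written in Python, looks like this. It is a sketch under loud assumptions: &lt;code&gt;run_graph&lt;&#x2F;code&gt; and the node format are invented here, and a real AI-native VM would look nothing like a dictionary of lambdas. The point is only that the program lists dependencies, in any order, and the runtime derives the execution order. It relies on &lt;code&gt;graphlib&lt;&#x2F;code&gt; from the standard library, available since Python 3.9.&lt;&#x2F;p&gt;

```python
from graphlib import TopologicalSorter


def run_graph(nodes, inputs):
    # nodes: name -> (function, [dependency names]); inputs: name -> value.
    # The caller never states an execution order; it is derived from the edges.
    order = TopologicalSorter({name: deps for name, (_, deps) in nodes.items()})
    values = dict(inputs)
    for name in order.static_order():
        if name in values:
            continue  # a provided input, nothing to compute
        function, deps = nodes[name]
        values[name] = function(*(values[d] for d in deps))
    return values


# Nodes are deliberately declared out of order: "mean" comes first
# even though it depends on the two nodes after it.
graph = {
    "mean": (lambda total, count: total / count, ["total", "count"]),
    "total": (sum, ["raw"]),
    "count": (len, ["raw"]),
}

result = run_graph(graph, {"raw": [2, 4, 6]})
```

&lt;p&gt;&lt;code&gt;result[&quot;mean&quot;]&lt;&#x2F;code&gt; comes out as 4.0 regardless of how the dictionary is ordered. The graph is the map, and the sorter picks the route.&lt;&#x2F;p&gt;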
&lt;p&gt;I want to be honest about where we are. We are not at phase one. Not in April 2026. Current models still need the intermediate layer. They produce better code when they can reason through it step by step. They benefit from type systems and contracts and explicit error messages. The first and second approaches are not just viable, they are necessary right now.&lt;&#x2F;p&gt;
&lt;p&gt;But the trajectory is visible. Models are getting better at generating correct programs with every generation. Formal verification is becoming more practical. Hardware is getting cheaper. The gap between “prompt that describes intent” and “correct executable output” is shrinking. Phase one will arrive when models trained on enough source-plus-binary pairs can reliably produce correct executables. Phase two will arrive when someone asks: if the model is already generating the binary, why are we targeting an instruction set that was designed for a species that is no longer doing the writing?&lt;&#x2F;p&gt;
&lt;p&gt;At some point the intermediate layer becomes optional. And optional things, given enough time, become vestigial.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-this-means&quot;&gt;What this means&lt;&#x2F;h2&gt;
&lt;p&gt;I do not think these three approaches are in competition. They are three points on a timeline, and the timeline is the story of the intermediate layer contracting.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;The intermediate layer timeline&quot;&gt;Near term          Medium term           Long term
(now)              (2-5 years)           (5-15 years)

Explicit           Declarative           Post-language
languages          languages             (AI-native targets)
(Vera)             (Haskell + BHC)

Reduce noise  --&amp;gt;  Change the signal --&amp;gt; Remove the layer
in the loop        entirely              entirely

HOW, but           WHAT, verified        Intent to
unambiguous        by types              execution&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;In the near term, explicit languages like Vera make AI-generated code more reliable by constraining the generation space and providing machine-readable diagnostics. This is useful today. If you are building an AI coding pipeline right now and need to ship next quarter, this approach works.&lt;&#x2F;p&gt;
&lt;p&gt;In the medium term, declarative languages like Haskell, especially with modern tooling and modern runtimes, make AI-generated code correct by construction. The type system does the heavy verification work at a fundamental level. The code that survives is the code that describes invariants, not procedures. This is the era I am building for with arcanist.sh.&lt;&#x2F;p&gt;
&lt;p&gt;In the long term, the language disappears. First into existing machine code generated directly by AI. Then into new execution formats designed for AI generation from the ground up. The intermediate layer that has defined software engineering for seventy years becomes an implementation detail.&lt;&#x2F;p&gt;
&lt;p&gt;That last possibility makes some people uncomfortable. It made me uncomfortable when I first thought about it in December. It means that the craft of programming as we know it, the fluency in syntax, the mastery of idioms, the instinct for an elegant implementation, becomes less like a core skill and more like knowing how to operate a manual lathe. Valuable in the right context. Still needed for the hard cases. But no longer the primary way most software gets built.&lt;&#x2F;p&gt;
&lt;p&gt;This has happened before. There was a time when every programmer understood assembly. Then C abstracted it away, and most programmers stopped reading machine code. Then Python and JavaScript abstracted C away, and most programmers stopped thinking about memory management. Each time, the previous layer did not disappear. It became the domain of specialists who maintained the infrastructure everyone else stood on. The same thing will happen to source code. It will not vanish. It will specialize.&lt;&#x2F;p&gt;
&lt;p&gt;I still write code every day. I still think in types. I am still building hx and BHC because I believe the medium-term future is both real and long, and Haskell’s strengths are exactly what that future demands. Pure functions, strong types, provable correctness. These are not luxuries in a world where AI writes the implementation. They are the load-bearing structure. But I do it with one eye on the horizon, knowing that the intermediate layer I am investing in is exactly that. Intermediate.&lt;&#x2F;p&gt;
&lt;p&gt;The person who built Vera saw the same thing I saw, probably around the same time. They chose the first approach. I chose the second. Somebody, eventually, will build the third. And then we will all need to figure out what we mean by “programming” when nobody writes programs anymore.&lt;&#x2F;p&gt;
&lt;p&gt;I am already curious about the answer.&lt;&#x2F;p&gt;
</description>
      </item>
      <item>
          <title>Notes from RSAC 2026</title>
          <pubDate>Thu, 09 Apr 2026 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/notes-from-rsac-2026/</link>
          <guid>https://raskell.io/articles/notes-from-rsac-2026/</guid>
          <description xml:base="https://raskell.io/articles/notes-from-rsac-2026/">&lt;p&gt;I am writing this about two weeks after RSAC 2026 closed. In between, my coworkers and I drove down the West Coast, through Big Sur to LA, then out to Las Vegas for Easter weekend. That was deliberate. Not the route, exactly, but the space. The conference gave me a lot to process, and I have learned over the years that I process better when I am moving, when I am not sitting at a desk trying to force conclusions out of raw impressions.&lt;&#x2F;p&gt;
&lt;p&gt;So here is what I took away. Not a session-by-session recap. Not a vendor roundup. Just the things that are still with me now that the noise has faded and the Pacific Coast Highway is behind me.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-talk&quot;&gt;The talk&lt;&#x2F;h2&gt;
&lt;p&gt;Milan Duric and I presented “Self-Learning WAF: Using Generative AI to Tame ModSecurity False Positives” on Wednesday morning, March 25, in Moscone West 3020. We had an 8:30 AM slot, which is either a curse or a blessing depending on your audience. It turned out to be a blessing. The room was full of people who had chosen to be there at that hour, which meant they cared about the topic, and the energy reflected that.&lt;&#x2F;p&gt;
&lt;p&gt;The talk went flawlessly. If you have ever presented at a conference of that scale, you know that “flawless” is not something you take for granted. There is always the moment before you start where you wonder if the demo will work, if the projector will behave, if your timing will hold. All of it held. Milan and I had rehearsed enough that the talk felt natural rather than performed, which is the line you want to hit.&lt;&#x2F;p&gt;
&lt;p&gt;I will write a separate post about the content of the talk itself, what we built, what we learned, how the audience responded to specific ideas. This post is about everything else.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;third-time-first-time&quot;&gt;Third time, first time&lt;&#x2F;h2&gt;
&lt;p&gt;This was my third RSA Conference. I attended in 2024, 2025, and now 2026. But it was my first time as a speaker, and that changed the experience in ways I did not fully expect.&lt;&#x2F;p&gt;
&lt;p&gt;When you attend as a participant, you are a consumer of the conference. You pick sessions, you walk the expo floor, you absorb. When you are a speaker, even for just one session, you become part of the fabric. People approach you after the talk. They reference something you said in a hallway conversation two days later. You are on the other side of the dynamic, and it gives you a different relationship with the event.&lt;&#x2F;p&gt;
&lt;p&gt;I have grown to genuinely appreciate the sheer volume and quality of the whole thing. RSAC is not one conference. It is several conferences layered on top of each other. There are deeply technical sessions where people walk you through real implementations, real code, real incident response timelines. There are strategic talks where CISOs and policy architects work through the organizational and regulatory implications of what is changing. And then there are the keynotes, the big voices, people who have shaped the field for decades, sharing what they see on the horizon.&lt;&#x2F;p&gt;
&lt;p&gt;Because it happens in San Francisco, in the heart of the Bay Area, the reach is different from any other security event. You are not just at a conference. You are at the geographic center of the industry that is driving the transformation everyone is trying to understand. The density of talent, capital, and ambition in that city during RSAC week is difficult to describe if you have not experienced it. The only comparable events I can think of are Black Hat and DEF CON, but even those have a different energy. RSAC pulls in a wider spectrum of the industry, from the deeply technical to the deeply strategic, from startup founders to government officials, and puts them all in the same building for a week. That range is what makes it valuable.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-year-of-agents&quot;&gt;The year of agents&lt;&#x2F;h2&gt;
&lt;p&gt;Ever since I started attending in 2024, AI has been a substantial part of the conference. That makes sense. The developments in AI over the past few years have had a prominent cybersecurity dimension from the start, and the industry has been working through what that means, both as a threat to defend against and as a capability to harness.&lt;&#x2F;p&gt;
&lt;p&gt;But this year felt qualitatively different from the previous two. In 2024 and 2025, the AI conversation was broad and somewhat exploratory. What can large language models do for security? How do we detect AI-generated phishing? What does the threat landscape look like when attackers have access to the same models we do? Important questions, but still in the “what is possible” phase.&lt;&#x2F;p&gt;
&lt;p&gt;2026 was past that. The conversation had narrowed and deepened. It was specifically about agents. Not AI in general, not language models as a capability, but autonomous agents as a new category of infrastructure. Enterprise-grade agentic systems. Agentic orchestration patterns. Agent-native architectures. The shift from “can we use AI?” to “how do we architect our systems around autonomous agents that are already here?” was palpable in almost every session I attended.&lt;&#x2F;p&gt;
&lt;p&gt;The concept that kept coming up was NHI: non-human identities. This term has existed in the identity and access management world for a while, but at RSAC 2026 it had taken on a new meaning. The old NHI conversation was about service accounts, API keys, machine certificates. The new NHI conversation is about LLM inference backends that operate as something fundamentally different from traditional automated systems. These are entities that do not just execute a fixed pipeline. They reason, they make judgment calls, they interact with systems and data in ways that look more like what a human analyst does than what a cron job does. But they operate at machine speed, they do not sleep, and they do not have the contextual judgment or accountability that comes with a human in the seat.&lt;&#x2F;p&gt;
&lt;p&gt;The trust problem this creates is real, and it is not just a theoretical concern. Human employees were already risk factors before AI entered the picture. Insider threats, social engineering, credential compromise, accidental misconfiguration. These are well-understood attack surfaces. Now add entities that move faster than any human, that can touch more systems in a minute than a human employee touches in a day, and that are harder to audit because their reasoning process is opaque. The attack surface did not just grow. It changed shape in ways that existing security architectures were not designed for.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-human-in-the-loop-debate&quot;&gt;The human-in-the-loop debate&lt;&#x2F;h2&gt;
&lt;p&gt;This was the most interesting tension I observed across the conference, and I think it is going to be one of the defining questions for cybersecurity in the next few years.&lt;&#x2F;p&gt;
&lt;p&gt;A small number of talks presented approaches that kept a human firmly in the loop. AI assists, human decides. AI flags, human acts. AI generates recommendations, human approves or rejects. These were careful, measured presentations, and some of them were good. The argument is intuitive and appeals to anyone who has been burned by automation gone wrong: keep a human in the critical path because humans have judgment that machines do not.&lt;&#x2F;p&gt;
&lt;p&gt;But the majority of the conference had reached a different consensus, and it was stated with increasing confidence as the week went on: the human in the loop is a bottleneck. Not philosophically. Operationally. In terms of the speed at which threats materialize and the speed at which defenses need to respond.&lt;&#x2F;p&gt;
&lt;p&gt;The argument is straightforward once you lay it out. Adversaries and threat actors are already leveraging AI to accelerate their operations. They are scanning for vulnerabilities at machine speed. They are generating novel attack variations faster than any human analyst can write detection rules. They are using AI to identify and exploit zero-day vulnerabilities in timeframes that make traditional patch cycles look like geological processes. If your defensive response depends on a human reading an alert, understanding the context, making a judgment call, and clicking a button before a countermeasure activates, you have introduced a rate limiter into your defense that your attacker does not have. You are playing at human speed against an adversary operating at machine speed.&lt;&#x2F;p&gt;
&lt;p&gt;I agree with this assessment, and I want to be precise about what I mean by that. I do not think humans are irrelevant to security. They are not. Human judgment, human understanding of organizational context, human ability to reason about novel situations, these remain essential. But I think the role of the human needs to shift fundamentally. The human should be a supervisor, not a gatekeeper. The human should set policy, define constraints, establish acceptable parameters, review outcomes, and intervene when something goes wrong. But the human should not be the bottleneck whose reaction time determines how fast sophisticated defense measures can respond to a threat that is moving at inference speed.&lt;&#x2F;p&gt;
&lt;p&gt;This is an engineering problem, not a philosophical one. How do you design agent-native architectures where the human is still in control, still has full visibility, still sets the rules, but is not the limiting factor in the response loop? That is the challenge of 2026. I did not hear anyone at the conference claim to have fully solved it. But I heard a lot of people working on it seriously, and the framing had matured beyond the naive “just automate everything” takes that dominated the early AI-security conversation.&lt;&#x2F;p&gt;
&lt;p&gt;This is also, incidentally, part of why I built &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;zentinelproxy.io&#x2F;&quot;&gt;Zentinel&lt;&#x2F;a&gt; the way I did. A reverse proxy that sits at the edge, where policy enforcement happens at wire speed, is exactly the kind of system that needs to operate autonomously within human-defined constraints. The agent architecture in Zentinel, where security logic runs in isolated processes with bounded resources and explicit failure modes, is my answer to the question of how you let autonomous systems make real-time decisions while keeping the human in the position of supervisor rather than bottleneck. I wrote about this in more detail in &lt;a href=&quot;&#x2F;articles&#x2F;what-zentinel-is-really-optimizing-for&#x2F;&quot;&gt;What Zentinel Is Really Optimizing For&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-four-factors&quot;&gt;The four factors&lt;&#x2F;h2&gt;
&lt;p&gt;I attended a panel with four former NSA directors and US Cyber Command commanders. Both positions are held by the same person at any given time, so these were people who had sat at the intersection of signals intelligence and military cyber operations at the highest level the United States has. Regardless of how you feel about the NSA or US foreign policy, the caliber of strategic thinking in that room was extraordinary.&lt;&#x2F;p&gt;
&lt;p&gt;Paul Nakasone said something that has stayed with me since. He laid out what he considers the four most important factors when assessing the strategic potential of a nation state in this era. Not military strength. Not GDP. Four specific things: chips, data, talent, and energy.&lt;&#x2F;p&gt;
&lt;p&gt;Chips, meaning silicon, meaning raw compute power. How many advanced GPUs can you deploy? How advanced are they relative to the frontier? And critically: can you manufacture them domestically, or are you dependent on someone else’s fabrication capacity? Right now, the entire world depends on TSMC in Taiwan for leading-edge chip fabrication, and the geopolitical implications of that single point of dependency are staggering.&lt;&#x2F;p&gt;
&lt;p&gt;Data, meaning access to the raw material that AI systems learn from. Who has it, how much of it, how diverse is it, and under what legal and political constraints can it be used for training and inference?&lt;&#x2F;p&gt;
&lt;p&gt;Talent, meaning the human capital that knows how to build, train, deploy, secure, and govern these systems. Where that talent lives, where it wants to live, and what it takes to attract and retain it. This is not just about researchers at frontier labs. It is about the entire pipeline: the engineers who build the infrastructure, the operators who keep it running, the security professionals who defend it, the policy people who regulate it.&lt;&#x2F;p&gt;
&lt;p&gt;Energy, meaning access to cheap, abundant, reliable power. Because the power demands of frontier AI compute are measured in gigawatts now, not megawatts. A single frontier training run can consume more electricity than a small city. The question of whether you can physically power your AI ambitions is no longer abstract.&lt;&#x2F;p&gt;
&lt;p&gt;I could not stop thinking about &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;ai-2027.com&#x2F;&quot;&gt;AI 2027&lt;&#x2F;a&gt; while listening to Nakasone. The scenario work by Daniel Kokotajlo, Scott Alexander, Thomas Larsen, Eli Lifland, and Romeo Dean, which I first wrote about in &lt;a href=&quot;&#x2F;articles&#x2F;how-i-work-these-days&#x2F;&quot;&gt;How I Work These Days&lt;&#x2F;a&gt;, makes strikingly similar assessments from a technology trajectory perspective rather than a national security one. Their scenario tracks the distribution of global AI compute (the US holding roughly 70% of frontier capacity through its companies, China around 12%), the geopolitical competition for chip manufacturing, the energy infrastructure required to sustain frontier operations (their projection of global AI datacenter spending reaching the trillion-dollar range by 2026 no longer reads like speculation), and the talent concentration in a handful of US-based labs.&lt;&#x2F;p&gt;
&lt;p&gt;What makes AI 2027 feel prophetic, and I do not use that word casually, is that its core thesis keeps holding up month after month. The idea that automating AI research itself creates a self-reinforcing feedback loop, that the cycle between capability and capability-building is compressing, that the timeline for transformative change is shorter than most institutional planning horizons assume. The specific dates and milestones may shift. The authors themselves have revised some timelines. But the directional assessment, the shape of the curve, still looks right to me as of April 2026. The scenario’s detailed treatment of compute distribution, espionage risks, and the escalatory dynamics between nation states competing for AI dominance maps remarkably well onto what I heard discussed in more guarded terms on the RSAC floor.&lt;&#x2F;p&gt;
&lt;p&gt;Listening to Nakasone lay out those four factors, I kept thinking about Europe. And about Switzerland specifically, since that is where I live and work. Europe is behind on all four. The continent does not manufacture frontier chips. It does not host the leading AI labs. Its regulatory environment, while well-intentioned, has optimized more for constraint than for capability. Its energy infrastructure is in the middle of a complex transition. And its talent pipeline, while strong in research, struggles to retain builders who can turn research into deployed systems at scale, because many of them leave for the Bay Area, London, Singapore, the Gulf states, or other places where the ecosystem is more supportive of what they want to build.&lt;&#x2F;p&gt;
&lt;p&gt;This is part of why I co-founded &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;die-zukunft.ch&#x2F;&quot;&gt;Die Zukunft&lt;&#x2F;a&gt;, a new Swiss political party focused on structural transformation. The name means “The Future” in German. The party exists because I believe the political infrastructure in Switzerland, and in most European countries, is not designed to respond to the kind of structural shift that Nakasone was describing. The decisions being made right now about compute sovereignty, energy policy, talent retention, and regulatory frameworks will determine whether Europe is a participant in the next decade or a consumer of other people’s technology. Die Zukunft’s platform addresses these questions directly: digital sovereignty defined in infrastructure terms, faster permitting for critical energy and compute projects, open standards as a hard requirement for government systems, and immigration policy designed to attract the talent that builds these systems. It is not a technology party in the narrow sense. It is a party built on the recognition that the structural transformation AI is driving is too consequential to be left to the current pace of European political response.&lt;&#x2F;p&gt;
&lt;p&gt;And this is also why I built &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;archipelag.io&#x2F;&quot;&gt;Archipelag&lt;&#x2F;a&gt;. If Europe wants digital sovereignty, it needs sovereign compute infrastructure. Not just policy positions about data residency, but actual physical capacity to run AI workloads within European jurisdictions, at competitive cost, without depending on American hyperscalers. Archipelag is a decentralized AI compute network that routes inference jobs to community-operated nodes with jurisdiction-aware routing baked into the infrastructure layer. It is designed so that a European company can run AI workloads with cryptographic guarantees about where their data is processed, using idle GPU capacity that already exists across the continent. It is my direct answer to the infrastructure gap that Nakasone’s four factors expose so clearly for Europe.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-enterprise-problem&quot;&gt;The enterprise problem&lt;&#x2F;h2&gt;
&lt;p&gt;This connects to something broader that I have been thinking about since well before RSAC, but that the conference brought into sharper focus.&lt;&#x2F;p&gt;
&lt;p&gt;I work for a large Swiss financial institution. My perspective is shaped by what I see inside that kind of organization every day. But I believe the dynamic applies to enterprises of all sizes and in all sectors, even if the scale and specifics differ.&lt;&#x2F;p&gt;
&lt;p&gt;The biggest threat I see right now is not a specific vulnerability, not a particular attack vector, not a novel exploit technique. It is inertia. It is the widening gap between what is happening at the frontier of AI capability and what most organizations are actually doing about it. Too many companies still think they can outsource their way through this transition. Buy an AI-powered security product from a vendor. Subscribe to a managed detection service that mentions “AI” somewhere in its marketing materials. Check the compliance box and move on to the next quarter.&lt;&#x2F;p&gt;
&lt;p&gt;I do not think that is going to work. Not because those vendors are bad. Some of them are genuinely good at what they do. But because the organizations that will remain competitive and secure over the next few years are the ones that build internal AI capability, not just consume external AI services. That means investing in your own GPU compute. That means building the internal expertise to deploy, fine-tune, and operate models on your own infrastructure. That means treating AI as a core organizational competency, not a procurement line item.&lt;&#x2F;p&gt;
&lt;p&gt;This is not a popular opinion in many boardrooms. It is expensive. It is hard. It requires talent that is difficult to hire and even harder to retain when they can work at a frontier lab or a well-funded startup instead. But the alternative, waiting and relying on service providers to package AI innovation for you at their pace and under their terms, means you are always operating one step behind. You are consuming someone else’s capability with someone else’s priorities, under someone else’s constraints. In a landscape that is moving as fast as this one, that delay compounds.&lt;&#x2F;p&gt;
&lt;p&gt;The companies that understand this and invest now are going to have a structural advantage that widens over time. The ones that wait are going to find themselves trying to close a gap that gets larger with every quarter of inaction. I saw enough at RSAC to believe that some organizations have internalized this. And I saw enough to believe that many more have not.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-vendor-floor&quot;&gt;The vendor floor&lt;&#x2F;h2&gt;
&lt;p&gt;I want to be fair about this, because I know how the previous section sounds, and I do not want to come across as someone who dismisses the entire vendor ecosystem. I am not easily swayed by big tech vendors and their keynote promises, and I think anyone who works in security should maintain a healthy skepticism toward product demos and polished presentations. That is just professional hygiene.&lt;&#x2F;p&gt;
&lt;p&gt;That said, I genuinely enjoyed many of the keynotes this year. Some of these companies have real long-term vision. They see the shape of what is coming, and their best presenters can communicate that vision with clarity and conviction. I respect that, even when I disagree with their specific approach or their business model.&lt;&#x2F;p&gt;
&lt;p&gt;But enjoying a keynote and trusting that buying a product will translate into sustainable cybersecurity for your organization are very different things. Actual security is hands-on work. It is understanding your own systems, your own architecture, your own threat model, your own failure modes. It is the boring, unglamorous work of knowing what runs where, what talks to what, what happens when something fails, and what your actual attack surface looks like on a Tuesday afternoon. That work cannot be fully offloaded to a vendor. It cannot be outsourced to a dashboard, no matter how sophisticated the analytics behind it.&lt;&#x2F;p&gt;
&lt;p&gt;The vendors that impressed me most this year were the ones that acknowledged this honestly. The ones that positioned their tools as force multipliers for competent teams rather than replacements for the need to have competent teams in the first place. That distinction matters, and the vendors who understand it tend to build better products because they are designing for operators, not for procurement committees.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;closing-night&quot;&gt;Closing night&lt;&#x2F;h2&gt;
&lt;p&gt;The last session of the conference was Hugh Jackman in conversation with Hugh Thompson. I was not sure what to expect from a Hollywood actor closing out a cybersecurity conference, and I suspect a lot of people in the audience had the same reservation going in. But it worked. Jackman is funny, self-aware, and surprisingly thoughtful about creativity, discipline, and the craft of doing hard things well. He talked about preparation, about the difference between performing and connecting, about the years of work that go into making something look effortless.&lt;&#x2F;p&gt;
&lt;p&gt;At one point he taught the audience that if you say “rise up lights” in an American accent, you are saying “razor blades” in Australian. The room loved it. It was one of those moments where several thousand cybersecurity professionals all became delighted seven-year-olds for about ten seconds, and it was a good reminder that conferences are also about shared human moments, not just information transfer.&lt;&#x2F;p&gt;
&lt;p&gt;It was the right way to end a dense week. Light enough to let people exhale after five days of intense content, but substantive enough in its own way that it did not feel like filler.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-road-after&quot;&gt;The road after&lt;&#x2F;h2&gt;
&lt;p&gt;After the conference closed, a few of us stayed on. We rented a car and drove south from San Francisco, down Highway 1 through Big Sur. If you have not done that drive, I do not know how to describe it adequately except to say that it recalibrates your sense of scale. The Pacific is very large and very indifferent, and spending a few hours winding along cliff-edge roads with that water stretching out to the horizon below you is a useful counterweight to a week of thinking about the future of everything.&lt;&#x2F;p&gt;
&lt;p&gt;We spent time in LA. Santa Monica and Venice Beach, walking the boardwalk, eating food that was too expensive and not caring. The kind of aimless, unstructured time that my brain needed after five days of absorbing information at high density. I find that the most useful thinking often happens when you are not trying to think. When you are just watching the ocean or walking on a beach and letting your subconscious do whatever it does with the raw material you fed it.&lt;&#x2F;p&gt;
&lt;p&gt;Then Las Vegas for Easter weekend. Spring break crowds, desert heat starting to build, the particular surreality of the Strip. It was not productive time in any conventional sense, and it was not meant to be. It was decompression. Space for the conference to settle from a collection of impressions into something more like understanding.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-stays-with-me&quot;&gt;What stays with me&lt;&#x2F;h2&gt;
&lt;p&gt;Two weeks out, here is what I think is different.&lt;&#x2F;p&gt;
&lt;p&gt;I went to RSAC 2026 with a set of convictions about where things are heading. Agents are going to be the primary operating model for security infrastructure. Human-in-the-loop is going to shift to human-as-supervisor. Organizations that do not build their own AI infrastructure are going to fall behind structurally. Europe needs to wake up to the compute sovereignty problem before it becomes irreversible. These were things I already believed before I got on the plane to San Francisco.&lt;&#x2F;p&gt;
&lt;p&gt;What the conference did was sharpen them. Hearing Nakasone frame national potential in terms of chips, data, talent, and energy gave me a cleaner lens for thinking about the geostrategic dimension. Seeing the breadth and depth of the agentic conversation on the conference floor confirmed that this is not a niche position or an edge case anymore. It is the emerging consensus of the industry. And presenting our own work, standing in front of a room and showing what we actually built, made it more real in a way that writing code in a terminal at midnight does not.&lt;&#x2F;p&gt;
&lt;p&gt;Conferences like RSAC have always had this effect on me. They compress a year’s worth of signals into a week, and then you spend the following weeks unpacking what you heard and figuring out what it means for what you are building. After RSAC 2024, I started thinking seriously about edge security architectures, which eventually became Zentinel. After RSAC 2025, the urgency around AI-native infrastructure solidified into the work that became Archipelag. This year, I expect the sharpened understanding of agentic systems and the geostrategic landscape to feed directly into what I build next.&lt;&#x2F;p&gt;
&lt;p&gt;I also came away with a renewed sense of urgency about the gap between what the frontier looks like and what most organizations are doing about it. That gap, between the leading edge and the institutional mean, is the real risk. Not any single threat actor, not any specific vulnerability class. The systemic inability of large institutions to move at the pace that the situation demands. That is what keeps me up at night, and that is what I am trying to address in my own work, whether it is building infrastructure, writing about it, or working on the political dimension through Die Zukunft.&lt;&#x2F;p&gt;
&lt;p&gt;I am going to write a separate post about the talk itself, about what Milan and I built with self-learning WAFs and what we learned along the way. That is coming soon. For now, these are the notes I wanted to capture while the impressions are still sharp enough to be useful.&lt;&#x2F;p&gt;
&lt;p&gt;I am already looking forward to RSAC 2027.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;references-and-further-reading&quot;&gt;References and further reading&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.rsaconference.com&#x2F;&quot;&gt;RSAC 2026&lt;&#x2F;a&gt; - The conference itself&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;ai-2027.com&#x2F;&quot;&gt;AI 2027&lt;&#x2F;a&gt; - Scenario work by Kokotajlo, Alexander, Larsen, Lifland, and Dean&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;&#x2F;articles&#x2F;how-i-work-these-days&#x2F;&quot;&gt;How I Work These Days&lt;&#x2F;a&gt; - Where I first wrote about AI 2027 and the shift in how I build&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;&#x2F;articles&#x2F;what-zentinel-is-really-optimizing-for&#x2F;&quot;&gt;What Zentinel Is Really Optimizing For&lt;&#x2F;a&gt; - The design philosophy behind Zentinel and why agent isolation matters&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;zentinelproxy.io&#x2F;&quot;&gt;Zentinel&lt;&#x2F;a&gt; - The security-first reverse proxy I built on Pingora&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;archipelag.io&#x2F;&quot;&gt;Archipelag&lt;&#x2F;a&gt; - Decentralized, sovereignty-first AI compute network&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;die-zukunft.ch&#x2F;&quot;&gt;Die Zukunft&lt;&#x2F;a&gt; - Swiss political party for structural transformation&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</description>
      </item>
      <item>
          <title>What Zentinel Is Really Optimizing For</title>
          <pubDate>Sun, 22 Mar 2026 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/what-zentinel-is-really-optimizing-for/</link>
          <guid>https://raskell.io/articles/what-zentinel-is-really-optimizing-for/</guid>
          <description xml:base="https://raskell.io/articles/what-zentinel-is-really-optimizing-for/">&lt;p&gt;The clearest way I can describe the motivation behind Zentinel is this: I got tired of not trusting the thing that stood between my users and the internet.&lt;&#x2F;p&gt;
&lt;p&gt;Not because the proxies I ran were bad. They were not. I have genuine respect for the engineering in Nginx, HAProxy, Envoy. I learned a lot from operating them, and I mean that sincerely. But over years of running these systems in production, a pattern kept repeating, and it was always some version of the same story: the proxy did something I could not predict from reading its configuration.&lt;&#x2F;p&gt;
&lt;p&gt;A WAF module gets slow under load, and because it runs inside the proxy process, the entire data path backs up. A retry storm starts because the default retry policy is implicit rather than explicit. A configuration reload takes effect partially because there is no atomic swap. An unbounded queue grows until memory runs out and the OOM killer takes the proxy down along with everything else on the box.&lt;&#x2F;p&gt;
&lt;p&gt;None of these are exotic. If you have operated proxies at any real scale, you have seen most of them. And every time, the root cause pointed at the same structural issue: the proxy was optimized for something other than what I actually needed from it.&lt;&#x2F;p&gt;
&lt;p&gt;That is not a criticism. It is a statement about time.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;every-proxy-was-built-for-its-era&quot;&gt;Every proxy was built for its era&lt;&#x2F;h2&gt;
&lt;p&gt;HAProxy was born in 2000. Willy Tarreau built it to solve a specific, urgent problem: distributing TCP connections across a pool of backend servers. The internet was scaling fast. Sites needed load balancing. HAProxy did that one job with extraordinary precision, and twenty-five years later it still does. It is one of the most reliable pieces of infrastructure software ever written. But it was built as a load balancer, and when you need it to do security enforcement, you are extending a load balancer.&lt;&#x2F;p&gt;
&lt;p&gt;Nginx arrived in 2004. Igor Sysoev was solving the C10K problem: how do you handle 10,000 concurrent connections without the process-per-connection model falling over? Nginx was built as a web server. An event-driven architecture that served static files and handled connections with remarkable efficiency. The reverse proxy capability came later, almost as a side effect of how well it handled connections, and eventually became one of its most important use cases. But the core was always a web server. The assumptions about configuration, about reload behavior, about how modules interact with the request path, those assumptions come from web serving.&lt;&#x2F;p&gt;
&lt;p&gt;Varnish showed up in 2006. Poul-Henning Kamp built it because dynamic web pages were slow and caching was the answer. Varnish sat in front of your web server, cached responses in memory, and served them fast. That was the whole job. A caching proxy, and a beautiful one.&lt;&#x2F;p&gt;
&lt;p&gt;Envoy was born at Lyft around 2016. Microservices had created a new problem: how do you route, observe, and control traffic between hundreds of internal services that come and go? The service mesh was the answer, and Envoy was the data plane. It brought observability, retries, circuit breaking, and policy enforcement to a world where the network topology was no longer something you could draw on a whiteboard and expect to remain accurate for more than a week.&lt;&#x2F;p&gt;
&lt;p&gt;Traefik arrived in the same era, optimized for automatic service discovery in container environments. Services appear and disappear. The proxy figures out the routing on its own.&lt;&#x2F;p&gt;
&lt;p&gt;Every single one of these was the right tool at the right time. They solved the problem that mattered most when they were built, and they solved it well. I used most of them. I admired most of them. I am not here to argue that any of them were wrong.&lt;&#x2F;p&gt;
&lt;p&gt;But I am here to say that their time shaped what they became, and so did the economics of how software was built.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-generality-trap&quot;&gt;The generality trap&lt;&#x2F;h2&gt;
&lt;p&gt;Here is something I noticed over years of operating these tools: they all converge.&lt;&#x2F;p&gt;
&lt;p&gt;Nginx adds load balancing. HAProxy adds HTTP&#x2F;2 and Lua scripting. Envoy adds caching, WAF capabilities, ext_proc for external processing. Traefik adds middleware chains. Every proxy, over time, becomes a Swiss army knife.&lt;&#x2F;p&gt;
&lt;p&gt;This is not a design failure. It is an economic inevitability.&lt;&#x2F;p&gt;
&lt;p&gt;Building a production-grade reverse proxy takes a team, sometimes a large one, working over many years. Nginx took Igor years before it was production-ready. Envoy is maintained by hundreds of contributors across multiple organizations. HAProxy has been continuously refined for a quarter of a century by one of the most skilled systems programmers alive.&lt;&#x2F;p&gt;
&lt;p&gt;When the cost of building software is that high, you need the result to serve a broad market. You cannot afford to optimize for one narrow concern. You need your proxy to be useful to web servers and API gateways and service meshes and CDN edges and everything in between. The economic pressure pushes relentlessly toward generality. Toward one more feature. Toward covering one more use case. Toward becoming the tool that everyone can use, even if nobody uses it for exactly the thing they wish it was designed for.&lt;&#x2F;p&gt;
&lt;p&gt;This creates a particular kind of friction. You adopt a proxy for its core strength, and then you spend years working around its assumptions in every other area. You chose Nginx because it handles connections well, but now you are fighting its reload model and its embedded Lua modules that share a fate with the worker process. You chose Envoy because it observes service traffic brilliantly, but now you are wrestling with an xDS configuration surface that could fill a textbook, and a C++ codebase where memory safety is a matter of programmer discipline rather than compiler guarantee.&lt;&#x2F;p&gt;
&lt;p&gt;I lived in that friction for a long time. I knew what I wanted. I could describe it in detail to anyone who would listen. But wanting a purpose-built proxy and building one are different things when building one requires a team and years of sustained effort.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-i-kept-wishing-for&quot;&gt;What I kept wishing for&lt;&#x2F;h2&gt;
&lt;p&gt;After enough 3 AM incidents and enough post-mortems, the shape of what I was looking for became concrete enough to write down.&lt;&#x2F;p&gt;
&lt;p&gt;I wanted a reverse proxy where the operator can reason about what will happen under any condition. Including conditions they did not anticipate. That is the whole thesis. Everything else follows from it.&lt;&#x2F;p&gt;
&lt;p&gt;When you are on call, “reason about what will happen” is not philosophical. It means concrete things.&lt;&#x2F;p&gt;
&lt;p&gt;It means every queue has a maximum depth. Every timeout is explicit and declared. Every connection pool has a ceiling. No unbounded allocations anywhere. If you set a body size limit of 10 MB, that is a hard limit, not a suggestion. If a security agent’s concurrency is capped at 100, the 101st concurrent request gets the configured failure mode instead of joining a silent queue that grows until the box dies.&lt;&#x2F;p&gt;
&lt;p&gt;It means every route declares what happens when a security agent is unreachable. Not a global toggle. Per route, per agent. Your API fails closed when the WAF is down (deny everything, because your API handles sensitive data and you do not want it exposed without inspection). Your marketing site fails open (allow traffic, log the gap, because a few minutes of unfiltered marketing pages is better than a full outage). You decide. You write it down. The system enforces it.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;kdl&quot; class=&quot;language-kdl &quot;&gt;&lt;code class=&quot;language-kdl&quot; data-lang=&quot;kdl&quot;&gt;agents {
    agent &amp;quot;waf&amp;quot; {
        transport { unix-socket &amp;quot;&amp;#x2F;var&amp;#x2F;run&amp;#x2F;zentinel-waf.sock&amp;quot; }
        events &amp;quot;request-headers&amp;quot; &amp;quot;request-body&amp;quot;
        timeout-ms 50
        max-concurrent-calls 100
        failure-mode &amp;quot;closed&amp;quot;

        circuit-breaker {
            failure-threshold 5
            success-threshold 3
            timeout-seconds 30
        }
    }
}

routes {
    route &amp;quot;api&amp;quot; {
        priority 100
        matches { path-prefix &amp;quot;&amp;#x2F;api&amp;#x2F;&amp;quot; }
        upstream &amp;quot;backend&amp;quot;
        filters &amp;quot;waf&amp;quot;
        failure-mode &amp;quot;closed&amp;quot;
    }

    route &amp;quot;marketing&amp;quot; {
        priority 50
        matches { path-prefix &amp;quot;&amp;#x2F;&amp;quot; }
        upstream &amp;quot;static-backend&amp;quot;
        filters &amp;quot;waf&amp;quot;
        failure-mode &amp;quot;open&amp;quot;
    }
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;It means every security decision gets a trace ID and ends up in structured logs. When someone asks “why was this request blocked at 3:47 AM?”, you can answer with a correlation ID and a full trace: which agent decided, which rule matched, how long it took. Not “the WAF blocked it, probably.” The actual chain of events.&lt;&#x2F;p&gt;
&lt;p&gt;It means configuration reloads are atomic. You send SIGHUP; the new configuration is parsed, validated, and swapped in as a single unit. In-flight requests finish on the old config. New requests pick up the new one. No window where half the routes are on the old version and half on the new.&lt;&#x2F;p&gt;
&lt;p&gt;I could describe all of this clearly. I had been able to for years. Describing what you want and having the means to build it are different things.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-insight-about-isolation&quot;&gt;The insight about isolation&lt;&#x2F;h2&gt;
&lt;p&gt;The single biggest realization I had was about failure domains.&lt;&#x2F;p&gt;
&lt;p&gt;In every proxy I had operated, extension logic either ran inside the proxy process or was wired directly into its request handling. Nginx has embedded Lua via OpenResty. HAProxy has SPOE and Lua. Envoy has Wasm filters and ext_proc. They all share the same structural problem: the extension and the proxy share a fate.&lt;&#x2F;p&gt;
&lt;p&gt;If your Lua WAF script enters an infinite loop in Nginx, the Nginx worker is stuck. If your Wasm filter in Envoy allocates too much memory, the Envoy process pays for it. A slow SPOE agent in HAProxy pushes backpressure into the proxy’s request handling. I have been on the receiving end of all three patterns, and they all end the same way: you are awake at 3 AM trying to figure out why your entire proxy fleet is degraded because one security module is having a bad day.&lt;&#x2F;p&gt;
&lt;p&gt;This is the classic shared-fate problem. When everything runs in one process, everything fails together. A slow WAF does not just slow down WAF-protected routes. It consumes worker resources that affect all routes. A memory leak in an auth filter does not just take down auth. It takes down the process.&lt;&#x2F;p&gt;
&lt;p&gt;The answer I kept coming back to was process isolation. Not as a compromise or a workaround, but as the foundational design principle. Security and policy logic should live in separate processes, each with its own memory, its own concurrency limits, and its own circuit breaker.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo diagram&quot;&gt;&lt;code data-lang=&quot;diagram&quot; data-title=&quot;Zentinel agent isolation&quot;&gt;             ┌─────────────────────┐
             │   Proxy Core        │
             │   (Rust &amp;#x2F; Pingora)  │
             └──────────┬──────────┘
                        │
          ┌─────────────┼─────────────┐
          │             │             │
 ┌────────▼───────┐ ┌──▼──────────┐ ┌▼───────────────┐
 │  WAF Agent     │ │ Auth Agent  │ │ Custom Agent   │
 │  semaphore: 100│ │ semaphore:50│ │ semaphore: 25  │
 │  timeout: 50ms │ │ timeout:30ms│ │ timeout: 200ms │
 │  fail: closed  │ │ fail: closed│ │ fail: open     │
 └────────────────┘ └─────────────┘ └────────────────┘

 Each agent: own process, own memory, own circuit breaker.
 Slow WAF ≠ slow auth. Crashed agent ≠ crashed proxy.&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If the WAF agent gets slow, the WAF agent’s semaphore fills up. The auth agent keeps running on its own semaphore. The proxy core keeps routing. Nobody shares a failure domain unless you explicitly configure them to.&lt;&#x2F;p&gt;
&lt;p&gt;This is not just about crash isolation, though that matters. The deeper point is queue isolation. In a shared-process model, a slow filter creates backpressure that affects all traffic. With process isolation, a slow agent only affects the routes that use it, and only up to the concurrency limit you configured. The blast radius is bounded and declared. You can look at the config and know the worst case.&lt;&#x2F;p&gt;
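&lt;p&gt;As a rough sketch, the auth and custom agents from the diagram could be declared next to the &lt;code&gt;waf&lt;&#x2F;code&gt; agent shown earlier. The block reuses only directives that appear in that example; the agent names, socket paths, and event lists are hypothetical, and the limits are copied from the diagram. Treat it as an illustration, not a reference configuration.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;kdl&quot; class=&quot;language-kdl &quot;&gt;&lt;code class=&quot;language-kdl&quot; data-lang=&quot;kdl&quot;&gt;agents {
    &amp;#x2F;&amp;#x2F; Hypothetical auth agent: its own socket and its own limits.
    agent &amp;quot;auth&amp;quot; {
        transport { unix-socket &amp;quot;&amp;#x2F;var&amp;#x2F;run&amp;#x2F;zentinel-auth.sock&amp;quot; }
        events &amp;quot;request-headers&amp;quot;
        timeout-ms 30
        max-concurrent-calls 50
        failure-mode &amp;quot;closed&amp;quot;
    }

    &amp;#x2F;&amp;#x2F; Hypothetical custom agent: a longer time budget, but it fails open.
    agent &amp;quot;custom&amp;quot; {
        transport { unix-socket &amp;quot;&amp;#x2F;var&amp;#x2F;run&amp;#x2F;zentinel-custom.sock&amp;quot; }
        events &amp;quot;request-headers&amp;quot;
        timeout-ms 200
        max-concurrent-calls 25
        failure-mode &amp;quot;open&amp;quot;
    }
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Under these assumed values, the worst case is readable straight from the file: at most 50 in-flight auth calls, each cut off after 30 ms, and at most 25 custom calls, each cut off after 200 ms, regardless of what the WAF agent is doing.&lt;&#x2F;p&gt;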
&lt;p&gt;The agents communicate over Unix domain sockets or gRPC, and they can be written in any language. There are SDKs for Rust, Go, Python, TypeScript, Elixir, Kotlin, and Haskell. The protocol is simple: each message carries a 4-byte length, a 1-byte type, and a JSON or MessagePack payload.&lt;&#x2F;p&gt;
&lt;p&gt;The operational consequence is what I care about most: you can deploy, restart, and update agents independently. Roll out a new WAF rule set without touching the proxy. Restart a misbehaving auth agent without dropping a single connection. When you are trying to fix one thing at 3 AM without breaking three others, that independence matters more than any benchmark number.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;a-product-of-this-moment&quot;&gt;A product of this moment&lt;&#x2F;h2&gt;
&lt;p&gt;Every piece of software is shaped by when it was built.&lt;&#x2F;p&gt;
&lt;p&gt;The proxies I described earlier were products of their time not just in what problems they solved, but in how they could be built. Nginx needed Igor working for years. Envoy needed Google, Lyft, and a large open source community. HAProxy needed Willy Tarreau’s decades of sustained refinement. That was the only way to build infrastructure of that quality. One person, or even a small team, could not realistically build a production-grade reverse proxy with a novel architecture and ship it in months. The economics did not allow it.&lt;&#x2F;p&gt;
&lt;p&gt;That structural reality is changing, and I think the change matters more than most people in infrastructure have absorbed yet.&lt;&#x2F;p&gt;
&lt;p&gt;I wrote about the broader shift in &lt;a href=&quot;&#x2F;articles&#x2F;how-i-work-these-days&#x2F;&quot;&gt;How I Work These Days&lt;&#x2F;a&gt;. The short version: I had been using Claude Code since May 2025, but it was not until Christmas 2025, working with Opus 4.5, that something fundamentally clicked. The constraint I had lived with for years, the gap between knowing exactly what I wanted and having the bandwidth to build it, narrowed in a way that I still find hard to fully describe. Not because the model wrote the code for me. But because the feedback loop between design intent and working implementation compressed from weeks to hours.&lt;&#x2F;p&gt;
&lt;p&gt;When Cloudflare open-sourced Pingora in 2024, I paid attention immediately. A proxy framework written in Rust, battle-tested at over a trillion requests per day in Cloudflare’s own network. The TCP listener, the HTTP parser, the TLS termination, the connection pooling, the async runtime. All the low-level machinery that you do not want to write from scratch. I watched River, the community Pingora-based reverse proxy, hoping it would become the thing I could reach for and trust. It never got there.&lt;&#x2F;p&gt;
&lt;p&gt;So I stopped waiting and started building. Pingora as a foundation. Rust for memory safety at the boundary. An agentic workflow that let one person move at the pace of a small team.&lt;&#x2F;p&gt;
&lt;p&gt;What came out was not a general-purpose proxy. It was not a Swiss army knife. It was a purpose-built tool, tailored from the start to one specific problem: safe, observable, operable edge traffic enforcement. No more, no less.&lt;&#x2F;p&gt;
&lt;p&gt;This is the part I think matters beyond Zentinel itself: when the cost of building serious software drops dramatically, you can afford to be specialized. You do not need to serve a broad market to justify the investment. You can build exactly the thing that solves exactly your problem, with exactly the tradeoffs you want. No feature creep driven by needing to justify a twenty-person team. No compromises driven by needing to appeal to every possible use case.&lt;&#x2F;p&gt;
&lt;p&gt;I think of it as bespoke infrastructure. Software that is tailored to a specific problem by someone who deeply understands that problem, made viable by tools that compress the gap between “I know exactly what this should be” and “it exists.” The same way that a load balancer was the right thing to build in 2000, and a service mesh data plane was the right thing to build in 2016, a purpose-built safe reverse proxy is the right thing to build in 2026. Not because the world suddenly needs one more proxy, but because the world can finally have proxies that are designed for specific jobs instead of being general enough to justify their development cost.&lt;&#x2F;p&gt;
&lt;p&gt;Zentinel is a post-agentic reverse proxy. Not because it uses AI internally (it does not, unless you count the inference-aware rate limiting for LLM traffic). But because it could only exist in a world where agentic development made bespoke infrastructure viable. One person. Three months. A clear vision that had been accumulating for years. That is not a story that was possible to tell in 2020.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;glass-box-infrastructure&quot;&gt;Glass-box infrastructure&lt;&#x2F;h2&gt;
&lt;p&gt;There is a conviction behind Zentinel that is not technical, and I want to be honest about it.&lt;&#x2F;p&gt;
&lt;p&gt;I believe critical web infrastructure should be open. Not “open core” with the important parts behind a license. Not “source available” with restrictions on how you run it. Open in the way that matters: you can read it, fork it, modify it, run it on your own hardware, never call anyone for permission.&lt;&#x2F;p&gt;
&lt;p&gt;Zentinel is Apache-2.0-licensed. Every agent is open source. The configuration format is documented. The protocol is specified. There is no hidden control plane, no phone-home telemetry, no vendor dependency.&lt;&#x2F;p&gt;
&lt;p&gt;But open source alone is not what I mean by transparent. Plenty of projects publish their source and remain effectively opaque. The code is there, technically, but understanding what a particular configuration will actually do still requires reading thousands of lines of parser logic, or just deploying it and hoping for the best.&lt;&#x2F;p&gt;
&lt;p&gt;This is where the Rust decision pays off in a way I did not fully anticipate when I started.&lt;&#x2F;p&gt;
&lt;p&gt;Because Zentinel is written in Rust, the core crates compile to WebAssembly. Not as a side project or a reimplementation. The same &lt;code&gt;zentinel-config&lt;&#x2F;code&gt; crate that parses and validates your KDL configuration in production compiles to a Wasm module that runs in a browser tab.&lt;&#x2F;p&gt;
&lt;p&gt;This means the &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;zentinelproxy.io&#x2F;playground&#x2F;&quot;&gt;config playground&lt;&#x2F;a&gt; on the Zentinel website is not a JavaScript approximation of what the parser does. It is the parser. The actual Rust code, compiled to Wasm, running against your configuration in real time. When it says your config is valid, that is the same validation logic that will run when Zentinel starts up on your server. When it flags an error, that is the same error you would see in production.&lt;&#x2F;p&gt;
&lt;p&gt;The same applies to the &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;zentinelproxy.io&#x2F;converter&#x2F;&quot;&gt;config converter&lt;&#x2F;a&gt;. If you are migrating from nginx, HAProxy, or Traefik, the conversion tool runs the actual Zentinel config crate in your browser. You paste your existing config, you get KDL output, and you can validate the result on the spot. No round-trip to a server. No “upload your infrastructure config to our cloud service.” It runs locally, in your browser, using the production code.&lt;&#x2F;p&gt;
&lt;p&gt;This matters more than it might sound. When you can run the same code that your proxy runs in production, right in your browser, you can reason about the proxy’s internal behavior directly. You are not trusting documentation about how the parser interprets a particular KDL construct. You are running the parser. You are not guessing what happens when two routes have the same priority. You are watching the actual matching logic evaluate your routes.&lt;&#x2F;p&gt;
&lt;p&gt;The proxy becomes glass-like. Not transparent in the “we published the source, good luck reading it” sense. Transparent in the sense that you can interact with its internals, poke at its logic, verify its behavior before it ever touches production traffic. The same Rust, the same types, the same validation rules, running wherever you need them: on the server, in CI, in your browser.&lt;&#x2F;p&gt;
&lt;p&gt;That is what I mean by trustworthy infrastructure. Not “trust us, it works.” Trust it because you can verify it yourself, using the same code, without asking anyone for permission.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;where-this-leaves-me&quot;&gt;Where this leaves me&lt;&#x2F;h2&gt;
&lt;p&gt;Every generation of proxies solved the problem of its era. HAProxy solved load balancing. Nginx solved web serving. Envoy solved service mesh routing. Varnish solved caching. Each was the right answer at the right time, and each was shaped by what was possible when it was built.&lt;&#x2F;p&gt;
&lt;p&gt;The problem I kept running into was different. Not throughput. Not feature count. Not service discovery. Just: can I understand what this system will do at 3 AM when something I did not plan for happens? Can I reason about its failure modes from reading its configuration? Can I trust it enough to sleep?&lt;&#x2F;p&gt;
&lt;p&gt;No existing proxy was designed from scratch for that question, because no existing proxy could afford to be that specialized. The economics of building infrastructure software pushed everything toward generality.&lt;&#x2F;p&gt;
&lt;p&gt;What changed is that the economics changed. One person, with the right foundation and the right tools, can now build purpose-built infrastructure that would have required a team and a multi-year roadmap before.&lt;&#x2F;p&gt;
&lt;p&gt;Zentinel is the proxy I needed and could not find, built in the specific window of time when building it became possible. It is a product of this moment, in the same way that every proxy before it was a product of its own.&lt;&#x2F;p&gt;
&lt;p&gt;And maybe that is what this era of software is really about. Not that AI writes code for you. That framing misses the point entirely. It is that the gap between knowing what should exist and making it exist got smaller, and the things people build when that gap closes are going to be very specific, very opinionated, and very good at exactly one thing.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;references-and-further-reading&quot;&gt;References and further reading&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;zentinelproxy&#x2F;zentinel&quot;&gt;Zentinel&lt;&#x2F;a&gt; - The source code, Apache-2.0-licensed&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.zentinelproxy.io&#x2F;&quot;&gt;Zentinel documentation&lt;&#x2F;a&gt; - Architecture, configuration reference, agent protocol&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;zentinelproxy.io&#x2F;playground&#x2F;&quot;&gt;Zentinel playground&lt;&#x2F;a&gt; - Browser-based config validation using the actual Rust crate compiled to Wasm&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;zentinelproxy.io&#x2F;converter&#x2F;&quot;&gt;Zentinel config converter&lt;&#x2F;a&gt; - Migrate from nginx, HAProxy, or Traefik configs to KDL&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;zentinelproxy.io&#x2F;manifesto&#x2F;&quot;&gt;Zentinel manifesto&lt;&#x2F;a&gt; - The design philosophy in full&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;cloudflare&#x2F;pingora&quot;&gt;Pingora&lt;&#x2F;a&gt; - Cloudflare’s open source proxy framework that Zentinel builds on&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;&#x2F;articles&#x2F;how-i-work-these-days&#x2F;&quot;&gt;How I Work These Days&lt;&#x2F;a&gt; - The broader shift in how I build software, and where Zentinel fits in that story&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;www.haproxy.org&#x2F;&quot;&gt;HAProxy&lt;&#x2F;a&gt; - Willy Tarreau’s load balancer, still one of the best pieces of infrastructure software ever written&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;nginx.org&#x2F;&quot;&gt;Nginx&lt;&#x2F;a&gt; - Igor Sysoev’s web server that became the internet’s default reverse proxy&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.envoyproxy.io&#x2F;&quot;&gt;Envoy&lt;&#x2F;a&gt; - The service mesh data plane that brought observability to distributed systems&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;varnish-cache.org&#x2F;&quot;&gt;Varnish&lt;&#x2F;a&gt; - Poul-Henning Kamp’s caching proxy&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</description>
      </item>
      <item>
          <title>How I Work These Days</title>
          <pubDate>Sun, 15 Mar 2026 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/how-i-work-these-days/</link>
          <guid>https://raskell.io/articles/how-i-work-these-days/</guid>
          <description xml:base="https://raskell.io/articles/how-i-work-these-days/">&lt;p&gt;If you had asked me a few years ago what kind of shift would truly change software again, I would probably have said something vague about machine learning becoming more useful, more accessible, more integrated into normal tooling. I would not have said that within a few years I would be spending large parts of my day in conversation with models, building products at a pace that used to feel unrealistic for one person.&lt;&#x2F;p&gt;
&lt;p&gt;But that is where I am now, and the path here did not start with ChatGPT. It started earlier.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;before-the-shock&quot;&gt;Before the shock&lt;&#x2F;h2&gt;
&lt;p&gt;I had been on Kaggle and Hugging Face since 2020. I was already paying attention. I had a decent understanding of machine learning, enough to know that something important was happening. I was not looking at this space as an outsider who suddenly discovered AI in a news cycle. I had been around it long enough to see that the ingredients were there.&lt;&#x2F;p&gt;
&lt;p&gt;Still, understanding a field and feeling a historical shift are not the same thing. When OpenAI released ChatGPT 3.5 in November 2022, something in me changed almost immediately. I do not mean that in a mystical way. I mean I recognized, very quickly, that this was not just another incremental product launch. It felt like a boundary marker, one of those moments where you can see a new layer of the technology stack forming in front of you.&lt;&#x2F;p&gt;
&lt;p&gt;At the time, I thought: this is going to be enormous. Bigger than most people realize. Bigger, maybe, than the web itself in terms of how deeply it will alter the shape of work, software, and the distribution of capability. That sounds exaggerated when people say it too casually. I know that. But that was honestly my reaction back then. Not hype. Recognition.&lt;&#x2F;p&gt;
&lt;p&gt;I was one of the early people willing to pay OpenAI. That mattered to me. I wanted access, and I wanted to stay close to the frontier as it moved. I did not want to be reading second-hand summaries while something this consequential was taking shape in real time.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-part-i-still-underestimated&quot;&gt;The part I still underestimated&lt;&#x2F;h2&gt;
&lt;p&gt;Even then, I still underestimated one thing: the speed. I understood generative AI was around the corner. I did not understand just how fast it would become operationally useful for actual software creation.&lt;&#x2F;p&gt;
&lt;p&gt;I had read AI 2027, the scenario work by Daniel Kokotajlo, Scott Alexander, Thomas Larsen, Eli Lifland, and Romeo Dean. It stayed with me, and it still does now in March 2026. I still think the basic direction it sketches is largely correct, even if the path is turning out a bit differently in practice than any single forecast can capture. But even with that framing in my head, I did not fully expect that less than two years after ChatGPT 3.5, coding agents would already start to feel like a real category rather than a novelty.&lt;&#x2F;p&gt;
&lt;p&gt;That part came faster than I thought.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;paying-attention-waiting-for-the-right-moment&quot;&gt;Paying attention, waiting for the right moment&lt;&#x2F;h2&gt;
&lt;p&gt;In 2025 I was trying different tools seriously. I was paying for Claude Code from May 2025 onward, but I was not especially impressed at first. I liked the CLI orientation. I liked that it felt coder-friendly. That part made sense to me immediately. But the model quality at that moment, and the rate limiting, left me cold. It was interesting. It was not yet transformative for my own workflow.&lt;&#x2F;p&gt;
&lt;p&gt;So I mostly leaned on Zed’s offering. I liked the editor experience, and I still do. I still reach for Zed when I want to inspect, edit, or move through files outside of Vim in the terminal. It fit me better in that phase.&lt;&#x2F;p&gt;
&lt;p&gt;Then November 2025 arrived, three years after ChatGPT 3.5, and then December came with Christmas break. I gave Claude Code another real try, this time with Opus 4.5, and that was the moment it really landed for me. Not politely. Not academically. It hit me hard.&lt;&#x2F;p&gt;
&lt;p&gt;I remember the feeling very clearly: this is it. This is the first time the whole thing feels like more than an assistant and less than a gimmick. This is a tool I can genuinely build with.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;zentinel-was-the-proof&quot;&gt;Zentinel was the proof&lt;&#x2F;h2&gt;
&lt;p&gt;Zentinel had been sitting in the back of my mind for a long time. The idea was not new. The frustration behind it was not new either. At my day job, we had been dealing with unreliable reverse proxies for long enough that the pain was familiar. I had always wanted River, the Pingora-based reverse proxy, to succeed. I wanted that project to become the thing I could reach for and trust. But it never got there.&lt;&#x2F;p&gt;
&lt;p&gt;So I did what this new moment suddenly made possible: I stopped waiting for somebody else to build the thing I wanted to exist.&lt;&#x2F;p&gt;
&lt;p&gt;I built Zentinel with Opus 4.5, and it worked.&lt;&#x2F;p&gt;
&lt;p&gt;That is the part that still feels a little surreal when I say it plainly. Three months later it is up, it is real, and people are using it. Not as a demo. Not as an abandoned prototype. As actual software in the hands of actual users.&lt;&#x2F;p&gt;
&lt;p&gt;That changed something fundamental in how I think about work. Once one long-held idea made it through that bottleneck, a lot of others started moving too. It was not just that I had a new tool. It was that the relationship between ambition and execution had changed. The old constraint, the one that said “yes, this could exist, but not with your current time and current bandwidth,” had weakened.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;then-everything-else-started-moving&quot;&gt;Then everything else started moving&lt;&#x2F;h2&gt;
&lt;p&gt;In the time since, I have built a whole range of things that had been accumulating in my head for years: Cyanea, Archipelag, Humankind, Arcanist and its Rust-based &lt;code&gt;hx&lt;&#x2F;code&gt; Haskell toolchain, the new Basel Haskell Compiler, and other pieces besides.&lt;&#x2F;p&gt;
&lt;p&gt;It has been a wild ride, but not in the shallow “everything is crazy” sense people often write about. More in the sense that an internal dam broke. For a long time I had more ideas than I had time, more design clarity than I had execution bandwidth, and more conviction than I had manpower. That is a frustrating place to live in for years. You learn to carry around a quiet backlog of unrealized things. Some of them stay alive as notes. Some become recurring thoughts during commutes or late at night. Some start to hurt a little because you know they are viable, but you also know you are not going to get to them with the tools and energy available to you at that point in your life.&lt;&#x2F;p&gt;
&lt;p&gt;Now the world is different. I can finally push through the backlog that used to exist only in notebooks, mental sketches, half-written design docs, and conversations with myself.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-my-days-look-like-now&quot;&gt;What my days look like now&lt;&#x2F;h2&gt;
&lt;p&gt;My daily routine changed dramatically.&lt;&#x2F;p&gt;
&lt;p&gt;I still have a day job, and then I have the rest of my work. Together it often adds up to something close to sixteen hours a day. That would sound bleak if I were forcing it. It does not feel bleak. It feels like release.&lt;&#x2F;p&gt;
&lt;p&gt;My workspace reflects that change. These days I use mostly Apple hardware, which is funny if you know how much of a Linux person I am, and how much of an OpenBSD person I still am. That part of me has not gone anywhere. I still love those systems. I still think they matter deeply. But if I am being honest about the practical question of where I am most productive right now, Apple has become the answer.&lt;&#x2F;p&gt;
&lt;p&gt;I work across a MacBook Pro M4, an iMac, and an Apple Vision Pro. I spend time talking ideas out in real time with ChatGPT’s voice mode, using it less like a search engine and more like a sparring partner. Sometimes intimate, sometimes brutally honest, sometimes audacious in exactly the way a good thinking partner should be. I push an idea, it pushes back, I sharpen it, it sharpens me, and we keep going until something solid emerges.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;&#x2F;how-i-work-these-days-desk.avif&quot; alt=&quot;My desk setup right now&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Then there is the terminal, where much of the actual implementation happens in Ghostty, usually with four panes open, often with multiple Claude Code agents running in parallel on the Max plan. Two hundred dollars a month for that level of leverage is, for me, one of the clearest trades I have ever made.&lt;&#x2F;p&gt;
&lt;p&gt;This is not a lifestyle performance. It is just the current shape of my work: ideas moving between speech, terminal, editor, design notes, code, back to speech, then back to code again.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;beast-mode-for-lack-of-a-better-term&quot;&gt;Beast mode, for lack of a better term&lt;&#x2F;h2&gt;
&lt;p&gt;There is a part of this that feels almost embarrassingly direct to say, but it would be dishonest to leave it out: I am the happiest I have ever been.&lt;&#x2F;p&gt;
&lt;p&gt;Not because everything is easy. It is not. Not because every project succeeds. They will not. Not because the industry suddenly became sane. It did not.&lt;&#x2F;p&gt;
&lt;p&gt;I am happy because the mismatch that used to define so much of my working life has narrowed. For years I had to live with the feeling that my ideas were outrunning my available hours and my available hands. Now, for the first time, it feels like I can actually meet myself where my ambition has been waiting.&lt;&#x2F;p&gt;
&lt;p&gt;There is a phrase people use, “beast mode,” and usually I would avoid it because it sounds like posturing. But I do not really have a cleaner shorthand for the intensity of this period. I am working hard, very hard, but with a degree of joy and clarity that makes the effort feel proportionate.&lt;&#x2F;p&gt;
&lt;p&gt;I am in conquest mode.&lt;&#x2F;p&gt;
&lt;p&gt;Not conquest in the empty startup sense. Not domination, not vanity metrics, not growth for its own sake. I mean conquest over the inertia that used to keep good ideas trapped inside my head. Conquest over backlog. Conquest over hesitation. Conquest over the old excuses about lacking time, lacking team, lacking the right moment.&lt;&#x2F;p&gt;
&lt;p&gt;And yes, some of it is for me. To feel better. To feel more whole. To stop carrying around years of deferred execution. But some of it is also because I genuinely want to make useful things. I want to build software that improves the texture of work, that makes systems more reliable, that gives people better tools, that opens up possibilities that were previously too expensive or too cumbersome to pursue.&lt;&#x2F;p&gt;
&lt;p&gt;That still matters to me. Probably more than ever.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-real-change&quot;&gt;The real change&lt;&#x2F;h2&gt;
&lt;p&gt;So when I say that the way I work these days has changed, I do not just mean that I use different tools.&lt;&#x2F;p&gt;
&lt;p&gt;I mean that the relation between thought and execution changed. The lag collapsed. The emotional burden of unrealized ideas shrank. The number of things that are now viable to attempt expanded dramatically. That is the real story.&lt;&#x2F;p&gt;
&lt;p&gt;I was already paying attention in 2020. I recognized the significance of ChatGPT in 2022. I underestimated the speed anyway. Then late 2025 arrived, the tools crossed a threshold, and my daily life reorganized itself around that fact.&lt;&#x2F;p&gt;
&lt;p&gt;Three years is not a long time. It feels longer when you live through a real transition.&lt;&#x2F;p&gt;
&lt;p&gt;And I suspect we are still early.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;references-and-further-reading&quot;&gt;References and further reading&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;ai-2027.com&#x2F;&quot;&gt;AI 2027&lt;&#x2F;a&gt; - Scenario work that influenced how I thought about the trajectory of this space&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;zed.dev&#x2F;&quot;&gt;Zed&lt;&#x2F;a&gt; - Editor I still use alongside Vim&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;ghostty.org&#x2F;&quot;&gt;Ghostty&lt;&#x2F;a&gt; - Terminal I use for most of my agent-heavy coding sessions&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;zentinelproxy.io&#x2F;&quot;&gt;Zentinel&lt;&#x2F;a&gt; - Reverse proxy project that became the first real proof point for this workflow&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;cyanea.bio&#x2F;&quot;&gt;Cyanea&lt;&#x2F;a&gt; - One of the projects that came to life during this period&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;archipelag.io&#x2F;&quot;&gt;Archipelag&lt;&#x2F;a&gt; - Another product that moved from idea to reality in this new working mode&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</description>
      </item>
      <item>
          <title>Archipelag.io Is in Open Beta: Here&#x27;s Why I Built It</title>
          <pubDate>Fri, 13 Mar 2026 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/archipelag-io-distributed-compute-from-mining-rigs-to-open-beta/</link>
          <guid>https://raskell.io/articles/archipelag-io-distributed-compute-from-mining-rigs-to-open-beta/</guid>
          <description xml:base="https://raskell.io/articles/archipelag-io-distributed-compute-from-mining-rigs-to-open-beta/">&lt;p&gt;There is an abandoned factory building in Glarus, a small town wedged between mountains in eastern Switzerland. In 2016, the building was loud. Not machinery-loud, fan-loud. Rows of bare motherboards bolted to open-air frames, each bristling with GPUs and daisy-chained power supplies. The air tasted like warm dust and ozone. Cables ran everywhere, held in place by zip ties and optimism. This was an Ethereum mining operation, and I was standing in the middle of it, watching people I knew convert their gaming rigs, hardware they loved, into money-printing machines.&lt;&#x2F;p&gt;
&lt;p&gt;I was there because Vitalik Buterin had decided to visit. He had flown in on a private jet to Geneva, driven up in a black limousine with tinted windows, and walked into this dusty, chaotic space to see what Swiss miners were building. It was surreal. The creator of Ethereum, stepping over power cables in an industrial ruin, nodding at rack after rack of GPUs humming away at proof-of-work hashes. I do not think he was impressed by the elegance of the setup. Nobody was. But something about that scene stuck with me.&lt;&#x2F;p&gt;
&lt;p&gt;People were willing to sacrifice their gaming entertainment, their &lt;em&gt;leisure hardware&lt;&#x2F;em&gt;, to chase the dream of sovereign financial independence using fundamentally nerdy equipment: PCs, internet connections, blockchain protocols, and graphics cards. They were converting consumer-grade technology into economic infrastructure, and they were doing it themselves. No data center leases. No vendor contracts. No permission from anyone. Just people, hardware, and a protocol that made it worth their while.&lt;&#x2F;p&gt;
&lt;p&gt;I had skin in the game too. I invested (gambled, honestly) in crypto during that era. I watched the charts, rode the swings, felt the dopamine spikes and the stomach-drops. The financial side was wild and ultimately unsustainable for most people. But the &lt;em&gt;infrastructure&lt;&#x2F;em&gt; side, the part where ordinary humans turned their homes into compute nodes and got paid for it: that part was real, and that part stayed with me long after the crypto hype faded and the rigs went quiet.&lt;&#x2F;p&gt;
&lt;p&gt;This is the story of how that factory visit turned into &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;archipelag.io&quot;&gt;Archipelag.io&lt;&#x2F;a&gt;, a distributed compute network that entered open beta today. It has been ten years of thinking, one year of building, and a lot of being wrong about the right things at the wrong time.&lt;&#x2F;p&gt;
</description>
      </item>
      <item>
          <title>How AI Makes Bare Metal Viable Again</title>
          <pubDate>Sun, 08 Mar 2026 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/how-ai-makes-bare-metal-viable-again/</link>
          <guid>https://raskell.io/articles/how-ai-makes-bare-metal-viable-again/</guid>
          <description xml:base="https://raskell.io/articles/how-ai-makes-bare-metal-viable-again/">&lt;p&gt;I was paying over two hundred dollars a month to run two apps that had zero paying users.&lt;&#x2F;p&gt;
&lt;p&gt;Not because the apps were complex. Not because they needed high availability across regions. Because I was running Kubernetes on DigitalOcean, and Kubernetes has opinions about how much infrastructure you need. A control plane. Worker nodes. Load balancers. Persistent volumes. Managed databases. Each line item modest on its own, adding up to a bill that felt absurd for two Phoenix applications in their bootstrapping phase.&lt;&#x2F;p&gt;
&lt;p&gt;The apps are &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;archipelag.io&quot;&gt;archipelag.io&lt;&#x2F;a&gt; and &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;cyanea.bio&quot;&gt;cyanea.bio&lt;&#x2F;a&gt;. Both are Elixir&#x2F;Phoenix projects. Archipelag uses PostgreSQL and NATS for its messaging layer. Cyanea uses SQLite. Neither gets meaningful traffic yet. Both are real products I am actively building, not side projects I will abandon next month. But they are pre-revenue, and every dollar I spend on infrastructure is a dollar I am betting against future income that does not exist yet.&lt;&#x2F;p&gt;
&lt;p&gt;Something had to change.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-kubernetes-trap&quot;&gt;The Kubernetes trap&lt;&#x2F;h2&gt;
&lt;p&gt;Here is the thing about Kubernetes: it solves problems you might not have. If you are running fifty microservices across three regions with autoscaling requirements and a platform team to manage it, Kubernetes earns its keep. If you are running two BEAM applications that each consume less than 512 MB of memory, you are paying a complexity tax for infrastructure capabilities you will never touch.&lt;&#x2F;p&gt;
&lt;p&gt;My K8s setup on DigitalOcean looked like this: a managed cluster with two worker nodes (the minimum for any reasonable availability), a managed PostgreSQL instance for Archipelag, a load balancer for ingress, persistent volumes for Cyanea’s SQLite database. Each component had its own monthly cost. The cluster management fee alone was more than what I would eventually pay for an entire bare metal server.&lt;&#x2F;p&gt;
&lt;p&gt;The operational overhead was worse than the cost. Helm charts. Ingress controllers. Certificate managers. Pod disruption budgets. Every time I wanted to deploy a new version, I was wrangling YAML files that described infrastructure concerns my apps did not care about. A Phoenix release does not need a pod spec. It needs a port, an environment, and someone to restart it if it crashes.&lt;&#x2F;p&gt;
&lt;p&gt;And the YAML, my God, the YAML. A simple Phoenix app that listens on a port and serves HTTP needs, at minimum, a Deployment manifest, a Service manifest, and an Ingress manifest. Add a ConfigMap for environment variables, a Secret for credentials, a PersistentVolumeClaim if you need disk, a HorizontalPodAutoscaler if you want autoscaling. For Cyanea alone, I had six Kubernetes manifests totaling a few hundred lines of YAML, all to describe an application that boils down to: run this binary, give it a port, point a domain at it.&lt;&#x2F;p&gt;
&lt;p&gt;The cognitive load compounds. You learn the Kubernetes resource model, then the DigitalOcean-specific annotations for their load balancer, then the cert-manager CRDs for TLS, then the quirks of persistent volumes on managed K8s (spoiler: they are not as persistent as you think if you do not get the reclaim policy right). Each layer has its own documentation, its own failure modes, its own upgrade cycle. I spent more time debugging infrastructure than building product.&lt;&#x2F;p&gt;
&lt;p&gt;The irony is not lost on me. Kubernetes was designed for teams running hundreds of services at Google-scale. I was running two apps. The orchestrator had more moving parts than the things it was orchestrating. It was like hiring a logistics fleet to deliver two packages across town.&lt;&#x2F;p&gt;
&lt;p&gt;I knew I was over-engineered. But the alternative, at the time, seemed like a step backward.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-nomad-detour&quot;&gt;The Nomad detour&lt;&#x2F;h2&gt;
&lt;p&gt;I should mention that Kubernetes was never the only orchestrator I considered. For the past five years, while the industry went all-in on K8s, I had been quietly admiring HashiCorp’s &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.nomadproject.io&#x2F;&quot;&gt;Nomad&lt;&#x2F;a&gt;. Where Kubernetes is a sprawling ecosystem of CRDs, operators, and control loops, Nomad is refreshingly minimal. A single binary. A simple job spec. No opinions about networking, no built-in service mesh, no mandatory etcd cluster. You tell it what to run, it runs it.&lt;&#x2F;p&gt;
&lt;p&gt;That minimalism appealed to me. Nomad treats workload scheduling as the core problem and stays out of everything else. No built-in networking layer means you bring your own, which sounds like a drawback until you realize it means you are not locked into someone else’s networking model.&lt;&#x2F;p&gt;
&lt;p&gt;And I happened to have my own networking layer already. I had been building &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;zentinelproxy.io&#x2F;&quot;&gt;Zentinel&lt;&#x2F;a&gt; in parallel, a security-first reverse proxy built on Cloudflare’s &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;cloudflare&#x2F;pingora&quot;&gt;Pingora&lt;&#x2F;a&gt; framework in Rust. Zentinel handles TLS termination, WAF inspection, rate limiting, domain-based routing, all the edge concerns I care about. It also supports sleepable ops, where backend instances can be suspended and woken on demand, which is perfect for apps that do not need to be running 24&#x2F;7.&lt;&#x2F;p&gt;
&lt;p&gt;So I tried pairing them. Nomad for workload scheduling, Zentinel for the network layer. And it worked. The combination gave me a lightweight orchestrator that did not try to own every concern, paired with a reverse proxy that handled edge traffic the way I wanted. Two focused tools, each doing one thing well.&lt;&#x2F;p&gt;
&lt;p&gt;But then IBM acquired HashiCorp, and the calculus changed.&lt;&#x2F;p&gt;
&lt;p&gt;The acquisition itself was not the problem. Companies get acquired. It happens. The problem was the trajectory. HashiCorp had already re-licensed its products, Terraform and Nomad included, from MPL to BSL (Business Source License) in 2023, a move that fractured the community and spawned the &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;opentofu.org&#x2F;&quot;&gt;OpenTofu&lt;&#x2F;a&gt; fork of Terraform. The pattern was familiar: open-source project gains adoption, company monetizes through enterprise features, company changes the license or gets acquired, and the screws tighten. I had watched variations of it happen with Redis, with Elasticsearch, with MongoDB. Each time the community forks, there is a period of uncertainty, split maintenance effort, and feature divergence.&lt;&#x2F;p&gt;
&lt;p&gt;I did not want to build my infrastructure on a foundation where the governance could shift at any time. Nomad’s source is still public today, but under the BSL it is source-available rather than open source. And “still available” and “will remain available on these terms” are different statements; after the Terraform situation, I was not confident in the latter. The BSL license change had been a signal, and IBM’s acquisition amplified it. I did not need to go down that road with another HashiCorp product.&lt;&#x2F;p&gt;
&lt;p&gt;The Nomad experiment did teach me something valuable, though. It confirmed that the KISS approach to deployment was right. You do not need the full Kubernetes machinery. A scheduler that starts processes, checks their health, and restarts them when they crash is sufficient for a wide range of workloads. And a dedicated reverse proxy that handles TLS and routing is cleaner than bundling networking into the orchestrator.&lt;&#x2F;p&gt;
&lt;p&gt;That insight, Nomad’s minimalism plus Zentinel’s Pingora-based proxy architecture, became the design seed for what I would eventually build.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-fly-io-middle-ground&quot;&gt;The fly.io middle ground&lt;&#x2F;h2&gt;
&lt;p&gt;With Nomad off the table as a long-term bet, I migrated to &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;fly.io&quot;&gt;fly.io&lt;&#x2F;a&gt; in late 2025. It was genuinely better than K8s for my use case. Fly understands BEAM applications at a fundamental level. The BEAM runtime is designed for the kind of lightweight, long-lived processes that Fly’s infrastructure optimizes for. You push a release, it runs it. No YAML. No ingress controllers. No cluster management.&lt;&#x2F;p&gt;
&lt;p&gt;Fly also made the service dependencies painless. Managed Postgres with a few commands. NATS was straightforward to set up. Tigris (Fly’s S3-compatible object storage) handled blob storage for Cyanea’s file uploads. The developer experience was genuinely excellent, and I mean that without reservation. The Fly team has built something thoughtful.&lt;&#x2F;p&gt;
&lt;p&gt;The cost dropped meaningfully. No cluster management fee. No minimum node count. Pay-per-VM pricing that scales down to fractions of a shared CPU. Fly’s model is honest about what small applications actually need, and the pricing reflects that. I went from over two hundred dollars a month on DigitalOcean K8s to roughly a quarter of that.&lt;&#x2F;p&gt;
&lt;p&gt;For a while, it was the right answer. And if I had been scaling horizontally, adding regions, needing the kind of elastic compute that cloud-native platforms excel at, I would have stayed. If my apps suddenly got traction and I needed instances in Tokyo, Frankfurt, and Virginia, Fly would be the obvious choice. The multi-region story is one of Fly’s genuine strengths. You deploy once, it runs everywhere. That is hard to replicate.&lt;&#x2F;p&gt;
&lt;p&gt;But I was not scaling horizontally. I was running two apps in one location. On a good day, they handled maybe a few hundred requests. The compute they needed was trivial, a fraction of a shared CPU core. And I was still paying for a platform designed to scale to thousands of instances across dozens of regions, even though I needed exactly one instance of each app, in exactly one place, doing very little work.&lt;&#x2F;p&gt;
&lt;p&gt;There is also a subtler cost that managed platforms carry: the abstraction tax. When something goes wrong on Fly (and it did, occasionally, things like deployment timeouts or the odd networking hiccup), you are debugging at the platform level, not the system level. You file a support ticket or check the status page. You do not SSH in and look at processes, because there are no processes you can see. The platform is the intermediary, and the intermediary has its own failure modes that you cannot inspect or fix.&lt;&#x2F;p&gt;
&lt;p&gt;The cloud-native model, even the lean version that Fly offers, has a floor. You are always paying for the platform’s capabilities, not just your usage of them. When your usage is “two small apps, one location, no scale,” that floor matters. And when the platform sits between you and your processes, you lose the ability to debug at the level where the answers actually live.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-bare-metal-math&quot;&gt;The bare metal math&lt;&#x2F;h2&gt;
&lt;p&gt;I started looking at dedicated servers. Not VPS instances, not cloud VMs. Actual hardware you can SSH into, where your processes run on real cores and your data sits on real disks.&lt;&#x2F;p&gt;
&lt;p&gt;Hetzner runs a &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.hetzner.com&#x2F;sb&#x2F;&quot;&gt;server auction&lt;&#x2F;a&gt; where they sell refurbished dedicated machines at steep discounts. These are servers that have been running in Hetzner’s data centers, got rotated out of customer contracts, and are resold at prices that make cloud compute look like a luxury good. The hardware is used but maintained, and Hetzner’s data centers are well-run: proper cooling, redundant power, good network connectivity.&lt;&#x2F;p&gt;
&lt;p&gt;I found a box with a multi-core Intel CPU, 128 GB of DDR4 RAM, and two 1 TB NVMe drives that I configured in RAID 1 for redundancy. EUR 38 a month. About forty-two dollars. Fixed price. No bandwidth metering (Hetzner includes 20 TB of traffic on dedicated servers, which for my workload might as well be unlimited). No surprises on the bill.&lt;&#x2F;p&gt;
&lt;p&gt;Let that sink in for a moment. For less than what I was paying for managed Postgres alone on either platform, I could have an entire server with more RAM than I know what to do with, fast NVMe storage with mirror redundancy, and enough compute headroom to run not two but twenty applications without breaking a sweat. The two NVMe drives alone, if bought retail, would cost more than a year of hosting.&lt;&#x2F;p&gt;
&lt;p&gt;I ran the numbers on capacity. My two Phoenix apps, even under load, would use maybe 1-2 GB of RAM combined. PostgreSQL with a modest dataset, another gig or two. NATS, negligible. That leaves well over 120 GB of RAM sitting idle. The CPU tells a similar story. Phoenix on the BEAM is remarkably efficient with CPU resources: the scheduler does its own preemptive multitasking across lightweight processes, and my workloads are I&#x2F;O-bound, not compute-bound. I could run my entire current stack and barely register on a load graph.&lt;&#x2F;p&gt;
&lt;p&gt;The headroom is the point. On a cloud platform, headroom costs money. More RAM, higher tier. More CPU, higher tier. On bare metal, the headroom is already paid for. Growing from two apps to ten does not change my monthly bill. Adding a staging environment does not change my monthly bill. Running background workers, a metrics stack, a CI runner, none of it changes my monthly bill. The marginal cost of additional workloads on existing hardware is zero.&lt;&#x2F;p&gt;
&lt;p&gt;The math was obvious. The problem was everything else.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-bare-metal-was-hard&quot;&gt;Why bare metal was hard&lt;&#x2F;h2&gt;
&lt;p&gt;Bare metal has always been cheap. That was never the issue. The issue was everything you had to build and maintain yourself.&lt;&#x2F;p&gt;
&lt;p&gt;On a managed platform, you get deployment pipelines, TLS certificate management, process supervision, reverse proxying, log aggregation, health checks, and rollback mechanisms out of the box. On bare metal, you get a Linux login prompt and a blinking cursor.&lt;&#x2F;p&gt;
&lt;p&gt;Historically, going bare metal for web applications meant weeks of setup:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Install and configure nginx or HAProxy as a reverse proxy&lt;&#x2F;li&gt;
&lt;li&gt;Set up Certbot or acme.sh for Let’s Encrypt certificates, and hope the renewal cron does not silently break&lt;&#x2F;li&gt;
&lt;li&gt;Write deployment scripts (rsync, symlinks, restart commands) and debug them for months&lt;&#x2F;li&gt;
&lt;li&gt;Configure systemd services for each app, with the right restart policies and environment files&lt;&#x2F;li&gt;
&lt;li&gt;Build a process supervision layer that handles crashes, port allocation, and graceful shutdowns&lt;&#x2F;li&gt;
&lt;li&gt;Figure out zero-downtime deploys (which means running two instances, health checking the new one, swapping traffic, draining the old one)&lt;&#x2F;li&gt;
&lt;li&gt;Set up log rotation, monitoring, backups&lt;&#x2F;li&gt;
&lt;li&gt;Harden the server (firewall, SSH config, automatic security updates)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Each of these is a solved problem in isolation. There are blog posts and Stack Overflow answers for every one of them. But stitching them together into a coherent, reliable deployment system is a full-time job for a week or two, and maintaining it is an ongoing tax on your attention.&lt;&#x2F;p&gt;
&lt;p&gt;This is why the cloud won. Not because bare metal is expensive. Because the operational cost of doing it yourself was prohibitive for small teams. The cloud sold you a package deal: we handle the infrastructure, you handle the application. Worth it, even at a premium.&lt;&#x2F;p&gt;
&lt;p&gt;But what if that operational cost dropped to near zero?&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-ai-shift&quot;&gt;The AI shift&lt;&#x2F;h2&gt;
&lt;p&gt;I had Claude Code with Opus 4.6 available. I had spent months working with it on other projects. Compilers, CRDT engines, reverse proxies. I knew what it could do with a clear spec and a well-defined problem domain.&lt;&#x2F;p&gt;
&lt;p&gt;And deploying web applications to bare metal is a well-defined problem domain.&lt;&#x2F;p&gt;
&lt;p&gt;The core requirements are straightforward: upload an artifact, start it on a port, check that it is healthy, route traffic to it, stop the old one. Everything else, TLS, process supervision, rollback, log capture, is layered on top of that core loop. The problem space is wide but shallow. Lots of features, few genuinely novel algorithms.&lt;&#x2F;p&gt;
&lt;p&gt;This is exactly the kind of work where AI shines. Not because it writes perfect code on the first try. But because it can work through a feature list that would take a solo developer weeks, producing working implementations in hours. The feedback loop is tight: describe what you want, get code, test it, refine. The domain knowledge exists in a thousand deployment tools that came before. The AI has seen all of them.&lt;&#x2F;p&gt;
&lt;p&gt;So I decided to build my own deployment tool. From scratch. With AI as my co-engineer.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;building-vela&quot;&gt;Building Vela&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;vela&quot;&gt;Vela&lt;&#x2F;a&gt; is what came out of that process. A single Rust binary that handles everything I listed above: reverse proxy, auto-TLS, process supervision, zero-downtime deploys, health checks, secret management, log streaming, rollbacks. No containers. No Docker. No YAML.&lt;&#x2F;p&gt;
&lt;p&gt;The design draws from both of its ancestors. From Nomad, the suckless philosophy: a single binary, minimal configuration, no opinions about things that are not its problem. From Zentinel, the Pingora-inspired proxy architecture: a hyper-based reverse proxy with TLS termination, domain-based routing, and WebSocket support baked into the same process. Vela is what happens when you take the best ideas from tools you admire and combine them into something purpose-built for your exact workload.&lt;&#x2F;p&gt;
&lt;p&gt;The design philosophy is blunt: one binary, two modes.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;┌─────────────────────────────────────────────┐
│  Your server                                │
│                                             │
│  Vela daemon                                │
│  ├── Reverse proxy (:80&amp;#x2F;:443, auto-TLS)     │
│  ├── Process manager (start, health, swap)  │
│  └── IPC socket                             │
│                                             │
│  Apps                                       │
│  ├── cyanea.bio      → :10001              │
│  └── archipelag.io   → :10002              │
└─────────────────────────────────────────────┘

┌─────────────────────────────────────────────┐
│  Your laptop                                │
│                                             │
│  vela deploy  →  scp + ssh  →  server       │
└─────────────────────────────────────────────┘
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;vela serve&lt;&#x2F;code&gt; runs on the server. It is the reverse proxy, the process manager, and the IPC daemon, all in one process. &lt;code&gt;vela deploy&lt;&#x2F;code&gt; runs on your laptop. It reads a manifest, uploads your artifact over SSH, and tells the server to activate it.&lt;&#x2F;p&gt;
&lt;p&gt;SSH is the control plane. No tokens, no API keys, no custom authentication layer. If you can SSH into the server, you can deploy. This is a deliberate choice. SSH key management is a solved problem. Every developer already has it configured. Every server already has it running. Building a custom auth system on top would be adding complexity for no practical gain.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-manifest&quot;&gt;The manifest&lt;&#x2F;h3&gt;
&lt;p&gt;Each app gets a &lt;code&gt;Vela.toml&lt;&#x2F;code&gt; in its project root:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;toml&quot; class=&quot;language-toml &quot;&gt;&lt;code class=&quot;language-toml&quot; data-lang=&quot;toml&quot;&gt;[app]
name = &amp;quot;cyanea&amp;quot;
domain = &amp;quot;app.cyanea.bio&amp;quot;

[deploy]
server = &amp;quot;deploy@my-server&amp;quot;
type = &amp;quot;beam&amp;quot;
binary = &amp;quot;server&amp;quot;
health = &amp;quot;&amp;#x2F;health&amp;quot;
strategy = &amp;quot;sequential&amp;quot;
pre_start = &amp;quot;bin&amp;#x2F;cyanea eval &amp;#x27;Cyanea.Release.migrate()&amp;#x27;&amp;quot;

[env]
DATABASE_PATH = &amp;quot;${data_dir}&amp;#x2F;cyanea.db&amp;quot;
SECRET_KEY_BASE = &amp;quot;${secret:SECRET_KEY_BASE}&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That is the entire deploy configuration. The app type tells Vela how to start it (&lt;code&gt;beam&lt;&#x2F;code&gt; runs Elixir releases, &lt;code&gt;binary&lt;&#x2F;code&gt; runs compiled executables). The health path tells it where to check. The strategy tells it how to swap traffic. The &lt;code&gt;pre_start&lt;&#x2F;code&gt; hook runs database migrations before the new instance starts, and if migrations fail, the deploy aborts and the old instance keeps running.&lt;&#x2F;p&gt;
&lt;p&gt;Environment variables support two substitution patterns: &lt;code&gt;${data_dir}&lt;&#x2F;code&gt; expands to the app’s persistent data directory (which survives deploys), and &lt;code&gt;${secret:KEY}&lt;&#x2F;code&gt; pulls from the server-side secret store. Secrets never live in your repo.&lt;&#x2F;p&gt;
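&lt;p&gt;As a rough illustration of those two substitution patterns, here is a minimal sketch of an expander. It is not Vela’s actual code: the function name &lt;code&gt;expand&lt;&#x2F;code&gt;, the in-memory secret map, and the error strings are assumptions made for the example.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;use std::collections::HashMap;

&amp;#x2F;&amp;#x2F; Hypothetical helper: expands ${data_dir} and ${secret:KEY} in one env value.
&amp;#x2F;&amp;#x2F; Returns an error naming the first placeholder that cannot be resolved.
fn expand(
    value: &amp;amp;str,
    data_dir: &amp;amp;str,
    secrets: &amp;amp;HashMap&amp;lt;String, String&amp;gt;,
) -&amp;gt; Result&amp;lt;String, String&amp;gt; {
    let mut out = String::new();
    let mut rest = value;
    while let Some(start) = rest.find(&amp;quot;${&amp;quot;) {
        &amp;#x2F;&amp;#x2F; Copy the literal text that precedes the placeholder.
        out.push_str(&amp;amp;rest[..start]);
        let after = &amp;amp;rest[start + 2..];
        let end = after
            .find(&amp;#x27;}&amp;#x27;)
            .ok_or_else(|| format!(&amp;quot;unterminated placeholder in {value:?}&amp;quot;))?;
        let name = &amp;amp;after[..end];
        if name == &amp;quot;data_dir&amp;quot; {
            out.push_str(data_dir);
        } else if let Some(key) = name.strip_prefix(&amp;quot;secret:&amp;quot;) {
            let secret = secrets
                .get(key)
                .ok_or_else(|| format!(&amp;quot;unknown secret {key:?}&amp;quot;))?;
            out.push_str(secret);
        } else {
            return Err(format!(&amp;quot;unknown placeholder {name:?}&amp;quot;));
        }
        rest = &amp;amp;after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Failing on an unknown placeholder or a missing secret, rather than silently leaving the text in place, is a design choice of this sketch: a deploy that starts with a literal &lt;code&gt;${secret:KEY}&lt;&#x2F;code&gt; in its environment is harder to debug than one that refuses to start.&lt;&#x2F;p&gt;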
&lt;p&gt;Deploying looks like this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;MIX_ENV=prod mix release
vela deploy .&amp;#x2F;_build&amp;#x2F;prod&amp;#x2F;rel&amp;#x2F;cyanea
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Two commands. The artifact goes up, the health check passes, traffic swaps, done.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;zero-downtime-deploys&quot;&gt;Zero-downtime deploys&lt;&#x2F;h3&gt;
&lt;p&gt;Vela supports two deploy strategies, and the choice matters.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Blue-green&lt;&#x2F;strong&gt; is the default. The new instance starts alongside the old one on a fresh port. Vela runs a health check against it (30 retries, one per second, five-second timeout per attempt). Once the health check passes, the reverse proxy atomically swaps the route table entry for that domain to point at the new port. The old instance gets a configurable drain period to finish in-flight requests, then receives SIGTERM. If it does not exit within the drain window, SIGKILL.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;Time ──────────────────────────────────────────►

Old instance     ████████████████████░░░░  (draining)
New instance              ░░░░████████████████████
                          ▲   ▲
                     start │   │ health passes, swap
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Zero downtime. The user never sees a blip. This works for stateless apps and apps backed by PostgreSQL (where both instances can connect to the same database simultaneously).&lt;&#x2F;p&gt;
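&lt;p&gt;The health-check loop described above (30 attempts, one per second, five-second timeout per attempt) can be sketched roughly as follows. The names &lt;code&gt;HealthPolicy&lt;&#x2F;code&gt; and &lt;code&gt;wait_until_healthy&lt;&#x2F;code&gt; are hypothetical, and the probe is passed in as a closure so the sketch makes no claim about which HTTP client Vela uses.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;use std::thread::sleep;
use std::time::Duration;

&amp;#x2F;&amp;#x2F; Retry policy from the text: 30 attempts, one per second, 5 s timeout each.
pub struct HealthPolicy {
    pub attempts: u32,
    pub interval: Duration,
    pub timeout: Duration,
}

impl HealthPolicy {
    pub fn blue_green_default() -&amp;gt; Self {
        Self {
            attempts: 30,
            interval: Duration::from_secs(1),
            timeout: Duration::from_secs(5),
        }
    }
}

&amp;#x2F;&amp;#x2F; Calls `probe` until it reports healthy or the attempts run out.
&amp;#x2F;&amp;#x2F; `probe` receives the per-attempt timeout and returns true when healthy.
&amp;#x2F;&amp;#x2F; Returns the 1-based number of the attempt that succeeded.
pub fn wait_until_healthy(
    policy: &amp;amp;HealthPolicy,
    mut probe: impl FnMut(Duration) -&amp;gt; bool,
) -&amp;gt; Option&amp;lt;u32&amp;gt; {
    for attempt in 1..=policy.attempts {
        if probe(policy.timeout) {
            return Some(attempt);
        }
        if attempt != policy.attempts {
            sleep(policy.interval); &amp;#x2F;&amp;#x2F; no sleep after the final failure
        }
    }
    None
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A caller would pass a closure that requests the app’s health path on the new port and reports whether it answered successfully within the timeout. Only a &lt;code&gt;Some&lt;&#x2F;code&gt; result should lead to the route swap. On &lt;code&gt;None&lt;&#x2F;code&gt;, the sensible behavior is presumably to stop the new instance and leave the old one serving traffic.&lt;&#x2F;p&gt;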
&lt;p&gt;&lt;strong&gt;Sequential&lt;&#x2F;strong&gt; is for SQLite apps. SQLite allows only one writer at a time, so two instances writing to the same database file end up fighting over the write lock (WAL mode helps, but concurrent writers from separate instances are asking for trouble). So Vela stops the old instance first, starts the new one, health checks it, and activates it. Sub-second blip. Acceptable for apps where the alternative is write contention.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;Time ──────────────────────────────────────────►

Old instance     ████████████████████
New instance                          ░░░░████████████████████
                                 ▲   ▲
                            stop │   │ start + health check
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The decision is per-app, configured in the manifest. Cyanea uses sequential (SQLite). Archipelag uses blue-green (PostgreSQL).&lt;&#x2F;p&gt;
&lt;h3 id=&quot;process-supervision&quot;&gt;Process supervision&lt;&#x2F;h3&gt;
&lt;p&gt;Vela does not just start your app and walk away. It supervises it. If a process crashes, Vela detects the exit (via non-blocking &lt;code&gt;try_wait&lt;&#x2F;code&gt; on the child process handle), logs it, and restarts from the stored launch configuration:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;pub async fn check_and_restart(&amp;amp;mut self) -&amp;gt; Vec&amp;lt;String&amp;gt; {
    let mut to_restart = Vec::new();

    for (key, process) in &amp;amp;mut self.running {
        match process.child.try_wait() {
            Ok(Some(status)) if !status.success() =&amp;gt; {
                &amp;#x2F;&amp;#x2F; Process exited unexpectedly
                to_restart.push((
                    key.clone(),
                    process.launch_config.clone(),
                ));
            }
            _ =&amp;gt; {}
        }
    }

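    let mut restarted = Vec::new();
    for (key, config) in to_restart {
        &amp;#x2F;&amp;#x2F; Restart on same port if available, allocate new otherwise
        self.restart_from_config(&amp;amp;key, &amp;amp;config).await;
        restarted.push(key);
    }

    &amp;#x2F;&amp;#x2F; Report which apps were restarted so the caller can log them
    restarted
}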
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Each app’s &lt;code&gt;LaunchConfig&lt;&#x2F;code&gt; (release directory, binary name, app type, environment variables, data directory) is stored so that restarts use the exact same configuration. The daemon also persists app state to disk, so if Vela itself restarts (server reboot, daemon upgrade), it restores all running apps from their saved configurations.&lt;&#x2F;p&gt;
&lt;p&gt;This is the kind of feature that would take a day to specify and a week to implement if you were writing it from scratch. With Claude, it took about an hour of iteration, including the edge cases around port reallocation and the pending&#x2F;active state split during deploys.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;built-in-services&quot;&gt;Built-in services&lt;&#x2F;h3&gt;
&lt;p&gt;Both of my apps have service dependencies. Archipelag needs PostgreSQL and NATS. Rather than managing these separately, Vela handles service provisioning directly:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;toml&quot; class=&quot;language-toml &quot;&gt;&lt;code class=&quot;language-toml&quot; data-lang=&quot;toml&quot;&gt;[services.postgres]
version = &amp;quot;17&amp;quot;
databases = [&amp;quot;archipelag_prod&amp;quot;]

[services.nats]
version = &amp;quot;2.10&amp;quot;
jetstream = true
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;On first deploy, Vela installs PostgreSQL (via apt), creates the database with a generated password, and injects &lt;code&gt;DATABASE_URL&lt;&#x2F;code&gt; into the app’s environment. For NATS, it downloads the binary, generates a config, and starts it as a supervised child process with &lt;code&gt;NATS_URL&lt;&#x2F;code&gt; injected. Service credentials persist across deploys and daemon restarts.&lt;&#x2F;p&gt;
&lt;p&gt;This was one of those features where the AI really earned its keep. The NATS lifecycle management alone (downloading the right binary for the platform, generating config, supervising the process, health-checking the monitoring endpoint, persisting credentials) involved touching six or seven modules. Claude handled the plumbing while I focused on the design decisions.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-reverse-proxy&quot;&gt;The reverse proxy&lt;&#x2F;h3&gt;
&lt;p&gt;Vela embeds its own reverse proxy built on hyper. It handles TLS termination (auto-provisioned via Let’s Encrypt ACME HTTP-01, or static certificates for Cloudflare setups), domain-based routing, WebSocket upgrades, and HTTP-to-HTTPS redirects.&lt;&#x2F;p&gt;
&lt;p&gt;The routing model is simple. A thread-safe hash map from domain to port:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;pub struct RouteTable {
    routes: Arc&amp;lt;RwLock&amp;lt;HashMap&amp;lt;String, u16&amp;gt;&amp;gt;&amp;gt;,
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;When a request arrives, Vela extracts the Host header, looks up the port, and forwards the request to &lt;code&gt;localhost:{port}&lt;&#x2F;code&gt;. When a deploy swaps traffic, it is a single write-lock on the hash map to update the port number. Atomic. No configuration reload. No proxy restart.&lt;&#x2F;p&gt;
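&lt;p&gt;To make the lookup and the swap concrete, here is a minimal sketch built around the same struct. The method names &lt;code&gt;lookup&lt;&#x2F;code&gt; and &lt;code&gt;swap&lt;&#x2F;code&gt;, the lowercasing, and the port-suffix stripping are assumptions for illustration, not Vela’s actual API.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;use std::collections::HashMap;
use std::sync::{Arc, RwLock};

#[derive(Clone, Default)]
pub struct RouteTable {
    routes: Arc&amp;lt;RwLock&amp;lt;HashMap&amp;lt;String, u16&amp;gt;&amp;gt;&amp;gt;,
}

impl RouteTable {
    &amp;#x2F;&amp;#x2F; Hypothetical: resolve a Host header value to a backend port.
    &amp;#x2F;&amp;#x2F; Strips an optional `:port` suffix and ignores case.
    &amp;#x2F;&amp;#x2F; IPv6 literals are not handled in this sketch.
    pub fn lookup(&amp;amp;self, host: &amp;amp;str) -&amp;gt; Option&amp;lt;u16&amp;gt; {
        let domain = host.split(&amp;#x27;:&amp;#x27;).next().unwrap_or(host).to_ascii_lowercase();
        let routes = self.routes.read().unwrap(); &amp;#x2F;&amp;#x2F; many readers may hold this at once
        routes.get(&amp;amp;domain).copied()
    }

    &amp;#x2F;&amp;#x2F; Hypothetical: point `domain` at `port`, returning the previous port.
    &amp;#x2F;&amp;#x2F; The write lock is held only for the duration of the insert.
    pub fn swap(&amp;amp;self, domain: &amp;amp;str, port: u16) -&amp;gt; Option&amp;lt;u16&amp;gt; {
        let mut routes = self.routes.write().unwrap();
        routes.insert(domain.to_ascii_lowercase(), port)
    }
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Because readers take the lock per request, a request that resolved its port before the swap still completes against the old instance, which is why the drain period matters. Returning the previous port from &lt;code&gt;swap&lt;&#x2F;code&gt; gives the caller a handle on which instance to drain.&lt;&#x2F;p&gt;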
&lt;p&gt;For WebSocket connections (which both Phoenix apps use for LiveView), Vela detects the &lt;code&gt;Upgrade: websocket&lt;&#x2F;code&gt; header and switches to raw TCP tunneling with bidirectional I&#x2F;O. This was important for my use case: Phoenix LiveView is WebSocket-native, and if the proxy does not handle upgrades correctly, the entire UI breaks.&lt;&#x2F;p&gt;
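&lt;p&gt;The detection step can be sketched as a pure function over the request headers. This follows the handshake rules in RFC 6455 (the &lt;code&gt;Upgrade&lt;&#x2F;code&gt; header must name &lt;code&gt;websocket&lt;&#x2F;code&gt; and the &lt;code&gt;Connection&lt;&#x2F;code&gt; header must include &lt;code&gt;upgrade&lt;&#x2F;code&gt;, both case-insensitive). Whether Vela checks both headers is an assumption, and the function names are hypothetical.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;&amp;#x2F;&amp;#x2F; True if any header called `name` carries `token` in its comma-separated
&amp;#x2F;&amp;#x2F; value. Header names and tokens are compared case-insensitively.
fn header_has_token(headers: &amp;amp;[(&amp;amp;str, &amp;amp;str)], name: &amp;amp;str, token: &amp;amp;str) -&amp;gt; bool {
    headers.iter().any(|(n, v)| {
        n.eq_ignore_ascii_case(name)
            &amp;amp;&amp;amp; v.split(&amp;#x27;,&amp;#x27;).any(|t| t.trim().eq_ignore_ascii_case(token))
    })
}

&amp;#x2F;&amp;#x2F; Hypothetical check a proxy could run before switching to TCP tunneling.
fn is_websocket_upgrade(headers: &amp;amp;[(&amp;amp;str, &amp;amp;str)]) -&amp;gt; bool {
    header_has_token(headers, &amp;quot;upgrade&amp;quot;, &amp;quot;websocket&amp;quot;)
        &amp;amp;&amp;amp; header_has_token(headers, &amp;quot;connection&amp;quot;, &amp;quot;upgrade&amp;quot;)
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Splitting on commas matters because browsers commonly send &lt;code&gt;Connection: keep-alive, Upgrade&lt;&#x2F;code&gt;, which a plain string comparison would miss.&lt;&#x2F;p&gt;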
&lt;h2 id=&quot;from-empty-box-to-production-in-a-day&quot;&gt;From empty box to production in two days&lt;&#x2F;h2&gt;
&lt;p&gt;Here is the timeline of the actual migration. I bought the Hetzner server and within about 48 hours, both apps were running in production with HTTPS, process supervision, automated backups, and daily health reports.&lt;&#x2F;p&gt;
&lt;p&gt;The sequence went roughly like this:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Hardware validation&lt;&#x2F;strong&gt;: Check NVMe drive health, run memory tests, verify RAID configuration. The drives had about 25,000 power-on hours (these are auction servers, they have been used), but SMART health passed and wear levels were well within acceptable range.&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;OS provisioning&lt;&#x2F;strong&gt;: Debian, RAID 1 across both NVMe drives. Straightforward.&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Server hardening&lt;&#x2F;strong&gt;: Firewall rules, SSH hardening (key-only auth, non-default port, rate limiting), automatic security updates, intrusion detection. This is the part I am deliberately vague about. If you are running a public-facing server, hardening is non-negotiable, but I am not going to publish my exact firewall configuration.&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Vela installation&lt;&#x2F;strong&gt;: Download the binary, create a config file, install the systemd service. Five minutes.&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;First app deployed (Cyanea)&lt;&#x2F;strong&gt;: Built the Elixir release on the server, set secrets, ran migrations, deployed. The entire build-and-deploy cycle for a Phoenix app with a Rust NIF took about fifteen minutes, most of which was compilation.&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Second app deployed (Archipelag)&lt;&#x2F;strong&gt;: Same flow, plus provisioning PostgreSQL and restoring a database dump from Fly, plus setting up NATS. About thirty minutes.&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;TLS certificates&lt;&#x2F;strong&gt;: Updated DNS records, Let’s Encrypt certificates issued automatically. Vela handles the ACME challenge internally: no Certbot, no cron job, no manual cert management.&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Monitoring&lt;&#x2F;strong&gt;: A daily health report script that checks system metrics, service status, and app health, then emails a summary. Simple but effective.&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;The most time-consuming part was not the tooling. It was migrating the PostgreSQL data from Fly and verifying that both apps behaved correctly in their new environment. The infrastructure setup itself, the part that would have taken weeks without Vela, took hours.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-broader-thesis&quot;&gt;The broader thesis&lt;&#x2F;h2&gt;
&lt;p&gt;Here is what I think is happening, and I think it is bigger than my personal infrastructure bill.&lt;&#x2F;p&gt;
&lt;p&gt;The cloud won because it sold a bundle: compute, networking, storage, deployment, monitoring, scaling, security, all integrated, all managed. The alternative was building each piece yourself, and the labor cost made that prohibitive for small teams. Managed infrastructure was cheaper than an ops engineer.&lt;&#x2F;p&gt;
&lt;p&gt;AI changes that equation. Not by making the cloud cheaper, but by making bespoke tooling economically viable.&lt;&#x2F;p&gt;
&lt;p&gt;Consider what I got with Vela. A deployment tool that does exactly what I need and nothing more. No container orchestration, because I do not use containers. No multi-region routing, because I run in one location. No autoscaling, because two apps do not need to autoscale. Every feature exists because I needed it. Every feature works with my specific stack (Elixir&#x2F;BEAM, Rust, SQLite, PostgreSQL, NATS). The tool is tailored to my workload the way a bespoke suit is tailored to a body.&lt;&#x2F;p&gt;
&lt;p&gt;This kind of custom tooling used to be a luxury. You needed either a platform team that could invest weeks of engineering time, or the rare individual who was both a skilled systems programmer and willing to spend their evenings writing deployment tools instead of building products. The economics did not make sense for a solo founder or a two-person team.&lt;&#x2F;p&gt;
&lt;p&gt;With AI, the cost of building bespoke tooling drops by an order of magnitude. Not to zero: you still need to know what you want, you still need to test and iterate, and you still need to understand the domain well enough to evaluate the output. But the gap between “I know what I need” and “I have a working implementation” shrinks from weeks to hours.&lt;&#x2F;p&gt;
&lt;p&gt;And when bespoke tooling is cheap, the cloud’s bundle becomes less compelling. You do not need the managed Kubernetes service if you can build a deployment tool that fits your exact needs. You do not need the managed database service if you can install PostgreSQL yourself and the AI helps you set up backups, monitoring, and failover. You do not need the managed TLS service if your deployment tool handles ACME natively.&lt;&#x2F;p&gt;
&lt;p&gt;What you are left paying for is compute and bandwidth. And for compute and bandwidth, bare metal is drastically cheaper than the cloud.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-api-is-bespoke&quot;&gt;The API is bespoke&lt;&#x2F;h2&gt;
&lt;p&gt;There is a subtlety here that I think is worth calling out. When people talk about the cloud’s advantages, they often point to the API-driven experience. Infrastructure as code. Declarative configuration. Programmable everything. And that is real. The cloud’s API layer is genuinely valuable.&lt;&#x2F;p&gt;
&lt;p&gt;But the API does not have to come from a cloud provider. It can come from your own tooling.&lt;&#x2F;p&gt;
&lt;p&gt;Vela gives me an API-driven experience. I declare my app’s configuration in a TOML manifest. I run a single command to deploy. I can check status, stream logs, manage secrets, trigger backups, and roll back releases, all from my laptop, all through a CLI that speaks SSH to a daemon on the server. The experience is not worse than Fly or Heroku. In some ways it is better, because the tool does exactly what I need and nothing else, and when something goes wrong, I can read the source code.&lt;&#x2F;p&gt;
&lt;p&gt;The difference is that my “API” is a Rust binary of a few thousand lines instead of a multi-billion-dollar cloud platform. And that is fine. I do not need the platform. I need the interface. AI lets me build the interface.&lt;&#x2F;p&gt;
&lt;p&gt;This is, I think, the pattern that will play out more broadly. The cloud’s value was never just compute. It was the operational layer on top of compute, the tooling that made raw hardware usable. AI makes it possible to build that operational layer yourself, tailored to your needs, at a fraction of the cost. The cloud becomes optional. The server becomes a commodity. The differentiator is the tooling, and the tooling is something AI can help you build.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-the-numbers-look-like&quot;&gt;What the numbers look like&lt;&#x2F;h2&gt;
&lt;p&gt;Let me be concrete about costs, because this is ultimately an economic argument.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Kubernetes on DigitalOcean&lt;&#x2F;strong&gt; (my original setup):&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Managed K8s cluster: ~$12&#x2F;month (control plane fee)&lt;&#x2F;li&gt;
&lt;li&gt;Worker nodes (2x smallest): ~$24&#x2F;month&lt;&#x2F;li&gt;
&lt;li&gt;Managed PostgreSQL: ~$15&#x2F;month&lt;&#x2F;li&gt;
&lt;li&gt;Load balancer: ~$12&#x2F;month&lt;&#x2F;li&gt;
&lt;li&gt;Persistent volumes, bandwidth, extras: ~$15&#x2F;month&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Total: ~$78-80&#x2F;month&lt;&#x2F;strong&gt; (and this was after I trimmed it)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;strong&gt;Fly.io&lt;&#x2F;strong&gt; (the middle ground):&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Two Phoenix apps (shared-cpu-1x, 256MB each): ~$14&#x2F;month&lt;&#x2F;li&gt;
&lt;li&gt;Managed Postgres: ~$25&#x2F;month&lt;&#x2F;li&gt;
&lt;li&gt;Managed NATS: ~$20&#x2F;month&lt;&#x2F;li&gt;
&lt;li&gt;Bandwidth, extras: ~$10&#x2F;month&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Total: ~$70&#x2F;month&lt;&#x2F;strong&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The managed services were the killer. Fly’s compute pricing is fair, but managed Postgres and managed NATS added up fast. And that was at near-zero traffic. Egress pricing on Fly is metered, so if either app had started getting real user load, the bandwidth bill alone would have pushed the total well past a hundred dollars a month.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Hetzner bare metal&lt;&#x2F;strong&gt; (current):&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Dedicated server (auction): EUR 38&#x2F;month (~$42)&lt;&#x2F;li&gt;
&lt;li&gt;That is it. PostgreSQL, NATS, TLS, everything runs on the box.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Total: ~$42&#x2F;month&lt;&#x2F;strong&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The Hetzner box is cheaper than Fly right now, and the gap only widens as usage grows. But the raw dollar comparison understates the difference. Look at what I am getting. 128 GB of RAM versus 512 MB. Multi-core CPU versus shared fractional cores. Two terabytes of NVMe storage versus a few gigs. Bandwidth that is essentially unlimited (Hetzner includes 20 TB of traffic) versus metered egress that scales with every user you add.&lt;&#x2F;p&gt;
&lt;p&gt;The capacity gap is the real story. On Fly, scaling from two apps to ten means linearly increasing costs: more VMs, more managed database instances, more bandwidth charges. On my Hetzner box, scaling from two apps to ten means… nothing. The resources are already there. I paid for them. PostgreSQL, NATS, any other service I want to run: it all fits on the same box with room to spare.&lt;&#x2F;p&gt;
&lt;p&gt;And there is no surprise bill. No bandwidth overage. No “your database exceeded the row limit” fee. No managed service add-on creep. Thirty-eight euros a month, every month, regardless of what I run on it.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;when-this-does-not-apply&quot;&gt;When this does not apply&lt;&#x2F;h2&gt;
&lt;p&gt;I would be dishonest if I pretended bare metal is the right answer for everyone. It is not.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;If you need multi-region presence&lt;&#x2F;strong&gt;, the cloud still wins. Running your own hardware on three continents is a different kind of problem. Edge computing, CDN-native architectures (which I &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;raskell.io&#x2F;articles&#x2F;edge-systems-are-the-new-backend&#x2F;&quot;&gt;wrote about previously&lt;&#x2F;a&gt;), and platforms like Fly or Cloudflare Workers are the right tools for workloads that need to be close to users worldwide.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;If you need elastic scaling&lt;&#x2F;strong&gt;, bare metal does not flex. A server has fixed resources. If your traffic spikes 10x for an hour, you cannot add capacity on demand. You can over-provision (and at these prices, generous over-provisioning is affordable), but it is not the same as true elasticity.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;If you do not understand the operational basics&lt;&#x2F;strong&gt;, bare metal will bite you. Server hardening, backup strategies, disk monitoring, security patching: these are your responsibility. The cloud abstracts them away. On bare metal, a missed security update is your problem. A full disk is your problem. A failed drive (RAID helps, but is not magic) is your problem.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;If your team is large and needs guardrails&lt;&#x2F;strong&gt;, managed platforms provide consistency and governance that bare metal does not. Kubernetes is complex, but it is complex in a standardized way. Everyone knows how to deploy to K8s. Everyone knows how to debug a pod. Your custom Vela setup is legible to exactly the people who built it.&lt;&#x2F;p&gt;
&lt;p&gt;The sweet spot for bare metal, especially AI-assisted bare metal, is small teams building products that need reliability but not scale, performance but not elasticity, control but not standardization. Solo founders. Two-person startups. Side projects that might become real businesses.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-i-learned&quot;&gt;What I learned&lt;&#x2F;h2&gt;
&lt;p&gt;The migration took about 48 hours from “I have an empty server” to “both apps are in production with HTTPS, monitoring, and automated backups.” Most of that time was data migration and validation, not infrastructure setup.&lt;&#x2F;p&gt;
&lt;p&gt;Vela is now at version 0.5.0 with a feature list I am genuinely proud of: blue-green and sequential deploys, process supervision with auto-restart, built-in reverse proxy with auto-TLS, service dependency management (Postgres and NATS), secret management, log streaming, rollbacks, remote builds, scheduled backups, deploy hooks, and machine-readable status output for monitoring integration.&lt;&#x2F;p&gt;
&lt;p&gt;I built most of it in a few focused sessions with Claude Code. That is not because the code is trivial: it is about 4,000 lines of Rust with async IPC, Unix socket communication, ACME certificate management, process lifecycle handling, and a reverse proxy with WebSocket support. It is because the problem domain is well-understood, the requirements were clear, and AI is remarkably good at turning clear requirements into working implementations.&lt;&#x2F;p&gt;
&lt;p&gt;The thing I keep coming back to: the cloud was never selling compute. It was selling convenience. And convenience used to require a company with thousands of engineers to build platforms that abstracted away the hard parts. Now, a developer with a clear idea of what they need and an AI that can write systems code can build a fit-for-purpose operational layer in a weekend.&lt;&#x2F;p&gt;
&lt;p&gt;That does not make the cloud irrelevant. It makes the cloud optional for a much larger class of workloads than it was before.&lt;&#x2F;p&gt;
&lt;p&gt;Buy a server. Build your tools. Ship your product. The infrastructure should be boring. With AI, it finally can be.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;references-and-further-reading&quot;&gt;References and further reading&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;tools-and-platforms&quot;&gt;Tools and platforms&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;vela&quot;&gt;Vela&lt;&#x2F;a&gt; - The bare-metal deployment tool built in this article&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.hetzner.com&#x2F;sb&#x2F;&quot;&gt;Hetzner Server Auction&lt;&#x2F;a&gt; - Refurbished dedicated servers at steep discounts&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;fly.io&quot;&gt;fly.io&lt;&#x2F;a&gt; - The managed platform I migrated from&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.nomadproject.io&#x2F;&quot;&gt;Nomad&lt;&#x2F;a&gt; - HashiCorp’s minimal workload orchestrator&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;opentofu.org&#x2F;&quot;&gt;OpenTofu&lt;&#x2F;a&gt; - Community fork of Terraform after the BSL relicense&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;cloudflare&#x2F;pingora&quot;&gt;Pingora&lt;&#x2F;a&gt; - Cloudflare’s Rust framework for building programmable proxies&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;hyper.rs&#x2F;&quot;&gt;hyper&lt;&#x2F;a&gt; - Rust HTTP library powering Vela’s reverse proxy&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;letsencrypt.org&#x2F;&quot;&gt;Let’s Encrypt&lt;&#x2F;a&gt; - Free TLS certificates, automated via ACME in Vela&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;nats.io&#x2F;&quot;&gt;NATS&lt;&#x2F;a&gt; - Lightweight messaging system used by Archipelag&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;frameworks-and-runtimes&quot;&gt;Frameworks and runtimes&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.phoenixframework.org&#x2F;&quot;&gt;Phoenix Framework&lt;&#x2F;a&gt; - Elixir web framework powering both apps&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.erlang.org&#x2F;&quot;&gt;Erlang&#x2F;OTP&lt;&#x2F;a&gt; - The BEAM virtual machine that runs Phoenix and Elixir&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.rust-lang.org&#x2F;&quot;&gt;Rust&lt;&#x2F;a&gt; - Systems language Vela and Zentinel are written in&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;projects-referenced&quot;&gt;Projects referenced&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;zentinelproxy.io&quot;&gt;Zentinel&lt;&#x2F;a&gt; - Security-first reverse proxy built on Pingora&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;archipelag.io&quot;&gt;Archipelag&lt;&#x2F;a&gt; - Distributed compute platform&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;cyanea.bio&quot;&gt;Cyanea&lt;&#x2F;a&gt; - Bioinformatics platform&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;anthropics&#x2F;claude-code&quot;&gt;Claude Code&lt;&#x2F;a&gt; - AI coding tool used to build Vela&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</description>
      </item>
      <item>
          <title>Edge Systems Are the New Backend</title>
          <pubDate>Wed, 11 Feb 2026 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/edge-systems-are-the-new-backend/</link>
          <guid>https://raskell.io/articles/edge-systems-are-the-new-backend/</guid>
          <description xml:base="https://raskell.io/articles/edge-systems-are-the-new-backend/">&lt;p&gt;A request arrives at your system. In the next 50 milliseconds, before any application code runs, this happens: TLS termination, route matching, WAF inspection against 285 detection rules, JWT validation, rate limit evaluation, request body validation against a JSON schema, and trace context generation. The request either dies at the edge or arrives at your backend pre-authenticated, pre-validated, and pre-authorized.&lt;&#x2F;p&gt;
&lt;p&gt;Five years ago, your backend did all of this. Every service validated its own tokens, enforced its own rate limits, ran its own security checks. Today, the backend might not even exist in the form you expect. It might be a static site served from edge nodes, a thin persistence API, or a headless CMS that publishes content to a CDN and never handles a user request directly.&lt;&#x2F;p&gt;
&lt;p&gt;Something shifted. Not just at the edge. On both ends.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-three-tier-past&quot;&gt;The three-tier past&lt;&#x2F;h2&gt;
&lt;p&gt;The architecture most of us learned was simple. Client, backend, database. The browser rendered HTML, maybe ran some jQuery. The backend did everything: authentication, authorization, business logic, rendering, validation, rate limiting, session management. The database stored state. Clean separation, one direction, easy to reason about.&lt;&#x2F;p&gt;
&lt;p&gt;This model worked because the browser was dumb. It could render markup and submit forms. Any real computation had to happen on the server. The backend was fat by necessity, not by design.&lt;&#x2F;p&gt;
&lt;p&gt;Microservices made it worse. Consider a typical setup: a user service, an order service, a payment service, a notification service, an inventory service. Each one needs to validate JWTs. Each one needs to enforce rate limits. Each one needs input validation, request logging, tracing, and error handling. That is five services times six concerns. Thirty implementations of logic that should exist exactly once.&lt;&#x2F;p&gt;
&lt;p&gt;Now multiply. Real organizations have 15, 50, 200 services. Each team implements auth slightly differently. One uses a shared library, one copied the code two years ago, one rolled their own because the library did not support their token format. The rate limiting configurations drift. The logging formats diverge. A security patch to the JWT validation logic means PRs across every repository, coordinated deployments, and someone asking “did we get all of them?”&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;                 ┌──────────┬──────────┬──────────┐
                 │ Users    │ Orders   │ Payments │
                 │ Service  │ Service  │ Service  │
                 ├──────────┼──────────┼──────────┤
  Auth           │ ✓ (v2.1) │ ✓ (v1.9) │ ✓ (v2.0) │
  Rate limiting  │ ✓ (lib)  │ ✓ (copy) │ ✗ (none) │
  Validation     │ ✓        │ ✓        │ ✓        │
  WAF&amp;#x2F;Security   │ ✗        │ ✗        │ ✗        │
  Logging        │ JSON     │ text     │ JSON     │
  Tracing        │ ✓        │ ✗        │ ✓        │
                 └──────────┴──────────┴──────────┘
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Libraries helped. Service meshes helped more. But the complexity was still distributed across every service, in every team’s codebase, in every deployment pipeline. The mesh moved networking concerns to a sidecar. It did not move application-level concerns like auth, validation, or security inspection.&lt;&#x2F;p&gt;
&lt;p&gt;The edge was an afterthought. A reverse proxy. TLS termination. Maybe Varnish for caching. Maybe a CDN for static assets. It was infrastructure plumbing, not a place where decisions happened.&lt;&#x2F;p&gt;
&lt;p&gt;That model is over.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;two-migrations-one-hollowing&quot;&gt;Two migrations, one hollowing&lt;&#x2F;h2&gt;
&lt;p&gt;Here is the thing I keep coming back to: business logic is migrating in two directions simultaneously.&lt;&#x2F;p&gt;
&lt;p&gt;Upward, to the edge. Infrastructure concerns like auth, WAF, and rate limiting now execute at the edge layer, before requests reach any backend. But it goes further than that. Edge Workers run actual application code. Containers deploy at the edge. Server-side rendering happens at edge nodes 50ms from the user, not in a data center 200ms away.&lt;&#x2F;p&gt;
&lt;p&gt;Downward, to the client. The browser is no longer dumb. WebAssembly runs near-native code. WebGPU puts the GPU to work on ML inference and image processing. Web Workers handle background computation. Service Workers intercept network requests and serve cached responses offline. CRDTs let the client own its data and sync when it feels like it.&lt;&#x2F;p&gt;
&lt;p&gt;The backend is caught in the middle. Squeezed from both sides. And what remains is not a “backend” in any traditional sense. It is a persistence layer. A place where data rests and syncs. The interesting work happens elsewhere.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-moved-to-the-edge&quot;&gt;What moved to the edge&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;infrastructure-concerns&quot;&gt;Infrastructure concerns&lt;&#x2F;h3&gt;
&lt;p&gt;The first wave was obvious. Cross-cutting concerns that every service needed are better handled once, at the point of entry.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Authentication.&lt;&#x2F;strong&gt; Validating a JWT does not require application context. The token is self-contained: a signature, an issuer, an expiry, a set of claims. Parse it, verify the signature against a JWKS endpoint, check the expiry, extract the claims, attach them as headers. Done. The backend receives &lt;code&gt;X-User-Id: alice&lt;&#x2F;code&gt; and &lt;code&gt;X-User-Role: admin&lt;&#x2F;code&gt; instead of a raw Bearer token it has to decode itself.&lt;&#x2F;p&gt;
&lt;p&gt;This is not hypothetical. Here is what this looks like in practice:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;kdl&quot; class=&quot;language-kdl &quot;&gt;&lt;code class=&quot;language-kdl&quot; data-lang=&quot;kdl&quot;&gt;agent &amp;quot;auth&amp;quot; {
    type &amp;quot;auth&amp;quot;
    grpc address=&amp;quot;http:&amp;#x2F;&amp;#x2F;localhost:50051&amp;quot;
    events &amp;quot;request_headers&amp;quot;
    timeout-ms 100
    failure-mode &amp;quot;closed&amp;quot;
    max-concurrent-calls 100
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That agent handles JWT, OIDC, SAML, mTLS, and API key validation. Every route behind it gets authentication for free. Every backend service trusts the edge to have done the work. The auth agent crashes? Failure mode is “closed”. Requests stop, but the proxy stays up.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Rate limiting.&lt;&#x2F;strong&gt; Token bucket algorithms with per-client keys. The edge layer sees every request before the backend does. It is the natural place to enforce rate limits because it can reject bad traffic before it consumes backend resources. A rejected request at the edge costs microseconds. A rejected request at the backend costs a database query, a connection slot, and whatever work happened before the check.&lt;&#x2F;p&gt;
&lt;p&gt;There are two flavors. Local rate limiting uses in-process token buckets. Fast, no network hops, but each edge node tracks its own counters. If you have 10 edge nodes and a limit of 100 requests per second, each node allows 100, so the worst-case global limit is 1,000. For most use cases, this is fine. Abuse does not distribute itself evenly across your infrastructure.&lt;&#x2F;p&gt;
&lt;p&gt;Distributed rate limiting uses a shared store (Redis, typically). Accurate across nodes, but adds a network hop per request. The tradeoff is latency versus precision. I default to local rate limiting and switch to distributed only when the use case demands exact global limits, like API billing or token budgets for LLM inference.&lt;&#x2F;p&gt;
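&lt;p&gt;To make the local flavor concrete, here is a minimal token bucket sketch in Python. It is not the proxy’s implementation. The names &lt;code&gt;TokenBucket&lt;&#x2F;code&gt;, &lt;code&gt;LocalRateLimiter&lt;&#x2F;code&gt;, and &lt;code&gt;worst_case_global_limit&lt;&#x2F;code&gt; are made up for illustration, and the clock is passed in so the behavior is easy to check.&lt;&#x2F;p&gt;

```python
import time


class TokenBucket:
    """In-process token bucket: refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate, capacity, now=None):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start full
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Spend one token and return True if the request may pass."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class LocalRateLimiter:
    """One bucket per client key, held in this node's memory only.

    Sketch only: a real limiter would also bound the size of `buckets`.
    """

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def allow(self, client_key, now=None):
        bucket = self.buckets.get(client_key)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[client_key] = bucket
        return bucket.allow(now)


def worst_case_global_limit(per_node_limit, node_count):
    """Upper bound on admitted traffic when every node counts independently."""
    return per_node_limit * node_count
```

&lt;p&gt;With 10 nodes and a per-node limit of 100, &lt;code&gt;worst_case_global_limit(100, 10)&lt;&#x2F;code&gt; returns 1000, which is the ceiling described above. A distributed limiter would replace the in-memory &lt;code&gt;buckets&lt;&#x2F;code&gt; dictionary with a shared store and pay a network hop for it.&lt;&#x2F;p&gt;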
&lt;p&gt;&lt;strong&gt;Security inspection.&lt;&#x2F;strong&gt; WAFs used to be appliances. Expensive, opaque, binary. A request was either blocked or allowed. Modern WAFs use anomaly scoring. Each rule contributes a score, and the total determines the action:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;Score 0-9:    Allow
Score 10-24:  Log (warning, investigate later)
Score 25+:    Block
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This is a fundamentally different model than binary block&#x2F;allow. It lets you tune aggressively without breaking legitimate traffic. I run 285 detection rules at the edge and process 912K requests per second on clean traffic. That is 30x faster than ModSecurity’s C implementation. The performance gap matters because it means WAF inspection can happen on every request, not just suspicious ones.&lt;&#x2F;p&gt;
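&lt;p&gt;The scoring model itself fits in a few lines. This is a sketch under assumptions: the thresholds come from the table above, while the rule names and the &lt;code&gt;total_score&lt;&#x2F;code&gt; and &lt;code&gt;decide&lt;&#x2F;code&gt; helpers are hypothetical and not taken from any real rule set.&lt;&#x2F;p&gt;

```python
ALLOW, LOG, BLOCK = "allow", "log", "block"

# Thresholds from the table above.
LOG_THRESHOLD = 10
BLOCK_THRESHOLD = 25


def total_score(matched_rules):
    """Sum the scores of every rule that matched the request."""
    return sum(score for _name, score in matched_rules)


def decide(score):
    """Map an anomaly score to an action."""
    if score >= BLOCK_THRESHOLD:
        return BLOCK
    if score >= LOG_THRESHOLD:
        return LOG
    return ALLOW


# Hypothetical matches for one request: (rule name, score).
matches = [("sqli-probe", 15), ("odd-user-agent", 5)]
action = decide(total_score(matches))
```

&lt;p&gt;Neither rule would block on its own. Together they cross the log threshold, and one more medium-severity match would tip the request into a block. Tuning means adjusting scores and thresholds, not toggling rules on and off.&lt;&#x2F;p&gt;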
&lt;p&gt;&lt;strong&gt;API validation.&lt;&#x2F;strong&gt; If your API has a JSON Schema, why validate request bodies in your application code? Validate at the edge. Reject malformed requests before they consume a connection, a goroutine, a database transaction. The backend receives only structurally valid payloads.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Observability.&lt;&#x2F;strong&gt; Trace context should originate at the edge, not at the application. The edge is where the request enters your system. It is where you assign a trace ID, start the clock, and record the first span. If you originate traces in your application, you miss everything that happened before: TLS negotiation time, WAF processing time, the fact that the request sat in a rate limit queue for 50ms. Starting traces at the edge gives you the full picture.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-isolation-problem&quot;&gt;The isolation problem&lt;&#x2F;h3&gt;
&lt;p&gt;You cannot put all of this in a monolithic proxy. That is how you end up with nginx and 47 modules where nobody understands the interaction effects. A WAF bug should not take down your routing. A slow auth provider should not block rate limit checks.&lt;&#x2F;p&gt;
&lt;p&gt;The answer is process isolation. Thin dataplane, crash-isolated external agents. Each agent runs as a separate process with its own failure domain:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;┌──────────────────────────────────────────┐
│ Edge Proxy (thin dataplane)              │
│ Routing │ TLS │ Caching │ Load Balancing │
└─────┬──────────┬──────────┬──────────────┘
      │          │          │
      ▼          ▼          ▼
   [WAF]      [Auth]    [Rate Limit]
  process     process     process
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Each agent gets its own concurrency semaphore. A slow WAF cannot starve auth. Each agent has a circuit breaker: once failures reach the configured threshold, the circuit opens and the proxy stops calling that agent until the timeout elapses. Each agent has a configurable failure mode, and this is where the design gets interesting:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;kdl&quot; class=&quot;language-kdl &quot;&gt;&lt;code class=&quot;language-kdl&quot; data-lang=&quot;kdl&quot;&gt;agent &amp;quot;waf&amp;quot; {
    type &amp;quot;waf&amp;quot;
    timeout-ms 100
    failure-mode &amp;quot;closed&amp;quot;
    max-concurrent-calls 50
    circuit-breaker {
        failure-threshold 5
        success-threshold 2
        timeout-seconds 30
    }
}

agent &amp;quot;rate-limit&amp;quot; {
    type &amp;quot;rate-limit&amp;quot;
    timeout-ms 50
    failure-mode &amp;quot;open&amp;quot;
    max-concurrent-calls 200
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The WAF fails closed. If it crashes or times out, requests are blocked. You lose availability to preserve security. Rate limiting fails open. If it crashes, requests are allowed. You lose rate enforcement to preserve availability. These are explicit choices per agent, not global defaults. The operator decides which tradeoff to make for each concern, and the decision is visible in the config, not buried in code.&lt;&#x2F;p&gt;
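&lt;p&gt;Here is roughly how I think about the breaker and the failure mode together. The parameter names mirror the config above (&lt;code&gt;failure-threshold&lt;&#x2F;code&gt;, &lt;code&gt;success-threshold&lt;&#x2F;code&gt;, &lt;code&gt;timeout-seconds&lt;&#x2F;code&gt;), but the state machine is an assumption: consecutive failures open the circuit, the timeout moves it to half-open, and enough successes close it again. Treat it as a sketch, not the proxy’s code.&lt;&#x2F;p&gt;

```python
class CircuitBreaker:
    """Closed -> open after N consecutive failures; half-open after a timeout."""

    def __init__(self, failure_threshold=5, success_threshold=2, timeout_seconds=30):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout_seconds = timeout_seconds
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.opened_at = None

    def allow_call(self, now):
        """Return True if the proxy should call the agent at time `now`."""
        if self.state == "open":
            if now - self.opened_at >= self.timeout_seconds:
                self.state = "half-open"  # let probe calls through
                self.successes = 0
                return True
            return False
        return True

    def record_success(self):
        if self.state == "half-open":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "closed"
                self.failures = 0
        else:
            self.failures = 0  # a success resets the consecutive count

    def record_failure(self, now):
        if self.state == "half-open":
            self._open(now)  # a failed probe reopens immediately
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open(now)

    def _open(self, now):
        self.state = "open"
        self.opened_at = now
        self.failures = 0


def decision_when_unavailable(failure_mode):
    """Fail closed blocks the request; fail open lets it through."""
    return "block" if failure_mode == "closed" else "allow"
```

&lt;p&gt;When &lt;code&gt;allow_call&lt;&#x2F;code&gt; returns False, the proxy never contacts the agent and falls back to &lt;code&gt;decision_when_unavailable&lt;&#x2F;code&gt;. For the WAF that means a block. For the rate limiter it means the request passes.&lt;&#x2F;p&gt;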
&lt;p&gt;Agents return decisions. The proxy merges them. A blocking decision from any agent wins. Otherwise, header mutations accumulate. The model is simple: agents advise, the proxy decides. No agent can override another agent’s block. No agent can force a request through. The proxy owns the final call.&lt;&#x2F;p&gt;
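&lt;p&gt;The merge rule is small enough to write down. A sketch, assuming each agent returns a dictionary with an &lt;code&gt;agent&lt;&#x2F;code&gt; name, an &lt;code&gt;action&lt;&#x2F;code&gt;, and optional &lt;code&gt;headers&lt;&#x2F;code&gt;. That shape and the &lt;code&gt;merge_decisions&lt;&#x2F;code&gt; name are mine, not the proxy’s wire format, and “later agent wins on a header conflict” is an assumption.&lt;&#x2F;p&gt;

```python
def merge_decisions(decisions):
    """Combine agent decisions: any block wins, otherwise header mutations accumulate."""
    # First pass: a single blocking decision settles the outcome, whatever the order.
    for decision in decisions:
        if decision["action"] == "block":
            return {"action": "block", "blocked_by": decision["agent"], "headers": {}}

    # Second pass: nobody blocked, so collect header mutations in agent order.
    headers = {}
    for decision in decisions:
        headers.update(decision.get("headers", {}))
    return {"action": "allow", "headers": headers}


allowed = merge_decisions([
    {"agent": "auth", "action": "allow", "headers": {"X-User-Id": "alice"}},
    {"agent": "rate-limit", "action": "allow", "headers": {"X-RateLimit-Remaining": "41"}},
])

blocked = merge_decisions([
    {"agent": "auth", "action": "allow", "headers": {"X-User-Id": "alice"}},
    {"agent": "waf", "action": "block"},
])
```

&lt;p&gt;Note what is missing: there is no “force allow” action. An agent can add headers or veto. It cannot overrule another agent’s veto.&lt;&#x2F;p&gt;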
&lt;p&gt;This is not a workaround. It is the fundamental design choice. Complex logic lives outside the core, behind process boundaries. The proxy stays small, fast, and boring. The agents handle the interesting work in isolation. A bug in a Lua scripting agent does not corrupt the routing table. A memory leak in the WAF agent does not exhaust the proxy’s memory. The process boundary is the blast radius.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;edge-workers-business-logic-at-the-edge&quot;&gt;Edge Workers: business logic at the edge&lt;&#x2F;h3&gt;
&lt;p&gt;Infrastructure concerns were the first wave. The second wave is actual business logic.&lt;&#x2F;p&gt;
&lt;p&gt;Cloudflare Workers, Deno Deploy, Fastly Compute, Vercel Edge Functions. These are not just “serverless at the CDN.” They are full compute environments running at edge nodes around the world. V8 isolates spin up in under 5ms. Cold starts are measured in single-digit milliseconds, not seconds. Your code runs 50ms from the user instead of 200ms away in us-east-1.&lt;&#x2F;p&gt;
&lt;p&gt;The constraints matter, because they shape what belongs here. Typical Edge Worker limits: 10-50ms CPU time per request (not wall time, actual CPU), 128MB memory, no raw TCP sockets, no persistent file system. You get a request, key-value storage, and the ability to make sub-requests to origins. That is it. These constraints are not bugs. They are what makes single-digit-millisecond cold starts possible. V8 isolates are cheap because they are small and short-lived.&lt;&#x2F;p&gt;
&lt;p&gt;What fits within these constraints is surprisingly broad:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;API routing and transformation.&lt;&#x2F;strong&gt; A request comes in for &lt;code&gt;&#x2F;api&#x2F;v2&#x2F;users&lt;&#x2F;code&gt;. The edge Worker rewrites it, fans out to two backend services (user profiles from one, preferences from another), merges the responses, and returns a single payload. The backend services are simple data sources. The edge Worker is the API layer.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;A&#x2F;B testing and feature flags.&lt;&#x2F;strong&gt; Read the experiment cookie, hash the user ID, assign a variant, route to the right origin or rewrite the response. No round trip to a feature flag service. The decision happens in microseconds at the node closest to the user.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Personalization.&lt;&#x2F;strong&gt; Look up the user’s segment in KV storage, inject the right content block, set cache headers accordingly. The backend generated all variants at build time. The edge picks the right one per request.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Server-side rendering.&lt;&#x2F;strong&gt; Render HTML at the edge node closest to the user. Frameworks like Next.js and Remix already support this. React Server Components can run at the edge. The “server” in server-side rendering is not your server. It is whichever of 300 edge locations is nearest.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Authentication and session management.&lt;&#x2F;strong&gt; Validate tokens, refresh sessions, set secure cookies. The auth flow never touches your origin. Cloudflare Workers KV or Durable Objects store session state at the edge.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
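&lt;p&gt;The A&#x2F;B case from the list above shows why this works without a feature flag service: variant assignment is a pure function of the request. A minimal sketch, assuming a sticky cookie named &lt;code&gt;exp_&lt;&#x2F;code&gt; plus the experiment name and a SHA-256 based bucket. Both are illustrative choices, not any platform’s API.&lt;&#x2F;p&gt;

```python
import hashlib

VARIANTS = ("control", "treatment")


def assign_variant(experiment, user_id, variants=VARIANTS):
    """Hash experiment + user ID into a stable bucket; same input, same variant."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % len(variants)
    return variants[bucket]


def pick_variant(cookies, experiment, user_id, variants=VARIANTS):
    """Honor a valid experiment cookie, otherwise fall back to the hash."""
    sticky = cookies.get(f"exp_{experiment}")  # hypothetical cookie name
    if sticky in variants:
        return sticky
    return assign_variant(experiment, user_id, variants)
```

&lt;p&gt;Every edge node computes the same answer for the same user with no shared state, which is exactly the property that lets this run in a Worker.&lt;&#x2F;p&gt;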
&lt;p&gt;The pattern: compute that depends on request context but not on deep application state moves to the edge. If you can do it with a request, a key-value lookup, and a response, it probably belongs here. If it needs a complex database query or a multi-step transaction, it does not.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;containers-at-the-edge&quot;&gt;Containers at the edge&lt;&#x2F;h3&gt;
&lt;p&gt;Edge Workers hit a ceiling when you need persistent connections, large memory, or long-running processes. For those workloads, containers at the edge.&lt;&#x2F;p&gt;
&lt;p&gt;Platforms like Fly.io and Railway deploy containers or full processes to regions around the world. Your application runs with real file systems, TCP connections, and whatever runtime you need. But it runs close to users, not in a centralized data center. Latency can drop from 200ms to 20ms.&lt;&#x2F;p&gt;
&lt;p&gt;The interesting problem is data gravity. Compute is easy to distribute. Data is not. If your container runs in Tokyo but your database is in Frankfurt, you have not solved the latency problem. You have moved it from the user-to-server hop to the server-to-database hop. The solutions are still maturing: read replicas at the edge (Turso, Neon), embedded databases that sync (LiteFS, libSQL), and databases built for multi-region replication (DynamoDB Global Tables, CockroachDB).&lt;&#x2F;p&gt;
&lt;p&gt;This model makes sense when compute and data can be co-located:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Regional APIs&lt;&#x2F;strong&gt; that comply with data residency requirements. Run the container and the database replica in the same region. GDPR data stays in the EU. Japanese user data stays in Japan.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Real-time applications&lt;&#x2F;strong&gt; where 200ms round trips kill the experience. Collaborative editing, multiplayer, live dashboards. A WebSocket server 20ms away feels instant. One 200ms away feels sluggish.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Stateful edge compute&lt;&#x2F;strong&gt; where you need more than a request&#x2F;response cycle. Background processing, scheduled jobs, long-running connections.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The line between “edge” and “origin” blurs. If your container runs in 30 regions and handles requests locally with a local database replica, is that an edge deployment or a distributed backend? The distinction stops mattering. What matters is that the compute and the data are close to the user.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-moved-to-the-client&quot;&gt;What moved to the client&lt;&#x2F;h2&gt;
&lt;p&gt;The other half of the migration goes downward. The browser is not the thin client it used to be.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;webassembly&quot;&gt;WebAssembly&lt;&#x2F;h3&gt;
&lt;p&gt;WASM runs at near-native speed in every modern browser. Not “fast for JavaScript.” Actually fast. Compiled from Rust, C++, Go, or any language with an LLVM backend. Sandboxed, portable, deterministic.&lt;&#x2F;p&gt;
&lt;p&gt;What this enables:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Image and video processing&lt;&#x2F;strong&gt; in the browser. No upload to a server, no round trip, no privacy concern. The pixels never leave the device.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Document parsing and transformation.&lt;&#x2F;strong&gt; PDF rendering, spreadsheet computation, file format conversion. Libraries compiled to WASM and running client-side.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Cryptographic operations.&lt;&#x2F;strong&gt; End-to-end encryption where the server never sees plaintext. Key derivation, signing, verification, all in the browser.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Compute offloading from the backend.&lt;&#x2F;strong&gt; This is the one that changes how you think about server sizing.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Full relational databases in the browser.&lt;&#x2F;strong&gt; This is the one that changes architectures.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;I build on this pattern directly. &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;cyanea.bio&quot;&gt;Cyanea&lt;&#x2F;a&gt;, a bioinformatics platform, uses WASM to offload computation from the backend to the client’s browser. Sequence analysis, structure visualization, and dataset filtering are CPU-intensive operations that traditionally require beefy server infrastructure. Instead, the computation runs right there on the researcher’s device. The backend stays thin: it stores datasets and coordinates collaboration, but the heavy lifting happens in the browser. This means I can run the platform on a modest server and still deliver real computational capability, because the “compute fleet” is the users’ own machines.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;archipelag.io&quot;&gt;Archipelag&lt;&#x2F;a&gt; takes the same idea in a different direction. It is a distributed compute platform where users contribute their browser’s idle compute power via WASM. The workloads compile to WASM modules, ship to participating browsers, execute in the sandbox, and return results. The browser is not just a client consuming a service, it is a compute node in a distributed system. The WASM sandbox is what makes this safe: untrusted code runs in a constrained environment with no access to the host file system, network, or memory beyond what is explicitly granted.&lt;&#x2F;p&gt;
&lt;p&gt;SQLite compiled to WASM (via projects like sql.js, wa-sqlite, or the official SQLite WASM build) gives the browser a real relational database. Not a key-value store. Not IndexedDB’s awkward object store API. Actual SQL with joins, indexes, transactions, and triggers. Backed by the Origin Private File System (OPFS) for persistence, it survives page reloads and browser restarts.&lt;&#x2F;p&gt;
&lt;p&gt;The implications are significant. Your application can run complex queries locally. Filter, sort, aggregate, full-text search. All instant, all offline. The server becomes a sync endpoint. It ships a database snapshot down and accepts change sets back. The client does the querying. The server does the storing.&lt;&#x2F;p&gt;
&lt;p&gt;This pattern scales down elegantly. A note-taking app with SQLite-in-WASM needs no backend API for reads. A project management tool can filter and search 10,000 tasks without a network request. A CMS authoring interface can work fully offline and sync when the author reconnects. The read path is local. The write path syncs eventually.&lt;&#x2F;p&gt;
&lt;p&gt;WASI (WebAssembly System Interface) extends this further. It gives WASM modules controlled access to file systems, clocks, and network sockets outside the browser. WASM becomes a universal runtime: the same module can run in the browser, at the edge (Cloudflare Workers load WASM modules inside their V8 isolates, and Fastly Compute is built on WASM), and on bare metal. Write once, deploy to every layer of the stack.&lt;&#x2F;p&gt;
&lt;p&gt;The pattern: anything that is CPU-bound, privacy-sensitive, or latency-sensitive is a candidate for client-side WASM. If the computation does not need server-side state, it should not round-trip to a server.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;webgpu&quot;&gt;WebGPU&lt;&#x2F;h3&gt;
&lt;p&gt;WebGPU landed in Chrome in 2023, with Firefox and Safari following later, and it changes the math on what the client can compute. This is not WebGL with a new name. WebGL exposes a graphics pipeline. WebGPU adds compute shaders. Direct, general-purpose GPU computation from JavaScript or WASM.&lt;&#x2F;p&gt;
&lt;p&gt;The immediate application is ML inference. Run a language model, an image classifier, or a recommendation engine on the user’s GPU. No server call, no API cost per token, no network latency. The model weights download once (cached by the browser) and run locally. Privacy by default, because the data never leaves the device.&lt;&#x2F;p&gt;
&lt;p&gt;This is not theoretical. Stable Diffusion generates images in the browser via WebGPU. Small language models (Phi-2, Gemma 2B, Llama 3.2 1B) run at usable speeds on consumer hardware. MediaPipe runs pose detection, face tracking, and hand gesture recognition in real time. The trajectory is clear: models get smaller through distillation and quantization, consumer GPUs get faster, and the gap between “cloud inference” and “local inference” narrows every quarter.&lt;&#x2F;p&gt;
&lt;p&gt;Both Cyanea and Archipelag use WebGPU alongside WASM. In Cyanea, WebGPU accelerates molecular visualization and large-scale dataset operations, the kind of parallel computation that bioinformatics demands but that would be prohibitively expensive to run server-side for every user session. In Archipelag, WebGPU-capable nodes can take on GPU-accelerated workloads from the compute pool, turning a user’s idle GPU into a productive resource. The combination of WASM for general compute and WebGPU for parallel workloads gives the browser a compute profile that would have required dedicated server hardware five years ago.&lt;&#x2F;p&gt;
&lt;p&gt;But inference is not the only use case. WebGPU handles any parallel computation: physics simulations for games, signal processing for audio applications, particle systems for data visualization, and large-scale matrix operations. Anything you would reach for CUDA or Metal for on native can now run in the browser. The compute budget of the client just increased by orders of magnitude.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;web-workers-and-service-workers&quot;&gt;Web Workers and Service Workers&lt;&#x2F;h3&gt;
&lt;p&gt;Web Workers give you background threads. Heavy computation does not block the UI. Parse a large file, run a simulation, index a search corpus. All off the main thread, all without janking the interface.&lt;&#x2F;p&gt;
&lt;p&gt;Service Workers sit between the browser and the network. They intercept every fetch request and decide what to do: serve from cache, go to network, do both and race them. This enables:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Offline-first applications.&lt;&#x2F;strong&gt; The app works without a network connection. Data syncs when connectivity returns.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Background sync.&lt;&#x2F;strong&gt; Queue mutations while offline, replay them when online.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Push notifications.&lt;&#x2F;strong&gt; Wake the app without the user having it open.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Intelligent caching.&lt;&#x2F;strong&gt; Cache API responses, serve stale data while revalidating, pre-fetch resources the user is likely to need.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The Service Worker is the client-side equivalent of the edge proxy. It intercepts, caches, validates, and routes. It makes the client self-sufficient.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;local-first-and-crdts&quot;&gt;Local-first and CRDTs&lt;&#x2F;h3&gt;
&lt;p&gt;Here is where it gets interesting. If the client has compute (WASM, WebGPU, Web Workers) and storage (IndexedDB, OPFS) and offline capability (Service Workers), why does it need a server at all?&lt;&#x2F;p&gt;
&lt;p&gt;CRDTs (Conflict-free Replicated Data Types) answer the consistency question. Multiple clients can edit the same data independently, offline, with no coordination. When they reconnect, their changes merge automatically without conflicts. No server-mediated locking. No “last write wins” data loss. Mathematical guarantees that concurrent edits converge to the same state.&lt;&#x2F;p&gt;
&lt;p&gt;The architecture:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;Client A (offline)     Client B (offline)
    │                      │
    ├── Local edits        ├── Local edits
    │   (CRDT ops)         │   (CRDT ops)
    │                      │
    └──────┐      ┌────────┘
           ▼      ▼
      ┌──────────────┐
      │ Sync service  │  (thin, stateless)
      │ (persistence  │
      │  + relay)     │
      └──────────────┘
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The sync service is not a backend. It stores operations and relays them between clients. It does not run business logic. It does not validate (the CRDT handles consistency). It does not transform (the merge function is built into the data type). It is a persistence layer with a WebSocket attached.&lt;&#x2F;p&gt;
&lt;p&gt;I build systems like this. The concrete model: a document is a flat &lt;code&gt;HashMap&amp;lt;EntityId, Entity&amp;gt;&lt;&#x2F;code&gt; where each entity holds CRDT-typed fields. The field types determine how concurrent edits merge:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;CRDT type&lt;&#x2F;th&gt;&lt;th&gt;Merge behavior&lt;&#x2F;th&gt;&lt;th&gt;Use case&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;LwwRegister&amp;lt;T&amp;gt;&lt;&#x2F;td&gt;&lt;td&gt;Last writer wins (by timestamp)&lt;&#x2F;td&gt;&lt;td&gt;Simple values: name, status, URL&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;GrowOnlySet&amp;lt;T&amp;gt;&lt;&#x2F;td&gt;&lt;td&gt;Union of both sides&lt;&#x2F;td&gt;&lt;td&gt;Tags, labels, immutable references&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;ObservedRemoveSet&amp;lt;T&amp;gt;&lt;&#x2F;td&gt;&lt;td&gt;Add wins over concurrent remove&lt;&#x2F;td&gt;&lt;td&gt;Collaborator lists, mutable collections&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;MaxRegister&lt;&#x2F;td&gt;&lt;td&gt;Higher value wins&lt;&#x2F;td&gt;&lt;td&gt;Version counters, progress indicators&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;MinRegister&lt;&#x2F;td&gt;&lt;td&gt;Lower value wins&lt;&#x2F;td&gt;&lt;td&gt;Earliest timestamps, priority values&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;Each field carries a hybrid logical clock (HLC) timestamp. The HLC combines physical time with a logical counter, so causality is preserved even when wall clocks drift. Two clients edit the same field at the “same” time? The HLC ordering is deterministic. Both clients converge to the same value without coordination.&lt;&#x2F;p&gt;
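&lt;p&gt;A hybrid logical clock is short enough to sketch. The version below follows the standard HLC update rules and is not lifted from my codebase. The class name, the millisecond unit, and the replica ID as a final tie-breaker in the stamp are assumptions. The tie-breaker is what makes the ordering total when two replicas produce the same time and counter.&lt;&#x2F;p&gt;

```python
class HybridLogicalClock:
    """Stamps are (wall_ms, counter, node_id) tuples and compare lexicographically."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.wall = 0
        self.counter = 0

    def now(self, physical_ms):
        """Stamp a local edit."""
        if physical_ms > self.wall:
            self.wall = physical_ms
            self.counter = 0
        else:
            # Wall clock stalled or went backwards: advance the logical part.
            self.counter += 1
        return (self.wall, self.counter, self.node_id)

    def observe(self, remote, physical_ms):
        """Fold in a stamp received from another replica, then stamp the receipt."""
        remote_wall, remote_counter, _remote_node = remote
        new_wall = max(self.wall, remote_wall, physical_ms)
        if new_wall == self.wall and new_wall == remote_wall:
            self.counter = max(self.counter, remote_counter) + 1
        elif new_wall == self.wall:
            self.counter += 1
        elif new_wall == remote_wall:
            self.counter = remote_counter + 1
        else:
            self.counter = 0
        self.wall = new_wall
        return (self.wall, self.counter, self.node_id)
```

&lt;p&gt;If replica B’s wall clock is 100ms behind replica A, B’s stamp after observing an edit from A still sorts after A’s. Drift shows up in the counter, not as a reordering of cause and effect.&lt;&#x2F;p&gt;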
&lt;p&gt;The merge function has three properties that make this work: it is associative (grouping does not matter), commutative (order does not matter), and idempotent (applying the same operation twice has no additional effect). These are not implementation details. They are the mathematical foundation that makes server-free consistency possible. You can sync operations in any order, from any number of clients, through any number of intermediate relays, and every replica converges to the same state.&lt;&#x2F;p&gt;
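&lt;p&gt;Those three properties are easy to check on small merge functions. A sketch of three of the field types from the table, assuming stamps are comparable tuples such as (time, counter, replica ID). The dictionary shapes and the &lt;code&gt;merge_entity&lt;&#x2F;code&gt; helper are illustrative, and ObservedRemoveSet is left out because it needs per-element tags.&lt;&#x2F;p&gt;

```python
def merge_lww(a, b):
    """LwwRegister: keep the entry with the larger stamp."""
    return a if a["stamp"] >= b["stamp"] else b


def merge_gset(a, b):
    """GrowOnlySet: union of both sides."""
    return a | b


# One merge function per CRDT type; MaxRegister and MinRegister are just max and min.
MERGE_BY_TYPE = {"lww": merge_lww, "gset": merge_gset, "max": max, "min": min}


def merge_entity(schema, left, right):
    """Merge two replicas of one entity, field by field, using the schema's CRDT types."""
    return {
        field: MERGE_BY_TYPE[crdt_type](left[field], right[field])
        for field, crdt_type in schema.items()
    }


schema = {"status": "lww", "tags": "gset", "version": "max"}
left = {
    "status": {"value": "draft", "stamp": (1000, 0, "a")},
    "tags": {"urgent"},
    "version": 3,
}
right = {
    "status": {"value": "review", "stamp": (1000, 0, "b")},
    "tags": {"backend"},
    "version": 2,
}
merged = merge_entity(schema, left, right)
```

&lt;p&gt;Swapping &lt;code&gt;left&lt;&#x2F;code&gt; and &lt;code&gt;right&lt;&#x2F;code&gt; gives the same &lt;code&gt;merged&lt;&#x2F;code&gt; value, and merging &lt;code&gt;merged&lt;&#x2F;code&gt; with either input changes nothing. That is commutativity and idempotence doing the work a coordinating server would otherwise do.&lt;&#x2F;p&gt;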
&lt;p&gt;The client owns its data. The server is optional. When the server exists, it persists operations and relays them. It does not arbitrate, transform, or validate beyond authentication.&lt;&#x2F;p&gt;
&lt;p&gt;This is not a niche pattern for collaborative text editors. Any application where users create and modify data can benefit. Notes, task managers, project planning tools, CMS authoring, form builders. The question is not “should this be local-first?” The question is “does this need a server, and if so, for what?”&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-the-backend-becomes&quot;&gt;What the backend becomes&lt;&#x2F;h2&gt;
&lt;p&gt;If the edge handles infrastructure concerns and business logic that depends on request context, and the client handles computation, rendering, and local state, what is left for the backend?&lt;&#x2F;p&gt;
&lt;p&gt;A persistence layer.&lt;&#x2F;p&gt;
&lt;p&gt;The backend becomes the place where data rests between sessions and syncs between devices. Not an application server. A persistence layer.&lt;&#x2F;p&gt;
&lt;p&gt;Consider the spectrum of what “backend” looks like now:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Static sites.&lt;&#x2F;strong&gt; This site is an example. raskell.io is built with Zola. Markdown files compile to HTML at build time and deploy to edge CDN nodes. No application server. No database. No runtime process. The “backend” is a git repository and a CI pipeline. Content lives as files. Serving happens at the edge. The total monthly infrastructure cost is the price of a domain name.&lt;&#x2F;p&gt;
&lt;p&gt;This is not limited to blogs. Documentation sites, marketing pages, product landing pages, e-commerce storefronts with pre-rendered product pages. Any content that changes at author-time rather than request-time can be static. The headless CMS (Contentful, Sanity, Strapi, or just a git repo) publishes content. The static site generator builds HTML. The CDN serves it. The “backend” runs at build time, not at request time.&lt;&#x2F;p&gt;
&lt;p&gt;I take this further than most. All of my projects are CDN-first, even the ones with dedicated backends. The principle: if the backend goes down, the user should still see something useful. The static layer is the safety net.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;kurumi&quot;&gt;Kurumi&lt;&#x2F;a&gt;, a local-first second brain app, is the purest expression of this. It is a Progressive Web App served entirely from CDN edge nodes with Service Workers handling offline capability. There is no backend server. Notes sync between devices through CRDTs when connectivity exists, but the app works fully offline. The entire “infrastructure” is a static deployment and an optional sync relay.&lt;&#x2F;p&gt;
&lt;p&gt;But the CDN-first pattern also applies to applications that have real backends. Cyanea has a Phoenix&#x2F;Elixir backend that manages datasets, user accounts, and collaboration. But the public-facing surface, the landing pages, category pages, trending spaces and protocols and datasets, is statically generated. The backend exports JSON snapshots of its database objects on a timed interval. A static site generator picks up those snapshots and rebuilds the public pages: what labs are active, which protocols are trending, which datasets were recently published. The result is a set of HTML pages sitting on a CDN that stay current without depending on the backend being up at the moment a visitor arrives.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;┌──────────────────┐     JSON export      ┌──────────────┐
│  Cyanea Backend  │ ──── (interval) ────&amp;gt; │  Static Site  │
│  (Phoenix&amp;#x2F;BEAM)  │                       │  Generator    │
│                  │                       │  (Zola)       │
│  - datasets      │                       │               │
│  - protocols     │                       │  → CDN edge   │
│  - labs          │                       │    nodes      │
│  - spaces        │                       │               │
└──────────────────┘                       └──────────────┘
         │                                        │
         │ dynamic app                    static pages
         ▼                                        ▼
   app.cyanea.bio                          cyanea.bio
   (logged-in users,                  (public, always up,
    real-time features)                fast, no backend
                                       dependency)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This is a genuinely resilient architecture. The backend can be down for maintenance, mid-deploy, or experiencing load, and the public site keeps serving. The static pages are never stale by more than one generation interval. For a site where “trending this week” is sufficient freshness, that interval can be hours. The CDN handles traffic spikes that would overwhelm a backend. The backend handles the dynamic work that requires real-time data.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Thin persistence APIs.&lt;&#x2F;strong&gt; For applications with dynamic data, the backend shrinks to a database with an API in front of it. Accept writes, serve reads, enforce schema constraints. GraphQL or REST over Postgres. No rendering. No business logic beyond data integrity. The API exists so that clients and edge workers have somewhere to store and retrieve state.&lt;&#x2F;p&gt;
&lt;p&gt;The interesting shift: even the persistence API is getting thinner. Services like Supabase, PlanetScale, and Turso expose the database directly over HTTP or WebSockets with built-in auth. Your “backend” becomes a hosted database with row-level security policies. No application code at all.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Sync relays.&lt;&#x2F;strong&gt; For local-first applications, the backend is even simpler. Accept CRDT operations from clients, persist them to durable storage, fan them out to other connected clients via WebSocket. No merge logic (the CRDT handles that). No transformation. No validation beyond authentication. The relay does not understand the data. It stores and forwards.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Event logs.&lt;&#x2F;strong&gt; Append-only storage. Clients sync by replaying events from their last known position. The log is the source of truth. Everything else (search indexes, analytics dashboards, recommendation models) is a materialized view built asynchronously. The hot path is the append. The read path is the replay. Both are simple.&lt;&#x2F;p&gt;
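&lt;p&gt;The append and replay paths really are this simple. A sketch, assuming an in-memory list stands in for durable storage and a client’s position is the number of events it has already seen. The event shape and the &lt;code&gt;task_counts&lt;&#x2F;code&gt; view are hypothetical.&lt;&#x2F;p&gt;

```python
class EventLog:
    """Append-only log; a position is the count of events a client has seen."""

    def __init__(self):
        self._events = []

    def append(self, event):
        """Hot path: add one event and return the new log position."""
        self._events.append(event)
        return len(self._events)

    def replay(self, after=0):
        """Read path: return (events past `after`, new position)."""
        return list(self._events[after:]), len(self._events)


def task_counts(events, view=None):
    """Materialized view: open tasks per project, built only from replayed events."""
    view = dict(view or {})
    for event in events:
        if event["type"] == "task_opened":
            view[event["project"]] = view.get(event["project"], 0) + 1
        elif event["type"] == "task_closed":
            view[event["project"]] = view.get(event["project"], 0) - 1
    return view
```

&lt;p&gt;A client that was offline calls &lt;code&gt;replay&lt;&#x2F;code&gt; with its last position and folds the returned events into its existing view. The view can be thrown away and rebuilt from position zero at any time, which is what makes the log the source of truth.&lt;&#x2F;p&gt;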
&lt;p&gt;&lt;strong&gt;Batch processors.&lt;&#x2F;strong&gt; The one place where traditional backend compute survives: jobs that require access to the full dataset. Analytics aggregation, report generation, search index building, ML model training. These run on schedules or triggers, not in the request path. They read from the event log or the database, compute, and write results back. The user never waits for them.&lt;&#x2F;p&gt;
&lt;p&gt;The common thread: the backend does not touch the hot path. User requests hit the edge and the client. The backend runs in the background, on its own schedule, when no one is waiting.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-architecture-that-makes-this-work&quot;&gt;The architecture that makes this work&lt;&#x2F;h2&gt;
&lt;p&gt;Pushing logic to the edge and the client is not free. Both environments have constraints, and ignoring them is how you build fragile systems.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;at-the-edge-bounded-resources&quot;&gt;At the edge: bounded resources&lt;&#x2F;h3&gt;
&lt;p&gt;Every operation at the edge needs explicit limits. No open-ended computations, no unbounded queues, no surprise behavior. This is not just good practice. It is existential. The edge proxy sits between the internet and your infrastructure. If it behaves unpredictably, everything behind it suffers.&lt;&#x2F;p&gt;
&lt;p&gt;Concretely:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Resource&lt;&#x2F;th&gt;&lt;th&gt;Bound&lt;&#x2F;th&gt;&lt;th&gt;Why&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Agent concurrency&lt;&#x2F;td&gt;&lt;td&gt;Per-agent semaphore (default: 100)&lt;&#x2F;td&gt;&lt;td&gt;Prevents one noisy agent from starving the others&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Agent timeout&lt;&#x2F;td&gt;&lt;td&gt;100ms default&lt;&#x2F;td&gt;&lt;td&gt;Prevents latency cascade&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Connection pool&lt;&#x2F;td&gt;&lt;td&gt;Explicit max (default: 10K)&lt;&#x2F;td&gt;&lt;td&gt;Prevents file descriptor exhaustion&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Request body&lt;&#x2F;td&gt;&lt;td&gt;Streaming, not buffered&lt;&#x2F;td&gt;&lt;td&gt;Prevents memory exhaustion&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Route cache&lt;&#x2F;td&gt;&lt;td&gt;LRU with size limit&lt;&#x2F;td&gt;&lt;td&gt;Prevents unbounded growth&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Rate limit queues&lt;&#x2F;td&gt;&lt;td&gt;Bounded with max delay&lt;&#x2F;td&gt;&lt;td&gt;Prevents request pile-up&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;If you cannot articulate the bound for every resource your edge system uses, you do not have an architecture. You have an accident waiting for load.&lt;&#x2F;p&gt;
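&lt;p&gt;To make one row of the table concrete, here is a rough sketch of an LRU cache with a hard entry limit. It is an illustration under assumptions, not how any particular proxy implements its route cache; &lt;code&gt;BoundedLRU&lt;&#x2F;code&gt; and &lt;code&gt;max_entries&lt;&#x2F;code&gt; are hypothetical names.&lt;&#x2F;p&gt;

```python
from collections import OrderedDict


class BoundedLRU:
    """LRU cache with an explicit size limit (hypothetical sketch)."""

    def __init__(self, max_entries: int):
        # The bound is a required argument: there is no unbounded mode.
        if not max_entries >= 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        # Evict least recently used entries until the bound holds again.
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)

    def __len__(self):
        return len(self._items)
```

&lt;p&gt;The design choice worth copying is that the limit is part of the constructor. Memory use is decided at configuration time, not discovered under load.&lt;&#x2F;p&gt;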
&lt;h3 id=&quot;on-the-client-isolation-and-sandboxing&quot;&gt;On the client: isolation and sandboxing&lt;&#x2F;h3&gt;
&lt;p&gt;The client has different constraints. Battery life, memory pressure, the user closing the tab at any moment.&lt;&#x2F;p&gt;
&lt;p&gt;WASM runs in a sandbox. No file system access, no network access, no shared memory (unless explicitly granted). This is the security model that makes client-side compute viable. Untrusted code (your own, running on someone else’s device) cannot escape the sandbox.&lt;&#x2F;p&gt;
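&lt;p&gt;Web Workers run in separate threads and communicate by message-passing. By default there is no shared mutable state, so no locks and no data races. Shared memory exists only if you opt in with &lt;code&gt;SharedArrayBuffer&lt;&#x2F;code&gt;. The isolation is enforced by the runtime, not by programmer discipline.&lt;&#x2F;p&gt;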
&lt;p&gt;Service Workers have a lifecycle managed by the browser. They can be terminated at any time to save resources. Your offline logic must handle graceful shutdown. This means: durable state in IndexedDB, idempotent sync operations, no in-memory state that cannot be reconstructed.&lt;&#x2F;p&gt;
&lt;p&gt;CRDTs provide consistency guarantees without coordination. But they are not magic. They consume memory (tombstones for deleted items, version vectors for causal ordering). They need garbage collection. They need careful schema design because not every data model maps cleanly to CRDT primitives. A counter works. A last-writer-wins register works. A rich text document with formatting, comments, and embedded media requires careful thought.&lt;&#x2F;p&gt;
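&lt;p&gt;The two primitives that “work” are small enough to sketch. This is a textbook-style illustration, not the API of any CRDT library; the class names, the integer timestamps, and the tie-break on replica id are assumptions made for the example.&lt;&#x2F;p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one slot per replica, merged by per-slot maximum."""
    counts: dict = field(default_factory=dict)

    def increment(self, replica: str, amount: int = 1) -> None:
        self.counts[replica] = self.counts.get(replica, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> "GCounter":
        # Taking the maximum per replica makes merge commutative and idempotent.
        replicas = set(self.counts) | set(other.counts)
        return GCounter(
            {r: max(self.counts.get(r, 0), other.counts.get(r, 0)) for r in replicas}
        )


@dataclass(frozen=True)
class LWWRegister:
    """Last-writer-wins register; equal timestamps break ties on replica id."""
    value: str
    timestamp: int
    replica: str

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        mine = (self.timestamp, self.replica)
        theirs = (other.timestamp, other.replica)
        return self if mine >= theirs else other
```

&lt;p&gt;Both merges give the same result regardless of the order in which replicas exchange state, which is the whole point. Note that the counter keeps a slot per replica forever. That is the memory cost mentioned above, in its simplest form.&lt;&#x2F;p&gt;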
&lt;h3 id=&quot;the-trust-boundary&quot;&gt;The trust boundary&lt;&#x2F;h3&gt;
&lt;p&gt;Here is the part most edge-computing articles skip: trust.&lt;&#x2F;p&gt;
&lt;p&gt;If the edge handles auth, the backend trusts the edge to have done auth correctly. If the client handles business logic, the server trusts the client to have computed correctly. These are real trust boundaries with real failure modes.&lt;&#x2F;p&gt;
&lt;p&gt;At the edge, trust is earned through:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Failure isolation.&lt;&#x2F;strong&gt; Agent crashes do not take down the proxy. Bad config is validated before activation.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Observability.&lt;&#x2F;strong&gt; Every decision is logged, metered, and traceable. If the WAF blocked a request, you can see exactly which rules fired and why.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Bounded behavior.&lt;&#x2F;strong&gt; No surprise modes. Every resource has explicit limits. Every failure mode is configured, not assumed.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;On the client, trust is conditional:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Never trust the client for security decisions.&lt;&#x2F;strong&gt; Validate at the edge or the backend. Client-side checks are UX, not security.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Trust the client for its own data.&lt;&#x2F;strong&gt; If the user is editing their own document, the client is authoritative. CRDTs handle consistency. The server persists, it does not arbitrate.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Verify at the boundary.&lt;&#x2F;strong&gt; When client data syncs to the server, validate schema and authorization. Trust the merge, verify the input.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
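&lt;p&gt;“Trust the merge, verify the input” can be sketched as a single gate in front of the persistence step. Everything here is hypothetical: the field names, the &lt;code&gt;owners&lt;&#x2F;code&gt; lookup, and the idea that the payload is an opaque byte string the server never interprets.&lt;&#x2F;p&gt;

```python
ALLOWED_FIELDS = {"doc_id", "author", "payload"}


def accept_sync(op: dict, session_user: str, owners: dict) -> bool:
    """Return True only if the operation is well-formed and authorized."""
    # Schema: exactly the expected fields, with the expected types.
    if set(op) != ALLOWED_FIELDS:
        return False
    if not all(isinstance(op[name], str) for name in ("doc_id", "author")):
        return False
    if not isinstance(op["payload"], bytes):
        return False
    # Authorization: the claimed author must be the authenticated user,
    # and that user must own the document being written.
    if op["author"] != session_user:
        return False
    return owners.get(op["doc_id"]) == session_user
```

&lt;p&gt;The payload itself is never inspected. The server checks who is writing and where, then stores the bytes and lets the CRDT on each client do the merging.&lt;&#x2F;p&gt;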
&lt;h2 id=&quot;when-not-to-do-this&quot;&gt;When not to do this&lt;&#x2F;h2&gt;
&lt;p&gt;Not everything belongs at the edge or on the client. Here is what stays in the backend:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Multi-service transactions.&lt;&#x2F;strong&gt; If an operation needs to read from three databases, check inventory, charge a payment, and send a notification, that is a backend workflow. Distributed transactions need coordination, and coordination needs a central authority.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Heavy data joins.&lt;&#x2F;strong&gt; If your query joins six tables with complex filters and aggregations, it runs next to the database, not at an edge node 200ms away from the data.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Regulatory requirements.&lt;&#x2F;strong&gt; Some industries mandate that data processing happens in specific locations, on specific infrastructure, with specific audit trails. Edge deployment may not satisfy these constraints.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Small teams with simple needs.&lt;&#x2F;strong&gt; If you have one backend, ten users, and no latency problems, this architecture is overhead. A Django app behind nginx is fine. Optimize when you have a reason to optimize, not before.&lt;&#x2F;p&gt;
&lt;p&gt;The edge handles cross-cutting concerns and request-context computation. The client handles local state and user-facing compute. The backend handles coordination, persistence, and anything that needs the full dataset. Know which is which.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;where-this-is-going&quot;&gt;Where this is going&lt;&#x2F;h2&gt;
&lt;p&gt;Five years ago, the stack was: browser (thin) renders server-generated HTML, backend (fat) runs everything, database stores state. The mental model was request&#x2F;response, and the backend was the center of gravity.&lt;&#x2F;p&gt;
&lt;p&gt;The stack now:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;┌──────────────────────────────────────────────────────┐
│ Client                                               │
│ WASM │ WebGPU │ Web Workers │ Service Workers │ CRDT │
│ (compute, render, offline, local state)              │
└────────────────────┬─────────────────────────────────┘
                     │
┌────────────────────┴─────────────────────────────────┐
│ Edge                                                 │
│ Proxy │ Workers │ Containers │ KV │ Durable Objects  │
│ (auth, WAF, routing, SSR, API aggregation, policy)   │
└────────────────────┬─────────────────────────────────┘
                     │
┌────────────────────┴─────────────────────────────────┐
│ Backend                                              │
│ Database │ Sync relay │ Event log │ Batch processing │
│ (persistence, coordination, async compute)           │
└──────────────────────────────────────────────────────┘
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The client is fat. The edge is fat. The backend is thin. The center of gravity moved to both ends simultaneously.&lt;&#x2F;p&gt;
&lt;p&gt;Every year, this accelerates. Models get smaller and run on consumer GPUs. WASM runtimes get faster and gain more system APIs through WASI. Edge platforms add durable storage, queues, and cron triggers. CRDTs mature from academic curiosities to production libraries. SQLite-in-the-browser goes from experiment to default architecture for offline-capable apps.&lt;&#x2F;p&gt;
&lt;p&gt;The backend will not disappear. Data needs to live somewhere durable, and cross-device sync needs a relay. Coordination problems need a central authority. Batch processing needs access to the full dataset. But the backend’s role is narrowing to exactly these things. It is becoming infrastructure, not application. Plumbing, not logic.&lt;&#x2F;p&gt;
&lt;p&gt;I find myself building systems where the most interesting engineering happens at the boundaries. A reverse proxy (&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;zentinelproxy.io&quot;&gt;Zentinel&lt;&#x2F;a&gt;) that inspects 912K requests per second through 285 WAF rules, authenticates with sub-millisecond latency, and routes with crash-isolated agents. A bioinformatics platform (&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;cyanea.bio&quot;&gt;Cyanea&lt;&#x2F;a&gt;) where the browser runs the computation and the backend exports JSON for statically generated pages. A distributed compute platform (&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;archipelag.io&quot;&gt;Archipelag&lt;&#x2F;a&gt;) where users’ browsers are the compute fleet via WASM and WebGPU. A note-taking app (&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;kurumi&quot;&gt;Kurumi&lt;&#x2F;a&gt;) that works fully offline with CRDTs and never touches a server for reads. Between all of them, a database, a sync relay, or just a CDN. Necessary and boring.&lt;&#x2F;p&gt;
&lt;p&gt;The backend is not dead. It is just not where the interesting work happens anymore.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;references-and-further-reading&quot;&gt;References and further reading&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;edge-platforms-and-proxies&quot;&gt;Edge platforms and proxies&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;workers.cloudflare.com&#x2F;&quot;&gt;Cloudflare Workers&lt;&#x2F;a&gt; - V8 isolate-based edge compute&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;deno.com&#x2F;deploy&quot;&gt;Deno Deploy&lt;&#x2F;a&gt; - Edge runtime built on the Deno JavaScript runtime&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.fastly.com&#x2F;products&#x2F;edge-compute&quot;&gt;Fastly Compute&lt;&#x2F;a&gt; - Wasm-based edge compute platform&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;vercel.com&#x2F;docs&#x2F;functions&#x2F;edge-functions&quot;&gt;Vercel Edge Functions&lt;&#x2F;a&gt; - Edge compute integrated with Next.js&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;fly.io&quot;&gt;fly.io&lt;&#x2F;a&gt; - Container-based edge deployment platform&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;cloudflare&#x2F;pingora&quot;&gt;Pingora&lt;&#x2F;a&gt; - Cloudflare’s Rust framework for programmable proxies&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;zentinelproxy.io&quot;&gt;Zentinel&lt;&#x2F;a&gt; - Security-first reverse proxy with crash-isolated agents&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;client-side-compute&quot;&gt;Client-side compute&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;webassembly.org&#x2F;&quot;&gt;WebAssembly&lt;&#x2F;a&gt; - Portable binary instruction format for the web&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;wasi.dev&#x2F;&quot;&gt;WASI&lt;&#x2F;a&gt; - WebAssembly System Interface for running Wasm outside the browser&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.w3.org&#x2F;TR&#x2F;webgpu&#x2F;&quot;&gt;WebGPU specification&lt;&#x2F;a&gt; - W3C standard for GPU compute in the browser&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;developers.google.com&#x2F;mediapipe&quot;&gt;MediaPipe&lt;&#x2F;a&gt; - ML inference framework running client-side via Wasm and WebGPU&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;sqlite.org&#x2F;wasm&quot;&gt;SQLite Wasm&lt;&#x2F;a&gt; - Official SQLite build targeting WebAssembly&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;sql.js.org&#x2F;&quot;&gt;sql.js&lt;&#x2F;a&gt; - SQLite compiled to Wasm via Emscripten&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;developer.mozilla.org&#x2F;en-US&#x2F;docs&#x2F;Web&#x2F;API&#x2F;File_System_API&#x2F;Origin_private_file_system&quot;&gt;Origin Private File System&lt;&#x2F;a&gt; - MDN reference for persistent browser storage&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;developer.mozilla.org&#x2F;en-US&#x2F;docs&#x2F;Web&#x2F;API&#x2F;Service_Worker_API&quot;&gt;Service Worker API&lt;&#x2F;a&gt; - MDN reference for offline-capable web apps&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;developer.mozilla.org&#x2F;en-US&#x2F;docs&#x2F;Web&#x2F;API&#x2F;Web_Workers_API&quot;&gt;Web Workers API&lt;&#x2F;a&gt; - MDN reference for background threads in the browser&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;crdts-and-local-first&quot;&gt;CRDTs and local-first&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=x7drE24geUw&quot;&gt;CRDTs: The Hard Parts&lt;&#x2F;a&gt; - Martin Kleppmann’s talk on practical CRDT challenges&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.inkandswitch.com&#x2F;local-first&#x2F;&quot;&gt;Local-first software&lt;&#x2F;a&gt; - Ink and Switch research paper on local-first architectures&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;automerge.org&#x2F;&quot;&gt;Automerge&lt;&#x2F;a&gt; - CRDT library for collaborative applications&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;yjs.dev&#x2F;&quot;&gt;Yjs&lt;&#x2F;a&gt; - High-performance CRDT framework for the web&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;hal.inria.fr&#x2F;inria-00555588&#x2F;document&quot;&gt;A comprehensive study of CRDTs&lt;&#x2F;a&gt; - Shapiro et al., the foundational survey paper&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;databases-at-the-edge&quot;&gt;Databases at the edge&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;turso.tech&#x2F;&quot;&gt;Turso&lt;&#x2F;a&gt; - SQLite-compatible database with edge replicas&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;neon.tech&#x2F;&quot;&gt;Neon&lt;&#x2F;a&gt; - Serverless PostgreSQL with branching&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;superfly&#x2F;litefs&quot;&gt;LiteFS&lt;&#x2F;a&gt; - Distributed SQLite replication by Fly.io&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.cockroachlabs.com&#x2F;&quot;&gt;CockroachDB&lt;&#x2F;a&gt; - Distributed SQL database designed for multi-region&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;supabase.com&#x2F;&quot;&gt;Supabase&lt;&#x2F;a&gt; - Open-source Firebase alternative built on PostgreSQL&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;projects-referenced&quot;&gt;Projects referenced&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;cyanea.bio&quot;&gt;Cyanea&lt;&#x2F;a&gt; - Bioinformatics platform using Wasm and WebGPU for client-side compute&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;archipelag.io&quot;&gt;Archipelag&lt;&#x2F;a&gt; - Distributed compute platform with browser-based Wasm nodes&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;kurumi&quot;&gt;Kurumi&lt;&#x2F;a&gt; - Local-first second brain app with CRDT sync&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;conflux&quot;&gt;Conflux&lt;&#x2F;a&gt; - Schema-aware CRDT engine for deterministic merge&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</description>
      </item>
      <item>
          <title>Looking back on 2025</title>
          <pubDate>Wed, 31 Dec 2025 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/looking-back-on-2025/</link>
          <guid>https://raskell.io/articles/looking-back-on-2025/</guid>
          <description xml:base="https://raskell.io/articles/looking-back-on-2025/">&lt;p&gt;I spent part of this year on the shores of Okinawa. The water there is something else entirely, this impossible azure that shifts to turquoise in the shallows, so clear you can see the coral formations from the surface. I found myself thinking about systems while I was there, the way you do when you’re floating in salt water with nothing pressing to attend to.&lt;&#x2F;p&gt;
&lt;p&gt;Between swims, I read Tim Berners-Lee’s “This is for everyone.” I’ve been building web software for over a decade now, and I thought I understood what the web was. But reading TBL’s words while watching that reef ecosystem do its thing, thousands of species in constant exchange, no central coordinator, just emergent complexity from simple rules, something shifted in how I saw it all.&lt;&#x2F;p&gt;
&lt;p&gt;The web TBL imagined was supposed to work like that reef. A commons. Many small nodes, each doing their own thing, connected through open protocols. Information flowing freely. The beauty of it wasn’t in any single node but in the connections between them, the way the whole became more than the sum of its parts. The same principle that makes a reef resilient makes a network powerful: diversity, redundancy, local adaptation.&lt;&#x2F;p&gt;
&lt;p&gt;What we built instead looks more like industrial aquaculture. Five platforms. Algorithmic monoculture. Content optimized for engagement metrics rather than usefulness. We took a system designed for decentralization and built the most centralized information infrastructure in human history.&lt;&#x2F;p&gt;
&lt;p&gt;I keep thinking about how that happened. The web itself never changed. HTTP still works the same way, HTML still does what it always did. What changed was the economics. Publishing became free, but being &lt;em&gt;found&lt;&#x2F;em&gt; became expensive. The platforms positioned themselves as the gatekeepers of attention, and suddenly you couldn’t reach people without paying the toll, whether in ad spend, in algorithmic compliance, or in the slow erosion that comes from doing whatever it takes to game SEO.&lt;&#x2F;p&gt;
&lt;p&gt;The thing about monocultures is they’re efficient right up until they’re not. A reef can lose a species and adapt. A monoculture gets one disease and collapses. We’ve been watching the web’s monoculture show stress fractures for years. The enshittification of platforms, the SEO content farms drowning out signal with noise, the way social media stopped being social and started being a feed of engagement-optimized content from strangers.&lt;&#x2F;p&gt;
&lt;p&gt;Then 2025 happened, and AI started breaking things in interesting ways.&lt;&#x2F;p&gt;
&lt;p&gt;The obvious take is that AI makes the content problem worse. And superficially, that’s true. If you thought SEO spam was bad before, wait until you see what happens when generating ten thousand pages of plausible-sounding garbage costs essentially nothing. The content farms went into overdrive. Social platforms filled with synthetic engagement.&lt;&#x2F;p&gt;
&lt;p&gt;But here’s the thing I keep coming back to: maybe that’s the fever that breaks the infection.&lt;&#x2F;p&gt;
&lt;p&gt;The old economics of the web depended on a particular scarcity. Human attention is finite, and the platforms controlled access to it. You wanted eyeballs, you played their game. SEO worked because Google was the gateway and you could optimize for what Google wanted. Platform distribution mattered because that’s where the people were.&lt;&#x2F;p&gt;
&lt;p&gt;AI disrupts this in ways that I think are genuinely interesting. When an AI assistant can synthesize information from across the web and deliver it directly to the user, the value of ranking first on Google diminishes. Why click through to a content farm when the answer is already in front of you? When AI agents can find and surface relevant content directly, you don’t need to be on the platform where the eyeballs gather. The middleman’s leverage starts to evaporate.&lt;&#x2F;p&gt;
&lt;p&gt;And crucially: when everyone can generate infinite content at zero marginal cost, content quantity becomes worthless. What matters is provenance. Accuracy. Usefulness. The things that are actually hard. The things that require a human perspective, or at least require &lt;em&gt;being right&lt;&#x2F;em&gt; in ways that matter.&lt;&#x2F;p&gt;
&lt;p&gt;I find myself unexpectedly optimistic about what comes next.&lt;&#x2F;p&gt;
&lt;p&gt;If AI breaks the distribution stranglehold that platforms have, the economics of the web could flip in interesting directions. The old model needed scale because reaching people was expensive. But if AI handles discovery, finding relevant content and bringing it to users, then maybe you don’t need scale anymore. Maybe small becomes viable again.&lt;&#x2F;p&gt;
&lt;p&gt;Think about what this means concretely. A static site costs nearly nothing to run. No databases to scale, no servers to babysit, just files sitting on edge nodes around the world. If you don’t need to capture user data for ad-driven personalization, you don’t need the complexity of the surveillance stack. If you don’t need platform distribution, you don’t need to play platform games.&lt;&#x2F;p&gt;
&lt;p&gt;There’s another piece to this that I think most people are missing: edge computing changes what personalization can mean. The conventional wisdom is that personalization requires surveillance, that you need to know everything about a user to show them relevant content. But that’s only true if personalization happens in a centralized database somewhere. If personalization happens at the edge, at the moment of request, you can adapt content to context without ever needing to know who the user is. The edge function doesn’t need a profile. It just needs to know what was asked for and what context it’s being asked in.&lt;&#x2F;p&gt;
&lt;p&gt;This is the architecture I keep thinking about: static content at the origin, edge functions that adapt it anonymously, AI agents that find and surface it based on actual relevance rather than SEO gaming. No surveillance required. No platform dependency. No scaling costs that force you into growth-at-all-costs mode.&lt;&#x2F;p&gt;
&lt;p&gt;It looks more like a reef than a fish farm.&lt;&#x2F;p&gt;
&lt;p&gt;I don’t want to oversell this. The transition, if it happens, won’t be clean. The platforms aren’t going to quietly cede control. The incentives that built the current web are still operating. And AI itself could go in directions that make things worse rather than better. There are plenty of dystopian paths from here.&lt;&#x2F;p&gt;
&lt;p&gt;But when I think about what I want to build toward, it’s that reef model. Many small, specialized nodes. Interconnected through open protocols. Resilient because distributed. Sustainable because the economics work at small scale.&lt;&#x2F;p&gt;
&lt;p&gt;This site is part of that bet. Static content, no tracking, no platform dependencies. The tools I’m working on (Zentinel, Sango, Ushio) are all about making edge infrastructure more accessible, making it easier to build and operate systems that are distributed and independent.&lt;&#x2F;p&gt;
&lt;p&gt;2025 was the year AI started breaking the old model. I don’t know exactly what grows in its place. But floating in that Okinawan water, watching the reef do what reefs do, I got a sense of what healthy systems look like. Diverse. Interconnected. Resilient. Not optimized for any single metric, but somehow working anyway.&lt;&#x2F;p&gt;
&lt;p&gt;That’s what I’m betting on.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;references-and-further-reading&quot;&gt;References and further reading&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.w3.org&#x2F;People&#x2F;Berners-Lee&#x2F;&quot;&gt;Tim Berners-Lee&lt;&#x2F;a&gt; - Creator of the World Wide Web and author of “This is for everyone”&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Enshittification&quot;&gt;Enshittification&lt;&#x2F;a&gt; - Cory Doctorow’s term for the pattern of platform decay&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;indieweb.org&#x2F;&quot;&gt;IndieWeb&lt;&#x2F;a&gt; - Community building the independent web with open standards&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.getzola.org&#x2F;&quot;&gt;Zola&lt;&#x2F;a&gt; - Static site generator used to build this site&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;zentinelproxy.io&quot;&gt;Zentinel&lt;&#x2F;a&gt; - Security-first reverse proxy for the open web&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;sango&quot;&gt;Sango&lt;&#x2F;a&gt; - Edge diagnostics CLI for TLS, HTTP, and security header analysis&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;ushio&quot;&gt;Ushio&lt;&#x2F;a&gt; - Deterministic edge traffic replay tool&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</description>
      </item>
      <item>
          <title>Mise ate my Makefile</title>
          <pubDate>Sun, 14 Dec 2025 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/mise-ate-my-makefile/</link>
          <guid>https://raskell.io/articles/mise-ate-my-makefile/</guid>
          <description xml:base="https://raskell.io/articles/mise-ate-my-makefile/">&lt;p&gt;I maintain around forty repositories across four GitHub organizations. &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;zentinelproxy.io&quot;&gt;Zentinel&lt;&#x2F;a&gt; alone accounts for over thirty: the core proxy, a Rust SDK, and a growing collection of agents for WAF inspection, auth, rate limiting, GraphQL security, and a dozen other edge concerns. &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;archipelag.io&quot;&gt;Archipelag&lt;&#x2F;a&gt; spans an Elixir coordinator, a Rust node agent, Python and TypeScript SDKs, mobile agents in Kotlin and Swift, and infrastructure-as-code repos. &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;cyanea.bio&quot;&gt;Cyanea&lt;&#x2F;a&gt; is Elixir with Rust NIFs and a separate Rust bioinformatics library. Then there are the standalone tools: &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;conflux&quot;&gt;Conflux&lt;&#x2F;a&gt; (Rust CRDT engine), &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;sango&quot;&gt;Sango&lt;&#x2F;a&gt; (Rust edge diagnostics CLI), &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;shiioo&quot;&gt;Shiioo&lt;&#x2F;a&gt; (Rust agentic orchestrator), &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;vela&quot;&gt;Vela&lt;&#x2F;a&gt; (Rust bare-metal deployment), &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;refrakt&quot;&gt;Refrakt&lt;&#x2F;a&gt; (Gleam web framework), &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; 
href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;kurumi&quot;&gt;Kurumi&lt;&#x2F;a&gt; (Svelte local-first app), and this site you are reading (Zola).&lt;&#x2F;p&gt;
&lt;p&gt;The languages span Rust, Elixir, Gleam, Python, TypeScript, Kotlin, Swift, and whatever shell scripts accumulated over the years. Every project needs a toolchain. Most need task automation. All of them need to be approachable for a contributor who clones the repo for the first time.&lt;&#x2F;p&gt;
&lt;p&gt;The Makefile approach was breaking down. So was everything else I tried.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-was-failing&quot;&gt;What was failing&lt;&#x2F;h2&gt;
&lt;p&gt;The standard setup for most of my Rust projects was a Makefile with targets for &lt;code&gt;build&lt;&#x2F;code&gt;, &lt;code&gt;test&lt;&#x2F;code&gt;, &lt;code&gt;clippy&lt;&#x2F;code&gt;, &lt;code&gt;fmt&lt;&#x2F;code&gt;, and &lt;code&gt;release&lt;&#x2F;code&gt;. Simple enough for one repo. The problem surfaces when you maintain thirty of them.&lt;&#x2F;p&gt;
&lt;p&gt;GNU Make and BSD Make disagree on syntax in ways that cause silent failures. A Makefile that works on my Linux CI runner breaks on a contributor’s macOS laptop because of a conditional or a shell invocation difference. The fix is always “use GNU make,” but that means documenting it, adding a check, and fielding issues from people who forget.&lt;&#x2F;p&gt;
&lt;p&gt;Worse, Makefiles cannot declare tool dependencies. A Rust project needs a specific Rust version, maybe &lt;code&gt;protoc&lt;&#x2F;code&gt; for gRPC, maybe &lt;code&gt;cargo-watch&lt;&#x2F;code&gt; for development convenience. The Makefile assumes these tools exist. When they do not, the developer gets a cryptic error five minutes into their first build.&lt;&#x2F;p&gt;
&lt;p&gt;So projects accumulated scaffolding:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;.rust-version
.tool-versions
Makefile
scripts&amp;#x2F;setup.sh
scripts&amp;#x2F;ci.sh
scripts&amp;#x2F;release.sh
.envrc
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Seven files to express what amounts to: “this project uses Rust 1.83, needs protoc, and has five things you can run.” Multiply that by forty repos and you have a maintenance surface that nobody wants to touch. The &lt;code&gt;scripts&#x2F;&lt;&#x2F;code&gt; folder in particular had a way of growing silently. Someone adds a helper. Someone else copies it from another project with modifications. Six months later you have three slightly different versions of the same release script across three orgs.&lt;&#x2F;p&gt;
&lt;p&gt;The Elixir projects had it worse. Elixir needs Erlang&#x2F;OTP at a specific version, then Elixir itself at a matching version, then Node for asset compilation in Phoenix, then possibly Rust for NIFs (Cyanea compiles Rust bioinformatics code into the BEAM release). Four tool dependencies before you write a line of application code. &lt;code&gt;asdf&lt;&#x2F;code&gt; handled the version management, but slowly and without task automation, so you still needed a Makefile on top.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-not-nix&quot;&gt;Why not Nix&lt;&#x2F;h2&gt;
&lt;p&gt;I gave Nix a serious try. The promise is appealing: declare your entire development environment in a single file, get reproducible builds, never worry about system state. The Nix shell concept is genuinely elegant.&lt;&#x2F;p&gt;
&lt;p&gt;In practice, the cost was too high for my use case. Nix’s learning curve is steep even for experienced engineers. The language is its own thing. The documentation assumes you already understand the Nix store model. When something breaks, the error messages point at derivation hashes, not at the thing you actually did wrong.&lt;&#x2F;p&gt;
&lt;p&gt;The bigger issue was onboarding. If a contributor wants to fix a typo in a Zentinel agent’s README, asking them to install Nix and understand flakes is a non-starter. The tool that manages your development environment should not itself become a project you have to learn. Nix solves a harder problem than I have. I do not need bit-for-bit reproducible builds across machines. I need “install Rust 1.83 and run the tests.”&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-not-asdf&quot;&gt;Why not asdf&lt;&#x2F;h2&gt;
&lt;p&gt;asdf was my default for years. It handled the version management problem well enough. The plugin system meant I could manage Rust, Elixir, Erlang, Node, and Python versions with a single &lt;code&gt;.tool-versions&lt;&#x2F;code&gt; file.&lt;&#x2F;p&gt;
&lt;p&gt;Three things pushed me away.&lt;&#x2F;p&gt;
&lt;p&gt;First, speed. asdf is shell scripts. Every invocation pays the cost of sourcing plugins, resolving versions, and shimming binaries. On a fast machine you barely notice. On CI, where you run &lt;code&gt;asdf install&lt;&#x2F;code&gt; in a fresh environment, the overhead adds up. Mise is a compiled Rust binary. It is meaningfully faster at both installation and version resolution.&lt;&#x2F;p&gt;
&lt;p&gt;Second, no task automation. asdf manages tool versions. That is all it does. You still need Make or a scripts folder for project tasks. That means two tools, two configuration surfaces, two things to document.&lt;&#x2F;p&gt;
&lt;p&gt;Third, plugin quality varied. The core plugins for Node and Ruby were solid. Plugins for less mainstream tools could be stale, broken, or missing. Mise started as an asdf-compatible rewrite and inherited the plugin ecosystem, but its built-in backends for common tools (Rust, Node, Python, Go, Erlang, Elixir) are faster and more reliable than shelling out to plugins.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-mise-actually-does&quot;&gt;What mise actually does&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;mise.jdx.dev&#x2F;&quot;&gt;Mise&lt;&#x2F;a&gt; is a single Rust binary that combines tool version management and task running into one configuration file per project. It does asdf’s job and Make’s job in a single tool.&lt;&#x2F;p&gt;
&lt;p&gt;Here is this site’s configuration. The entire thing:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;toml&quot; class=&quot;language-toml &quot;&gt;&lt;code class=&quot;language-toml&quot; data-lang=&quot;toml&quot;&gt;# mise.toml (raskell.io)
[tools]
zola = &amp;quot;0.19&amp;quot;

[env]
_.file = &amp;quot;.env&amp;quot;

[tasks.serve]
description = &amp;quot;Start the Zola development server&amp;quot;
run = &amp;quot;zola serve&amp;quot;

[tasks.build]
description = &amp;quot;Build the site for production&amp;quot;
run = &amp;quot;zola build&amp;quot;

[tasks.check]
description = &amp;quot;Check the site for errors without building&amp;quot;
run = &amp;quot;zola check&amp;quot;

[tasks.new]
description = &amp;quot;Create a new article&amp;quot;
run = &amp;quot;&amp;quot;&amp;quot;
#!&amp;#x2F;usr&amp;#x2F;bin&amp;#x2F;env bash
if [ -z &amp;quot;$1&amp;quot; ]; then
  echo &amp;quot;Usage: mise run new &amp;lt;article-slug&amp;gt;&amp;quot;
  exit 1
fi
SLUG=&amp;quot;$1&amp;quot;
DATE=$(date +%Y-%m-%d)
FILE=&amp;quot;content&amp;#x2F;articles&amp;#x2F;${SLUG}.md&amp;quot;
cat &amp;gt; &amp;quot;$FILE&amp;quot; &amp;lt;&amp;lt; ARTICLE
+++
title = &amp;quot;&amp;quot;
date = ${DATE}
description = &amp;quot;&amp;quot;
[taxonomies]
tags = []
categories = []
[extra]
author = &amp;quot;Raffael&amp;quot;
+++
ARTICLE
echo &amp;quot;Created $FILE&amp;quot;
&amp;quot;&amp;quot;&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;One file. Declares the tool (Zola 0.19), loads environment variables, and defines every task a contributor needs. &lt;code&gt;mise install&lt;&#x2F;code&gt; sets up the toolchain. &lt;code&gt;mise tasks&lt;&#x2F;code&gt; shows what is available. &lt;code&gt;mise run serve&lt;&#x2F;code&gt; starts the dev server. No Makefile. No scripts folder. No documentation page explaining how to get Zola at the right version.&lt;&#x2F;p&gt;
&lt;p&gt;For a Rust project like &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;shiioo&quot;&gt;Shiioo&lt;&#x2F;a&gt; (the agentic orchestrator), the configuration is larger but follows the same pattern:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;toml&quot; class=&quot;language-toml &quot;&gt;&lt;code class=&quot;language-toml&quot; data-lang=&quot;toml&quot;&gt;# .mise.toml (shiioo)
[tools]
rust = &amp;quot;latest&amp;quot;

[env]
RUST_LOG = &amp;quot;info&amp;quot;
RUST_BACKTRACE = &amp;quot;1&amp;quot;
_.path = [&amp;quot;.&amp;#x2F;target&amp;#x2F;release&amp;quot;, &amp;quot;.&amp;#x2F;target&amp;#x2F;debug&amp;quot;]

[tasks.build]
description = &amp;quot;Build all crates in release mode&amp;quot;
run = &amp;quot;cargo build --release&amp;quot;

[tasks.test]
description = &amp;quot;Run all tests&amp;quot;
run = &amp;quot;cargo test&amp;quot;

[tasks.clippy]
description = &amp;quot;Run clippy lints&amp;quot;
run = &amp;quot;cargo clippy --all-targets -- -D warnings&amp;quot;

[tasks.fmt]
description = &amp;quot;Format code with rustfmt&amp;quot;
run = &amp;quot;cargo fmt --all&amp;quot;

# Assumed definitions: the ci and dev tasks below depend on these two,
# which this excerpt does not otherwise show.
[tasks.fmt-check]
description = &amp;quot;Check formatting without rewriting files&amp;quot;
run = &amp;quot;cargo fmt --all -- --check&amp;quot;

[tasks.check]
description = &amp;quot;Type-check all crates&amp;quot;
run = &amp;quot;cargo check --all-targets&amp;quot;

[tasks.ci]
description = &amp;quot;CI pipeline: format check, clippy, test&amp;quot;
depends = [&amp;quot;fmt-check&amp;quot;, &amp;quot;clippy&amp;quot;, &amp;quot;test&amp;quot;]

[tasks.dev]
description = &amp;quot;Full development build and run&amp;quot;
depends = [&amp;quot;fmt&amp;quot;, &amp;quot;check&amp;quot;, &amp;quot;test&amp;quot;]
run = &amp;quot;cargo run -p shiioo-server&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
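&lt;p&gt;The &lt;code&gt;depends&lt;&#x2F;code&gt; key is where mise replaces the one thing Make was genuinely good at: task dependency ordering. &lt;code&gt;mise run ci&lt;&#x2F;code&gt; runs format checking, clippy, and tests, and fails if any of them fails. One caveat: entries in the same &lt;code&gt;depends&lt;&#x2F;code&gt; list do not depend on each other, so mise is free to run them in parallel. When strict ordering matters, chain the tasks through their own &lt;code&gt;depends&lt;&#x2F;code&gt; entries or limit mise to one job with &lt;code&gt;--jobs 1&lt;&#x2F;code&gt;. It is not as expressive as Make’s file-based dependency graph, but for project automation tasks (as opposed to build tasks, which cargo or mix handle), it covers what I actually need.&lt;&#x2F;p&gt;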
&lt;p&gt;For a multi-language project like Cyanea, the value is even clearer. The Elixir app needs Erlang, Elixir, Node, and Rust. One &lt;code&gt;[tools]&lt;&#x2F;code&gt; section pins all four. One &lt;code&gt;mise install&lt;&#x2F;code&gt; gets a contributor from zero to a working environment. Without mise, that setup involved installing asdf, adding four plugins, running &lt;code&gt;asdf install&lt;&#x2F;code&gt;, then installing direnv for environment variables, then reading the Makefile to figure out how to run things. With mise, it is two commands: &lt;code&gt;mise install&lt;&#x2F;code&gt; and &lt;code&gt;mise run dev&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-cross-project-pattern&quot;&gt;The cross-project pattern&lt;&#x2F;h2&gt;
&lt;p&gt;The real payoff is not in any single project. It is the consistency across all of them.&lt;&#x2F;p&gt;
&lt;p&gt;Every repo in every org follows the same contract:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Clone the repo&lt;&#x2F;li&gt;
&lt;li&gt;Run &lt;code&gt;mise install&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Run &lt;code&gt;mise tasks&lt;&#x2F;code&gt; to see what is available&lt;&#x2F;li&gt;
&lt;li&gt;Run &lt;code&gt;mise run dev&lt;&#x2F;code&gt; or &lt;code&gt;mise run test&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;That is it. Whether the project is a Rust reverse proxy with thirty modules, an Elixir Phoenix application with LiveView and a NATS integration, a Gleam web framework, or a static site built with Zola, the entry point is identical. The person cloning the repo does not need to know which build system the project uses internally. They do not need to read a CONTRIBUTING.md to find out whether it is &lt;code&gt;make test&lt;&#x2F;code&gt; or &lt;code&gt;cargo test&lt;&#x2F;code&gt; or &lt;code&gt;mix test&lt;&#x2F;code&gt;. It is always &lt;code&gt;mise run test&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;This matters more than it sounds. When you maintain projects across four orgs and multiple languages, the cognitive overhead per context switch is the actual bottleneck. I work on Zentinel (Rust) in the morning, switch to Archipelag (Elixir) after lunch, then fix something on this site (Zola) in the evening. Without a consistent interface, each switch means recalling which project uses which conventions. With mise, the interface is always the same. The implementation behind &lt;code&gt;mise run test&lt;&#x2F;code&gt; differs (cargo, mix, zola check), but I do not care about that. I type the same command and the right thing happens.&lt;&#x2F;p&gt;
&lt;p&gt;For new contributors, the effect is more pronounced. Zentinel’s agent ecosystem has over twenty Rust repos. A contributor who submits a PR to the WAF agent and then wants to help with the auth agent does not need to learn a new setup process. Same structure, same task names, same workflow. The consistency compounds.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-mise-handles-that-make-does-not&quot;&gt;What mise handles that Make does not&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;strong&gt;Environment variables.&lt;&#x2F;strong&gt; Mise loads environment from the config file or from &lt;code&gt;.env&lt;&#x2F;code&gt; files, scoped to the project directory. When I &lt;code&gt;cd&lt;&#x2F;code&gt; into a project, the right environment is active. When I leave, it deactivates. No direnv, no &lt;code&gt;.envrc&lt;&#x2F;code&gt;, no &lt;code&gt;source .env&lt;&#x2F;code&gt; in every shell session.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Tool installation.&lt;&#x2F;strong&gt; &lt;code&gt;mise install&lt;&#x2F;code&gt; in a fresh clone gets every tool the project needs at the exact specified version. Make cannot do this. Make assumes the tools exist. That assumption breaks on new machines, in CI, and for every new contributor.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Task discovery.&lt;&#x2F;strong&gt; &lt;code&gt;mise tasks&lt;&#x2F;code&gt; lists every available task with its description. Make has &lt;code&gt;make help&lt;&#x2F;code&gt; patterns, but those are conventions, not built-in features. With mise, discoverability is the default.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;File-based tasks.&lt;&#x2F;strong&gt; Any executable file in &lt;code&gt;.mise&#x2F;tasks&#x2F;&lt;&#x2F;code&gt; becomes a task automatically. No registration, no config entry needed. For tasks that outgrow a one-liner in TOML but do not warrant a standalone script in &lt;code&gt;scripts&#x2F;&lt;&#x2F;code&gt;, this is the right middle ground. The task is discoverable through &lt;code&gt;mise tasks&lt;&#x2F;code&gt; but lives as a normal shell script you can test independently.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-breaks&quot;&gt;What breaks&lt;&#x2F;h2&gt;
&lt;p&gt;Mise is not perfect. Honest assessment after running it across forty repos:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Dynamic dependencies.&lt;&#x2F;strong&gt; Make can express “rebuild this if that file changed.” Mise tasks are imperative: they run or they do not. If you need file-level dependency tracking, you still need a build system (cargo, mix, webpack). Mise orchestrates tasks. It does not replace the build tool.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Ecosystem maturity.&lt;&#x2F;strong&gt; Mise is younger than Make and asdf. The documentation is good but not exhaustive. Some features (like hooks and watch mode) are recent additions. The pace of development is fast, which means features arrive quickly but occasionally change between minor versions.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Team familiarity.&lt;&#x2F;strong&gt; Make is universal. Every engineer has encountered a Makefile. Mise is still relatively unknown. Introducing it to a team requires a short pitch, but the pitch is easy: “it is Make plus asdf in one tool, configured in TOML.”&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Complex shell tasks.&lt;&#x2F;strong&gt; When a task grows beyond a few lines, the inline TOML string syntax gets awkward. The workaround is file-based tasks in &lt;code&gt;.mise&#x2F;tasks&#x2F;&lt;&#x2F;code&gt;, which works well but means the task definition lives in two places (TOML for metadata and task list, shell file for implementation).&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-migration&quot;&gt;The migration&lt;&#x2F;h2&gt;
&lt;p&gt;If you are moving an existing project, here is the approach I settled on after migrating across all four orgs:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Add a &lt;code&gt;mise.toml&lt;&#x2F;code&gt; (or &lt;code&gt;.mise.toml&lt;&#x2F;code&gt;) at the project root. Start with just &lt;code&gt;[tools]&lt;&#x2F;code&gt; to declare the required versions.&lt;&#x2F;li&gt;
&lt;li&gt;Move the most-used Make targets to &lt;code&gt;[tasks]&lt;&#x2F;code&gt; one at a time. Keep the Makefile around until everything is ported.&lt;&#x2F;li&gt;
&lt;li&gt;Add &lt;code&gt;[env]&lt;&#x2F;code&gt; entries to replace &lt;code&gt;.envrc&lt;&#x2F;code&gt; or &lt;code&gt;.env.example&lt;&#x2F;code&gt; files.&lt;&#x2F;li&gt;
&lt;li&gt;Move standalone scripts from &lt;code&gt;scripts&#x2F;&lt;&#x2F;code&gt; to &lt;code&gt;.mise&#x2F;tasks&#x2F;&lt;&#x2F;code&gt; as file-based tasks.&lt;&#x2F;li&gt;
&lt;li&gt;Delete the Makefile last.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Do not try to migrate everything at once. Start with the three tasks developers use daily (usually &lt;code&gt;dev&lt;&#x2F;code&gt;, &lt;code&gt;test&lt;&#x2F;code&gt;, and &lt;code&gt;build&lt;&#x2F;code&gt;). The rest can move incrementally. I also settled on a few naming conventions that help across projects: use clear hyphenated names whose prefix groups related tasks, like &lt;code&gt;db-reset&lt;&#x2F;code&gt;, &lt;code&gt;cache-clear&lt;&#x2F;code&gt;, &lt;code&gt;test-unit&lt;&#x2F;code&gt;. Consistent naming makes task discovery predictable even before you run &lt;code&gt;mise tasks&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-bottom-line&quot;&gt;The bottom line&lt;&#x2F;h2&gt;
&lt;p&gt;Mise is not a revolutionary tool. It does not do anything that was previously impossible. You could always install the right Rust version, write a Makefile, set up direnv, and maintain a scripts folder. What mise does is collapse all of that into a single file that is readable, portable, and consistent.&lt;&#x2F;p&gt;
&lt;p&gt;The compound effect is what matters. Forty repositories, four organizations, seven languages, one pattern. Clone, install, run. No guessing which build system this particular project uses. No debugging a Makefile that works on Linux but breaks on macOS. No explaining to a contributor that they need asdf plus three plugins plus direnv plus GNU make before they can run the tests.&lt;&#x2F;p&gt;
&lt;p&gt;Every new project starts with a &lt;code&gt;mise.toml&lt;&#x2F;code&gt;. Setup takes two commands instead of a page of instructions. Contributors do not message me asking how to run things. They run &lt;code&gt;mise tasks&lt;&#x2F;code&gt; and figure it out.&lt;&#x2F;p&gt;
&lt;p&gt;That is the tool working.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;references-and-further-reading&quot;&gt;References and further reading&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;mise.jdx.dev&#x2F;&quot;&gt;mise&lt;&#x2F;a&gt; - Official documentation and installation guide&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;jdx&#x2F;mise&quot;&gt;mise source code&lt;&#x2F;a&gt; - GitHub repository and issue tracker&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;asdf-vm.com&#x2F;&quot;&gt;asdf&lt;&#x2F;a&gt; - The version manager mise was originally inspired by&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;nixos.org&#x2F;&quot;&gt;Nix&lt;&#x2F;a&gt; - Reproducible builds and development environments&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.gnu.org&#x2F;software&#x2F;make&#x2F;&quot;&gt;GNU Make&lt;&#x2F;a&gt; - The build tool mise replaces for task automation&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;toml.io&#x2F;&quot;&gt;TOML specification&lt;&#x2F;a&gt; - The configuration format mise uses&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;direnv.net&#x2F;&quot;&gt;direnv&lt;&#x2F;a&gt; - Environment variable manager that mise’s &lt;code&gt;[env]&lt;&#x2F;code&gt; section replaces&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;shiioo&quot;&gt;Shiioo&lt;&#x2F;a&gt; - Real-world mise configuration referenced in this article&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;mise-hx&quot;&gt;mise-hx&lt;&#x2F;a&gt; - Example of a custom mise plugin (for the hx Haskell toolchain)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</description>
      </item>
      <item>
          <title>Disk space maintenance on Void Linux</title>
          <pubDate>Wed, 01 May 2024 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/disk-space-void-linux-maintenance/</link>
          <guid>https://raskell.io/articles/disk-space-void-linux-maintenance/</guid>
          <description xml:base="https://raskell.io/articles/disk-space-void-linux-maintenance/">&lt;h2 id=&quot;monday-morning-surprise&quot;&gt;Monday morning surprise&lt;&#x2F;h2&gt;
&lt;p&gt;Since I spend most of my time doing things with my computer rather than configuring my beloved Linux distribution, Void Linux, I have developed a tendency not to bother about Void at all until something crucial becomes unusable. Almost two years after switching from Arch to Void, I had never encountered a major problem and felt I had made the right decision.&lt;&#x2F;p&gt;
&lt;p&gt;Out of curiosity, I checked my disk usage to see whether the 250GB solid-state disk would be enough. And there came the surprise:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;shell&quot; class=&quot;language-shell &quot;&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;$ df -H
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        8.4G     0  8.4G   0% &amp;#x2F;dev
tmpfs           8.4G  1.9M  8.4G   1% &amp;#x2F;dev&amp;#x2F;shm
tmpfs           8.4G  1.4M  8.4G   1% &amp;#x2F;run
&amp;#x2F;dev&amp;#x2F;nvme0n1p3  138G  117G   21G  85% &amp;#x2F;
efivarfs        158k   85k   69k  56% &amp;#x2F;sys&amp;#x2F;firmware&amp;#x2F;efi&amp;#x2F;efivars
cgroup          8.4G     0  8.4G   0% &amp;#x2F;sys&amp;#x2F;fs&amp;#x2F;cgroup
&amp;#x2F;dev&amp;#x2F;nvme0n1p4  366G   34G  332G  10% &amp;#x2F;home
&amp;#x2F;dev&amp;#x2F;nvme0n1p1  536M  152k  536M   1% &amp;#x2F;boot&amp;#x2F;efi
tmpfs           8.4G   25k  8.4G   1% &amp;#x2F;tmp
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;My root partition was full, way too full in my opinion. Did I miss something? Is Void not what I was looking for after all? I don’t enjoy babysitting my OS &lt;em&gt;du jour&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-painless-solution&quot;&gt;The painless solution&lt;&#x2F;h2&gt;
&lt;p&gt;After a quick Brave search, I found what I was looking for. A kind fellow software engineer from China had written a blog post about his journey when he faced the very same problem. Annoyed at having to deal with this at all, I copy-pasted these three commands as quickly as possible, not minding what side effects I might run into.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;1-cleaning-the-package-cache&quot;&gt;1. Cleaning the package cache&lt;&#x2F;h3&gt;
&lt;p&gt;All the knowledge I was lacking was to be found in the man page of &lt;code&gt;xbps-remove&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;shell&quot; class=&quot;language-shell &quot;&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;# xbps-remove -yO
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;man.voidlinux.org&#x2F;xbps-remove.1#O,&quot;&gt;man page&lt;&#x2F;a&gt; of &lt;code&gt;xbps-remove&lt;&#x2F;code&gt; tells us the &lt;code&gt;-O&lt;&#x2F;code&gt; parameter takes care of &lt;em&gt;cleaning the cache directory removing obsolete binary packages.&lt;&#x2F;em&gt; Obsolete binary packages? Good riddance! I was surprised to learn that this step alone freed up almost half of the used disk space on my root partition.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;2-removing-orphaned-packages&quot;&gt;2. Removing orphaned packages&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;shell&quot; class=&quot;language-shell &quot;&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;# xbps-remove -yo
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Here the same &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;man.voidlinux.org&#x2F;xbps-remove.1#o,&quot;&gt;man page&lt;&#x2F;a&gt; tells us that the &lt;code&gt;-o&lt;&#x2F;code&gt; parameter takes care of &lt;em&gt;removing installed package orphans that were installed automatically (as dependencies) and are not currently dependencies of any installed package.&lt;&#x2F;em&gt; As before, good riddance!&lt;&#x2F;p&gt;
&lt;h3 id=&quot;3-purging-old-unused-kernels&quot;&gt;3. Purging old, unused kernels&lt;&#x2F;h3&gt;
&lt;p&gt;This one is interesting. While I knew that the people behind Void had developed their own package management ecosystem, I hadn’t fully realized that the upstream Void installation ships other utilities for managing my beloved OS. Apparently, one of these is a &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;void-linux&#x2F;void-packages&#x2F;blob&#x2F;master&#x2F;srcpkgs&#x2F;base-files&#x2F;files&#x2F;vkpurge&quot;&gt;shell script&lt;&#x2F;a&gt; named &lt;code&gt;vkpurge&lt;&#x2F;code&gt;, which I assume is short for &lt;code&gt;Void&#x27;s Kernel purging&lt;&#x2F;code&gt; tool. I like this kind of naming, which strongly implies the tool’s functionality.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;shell&quot; class=&quot;language-shell &quot;&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;# vkpurge rm all
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;It performed as expected. Old kernel files (and modules?) were indeed purged and freed up even more disk space. I should add that this step is optional as it is always useful to have some old kernels at hand when things hit the fan (which for me, they haven’t in a very, very long time).&lt;&#x2F;p&gt;
&lt;h2 id=&quot;result&quot;&gt;Result&lt;&#x2F;h2&gt;
&lt;p&gt;I couldn’t be happier.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;shell&quot; class=&quot;language-shell &quot;&gt;&lt;code class=&quot;language-shell&quot; data-lang=&quot;shell&quot;&gt;$ df -H
Filesystem      Size  Used Avail Use% Mounted on
...
&amp;#x2F;dev&amp;#x2F;nvme0n1p3  138G   45G   93G  33% &amp;#x2F;
...
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h2 id=&quot;renewal-of-faith&quot;&gt;Renewal of faith&lt;&#x2F;h2&gt;
&lt;p&gt;Overall, why am I even writing this if some other fellow engineer already figured this out? Simply because it lets me explain why I have enjoyed my journey with Void as my go-to Linux distribution. It keeps things simple, with a few well-documented utilities. So simple that a quick Brave search suffices to find the answer to my problems.&lt;&#x2F;p&gt;
&lt;p&gt;This very aspect of Void is worth highlighting. I remember more arcane Linux distributions that kept me in their grip while I figured things out. Many Google searches were necessary, and even more trial-and-error attempts, to get simple things fixed.&lt;&#x2F;p&gt;
&lt;p&gt;Now back to my Monday morning.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;references-and-further-reading&quot;&gt;References and further reading&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;voidlinux.org&#x2F;&quot;&gt;Void Linux&lt;&#x2F;a&gt; - Official project site&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.voidlinux.org&#x2F;xbps&#x2F;index.html&quot;&gt;Void Linux Handbook: XBPS&lt;&#x2F;a&gt; - Official documentation for the XBPS package manager&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;man.voidlinux.org&#x2F;xbps-remove.1&quot;&gt;xbps-remove(1) man page&lt;&#x2F;a&gt; - Manual page for package removal and cache cleaning&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;void-linux&#x2F;void-packages&#x2F;blob&#x2F;master&#x2F;srcpkgs&#x2F;base-files&#x2F;files&#x2F;vkpurge&quot;&gt;vkpurge source&lt;&#x2F;a&gt; - Shell script for purging old kernels on Void Linux&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.voidlinux.org&#x2F;about&#x2F;faq.html&quot;&gt;Void Linux FAQ&lt;&#x2F;a&gt; - Common questions about running and maintaining Void&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;hr &#x2F;&gt;
&lt;div class=&quot;footnote-definition&quot; id=&quot;1&quot;&gt;&lt;sup class=&quot;footnote-definition-label&quot;&gt;1&lt;&#x2F;sup&gt;
&lt;p&gt;Painting in header image is “Seaside” by Aleksandr Deyneka&lt;&#x2F;p&gt;
&lt;&#x2F;div&gt;
</description>
      </item>
      <item>
          <title>All beginning is Haskell</title>
          <pubDate>Mon, 06 Mar 2023 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/all-beginning-is-haskell/</link>
          <guid>https://raskell.io/articles/all-beginning-is-haskell/</guid>
          <description xml:base="https://raskell.io/articles/all-beginning-is-haskell/">&lt;p&gt;This site is called raskell.io. That is not an accident.&lt;&#x2F;p&gt;
&lt;p&gt;I started learning Haskell because I liked mathematics and someone told me there was a programming language built on top of it. Not “inspired by” in the loose way that every language claims some mathematical foundation. Actually built on lambda calculus, category theory, and type theory, in a way where the math is not decoration but structure.&lt;&#x2F;p&gt;
&lt;p&gt;What I did not expect was how thoroughly it would rewire the way I think about building software. Not because Haskell is the best language for every task. It is not, and I write far more Rust than Haskell these days. But because Haskell teaches you to think about programs in a way that makes you better at everything else.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-haskell-actually-teaches-you&quot;&gt;What Haskell actually teaches you&lt;&#x2F;h2&gt;
&lt;p&gt;Most introductions to Haskell talk about pure functions, immutability, and monads. They are not wrong, but they miss the point. The point is not any single feature. It is how those features combine into a way of thinking about programs as compositions of well-typed transformations.&lt;&#x2F;p&gt;
&lt;p&gt;In an imperative language, you think about sequences of steps. Do this, then that, then check a condition, then loop. The program is a recipe. In Haskell, you think about transformations. What goes in, what comes out, what shape does the data have at each stage. The program is a pipeline.&lt;&#x2F;p&gt;
&lt;p&gt;This sounds abstract until you see it in practice. Suppose you need to process a list of user records: filter out inactive users, extract their email addresses, and normalize them to lowercase.&lt;&#x2F;p&gt;
&lt;p&gt;In an imperative style, you write a loop with conditions and mutations. In Haskell:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;haskell&quot; class=&quot;language-haskell &quot;&gt;&lt;code class=&quot;language-haskell&quot; data-lang=&quot;haskell&quot;&gt;activeEmails :: [User] -&amp;gt; [Email]
activeEmails = map (normalize . email) . filter isActive
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;One line. Read it right to left: filter active users, then map over the result, extracting and normalizing emails. The type signature tells you what goes in (&lt;code&gt;[User]&lt;&#x2F;code&gt;) and what comes out (&lt;code&gt;[Email]&lt;&#x2F;code&gt;). No mutation. No intermediate variables. No place for off-by-one errors or null pointer exceptions.&lt;&#x2F;p&gt;
&lt;p&gt;The type signature is not just documentation. It is a contract enforced by the compiler. If &lt;code&gt;isActive&lt;&#x2F;code&gt; expects a &lt;code&gt;User&lt;&#x2F;code&gt; and you pass it a &lt;code&gt;String&lt;&#x2F;code&gt;, the program will not compile. If &lt;code&gt;normalize&lt;&#x2F;code&gt; returns an &lt;code&gt;Email&lt;&#x2F;code&gt; but you try to use it as a &lt;code&gt;String&lt;&#x2F;code&gt;, the program will not compile. The compiler is your first reviewer, and it is tireless.&lt;&#x2F;p&gt;
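&lt;p&gt;To see that contract in action, here is a minimal, self-contained sketch of the same pipeline. The &lt;code&gt;User&lt;&#x2F;code&gt; and &lt;code&gt;Email&lt;&#x2F;code&gt; definitions, the &lt;code&gt;normalize&lt;&#x2F;code&gt; implementation, and the sample data are assumptions made up for illustration; a real codebase would have its own.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;haskell&quot; class=&quot;language-haskell &quot;&gt;&lt;code class=&quot;language-haskell&quot; data-lang=&quot;haskell&quot;&gt;import Data.Char (toLower)

-- Hypothetical types for this sketch.
newtype Email = Email String
  deriving (Show, Eq)

data User = User
  { email :: Email
  , isActive :: Bool
  }

-- Lowercase the address inside the Email wrapper.
normalize :: Email -&amp;gt; Email
normalize (Email addr) = Email (map toLower addr)

activeEmails :: [User] -&amp;gt; [Email]
activeEmails = map (normalize . email) . filter isActive

main :: IO ()
main = print (activeEmails users)
  where
    users =
      [ User (Email &amp;quot;Ada@Example.COM&amp;quot;) True
      , User (Email &amp;quot;bob@example.com&amp;quot;) False
      ]
-- prints [Email &amp;quot;ada@example.com&amp;quot;]
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Because &lt;code&gt;Email&lt;&#x2F;code&gt; is a &lt;code&gt;newtype&lt;&#x2F;code&gt; rather than a plain &lt;code&gt;String&lt;&#x2F;code&gt;, passing an unwrapped string where an email is expected is rejected at compile time.&lt;&#x2F;p&gt;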
&lt;h2 id=&quot;types-as-design-tools&quot;&gt;Types as design tools&lt;&#x2F;h2&gt;
&lt;p&gt;The deeper lesson is that types are not just error catchers. They are design tools.&lt;&#x2F;p&gt;
&lt;p&gt;When I design a system in Haskell, I start with the types. What are the entities? What are the relationships? What transformations are valid? The type system forces you to be precise about these questions before you write any logic. This precision surfaces design problems early, when they are cheap to fix.&lt;&#x2F;p&gt;
&lt;p&gt;Consider modeling a document that can be in one of several states:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;haskell&quot; class=&quot;language-haskell &quot;&gt;&lt;code class=&quot;language-haskell&quot; data-lang=&quot;haskell&quot;&gt;data Document
  = Draft { content :: Text, author :: UserId }
  | UnderReview { content :: Text, author :: UserId, reviewer :: UserId }
  | Published { content :: Text, author :: UserId, publishedAt :: UTCTime }
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
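&lt;p&gt;This is an algebraic data type. Each variant carries exactly the data that makes sense for that state. A &lt;code&gt;Draft&lt;&#x2F;code&gt; has no reviewer. A &lt;code&gt;Published&lt;&#x2F;code&gt; document has a timestamp. You cannot construct a draft that carries a reviewer, and a function that pattern matches on the constructor only sees the fields that state actually has. One caveat: record syntax on a multi-constructor type generates partial selectors, so &lt;code&gt;reviewer&lt;&#x2F;code&gt; applied to a &lt;code&gt;Draft&lt;&#x2F;code&gt; still compiles and fails at runtime. Pattern match on the constructor, and let GHC’s &lt;code&gt;-Wpartial-fields&lt;&#x2F;code&gt; warning flag such declarations, to keep the guarantee. The invalid state is unrepresentable.&lt;&#x2F;p&gt;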
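&lt;p&gt;A small sketch of how consuming code can look. It is a simplified stand-in for the type above, not the original definition: it uses positional constructors instead of record syntax (record selectors on a multi-constructor type are partial functions), and &lt;code&gt;String&lt;&#x2F;code&gt; and &lt;code&gt;Int&lt;&#x2F;code&gt; replace &lt;code&gt;Text&lt;&#x2F;code&gt; and &lt;code&gt;UTCTime&lt;&#x2F;code&gt; so that it runs with nothing but &lt;code&gt;base&lt;&#x2F;code&gt;. The helper &lt;code&gt;reviewerOf&lt;&#x2F;code&gt; is hypothetical.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;haskell&quot; class=&quot;language-haskell &quot;&gt;&lt;code class=&quot;language-haskell&quot; data-lang=&quot;haskell&quot;&gt;newtype UserId = UserId Int deriving (Eq, Show)

-- Simplified stand-in for the Document type: content, author,
-- then the state-specific field (if any).
data Document
  = Draft String UserId
  | UnderReview String UserId UserId
  | Published String UserId Int   -- Int stands in for a timestamp

-- Only the UnderReview constructor has a reviewer to hand out.
-- Every other state yields Nothing.
reviewerOf :: Document -&amp;gt; Maybe UserId
reviewerOf (UnderReview _ _ r) = Just r
reviewerOf _                   = Nothing

main :: IO ()
main = do
  print (reviewerOf (Draft &amp;quot;wip&amp;quot; (UserId 1)))
  print (reviewerOf (UnderReview &amp;quot;wip&amp;quot; (UserId 1) (UserId 2)))
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This prints &lt;code&gt;Nothing&lt;&#x2F;code&gt; and then &lt;code&gt;Just (UserId 2)&lt;&#x2F;code&gt;. The caller has to handle the missing reviewer explicitly, because the only way to get at one is through the constructor that carries it.&lt;&#x2F;p&gt;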
&lt;p&gt;This pattern, making illegal states unrepresentable, is perhaps the most valuable idea I took from Haskell. I use it in Rust constantly. Rust’s &lt;code&gt;enum&lt;&#x2F;code&gt; with associated data comes from the same ML-family lineage as Haskell’s algebraic data types, and the same design principle applies: encode your invariants in the type system and let the compiler enforce them.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-monad-is-not-the-point&quot;&gt;The monad is not the point&lt;&#x2F;h2&gt;
&lt;p&gt;Every Haskell introduction eventually gets to monads, usually with a metaphor involving burritos or boxes. I will skip the metaphor.&lt;&#x2F;p&gt;
&lt;p&gt;A monad is a pattern for sequencing computations that carry some context. The &lt;code&gt;IO&lt;&#x2F;code&gt; monad carries the context of interacting with the outside world. The &lt;code&gt;Maybe&lt;&#x2F;code&gt; monad carries the context of possible failure. The &lt;code&gt;State&lt;&#x2F;code&gt; monad carries the context of mutable state. The pattern is the same in each case: take a value in a context, apply a function that produces a new value in a context, get back a combined context.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;haskell&quot; class=&quot;language-haskell &quot;&gt;&lt;code class=&quot;language-haskell&quot; data-lang=&quot;haskell&quot;&gt;lookupUser :: UserId -&amp;gt; IO (Maybe User)
lookupUser uid = do
  conn &amp;lt;- getConnection
  result &amp;lt;- query conn &amp;quot;SELECT * FROM users WHERE id = ?&amp;quot; [uid]
  return (listToMaybe result)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;code&gt;IO&lt;&#x2F;code&gt; monad here sequences database operations. The &lt;code&gt;Maybe&lt;&#x2F;code&gt; handles the case where no user is found. The types tell you both things at a glance: this function does I&#x2F;O and might not return a result.&lt;&#x2F;p&gt;
&lt;p&gt;The point of monads is not that they are clever. The point is that they make effects explicit and composable. In most languages, a function can do I&#x2F;O, throw exceptions, mutate global state, or launch missiles, and you cannot tell from its signature. In Haskell, the type signature tells you exactly what effects a function can have. &lt;code&gt;Int -&amp;gt; Int&lt;&#x2F;code&gt; is pure. &lt;code&gt;Int -&amp;gt; IO Int&lt;&#x2F;code&gt; does I&#x2F;O. &lt;code&gt;Int -&amp;gt; Maybe Int&lt;&#x2F;code&gt; can fail. The information is right there, enforced by the compiler.&lt;&#x2F;p&gt;
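&lt;p&gt;The &lt;code&gt;Maybe&lt;&#x2F;code&gt; case makes a compact illustration. The following is a minimal sketch with made-up helper names (&lt;code&gt;parseInt&lt;&#x2F;code&gt;, &lt;code&gt;safeDiv&lt;&#x2F;code&gt;, &lt;code&gt;ratio&lt;&#x2F;code&gt;); it shows possible failure appearing in each signature and composing through &lt;code&gt;do&lt;&#x2F;code&gt; notation.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;haskell&quot; class=&quot;language-haskell &quot;&gt;&lt;code class=&quot;language-haskell&quot; data-lang=&quot;haskell&quot;&gt;import Text.Read (readMaybe)

-- Parsing can fail, and the type says so.
parseInt :: String -&amp;gt; Maybe Int
parseInt = readMaybe

-- Division can fail, and the type says so.
safeDiv :: Int -&amp;gt; Int -&amp;gt; Maybe Int
safeDiv _ 0 = Nothing
safeDiv n d = Just (n `div` d)

-- Sequencing in Maybe: the first Nothing short-circuits the rest.
ratio :: String -&amp;gt; String -&amp;gt; Maybe Int
ratio a b = do
  n &amp;lt;- parseInt a
  d &amp;lt;- parseInt b
  safeDiv n d

main :: IO ()
main = do
  print (ratio &amp;quot;10&amp;quot; &amp;quot;2&amp;quot;)
  print (ratio &amp;quot;10&amp;quot; &amp;quot;0&amp;quot;)
  print (ratio &amp;quot;ten&amp;quot; &amp;quot;2&amp;quot;)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The three calls print &lt;code&gt;Just 5&lt;&#x2F;code&gt;, &lt;code&gt;Nothing&lt;&#x2F;code&gt;, and &lt;code&gt;Nothing&lt;&#x2F;code&gt;. No step checks for failure by hand; the monad does the plumbing, and the signature of &lt;code&gt;ratio&lt;&#x2F;code&gt; still tells the caller that failure is possible.&lt;&#x2F;p&gt;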
&lt;p&gt;This discipline, making effects explicit, changed how I design APIs even in languages that do not enforce it. When I write a Rust function that returns &lt;code&gt;Result&amp;lt;T, E&amp;gt;&lt;&#x2F;code&gt;, I am using the same pattern: making failure explicit in the type rather than hiding it behind an exception. Rust learned this from Haskell, and so did I.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-i-am-still-building-haskell-tooling&quot;&gt;Why I am still building Haskell tooling&lt;&#x2F;h2&gt;
&lt;p&gt;If Haskell taught me so much, why do I mostly write Rust?&lt;&#x2F;p&gt;
&lt;p&gt;The honest answer: Haskell’s ecosystem has gaps. The language itself is excellent. GHC is one of the most sophisticated compilers ever built. The type system is unmatched in its expressiveness among production languages. But the surrounding infrastructure, the package management, the build tooling, the deployment story, has not kept pace.&lt;&#x2F;p&gt;
&lt;p&gt;Dependency management in Haskell is fragmented. Cabal and Stack coexist with overlapping but incompatible approaches. Build times are long. Cross-compilation is painful. Setting up a Haskell development environment from scratch still involves more friction than it should in 2026.&lt;&#x2F;p&gt;
&lt;p&gt;This is why hx exists. hx is a Haskell toolchain CLI that I am building in Rust. The choice of implementation language is deliberate. Haskell’s tooling problems are partly caused by tooling that is itself written in Haskell, creating bootstrap problems and long compile times for the tools themselves. A Rust binary starts instantly, compiles to a single static executable, and cross-compiles trivially. The tool should not have the same dependencies as the thing it manages.&lt;&#x2F;p&gt;
&lt;p&gt;hx is distributed through &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;raskell.io&#x2F;articles&#x2F;mise-ate-my-makefile&#x2F;&quot;&gt;mise&lt;&#x2F;a&gt; (naturally), as well as through AUR, Homebrew, Scoop, and Chocolatey. The goal is that setting up a Haskell project should be as frictionless as setting up a Rust project: one command to install the toolchain, one command to build.&lt;&#x2F;p&gt;
&lt;p&gt;On the other end of the spectrum, bhc (the Basel Haskell Compiler) is an experiment in taking Haskell in a direction GHC was never designed for: compiling Haskell for low-latency runtimes without a garbage collector. The target is workloads like tensor pipelines and real-time systems where GC pauses are not acceptable. bhc is early and ambitious, but it comes from the same conviction: Haskell’s ideas deserve better infrastructure than they currently have.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-haskell-in-my-rust&quot;&gt;The Haskell in my Rust&lt;&#x2F;h2&gt;
&lt;p&gt;I write Rust the way Haskell taught me to think.&lt;&#x2F;p&gt;
&lt;p&gt;Rust’s ownership model is not the same as Haskell’s purity, but it serves a similar purpose: it forces you to think about data flow explicitly. In Haskell, you cannot mutate a value because the language will not let you. In Rust, you can mutate, but the borrow checker forces you to be explicit about who owns the data and who can see it. Both languages make you think before you write.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;conflux&quot;&gt;Conflux&lt;&#x2F;a&gt;, my CRDT engine, uses algebraic data types for its merge semantics. Each CRDT field type (LwwRegister, GrowOnlySet, ObservedRemoveSet) is an enum variant with associated data, exactly the pattern I described above. The merge function is associative, commutative, and idempotent. These are mathematical properties that I learned to care about from Haskell, where such properties are often expressed as type class laws.&lt;&#x2F;p&gt;
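&lt;p&gt;Those three properties are easy to state in code. The sketch below is not Conflux’s implementation; it is a hypothetical last-writer-wins register in Haskell, with an assumed integer timestamp, whose merge is simply &lt;code&gt;max&lt;&#x2F;code&gt; over a total order.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;haskell&quot; class=&quot;language-haskell &quot;&gt;&lt;code class=&quot;language-haskell&quot; data-lang=&quot;haskell&quot;&gt;-- Hypothetical last-writer-wins register: a timestamp plus a value.
data Lww a = Lww { stamp :: Int, value :: a }
  deriving (Eq, Ord, Show)

-- Keep the larger entry. Derived Ord compares stamp first and value
-- second, so equal timestamps are resolved deterministically.
merge :: Ord a =&amp;gt; Lww a -&amp;gt; Lww a -&amp;gt; Lww a
merge = max

main :: IO ()
main = do
  let a = Lww 1 &amp;quot;draft&amp;quot;
      b = Lww 2 &amp;quot;review&amp;quot;
      c = Lww 2 &amp;quot;final&amp;quot;
  print (merge a (merge b c) == merge (merge a b) c)  -- associative
  print (merge a b == merge b a)                      -- commutative
  print (merge a a == a)                              -- idempotent
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;All three checks print &lt;code&gt;True&lt;&#x2F;code&gt;. They are spot checks on sample values, not proofs, but they show why replicas can merge in any order, any number of times, and still converge.&lt;&#x2F;p&gt;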
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;zentinelproxy.io&quot;&gt;Zentinel&lt;&#x2F;a&gt;, the reverse proxy, uses Rust’s type system to enforce that WAF decisions are handled in the correct pipeline stage. An &lt;code&gt;AgentDecision&lt;&#x2F;code&gt; is either &lt;code&gt;Allow&lt;&#x2F;code&gt;, &lt;code&gt;Block&lt;&#x2F;code&gt;, or &lt;code&gt;Modify&lt;&#x2F;code&gt;, and the proxy’s merge logic ensures that a &lt;code&gt;Block&lt;&#x2F;code&gt; from any agent cannot be overridden. The pattern is a monoid (decisions combine associatively with &lt;code&gt;Block&lt;&#x2F;code&gt; as the absorbing element), though nobody would call it that in the Rust codebase. The concept came from Haskell. The implementation is pure Rust.&lt;&#x2F;p&gt;
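&lt;p&gt;For readers who do want to call it a monoid, here is how the idea could look in Haskell. This is a sketch under stated assumptions, not Zentinel’s code: the constructors carry no data, they are ordered from least to most restrictive, and &lt;code&gt;Allow&lt;&#x2F;code&gt; is assumed to be the identity element.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;haskell&quot; class=&quot;language-haskell &quot;&gt;&lt;code class=&quot;language-haskell&quot; data-lang=&quot;haskell&quot;&gt;-- Hypothetical simplification: no payloads, and the constructors are
-- listed from least to most restrictive so that derived Ord ranks them.
data Decision = Allow | Modify | Block
  deriving (Eq, Ord, Show)

-- Combining two decisions keeps the more restrictive one,
-- which makes Block the absorbing element.
instance Semigroup Decision where
  (&amp;lt;&amp;gt;) = max

-- Allow is the identity: it never changes the other decision.
instance Monoid Decision where
  mempty = Allow

main :: IO ()
main = do
  print (mconcat [Allow, Modify, Allow])
  print (mconcat [Allow, Block, Modify])
  print (mconcat ([] :: [Decision]))
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This prints &lt;code&gt;Modify&lt;&#x2F;code&gt;, &lt;code&gt;Block&lt;&#x2F;code&gt;, and &lt;code&gt;Allow&lt;&#x2F;code&gt;. Once a &lt;code&gt;Block&lt;&#x2F;code&gt; is in the list, no ordering of the other decisions can change the result.&lt;&#x2F;p&gt;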
&lt;p&gt;Even &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;shiioo&quot;&gt;Shiioo&lt;&#x2F;a&gt;, the agentic orchestrator, uses Haskell-influenced patterns. DAG workflows are compositions of typed transformations. Events are algebraic data types with exhaustive pattern matching. The event-sourcing model treats state as a fold over an event stream. &lt;code&gt;foldl&lt;&#x2F;code&gt; in Haskell, &lt;code&gt;Iterator::fold&lt;&#x2F;code&gt; in Rust. Same idea, different syntax.&lt;&#x2F;p&gt;
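&lt;p&gt;The fold itself fits in a few lines. The event and state types below are invented for illustration and are not Shiioo’s; the sketch only shows state being rebuilt by replaying events through a pure step function.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;haskell&quot; class=&quot;language-haskell &quot;&gt;&lt;code class=&quot;language-haskell&quot; data-lang=&quot;haskell&quot;&gt;import Data.List (foldl&amp;#x27;)

-- Hypothetical workflow events.
data Event = TaskStarted | TaskFinished | TaskFailed

-- Hypothetical derived state: counters rebuilt from the event log.
data Summary = Summary { running :: Int, done :: Int, failed :: Int }
  deriving Show

-- One pure step: previous state plus one event gives the next state.
-- The match covers every Event constructor.
apply :: Summary -&amp;gt; Event -&amp;gt; Summary
apply s TaskStarted  = s { running = running s + 1 }
apply s TaskFinished = s { running = running s - 1, done = done s + 1 }
apply s TaskFailed   = s { running = running s - 1, failed = failed s + 1 }

-- State is a left fold over the event stream.
replay :: [Event] -&amp;gt; Summary
replay = foldl&amp;#x27; apply (Summary 0 0 0)

main :: IO ()
main = print (replay [TaskStarted, TaskStarted, TaskFinished, TaskStarted, TaskFailed])
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Replaying the five events prints &lt;code&gt;Summary {running = 1, done = 1, failed = 1}&lt;&#x2F;code&gt;. Because &lt;code&gt;apply&lt;&#x2F;code&gt; is pure, replaying the same log always rebuilds the same state.&lt;&#x2F;p&gt;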
&lt;h2 id=&quot;why-raskell&quot;&gt;Why “raskell”&lt;&#x2F;h2&gt;
&lt;p&gt;The name is a portmanteau. Raffael plus Haskell. I chose it because Haskell is where my engineering thinking started to take its current shape. Not the first language I learned, but the first one that changed how I think about all the others.&lt;&#x2F;p&gt;
&lt;p&gt;I do not believe you need to write Haskell to benefit from Haskell. But I believe that learning it, really learning it, not just reading about monads but building something real with algebraic data types and type classes and higher-order functions, will make you a better engineer in whatever language you actually use.&lt;&#x2F;p&gt;
&lt;p&gt;All beginning is Haskell. The rest is implementation.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;references-and-further-reading&quot;&gt;References and further reading&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;learning-haskell&quot;&gt;Learning Haskell&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.haskell.org&#x2F;&quot;&gt;Haskell Language&lt;&#x2F;a&gt; - Official site with documentation and community links&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;learnyouahaskell.com&#x2F;&quot;&gt;Learn You a Haskell for Great Good!&lt;&#x2F;a&gt; - Approachable illustrated introduction&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;book.realworldhaskell.org&#x2F;&quot;&gt;Real World Haskell&lt;&#x2F;a&gt; - Practical Haskell for production use&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;wiki.haskell.org&#x2F;&quot;&gt;Haskell Wiki&lt;&#x2F;a&gt; - Community-maintained reference and tutorials&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;wiki.haskell.org&#x2F;Typeclassopedia&quot;&gt;Typeclassopedia&lt;&#x2F;a&gt; - Comprehensive guide to Haskell’s type class hierarchy&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;type-systems-and-theory&quot;&gt;Type systems and theory&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Haskell_Curry&quot;&gt;Haskell Curry&lt;&#x2F;a&gt; - The logician the language is named after&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Lambda_calculus&quot;&gt;Lambda calculus&lt;&#x2F;a&gt; - Alonzo Church’s formal system underlying Haskell&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Hindley%E2%80%93Milner_type_system&quot;&gt;Hindley-Milner type system&lt;&#x2F;a&gt; - The type inference algorithm at the core of Haskell and ML&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;blog.janestreet.com&#x2F;effective-ml-revisited&#x2F;&quot;&gt;Making illegal states unrepresentable&lt;&#x2F;a&gt; - Yaron Minsky’s influential talk on using types for correctness&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;wiki.haskell.org&#x2F;Algebraic_data_type&quot;&gt;Algebraic data types&lt;&#x2F;a&gt; - Haskell wiki reference on sum and product types&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;monads-and-effects&quot;&gt;Monads and effects&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;homepages.inf.ed.ac.uk&#x2F;wadler&#x2F;papers&#x2F;marktoberdorf&#x2F;baastad.pdf&quot;&gt;Philip Wadler, “Monads for functional programming”&lt;&#x2F;a&gt; - The foundational paper on monads in programming&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;wiki.haskell.org&#x2F;All_About_Monads&quot;&gt;All About Monads&lt;&#x2F;a&gt; - Haskell wiki guide to monadic programming&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;haskell-tooling&quot;&gt;Haskell tooling&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.haskell.org&#x2F;ghc&#x2F;&quot;&gt;GHC&lt;&#x2F;a&gt; - The Glasgow Haskell Compiler&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.haskell.org&#x2F;cabal&#x2F;&quot;&gt;Cabal&lt;&#x2F;a&gt; - Haskell’s build and package system&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.haskellstack.org&#x2F;&quot;&gt;Stack&lt;&#x2F;a&gt; - Alternative build tool with curated package sets&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;mise-hx&quot;&gt;mise-hx&lt;&#x2F;a&gt; - mise plugin for the hx Haskell toolchain CLI&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;projects-referenced&quot;&gt;Projects referenced&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;conflux&quot;&gt;Conflux&lt;&#x2F;a&gt; - CRDT engine using algebraic data types for merge semantics&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;zentinelproxy.io&quot;&gt;Zentinel&lt;&#x2F;a&gt; - Reverse proxy with monoid-based decision merging&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;shiioo&quot;&gt;Shiioo&lt;&#x2F;a&gt; - Agentic orchestrator using event-sourced state folds&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</description>
      </item>
      <item>
          <title>My OpenBSD journey: Getting it virtualized with libvirt (1)</title>
          <pubDate>Mon, 06 Feb 2023 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/my-openbsd-journey-getting-it-virtualized-with-libvirt-1/</link>
          <guid>https://raskell.io/articles/my-openbsd-journey-getting-it-virtualized-with-libvirt-1/</guid>
          <description xml:base="https://raskell.io/articles/my-openbsd-journey-getting-it-virtualized-with-libvirt-1/">&lt;h2 id=&quot;void-linux-as-my-daily-driver&quot;&gt;Void Linux as my daily driver&lt;&#x2F;h2&gt;
&lt;p&gt;Around six months ago, I decided to ditch my long-in-the-tooth Arch-based setup on my beloved ThinkPad X1 Carbon. I’d been very loyal over the years and had almost come to believe that Arch would be a constant in my adult life. While I kept up with upcoming technologies, I somehow lost track of the ever-diversifying landscape of Linux distributions. It took a while of repeatedly coming across a generic-sounding name before I realized it referred to yet another Linux distribution. That name, &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;voidlinux.org&#x2F;&quot;&gt;Void Linux&lt;&#x2F;a&gt;, kept poking my curiosity: it supposedly feels like Arch Linux in the old days while sharing some substantial DNA with the BSD operating systems. That’s a story I might tell another day, but to be brief: the BSDs, in particular the infamous OpenBSD with its equally infamous lead developer Theo de Raadt, were always what I considered the endgame. For decades I thought of FreeBSD, NetBSD, and OpenBSD as the holy grail of Unix operating systems. They have always been on my personal radar, and I felt I had to earn the intellectual capacity to put them to proper use one day. Last year, when I made the (almost painless) switch from Arch Linux to Void Linux, the simplicity and especially the bare-bones experience of Void reignited the fascination and admiration I always had for the BSD operating systems and their philosophy.&lt;&#x2F;p&gt;
&lt;p&gt;While I could write (and definitely will in the near future) about how my journey with Void Linux has been so far, I preferred to write down, diary-style, every step of how one can approach and ultimately use &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.openbsd.org&#x2F;&quot;&gt;OpenBSD&lt;&#x2F;a&gt; in 2023. Big disclaimer: I’ve yet to install OpenBSD on the bare-metal server I ordered some days ago, but in the meantime I dabbled with virtualization in order to get it running. That’s what brought me to &lt;em&gt;libvirt&lt;&#x2F;em&gt;, and to my surprise I learned that I wouldn’t need a full-fledged virtualization solution like VirtualBox or VMware to efficiently run a virtualized OpenBSD machine. So, let’s recap. So far we have the following bill of materials:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Void Linux as the host system&lt;&#x2F;li&gt;
&lt;li&gt;OpenBSD to be virtualized on that host system&lt;&#x2F;li&gt;
&lt;li&gt;libvirt as the glue that makes virtualization feel like black magic&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;from-void-to-openbsd&quot;&gt;From Void to OpenBSD&lt;&#x2F;h3&gt;
&lt;p&gt;Before I tell you more about how things went while setting up OpenBSD, let me give you some basics about both systems: Void Linux, which for the sake of simplicity stands in here for Linux distributions in general because it ended up being my distribution of choice, and OpenBSD. As mentioned earlier, both are Unix-like operating systems, but they differ enough to make a comparison worthwhile. Here are a few similarities and differences between the two:&lt;&#x2F;p&gt;
&lt;p&gt;What is similar:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Both are free and open-source operating systems.&lt;&#x2F;li&gt;
&lt;li&gt;Both use a package manager for software management. Void Linux uses XBPS, while OpenBSD uses pkg_add.&lt;&#x2F;li&gt;
&lt;li&gt;Both prioritize security and stability in their development and design.&lt;&#x2F;li&gt;
&lt;li&gt;Both keep their package build definitions under version control. Void Linux manages changes through pull requests from users, while OpenBSD keeps its ports tree in CVS and takes changes as patches sent to its mailing lists.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;What is different:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;License: Void Linux is licensed under the MIT License, while OpenBSD is licensed under the ISC License.&lt;&#x2F;li&gt;
&lt;li&gt;Philosophy: OpenBSD prioritizes security and privacy, while Void Linux prioritizes simplicity and modularity.&lt;&#x2F;li&gt;
&lt;li&gt;Package Management: Void Linux uses the XBPS binary package manager, while OpenBSD uses pkg_add (also binary). OpenBSD additionally has a ports system for building from source.&lt;&#x2F;li&gt;
&lt;li&gt;Package Repository: Void Linux has a large and diverse repository, while OpenBSD has a smaller and more curated repository.&lt;&#x2F;li&gt;
&lt;li&gt;Init System: Void Linux uses runit as its init system, while OpenBSD uses rc. Neither uses the infamous systemd init system.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;what-is-openbsd-in-a-nutshell&quot;&gt;What is OpenBSD in a nutshell&lt;&#x2F;h2&gt;
&lt;p&gt;OpenBSD is a free and open-source operating system that focuses on security, standardization, and robustness. It is based on the Berkeley Software Distribution (BSD) Unix operating system and is developed by a global community of volunteers. OpenBSD aims to provide a secure platform for both personal and enterprise use by implementing strong security features, including access control mechanisms, encryption, and auditing.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Theo_de_Raadt&quot;&gt;Theo de Raadt&lt;&#x2F;a&gt; is the founder and lead developer of OpenBSD. His main objective with OpenBSD is to create a secure operating system that is free from backdoors, vulnerabilities, and other security weaknesses. He is committed to auditing the source code of the operating system and third-party software included with it, to identify and remove any potential security risks. De Raadt is also dedicated to improving the overall quality of the codebase and ensuring compatibility with a wide range of hardware and software.&lt;&#x2F;p&gt;
&lt;p&gt;What makes OpenBSD really special is that it developed a suite of tools that got adopted by other operating systems like Linux, macOS, and even Windows. One of the most famous instances of such adoption is the now de facto standard OpenSSH suite, which emerged from within the development circle of the OpenBSD project. OpenBSD also tackles problems that other systems address with heavier machinery, with its own spin: where Linux confines processes with mechanisms such as cgroups, namespaces, and seccomp, OpenBSD offers pledge and unveil, two small system calls that let a program restrict its own system calls and its view of the filesystem. Go check out &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;why-openbsd.rocks&#x2F;fact&#x2F;freezero&#x2F;&quot;&gt;Why OpenBSD rocks&lt;&#x2F;a&gt; to get a feel for what makes OpenBSD so unique and interesting.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;virtualization-with-libvirt&quot;&gt;Virtualization with libvirt&lt;&#x2F;h3&gt;
&lt;p&gt;So now, let’s get back to our virtualization endeavour where we would like to virtualize OpenBSD on a Void Linux installation. If you happen to be using another Linux distribution, most of the individual steps would be very similar. That brings me to the next technology we should explain a bit more here, and that is libvirt.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;libvirt.org&#x2F;&quot;&gt;libvirt&lt;&#x2F;a&gt; is an open-source virtualization management library that provides a simple and unified API for managing virtualization technologies, including KVM, QEMU, Xen, and others. It aims to simplify the process of creating, managing, and migrating virtual machines, storage, and networks, and to make it easier for administrators to manage virtual environments.&lt;&#x2F;p&gt;
&lt;p&gt;To virtualize an operating system like OpenBSD with libvirt, you need to follow these steps:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Install libvirt and the virtualization technology you want to use, such as KVM.&lt;&#x2F;li&gt;
&lt;li&gt;Download the OpenBSD ISO file and place it in a location accessible by libvirt.&lt;&#x2F;li&gt;
&lt;li&gt;Create a new virtual machine in libvirt with the OpenBSD ISO as the installation media. This can be done through the command line or using a graphical user interface such as virt-manager.&lt;&#x2F;li&gt;
&lt;li&gt;Configure the virtual machine, including the amount of memory, CPU, and disk space, to meet the requirements of OpenBSD.&lt;&#x2F;li&gt;
&lt;li&gt;Start the virtual machine and install OpenBSD as you would on a physical machine.&lt;&#x2F;li&gt;
&lt;li&gt;Once the installation is complete, you can configure the virtual network, storage, and other settings as required.&lt;&#x2F;li&gt;
&lt;li&gt;Finally, you can use the libvirt API or the command line to manage and control the virtual machine, including starting, stopping, migrating, and snapshotting.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;h3 id=&quot;step-by-step-guide&quot;&gt;Step-by-step guide&lt;&#x2F;h3&gt;
&lt;p&gt;Let’s first install the &lt;code&gt;libvirt&lt;&#x2F;code&gt; package and some related packages which we need in order to connect via VNC. VNC lets us use the graphical interface of the running OpenBSD instance.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;$ sudo xbps-install -S dbus qemu libvirt virt-manager virt-viewer tigervnc
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Now, we need to add our user, in my case &lt;code&gt;raskell&lt;&#x2F;code&gt;, to the &lt;code&gt;libvirt&lt;&#x2F;code&gt; group, which was created when libvirt was installed.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;$ sudo usermod -aG libvirt raskell
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;OpenBSD has a release cycle of six months, so we need to upgrade our system every six months to keep up with the latest packages. During a given release, only security and bug-fix patches are applied to the curated packages installed with pkg_add. In February 2023, the current release is &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.openbsd.org&#x2F;72.html&quot;&gt;OpenBSD 7.2&lt;&#x2F;a&gt;, so that is the version we use. As I’m living in Switzerland, I chose to pull the ISO image from a Swiss mirror, in this case from &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;mirror.ungleich.ch&#x2F;pub&#x2F;OpenBSD&#x2F;7.2&#x2F;&quot;&gt;&lt;code&gt;mirror.ungleich.ch&#x2F;pub&#x2F;OpenBSD&lt;&#x2F;code&gt;&lt;&#x2F;a&gt; (check which mirror is closest to you to get the best download rate).&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# cd &amp;#x2F;var&amp;#x2F;lib&amp;#x2F;libvirt&amp;#x2F;boot&amp;#x2F;
# sudo wget https:&amp;#x2F;&amp;#x2F;mirror.ungleich.ch&amp;#x2F;pub&amp;#x2F;OpenBSD&amp;#x2F;7.2&amp;#x2F;amd64&amp;#x2F;install72.iso
--2023-01-12 20:48:15--  https:&amp;#x2F;&amp;#x2F;mirror.ungleich.ch&amp;#x2F;pub&amp;#x2F;OpenBSD&amp;#x2F;7.2&amp;#x2F;amd64&amp;#x2F;install72.iso
Resolving mirror.ungleich.ch (mirror.ungleich.ch)... 2a0a:e5c0:2:2:400:c8ff:fe68:bef3, 185.203.114.135
Connecting to mirror.ungleich.ch (mirror.ungleich.ch)|2a0a:e5c0:2:2:400:c8ff:fe68:bef3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 583352320 (556M) [application&amp;#x2F;octet-stream]
Saving to: ‘install72.iso.1’

install72.iso.1                      100%[====================================================================&amp;gt;] 556.33M  7.11MB&amp;#x2F;s    in 77s     

2023-01-12 20:49:33 (7.19 MB&amp;#x2F;s) - ‘install72.iso.1’ saved [583352320&amp;#x2F;583352320]
# sudo wget https:&amp;#x2F;&amp;#x2F;mirror.ungleich.ch&amp;#x2F;pub&amp;#x2F;OpenBSD&amp;#x2F;7.2&amp;#x2F;amd64&amp;#x2F;SHA256
--2023-01-12 20:47:38--  https:&amp;#x2F;&amp;#x2F;mirror.ungleich.ch&amp;#x2F;pub&amp;#x2F;OpenBSD&amp;#x2F;7.2&amp;#x2F;amd64&amp;#x2F;SHA256
Resolving mirror.ungleich.ch (mirror.ungleich.ch)... 2a0a:e5c0:2:2:400:c8ff:fe68:bef3, 185.203.114.135
Connecting to mirror.ungleich.ch (mirror.ungleich.ch)|2a0a:e5c0:2:2:400:c8ff:fe68:bef3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1992 (1.9K) [application&amp;#x2F;octet-stream]
Saving to: ‘SHA256.1’

SHA256.1                             100%[====================================================================&amp;gt;]   1.95K  --.-KB&amp;#x2F;s    in 0s      

2023-01-12 20:47:39 (742 MB&amp;#x2F;s) - ‘SHA256.1’ saved [1992&amp;#x2F;1992]
# grep install72.iso SHA256 &amp;gt; &amp;#x2F;tmp&amp;#x2F;x
# sha256sum -c &amp;#x2F;tmp&amp;#x2F;x
# rm &amp;#x2F;tmp&amp;#x2F;x
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Before we can start the virtual machine and get our OpenBSD instance running, we need to define how to virtualize and ultimately boot the system. This is done with &lt;code&gt;virt-install&lt;&#x2F;code&gt;. Noteworthy here is that we use &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.qemu.org&#x2F;&quot;&gt;QEMU&lt;&#x2F;a&gt; as our emulation solution of choice, and that we allocate 2 GB of RAM and 2 CPU cores to the machine, with upper limits of 4 GB and 4 cores.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;$ sudo virt-install \
      --name=openbsd \
      --virt-type=qemu \
      --memory=2048,maxmemory=4096 \
      --vcpus=2,maxvcpus=4 \
      --cpu host \
      --os-variant=openbsd7.0 \
      --cdrom=&amp;#x2F;var&amp;#x2F;lib&amp;#x2F;libvirt&amp;#x2F;boot&amp;#x2F;install72.iso \
      --network=bridge=virbr0,model=virtio \
      --graphics=vnc \
      --disk path=&amp;#x2F;var&amp;#x2F;lib&amp;#x2F;libvirt&amp;#x2F;images&amp;#x2F;openbsd.qcow2,size=40,bus=virtio,format=qcow2
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Once it is up and running, we can connect to the running machine over VNC. To manage the machine and find its VNC display, I chose &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.libvirt.org&#x2F;manpages&#x2F;virsh.html&quot;&gt;virsh&lt;&#x2F;a&gt;. virsh is a command-line tool for managing virtualization environments created with libvirt. It allows us to manage virtual machines, storage pools, and network interfaces, as well as other virtualization components, from the command line.&lt;&#x2F;p&gt;
&lt;p&gt;To establish a VNC connection with a running libvirt virtualized OpenBSD instance, you can use the following steps:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Start the virtual machine in libvirt: You can start the virtual machine using the virsh command &lt;strong&gt;&lt;code&gt;virsh start &amp;lt;vm-name&amp;gt;&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;, where &lt;strong&gt;&lt;code&gt;&amp;lt;vm-name&amp;gt;&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; is the name of the virtual machine you want to start.&lt;&#x2F;li&gt;
&lt;li&gt;Find the VNC display: Once the virtual machine is running, you can find the VNC display number for the virtual machine using the virsh command &lt;strong&gt;&lt;code&gt;virsh vncdisplay &amp;lt;vm-name&amp;gt;&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;Connect to the VNC display: You can connect to the VNC display using a VNC client, such as &lt;strong&gt;&lt;code&gt;vncviewer&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;, and specify the IP address of the host running the virtual machine and the VNC display number. For example, if the host’s IP address is &lt;strong&gt;&lt;code&gt;192.168.0.100&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; and the VNC display number is &lt;strong&gt;&lt;code&gt;:0&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;, the command to connect would be &lt;strong&gt;&lt;code&gt;vncviewer 192.168.0.100:0&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;Authenticate to the VNC server: You may need to enter a password to authenticate to the VNC server, if one was configured when the virtual machine was created in libvirt.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;With these steps, you can establish a VNC connection with a running libvirt virtualized OpenBSD instance and interact with the virtual machine’s graphical user interface.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;$ virsh dumpxml openbsd | grep vnc
&amp;lt;graphics type=&amp;#x27;vnc&amp;#x27; port=&amp;#x27;5900&amp;#x27; autoport=&amp;#x27;yes&amp;#x27; listen=&amp;#x27;127.0.0.1&amp;#x27;&amp;gt;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
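&lt;p&gt;The display number from the steps and the port in the output above describe the same endpoint: by VNC convention, display &lt;code&gt;:N&lt;&#x2F;code&gt; maps to TCP port 5900 + N, so display &lt;code&gt;:0&lt;&#x2F;code&gt; is port 5900. As a minimal sketch, the helper below does that conversion for a value in the format printed by &lt;code&gt;virsh vncdisplay&lt;&#x2F;code&gt;. The function name &lt;code&gt;vnc_port&lt;&#x2F;code&gt; is hypothetical and not part of virsh or TigerVNC.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: convert a VNC display such as :0 or 127.0.0.1:0
# into the TCP port the server listens on (5900 + display number).
vnc_port() {
    # Strip everything up to and including the last colon, leaving the number.
    display=${1##*:}
    echo $((5900 + display))
}

# Example with the display from the steps above; prints 5900.
vnc_port 127.0.0.1:0
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;code&gt;listen&lt;&#x2F;code&gt; address of &lt;code&gt;127.0.0.1&lt;&#x2F;code&gt; shown above means the server accepts connections from the host itself only. One common way to reach it from another machine, assuming you have SSH access to the host, is to forward the port with &lt;code&gt;ssh -L 5900:127.0.0.1:5900 user@host&lt;&#x2F;code&gt; (where &lt;code&gt;user@host&lt;&#x2F;code&gt; is a placeholder) and then run &lt;code&gt;vncviewer 127.0.0.1:0&lt;&#x2F;code&gt; locally.&lt;&#x2F;p&gt;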
&lt;p&gt;Like me, most of you will want to interact with a graphical interface such as X11. For that, we need yet another tool, a so-called VNC viewer. A very simple VNC viewer is &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;tigervnc.org&#x2F;&quot;&gt;TigerVNC&lt;&#x2F;a&gt; (install it with &lt;code&gt;$ sudo xbps-install -S tigervnc&lt;&#x2F;code&gt;).&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;$ sudo virsh --connect qemu:&amp;#x2F;&amp;#x2F;&amp;#x2F;system start openbsd
Domain &amp;#x27;openbsd&amp;#x27; started
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h2 id=&quot;references-and-further-reading&quot;&gt;References and further reading&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;openbsd&quot;&gt;OpenBSD&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.openbsd.org&#x2F;&quot;&gt;OpenBSD&lt;&#x2F;a&gt; - Official project site&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.openbsd.org&#x2F;faq&#x2F;&quot;&gt;OpenBSD FAQ&lt;&#x2F;a&gt; - Comprehensive installation and usage guide&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;why-openbsd.rocks&#x2F;&quot;&gt;Why OpenBSD Rocks&lt;&#x2F;a&gt; - Collection of OpenBSD innovations and features&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.openssh.com&#x2F;&quot;&gt;OpenSSH&lt;&#x2F;a&gt; - The SSH suite that originated from the OpenBSD project&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Theo_de_Raadt&quot;&gt;Theo de Raadt&lt;&#x2F;a&gt; - OpenBSD founder and lead developer&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;man.openbsd.org&#x2F;pledge.2&quot;&gt;pledge(2)&lt;&#x2F;a&gt; - OpenBSD’s system call for restricting process capabilities&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;man.openbsd.org&#x2F;unveil.2&quot;&gt;unveil(2)&lt;&#x2F;a&gt; - OpenBSD’s system call for restricting filesystem visibility&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;void-linux&quot;&gt;Void Linux&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;voidlinux.org&#x2F;&quot;&gt;Void Linux&lt;&#x2F;a&gt; - Official project site&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.voidlinux.org&#x2F;&quot;&gt;Void Linux Handbook&lt;&#x2F;a&gt; - Official documentation&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.voidlinux.org&#x2F;xbps&#x2F;index.html&quot;&gt;XBPS package manager&lt;&#x2F;a&gt; - Void’s binary package management system&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;virtualization&quot;&gt;Virtualization&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;libvirt.org&#x2F;&quot;&gt;libvirt&lt;&#x2F;a&gt; - Virtualization management library and API&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.qemu.org&#x2F;&quot;&gt;QEMU&lt;&#x2F;a&gt; - Open-source machine emulator and virtualizer&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.libvirt.org&#x2F;manpages&#x2F;virsh.html&quot;&gt;virsh(1)&lt;&#x2F;a&gt; - Command-line interface for managing libvirt guests&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;tigervnc.org&#x2F;&quot;&gt;TigerVNC&lt;&#x2F;a&gt; - VNC implementation for remote desktop access&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;guides&quot;&gt;Guides&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.reddit.com&#x2F;r&#x2F;voidlinux&#x2F;comments&#x2F;ghwvv5&#x2F;guide_how_to_setup_qemukvm_emulation_on_void_linux&#x2F;&quot;&gt;[Guide] How to setup QEMU&#x2F;KVM emulation on Void Linux&lt;&#x2F;a&gt; - Community guide on r&#x2F;voidlinux&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.cyberciti.biz&#x2F;faq&#x2F;kvmvirtualization-virt-install-openbsd-unix-guest&#x2F;&quot;&gt;KVM virt-install: Install OpenBSD as Guest Operating System&lt;&#x2F;a&gt; - Step-by-step KVM installation guide&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.skreutz.com&#x2F;posts&#x2F;autoinstall-openbsd-on-qemu&#x2F;&quot;&gt;Auto-install OpenBSD on QEMU&lt;&#x2F;a&gt; - Automated OpenBSD installation on QEMU&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</description>
      </item>
      <item>
          <title>Hello and outlook</title>
          <pubDate>Tue, 20 Sep 2022 00:00:00 +0000</pubDate>
          <author>Unknown</author>
          <link>https://raskell.io/articles/hello-and-outlook/</link>
          <guid>https://raskell.io/articles/hello-and-outlook/</guid>
          <description xml:base="https://raskell.io/articles/hello-and-outlook/">&lt;h2 id=&quot;hello-world&quot;&gt;Hello world&lt;&#x2F;h2&gt;
&lt;p&gt;This is the first post on raskell.io. My name is Raffael. I build software, mostly in the space of platform automation, edge infrastructure, and applied security. I also have a long-running interest in open-source software, operating systems, and the kind of hardware you find in recycling bins and give a second life to.&lt;&#x2F;p&gt;
&lt;p&gt;This site exists to document real work and share patterns that other engineers can use. Not thought leadership. Not personal branding. Just notes from the workshop.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-to-expect&quot;&gt;What to expect&lt;&#x2F;h2&gt;
&lt;p&gt;At the time of writing, I planned to cover:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Linux and BSD systems, particularly &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;voidlinux.org&#x2F;&quot;&gt;Void Linux&lt;&#x2F;a&gt; and &lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.openbsd.org&#x2F;&quot;&gt;OpenBSD&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Open-source tooling and infrastructure&lt;&#x2F;li&gt;
&lt;li&gt;Platform automation and operability&lt;&#x2F;li&gt;
&lt;li&gt;Whatever hard problem I happened to be stuck on&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Looking back from 2026, the scope grew to include edge systems, AI-assisted engineering, and a few deep dives I did not expect to write. The thread that held it together was always the same: systems under pressure, and the tools that keep them running.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;references&quot;&gt;References&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;voidlinux.org&#x2F;&quot;&gt;Void Linux&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.openbsd.org&#x2F;&quot;&gt;OpenBSD&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;raskell-io&#x2F;www.raskell.io&quot;&gt;raskell.io source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</description>
      </item>
    </channel>
</rss>
