loke.dev
Header image for The Spec-First Manifesto: Why I Stopped Prompting for Logic and Started Prompting for Assertions

The Spec-First Manifesto: Why I Stopped Prompting for Logic and Started Prompting for Assertions

If you’re tired of fixing the subtle bugs in 'working' AI code, it's time to stop asking for features and start feeding your LLM failing tests instead.

· 4 min read

The most dangerous code you’ll ever ship is the code that looks correct, passes a manual smoke test, and was written by an LLM that didn't actually understand your requirements. We’ve all been there: you prompt for a feature, the AI spits out 50 lines of beautiful-looking TypeScript, you paste it in, and it *works*—until a user enters a negative integer or a null string, and the whole thing turns into a hallucination-fueled fever dream.

I’ve stopped asking AI to "write a function that does X." It’s a sucker’s game. Instead, I’ve moved to a Spec-First workflow: I give the LLM the assertions it must satisfy before I let it write a single line of application logic.

The "Vibes-Based" Prompting Trap

When we prompt for logic, we’re asking the LLM to be the architect, the builder, and the inspector all at once. If you say, *"Write a Python function to calculate the pro-rated refund for a subscription,"* the LLM will give you a "code-shaped object." It will handle the happy path perfectly.

But it will almost certainly forget that leap years exist, or that some months have 31 days, or that the refund shouldn't exceed the original payment. You then spend the next hour "debugging" the AI by sending follow-up prompts like *"Wait, it fails when the user cancels on the last day of the month."*

You're playing a game of whack-a-mole where the mallet is a chat interface. It’s exhausting.

Flip the Script: Prompt for Assertions

The shift is simple: Don't describe the solution; define the boundaries.

I start by writing a test suite—or, if I'm feeling particularly lazy, I ask the LLM to write the *tests* based on a list of edge cases I’ve brainstormed. Once I have a suite of failing tests, I feed those tests back to the LLM and say: *"Make these pass. Do not change the tests."*

Here is a real-world example of what I mean. Suppose I need a utility to format currency for an international marketplace.

The "Old" Way (Prompting for Logic):

"Write a JS function that formats numbers as currency based on a locale string and a currency code."

The "Spec-First" Way (Prompting for Assertions):
I provide the LLM with a Vitest file that looks like this:

import { describe, it, expect } from 'vitest';
import { formatMoney } from './formatter';

describe('formatMoney', () => {
  it('handles standard USD formatting', () => {
    expect(formatMoney(10.5, 'USD', 'en-US')).toBe('$10.50');
  });

  it('rounds up to the nearest cent correctly', () => {
    expect(formatMoney(10.555, 'USD', 'en-US')).toBe('$10.56');
  });

  it('handles JPY with zero decimal places', () => {
    // Japanese Yen doesn't use subunits like cents
    expect(formatMoney(1000.5, 'JPY', 'ja-JP')).toBe('¥1,001');
  });

  it('throws an error for invalid currency codes', () => {
    expect(() => formatMoney(10, 'WAIT_WHAT', 'en-US')).toThrow();
  });
});

When you paste those failing tests into a tool like Claude or GPT-4o, the "reasoning" engine shifts gears. It’s no longer trying to guess what a currency formatter looks like; it’s solving a mathematical constraint problem. It has to find the specific implementation of Intl.NumberFormat that satisfies every single expect statement.

Why This Actually Works

LLMs are world-class at "filling in the blanks." When you provide tests, you are creating a high-resolution mold. The code it pours into that mold has nowhere to go but exactly where you specified.

1. Constraint-Based Reasoning: By seeing the JPY example, the AI realizes it can't just use a generic .toFixed(2). It’s forced to look for a more robust solution.
2. No More "Polite Lies": An LLM might tell you its code handles edge cases. A test runner doesn't care about politeness. If the code is wrong, the terminal stays red.
3. Refactoring is Free: If I want to change the implementation from a library-heavy version to a vanilla JS version, I just swap the prompt. Since the tests stay the same, I have 100% confidence the new "AI-optimized" code still works.

The "Cheat Code" Gotcha

There is one catch: LLMs are inherently "lazy" (or efficient, depending on your outlook). If you give it three tests, it might write a function that literally just contains three if statements returning hardcoded strings.

To avoid this, I use the Constraint Sandwich:
1. The Bread (Top): "Write a generic, production-ready implementation of this function."
2. The Meat: The failing test suite.
3. The Bread (Bottom): "Ensure the logic is algorithmic and doesn't hardcode the test cases. Handle generic inputs that follow the same patterns."

Stop Prompting, Start Asserting

We’ve spent decades learning that Test-Driven Development (TDD) makes for better human-written code. It turns out that TDD is even more vital for AI-generated code.

If you can't write a test for the logic you want, you don't actually know what you're asking the AI to build. You’re just wishing for code. Stop wishing, start asserting, and let the machine do the heavy lifting of making the dots connect.