Structured feedback vs. vibes: why “looks good to me” is not a review

If you’re reviewing AI-generated content right now, chances are the process looks something like this: someone drops a draft in Google Docs, you skim it, leave a few comments, and type “LGTM” at the bottom. Or maybe you’re more diligent. Maybe you spend 30 minutes leaving specific, thoughtful notes. Either way, the outcome is identical. Next week, a new draft lands. You leave the same notes again.

That cycle isn’t a discipline problem. You’re not failing because you’re lazy or because your team doesn’t care about quality. You’re failing because your review process produces data that disappears the moment someone clicks “resolve.” Your 100th content review takes exactly as long as your first, because unstructured feedback can’t compound. That’s an architecture problem, not an effort problem.

Think about how code review works. A developer submits a pull request. Another developer leaves inline comments attached to specific lines of code. Those comments persist in the pull request’s history. Automated checks run before a human ever looks at it. The whole system produces structured, searchable, permanent data. Now think about how content review works. Someone types “looks good” in a Google Doc. That’s it. That’s the entire system.

The bottleneck nobody updated

The State of Docs 2026 survey (n=1,131) found that 56% of docs professionals who use AI now spend more time reviewing and editing than writing. The bottleneck flipped. Production got fast. Review didn’t.

The guidelines tell the same story: 76% of those same respondents use AI regularly for drafting, but only 44% have any AI-specific guidelines for review. A tech writer on r/technicalwriting described the result: “Tech writing team of two supporting 50+ engineers. Recently, a lot of them started using AI to generate API docs, READMEs, and internal wiki content. In theory, this should help; engineers create drafts, and we refine them. But in practice, the output is all over the place. Different tone, structure, and depth depending on the person.”

Volume went vertical. Review infrastructure stayed exactly where it was: comment boxes, Slack threads, vibes.

What LGTM actually costs

“LGTM” stands for “Looks Good to Me.” In code review, it means a reviewer glanced at a pull request and approved it without substantive comment. Researchers at Bilkent University and UCLA studied this across five large-scale open-source projects and found that 64.7% of pull requests received comment-free reviews. They coined a term for it: “LGTM smell.” Comment-free reviews exhibited this smell 3.5 times more often than reviews with actual comments.

Here’s the thing: those comment-free PRs also had statistically more late commits. The code changed after it was supposedly approved. The approval meant nothing. A gate without a guard.

When someone types “looks good” on a content draft, the same thing happens. They’ve produced a binary signal with zero information about what they actually checked, what they let slide, or what they’d flag next time. It’s approval theater.

IBM’s Systems Science Institute documented the cost of this pattern in software: bugs caught late in the cycle cost 30 to 100 times more to fix than bugs caught at the review stage. CISQ puts the total annual cost of poor software quality at $2.4 trillion. Content doesn’t carry the same dollar figures, but the mechanism is identical. Problems caught downstream compound into organizational debt that’s hard to measure until it’s already expensive.

As Ana Gotter wrote in Search Engine Land’s QA workflow guide: “Without defined criteria, your QA process becomes subjective and inconsistent, which is the enemy of quality at scale. At five articles a month, you might not notice. At 50, the cracks will begin to show.”

Why vibe review can’t compound

Unstructured feedback fails in three specific ways, and they all trace back to the same root cause: the format.

A thumbs-up tells you nothing about which parts of a draft passed, which were borderline, which got ignored entirely. You can’t run statistics on thumbs-ups. You can’t detect that the same voice-consistency issue shows up in 70% of your drafts, or compare this week’s review to last month’s. No structure means no aggregation.

Then there’s persistence. The moment a Google Docs comment gets “resolved,” the judgment behind it effectively vanishes. Not filed. Not searchable. Not transferable to the next draft. Every review starts from zero because your tools don’t remember what you decided last time. This is why your 100th review feels exactly like your first. As far as your tools can tell, it is your first.

And your AI agent can’t read any of it. Not the resolved Google Doc comment, not the Slack thread, not the verbal note from a standup. Vibe-based review produces feedback in a format only humans can parse, and only at the moment it’s produced. The agent that generates your next draft starts with zero institutional memory. Every improvement you’ve ever articulated exists only in your head.

A common objection: “LGTM is fine sometimes. Over-reviewing is bureaucracy.” A researcher with a Google DeepMind background put it bluntly: “Strong systems + 1 LGTM set = enough.” But this confuses review overhead with review quality. The argument isn’t for more rounds. It’s that whatever time you do spend reviewing should produce data that persists and compounds. One structured review produces more lasting value than ten rubber stamps.

What structured feedback actually changes

Structured feedback inverts all three of those failures. The mechanism is straightforward.

When you flag an accuracy issue in paragraph four, that finding stays attached to paragraph four even when the surrounding text gets rewritten. It doesn’t drift or disappear the moment someone edits nearby. That’s block-level anchoring, and it’s the first thing you lose with Google Docs comments.
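To make block-level anchoring concrete, here is a minimal Python sketch. It assumes every block of a draft carries a stable ID that survives edits; the `Block` and `Finding` types, the ID scheme, and the sample text are all hypothetical, invented for illustration rather than taken from any particular product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Block:
    block_id: str  # hypothetical stable ID, assigned once and never reused
    text: str


@dataclass
class Finding:
    block_id: str  # the anchor is a block ID, not a paragraph number
    category: str
    note: str


def find_anchor(blocks: list[Block], finding: Finding) -> Block | None:
    """Return the block a finding is attached to, wherever it now sits."""
    return next((b for b in blocks if b.block_id == finding.block_id), None)


draft = [
    Block("b1", "Intro paragraph."),
    Block("b2", "Our API handles 10,000 requests per second."),
]
finding = Finding("b2", "accuracy", "Throughput figure needs a source.")

# Simulate an edit: the intro is rewritten and a new block is inserted above.
revised = [
    Block("b1", "A rewritten intro."),
    Block("b3", "A brand-new paragraph."),
    draft[1],
]

anchored = find_anchor(revised, finding)
print(anchored.text)
```

The flagged block moved from second to third position, but the finding still resolves to it because the lookup goes through the ID rather than the position.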

When a finding gets resolved, the resolution is recorded. What was flagged, how it was addressed, who decided. The judgment survives. That’s the difference between organizational memory and a sticky note you threw away.
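One way to picture a recorded resolution is a finding that keeps its own outcome instead of being deleted. The sketch below is an assumption about what such a record might hold (what was flagged, how it was addressed, who decided, and when); the field names and the reviewer name are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Resolution:
    how: str         # how the issue was addressed
    decided_by: str  # who made the call
    decided_at: str  # ISO 8601 timestamp in UTC


@dataclass
class Finding:
    block_id: str
    category: str
    note: str  # what was flagged
    resolution: Resolution | None = None

    def resolve(self, how: str, decided_by: str) -> None:
        """Store the resolution on the finding instead of discarding both."""
        stamp = datetime.now(timezone.utc).isoformat()
        self.resolution = Resolution(how, decided_by, stamp)


finding = Finding("b2", "accuracy", "Throughput figure needs a source.")
finding.resolve(how="Linked the benchmark report.", decided_by="sam")

# The original note and the decision are both still available after resolving.
print(finding.note)
print(finding.resolution.how, "/", finding.resolution.decided_by)
```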

And because findings are schema-versioned JSON, any agent can parse them. Your AI agent can load last week’s findings before drafting this week’s content. It can check whether a draft violates rules your team already decided on. The feedback loop closes automatically, without anyone manually re-explaining the same six notes.
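As a sketch of what "any agent can parse them" might look like, the Python below reads a small findings document and checks its version before trusting the contents. The `schema_version` key, the version string `"1.0"`, and every field name are assumptions made for this example, not a published schema.

```python
import json

SUPPORTED_SCHEMA_VERSIONS = {"1.0"}  # hypothetical version string

findings_json = """
{
  "schema_version": "1.0",
  "findings": [
    {"block_id": "b2", "category": "accuracy",
     "note": "Throughput figure needs a source.", "status": "resolved"},
    {"block_id": "b5", "category": "voice",
     "note": "Too formal for the product blog.", "status": "open"}
  ]
}
"""


def load_findings(raw: str) -> list:
    """Parse a findings document, refusing versions this reader does not know."""
    doc = json.loads(raw)
    version = doc.get("schema_version")
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise ValueError(f"unsupported schema_version: {version!r}")
    return doc["findings"]


findings = load_findings(findings_json)

# An agent could pull the still-open notes into its context before drafting.
open_notes = [f["note"] for f in findings if f["status"] == "open"]
print(open_notes)
```

Checking the version first is the point of versioning the schema: a reader that does not recognize the format fails loudly instead of silently misreading old or newer findings.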

Here is where the compounding starts. Findings become rules. Rules compile into rulepacks. Your agent loads the rulepack before it starts writing, so the next draft already avoids the issues you flagged last time. Every review makes the next draft better. Every rule makes the next review faster. It’s the same principle as compound interest: small, persistent gains building on themselves.
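The findings-to-rules step can be sketched in a few lines of Python. This is a deliberately naive illustration under stated assumptions: findings are reduced to a flagged phrase plus a note, a phrase is promoted to a rule once it has been flagged in at least two distinct drafts, and a "rulepack" is just a dictionary. The names `build_rulepack` and `check_draft`, the threshold, and the sample data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical past findings: (draft_id, flagged phrase, reviewer note).
past_findings = [
    ("draft-1", "leverage", "Prefer 'use'."),
    ("draft-2", "leverage", "Prefer 'use'."),
    ("draft-2", "simply", "Don't tell readers a step is easy."),
    ("draft-3", "leverage", "Prefer 'use'."),
]


def build_rulepack(findings, min_drafts=2):
    """Promote phrases flagged in at least `min_drafts` distinct drafts to rules."""
    drafts_by_phrase = defaultdict(set)
    notes = {}
    for draft_id, phrase, note in findings:
        drafts_by_phrase[phrase].add(draft_id)
        notes[phrase] = note
    return {
        phrase: notes[phrase]
        for phrase, drafts in drafts_by_phrase.items()
        if len(drafts) >= min_drafts
    }


def check_draft(text, rulepack):
    """Return (phrase, note) pairs for every rule the draft violates."""
    lowered = text.lower()
    # Naive substring match; a real checker would need word boundaries at least.
    return [(phrase, note) for phrase, note in rulepack.items() if phrase in lowered]


rulepack = build_rulepack(past_findings)
violations = check_draft("Leverage the new endpoint to simply fetch users.", rulepack)
print(violations)
```

In this sample, "leverage" was flagged in three drafts and becomes a rule, while "simply" was flagged only once and stays a one-off note. The same check could run before a draft ever reaches a reviewer.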

But the compounding only works if the feedback format supports it. A Google Docs comment can’t become a rule. A Slack message can’t compile into a rulepack. A thumbs-up emoji can’t train your next draft to be better than your last one. The format is the bottleneck. Vibes can’t compound.

What we’ve seen in practice

After three weeks in pilot, recurring review notes dropped to zero. Editor time got cut in half. Quality went up.

Not because the reviewers got better at their jobs. Because the system stopped deleting their judgment. The rules they’d been repeating for months were finally captured in a format the pipeline could enforce before the draft ever reached a human inbox.

Go back to the code review analogy. Code review produces structured data: diffs with line references, inline comments anchored to specific changes, automated status checks. Imagine if code review were just “LGTM in Slack.” Nobody would accept that. But that’s what pretty much every content team runs today.

The difference between content that improves over time and content that stays stuck at “good enough” isn’t the talent of your reviewers. It’s whether their judgment gets captured in a format that compounds or a format that gets thrown away. This is the core review problem AI created: production got solved, but nobody built the infrastructure for the judgment that’s supposed to follow.

We build that infrastructure. If you want to see what structured review actually looks like, start with the free tier.