How Engineers Use AI

Verification habits for legal AI work

Currentness disclaimer

AI facts are perishable. Model behavior, vendor terms, court expectations, and bar guidance can change faster than a conference Wi-Fi password.

Engineers trust systems slowly

The engineering posture

Assume partial usefulness: Let AI create a first pass, not a final answer.
Demand evidence: Ask what source, file, case, test, or trace supports the claim.
Keep agency: Use the tool to think faster, not to stop thinking.

The practical question is not "Can AI answer?" It is "What would make me comfortable relying on this answer?"

Use AI like a fast intern

Intern mode          Lawyer / engineer mode
Drafts a memo        Checks the cited authority
Summarizes a file    Opens the file
Suggests a fix       Runs the test
Sounds confident     Looks for the weak link

A plausible answer is a lead. It is not proof.

Verification is algebra

given:      source material + instructions + constraints
transform:  model output
check:      citation, file, test, calculation, human review
result:     answer I am willing to own

If the check is missing, the equation is unfinished.
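The equation above can be sketched in code. This is a minimal illustration, not a prescribed workflow: the `Draft` class, its `checks` mapping, and the example check names are all hypothetical, and the lambda standing in for human review is a placeholder for a real sign-off.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Draft:
    """A model output plus the checks that must pass before anyone owns it."""
    output: str
    checks: dict[str, Callable[[str], bool]] = field(default_factory=dict)

    def failed_checks(self) -> list[str]:
        # Run every check and collect the names of those that do not pass.
        return [name for name, check in self.checks.items() if not check(self.output)]

    def is_owned(self) -> bool:
        # No checks at all means the equation is unfinished, not satisfied.
        return bool(self.checks) and not self.failed_checks()


unchecked = Draft(output="The deadline is 30 days.")
checked = Draft(
    output="The deadline is 30 days.",
    checks={
        "names_a_deadline": lambda text: "30 days" in text,
        "human_review": lambda text: True,  # placeholder for a reviewer's sign-off
    },
)
print(unchecked.is_owned())  # False
print(checked.is_owned())    # True
```

The design choice worth noticing is that an empty check list counts as a failure, so a draft cannot become "owned" by default.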

Play dumb

What exactly supports that? Make it name the authority, passage, data point, or file.
What would disprove that? Force the model to reveal edge cases and assumptions.
Show the boring step: Ask for the calculation, quote, diff, or search path.

Simple questions break a surprising number of polished answers.

Smell tests before expertise

Code smells: Invented API, hidden global state, no tests, duplicate logic, "just trust me" abstractions.
Hallucination smells: Perfect certainty, missing citations, vague source names, unsupported numbers, generic caveats.
Legal-work smells: No jurisdiction, no currentness check, no procedural posture, no pin cite, no client-fact boundary.

Use summarizing frameworks

Smart Brevity Core 4

Tease: Why should I keep reading?
Lede: What is the actual point?
Why it matters: What decision or risk changes?
Go deeper: What should I inspect next?

Compression with review hooks.

The power of a hard drive

Welcome to 1995

A hard drive beats vibes

Save the source: PDF, transcript, email, docket, statute, contract, screenshot.
Ask over the source: Make the model work from a bounded record, not memory.
Keep the receipt: Store the prompt, output, source list, and final human decision.

The most underrated AI tool is still a folder with the materials you actually relied on.
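One way to keep the receipt is a small JSON record saved in that same folder. The sketch below is an assumption about what such a record might hold: the `make_receipt` function, its field names, and the sample contract text are hypothetical, and hashing the source files is an added suggestion rather than something this deck prescribes.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_receipt(prompt: str, output: str, sources: dict[str, bytes], decision: str) -> dict:
    """Build a record of what was asked, what came back, and what was relied on."""
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "output": output,
        # Hash each source so a later reader can confirm the saved file is unchanged.
        "sources": {name: hashlib.sha256(data).hexdigest() for name, data in sources.items()},
        "human_decision": decision,
    }


receipt = make_receipt(
    prompt="Summarize the termination clause.",
    output="Either party may terminate on 30 days' notice.",
    sources={"contract.pdf": b"example bytes standing in for the saved file"},
    decision="Accepted after reading the clause in the saved contract.",
)
print(json.dumps(receipt, indent=2))
```

In practice the bytes would come from reading the saved file, and the JSON would be written next to it.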

How engineers mitigate hallucination

RTFS

Read the source.

Then read the source again when the answer matters.

"The AI cited it" is not the same as "the source says it."
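A first mechanical check on that difference is whether a quoted passage literally appears in the saved source. The helper names and the sample lease text below are hypothetical, and a match shows only that the words exist in the file. It does not show that they mean what the answer claims, so the surrounding passage still needs to be read.

```python
import re


def normalize(text: str) -> str:
    # Collapse whitespace and lowercase so line breaks and case do not cause false misses.
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_appears_in_source(quote: str, source_text: str) -> bool:
    """Return True only if a non-empty quote literally appears in the source."""
    needle = normalize(quote)
    return bool(needle) and needle in normalize(source_text)


source_text = """The tenant shall give written notice
at least thirty days before vacating."""

print(quote_appears_in_source("written notice at least thirty days", source_text))  # True
print(quote_appears_in_source("written notice at least sixty days", source_text))   # False
```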

Consequences improve instructions

Name the stakes: "I could lose my legal license" changes the workflow standard.
Specify the failure: "A fake citation could get me sanctioned" is more useful than "be accurate."
Still verify: Severity can make the model more careful. It cannot make the model truthful.

The burger example

You can give a perfect order.

You can hear it repeated back.

You can get a confident "yes."

You still open the bag before you drive away.

AI will still lie to you

It may invent: Cases, quotes, packages, policy details, product features.
It may compress too hard: Important exceptions disappear inside a tidy summary.
It may be stale: Law, model behavior, pricing, and vendor terms move.

Treat the lie as a system property, not a moral surprise.

The context window

The context window is not memory

A context window is the pile of text the model can currently see.

It is not a filing system, a source hierarchy, a privilege screen, or a human mental model.

Visible right now.

If the important fact is not in the record, the model may fill the gap with shape, not truth.

Treat it like a loaded case file

Habit                            Why it helps
Put controlling sources first    Recency and placement influence attention
Remove irrelevant clutter        Less room for false synthesis
Label source quality             Primary law, client facts, vendor docs, notes
Ask for conflict checks          Forces the model to compare, not just summarize
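These habits can be applied when assembling the text that goes into the context window. The sketch below is one possible arrangement under stated assumptions: the `Source` class, the `PRIORITY` ordering, the bracketed label format, and the sample sources are all hypothetical, and removing clutter is left to the caller, who passes in only the sources that matter.

```python
from dataclasses import dataclass

# Assumed ordering: a lower number is more authoritative and is placed earlier.
PRIORITY = {"primary law": 0, "client facts": 1, "vendor docs": 2, "notes": 3}


@dataclass
class Source:
    label: str  # one of the PRIORITY keys
    name: str
    text: str


def build_context(sources: list[Source], question: str) -> str:
    """Order sources by quality, label each one, and end with a conflict-check request."""
    ordered = sorted(sources, key=lambda s: PRIORITY[s.label])
    blocks = [f"[{s.label.upper()}] {s.name}\n{s.text}" for s in ordered]
    instruction = (
        "Answer only from the sources above. "
        "List any points where the sources conflict before answering."
    )
    return "\n\n".join(blocks + [instruction, f"Question: {question}"])


context = build_context(
    [
        Source("notes", "call notes", "Client thinks notice was sent in March."),
        Source("primary law", "statute excerpt", "Notice must be given in writing."),
    ],
    question="Was notice properly given?",
)
print(context)
```

The labels do not make the model respect a source hierarchy. They make it easier for a human reviewer to see which source an answer leaned on.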

AI changed software work

The bottleneck moved

Before: Typing, searching, boilerplate, remembering syntax.
Now: Framing, verification, review, judgment, maintenance.
Risk: More output than humans can responsibly understand.

The scarce skill is becoming discernment under speed.

Two new failure modes

Comprehension debt: The system grows faster than human understanding of it.
Cognitive surrender: The AI output becomes your output before you form an independent view.
Legal analogy: A draft can be useful while still leaving responsibility exactly where it started.

Sources: Addy Osmani, Comprehension Debt; Addy Osmani, Cognitive Surrender.

AI fatigue is a workflow smell

You stop checking: The fifth draft gets less scrutiny than the first.
You accept polish: Good formatting starts to feel like good reasoning.
You lose the thread: You cannot explain why the answer changed.

When you are too tired to evaluate, you are too tired to delegate.

Benchmark rankings

Signal, not scripture

Do not use benchmarks as God's word

Benchmarks are useful signals.

They are not your matter, your client, your jurisdiction, your risk tolerance, or your workflow.

Use rankings to decide what deserves testing. Use your own evaluation to decide what deserves reliance.

Vals LegalBench
General LLM rankings on LegalBench; useful broad signal.

Vals index
Directory for model benchmarks; LegalBench sits under Legal.

LegalBenchmarks.ai
Practical tasks: drafting, extraction, research, review, translation.

MLEB
Legal embedding and retrieval model rankings.

Harvey LAB
Legal-agent workflow benchmark; vendor-published, which is a caveat worth keeping in mind.

Harvey BigLaw Bench
Older vendor-published legal work-product rankings.

LawBench
Chinese-law legal LLM benchmark and leaderboard.

Vals VLAIR
Report-style legal research comparison against a lawyer baseline.

Questions