How to Verify AI-Generated Financial Models [Part 4 of 5]

Verification: The Discipline That Separates Amateurs from Professionals

About this Series

This five-part series explores the craft of building Excel models with AI, from foundational skills to advanced techniques for developing full financial models.

You've just watched an LLM generate a complete real estate pro forma in thirty seconds. The structure looks right. The numbers seem reasonable. The formulas appear sophisticated.

And there's a very good chance something is wrong.

This isn't a criticism of LLMs. Instead, it's a recognition that LLM-generated models fail differently than human-built models—and most people haven't developed the instincts to catch these failures. The professional who masters verification will consistently produce reliable work. The one who doesn't will eventually send a broken model to an investment committee.

This tutorial is about developing that discipline.

Why LLM Errors Are Different

When a human builds a model, errors tend to cluster around complexity. The simple stuff is usually right; mistakes appear in the intricate waterfall logic or the debt sizing constraints. You learn to focus your review there.

LLM errors follow a different pattern. The model might nail the complex waterfall calculation but miscalculate something basic—referencing the wrong row for revenue, using annual figures where monthly was needed, or applying a formula to the wrong time period. These errors are harder to catch because they occur where you're not looking.

There's also what we call the plausibility trap. LLMs are remarkably good at producing outputs that look right. An IRR of 18.4% on a multifamily acquisition seems reasonable. The cash flows follow a logical pattern. Nothing screams "error." But the debt service might be calculated on the wrong principal balance, or the exit cap rate might be applied to the wrong NOI. The model passes the sniff test while being fundamentally broken.

The implication: you cannot rely on intuition alone. You need a systematic approach.

The Four Layers of Verification

Effective verification operates at four distinct levels, each catching different types of errors.

Structural verification asks: is the model built correctly? Does the layout match what you specified? Are all required sections present? Do sections connect to each other properly? Is the time structure correct—monthly when you needed monthly, annual when you needed annual, with the right number of periods?

Formula verification asks: do the calculations do what they claim? This means tracing formulas, understanding what each cell actually computes, and confirming that references point to the right places.

Numerical verification asks: do the outputs make sense? This is where sanity checks, benchmarks, and sensitivity testing come in.

Edge case verification asks: what happens at the boundaries? Zero values, negative inputs, the first period, the last period, maximum leverage—these boundary conditions reveal fragility in the model's logic.

Most people jump straight to numerical verification. They look at the IRR and ask "does this seem right?" That's necessary but insufficient. A model can produce plausible outputs while being structurally flawed or containing formula errors that happen to cancel out in the base case.

Structural Verification: The Foundation

Before examining any formulas, confirm that the model is built the way you intended.

Start with orientation. If you asked for time flowing across columns, verify that's what you got. If you expected line items in rows, confirm they're there. This sounds trivial, but we have seen LLMs occasionally invert the structure, especially when the prompt was ambiguous.

Check section completeness. Walk through each section you specified and confirm it exists. Missing sections are easy to spot if you're looking for them, invisible if you're not. A model without a debt schedule might still produce IRR outputs—they'll just be wrong.

Examine section connections. Where one section feeds another, trace the linkage. Does the NOI from the operating statement flow into the cash flow? Does the debt service come from the debt schedule? Does the terminal value calculation reference the correct exit year NOI? These connections are where LLMs often make subtle errors.

Verify the time structure. Count the periods. Confirm the start date. Check that stub periods are handled correctly. If the model should run from acquisition through a five-year hold with a month-one stub, verify that's what exists—not a clean five years starting in month one.

A useful technique: before looking at any numbers, write down what you expect to see structurally. Then compare. The act of articulating expectations surfaces assumptions you didn't know you had.
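To make that comparison concrete, here's a minimal sketch in Python using openpyxl. It assumes the model was saved as model.xlsx with a sheet named "Model", section labels in column A, and monthly period headers in row 2 starting at column B; every name and location here is an illustrative assumption, not a prescription.

```python
# A sketch of "write down expectations, then compare."
# File name, sheet name, and cell locations are all assumptions.
from openpyxl import load_workbook

EXPECTED_SECTIONS = ["Assumptions", "Operating Statement", "Debt Schedule",
                     "Cash Flow", "Returns"]
EXPECTED_PERIODS = 61  # month-one stub plus a five-year hold

wb = load_workbook("model.xlsx")
ws = wb["Model"]

# Section completeness: every expected label should appear in column A.
labels = {cell.value for cell in ws["A"] if cell.value}
for section in EXPECTED_SECTIONS:
    if section not in labels:
        print(f"MISSING SECTION: {section}")

# Time structure: count non-empty period headers in row 2, column B onward.
periods = sum(1 for cell in ws[2][1:] if cell.value is not None)
if periods != EXPECTED_PERIODS:
    print(f"PERIOD COUNT: expected {EXPECTED_PERIODS}, found {periods}")
```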

Formula Verification: Reading What the Model Actually Says

This is the most tedious layer and the one most often skipped. It's also where many LLM errors hide.

The core practice is formula tracing. Select a cell, examine its formula, and understand exactly what it computes. Then check whether that computation is correct for what the cell represents.

There's a technique we call "read it aloud." Take a formula and translate it into plain English. "This cell takes the value in D15, multiplies it by one plus the value in C3, but only if E15 is greater than zero, otherwise it returns zero." Does that sentence match what the cell should do? Often, reading the formula aloud reveals logical errors that are invisible when staring at Excel syntax.
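If you'd rather not parse Excel syntax by eye, openpyxl's formula tokenizer can split a formula into pieces you can narrate one at a time. A small sketch, using a hypothetical formula reconstructed from the sentence above:

```python
# Split a formula into tokens as an aid for reading it aloud.
# The formula is a hypothetical reconstruction, not from a real model.
from openpyxl.formula import Tokenizer

formula = '=IF(E15>0, D15*(1+C3), 0)'
for token in Tokenizer(formula).items:
    print(f"{token.type:12} {token.subtype:12} {token.value}")
```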

Reference checking catches a common LLM failure mode. The formula might be logically correct but point to the wrong cells. The revenue growth formula might reference last year's expenses instead of last year's revenue. The cells are nearby; the formula structure is right; but the result is garbage. Check every reference.

Pattern consistency matters for models with repeated calculations. If row 10 computes Year 1 NOI a certain way, rows 11-15 should compute Years 2-6 the same way. LLMs sometimes vary their approach across rows—perhaps using a different growth logic or referencing different assumption cells. Scan for inconsistency.
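This scan is mechanical enough to automate. Here's a sketch using openpyxl's formula translator, assuming time flows across columns and row 10 holds the same calculation from column C through column N; the sheet name and locations are illustrative assumptions.

```python
# Translate the first formula across the row and flag any cell whose
# actual formula differs from the expected pattern. Locations are assumed.
from openpyxl import load_workbook
from openpyxl.formula.translate import Translator
from openpyxl.utils import get_column_letter

wb = load_workbook("model.xlsx")  # default load keeps formula strings
ws = wb["Model"]

base = ws["C10"].value  # e.g. '=C9*(1+$C$3)'
for col in range(4, 15):  # columns D through N
    ref = f"{get_column_letter(col)}10"
    expected = Translator(base, origin="C10").translate_formula(ref)
    actual = ws[ref].value
    if actual != expected:
        print(f"{ref}: expected {expected!r}, found {actual!r}")
```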

Pay special attention to time-based references. In a model with time flowing across columns, formulas that should look back one period (column C referencing column B) sometimes incorrectly reference a fixed column. The formula works in one period and breaks in others. Test the formula in multiple periods, not just the first.

A practical approach: don't try to verify every formula. Focus on formulas that drive major outputs: the cash flow calculation, the IRR inputs, the debt service, the terminal value. Get those right and smaller errors matter less.

Numerical Verification: The Sanity Check Layer

Now you can look at the numbers.

The first line of defense is identity checks—relationships that must be true by definition. Sources must equal uses. Cash in must equal cash out. Debt plus equity must equal total capitalization. If these don't hold, something is wrong at a structural level. Build check cells that compute these balances and flag non-zero differences.
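A minimal sketch of such a check in Python, assuming the workbook was last saved by Excel (so cached values are available via data_only=True) and that the sources and uses totals live in B20 and E20; both locations are assumptions.

```python
# Flag a non-zero sources-minus-uses difference. Cell locations are assumed.
from openpyxl import load_workbook

wb = load_workbook("model.xlsx", data_only=True)  # reads cached values
ws = wb["Model"]

sources, uses = ws["B20"].value, ws["E20"].value
if abs(sources - uses) > 0.01:
    print(f"IDENTITY CHECK FAILED: sources {sources:,.0f} vs uses {uses:,.0f}")
```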

Next are sanity checks—tests of reasonableness. Does NOI divided by purchase price roughly equal the going-in cap rate? Does the debt service coverage ratio fall within lender requirements? Is the levered IRR higher than the unlevered IRR (assuming positive leverage)? These aren't proof of correctness, but failures are proof of errors.
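Here's a sketch of automating a few of these checks with the numpy_financial package; every cash flow figure below is invented for illustration.

```python
import numpy_financial as npf

# Illustrative figures only; substitute the model's actual outputs.
purchase_price = 10_000_000
year1_noi = 550_000
unlevered = [-purchase_price, 550_000, 565_000, 580_000, 595_000, 9_800_000]
levered = [-4_000_000, 230_000, 245_000, 260_000, 275_000, 4_700_000]

cap_rate = year1_noi / purchase_price
assert 0.03 < cap_rate < 0.10, f"Going-in cap rate {cap_rate:.1%} looks off"

u_irr, l_irr = npf.irr(unlevered), npf.irr(levered)
assert l_irr > u_irr, "Levered IRR should exceed unlevered under positive leverage"
print(f"Cap rate {cap_rate:.2%}; unlevered IRR {u_irr:.1%}; levered IRR {l_irr:.1%}")
```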

Benchmark testing compares outputs to known examples. If you've modeled this property type before, how do the returns compare? If you have a template or a colleague's model, do similar inputs produce similar outputs? Significant deviations demand explanation.

Sensitivity testing probes whether the model behaves correctly as inputs change. Increase the purchase price—does IRR decrease? Reduce the exit cap rate—does terminal value increase? Raise vacancy—does NOI fall? The direction of change should match intuition. If it doesn't, trace the anomaly.
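A sketch of directional tests against a deliberately tiny stand-in model (buy at a price, collect NOI for five years, sell at an exit cap rate); every input value is invented for illustration.

```python
import numpy_financial as npf

def model_irr(price, noi, exit_cap, vacancy):
    """Toy model: buy, collect NOI for five years, sell at the exit cap."""
    effective_noi = noi * (1 - vacancy)
    sale = effective_noi / exit_cap
    flows = [-price] + [effective_noi] * 4 + [effective_noi + sale]
    return npf.irr(flows)

base = model_irr(price=1000, noi=60, exit_cap=0.06, vacancy=0.05)

# Each perturbation should move the IRR in the direction intuition predicts.
assert model_irr(1100, 60, 0.06, 0.05) < base   # higher price, lower IRR
assert model_irr(1000, 60, 0.05, 0.05) > base   # lower exit cap, higher IRR
assert model_irr(1000, 60, 0.06, 0.10) < base   # higher vacancy, lower IRR
print(f"Base IRR {base:.1%}: all directional checks passed")
```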

A powerful technique: back-solve to a known answer. If you know what the IRR should be given certain inputs, can you adjust the model to produce that IRR? If the required adjustments seem unreasonable, the model logic may be flawed.
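A sketch of back-solving with scipy's root finder, reusing the same toy model; the target IRR and the price bounds are illustrative assumptions.

```python
import numpy_financial as npf
from scipy.optimize import brentq

def model_irr(price, noi=60.0, exit_cap=0.06, vacancy=0.05):
    effective_noi = noi * (1 - vacancy)
    flows = [-price] + [effective_noi] * 4 + [effective_noi + effective_noi / exit_cap]
    return npf.irr(flows)

target = 0.12  # the IRR you believe the inputs should produce
price = brentq(lambda p: model_irr(p) - target, 500, 2000)  # assumed bounds
print(f"A {target:.0%} IRR implies a purchase price of about {price:,.0f}")
```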

Edge Case Verification: Finding the Boundaries

Models often work in the middle of their input range and break at the edges.

Zero is the most common edge. What happens when vacancy is zero? When rental growth is zero? When the interest rate is zero? Any formula that divides by an input will throw a #DIV/0! error the moment that input hits zero, and conditional logic may not handle the zero case. Test it.
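A sketch of zero-input tests on the same toy model; the zero exit cap is the classic failure, since the terminal value divides by it. All base values are illustrative.

```python
import numpy_financial as npf

def model_irr(price, noi, exit_cap, vacancy):
    effective_noi = noi * (1 - vacancy)
    terminal = effective_noi / exit_cap  # breaks when exit_cap is zero
    flows = [-price] + [effective_noi] * 4 + [effective_noi + terminal]
    return npf.irr(flows)

BASE = dict(price=1000, noi=60, exit_cap=0.06, vacancy=0.05)  # illustrative

for name, override in [("zero vacancy", {"vacancy": 0.0}),
                       ("zero NOI", {"noi": 0.0}),
                       ("zero exit cap", {"exit_cap": 0.0})]:
    try:
        print(f"{name}: IRR {model_irr(**{**BASE, **override})}")
    except ZeroDivisionError:
        print(f"{name}: BREAKS (division by zero in terminal value)")
```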

Negative values appear in scenarios people don't usually model—but that the model should handle gracefully. What if NOI goes negative during lease-up? What if the cash flow in a particular period is negative? Some formulas assume positive values and produce nonsense with negatives.

The first and last periods require special attention. The first period often has unique logic: a stub month, acquisition costs, initial funding. The last period has terminal value calculations, exit costs, disposition proceeds. These are where LLMs most frequently make errors because the logic differs from the standard periods.

Maximum and minimum values test the extremes. What happens at 100% leverage? At 0% leverage? At a 20% vacancy rate versus a 2% vacancy rate? Edge cases reveal hidden assumptions.
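A sketch sweeping leverage from 0% to 100% on invented numbers; note what happens at the boundary, where no equity is invested and an equity IRR is undefined.

```python
import numpy_financial as npf

# Illustrative inputs; interest-only debt service is a simplifying assumption.
price, noi, sale, rate = 1000.0, 70.0, 950.0, 0.05

for ltv in (0.0, 0.25, 0.50, 0.75, 0.90, 1.00):
    debt = price * ltv
    equity = price - debt
    cf = noi - debt * rate
    flows = [-equity] + [cf] * 4 + [cf + sale - debt]
    if equity == 0:
        print(f"LTV {ltv:.0%}: undefined (no equity invested)")
    else:
        print(f"LTV {ltv:.0%}: levered IRR {npf.irr(flows):.1%}")
```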

When edge cases fail, you've learned something important about the model's boundaries. Sometimes the failure is acceptable—you don't expect the model to handle negative interest rates. But you should know where the boundaries are, and LLM-generated models don't come with documentation of their own limitations.

Building Verification Into Your Workflow

Verification shouldn't be a single event at the end. Integrate it throughout the model-building process.

After each major section, pause and verify. When the debt schedule is complete, check it before moving to the cash flow. When the cash flow is complete, verify it before building the returns calculations. Catching errors early prevents them from propagating.

Build check cells into the model itself. A row at the bottom that computes sources minus uses and should always equal zero. A cell that tests whether the IRR calculation references the correct cash flows. These automated checks catch errors you introduce during later modifications.
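Check cells can also be added programmatically. Here's a sketch that writes an ordinary Excel check formula into the workbook via openpyxl, with the sources and uses totals assumed to live in B20 and E20 as before.

```python
# Write a sources-vs-uses check cell into the model. Locations are assumed.
from openpyxl import load_workbook

wb = load_workbook("model.xlsx")
ws = wb["Model"]
ws["A50"] = "Check: sources - uses"
ws["B50"] = '=IF(ABS(B20-E20)>0.01,"CHECK FAILED","OK")'
wb.save("model.xlsx")
```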

Create verification checklists for model types you build repeatedly. A multifamily acquisition model has predictable verification points: vacancy correctly applied, rent growth compounding properly, debt sized to constraints, waterfall distributions summing to available cash flow. Write these down and walk through them.
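A checklist is easy to encode as data so it runs the same way every time. A sketch with placeholder checks; every threshold and value below is an illustrative assumption.

```python
# A checklist as data: each entry pairs a description with a test.
# The lambdas are placeholders for real checks against your workbook.
CHECKLIST = [
    ("Vacancy between 0% and 20%", lambda m: 0.0 <= m["vacancy"] <= 0.20),
    ("DSCR at least 1.25x", lambda m: m["noi"] / m["debt_service"] >= 1.25),
    ("Levered IRR above unlevered", lambda m: m["levered_irr"] > m["unlevered_irr"]),
]

model = {"vacancy": 0.05, "noi": 580_000, "debt_service": 430_000,
         "levered_irr": 0.142, "unlevered_irr": 0.081}

for description, test in CHECKLIST:
    status = "PASS" if test(model) else "FAIL"
    print(f"[{status}] {description}")
```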

When you find an error, don't just fix it—ask why you missed it initially. Was it a type of error you weren't looking for? A section you didn't check? An edge case you didn't test? Each error is feedback on your verification process.

The "Explain It Back" Technique

One of the most powerful verification tools involves the LLM itself.

After the model is built, ask the LLM to explain how specific calculations work. "Walk me through how the IRR is calculated in this model. What cells does it reference? What time period does it cover?"

Compare the explanation to your expectations and to what the formulas actually do. Discrepancies reveal either errors in the model or errors in your understanding. Both are worth catching.

You can also ask the LLM to identify potential issues. "What are the most likely errors in this model structure? Where should I focus my verification?" LLMs are often good at identifying their own failure modes in the abstract—better than they are at avoiding them in practice.

This isn't a substitute for your own verification. It's an additional layer that occasionally catches things you missed.

A Complete Verification Pass

Here's what a thorough verification process looks like in practice.

Begin with structural verification. Open the model and scan the layout without looking at any numbers. Confirm sections exist and are organized correctly. Trace the flow from inputs through calculations to outputs. Check that time periods are correct.

Move to formula verification. Select the cells that drive major outputs—the cash flow, the IRR inputs, the terminal value, the debt service. Trace each formula. Read it aloud. Check references. Look for consistency across time periods.

Proceed to numerical verification. Check identity balances first. Then sanity checks against known relationships. Compare outputs to benchmarks if available. Run sensitivity tests on major inputs.

Finish with edge case verification. Test zeros. Test the first and last periods. Test extremes. Document any boundaries you discover.

Record what you checked and what you found. This documentation matters when you return to the model later, when someone else reviews it, or when you're trying to understand why an error slipped through.

The Professional Standard

Verification is what separates models you can trust from models you can't.

An unverified LLM-generated model is a liability. A thoroughly verified one is an asset. The LLM accelerated the building process; your verification made the output reliable.

This takes time—often nearly as long as building the model yourself would have taken. But it's a different kind of time. Verification is more methodical, less creative, easier to do while tired or distracted. And it produces a result you can confidently put your name on.

The discipline of verification also makes you a better prompter. As you internalize the errors LLMs commonly make, you learn to specify more precisely, to request structures that are easier to verify, to anticipate problems before they occur.

Eventually, verification becomes instinct—not something you do after the model is done, but a continuous process woven into how you work. That's when you know you've mastered it.

Next in the Series

In Part 5: Context and Collaboration, we will cover the final and most important topic in AI-assisted Excel modeling: how to work with AI to build models together.

About Apers AI

Apers AI was founded by researchers from Harvard, Yale, and MIT with backgrounds in quantitative asset pricing and institutional real estate. Our mission is to advance the science of capital deployment by applying autonomous agents and machine reasoning to real asset markets.
