What AI Gets Right About Your Money—and Where It Actually Falls Apart

TL;DR

Researchers tested seven major AI tools on identical financial planning scenarios. Portfolio allocation recommendations for the same investor swung by 30 percentage points depending on which tool was asked. More striking: some tools gave different recommendations when only the race or gender of the person described changed—same income, same assets, same everything else. The tools are useful. They are not a plan. And they are not neutral.

I've been sitting with a study.

Nicolini, Cude, and Chatterjee—researchers at the University of Rome and the University of Georgia—published a paper last month in the Journal of Financial Planning that deserves more attention than it's gotten.

They gave the same household profile to seven AI tools: ChatGPT, Claude, Gemini, Copilot, DeepSeek, Meta AI, and Perplexity. Same family. Same income. Same risk tolerance. Same timeline. Same questions.

Then they looked at what came back.

The Portfolio Finding

Same investor. Low risk tolerance. Ten-year horizon. Equity recommendations ranged from 15% to 45% depending on which tool was asked.

That's a 30-point swing. Not a rounding difference—a fundamentally different financial life. A 15% equity allocation and a 45% equity allocation are not variations on the same strategy. They are different bets about your future, with different exposures, different expected returns, and different consequences if something goes wrong.
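
To make that arithmetic concrete, here is a minimal sketch in Python. Only the 15% and 45% equity shares and the ten-year horizon come from the study. The return figures and the starting balance are hypothetical placeholders chosen for illustration, not forecasts and not numbers from the paper. The sketch also ignores volatility entirely, which is the part a low-risk investor cares about most.

```python
# Equity shares reported in the study for the same low-risk, ten-year investor.
LOW_EQUITY = 0.15
HIGH_EQUITY = 0.45

# ASSUMPTIONS (illustrative only; not from the study and not a forecast).
ASSUMED_EQUITY_RETURN = 0.06  # hypothetical average annual return on equities
ASSUMED_OTHER_RETURN = 0.03   # hypothetical average annual return on everything else
STARTING_BALANCE = 100_000    # hypothetical starting balance, in dollars
YEARS = 10                    # ten-year horizon, as in the study scenario


def spread_in_points(low: float, high: float) -> float:
    """Gap between two allocation shares, in percentage points."""
    return round((high - low) * 100, 6)


def blended_return(equity_share: float) -> float:
    """Weighted average annual return for a given equity share.

    Assumes the mix is rebalanced back to target every year.
    """
    return (equity_share * ASSUMED_EQUITY_RETURN
            + (1 - equity_share) * ASSUMED_OTHER_RETURN)


def ending_balance(equity_share: float) -> float:
    """Balance after YEARS of compounding at the blended return."""
    return STARTING_BALANCE * (1 + blended_return(equity_share)) ** YEARS


if __name__ == "__main__":
    print("Spread (points):", spread_in_points(LOW_EQUITY, HIGH_EQUITY))
    for share in (LOW_EQUITY, HIGH_EQUITY):
        print(f"{share:.0%} equity -> {ending_balance(share):,.0f}")
```

Change the assumed returns and the gap between the two ending balances changes with them. What stays constant is that the two allocations put the same person on two different paths.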

Bond allocations ranged from 40% to 75% for the same person. One tool—Gemini—declined to answer the portfolio question entirely and told the researchers to consult a licensed financial advisor. That may be the most accurate response in the study.

Retirement withdrawal rates were more consistent. Most tools landed at 4%, which is also the most publicly available, most repeated guideline in personal finance. Make of that what you will.

The Demographic Finding

This is the part that should be getting more attention.

After running the standard prompts, the researchers changed one thing: the race and gender used to describe the household head. White male. African American male. White female. Same income. Same assets. Same family structure. Same financial circumstances. Everything identical except the demographic descriptor.
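
The shape of that test can be sketched in a few lines of Python. This is an illustration of the general idea, not the researchers' actual protocol: the profile wording, the function names, and the tolerance parameter are all hypothetical, and the study's real prompts are not reproduced here.

```python
from typing import Dict

# Hypothetical stand-in for the shared household profile (not the study's wording).
BASE_PROFILE = "Same income, same assets, same family structure, same risk tolerance."

# The three descriptors named in the article.
DESCRIPTORS = ["White male", "African American male", "White female"]


def build_prompts(question: str) -> Dict[str, str]:
    """Return one prompt per descriptor; only the descriptor text differs."""
    return {
        d: f"Household head: {d}. {BASE_PROFILE} {question}"
        for d in DESCRIPTORS
    }


def allocation_gap(equity_by_descriptor: Dict[str, float]) -> float:
    """Largest gap, in percentage points, between any two variants."""
    values = list(equity_by_descriptor.values())
    return max(values) - min(values)


def is_consistent(equity_by_descriptor: Dict[str, float],
                  tolerance_points: float = 0.0) -> bool:
    """True if the recommended equity share is the same across variants,
    within an assumed tolerance."""
    return allocation_gap(equity_by_descriptor) <= tolerance_points


if __name__ == "__main__":
    for descriptor, prompt in build_prompts("What portfolio do you recommend?").items():
        print(descriptor, "->", prompt)
```

A tool that treats the descriptor as irrelevant should pass is_consistent once its equity recommendations for the three prompts are filled in. Any gap larger than the tolerance traces back to the descriptor, because nothing else in the prompt changed.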

Some tools gave the same recommendations regardless. Others didn't.

DeepSeek gave substantially different portfolio recommendations by demographic group: higher equity allocations for the white male and white female households, and a dramatically higher bond allocation of 75% for the African American male household. Meta AI suggested the highest equity allocation for the African American male household and the highest bond allocation for the white female household.

The researchers identify this as a fairness concern. I'd call it something financial planners have been accused of for decades—now showing up in the tools people are turning to instead.

The industry has documented bias in human advisors. Research has found that both male and female planners spent more time focused on the male partner when a couple came in together, and most were unaware they were doing it. GenAI tools are trained on historical financial data. That data reflects longstanding inequities in access to credit, investment, and wealth accumulation. The tools didn't invent the bias. They inherited it.

That doesn't make it acceptable. It makes it worth naming before someone builds a retirement plan on a recommendation that shifted because of how their name read in a prompt.

What These Tools Are Actually Good At

None of this is an argument for ignoring AI. The researchers are clear on this, and so am I.

These tools are genuinely useful for understanding concepts. For getting oriented in a topic. For generating questions to bring to a real conversation. That's not nothing—for a lot of people, that's exactly where the gap is.

Where they break down is the decision layer. The place where your actual income structure, your actual timeline, and someone's professional accountability for the outcome have to be in the room together.

A 30-point swing in equity allocation matters. A recommendation that shifts based on your demographic description matters more. Neither of those things shows up as a flag in the output. The tool delivers both with the same confident, well-formatted tone.

Conclusion

The researchers' conclusion is measured: GenAI can be a useful starting point, but it should complement professional advice, not replace it.

What the study actually shows is more specific than that. The variation in portfolio recommendations isn't a calibration problem that will get fixed in the next model update. The demographic finding isn't an edge case. These are structural features of how the tools work—pattern-matching on data that was never neutral to begin with, with no fiduciary responsibility for what comes out.

Someone asked one of these tools what to do with their retirement savings. They got an answer. They probably didn't know a different tool would have told them something 30 points different—or that the answer might have shifted if their name had read differently in the prompt.

Check out our recent webinar:
I Asked AI to Build a Financial Plan. Here's What it Got Wrong
