Illusion of Rationality

This article explores the fundamental differences between human thinking and the way Large Language Models (LLMs) generate responses. It illustrates how human decision-making involves multiple, often evolving, acceptance criteria, whereas LLMs rely on statistical probabilities to determine the most likely next word.

We decided to go out to dinner, since neither of us wanted to cook. I said, “Let’s go to Jake’s, it's close and I like their steak sandwiches.” Then I hesitated. I thought some more about Jake’s. Who might we run into at Jake’s? Did I need to change my clothes to go to Jake’s? Would they have the table open that we liked? What if I saw the maître d' who went to the same gym that I hadn’t been to in two months? Where would we end up parking? Sometimes the lot was full and we had to park around the block. And, oh yes, what if they were out of the rolls for the steak sandwich and I had to order something else? 

That is how it goes. I originally had two criteria for accepting a restaurant, and by the time I was done I had five or ten. Probably more, if we ever found out what they all were. 

You can look at the difference between human thinking and the thinking of Large Language Models as similar to that difference between my first thought and the later musings. Generally, the LLM has only one acceptance criterion, while humans are likely to have several. The LLM's acceptance criterion is that a candidate word ranks high on a statistical test. The human criteria involve a number of real concerns, each given a different weighting.  

But the answers that an LLM gives sound so human that we often imagine a thought process behind those words that is much like our own thinking. There isn't. And, as AI, and LLMs in particular, are used more and more in our businesses and our lives, it becomes more and more important to remember that. 

You have doubtless heard that an LLM is nothing more than a text completion engine on steroids. That is a largely correct characterization. I have seen it illustrated as a partially completed sentence with a list of possibilities for the next word, each possibility with a probability next to it. Under normal circumstances, the model will choose the word with the highest probability. Where did it get those probabilities, you ask? Well, sit back in a comfortable chair; it's a long story. In simple terms, the LLM looked at billions and billions of sentences and calculated the probabilities from all of them. In fact, it is much more complicated than that. Simply stringing together highly likely words would not actually work. Sentences require some level of contextual coherence and attention to the broader thrust of communication about a topic. These objectives are met by model weights that are developed over weeks-long training processes. The end result is that not just the words, but entire sentences and the key phrases within them, are subject to acceptance criteria. 
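The next-word selection described above can be sketched in a few lines of Python. Everything here is a toy illustration: the candidate words and their probabilities are invented, whereas a real model computes such a distribution from weights learned over billions of sentences.

```python
# A toy sketch of greedy next-word selection, not a real language model.
# The candidate words and probabilities below are invented for illustration;
# an actual LLM derives them from learned weights, not a hand-written table.

# Partially completed sentence: "Let's go to Jake's for a steak ___"
candidates = {
    "sandwich": 0.52,  # hypothetical probability for each possible next word
    "dinner": 0.23,
    "salad": 0.15,
    "knife": 0.10,
}

# Under normal (greedy) settings, the model simply picks the word
# with the highest probability next to it.
next_word = max(candidates, key=candidates.get)
print(next_word)  # sandwich
```

Note that nothing in this selection rule looks at whether the answer is true, tasty, or sensible; the only acceptance criterion is the number next to the word.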

And that is where your prompt comes in. In addition to the weights, the information you give the LLM in your prompt narrows down the universe of sentences that it uses to calculate the probabilities. A training process also helps the model steer away from some bad choices and toward better ones. But if you ask it which restaurant to eat at tonight, it will not consider what the food might be like unless you specifically tell it to. Even then, all that happens is that the LLM looks at probabilities for the next word based upon the sentences that your prompt causes it to draw on. At the point it is making those choices, it is no longer using English. All of the words have been translated into numbers, since numbers are the native language of computers. 
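That translation of words into numbers can be illustrated with a toy vocabulary. The mapping below is made up; real tokenizers use tens of thousands of subword pieces rather than whole English words, but the principle is the same.

```python
# Toy illustration of turning a prompt into numbers (token ids).
# This vocabulary is invented for illustration; a real tokenizer
# splits text into subword pieces and uses a much larger vocabulary.
vocab = {"which": 101, "restaurant": 102, "should": 103,
         "we": 104, "eat": 105, "at": 106, "tonight": 107}

prompt = "which restaurant should we eat at tonight"
token_ids = [vocab[word] for word in prompt.split()]
print(token_ids)  # [101, 102, 103, 104, 105, 106, 107]
```

From this point on, the model works only with these numbers; the English words never reappear until the output is translated back at the end.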

Human Thinking

In most cases, human thinking follows a certain pattern. When a human encounters a problem, they will spend some time making sure that they understand the problem, then gather data and insights about it. Then they imagine possible ways to solve the problem. With those in mind, they decide on a method for assessing the possible solutions, which they then apply. They look at the results of their evaluation process, decide on their acceptance criteria, and, finally, choose their preferred solution. There are at least a dozen different ways that humans might do those steps, but in most cases, humans will walk through some version of that process. 

And so will an LLM. But the big difference is what I mentioned above: the acceptance criteria. For a human, an acceptance criterion is something that they care about for one reason or another. It might be how others will think of them, or how much it costs, or what it tastes like, or any of a thousand other human concerns. The LLM has only one acceptance criterion: that probability. It doesn't care what the answer is. It doesn't understand anything about why its choice has the highest probability.  

The different approaches to thinking that humans use include:

  • Critical Thinking—Involves objective analysis to evaluate evidence, identify assumptions, and avoid biases, ensuring sound and logical decision-making.
  • Systems Thinking—Examines how components of a system interconnect, emphasizing dynamic relationships, feedback loops, and holistic problem-solving.
  • Design Thinking—A human-centered, iterative approach to innovation that uses empathy, creativity, and prototyping to solve user-centric problems.
  • Creative Thinking—Generates novel ideas and challenges conventional approaches, fostering innovation when traditional methods fall short.
  • Ethical Reasoning—Evaluates decisions through moral principles, emphasizing fairness, accountability, and societal values to navigate ethical dilemmas.
  • Emotional Intelligence Thinking—Leverages awareness of emotions to navigate relationships, build trust, and guide decisions, particularly in collaborative contexts.
  • Intuitive Thinking—Draws on subconscious pattern recognition and experience to make rapid decisions, especially in ambiguous or time-sensitive situations.
  • Actuarial Thinking—Combines statistical methods, mathematical models, and professional judgment to manage uncertainty, particularly in financial contexts.
  • Risk Management Thinking—Focuses on identifying, assessing, and mitigating risks to balance potential losses with opportunities for resilience and growth.

All of those thinking processes follow the steps described above, but they make different choices about what sort of data to collect, how to assess possible solutions, and what the selection criteria are. In all cases, though, the selection criteria are human concerns of one type or another. 

Pattern-Based Thinking

I am going to call what an LLM does pattern-based thinking. And the pattern-based thinking that LLMs use is, to some extent, modeled on the pattern-based thinking that humans use. Specifically, human babies and young children use pattern-based thinking to learn language. Before they have language, you might have noticed, communication with your children is somewhat limited. How do they learn language? From months and even years of listening and observing the patterns that go with the sounds that they hear. They learn to associate the look and taste of a cookie with the sound "cookie," at least if they are born into an English-speaking family. As they learn to hear the words, they eventually learn to speak as well, by trial and error. And once they have language, other forms of learning and thinking become available. 

The LLM never graduates from pattern-based thinking. It only ever sees the world as a string of next words and probabilities. It does not have any of the reasoning skills that a 10-year-old child has. 

The problem is that the sentences LLMs create this way sound just like human sentences formed by human thinking. So when we work with an LLM, we need to remember over and over that the LLM didn't think about the problem we asked it about. It only thought about the probability of the next word. What does that mean? It means that an LLM never has a good reason for what it says in response to your question. It only has its probabilities. It only knows that, when faced with a similar phrase, the plurality of responses conformed with its response. 

It is well known that LLMs sometimes "hallucinate," giving an incorrect response that often sounds plausible. To the LLM, this is not a problem at all; it simply followed its usual thinking and gave you a response, constructed word by word, that conformed to its "thinking" rules. That is a feature of LLMs that users must be prepared to encounter from time to time. Those hallucinations are often obvious to someone knowledgeable about a topic. But what that means to you and me is that when we really need a good answer, because we are not knowledgeable, we are more likely to be fooled by a hallucination. 

That means that we mustn't rely on the answers from an LLM unless we have the means to validate the responses. And it is not unusual for an LLM to answer a difficult query in a few seconds while it takes us an hour to validate the answer. 

I once asked an LLM to fill out a chart telling me which regulator was in agreement with which of a series of statements about the insurance industry. I thought it was absolutely fantastic to have that information to share with a client who was planning to expand into the countries under those regulators. But before handing it to the client, I decided I needed to check that work. I asked the LLM to create for me a work plan to do the job of checking every single entry in the chart. I estimated that it would take a week’s work to complete the validation. I didn’t have a week to give to this, but I started validating anyway, choosing the regulator where I had the best documentation, which should have been the easiest to validate. I looked at the first entry in the chart and after almost an hour of scouring the documentation, I found no evidence to support the entry that the LLM had made. But unlike a paper report, you can ask the LLM about the entry. It admitted immediately that it had no evidence that the regulator would agree with the statement. But it checked it off anyway because to the LLM it seemed likely that a regulator would agree with that statement. All of a sudden, the value of that chart went down to zero. 

I asked an LLM to think like an actuary and tell me how to determine whether a pension plan is adequately funded. It responded with these steps:

“As an actuary, determining whether a pension plan is adequately funded involves several key steps:

  1. Calculate the Funded Ratio: This is the ratio of the plan's assets to its liabilities. A funded ratio of 100% means the plan has enough assets to cover all its obligations. If the ratio is less than 100%, the plan is underfunded; if it's more than 100%, it's overfunded.
  2. Review Actuarial Assumptions: Actuaries use assumptions about future events, such as investment returns, salary growth, and life expectancy, to estimate the plan's liabilities. It's important to ensure these assumptions are reasonable and reflect current trends.
  3. Analyze the Plan's Investment Strategy: The plan's investment portfolio should be diversified and aligned with its risk tolerance and funding goals. Regularly reviewing the investment strategy helps ensure it remains appropriate over time.
  4. Evaluate the Plan Sponsor's Financial Health: The financial stability of the plan sponsor (e.g., the employer or municipality) is crucial. A financially healthy sponsor is more likely to meet its funding obligations.
  5. Monitor Regulatory Compliance: Ensure the plan complies with relevant regulations, such as the Pension Protection Act, which requires annual funding notices and sets funding standards for private-sector plans.
  6. Consider Long-Term Sustainability: Assess whether the plan is on track to achieve full funding over a reasonable period. This involves projecting future contributions and benefits under various scenarios.

By combining these steps, actuaries can provide a comprehensive assessment of a pension plan's funding status and recommend strategies to address any shortfalls.”[1]
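Step 1 of the list above, the funded ratio, is simple arithmetic. A minimal sketch, using invented asset and liability figures:

```python
# Funded ratio = plan assets / plan liabilities (step 1 above).
# The dollar amounts here are made up for illustration only.
assets = 950_000_000         # market value of plan assets
liabilities = 1_000_000_000  # actuarial present value of obligations

funded_ratio = assets / liabilities

if funded_ratio < 1.0:
    status = "underfunded"
elif funded_ratio > 1.0:
    status = "overfunded"
else:
    status = "fully funded"

print(f"Funded ratio: {funded_ratio:.0%} ({status})")  # Funded ratio: 95% (underfunded)
```

Of course, the hard actuarial work is in valuing the liabilities in the first place, which is exactly the part the LLM's word-by-word process cannot actually perform.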

But again, the LLM is not thinking like an actuary. It is saying the words that it finds most likely an actuary would say if faced with that question. If you gave it the data and asked it to perform the valuation, it would give you a nice-sounding answer; but as you know, you cannot get the right answer to an actuarial valuation by giving the most likely answer. If you could, passing the actuarial exams would be much easier. 

Conclusion

The LLM is not thinking, at least not thinking in a way that you would recognize as thinking. Its thinking is more like the thinking of a young child. Most recently, the developers of LLMs have proudly announced that they have given their models the capability to answer questions like a college student. But what they didn’t tell you is that it is the thinking of a clever five-year-old with the vocabulary of a college professor. Get your mind around the implications of that!

The conclusion here is the most likely answer: Beware. The very capabilities that make LLMs useful—such as generating human-like text and producing coherent, contextually relevant answers—are also the source of their limitations. Since LLMs lack true comprehension and don't verify facts, their responses can seem credible but might not always be correct or trustworthy. These models can only provide the “Illusion of Rationality.”

Statements of fact and opinions expressed herein are those of the individual authors and are not necessarily those of the Society of Actuaries, the newsletter editors, or the respective authors’ employers.

[1] From Microsoft CoPilot.