The critical thinking toolkit for the AI age
Learn the reasoning structures that reduce error, expose manipulation, and improve judgment under uncertainty.
You’ve probably been in this situation. A colleague shares a report with clean formatting, confident numbers, and logical conclusions. A tool generates an analysis in seconds. A news article cites a study. Everything sounds right. So you nod, move on, and act on it.
Most of us have. Not because we’re careless, but because nothing in our education ever taught us what to do instead. We were taught to read. We were never taught to interrogate what we read.
That gap was manageable when the volume of confident-sounding information was limited. It is no longer manageable.
AI systems now generate fluent, structured, authoritative-sounding content at a scale no human could match, and they do it whether the underlying claim is true or not. The output is the same either way: polished, plausible, and easy to accept.
Psychologist Daniel Kahneman spent decades studying why this happens to intelligent people. His conclusion, laid out in his book “Thinking, Fast and Slow”, is uncomfortable: we have two cognitive systems. System 1 is fast, automatic, and associative, and it runs on pattern recognition and narrative coherence.
System 2 is slow, effortful, and analytical.
The problem is that System 1 generates the feeling of understanding. And that feeling is often indistinguishable from actual understanding, until something goes wrong.
Real critical thinking, the kind that holds up under pressure, catches errors before they compound, and protects you from fluent-sounding nonsense, requires deliberately engaging System 2. It is a formal practice. It has specific tools. Those tools have names, structures, and documented results. They can be learned, practiced, and made habitual.
In the age of AI, learning them is no longer optional.
“A reliable way to make people believe in falsehoods is frequent repetition, because familiarity is not easily distinguished from truth.”
Daniel Kahneman, Thinking, Fast and Slow (Part I, “Two Systems”)
Swap “frequent repetition” for “fluent generation at scale” and you have a precise description of how AI systems can mislead even careful readers. The output feels true because it reads well, and because we encounter it constantly, across tools, articles, and summaries. That familiarity breeds trust. That trust is System 1 talking, and critical thinking is how System 2 responds.
THE COMMONLY HELD BELIEF
“Critical thinking is a vague soft skill, something you just feel.”
Corporate training programs, university general education requirements, and personal development literature all invoke “critical thinking” as something important to have. Nearly none of them teach it as a formal practice with specific, learnable techniques.
The result: people believe they are critical thinkers because they feel skeptical sometimes. Feeling skeptical and applying formal critical reasoning are not the same thing. Kahneman’s research on cognitive bias documents this gap with uncomfortable precision: the more intelligent and articulate a person is, the better they are at constructing post-hoc rationalizations for conclusions they reached intuitively. Intelligence, without formal reasoning tools, doesn’t protect you from error. It makes your errors more elaborate.
In practice, this gap looks like accepting AI-generated content because it sounds authoritative, without tracing a single claim to a primary source. It looks like treating correlation as causation because the story makes sense. And it looks like updating too much on one compelling anecdote, an instance of what Kahneman calls the “law of small numbers”: systematic overconfidence in patterns drawn from insufficient data.
Philip Tetlock’s research in the book “Superforecasting” adds a sharper point. Across decades of studying expert prediction, Tetlock found that most experts — people paid to have views on complex topics — performed no better than informed amateurs, and sometimes worse. The differentiator wasn’t IQ, credentials, or experience. It was a specific set of epistemic habits: actively seeking disconfirming evidence, updating beliefs incrementally, thinking in probabilities rather than certainties. These habits are not innate. They are learned, and unfortunately most people never learn them.
The toolkit
Critical thinking is a formal practice, and here are the tools:
The mathematical and scientific communities have spent centuries developing formal tools for rigorous reasoning. Tom Chatfield’s book “Critical Thinking: Your Guide to Effective Argument, Successful Analysis and Independent Study” distills these into practical habits. What follows is my synthesis of those tools, grounded in the same probabilistic and logical foundations that underlie AI, and that allow you to evaluate AI outputs honestly.
These tools are not intuitive. That’s the point. Intuition is what you use when you don’t have them. With them, you have something better: a procedure for reasoning that doesn’t depend on how confident you feel.
Tool #1: Claim decomposition
WHAT IS ACTUALLY BEING ASSERTED?
Every argument contains at least two things: a claim (what is being asserted) and a justification (why you should believe it). Most people treat these as inseparable. They aren’t. Separating them is always the first step.
Many claims collapse the moment you decompose them. Hidden assumptions surface. Circular reasoning becomes visible. The implicit becomes explicit, and explicitly wrong.
Chatfield calls this “mapping the argument.” Kahneman calls it slowing down System 1 long enough for System 2 to engage. Either way, the action is the same: before evaluating a claim, state it precisely. What exactly is being asserted? What must be true for this claim to hold? List the assumptions. Then evaluate each one.
In practice: When you read an AI-generated summary, write down the three strongest claims it makes. For each one: what evidence would make this claim false? If you can’t answer that, you don’t yet understand the claim well enough to evaluate it.
Tool #2: Steelmanning
CONSTRUCT THE STRONGEST VERSION OF THE OPPOSING VIEW
Before evaluating an argument, build the strongest possible version of it. Not the version you can easily defeat, but the version that genuinely threatens your position.
If you can’t steelman the opposing view, you don’t understand the debate. You’re fighting a strawman you constructed yourself. Tetlock’s superforecasters do this constantly. They actively try to prove themselves wrong before committing to a forecast. It’s uncomfortable. It’s also why they outperform.
In the age of AI, steelmanning is a direct antidote to confirmation bias. AI tools will readily generate supporting evidence for whatever position you’re already inclined toward. The discipline of steelmanning forces you to generate the counter-evidence yourself — which is the harder and more valuable thing.
In practice: Before deciding on a model architecture, a business strategy, or any consequential position — write the strongest argument against your current view. Then ask: have I addressed this, or just ignored it?
Tool #3: Bayesian updating
UPDATE BELIEFS IN PROPORTION TO THE EVIDENCE, NOT IN PROPORTION TO HOW THE EVIDENCE FEELS
When you encounter new evidence, ask: how much should this actually move my beliefs? Bayes’ theorem gives the formal answer. Intuitively, what matters is not whether the evidence fits hypothesis A, but how much more likely it is under A than under hypothesis B: strong-looking evidence for A may be just as consistent with B. Evidence you would have expected to see either way should update you almost nothing, because you haven’t learned anything you didn’t already expect.
Kahneman documented what happens when we don’t update this way: we anchor on the first number we encounter, overweight vivid anecdotes, and underweight base rates. We update too much on dramatic individual cases and too little on dry statistical patterns — even when the statistics are more reliable.
Tetlock’s superforecasters update in small, frequent increments as evidence accumulates — never in dramatic lurches. They think of their beliefs as probability estimates, not positions. When new data arrives, the question isn’t “does this confirm me or not?” It’s “by how many percentage points should my confidence change?”
In practice: Assign a numerical probability to your current belief before you encounter new evidence. After, revise it. Track whether your revisions are calibrated over time — consistently moving too far or not far enough are both diagnostic of specific reasoning errors.
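As a minimal sketch of what “revise it” can mean in numbers, the snippet below applies Bayes’ theorem to a single belief. The function name bayes_update and every number in it (a 30% prior, evidence that is 80% likely if the hypothesis is true and 40% likely if it is false) are illustrative assumptions, not figures from Kahneman or Tetlock.

```python
def bayes_update(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Return P(hypothesis | evidence) using Bayes' theorem."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    if denominator == 0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return numerator / denominator


# Hypothetical numbers, chosen only for illustration.
prior = 0.30

# Evidence that is twice as likely if the hypothesis is true: a real update.
informative = bayes_update(prior, p_evidence_if_true=0.8, p_evidence_if_false=0.4)

# Evidence that is equally likely either way: no update at all.
uninformative = bayes_update(prior, p_evidence_if_true=0.8, p_evidence_if_false=0.8)

print(f"prior:                        {prior:.2f}")
print(f"after informative evidence:   {informative:.2f}")
print(f"after uninformative evidence: {uninformative:.2f}")
```

With these made-up inputs the belief moves from 30% to about 46%, while evidence that was equally likely under both hypotheses leaves it at 30%, however compelling it may have felt.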
Tool #4: Null hypothesis generation
WHAT ELSE COULD EXPLAIN THIS PATTERN?
Before concluding that X causes Y, generate alternative explanations. What else could produce this pattern? The explanation that requires the fewest new assumptions is generally preferred. This preference is not a law of nature. It is a prior probability: simpler explanations are, on average, more often correct, not because complexity is impossible but because complex causal chains require more things to be simultaneously true.
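The arithmetic behind that last point can be sketched in a few lines. The 90% figure per step and the independence of the steps are assumptions made purely for illustration.

```python
def chain_probability(link_probabilities):
    """Probability that every link in a causal chain holds, assuming independence."""
    result = 1.0
    for p in link_probabilities:
        result *= p
    return result


# Hypothetical explanations: every individual step is 90% likely.
simple_story = [0.9]
complex_story = [0.9] * 5

print(f"1-step explanation: {chain_probability(simple_story):.2f}")
print(f"5-step explanation: {chain_probability(complex_story):.2f}")
```

Under these assumptions, five steps that are each 90% likely hold together only about 59% of the time.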
In data science this is called “confounder identification.” In scientific method it’s “alternative hypothesis generation.” In everyday reasoning it’s the discipline of asking “what else could this be?” and genuinely trying to answer.
AI-generated analysis frequently presents a single causal story without surfacing the alternative explanations. The output reads like a conclusion. It is a hypothesis. Null hypothesis generation is the habit that catches the difference.
In practice: When you see a correlation in your data, list three alternative explanations before testing your preferred one. Evaluate each. Only after ruling them out, with evidence, not intuition, does the original hypothesis earn confidence.
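One way to test a candidate alternative explanation is to recompute the correlation inside each level of the suspected confounder. The sketch below uses a tiny made-up dataset and a hand-written pearson helper; the group labels and values are hypothetical and were chosen so the effect is easy to see.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical records: (group, x, y). The group is the suspected confounder.
records = [
    ("a", 1, 1), ("a", 1, 2), ("a", 2, 1), ("a", 2, 2),
    ("b", 5, 5), ("b", 5, 6), ("b", 6, 5), ("b", 6, 6),
]

# Correlation across all records, ignoring the group.
overall = pearson([x for _, x, _ in records], [y for _, _, y in records])

# Correlation within each group separately.
within = {}
for group in sorted({g for g, _, _ in records}):
    xs = [x for g, x, _ in records if g == group]
    ys = [y for g, _, y in records if g == group]
    within[group] = pearson(xs, ys)

print(f"overall correlation: {overall:.2f}")
for group, r in within.items():
    print(f"within group {group}: {r:.2f}")
```

In this constructed example the overall correlation is about 0.94, yet it is zero inside each group: the apparent link between x and y is fully accounted for by group membership. Real data is rarely this clean, so treat the check as one piece of evidence, not a verdict.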
Tool #5: Reference class forecasting
WHAT HAPPENED TO SIMILAR CASES BEFORE THIS ONE?
Psychologists Daniel Kahneman and Amos Tversky identified the “planning fallacy” in 1979 and defined it as the systematic tendency to underestimate the time, costs, and risks of future actions while overestimating their benefits. It persists even when it contradicts our own experience. The fix they proposed: reference class forecasting. Instead of asking “how will this project go?”, ask “how did projects like this tend to go?”
Danish planning researcher Bent Flyvbjerg later developed this into a formal methodology, applying it to major infrastructure projects worldwide with striking results: projects that used reference class data were significantly better calibrated than those that relied on inside-view estimates alone.
Kahneman named the underlying distinction. The “outside view” means deliberately setting aside the specific details of your situation and looking at the base rate of similar cases. The “inside view”, meaning reasoning from the unique details of your own case, is natural, compelling, and usually overoptimistic. Outside-view thinking is unnatural, uncomfortable, and more accurate.
Tetlock’s superforecasters apply this discipline rigorously. Before forming a forecast on any question, they ask: what is the reference class? What is the base rate? Only after anchoring to the outside view do they adjust for the specific features of the case at hand.
For AI outputs specifically: when a model generates a confident estimate or prediction, ask what the reference class is. What is the track record of AI-generated forecasts on similar questions? What is the calibration of this class of model on problems of this type? The answer is usually: less reliable than the confident output suggests.
In practice: Before committing to any forecast or plan, find the reference class. How long did similar projects actually take? What was the actual success rate of similar initiatives? Let the outside view anchor your estimate before the inside view adjusts it.
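A simplified sketch of anchoring on the outside view might look like the following. It is not Flyvbjerg’s formal methodology; the list of past projects, the 10-week estimate, and the choice of the median as the adjustment are all hypothetical.

```python
from statistics import median

# Hypothetical reference class: (estimated_weeks, actual_weeks) for similar past projects.
past_projects = [(10, 14), (8, 8), (12, 18), (6, 9), (20, 26)]

# How far each past project overran its own estimate (1.0 means on time).
overrun_ratios = [actual / estimated for estimated, actual in past_projects]

typical_overrun = median(overrun_ratios)
share_late = sum(ratio > 1.0 for ratio in overrun_ratios) / len(overrun_ratios)

# Inside view: what the plan for the new project says.
inside_view_weeks = 10

# Outside view: the same plan, scaled by what usually happens to plans like it.
outside_view_weeks = inside_view_weeks * typical_overrun

print(f"share of past projects that ran late: {share_late:.0%}")
print(f"median overrun ratio: {typical_overrun:.2f}")
print(f"inside-view estimate:  {inside_view_weeks} weeks")
print(f"outside-view estimate: {outside_view_weeks:.1f} weeks")
```

With this made-up reference class, four of five projects ran late, the median overrun is 1.4, and a 10-week plan becomes a 14-week starting estimate. Adjustments for the specific features of the case come after that anchor, not before it.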
Why this matters
AI is the most powerful fluency machine in history, but fluency is not always the truth.
AI systems generate confident, coherent, well-structured content on any topic.
They can produce what appears to be evidence for any claim.
They can construct compelling-sounding arguments for conclusions that are false.
The output looks like reasoned argument. It often isn’t. It is a probability distribution over token sequences, optimized to be fluent.
“The confidence people have in their beliefs is not a measure of the quality of evidence, but of the coherence of the story that the mind has managed to construct.”
Daniel Kahneman, Thinking, Fast and Slow
Replace “the mind” with “the model” and this sentence describes AI outputs with uncomfortable precision. LLMs optimize for coherence. Coherent stories feel true. Feeling true is not the same as being true.
The only reliable defense is the ability to evaluate arguments on their merits, not their fluency, not their apparent confidence, not the authority of the tool that produced them. That is critical thinking: formal, practiced, disciplined. The tools above are not optional extras for the intellectually curious. They are baseline for anyone operating in an environment where AI-generated content is ubiquitous.
The irony worth sitting with: the same mathematical training that produced the probability theory underlying AI (the same logic, the same Bayesian reasoning, the same formal argument structures) also produces the tools to evaluate AI outputs critically. Math didn’t create the problem. Ignoring math did. The solution is not less mathematical thinking; it is the opposite.
Tetlock puts it cleanly: the difference between good and poor forecasters was not what they knew. It was how they thought. Specifically, whether they held their beliefs as probability estimates subject to revision, or as positions to be defended. The same distinction separates people who use AI well from people who are misled by it. One group treats AI outputs as prior probabilities to be updated. The other treats them as conclusions to be accepted.
Building the practice
How to make these tools habitual, and not just conceptual?
Reading about these tools is System 1 work — it’s easy, it feels productive, and it changes almost nothing. The research on skill acquisition is clear: understanding a technique and being able to execute it under pressure are different things. The gap between them is closed only by deliberate practice.
Start with a daily reasoning log
Each day, identify one claim you encountered — from a colleague, an article, an AI tool. Then run it through claim decomposition. Write it down. State the claim precisely. List the hidden assumptions. Identify what evidence would falsify it. This takes 10-15 minutes. After 30 days, it becomes automatic.
Apply the outside view to your own work weekly
In your weekly review (whatever form that takes), ask: what reference class applies to the project I’m most confident about? What actually happened to similar projects? Let the base rate correct your optimism before it becomes a planning failure.
Practice calibration actively
Tetlock’s superforecasters track their predictions and score them. You don’t need a forecasting tournament to do this. Keep a simple log: prediction, confidence level (as a percentage), outcome. Over time, you’ll see whether you’re overconfident (your 90% confident predictions come true 60% of the time) or underconfident. Both are correctable, but only if you measure.
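Such a log can be scored with very little code. The sketch below is one possible layout, not a prescribed format: the entries are invented, and the two helpers (calibration_table and brier_score) are hypothetical names. The Brier score here is the mean squared gap between stated confidence and what happened, so lower is better.

```python
from collections import defaultdict

# Hypothetical log entries: (stated confidence, did the prediction come true?)
prediction_log = [
    (0.9, True), (0.9, False), (0.9, True), (0.9, False), (0.9, True),
    (0.6, True), (0.6, False), (0.6, True), (0.6, True), (0.6, False),
]


def calibration_table(log):
    """Map each stated confidence level to the observed share of hits."""
    outcomes_by_confidence = defaultdict(list)
    for confidence, came_true in log:
        outcomes_by_confidence[confidence].append(came_true)
    return {
        confidence: sum(outcomes) / len(outcomes)
        for confidence, outcomes in sorted(outcomes_by_confidence.items())
    }


def brier_score(log):
    """Mean squared gap between stated confidence and outcome (0 is perfect)."""
    return sum((confidence - float(came_true)) ** 2 for confidence, came_true in log) / len(log)


table = calibration_table(prediction_log)
score = brier_score(prediction_log)

for confidence, hit_rate in table.items():
    print(f"said {confidence:.0%}, came true {hit_rate:.0%} of the time")
print(f"Brier score: {score:.3f}")
```

In this invented log, the 60% predictions came true 60% of the time, which is well calibrated, while the 90% predictions also came true only 60% of the time, which is the overconfidence pattern described above. Ten entries are far too few to conclude anything; the point of the log is to keep adding to it.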
Make steelmanning a social practice
In any discussion where you hold a strong view, articulate the strongest counter-argument before making your case. Do it out loud. It changes the dynamic of the conversation and, more importantly, it changes the quality of your thinking. Chatfield calls this “intellectual honesty in public” — the discipline of showing your reasoning, not just your conclusions.
“Foresight isn’t a mysterious gift bestowed at birth. It is the product of particular ways of thinking, of gathering information, of updating beliefs. These habits of thought can be learned and cultivated by any intelligent, thoughtful, determined person.”
— Philip Tetlock, Superforecasting
The same sentence applies to critical thinking generally. It is not a talent. It is not a personality type. It is a practice, and practices are built through repetition, not through reading about them.
Conclusion
The math that built AI is the same math that protects you from it.
Probability theory. Bayesian inference. Formal logic. These are the mathematical foundations of modern AI systems. They are also (not coincidentally) the formal foundations of the reasoning tools in this issue.
The people who built AI were trained to think probabilistically, to update beliefs with evidence, to generate and test alternative hypotheses, to distinguish the coherence of a story from the truth of its claims.
Those tools did not stay inside the research lab. They’re available to anyone willing to learn them.
The solution to AI-generated confusion is not less math. It is more mathematical thinking — applied not just to models, but to everything those models tell you.
At the end of this week’s issue, I’ll leave you with three questions worth asking yourself:
In the last week, did you encounter an AI output, a statistic, or a confident claim that you accepted without decomposing it? What was the hidden assumption you didn’t examine?
What is one belief you hold with high confidence right now, and what would it take, specifically, to change your mind? If you can’t answer the second part, that’s the gap worth closing.
Which of the five tools in this issue is most absent from how you currently reason, and what would it look like to practice it once, deliberately, this week?
References
Further reading can be found in:
Daniel Kahneman: “Thinking, Fast and Slow”
Philip E. Tetlock, Dan Gardner: “Superforecasting: The Art and Science of Prediction”
Tom Chatfield: “Critical Thinking: Your Guide to Effective Argument, Successful Analysis and Independent Study”
Rolf Dobelli: “The Art of Thinking Clearly”
P.S. Thank you for reading this far.
That alone puts you in rare company: most people scroll past anything that asks them to think slowly.
The fact that you’re here suggests you already value the kind of thinking this newsletter is about. Share it with someone who values it too. Those people are worth finding.
Until next week’s issue, keep learning, keep building, and keep thinking like a mathematician.
-Terezija



