Hey PaperLedge learning crew, Ernis here, ready to dive into some seriously fascinating stuff! Today, we're tackling a paper that asks: Can AI really think like us when dealing with everyday assumptions?
Think about it: we make assumptions all the time. "Birds fly," we say. But what about penguins? That's where things get interesting, and that's what this research is all about.
So, what did the researchers actually do? They put 28 of the biggest, fanciest Large Language Models – think of them as the brainiest AI students in the class – to the test. They gave them 20 scenarios involving what are called "generic generalizations." These are statements like, "Ravens are black." Seems simple, right?
But here's the catch: generic generalizations aren't hard and fast rules. They have exceptions. It's like saying, "Coffee is hot." Usually true, but not always! Iced coffee, anyone?
These "generics" are super important because they’re at the heart of how we reason, how we learn, and how we form concepts. When we see a bird, we assume it can fly unless we have a reason to think otherwise. It's default reasoning, and it's something humans are pretty good at.
Now, the results... well, they're a mixed bag. Some of these AI models did surprisingly well with certain reasoning problems. But performance varied wildly! It was like some students aced the test while others completely bombed it.
Here's a key takeaway:
"Most models either struggle to distinguish between defeasible and deductive inference or misinterpret generics as universal statements."
What does that mean in plain English? It means these AI models often have trouble understanding that some rules have exceptions. They might treat "Birds fly" as "ALL birds fly," which, as we know, isn't true. They struggle with nuance.
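If you like seeing things written down, here's a tiny, purely illustrative sketch of the difference between those two readings. This is nothing from the paper itself, and the exception list is completely made up by me:

```python
# Toy illustration (not from the paper) of reading "Birds fly" as a strict
# universal rule versus as a default rule that tolerates exceptions.

KNOWN_EXCEPTIONS = {"penguin", "ostrich", "kiwi"}  # hypothetical exception list

def universal_reading(bird: str) -> bool:
    """Reads 'Birds fly' as 'ALL birds fly' -- no exceptions allowed."""
    return True  # every bird is predicted to fly, penguins included

def defeasible_reading(bird: str) -> bool:
    """Reads 'Birds fly' as a default: assume flight unless we know otherwise."""
    return bird not in KNOWN_EXCEPTIONS

for bird in ["raven", "penguin"]:
    print(f"{bird} flies? universal: {universal_reading(bird)}, "
          f"defeasible: {defeasible_reading(bird)}")
```

The point of the toy example: the defeasible reading is the one humans actually use, and it's the one many of the models in the study seemed to miss.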
They also tried different "prompting styles," which is basically how they phrased the questions to the AI. "Few-shot prompting," which is like giving the AI a few examples to learn from, helped a little. But "chain-of-thought prompting," where the AI is asked to explain its reasoning step-by-step, actually made things worse in some cases! It's like overthinking the problem and getting confused.
Imagine trying to explain to someone how to ride a bike. Sometimes, the more you explain, the more confusing it becomes!
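For anyone curious what those prompting styles actually look like in practice, here's a rough, hand-written sketch. These are illustrative prompts I made up for this episode, not the ones the researchers used:

```python
# Hypothetical examples of the two prompting styles discussed above.
# The exact wording and scenarios in the paper may differ.

question = ("Ravens are black. Tweety is a raven that has been bleached white. "
            "Is Tweety black?")

# Few-shot prompting: show the model a couple of worked examples first.
few_shot_prompt = (
    "Q: Birds fly. Pingu is a penguin. Does Pingu fly?\n"
    "A: No. 'Birds fly' is a generalization with exceptions, and penguins are one.\n\n"
    "Q: Coffee is hot. This cup is iced coffee. Is it hot?\n"
    "A: No. The generic statement allows exceptions, like iced coffee.\n\n"
    f"Q: {question}\nA:"
)

# Chain-of-thought prompting: ask the model to reason step by step instead.
chain_of_thought_prompt = (
    f"Q: {question}\n"
    "Let's think step by step before giving a final yes or no answer.\nA:"
)

print(few_shot_prompt)
print(chain_of_thought_prompt)
```

Same question, two very different ways of asking it, and, according to the paper, only the first one reliably helped.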
So, why does this matter? Well, if we want AI to truly understand and interact with the world like we do, it needs to be able to handle these kinds of assumptions and exceptions. Think about AI being used in:
- Medical diagnosis: Doctors make assumptions based on symptoms, but they also know that there can be exceptions.
- Legal reasoning: Laws are often based on general principles, but lawyers need to be able to argue for exceptions.
- Everyday conversation: We rely on shared assumptions to understand each other. If AI can't do that, conversations can become frustrating and nonsensical.
This research shows that while AI has come a long way, it still has a ways to go when it comes to understanding the nuances of human reasoning. It highlights the gap between simply processing information and truly understanding it.
Here are a couple of things that I was thinking about after reading this paper, and I'd love to hear your thoughts:
- If chain-of-thought prompting hurt performance in some cases, what does that tell us about how AI actually "thinks" (or doesn't think!)? Are we anthropomorphizing these models too much?
- How can we design AI systems that are better at handling exceptions and uncertainty, instead of just relying on rigid rules? Could we teach them to be more like really good poker players?
That’s all for this episode, learning crew. Let me know your thoughts on this paper! Until next time!
Credit to Paper authors: James Ravi Kirkpatrick, Rachel Katharine Sterken