The Secrets Behind Large Language Models: How Does AI "Understand" Our Words?
March 01, 2025
By
Everawe Labs
Recently, while chatting with clients about AI, I noticed their curiosity was off the charts, especially about tools like ChatGPT, which can chat, write articles, and even help debug code. Yet many of them stumble over the same question: "How does it really understand what we’re saying?" Today, let’s break this down and explore the secrets behind how AI "understands" language.



From Cave Paintings to Modern AI
Around 40,000 years ago, early humans left images and symbols in caves across Europe. Although there were no written words, these paintings likely recorded life and communicated information. For example, the paintings in the Lascaux and Altamira caves show bison, horses, deer, and other animals, possibly related to hunting or seasonal changes. Although we don’t have direct translations, archaeologists have inferred their meanings through patterns and context. Just as cave painters used symbols to document their lives, AI learns patterns in language through data.
Turning Language into Numbers
Humans speak with our voices, ears, and eyes, but computers only understand 0 and 1. So how can they communicate with us? The first step is to “translate” language into numbers. Early on, this method was pretty straightforward: assign a number to each word. For example, “apple” is 1, “banana” is 2, and “orange” is 3. This way, the computer can recognize words using numbers. But here's the problem: the computer has no idea that "apple," "banana," and "orange" are all fruits. To it, 1, 2, and 3 are just three unrelated numbers, with no sense of “family” between them.
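The limitation of that early approach can be shown in a few lines. This is a toy sketch with made-up IDs, not any real encoding scheme:

```python
# Hypothetical word-to-ID table, as in early encoding schemes.
vocab = {"apple": 1, "banana": 2, "orange": 3, "rocket": 4}

# The IDs carry no meaning: "apple" (1) is numerically closer to
# "banana" (2) than to "orange" (3), and the distance to "rocket"
# says nothing about whether they are related.
print(abs(vocab["apple"] - vocab["banana"]))  # 1
print(abs(vocab["apple"] - vocab["rocket"]))  # 3
```

The numbers order the words, but the ordering is arbitrary: no notion of "fruit" exists anywhere in the table.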
In 2013, a new technique emerged called “word embedding.” This time, instead of just slapping a label on each word, we give each word a string of numbers: a unique "digital portrait." What’s cool about this is that words with similar meanings have similar portraits. Imagine all the words laid out in a big square: “violin” and “double bass” are close together, since both are bowed string instruments playing the same type of music; “violin” and “guitar” are also nearby, since both have strings, though one is bowed and the other plucked; but “violin” and “piano” are farther apart, since the piano’s strings are struck with hammers. This method lets the computer grasp the “family relationships” between words instead of guessing at random. Even cooler, you can do math with these “digital portraits.” For example: “uncle” minus “man” plus “woman” equals “aunt.” It’s as if the computer really understands the gender relationship between “uncle” and “aunt”! This was the first time AI showed any "understanding" of word meanings.
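The "uncle minus man plus woman" trick can be reproduced with tiny hand-made vectors. The 3-dimensional "portraits" below are invented for illustration; real embeddings have hundreds of dimensions learned from data:

```python
import math

# Hand-made toy embeddings (hypothetical values, chosen for illustration).
emb = {
    "uncle": [0.9, 0.8, 0.1],
    "man":   [0.9, 0.1, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "aunt":  [0.1, 0.8, 0.9],
    "apple": [0.5, 0.5, 0.5],
}

def cosine(a, b):
    """Cosine similarity: how closely two 'portraits' point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# "uncle" - "man" + "woman": do the arithmetic component by component.
query = [u - m + w for u, m, w in zip(emb["uncle"], emb["man"], emb["woman"])]

# Find the closest remaining word to the result.
best = max((w for w in emb if w not in {"uncle", "man", "woman"}),
           key=lambda w: cosine(query, emb[w]))
print(best)  # "aunt" with these toy vectors
```

With these particular vectors the arithmetic lands exactly on "aunt"; real embeddings only land *near* the right word, which is why the nearest-neighbour search matters.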
Language Prediction and Generation
With words now digitized, what does AI do next? It’s pretty simple: predict the next word. Just like when we play word association, if I say, “I went to the park yesterday,” you’ll likely follow up with “for a walk” or “to walk the dog,” but not “to fly a plane.” Why? Because your brain knows the grammar, the common sense, and the context. AI works the same way. It’s read tons of articles and has figured out the “social circles” of words. After the word “eat,” it often sees words like “food” or “bowl,” but rarely “rocket.” AI uses this trick to chat, write, and tell stories. It predicts one word, then uses that word to predict the next, building a sentence like a snowball rolling down a hill. But can just guessing words like this lead to writing a PhD thesis or a heartfelt poem?
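The "social circles of words" idea can be sketched with simple pair counts. This toy bigram model (with a made-up mini-corpus) is a drastic simplification of what a real language model learns, but the principle is the same:

```python
from collections import Counter, defaultdict

# Tiny hypothetical corpus, echoing the examples in the text.
corpus = (
    "i went to the park to walk the dog . "
    "i went to the park for a walk . "
    "we eat food from a bowl . "
    "we eat food at the park ."
).split()

# Count which word follows which: the "social circle" of each word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict(word):
    """Return the word most often seen after `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

print(predict("eat"))   # "food" — seen after "eat" every time
print(predict("went"))  # "to"
```

Chaining `predict` on its own output is exactly the "snowball" in the text: each predicted word becomes the context for the next prediction.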
The real breakthrough came in 2017, when Google introduced the Transformer model, which gave AI a much stronger "focus" ability. The “attention mechanism” in the Transformer allows AI to quickly zoom in on key information from a massive amount of data. This helps AI understand complex sentence structures, identify “who” did “what,” and organize the logic and flow of longer paragraphs. It sounds technical, but the idea actually comes from our reading habits. When you read an article, don’t you automatically skip the boring parts and go straight to the important stuff? For example, if I say, “Lily left the umbrella in the car, and then she went to buy bubble tea,” you immediately know that “she” refers to Lily, not the umbrella. That’s “attention.” What’s even more impressive is that the Transformer uses “multi-head” attention, like having several pairs of eyes, so it can focus on several things at once: word relationships, sentence logic, and even the larger context of the paragraph. This allows AI to generate stories that aren’t just a jumble of random words, but something with structure and meaning. This revolutionary technology laid the foundation for GPT (Generative Pretrained Transformer) models and completely changed the way AI communicates with humans.
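The core of the mechanism, scaled dot-product attention, fits in a few lines. The vectors below are hand-picked stand-ins for "Lily," "umbrella," and "she"; in a real Transformer they are learned, not chosen by hand:

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Weight each value by how well its key matches the query."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return weights, context

# Resolving "she" in "Lily left the umbrella ... she went ...":
# the query for "she" matches the key for "Lily" far better than
# the key for "umbrella". (All vectors here are hypothetical.)
keys   = [[1.0, 0.0], [0.0, 1.0]]   # "Lily", "umbrella"
values = [[1.0, 0.0], [0.0, 1.0]]
query  = [1.0, 0.1]                 # representation of "she"

weights, context = attention(query, keys, values)
print(weights)  # "Lily" gets the larger weight
```

Multi-head attention simply runs several copies of this computation in parallel, each with its own learned projections, and concatenates the results.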
Does AI Really "Understand"?
So where does AI’s “knowledge” come from? It learns from massive amounts of text data. The way AI learns is like a “fill-in-the-blank” game. For example, take the sentence “Football is Brazil’s passionate ___” and ask the AI which word should fill the blank. By continuously training and adjusting, AI learns the patterns of language. This process helps it predict which word combinations are likely and which words are more likely to appear in specific contexts. However, AI doesn’t understand the actual meaning of these concepts. It’s never played football, so it can’t feel the excitement of a game or the thrill of fans in the streets. It simply guesses and generates based on patterns in the data, lacking real emotion or experience. It’s like a super-efficient copy machine: able to reproduce an ancient poem perfectly, but never truly understanding the poet’s passion. This explains some of AI’s typical failure modes: it sometimes makes things up, lacks common sense, or produces flat, uninspired writing.
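The fill-in-the-blank signal boils down to estimating how probable each candidate word is in a given context. Here is a minimal count-based sketch using a hypothetical three-sentence corpus; real models learn these probabilities with neural networks over billions of sentences:

```python
from collections import Counter

# Toy corpus (hypothetical sentences, echoing the example in the text).
corpus = [
    "football is brazil's passionate belief",
    "samba is brazil's passionate rhythm",
    "football is argentina's passionate belief",
]

pair_counts = Counter()     # how often word B follows word A
context_counts = Counter()  # how often word A appears as context
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        pair_counts[(prev, nxt)] += 1
        context_counts[prev] += 1

def prob(prev, word):
    """Estimated P(word | prev) from raw counts in the toy corpus."""
    return pair_counts[(prev, word)] / context_counts[prev]

print(prob("passionate", "belief"))  # 2 of 3 occurrences → ~0.67
print(prob("passionate", "rocket"))  # never seen → 0.0
```

Training a real model amounts to nudging its parameters until its probability estimates agree with counts like these across an enormous corpus; nothing in that process requires knowing what football feels like.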
————
Currently, AI is advancing in more complex directions. Technologies like multimodal learning, causal reasoning, embodied AI, and affective computing are being explored in an effort to evolve AI from a simple “parrot” into a more perceptive and emotionally aware partner. However, even with these advancements, AI still faces huge challenges. Human “understanding” isn’t just about processing information; it’s deeply tied to consciousness, emotions, and self-awareness. Philosopher John Searle’s famous “Chinese Room” thought experiment illustrates this point: a person who doesn’t understand Chinese, locked in a room with a rulebook, can match incoming Chinese questions to appropriate Chinese answers. From outside it looks like understanding, but in reality it’s just following rules. AI is a bit like this “Chinese Room”: no matter how smart it gets, it’s still just “following the rulebook.” It’s pretty fascinating: the more AI tries to “mimic” us, the clearer it becomes how unique we are. Like a mirror, AI reflects our wisdom but reminds us that human understanding isn’t just data and algorithms; it’s the symphony of heartbeats, tears, and laughter.
Fast Take
Ever wondered how AI really "gets" what we're saying? The answer might surprise you. Dive into the world of language models and uncover the secrets behind how machines learn to communicate like humans—without truly understanding a thing. Ready to see how deep this rabbit hole goes?