The test sentence (neutral, fresh) Let’s take a new sentence, so readers can see the idea generalizes: “Emma thanked David politely.” We will follow one token’s journey (say, “thanked”) through the Transformer. Step 0 — Tokenization (splitting the input) For simplicity (as in the blogs), we treat words as tokens: [Emma] [thanked] [David]…
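The Step 0 split above can be sketched in a few lines of Python. This is a toy word-level tokenizer (real Transformers use subword schemes like BPE or WordPiece), and the vocabulary is invented here for illustration:

```python
# Simplified word-level tokenization, as in the excerpt's example.
# Real models use subword tokenizers (BPE, WordPiece); this is a sketch.
sentence = "Emma thanked David politely."

# Strip punctuation and split on whitespace.
tokens = [w.strip(".,!?") for w in sentence.split()]
print(tokens)  # ['Emma', 'thanked', 'David', 'politely']

# Map each token to an integer ID using a toy, made-up vocabulary.
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
token_ids = [vocab[t] for t in tokens]
print(token_ids)
```

These integer IDs are what the embedding step in the next stage actually looks up.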
Tag: AIBasics
Attention at a Networking Event — Blog 5
No Cheating, Making Choices, and Saying the Next Word (Masking + Output Probabilities) This is the finale of our networking event. By now, every guest has played their part, and the room is alive with understanding. But understanding alone is not enough. A language model must do one very specific thing: Say the next word….
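The "no cheating" masking and the next-word probabilities this finale describes can be sketched together in pure Python. The attention scores below are made-up numbers, not outputs from a real model:

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Causal mask: position i may only attend to positions <= i ("no cheating").
# Masked-out scores become -inf, so softmax gives them probability 0.
scores = [2.0, 1.0, 0.5]       # toy attention scores for 3 positions
mask = [True, True, False]     # position 2 is "the future" here
masked = [s if m else float("-inf") for s, m in zip(scores, mask)]
probs = softmax(masked)
print(probs)  # the future position gets probability 0.0
```

The same softmax, applied to the model's final scores over the vocabulary, is what turns "understanding" into an actual next-word choice.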
Attention at a Networking Event — Blog 3
When the Mixer Finally Comes Alive (Self-Attention: Q, K, V) Until now, our networking event has been adorable but slightly awkward. Everyone is standing politely with their profiles (embeddings) and seat numbers (positional encodings), yet nobody has spoken to anyone. It’s like watching five well-dressed introverts circulating air. But language…
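The Q, K, V conversation this post introduces boils down to scaled dot-product attention. Here is a minimal pure-Python sketch with toy 2-D vectors; all the numbers are invented for illustration:

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of small vectors (a sketch)."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # 1. Score each key against this query (dot product, scaled by sqrt(d)).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        # 2. Softmax the scores into attention weights.
        exps = [math.exp(s - max(scores)) for s in scores]
        weights = [e / sum(exps) for e in exps]
        # 3. Blend the values using those weights.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Three "guests", each with a toy 2-D query/key/value.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = attention(Q, K, V)
print(out)  # each row is a weighted blend of the value vectors
```

Each guest's output is not their own profile any more: it is a mixture of everyone's values, weighted by how well the keys answered that guest's query.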
Attention at a Networking Event — Blog 2
Seat Numbers at the Mixer (Positional Information) In Blog 1, we got our guests into the room and gave them name tags (token IDs) plus mini personality profiles (embeddings). Everyone is officially “representable in numbers.” Nice. But there’s one awkward rule in the Transformer’s party hall: It has no built-in…
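The "seat numbers" in this post are positional encodings. A minimal sketch of the sinusoidal scheme from the original Transformer paper follows; the dimension count and positions are toy values:

```python
import math

def positional_encoding(position, d_model):
    """Sinusoidal 'seat number' for one position (original Transformer scheme)."""
    pe = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        pe.append(math.sin(angle))        # even dimension
        if i + 1 < d_model:
            pe.append(math.cos(angle))    # odd dimension
    return pe

# Seat numbers for the first three guests, 4 dimensions each.
for pos in range(3):
    print(pos, [round(x, 3) for x in positional_encoding(pos, 4)])
```

These vectors are simply added to the embeddings, so each guest carries both a personality profile and a seat number in one set of numbers.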
Part 4 – When the Model Starts Cheating: Overfitting, Underfitting, and Taming the Network
By now, our little exam predictor has grown up quite a bit. At this point, the network looks smart on paper. But now we hit a very human problem: The model can become that kid who memorises last year’s question paper perfectly…and still flops in the real exam. This is…
Part 3 – Activation Functions: Same Exam Story with Different Personalities
In Part 1 and Part 2, our little exam predictor learned how to adjust its knobs and walk downhill on the loss landscape. It became good at improving itself, but it still thought in straight lines. Real students, however, do not behave like straight lines. Too much study can hurt,…
Part 2 – How Neural Networks Actually Learn: Slopes, Steps, and Activation Drama
In Part 1 we built our tiny exam predictor. And we ended with this very important headache: “We know we must change the weights and bias to make the loss smaller. But which way should we change them, and by how much?” Today we answer exactly that. From Loss to Landscape:…
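The "which way, and by how much" question is answered by gradient descent. A one-weight toy sketch follows; the training example, starting weight, and learning rate are all invented for illustration:

```python
# Toy version of "walking downhill": one weight, squared-error loss.
# loss(w) = (w * x - y)**2  ->  d(loss)/dw = 2 * (w * x - y) * x
x, y = 2.0, 10.0   # one training example: input 2, target 10
w = 0.0            # start with a bad guess
lr = 0.05          # learning rate: how big each step is

for step in range(50):
    grad = 2 * (w * x - y) * x   # slope of the loss at the current w
    w -= lr * grad               # step in the downhill direction
print(round(w, 3))  # approaches 5.0, since 5.0 * 2 = 10
```

The gradient answers "which way" (its sign) and the learning rate answers "by how much" (the step size).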
The Exam Score Story: What a Neural Network Is Really Doing (Part 1)
Let’s start with a tiny drama. You are a teacher. You want to guess how well a student will score in an exam. You have a theory: You want a small system that takes these two numbers: and gives you one number: That is it. No robots, no brain scans, no mysterious…
Cross-Validation Without Tears: How Playground Rules Can Teach Your Model to Behave in the Real World
If you have ever seen the term cross-validation and felt your brain quietly pack its bags and leave, you are not alone. On paper it sounds very “Mathy”. In practice, it is just a disciplined way of asking: “Does my model still behave well when I show it slightly different slices…
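The "slightly different slices" idea is k-fold cross-validation: each fold takes one turn as the held-out set. A minimal library-free sketch, with a toy dataset and fold count chosen for illustration:

```python
# Minimal k-fold cross-validation split (a sketch, no libraries).
def k_fold_splits(data, k):
    """Yield (train, validation) slices: each fold gets one turn held out."""
    fold_size = len(data) // k
    for i in range(k):
        val = data[i * fold_size:(i + 1) * fold_size]
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        yield train, val

data = list(range(10))
for train, val in k_fold_splits(data, 5):
    print(val, "held out; trained on", len(train), "examples")
```

Every example is validated on exactly once, so the model's average score across folds is a fairer estimate of real-world behaviour than a single lucky split.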
When AI Speaks Its Mind: Understanding Verbalized Sampling
The New Chapter in Prompt Engineering Imagine asking a friend for advice. Instead of giving one fixed answer, they pause, think aloud, list a few possibilities, and even admit how sure they feel about each. That’s what a new prompting technique called Verbalized Sampling (VS) teaches AI to do —…
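A Verbalized Sampling-style prompt can be sketched as below. The wording is an illustration of the idea described here (ask for several answers, each with a stated confidence), not the exact template from the VS paper:

```python
# Illustrative Verbalized Sampling-style prompt. The wording below is a
# sketch of the technique, not an official template.
question = "What name should I give my coffee shop?"
prompt = (
    question + "\n"
    "Instead of one fixed answer, think aloud and list 3 different "
    "possible answers, and for each one say how confident you are "
    "(as a rough probability)."
)
print(prompt)
```

Sent to a chat model, a prompt like this nudges it away from one fixed answer and toward the pause-and-enumerate behaviour the post describes.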