Move 37: How DeepMind's AlphaGo Played the Most Beautiful Move in 3,000 Years — And Made the Greatest Go Player Alive Quit
March 9, 2016. Seoul. Lee Sedol, the Roger Federer of Go, sits across from a machine. 200 million people are watching. Then, in Game 2, AlphaGo plays Move 37 — a move so alien, so beautiful, so impossible that commentators thought it was a mistake. It wasn't.
The Impossible Match
March 9, 2016. The Four Seasons Hotel in Seoul, South Korea.
Lee Sedol — 33 years old, 18-time international champion, considered by many the greatest Go player of his generation — sits at a wooden board across from... a monitor. Behind that screen: AlphaGo, a machine built by a London-based AI lab that Google bought for $500 million.
200 million people are watching live across China, Japan, and Korea. Go isn't just a game in East Asia — it's 3,000 years of cultural heritage, considered the pinnacle of human intuition and creativity. Chess has 10^47 possible positions. Go has 10^170 — more than the number of atoms in the observable universe.
Every expert said AI was at least 10 years away from beating a top professional.
AlphaGo would win 4 games to 1.
But the score doesn't tell the story. What happened in Game 2, Move 37, would haunt Lee Sedol for the rest of his career — and prove that AI could discover strategies humans hadn't found in three millennia of playing.
The Problem That Broke Brute Force
Go looks deceptively simple. Two players. Black and white stones. A 19×19 grid. Surround territory, capture stones, control the board.
But the simplicity is a trap.
In chess, Deep Blue could beat Kasparov in 1997 through pure brute force — searching some 200 million positions per second, many moves deep, and picking the best line it found. Chess has about 35 legal moves per turn. Go has about 250. The branching factor explodes so fast that even if you had a computer the size of the universe, you couldn't brute-force your way through a single game.
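You can run that comparison yourself. This is a back-of-the-envelope sketch: the branching factors and game lengths are the commonly cited averages, not exact counts.

```python
import math

# Rough game-tree size: (average legal moves per turn) ^ (average game length).
# These are ballpark figures, not exact counts.
chess_tree_log10 = 80 * math.log10(35)    # ~80 plies per chess game, ~35 options each
go_tree_log10 = 150 * math.log10(250)     # ~150 moves per Go game, ~250 options each

print(f"Chess game tree: ~10^{chess_tree_log10:.0f}")  # ~10^124
print(f"Go game tree:    ~10^{go_tree_log10:.0f}")     # ~10^360
```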
Go requires intuition — the ability to look at a board and feel which areas are important, which stones are strong, which groups are alive or dead. Professionals describe it as reading the "shape" and "flow" of the game. It's pattern recognition at a level that seemed fundamentally human.
That's why, when the European champion Fan Hui lost 5-0 to AlphaGo in October 2015, the Go world dismissed it. Fan Hui was a 2-dan professional (strong, but far from elite). Lee Sedol was 9-dan — the highest rank. The gap between them was like comparing a college basketball player to LeBron James.
Lee Sedol accepted the challenge. "I will win 5-0, or maybe lose one game," he told reporters.
He had no idea what was coming.
The Architecture of Intuition
Inside DeepMind's London office, a team led by David Silver and Demis Hassabis had spent two years building something unprecedented.
AlphaGo wasn't one algorithm. It was a hybrid system combining two breakthrough ideas:
1. Deep Neural Networks for Pattern Recognition
AlphaGo used two neural networks:
- Policy Network: Trained on 30 million positions from human expert games, it learned to predict "what would a strong human play here?" Given any board position, it outputs a probability distribution over all possible moves. This narrows the search space from 250 moves to maybe 5-10 promising candidates.
- Value Network: Trained to evaluate "who's winning?" from any board position. In Go, you can't just count material like chess pieces — you need to judge territory, influence, and potential. The value network learned to estimate the probability of winning from a position, replacing the need to search all the way to the endgame.
Both were deep convolutional neural networks (about 13 layers each) that learned to recognize spatial patterns on the board — thick groups, weak points, good shape, overextension.
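To make the two-network idea concrete, here's a minimal sketch in PyTorch. The layer counts, channel widths, and input planes are illustrative stand-ins, not the exact architecture from DeepMind's Nature paper.

```python
import torch
import torch.nn as nn

BOARD = 19  # 19x19 Go board, 361 points

def conv_body(in_planes, channels, layers):
    """A small stack of 3x3 convolutions that reads board patterns."""
    mods = [nn.Conv2d(in_planes, channels, 3, padding=1), nn.ReLU()]
    for _ in range(layers - 1):
        mods += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()]
    return nn.Sequential(*mods)

class PolicyNet(nn.Module):
    """Board position -> probability distribution over the 361 points."""
    def __init__(self, in_planes=17, channels=64, layers=6):
        super().__init__()
        self.body = conv_body(in_planes, channels, layers)
        self.head = nn.Conv2d(channels, 1, 1)  # one logit per board point

    def forward(self, x):                       # x: (batch, planes, 19, 19)
        logits = self.head(self.body(x)).flatten(1)  # (batch, 361)
        return torch.softmax(logits, dim=1)     # "what would a strong player do?"

class ValueNet(nn.Module):
    """Board position -> estimated game outcome in [-1, 1]."""
    def __init__(self, in_planes=17, channels=64, layers=6):
        super().__init__()
        self.body = conv_body(in_planes, channels, layers)
        self.head = nn.Linear(channels * BOARD * BOARD, 1)

    def forward(self, x):
        h = self.body(x).flatten(1)
        return torch.tanh(self.head(h)).squeeze(1)  # "who's winning?"
```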
2. Monte Carlo Tree Search (MCTS) for Strategic Planning
But neural networks alone aren't enough. They tell you what looks good, not what is good.
AlphaGo combined the networks with MCTS — a probabilistic search algorithm that:
- Builds a tree of possible future moves
- Uses the policy network to decide which branches to explore
- Uses the value network to evaluate positions without simulating to the end
- Runs thousands of fast "rollout" simulations to estimate outcomes
- Backs up the results to guide future searches
The magic was in the combination. The neural networks provided human-like intuition. MCTS provided lookahead and verification. Together, they could evaluate positions and plan strategies at a level that felt... alien.
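A stripped-down sketch of that search loop, assuming hypothetical hooks: `policy_fn(state)` returns a dict of move priors, `value_fn(state)` returns a win estimate, and `apply_fn(state, move)` returns the next state. (The fast rollout simulations the real AlphaGo blended in are omitted for brevity.)

```python
import math

class Node:
    """One position in the search tree."""
    def __init__(self, prior):
        self.prior = prior      # move probability from the policy network
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}      # move -> Node

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    """PUCT rule: balance the value network's verdict (exploitation)
    against the policy network's prior (exploration)."""
    total = sum(c.visits for c in node.children.values())
    def score(child):
        explore = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.value() + explore
    return max(node.children.items(), key=lambda kv: score(kv[1]))

def search(root, root_state, policy_fn, value_fn, apply_fn, n_sims=800):
    """Run n_sims simulations; return visit counts for each root move."""
    for _ in range(n_sims):
        node, state, path = root, root_state, [root]
        # 1. SELECT: descend the tree with PUCT until reaching a leaf.
        while node.children:
            move, node = select_child(node)
            state = apply_fn(state, move)
            path.append(node)
        # 2. EXPAND: the policy network proposes priors for legal moves.
        for move, prior in policy_fn(state).items():
            node.children[move] = Node(prior)
        # 3. EVALUATE: the value network scores the leaf directly,
        #    instead of simulating all the way to the endgame.
        value = value_fn(state)
        # 4. BACKUP: credit the result to every node on the path,
        #    flipping the sign because the players alternate.
        for n in reversed(path):
            n.visits += 1
            n.value_sum += value
            value = -value
    return {move: child.visits for move, child in root.children.items()}
```

The most-visited root move becomes AlphaGo's play: the search effectively verifies what the networks' intuition proposed.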
Reinforcement Learning: The Part That Changes Everything
But here's where it gets wild.
After training on human games, AlphaGo played itself — millions of times. This is reinforcement learning: the system plays, evaluates which moves led to wins, and updates its networks to favor those moves.
It started by imitating humans. Then it began discovering moves humans never played.
By the time it faced Lee Sedol, AlphaGo had played more games than every human in history combined. It had explored regions of the strategy space that humans, constrained by tradition and intuition, had never touched.
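The core of that self-play loop is a policy-gradient update. Here's a minimal sketch reusing the illustrative PolicyNet above; the tensor shapes and the self-play bookkeeping around it are my assumptions, not DeepMind's training code.

```python
import torch

def reinforce_step(policy_net, optimizer, states, moves, outcome):
    """One policy-gradient update from a finished self-play game.

    states:  (T, planes, 19, 19) positions this player faced
    moves:   (T,) long tensor of the move indices it actually chose
    outcome: +1.0 if this player won the game, -1.0 if it lost
    """
    probs = policy_net(states)                               # (T, 361)
    chosen = probs.gather(1, moves.unsqueeze(1)).squeeze(1)  # prob of each chosen move
    # Reinforce moves from winning games; suppress moves from losing ones.
    loss = -(outcome * torch.log(chosen + 1e-8)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```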
Demis Hassabis, DeepMind's co-founder (a chess prodigy at 13, a professional game designer at 17, later a PhD in cognitive neuroscience), described it as "AI discovering new knowledge."
Lee Sedol was about to experience that discovery firsthand.
Move 37
Game 2. March 10, 2016.
AlphaGo won Game 1. Lee Sedol played aggressively, probed for weaknesses, and found none. The human world held its breath.
Game 2 started normally. Lee Sedol played black. AlphaGo played white. The opening followed known patterns.
Then, on move 37, AlphaGo placed a stone on the fifth line from the edge.
The room went silent.
In Go, an early move on the fifth line is considered inefficient — too far from the edge to secure territory, too far from supporting stones to build influence safely. Conventional wisdom says a shoulder hit like Move 37 belongs on the fourth line, never the fifth.
Fan Hui, watching in the commentary room, looked confused.
Michael Redmond, a 9-dan professional commentator, said on the live stream: "I thought it was a mistake. It's a very surprising move."
The move looked wrong. It violated fundamental principles taught to every beginner. AlphaGo's own policy network estimated the probability that a human professional would play that move at 1 in 10,000. Even by the standards of the human games it had studied, the move was alien. It played it anyway.
Lee Sedol paused. He left the board and took a 15-minute break (unusual for him). When he returned, his hands were shaking slightly.
He played his response. The game continued.
Twenty moves later, the professional commentators realized: Move 37 was genius.
It wasn't a mistake. It was a move that set up a strategic framework 50 moves deep — a framework that no human had conceived in 3,000 years of Go. The stone on the fifth line became the lynchpin of an attack that slowly, inexorably, dismantled Lee Sedol's position.
AlphaGo won Game 2.
And Game 3.
And Game 4.
Move 78: The Human Strikes Back
In Game 4, something remarkable happened.
Down 3-0, Lee Sedol played Move 78 — a "wedge" move so brilliant, so unexpected, that AlphaGo's win probability estimate dropped from 70% to 30% in a single move. AlphaGo started making mistakes. Lee Sedol won.
After the match, DeepMind's team analyzed the game. AlphaGo had a bug — a blind spot in its training. Lee Sedol found it.
One human move. One bug. One win.
It was the last time a human would ever beat AlphaGo.
The Retirement
AlphaGo won the match 4-1.
Lee Sedol gave an interview afterward, visibly shaken: "I wanted to play the perfect game... I wanted to prove that AlphaGo was not perfect. But I failed. AlphaGo played so perfectly."
Three years later, in November 2019, Lee Sedol retired from professional Go at age 36 — young for a top player. His reason?
"With the debut of AI in Go games, I've realized that I'm not at the top even if I become the number one. There is an entity that cannot be defeated."
Read that again. The greatest Go player of his generation quit because the ceiling of the game had moved beyond human reach.
AlphaGo Zero: When the Student Surpasses the Teacher
But DeepMind wasn't done.
In October 2017, they published a paper on AlphaGo Zero — a version that learned Go from scratch. No human games. No human knowledge. Just the rules of Go and self-play.
AlphaGo Zero:
- Trained for 3 days (vs. AlphaGo's months of training on human games)
- Beat the original AlphaGo 100 games to 0
- Discovered all known human strategies independently... plus new ones
It proved that human knowledge wasn't a shortcut — it was a bottleneck. By starting from zero, the AI could explore the strategy space without human biases.
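Concretely, the Zero recipe trains a single two-headed network on nothing but self-play: the policy head is pulled toward the move distribution the search actually produced, and the value head toward the game's final outcome. The published loss is simple enough to sketch directly (function and variable names here are mine):

```python
import torch
import torch.nn.functional as F

def zero_loss(policy_logits, value_pred, mcts_policy, outcome):
    """AlphaGo Zero's published objective: (z - v)^2 - pi^T log p.
    (The paper's L2 regularization term is applied separately,
    e.g. via the optimizer's weight decay.)

    policy_logits: (batch, 362) raw policy-head outputs (361 points + pass)
    value_pred:    (batch,)     value-head outputs v in [-1, 1]
    mcts_policy:   (batch, 362) visit-count distribution pi from the search
    outcome:       (batch,)     final game result z, +1 or -1
    """
    value_loss = F.mse_loss(value_pred, outcome)
    policy_loss = -(mcts_policy * F.log_softmax(policy_logits, dim=1)).sum(dim=1).mean()
    return value_loss + policy_loss
```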
Then came AlphaZero (generalized to chess and shogi) and MuZero (which mastered games without even being given the rules, learning its own internal model through play).
The architecture that DeepMind pioneered — combining deep learning with tree search and reinforcement learning — became the foundation for systems far beyond games.
AlphaFold: From Games to Saving Lives
In 2020, DeepMind turned the same approach to a 50-year-old problem in biology: protein folding.
Proteins are chains of amino acids that fold into 3D shapes. The shape determines function. Predicting the shape from the amino acid sequence is crucial for drug design, disease research, and understanding life itself.
Traditional methods took months of lab work and supercomputer time per protein.
AlphaFold 2 solved it. It predicted protein structures with atomic-level accuracy in hours. The scientific community called it a breakthrough of Nobel Prize magnitude (and indeed, Demis Hassabis and AlphaFold lead John Jumper shared the 2024 Nobel Prize in Chemistry for it).
DeepMind open-sourced the predicted structures of 200 million proteins — essentially every known protein in biology.
From Move 37 in Go to the structure of every protein in the human body. The same core idea: neural networks for intuition, search for verification, self-play for discovery.
The DeepMind Way vs. The OpenAI Way
Here's what makes DeepMind's approach different from the "scale is all you need" philosophy that dominates modern AI:
DeepMind:
- Reinforcement learning + search + self-play
- Agent learns by doing (playing games, folding proteins, controlling robots)
- Focused on generalization and reasoning
- Sample-efficient (AlphaGo Zero trained in days, not months)
OpenAI (GPT-style):
- Pure next-token prediction at massive scale
- Model learns by reading the internet
- Focused on breadth and linguistic fluency
- Requires enormous data and compute
Both work. But DeepMind's approach feels closer to how humans learn — through interaction, feedback, and experimentation.
When DeepMind merged with Google Brain in 2023 to form Google DeepMind, the bet was clear: the future of AI isn't just scaling transformers. It's combining language models with reasoning, search, and reinforcement learning.
Move 37 wasn't just a Go move. It was a preview.
The Legacy
Go masters in East Asia now study AlphaGo's games the way chess players study Deep Blue and Stockfish. Move 37 appears in textbooks. Professional players have adopted strategies that AlphaGo invented.
The game evolved.
But the real legacy isn't about Go. It's about what AlphaGo proved:
AI doesn't just mimic human thinking. It can discover knowledge humans haven't found in thousands of years of trying.
Lee Sedol, in a later interview, reflected on Move 37:
"I thought AlphaGo was based on probability calculation, and that it was merely a machine. But when I saw this move, I changed my mind. Surely, AlphaGo is creative."
Creative. That's the word he used. Not "computational." Not "algorithmic." Creative.
On March 10, 2016, in a hotel in Seoul, a machine played a move that made 200 million people hold their breath.
And the greatest Go player alive realized he was watching something think in ways humans can't.
Three thousand years of human intuition. Ten to the power of 170 possible board positions.
One move that changed everything.