Dive into the fascinating mechanics behind artificial intelligence. Touch, play, and experiment with the building blocks that power modern AI.
Inspired by biology, an artificial neuron receives inputs, multiplies each by a weight, adds a bias, and passes the result through an activation function. Adjust the sliders below to see it in action.
Activation functions introduce non-linearity, allowing neural networks to learn complex patterns beyond simple linear relationships. Hover over the graphs to explore each function.
Squashes values between 0 and 1. Historically popular for binary classification, but suffers from vanishing gradients at extreme values.
The most widely used activation today. Computationally efficient and avoids vanishing gradients for positive values. Neurons can "die" if they get stuck outputting zero, where the gradient is also zero, so they stop updating.
Outputs between -1 and 1, zero-centered unlike sigmoid. Often used in recurrent networks. Still suffers from vanishing gradients at saturation.
Fixes the "dying ReLU" problem by allowing a small negative slope. Ensures neurons always have a non-zero gradient, keeping them alive during training.
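For reference, all four activations are one-liners. The 0.01 slope for Leaky ReLU is a common default, not the only choice:

```python
import math

def sigmoid(x):   # squashes to (0, 1); saturates at extreme values
    return 1 / (1 + math.exp(-x))

def tanh(x):      # zero-centered, squashes to (-1, 1)
    return math.tanh(x)

def relu(x):      # passes positives through, zeroes out negatives
    return max(0.0, x)

def leaky_relu(x, slope=0.01):  # small negative slope keeps gradients alive
    return x if x > 0 else slope * x
```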
Watch a neural network learn in real time. Click "Train" to see how the network adjusts its weights to classify the data points. Blue and orange represent two different classes.
Backpropagation is the engine of learning. It calculates how much each weight contributed to the error, then adjusts them to reduce it. Click each step to see the process animated.
Input data flows through the network, layer by layer. Each neuron computes its weighted sum and applies the activation function, producing an output prediction.
The network's prediction is compared to the true answer using a loss function (like mean squared error). This quantifies how wrong the network is.
Using the chain rule of calculus, gradients flow backward through the network. Each weight learns its share of responsibility for the error.
Weights are nudged in the direction that reduces the loss, proportional to the learning rate. This cycle repeats thousands of times until the network converges.
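The four steps above condense into a toy training loop: a single weight learning to fit y = 2x. The data set, learning rate, and epoch count are illustrative:

```python
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # samples of y = 2x
w, lr = 0.0, 0.05

for epoch in range(200):
    for x, y_true in data:
        y_pred = w * x                    # 1. forward pass
        loss = (y_pred - y_true) ** 2     # 2. loss (squared error)
        grad = 2 * (y_pred - y_true) * x  # 3. backward pass (chain rule)
        w -= lr * grad                    # 4. weight update

# w converges toward 2.0, the slope that generated the data
```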
Click on the landscape to place a ball and watch it roll downhill toward the minimum. Gradient descent works the same way — always moving in the direction of steepest descent. Adjust the learning rate to see its effect.
Imagine standing on a mountain in thick fog. You can only feel the slope under your feet. That is gradient descent: at each step, measure the local slope (the gradient) and move downhill.
Too small: The ball barely moves. Training takes forever.
Too large: The ball overshoots, bouncing past the minimum.
Just right: Smooth convergence to the lowest point.
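A one-dimensional sketch, minimizing f(w) = w², makes the three regimes concrete. The specific learning-rate values are illustrative:

```python
def descend(lr, steps=20, w=10.0):
    """Run gradient descent on f(w) = w^2, whose minimum is at w = 0."""
    for _ in range(steps):
        w -= lr * 2 * w   # gradient of w^2 is 2w
    return w

small = descend(lr=0.01)   # too small: barely moves from the start
right = descend(lr=0.3)    # just right: converges to ~0
large = descend(lr=1.1)    # too large: each step overshoots; diverges
```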
Different problems demand different structures. Here are the architectures that shaped modern AI.
The simplest architecture: data flows in one direction from input to output. Each layer transforms the data, extracting increasingly abstract features. The building block for all others.
Applies sliding filters across input data to detect spatial patterns. Convolution layers learn to recognize edges, textures, and objects. Revolutionized computer vision and image understanding.
Processes sequences by maintaining hidden state — a form of memory. LSTMs add gating mechanisms to control information flow, solving the vanishing gradient problem in long sequences.
Uses self-attention to weigh relationships between all elements simultaneously, regardless of distance. No recurrence needed. Powers GPT, BERT, and virtually all modern large language models.
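The core of self-attention fits in plain Python: every query scores every key, and the softmaxed scores weight the values. This is a bare-bones sketch with no batching, masking, or learned projections, not a production implementation:

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention: each query attends to every key,
    and the output is a softmax-weighted mix of the values."""
    d = len(keys[0])
    out = []
    for q in queries:
        # score each key by dot product, scaled by sqrt(dimension)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)                            # numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # mix the value vectors by attention weight
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

Because every query scores every key in one pass, nothing recurs step by step, which is what makes the computation so parallelizable.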
From mathematical curiosity to world-changing technology — the key milestones in neural network history.
Warren McCulloch and Walter Pitts create the first mathematical model of an artificial neuron, proving that simple units could compute logical functions.
Frank Rosenblatt builds the Mark I Perceptron, the first hardware implementation. The New York Times calls it the embryo of a computer that "will be able to walk, talk, see, write, reproduce itself."
Rumelhart, Hinton, and Williams popularize the backpropagation algorithm, finally making it practical to train multi-layer networks.
Alex Krizhevsky's deep CNN crushes the ImageNet competition, cutting the previous best error rate nearly in half. The deep learning revolution begins. GPUs become essential.
Google researchers introduce the Transformer architecture. Self-attention replaces recurrence, enabling massive parallelism and spawning GPT, BERT, and the era of large language models.
Models with billions of parameters demonstrate emergent abilities: reasoning, code generation, creative writing. AI becomes a general-purpose tool reshaping every industry.