Perceptrons and Neurons
The perceptron, invented by Frank Rosenblatt in 1958, is the simplest neural network—a single artificial neuron that makes decisions by weighing evidence. While limited on its own, the perceptron introduces concepts that underpin all modern deep learning: weighted inputs, activation functions, and learning through weight adjustment.
Understanding perceptrons provides the foundation for understanding neural networks. Every deep network, no matter how complex, is built from these simple computational units.
Biological Inspiration
Artificial neurons draw loose inspiration from biological neurons in the brain. A biological neuron receives signals through dendrites, processes them in the cell body, and transmits output through the axon to other neurons. The connection strength between neurons (synaptic strength) determines how much influence one neuron has on another.
Artificial neurons abstract this process. They receive numerical inputs, multiply each by a weight (analogous to synaptic strength), sum the results, and apply an activation function to produce output. Learning happens by adjusting the weights.
The analogy shouldn't be taken too literally—artificial neurons are mathematical functions, not accurate brain models. But the metaphor helps build intuition for how networks of simple units can produce complex behavior.
import numpy as np

# A biological neuron receives signals and fires if they exceed a threshold.
# An artificial neuron does something similar mathematically.
def simple_neuron(inputs, weights, threshold):
    """
    Simple threshold neuron.
    Fires (outputs 1) if the weighted sum reaches the threshold.
    """
    weighted_sum = np.dot(inputs, weights)
    if weighted_sum >= threshold:
        return 1  # Neuron fires
    else:
        return 0  # Neuron doesn't fire

# Example: neuron with 3 inputs
inputs = np.array([0.5, 0.3, 0.8])
weights = np.array([0.4, 0.6, 0.2])
threshold = 0.5

output = simple_neuron(inputs, weights, threshold)
print(f"Inputs: {inputs}")
print(f"Weights: {weights}")
print(f"Weighted sum: {np.dot(inputs, weights):.2f}")
print(f"Threshold: {threshold}")
print(f"Output: {output}")

The Perceptron Model
The perceptron takes multiple inputs, multiplies each by a weight, sums them, adds a bias term, and passes the result through a step function: the output is $y = 1$ if $\sum_i w_i x_i + b \geq 0$, and $y = 0$ otherwise.
The weights ($w_i$) determine the importance of each input. The bias ($b$) shifts the decision boundary—it's like having an input that's always 1 with its own weight.
The step function makes the output binary: either the neuron fires (1) or it doesn't (0).
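To see why the bias acts like a weight on an always-on input, here is a quick check (a minimal sketch with made-up numbers, separate from the perceptron implementation below): appending a constant 1 to the input vector and folding $b$ into the weight vector produces exactly the same weighted sum.

import numpy as np

# Illustrative values only
w = np.array([0.4, -0.6])
b = 0.25
x = np.array([1.0, 0.5])

# Standard form: weighted sum plus bias
z_standard = np.dot(x, w) + b

# Equivalent form: treat the bias as a weight on a constant input of 1
w_extended = np.append(w, b)    # [w1, w2, b]
x_extended = np.append(x, 1.0)  # [x1, x2, 1]
z_extended = np.dot(x_extended, w_extended)

print(z_standard, z_extended)  # the two values are identical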
import numpy as np

class Perceptron:
    def __init__(self, n_inputs):
        # Initialize weights randomly, bias to zero
        self.weights = np.random.randn(n_inputs) * 0.1
        self.bias = 0.0

    def predict(self, x):
        """Compute perceptron output."""
        linear = np.dot(x, self.weights) + self.bias
        return 1 if linear >= 0 else 0

    def predict_batch(self, X):
        """Predict for multiple samples."""
        linear = np.dot(X, self.weights) + self.bias
        return (linear >= 0).astype(int)

# Create a perceptron with 2 inputs
perceptron = Perceptron(n_inputs=2)

# Test on some inputs
test_inputs = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
])

print("Perceptron outputs (random weights):")
for x in test_inputs:
    y = perceptron.predict(x)
    print(f"  Input {x} -> Output {y}")

Learning: The Perceptron Algorithm
The perceptron learns by adjusting weights based on errors. When it makes a wrong prediction, it updates weights to do better next time:
- If the output is 0 but should be 1: increase the weights on active inputs (move toward firing)
- If the output is 1 but should be 0: decrease the weights on active inputs (move away from firing)
The update rule is $w_i \leftarrow w_i + \eta (y_{true} - y_{pred}) x_i$, and the bias is updated in the same way: $b \leftarrow b + \eta (y_{true} - y_{pred})$,
where $\eta$ is the learning rate controlling the step size.
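Before the full training loop below, it helps to trace a single update by hand (the numbers here are purely illustrative): with $\eta = 0.1$, inputs $[1, 1]$, a true label of 1, and a prediction of 0, the error is $+1$, so each active weight and the bias grow by 0.1.

import numpy as np

# One illustrative step of the perceptron update rule (made-up values)
lr = 0.1
weights = np.array([0.2, -0.1])
bias = 0.0

x = np.array([1, 1])     # both inputs are active
y_true, y_pred = 1, 0    # the perceptron failed to fire
error = y_true - y_pred  # +1: push the weights toward firing

weights = weights + lr * error * x
bias = bias + lr * error

print(weights)  # roughly [0.3, 0.0]: each weight increased by 0.1
print(bias)     # 0.1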
import numpy as np

class Perceptron:
    def __init__(self, n_inputs, learning_rate=0.1):
        self.weights = np.random.randn(n_inputs) * 0.1
        self.bias = 0.0
        self.lr = learning_rate

    def predict(self, x):
        linear = np.dot(x, self.weights) + self.bias
        return 1 if linear >= 0 else 0

    def train(self, X, y, epochs=100):
        """Train perceptron using the perceptron learning rule."""
        for epoch in range(epochs):
            errors = 0
            for xi, yi in zip(X, y):
                prediction = self.predict(xi)
                error = yi - prediction
                if error != 0:
                    # Update weights and bias
                    self.weights += self.lr * error * xi
                    self.bias += self.lr * error
                    errors += 1
            if errors == 0:
                print(f"Converged at epoch {epoch + 1}")
                break
        return self

# Train on AND gate
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])  # AND function

perceptron = Perceptron(n_inputs=2)
perceptron.train(X, y, epochs=100)

print("\nLearned AND gate:")
for xi, yi in zip(X, y):
    pred = perceptron.predict(xi)
    print(f"  {xi} -> {pred} (expected {yi})")

print(f"\nLearned weights: {perceptron.weights}")
print(f"Learned bias: {perceptron.bias}")

Linear Decision Boundaries
A perceptron divides the input space with a linear boundary (a line in 2D, a plane in 3D, a hyperplane in higher dimensions). Points on one side are classified as 1, points on the other as 0.
The decision boundary is where $\sum_i w_i x_i + b = 0$. In 2D with inputs $x_1$ and $x_2$, the boundary is $w_1 x_1 + w_2 x_2 + b = 0$, which (for $w_2 \neq 0$) rearranges to $x_2 = -\frac{w_1}{w_2} x_1 - \frac{b}{w_2}$.
This is the equation of a line with slope $-w_1/w_2$ and intercept $-b/w_2$.
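To make the slope and intercept concrete, the short sketch below uses made-up weights (not the learned OR weights from the next example) and checks that a point on the resulting line gives a weighted sum of exactly zero.

# Illustrative weights and bias, chosen by hand for the example
w1, w2, b = 1.0, 2.0, -1.0

slope = -w1 / w2       # -0.5
intercept = -b / w2    # 0.5
print(f"Boundary as a line: x2 = {slope} * x1 + {intercept}")

# Any point on this line sits exactly on the decision boundary
x1 = 3.0
x2 = slope * x1 + intercept
print(w1 * x1 + w2 * x2 + b)  # 0.0 (up to floating-point error)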
import numpy as np

# Train a perceptron on the OR gate
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])  # OR function

class Perceptron:
    def __init__(self, n_inputs, learning_rate=0.1):
        self.weights = np.random.randn(n_inputs) * 0.1
        self.bias = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return 1 if np.dot(x, self.weights) + self.bias >= 0 else 0

    def train(self, X, y, epochs=100):
        for epoch in range(epochs):
            for xi, yi in zip(X, y):
                error = yi - self.predict(xi)
                self.weights += self.lr * error * xi
                self.bias += self.lr * error

perceptron = Perceptron(2)
perceptron.train(X, y)

# The decision boundary
w1, w2 = perceptron.weights
b = perceptron.bias
print(f"Decision boundary: {w1:.2f}*x1 + {w2:.2f}*x2 + {b:.2f} = 0")
print("This is a line separating class 0 from class 1")

The XOR Problem: Limitations of Perceptrons
A single perceptron can only solve linearly separable problems—where a straight line can separate the classes. This is a severe limitation.
The classic example is XOR (exclusive OR). XOR outputs 1 when inputs differ and 0 when they're the same. No single line can separate the 1s from the 0s in XOR. This was famously demonstrated by Minsky and Papert in 1969, contributing to the first "AI winter."
import numpy as np

# XOR is not linearly separable
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])  # XOR function

print("XOR truth table:")
for xi, yi in zip(X_xor, y_xor):
    print(f"  {xi} -> {yi}")

print("\nTry to visualize: plot (0,0) and (1,1) as class 0")
print("Plot (0,1) and (1,0) as class 1")
print("No straight line can separate them!")

# Try to train a perceptron on XOR
class Perceptron:
    def __init__(self, n_inputs, lr=0.1):
        self.weights = np.zeros(n_inputs)
        self.bias = 0.0
        self.lr = lr

    def predict(self, x):
        return 1 if np.dot(x, self.weights) + self.bias >= 0 else 0

    def train(self, X, y, epochs=100):
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                error = yi - self.predict(xi)
                self.weights += self.lr * error * xi
                self.bias += self.lr * error

perceptron = Perceptron(2)
perceptron.train(X_xor, y_xor, epochs=1000)

print("\nPerceptron predictions on XOR (after 1000 epochs):")
correct = 0
for xi, yi in zip(X_xor, y_xor):
    pred = perceptron.predict(xi)
    correct += (pred == yi)
    print(f"  {xi} -> {pred} (expected {yi})")

print(f"\nAccuracy: {correct}/{len(y_xor)} - Perceptron CANNOT learn XOR!")

From Perceptrons to Neurons
The step function's hard threshold makes learning difficult—small weight changes produce no change in output until you cross the threshold. Modern neural networks replace the step function with smooth, differentiable activation functions.
A neuron with a smooth activation function can compute gradients, enabling the backpropagation algorithm to train multi-layer networks. This seemingly small change unlocks the power of deep learning.
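To illustrate why smoothness matters, the sketch below (an added illustration, not one of the section's original examples) compares the sigmoid's analytic derivative, $\sigma'(x) = \sigma(x)(1 - \sigma(x))$, with the step function's slope, which is zero everywhere except at the threshold itself; a zero slope gives gradient descent no signal to follow.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1 - s)

# The step function is flat away from x = 0, so small weight changes
# produce zero gradient; the sigmoid always has a nonzero slope.
for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    print(f"x = {x:+.1f}   step slope = 0   sigmoid slope = {sigmoid_derivative(x):.4f}")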
import numpy as np

def step(x):
    """Step function - original perceptron."""
    return np.where(x >= 0, 1, 0)

def sigmoid(x):
    """Sigmoid - smooth version of step."""
    return 1 / (1 + np.exp(-x))

def relu(x):
    """ReLU - most common in modern networks."""
    return np.maximum(0, x)

# Compare activations
print("Activation function comparison:")
print("Step: Hard 0/1 threshold, not differentiable at 0")
print("Sigmoid: Smooth, outputs between 0 and 1")
print("ReLU: Simple, fast, most popular today")

# Show values at a few points
test_points = [-2, -1, 0, 1, 2]
print(f"\n{'x':<6} {'Step':<8} {'Sigmoid':<10} {'ReLU':<8}")
print("-" * 32)
for x_val in test_points:
    print(f"{x_val:<6} {int(step(x_val)):<8} {sigmoid(x_val):<10.4f} {relu(x_val):<8}")

Multi-Layer Networks Solve XOR
The XOR problem is solved by adding a hidden layer. Two neurons in the hidden layer can each learn a linear boundary, and the output neuron combines them to create a non-linear decision region.
This is the key insight: stacking layers of neurons creates networks that can learn arbitrarily complex functions.
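The example that follows uses hand-crafted weights to make this concrete. As a complement, here is a minimal training sketch (an added illustration, assuming sigmoid activations, squared-error loss, full-batch gradient descent, and a fixed seed) showing that a 2-2-1 network can also discover XOR-solving weights from the data itself; with an unlucky initialization the run can stall, in which case a different seed or more epochs usually helps.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# XOR data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# 2 inputs -> 2 hidden sigmoid units -> 1 sigmoid output
rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 1.0, (2, 2))
b1 = np.zeros((1, 2))
W2 = rng.normal(0.0, 1.0, (2, 1))
b2 = np.zeros((1, 1))

lr = 1.0
for epoch in range(20000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)    # hidden activations, shape (4, 2)
    out = sigmoid(h @ W2 + b2)  # outputs, shape (4, 1)

    # Backward pass for the squared-error loss 0.5 * (out - y)^2
    d_out = (out - y) * out * (1 - out)  # gradient at the output pre-activation
    d_h = (d_out @ W2.T) * h * (1 - h)   # gradient at the hidden pre-activation

    # Gradient descent updates
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print("Trained network on XOR (targets are 0, 1, 1, 0):")
for xi, oi in zip(X, sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)):
    print(f"  {xi} -> {oi[0]:.3f}")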
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Manual two-layer network for XOR
# Hidden layer: 2 neurons
# Output layer: 1 neuron
# These weights are hand-crafted to solve XOR
W1 = np.array([[20, 20],      # row: weights from input x1 to both hidden neurons
               [20, 20]])     # row: weights from input x2 to both hidden neurons
b1 = np.array([-30, -10])     # hidden biases: first neuron is AND-like, second is OR-like
W2 = np.array([[-60], [60]])  # output neuron fires when OR is on but AND is off
b2 = np.array([-30])

def forward(x):
    """Forward pass through the 2-layer network."""
    # Hidden layer
    h = sigmoid(np.dot(x, W1) + b1)
    # Output layer
    y = sigmoid(np.dot(h, W2) + b2)
    return y

# Test on XOR
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])

print("Two-layer network solving XOR:")
for xi, yi in zip(X_xor, y_xor):
    pred = forward(xi)[0]
    pred_class = 1 if pred > 0.5 else 0
    print(f"  {xi} -> {pred:.4f} -> class {pred_class} (expected {yi})")

print("\nWith a hidden layer, neural networks can learn XOR!")
print("This is why depth matters in deep learning.")

Weights, Biases, and Parameters
Every connection in a neural network has a weight—a number that determines how much one neuron influences another. Every neuron (except inputs) has a bias—a number added before the activation function.
Together, weights and biases are the network's parameters. Learning is the process of finding parameter values that make the network produce correct outputs. A network with millions of parameters can learn extremely complex functions.
def count_parameters(layer_sizes):
    """Count total parameters in a fully connected feedforward network."""
    total_weights = 0
    total_biases = 0
    for i in range(len(layer_sizes) - 1):
        n_in = layer_sizes[i]
        n_out = layer_sizes[i + 1]
        weights = n_in * n_out
        biases = n_out
        total_weights += weights
        total_biases += biases
        print(f"Layer {i+1}: {n_in} -> {n_out}")
        print(f"  Weights: {n_in} x {n_out} = {weights}")
        print(f"  Biases: {n_out}")
    print(f"\nTotal parameters: {total_weights + total_biases}")
    print(f"  Weights: {total_weights}")
    print(f"  Biases: {total_biases}")
    return total_weights + total_biases

# Example: small network
print("Small network: 10 inputs, 2 hidden layers (20, 10), 1 output")
count_parameters([10, 20, 10, 1])

print("\n" + "=" * 50 + "\n")

# Example: larger network
print("Larger network: 784 inputs (28x28 image), hidden (256, 128), 10 outputs")
count_parameters([784, 256, 128, 10])

Key Takeaways
- A perceptron is a single artificial neuron: weighted sum of inputs plus bias, passed through an activation function
- Perceptrons can only learn linearly separable functions
- The XOR problem showed the perceptron's limits: no single linear boundary can separate classes that are not linearly separable
- Multi-layer networks overcome this by combining multiple linear boundaries
- Modern neurons use smooth activation functions (sigmoid, ReLU) instead of step functions
- Weights control connection strengths; biases shift activation thresholds
- The number of parameters (weights + biases) determines a network's capacity
- Deep learning's power comes from stacking layers of simple neurons