1️⃣ G-Modules: SO(2) Rotational Equivariance
Paper Section 7.1 | Algebraic Concept: Group Representations
Step 1: Build the rotation matrix R(θ) for θ = 45°
θ = π/4 = 45°
R(45°) =
[[ 0.7071 -0.7071]
 [ 0.7071  0.7071]]
This matrix rotates any 2D point by 45° counter-clockwise.
Step 2: Choose the layer φ(x) = R(α)·x where α = 30°
Layer weight matrix W = R(30°) =
[[ 0.8660 -0.5000]
 [ 0.5000  0.8660]]
The paper proves: rotations commute with rotations (SO(2) is abelian).
So W·R(θ) = R(θ)·W → this layer IS equivariant.
Step 3: Test with the point x = [1, 0] (on the positive x-axis)
x = [1. 0.] (a point at angle 0°, distance 1 from the origin)
PATH A: rotate the input FIRST, then apply the layer:
R(45°)·x = [0.7071 0.7071] ← rotated 45°
W·(R(45°)·x) = [0.2588 0.9659] ← layer applied after
PATH B: apply the layer FIRST, then rotate the output:
W·x = [0.8660 0.5000] ← layer applied first
R(45°)·(W·x) = [0.2588 0.9659] ← then rotated 45°
‖Path A − Path B‖ = 0.00000000
✅ EQUIVARIANCE CONFIRMED: both paths give identical results!
Conclusion: R(30°) is a valid equivariant layer for SO(2).
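The two-path check above can be reproduced in a few lines of NumPy. This is a minimal sketch (the demo's own script is not shown here); `R` builds the standard 2D rotation matrix:

```python
import numpy as np

def R(theta):
    """2D rotation matrix for angle theta (radians), counter-clockwise."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

W = R(np.pi / 6)            # layer weights: a 30-degree rotation
x = np.array([1.0, 0.0])    # test point on the positive x-axis

path_a = W @ (R(np.pi / 4) @ x)   # rotate the input first, then apply the layer
path_b = R(np.pi / 4) @ (W @ x)   # apply the layer first, then rotate the output

# SO(2) is abelian, so both paths agree up to floating-point error
print(np.linalg.norm(path_a - path_b))
```

Swapping W for any non-rotation matrix breaks the equality, which is exactly why equivariant layers are constrained to commute with the group action.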
AI Impact: 12× data efficiency for symmetric data (rotated MNIST)
2️⃣ Tensor Products: The Math Behind Attention
Paper Section 7.2 | Algebraic Concept: Tensor Product V ⊗ W
Step 1: Define the two vector spaces V (queries) and W (keys)
V = R³ (query space, e.g. each query is a 3D vector)
W = R² (key space, e.g. each key is a 2D vector)
Basis of V: ['e1', 'e2', 'e3']
Basis of W: ['f1', 'f2']
Step 2: Compute the dimension of the tensor product V ⊗ W
Formula: dim(V ⊗ W) = dim(V) × dim(W) = 3 × 2 = 6
This means attention has 6 independent interaction dimensions.
Step 3: Write out ALL basis elements of V ⊗ W
Every basis element e_i ⊗ f_j captures one specific interaction:
e1⊗f1 → query dim 1 interacts with key dim 1
e1⊗f2 → query dim 1 interacts with key dim 2
e2⊗f1 → query dim 2 interacts with key dim 1
e2⊗f2 → query dim 2 interacts with key dim 2
e3⊗f1 → query dim 3 interacts with key dim 1
e3⊗f2 → query dim 3 interacts with key dim 2
Total basis elements: 6 ✓ (matches dim = 6)
Step 4: Show a concrete query/key pair as a tensor product
Example query vector v = [ 2. -1.  3.]
Example key vector   w = [1. 4.]
v ⊗ w (as a matrix, rows = query dims, cols = key dims):
[[ 2.  8.]
 [-1. -4.]
 [ 3. 12.]]
Reading across rows: v⊗w = (2.0)·e1⊗f1 + (8.0)·e1⊗f2 + (-1.0)·e2⊗f1 + (-4.0)·e2⊗f2 + (3.0)·e3⊗f1 + (12.0)·e3⊗f2
In attention, Q·Kᵀ computes this for ALL query-key pairs simultaneously.
The softmax then picks which interactions matter most.
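In coordinates, the matrix of v ⊗ w is just the outer product, so Step 4 can be checked directly in NumPy (a minimal sketch):

```python
import numpy as np

v = np.array([2.0, -1.0, 3.0])   # query vector in V = R^3
w = np.array([1.0, 4.0])         # key vector in W = R^2

vw = np.outer(v, w)              # v ⊗ w in coordinates: a 3x2 matrix
print(vw)                        # rows = query dims, cols = key dims

# The coefficient of the basis element e_i ⊗ f_j is simply v[i] * w[j]
assert vw[0, 1] == v[0] * w[1]   # coefficient of e1⊗f2
```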
AI Impact: Explains multi-head attention's expressive power (4.1 BLEU score gain)
3️⃣ Functors: Neural Networks as Categorical Functors
Paper Section 7.3 | Algebraic Concept: Category Theory & Functors
Step 1: Define simple 1-layer networks
Input x = [ 1.  -0.5] (dimension 2)
Layer f: R² → R³ (maps 2D to 3D)
Layer g: R³ → R² (maps 3D to 2D)
Step 2: Functor Axiom 1, identity preservation: F(id) = id
Test: identity applied to x = [ 3. -1.]
Result = [ 3. -1.]
Same as input? True
✓ F(id_V)(x) = x = id_{F(V)}(x) → Axiom 1 holds.
Step 3: Functor Axiom 2, composition preservation: F(g∘f) = F(g)∘F(f)
Input x = [ 1.  -0.5]
METHOD A: compose into a single function (g∘f), then apply:
(g∘f)(x) = [0.45  0.325]
METHOD B: apply f, then apply g to the result:
f(x) = [0.75 0.   0.  ] ← hidden representation
g(f(x)) = [0.45  0.325] ← final output
‖Method A − Method B‖ = 0.0000000000
✓ Identical results → Axiom 2 holds: F(g∘f) = F(g)∘F(f)
IMPLICATION: You can safely split a network at any layer, process the
pieces separately, and recombine them; the math guarantees consistency.
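For linear layers the composition law is just associativity of matrix multiplication, so the check is short. The weight matrices A and B below are hypothetical stand-ins (the demo's actual f and g are not shown); any shapes R²→R³ and R³→R² would do:

```python
import numpy as np

# Hypothetical weights: f maps R^2 -> R^3, g maps R^3 -> R^2
A = np.array([[1.0, 0.5],
              [0.0, 1.0],
              [2.0, -1.0]])
B = np.array([[0.5, 0.0, 0.25],
              [0.0, 1.0, 0.0]])

def f(x):
    return A @ x

def g(h):
    return B @ h

def gf(x):
    return (B @ A) @ x   # g∘f collapsed into a single matrix first

x = np.array([1.0, -0.5])
# Axiom 2: composing first, then applying == applying f, then g
print(np.linalg.norm(gf(x) - g(f(x))))
```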
AI Impact: Enables modular network design and compositional generalization
4️⃣ Homology: Betti Numbers, Counting the Topology of Shapes
Paper Section 7.4 | Algebraic Concept: Homological Algebra
Step 1: Build the triangle complex
Vertices: ['v0', 'v1', 'v2'] (3 total)
Edges: ['e(0,1)', 'e(1,2)', 'e(0,2)'] (3 total)
Triangles: [] (0 total) ← NO filled interior!
Shape looks like:  v0 ─── v1
                     ╲   ╱
                      v2
Just the outline: a triangular RING, not a solid triangle.
Step 2: Build the boundary matrix ∂₁ (edges → vertices)
∂₁ maps each EDGE to its two ENDPOINTS (with signs for orientation):
∂₁(e_ij) = v_j − v_i (the tail gets -1, the head gets +1)
Columns = edges, rows = vertices:
       e(0,1)  e(1,2)  e(0,2)
v0  [    -1       0      -1   ]
v1  [    +1      -1       0   ]
v2  [     0      +1      +1   ]
∂₁ matrix =
[[-1  0 -1]
 [ 1 -1  0]
 [ 0  1  1]]
Step 3: Compute ranks and Betti numbers
rank(∂₁) = 2 (number of linearly independent boundary relations)
rank(∂₂) = 0 (no filled triangles → no 2D boundaries)
β₀ (components) = 3 vertices − rank(∂₁) = 3 − 2 = 1
β₁ (holes) = dim(ker ∂₁) − dim(im ∂₂)
           = (3 − 2) − 0
           = 1 − 0 = 1
✅ RESULTS:
β₀ = 1 → the triangle is ONE connected shape (no separate pieces)
β₁ = 1 → the triangle has ONE hole/loop inside it
If we FILLED the interior (added the triangle face), β₁ would become 0
because the hole would be "plugged."
IN AI: Persistent homology uses these numbers to describe point cloud shape.
A ring of data points has β₁ = 1 (one loop). Useful for detecting circular
patterns in molecular data, brain connectivity, etc.
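The whole computation reduces to matrix ranks. A minimal NumPy sketch, including the filled-face variant described above (the ∂₂ column is the boundary of the single triangle face):

```python
import numpy as np

# Boundary matrix ∂1 of the hollow triangle: columns = edges, rows = vertices
d1 = np.array([[-1,  0, -1],
               [ 1, -1,  0],
               [ 0,  1,  1]])

n_vertices, n_edges = d1.shape
rank_d1 = np.linalg.matrix_rank(d1)

# Hollow triangle: no filled face, so rank(∂2) = 0
beta0 = n_vertices - rank_d1            # connected components
beta1 = (n_edges - rank_d1) - 0         # loops = dim(ker ∂1) - rank(∂2)
print(beta0, beta1)                     # 1 1

# Filling the face adds ∂2(t) = e(0,1) + e(1,2) - e(0,2); the hole is plugged
d2 = np.array([[1], [1], [-1]])
assert np.allclose(d1 @ d2, 0)          # chain-complex law: ∂1∘∂2 = 0
beta1_filled = (n_edges - rank_d1) - np.linalg.matrix_rank(d2)
print(beta1_filled)                     # 0
```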
AI Impact: Robust topological feature extraction under 10% noise
5️⃣ Algebraic Varieties: Solving XOR with a Polynomial Network
Paper Section 7.5 | Algebraic Concept: Algebraic Geometry / Varieties
Step 1: Write equations from the XOR truth table
XOR truth table:
x1=0, x2=0 → output 0
x1=0, x2=1 → output 1
x1=1, x2=0 → output 1
x1=1, x2=1 → output 0
Polynomial ansatz:
f(x1, x2) = w0 + w1·x1 + w2·x2 + w3·x1² + w4·x2² + w5·x1·x2
Plug each (x1, x2, target) into f:
f(0,0) = w0 = 0 ...(eq. 1)
f(0,1) = w0 + w2 + w4 = 1 ...(eq. 2)
f(1,0) = w0 + w1 + w3 = 1 ...(eq. 3)
f(1,1) = w0 + w1 + w2 + w3 + w4 + w5 = 0 ...(eq. 4)
Step 2: Solve the system algebraically
From eq. 1: w0 = 0
Substitute into eq. 2: w2 + w4 = 1
Substitute into eq. 3: w1 + w3 = 1
From eq. 4 with w0 = 0: w1 + w2 + w3 + w4 + w5 = 0
(w1 + w3) + (w2 + w4) + w5 = 0
1 + 1 + w5 = 0
⇒ w5 = -2
The paper picks the SIMPLEST/CANONICAL solution:
w0=0, w1=1, w2=1, w3=0, w4=0, w5=-2
⇒ f(x1, x2) = x1 + x2 − 2·x1·x2
Step 3: Verify the solution on all XOR inputs
Weights: w = [ 0.  1.  1.  0.  0. -2.]
Formula: f(x1,x2) = 1·x1 + 1·x2 + (-2)·x1·x2
x1  x2  Target  f(x1,x2)  Correct?
──────────────────────────────────
 0   0     0      0.0        ✓
 0   1     1      1.0        ✓
 1   0     1      1.0        ✓
 1   1     0      0.0        ✓
All correct? True
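The verification table can be reproduced directly. A minimal sketch using the canonical weights from Step 2:

```python
import numpy as np

w = np.array([0.0, 1.0, 1.0, 0.0, 0.0, -2.0])   # canonical weights from Step 2

def f(x1, x2):
    # Feature vector [1, x1, x2, x1^2, x2^2, x1*x2] matching the ansatz
    feats = np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])
    return feats @ w

xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
for (x1, x2), target in xor.items():
    assert np.isclose(f(x1, x2), target)
print("f(x1, x2) = x1 + x2 - 2*x1*x2 reproduces XOR exactly")
```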
Step 4: The zero set of f as an algebraic variety
The ZERO SET of f is where f(x1, x2) = 0:
x1 + x2 − 2·x1·x2 = 0
This is an ALGEBRAIC VARIETY: the set of all points satisfying this equation.
It is a degree-2 curve (a hyperbola!) passing through both class-0 inputs;
thresholding f at 1/2 gives the curve that separates the XOR classes.
In algebraic geometry terms:
• Polynomial ring: R[x1, x2]
• Ideal: I = ⟨x1 + x2 − 2·x1·x2⟩
• Variety: V(I) = {(x1,x2) : x1 + x2 − 2·x1·x2 = 0}
The Nullstellensatz (paper Theorem 2.10) connects this geometric curve
to its algebraic definition, proving the two descriptions are equivalent.
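Solving x1 + x2 − 2·x1·x2 = 0 for x2 (valid when x1 ≠ 1/2) gives x2 = x1/(2·x1 − 1), a parametrization of the hyperbola. A quick numerical check that sampled points lie on V(I) (the derivation is ours; the paper only states the variety):

```python
import numpy as np

# Parametrize the variety: for x != 1/2, y = x / (2x - 1) solves x + y - 2xy = 0.
# Note x = 0 gives y = 0 and x = 1 gives y = 1: both class-0 inputs are on the curve.
xs = np.array([-1.0, 0.0, 0.25, 1.0, 2.0])
ys = xs / (2 * xs - 1)

residual = xs + ys - 2 * xs * ys
print(np.max(np.abs(residual)))   # ~0: every sampled point lies on the variety
```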
Step 5: The variety passes through the class-0 XOR points
Points ON the variety (f = 0), i.e. exactly the class-0 inputs:
f(0.00, -0.00) = 0.000000 ≈ 0 → ON the variety
f(1.00, 1.00) = 0.000000 ≈ 0 → ON the variety
✅ SUMMARY:
• XOR is NOT linearly separable (no straight-line solution)
• A degree-2 polynomial solves it EXACTLY
• Found by algebra (not gradient descent): perfect, zero error
• The decision boundary is a hyperbola, an algebraic variety in R²
AI Impact: Exact algebraic solutions via Gröbner basis (instant vs 50 epochs)
SUMMARY: Five Algebraic Structures and Their AI Roles
┌───┬────────────────────────┬──────────────────────────────────────────────┐
│ # │ Algebraic Concept      │ AI Application (What it buys you)            │
├───┼────────────────────────┼──────────────────────────────────────────────┤
│ 1 │ G-Modules (Groups)     │ Equivariant networks: rotate input =         │
│   │                        │ rotate output. 12× data efficiency.          │
├───┼────────────────────────┼──────────────────────────────────────────────┤
│ 2 │ Tensor Products        │ Attention = bilinear map between Q and K.    │
│   │                        │ Multi-head = decompose the tensor space.     │
│   │                        │ 4.1 BLEU gain.                               │
├───┼────────────────────────┼──────────────────────────────────────────────┤
│ 3 │ Functors (Categories)  │ Nets are functors: composition law holds.    │
│   │                        │ Backprop is a contravariant functor.         │
├───┼────────────────────────┼──────────────────────────────────────────────┤
│ 4 │ Homology (β₀, β₁, ...) │ Count connected pieces and holes in data.    │
│   │                        │ Robust under 10% noise: topology > geometry. │
├───┼────────────────────────┼──────────────────────────────────────────────┤
│ 5 │ Algebraic Varieties    │ Decision boundaries are polynomial curves.   │
│   │                        │ Gröbner basis → exact network weights.       │
└───┴────────────────────────┴──────────────────────────────────────────────┘
Each example shows that core AI capabilities are not engineering tricks;
they are consequences of deep algebraic structure.
Paper Citation
@article{omer2026algebraic,
title={Algebraic Foundations of Modern Artificial Intelligence: A Unified Mathematical Framework},
author={Omer, Siraj Osman},
journal={arXiv preprint},
year={2026}
}