🧮 Algebraic Structures in Modern AI

Five Complete Worked Examples with Full Output Results

Based on: "Algebraic Foundations of Modern Artificial Intelligence"

Siraj Osman Omer | 2026


1๏ธโƒฃ G-Modules: SO(2) Rotational Equivariance

Paper Section 7.1 | Algebraic Concept: Group Representations

๐Ÿ“ Step 1: Build rotation matrix R(ฮธ) for ฮธ = 45ยฐ
ฮธ = ฯ€/4 = 45ยฐ R(45ยฐ) = [[ 0.7071 -0.7071] [ 0.7071 0.7071]] This matrix rotates any 2D point by 45ยฐ counter-clockwise.
โš™๏ธ Step 2: Choose layer ฯ†(x) = R(ฮฑ)ยทx where ฮฑ = 30ยฐ
Layer weight matrix W = R(30ยฐ) = [[ 0.8660 -0.5000] [ 0.5000 0.8660]] The paper proves: rotations commute with rotations (SO(2) is abelian). So WยทR(ฮธ) = R(ฮธ)ยทW โ€” this layer IS equivariant.
🔬 Step 3: Test with point x = [1, 0] (a point on the positive x-axis)

x = [1. 0.] (a point at angle 0°, distance 1 from the origin)

PATH A, 'Rotate input FIRST, then apply the layer':
  R(45°)·x     = [0.7071 0.7071] ← rotated 45°
  W·(R(45°)·x) = [0.2588 0.9659] ← layer applied after

PATH B, 'Apply layer FIRST, then rotate the output':
  W·x          = [0.8660 0.5000] ← layer applied first
  R(45°)·(W·x) = [0.2588 0.9659] ← then rotated 45°

‖Path A − Path B‖ = 0.00000000
✓ EQUIVARIANCE CONFIRMED: Both paths give identical results!
✓ Conclusion: R(30°) is a valid equivariant layer for SO(2).
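The two-path check above can be reproduced in a few lines of NumPy (a minimal sketch; the angles and the test point follow the worked example):

```python
import numpy as np

def R(theta):
    """2D rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

W = R(np.pi / 6)          # layer weights: rotation by 30°
g = R(np.pi / 4)          # group element: rotation by 45°
x = np.array([1.0, 0.0])  # test point on the positive x-axis

path_a = W @ (g @ x)  # rotate first, then apply the layer
path_b = g @ (W @ x)  # apply the layer first, then rotate
print(np.linalg.norm(path_a - path_b))  # ~0: the layer commutes with the rotation
```

Because SO(2) is abelian, the same check passes for any pair of angles, not just 30° and 45°.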

📊 AI Impact: 12× data efficiency for symmetric data (rotated MNIST)

2๏ธโƒฃ Tensor Products: The Math Behind Attention

Paper Section 7.2 | Algebraic Concept: Tensor Product V ⊗ W

๐Ÿ“ Step 1: Define the two vector spaces V (queries) and W (keys)
V = Rยณ (query space, e.g. each query is a 3D vector) W = Rยฒ (key space, e.g. each key is a 2D vector) Basis of V: ['e1', 'e2', 'e3'] Basis of W: ['f1', 'f2']
๐Ÿ“ Step 2: Compute dimension of tensor product V โŠ— W
Formula: dim(V โŠ— W) = dim(V) ร— dim(W) = 3 ร— 2 = 6 This means attention has 6 independent interaction dimensions.
๐Ÿ“ Step 3: Write out ALL basis elements of V โŠ— W
Every basis element e_i โŠ— f_j captures one specific interaction: e1โŠ—f1 โ† query dim 1 interacts with key dim 1 e1โŠ—f2 โ† query dim 1 interacts with key dim 2 e2โŠ—f1 โ† query dim 2 interacts with key dim 1 e2โŠ—f2 โ† query dim 2 interacts with key dim 2 e3โŠ—f1 โ† query dim 3 interacts with key dim 1 e3โŠ—f2 โ† query dim 3 interacts with key dim 2 Total basis elements: 6 โœ“ (matches dim = 6)
๐Ÿ“ Step 4: Show concrete query/key pair as tensor product
Example query vector v = [ 2. -1. 3.] Example key vector w = [1. 4.] v โŠ— w (as matrix, rows=query dims, cols=key dims): [[ 2. 8.] [-1. -4.] [ 3. 12.]] Reading across rows: vโŠ—w = (2.0)ยทe1โŠ—f1 + (8.0)ยทe1โŠ—f2 + (-1.0)ยทe2โŠ—f1 + (-4.0)ยทe2โŠ—f2 + (3.0)ยทe3โŠ—f1 + (12.0)ยทe3โŠ—f2 In attention: QยทKแต€ computes this for ALL query-key pairs simultaneously. The softmax then picks which interactions matter most.
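For concrete vectors, the elementary tensor v ⊗ w is just the outer product, so NumPy reproduces Step 4 directly (a minimal sketch using the example's v and w):

```python
import numpy as np

v = np.array([2.0, -1.0, 3.0])  # example query vector in R^3
w = np.array([1.0, 4.0])        # example key vector in R^2

# v ⊗ w as a 3×2 matrix: entry (i, j) is the coefficient of e_i ⊗ f_j.
T = np.outer(v, w)
print(T)
# [[ 2.  8.]
#  [-1. -4.]
#  [ 3. 12.]]
```

Stacking many queries into Q and many keys into K, the attention score matrix Q @ K.T collects one such pairing per query-key pair in a single matrix product.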

📊 AI Impact: Explains multi-head attention's expressive power (4.1 BLEU score gain)

3๏ธโƒฃ Functors: Neural Networks as Categorical Functors

Paper Section 7.3 | Algebraic Concept: Category Theory & Functors

๐Ÿ“ Step 1: Define simple 1-layer networks
Input x = [ 1. -0.5] (dimension 2) Layer f: Rยฒ โ†’ Rยณ (maps 2D to 3D) Layer g: Rยณ โ†’ Rยฒ (maps 3D to 2D)
๐Ÿ“ Step 2: Functor Axiom 1 โ€” Identity preservation: F(id) = id
Test: identity applied to x = [ 3. -1.] Result = [ 3. -1.] Same as input? True โœ“ F(id_V)(x) = x = id_{F(V)}(x) โ€” Axiom 1 holds.
๐Ÿ“ Step 3: Functor Axiom 2 โ€” Composition preservation: F(gโˆ˜f) = F(g)โˆ˜F(f)
Input x = [ 1. -0.5] METHOD A โ€” Compose into single function (gโˆ˜f), then apply: (gโˆ˜f)(x) = [0.45 0.325] METHOD B โ€” Apply f, then apply g to the result: f(x) = [0.75 0. 0. ] โ† hidden representation g(f(x)) = [0.45 0.325] โ† final output โ€–Method A โˆ’ Method Bโ€– = 0.0000000000 โœ“ Identical results โ€” Axiom 2 holds: F(gโˆ˜f) = F(g)โˆ˜F(f) IMPLICATION: You can safely split a network at any layer, process pieces separately, and recombine โ€” the math guarantees consistency.
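A minimal sketch of the composition check. The weight matrices Wf and Wg below are hypothetical stand-ins (the example's exact weights are not given); the two axioms hold for any choice:

```python
import numpy as np

# Hypothetical layer weights: f maps R^2 -> R^3, g maps R^3 -> R^2.
Wf = np.array([[0.5, -0.5], [1.0, 2.0], [-1.0, 0.5]])
Wg = np.array([[0.2, 0.1, 0.0], [0.0, 0.3, 0.4]])
relu = lambda z: np.maximum(z, 0.0)

f = lambda x: relu(Wf @ x)
g = lambda h: relu(Wg @ h)
gf = lambda x: g(f(x))  # the composite morphism g∘f packaged as ONE function

x = np.array([1.0, -0.5])

# Axiom 1: the identity map stays the identity.
identity = lambda y: y
print(np.array_equal(identity(x), x))   # True

# Axiom 2: applying the composite equals applying f, then g.
print(np.linalg.norm(gf(x) - g(f(x))))  # 0.0
```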

📊 AI Impact: Enables modular network design and compositional generalization

4๏ธโƒฃ Homology: Betti Numbers โ€” Counting Topology of Shapes

Paper Section 7.4 | Algebraic Concept: Homological Algebra

๐Ÿ“ Step 1: Build the triangle complex
Vertices: ['v0', 'v1', 'v2'] (3 total) Edges: ['e(0,1)', 'e(1,2)', 'e(0,2)'] (3 total) Triangles: [] (0 total) โ† NO filled interior! Shape looks like: v0 โ”€โ”€โ”€ v1 โ•ฒ โ•ฑ v2 Just the outline โ€” a triangular RING, not a solid triangle.
๐Ÿ“ Step 2: Build the boundary matrix โˆ‚โ‚ (edges โ†’ vertices)
โˆ‚โ‚ maps each EDGE to its two ENDPOINTS (with signs for orientation): โˆ‚โ‚(e_ij) = v_j - v_i (tail gets -1, head gets +1) Columns = edges, Rows = vertices: e(0,1) e(1,2) e(0,2) v0 [ -1 0 -1 ] v1 [ +1 -1 0 ] v2 [ 0 +1 +1 ] โˆ‚โ‚ matrix = [[-1 0 -1] [ 1 -1 0] [ 0 1 1]]
๐Ÿ“ Step 3: Compute ranks and Betti numbers
rank(โˆ‚โ‚) = 2 (how many linearly independent boundary relations) rank(โˆ‚โ‚‚) = 0 (no filled triangles โ†’ no 2D boundaries) ฮฒโ‚€ (components) = 3 vertices โˆ’ rank(โˆ‚โ‚) = 3 โˆ’ 2 = 1 ฮฒโ‚ (holes) = dim(ker โˆ‚โ‚) โˆ’ dim(im โˆ‚โ‚‚) = (3 โˆ’ 2) โˆ’ 0 = 1 โˆ’ 0 = 1 โœ“ RESULTS: ฮฒโ‚€ = 1 โ†’ The triangle is ONE connected shape (no separate pieces) ฮฒโ‚ = 1 โ†’ The triangle has ONE hole/loop inside it If we FILLED the interior (added the triangle face), ฮฒโ‚ would become 0 because the hole would be "plugged." IN AI: Persistent homology uses these numbers to describe point cloud shape. A ring of data points has ฮฒโ‚=1 (one loop). Useful for detecting circular patterns in molecular data, brain connectivity, etc.
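The whole computation reduces to one matrix rank. A minimal sketch using the ∂₁ matrix from Step 2:

```python
import numpy as np

# Boundary matrix ∂1 for the hollow triangle.
# Columns = edges e(0,1), e(1,2), e(0,2); rows = vertices v0, v1, v2.
d1 = np.array([[-1,  0, -1],
               [ 1, -1,  0],
               [ 0,  1,  1]])

n_vertices, n_edges = d1.shape
rank_d1 = np.linalg.matrix_rank(d1)  # = 2
rank_d2 = 0                          # no filled triangles, so ∂2 is the zero map

b0 = n_vertices - rank_d1            # connected components
b1 = (n_edges - rank_d1) - rank_d2   # dim(ker ∂1) − dim(im ∂2): independent loops
print(b0, b1)  # → 1 1
```

Adding the filled face would make ∂₂ a nonzero 3×1 matrix of rank 1, dropping b1 to 0, exactly as the text predicts.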

📊 AI Impact: Robust topological feature extraction under 10% noise

5๏ธโƒฃ Algebraic Varieties: XOR with Polynomial Network

Paper Section 7.5 | Algebraic Concept: Algebraic Geometry / Varieties

๐Ÿ“ Step 1: Write equations from XOR truth table
XOR truth table: x1=0, x2=0 โ†’ output 0 x1=0, x2=1 โ†’ output 1 x1=1, x2=0 โ†’ output 1 x1=1, x2=1 โ†’ output 0 Polynomial ansatz: f(x1, x2) = w0 + w1ยทx1 + w2ยทx2 + w3ยทx1ยฒ + w4ยทx2ยฒ + w5ยทx1ยทx2 Plug each (x1, x2, target) into f: f(0,0) = w0 = 0 ...(eq. 1) f(0,1) = w0 + w2 + w4 = 1 ...(eq. 2) f(1,0) = w0 + w1 + w3 = 1 ...(eq. 3) f(1,1) = w0 + w1 + w2 + w3 + w4 + w5 = 0 ...(eq. 4)
๐Ÿ“ Step 2: Solve the system algebraically
From eq. 1: w0 = 0 Substitute into eq. 2: w2 + w4 = 1 Substitute into eq. 3: w1 + w3 = 1 From eq. 4 with w0=0: w1 + w2 + w3 + w4 + w5 = 0 (w1 + w3) + (w2 + w4) + w5 = 0 1 + 1 + w5 = 0 โ†’ w5 = -2 The paper picks the SIMPLEST/CANONICAL solution: w0=0, w1=1, w2=1, w3=0, w4=0, w5=-2 โ†’ f(x1, x2) = x1 + x2 โˆ’ 2ยทx1ยทx2
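The same system can be solved symbolically, a sketch assuming SymPy is available. Solving for w0, w1, w2, w5 leaves w3 and w4 as free parameters, and w5 = -2 is forced regardless of their values:

```python
from sympy import symbols, solve

w0, w1, w2, w3, w4, w5 = symbols('w0 w1 w2 w3 w4 w5')

# The four XOR constraints from Step 1 (each expression must equal 0).
eqs = [
    w0,                           # f(0,0) = 0
    w0 + w2 + w4 - 1,             # f(0,1) = 1
    w0 + w1 + w3 - 1,             # f(1,0) = 1
    w0 + w1 + w2 + w3 + w4 + w5,  # f(1,1) = 0
]

# Solve for four of the six unknowns; w3 and w4 remain free.
sol = solve(eqs, [w0, w1, w2, w5])
print(sol)  # w0 = 0 and w5 = -2 in every solution; w1 = 1 - w3, w2 = 1 - w4
```

Setting the free parameters w3 = w4 = 0 recovers the canonical weights above.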
๐Ÿ“ Step 3: Verify the solution on all XOR inputs
Weights: w = [0. 1. 1. 0. 0. -2.] Formula: f(x1,x2) = 1ยทx1 + 1ยทx2 + (-2)ยทx1ยทx2 x1 x2 Target f(x1,x2) Correct? โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ 0 0 0 0.0 โœ“ 0 1 1 1.0 โœ“ 1 0 1 1.0 โœ“ 1 1 0 0.0 โœ“ All correct? True
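Step 3's verification table condenses to a four-line check (a minimal sketch):

```python
def f(x1, x2):
    """Canonical XOR polynomial from Step 2."""
    return x1 + x2 - 2 * x1 * x2

for x1 in (0, 1):
    for x2 in (0, 1):
        assert f(x1, x2) == (x1 ^ x2)  # exact match against XOR on every input
print("all 4 XOR cases correct")
```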
๐Ÿ“ Step 4: Decision boundary as algebraic variety
The DECISION BOUNDARY is where f(x1, x2) = 0: x1 + x2 โˆ’ 2ยทx1ยทx2 = 0 This is an ALGEBRAIC VARIETY โ€” the set of all points satisfying this equation. It's a degree-2 curve (a hyperbola!) that perfectly separates the XOR classes. In algebraic geometry terms: โ€ข Polynomial ring: R[x1, x2] โ€ข Ideal: I = โŸจx1 + x2 โˆ’ 2ยทx1ยทx2โŸฉ โ€ข Variety: V(I) = {(x1,x2) : x1 + x2 โˆ’ 2ยทx1ยทx2 = 0} The Nullstellensatz (paper Theorem 2.10) connects this geometric curve to its algebraic definition โ€” proving the two descriptions are equivalent.
๐Ÿ“ Step 5: Variety passes through XOR boundary points
Points ON the boundary (f=0) โ€” between classes: f(0.00, -0.00) = 0.000000 โ‰ˆ 0 โ† ON the boundary f(1.00, 1.00) = 0.000000 โ‰ˆ 0 โ† ON the boundary โœ“ SUMMARY: โ€ข XOR is NOT linearly separable (no straight-line solution) โ€ข A degree-2 polynomial solves it EXACTLY โ€ข Found by algebra (not gradient descent) โ€” perfect, zero error โ€ข Decision boundary is a hyperbola: an algebraic variety in Rยฒ

📊 AI Impact: Exact algebraic solutions via Gröbner basis (instant vs 50 epochs)


📊 SUMMARY: Five Algebraic Structures and Their AI Roles

#  Algebraic Concept        AI Application (What it buys you)
1  G-Modules (Groups)       Equivariant networks: rotate input = rotate output. 12× data efficiency
2  Tensor Products          Attention = bilinear map between Q and K. Multi-head = decompose the tensor space. 4.1 BLEU gain
3  Functors (Categories)    Nets are functors: the composition law holds. Backprop is a contravariant functor
4  Homology (β₀, β₁, ...)   Count connected pieces and holes in data. Robust to 10% noise
5  Algebraic Varieties      Decision boundaries are polynomial curves. Gröbner basis → exact weights
Each example shows that core AI capabilities are not engineering tricks; they are consequences of deep algebraic structure.

📚 Paper Citation

@article{omer2026algebraic,
  title={Algebraic Foundations of Modern Artificial Intelligence: A Unified Mathematical Framework},
  author={Omer, Siraj Osman},
  journal={arXiv preprint},
  year={2026}
}