Evidence
What 27 experiments across three months established about the geometric structure of meaning, physically grounded universals, and the mechanistic basis of AI sycophancy.
Core Findings
The semantic space is inhabited at every point between any two poles, regardless of which words define those poles, which corpus trained the embedding, or which axis design is used. The midpoint between a love-cluster and a hate-cluster vector lands on semantically coherent intermediate words. This held at 100% across 15 consecutive experiments.
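The midpoint test described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the helper names (`cosine`, `centroid`, `nearest_to_midpoint`) and the three-dimensional toy vectors are hypothetical, chosen only so the geometry is easy to follow. It averages each pole's cluster, takes the midpoint of the two centroids, and ranks the remaining vocabulary by cosine similarity to that midpoint.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def nearest_to_midpoint(embeddings, pole_a, pole_b, k=3):
    """Return the k words closest (by cosine) to the midpoint of two pole clusters."""
    a = centroid([embeddings[w] for w in pole_a])
    b = centroid([embeddings[w] for w in pole_b])
    midpoint = [(x + y) / 2 for x, y in zip(a, b)]
    pole_words = set(pole_a) | set(pole_b)
    # Pole words are excluded so only intermediate candidates are ranked.
    scored = [(cosine(midpoint, vec), word)
              for word, vec in embeddings.items() if word not in pole_words]
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]

# Hypothetical toy vectors: axis 0 ~ valence, axis 1 ~ "is an attitude", axis 2 ~ other.
TOY_EMBEDDINGS = {
    "love": [1.0, 1.0, 0.0],
    "adore": [0.9, 1.0, 0.1],
    "hate": [-1.0, 1.0, 0.0],
    "loathe": [-0.9, 1.0, 0.1],
    "like": [0.5, 1.0, 0.0],
    "indifferent": [0.0, 1.0, 0.05],
    "table": [0.1, 0.0, 1.0],
}

print(nearest_to_midpoint(TOY_EMBEDDINGS, ["love", "adore"], ["hate", "loathe"], k=2))
```

With real embeddings the same procedure would be run on the trained vectors. Whether the top-ranked words count as "semantically coherent" is a judgment this sketch does not automate.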
Opposition along temperature, speed, and luminosity is clean in any corpus because physical reality forces the same distributional patterns everywhere. Moral, epistemic, and social oppositions are entangled because these dimensions reflect culturally specific frameworks. The architecture documents this ceiling precisely rather than hiding it.
The geometric directions encoding the moral and truth axes point in nearly the same direction in embedding space, with a Gram cosine of 0.744. This offers a mechanistic, geometric explanation for why large language models exhibit sycophantic behavior: when the two directions are this entangled, the training signal cannot cleanly distinguish truthfulness from social approval. This conclusion was reached independently, before convergent research from three separate fields was discovered.
After the experiments were complete, independent work in cognitive science (Gärdenfors, 2024), neuroscience (hippocampal geometry study, January 2026), and LLM interpretability (Zhou et al., 2025) converged on the same geometric structure of meaning from entirely different directions. These were not sources for the research. They are independent corroboration of a conclusion reached by building and running experiments.
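For readers unfamiliar with the Gram cosine reported for the moral and truth directions, the following sketch shows how such a figure can be computed. It assumes each axis is represented by a single direction vector; the function names and the toy axis vectors are hypothetical and do not reproduce the reported values. Taking the mean of absolute off-diagonal entries is also an assumption about how the "mean off-diagonal Gram" statistic is defined.

```python
import math

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def gram_matrix(axes):
    """Gram matrix of unit-normalized axis directions; entry [i][j] is their cosine."""
    units = [unit(a) for a in axes]
    return [[sum(x * y for x, y in zip(u, v)) for v in units] for u in units]

def mean_off_diagonal(gram):
    """Mean absolute off-diagonal entry: 0 means all axes are mutually orthogonal."""
    n = len(gram)
    values = [abs(gram[i][j]) for i in range(n) for j in range(n) if i != j]
    return sum(values) / len(values)

# Hypothetical toy axes: "moral" and "truth" share a component, "temperature" does not.
AXIS_NAMES = ["moral", "truth", "temperature"]
AXES = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

G = gram_matrix(AXES)
print("moral/truth cosine:", round(G[0][1], 3))
print("mean off-diagonal:", round(mean_off_diagonal(G), 3))
```

A cosine near 1 between two axes means that moving along one direction is nearly indistinguishable from moving along the other, which is the sense in which the moral and truth directions are described as entangled.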
85%: Agent routing accuracy (Benchmark 7)
0.744: Gram cosine (moral/truth entanglement)
0.123: ICA mean off-diagonal Gram (vs 0.210 human-designed)
100%: Semantic midpoint detection (15 consecutive experiments)
Benchmark Results
Transformer benchmarks measure what transformers do well. KAIA is not a transformer. These benchmarks measure what agents do: classify intent, rank context, score similarity, complete analogies, retrieve from memory, and route to action.
Linear classifier trained on 80 examples. Stable across runs.
Cosine similarity on de-meaned 50D encoding. Stable across all runs.
Tier separation HIGH > MEDIUM > LOW confirmed. Stable.
Top-1 / top-3 via vector arithmetic. Stable.
Accuracy at 20 items. 897,000 queries per second on one CPU core.
Linear classifier trained on 80 examples. Stable.
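Two of the techniques named in the benchmark notes above, cosine similarity on de-meaned encodings and analogy completion via vector arithmetic, can be illustrated together. This is a sketch under stated assumptions rather than the benchmark code: the function names and the three-dimensional toy vectors are hypothetical (the benchmarks use a 50D encoding), and running the analogy on de-meaned vectors is a choice made here for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def demean(embeddings):
    """Subtract the vocabulary-wide mean vector from every embedding."""
    vectors = list(embeddings.values())
    mean = [sum(column) / len(vectors) for column in zip(*vectors)]
    return {w: [x - m for x, m in zip(v, mean)] for w, v in embeddings.items()}

def analogy(embeddings, a, b, c, k=3):
    """Top-k candidates for 'a is to b as c is to ?' using the offset b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    # The three query words are excluded, as is conventional for this test.
    scored = [(cosine(target, vec), word)
              for word, vec in embeddings.items() if word not in (a, b, c)]
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]

# Hypothetical toy vectors that share a large common component on axis 2.
RAW = {
    "man": [1.0, 0.0, 1.0],
    "woman": [1.0, 1.0, 1.0],
    "king": [3.0, 0.0, 1.0],
    "queen": [3.0, 1.0, 1.0],
    "apple": [0.0, 0.5, 3.0],
}
CENTERED = demean(RAW)

# De-meaning removes the shared component, so unrelated words look less similar.
print("raw      man~apple:", round(cosine(RAW["man"], RAW["apple"]), 3))
print("centered man~apple:", round(cosine(CENTERED["man"], CENTERED["apple"]), 3))
print("man:woman :: king:?", analogy(CENTERED, "man", "woman", "king", k=1))
```

Top-1 accuracy counts an analogy as correct only when the expected word ranks first; top-3 accepts it anywhere in the first three candidates, which corresponds to calling `analogy` with `k=3`.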
The Ceiling
The architecture is honest about its limits. Benchmark 4 (antonym detection) achieved 25% top-1 accuracy on abstract axes. This is not a failure of the architecture. It is a finding about what distributional embeddings encode.
Abstract conceptual dimensions (moral, epistemic, social) do not exhibit clean geometric structure in distributional word embeddings because they are culturally contingent rather than physically forced. The ceiling is documented precisely, not hidden. It motivates the mathematical track: mathematics provides a calibration reference in the cleanest possible domain, and operations discovered there are tested as fixes for the language ceiling.