Research Findings

Core Findings

What the experiments established

Finding 01

Geometric convexity is a structural property

The semantic space is inhabited at every point between any two poles, regardless of which words define those poles, which corpus trained the embedding, or which axis design is used. For example, the midpoint between a love-cluster vector and a hate-cluster vector lands on semantically coherent intermediate words. Semantic midpoint detection held at 100% across 15 consecutive experiments.
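The midpoint test described above can be sketched in a few lines. This is only an illustration under stated assumptions: the two-dimensional vectors, the word list, and the function names (`cluster_vector`, `nearest_to_midpoint`) are invented here and are not KAIA's embeddings or code.

```python
import numpy as np

# Toy 2-D embeddings, invented for illustration only (not KAIA data).
EMBEDDINGS = {
    "love":        np.array([1.0, 0.2]),
    "adore":       np.array([0.9, 0.3]),
    "hate":        np.array([-1.0, 0.2]),
    "loathe":      np.array([-0.9, 0.3]),
    "indifferent": np.array([0.0, 0.3]),
    "table":       np.array([0.1, -1.0]),
}

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def cluster_vector(words):
    """Mean of the unit-normalised vectors of the words defining one pole."""
    return np.mean([unit(EMBEDDINGS[w]) for w in words], axis=0)

def nearest_to_midpoint(pole_a, pole_b, k=1):
    """Return the k words closest (by cosine) to the midpoint of two poles."""
    mid = 0.5 * (cluster_vector(pole_a) + cluster_vector(pole_b))
    exclude = set(pole_a) | set(pole_b)  # pole words cannot be their own midpoint
    scored = [
        (float(unit(EMBEDDINGS[w]) @ unit(mid)), w)
        for w in EMBEDDINGS
        if w not in exclude
    ]
    scored.sort(reverse=True)
    return [w for _, w in scored[:k]]

result = nearest_to_midpoint(["love", "adore"], ["hate", "loathe"])
```

With these toy vectors, `result` is `["indifferent"]`: the midpoint falls on a word that sits between the poles rather than on an unrelated word such as "table".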

Finding 02

Physical universals encode cleanly; cultural constructs do not

Oppositions in temperature, speed, and luminosity are clean in any corpus because physical reality forces the same distributional patterns everywhere. Moral, epistemic, and social oppositions are entangled because these dimensions reflect culturally specific frameworks. The architecture documents this ceiling precisely rather than hiding it.

Finding 03 — Safety finding

Moral and truth are nearly the same geometric direction

The geometric directions encoding the moral axis and the truth axis occupy nearly the same position in embedding space, with a Gram cosine of 0.744. This provides a mechanistic, geometric explanation for why large language models exhibit sycophantic behavior: the training signal cannot distinguish truthfulness from social approval. This result was reached independently, before the convergent research from three separate fields was discovered.
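To make the Gram cosine concrete, the sketch below assumes it is the entry of the cosine Gram matrix of the axis direction vectors, that is, the dot product of two unit-normalised axes. That reading is an assumption, and the three-dimensional axis vectors are constructed by hand so that the moral/truth entry matches the reported 0.744; they are not the experiment's actual axes.

```python
import numpy as np

def gram_matrix(axes):
    """Cosine Gram matrix: entry (i, j) is the cosine between axis i and axis j."""
    A = np.asarray(axes, dtype=float)
    A = A / np.linalg.norm(A, axis=1, keepdims=True)  # unit-normalise each axis
    return A @ A.T

# Hypothetical axis directions, built so that cos(moral, truth) = 0.744.
moral = np.array([1.0, 0.0, 0.0])
truth = np.array([0.744, np.sqrt(1.0 - 0.744**2), 0.0])
temperature = np.array([0.0, 0.0, 1.0])

G = gram_matrix([moral, truth, temperature])
moral_truth_cosine = G[0, 1]        # entangled pair
moral_temperature_cosine = G[0, 2]  # independent pair
```

An entry near 1 means two axes point in almost the same direction, so a signal moving along one also moves along the other; an entry near 0 means the axes are independent.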

Finding 04

Independent corroboration from three separate fields

After the experiments were completed, independent work in cognitive science (Gärdenfors, 2024), neuroscience (hippocampal geometry study, January 2026), and LLM interpretability (Zhou et al., 2025) was found to converge on the same geometric structure of meaning from entirely different directions. These were not sources for the research. They are independent corroboration of a direction reached through building and running experiments.

85% · Agent routing accuracy (Benchmark 7)

0.744 · Gram cosine (moral/truth entanglement)

0.123 · ICA mean off-diagonal Gram (vs 0.210 human-designed)

100% · Semantic midpoint detection (15 consecutive experiments)
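The ICA figure above compares a mean off-diagonal Gram value of 0.123 with 0.210 for human-designed axes. A plausible way to compute such a number is sketched here, assuming it is the mean absolute cosine over all distinct axis pairs. Whether the original metric takes absolute values is not stated, so treat that choice, the function name, and the toy axis sets as assumptions.

```python
import numpy as np

def mean_off_diagonal_gram(axes):
    """Mean absolute cosine between every pair of distinct axes (assumed metric)."""
    A = np.asarray(axes, dtype=float)
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    G = A @ A.T
    off_diagonal = ~np.eye(len(A), dtype=bool)  # mask out the self-similarities
    return float(np.abs(G[off_diagonal]).mean())

# Toy axis sets, invented for illustration.
orthogonal_axes = np.eye(3)
entangled_axes = np.array([
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],  # cosine 0.6 with the first axis
    [0.0, 0.0, 1.0],
])

clean_score = mean_off_diagonal_gram(orthogonal_axes)
entangled_score = mean_off_diagonal_gram(entangled_axes)
```

Lower is better under this reading: perfectly orthogonal axes score 0, and each entangled pair pulls the mean upward.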

Benchmark Results

Seven benchmarks for agent-oriented semantic reasoning

Transformer benchmarks measure what transformers do well. KAIA is not a transformer. These benchmarks measure what agents do: classify intent, rank context, score similarity, complete analogies, retrieve from memory, and route to action.

B1 · Intent classification

70%

Linear classifier trained on 80 examples. Stable across runs.
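The source does not say which linear classifier was used. As one possible instance, the sketch below trains a softmax regression by plain gradient descent on 80 synthetic examples (four classes of 20) in 50 dimensions. The data generator, class count, learning rate, and step count are all assumptions chosen for illustration, and the synthetic accuracy says nothing about the benchmark's 70%.

```python
import numpy as np

def make_data(rng, n_per_class, n_classes=4, dim=50, noise=0.2):
    """Synthetic stand-in for encoded utterances: one noisy cluster per intent."""
    centers = 2.0 * np.eye(n_classes, dim)
    y = np.repeat(np.arange(n_classes), n_per_class)
    X = centers[y] + noise * rng.normal(size=(len(y), dim))
    return X, y

def train_softmax(X, y, n_classes, lr=0.5, steps=200):
    """Fit a bias-free linear softmax classifier with batch gradient descent."""
    W = np.zeros((X.shape[1], n_classes))
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(steps):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        W -= lr * X.T @ (P - Y) / len(X)  # cross-entropy gradient step
    return W

def predict(W, X):
    """Pick the class with the highest linear score."""
    return (X @ W).argmax(axis=1)

rng = np.random.default_rng(0)
X_train, y_train = make_data(rng, n_per_class=20)  # 80 training examples
X_test, y_test = make_data(rng, n_per_class=10)
W = train_softmax(X_train, y_train, n_classes=4)
accuracy = float((predict(W, X_test) == y_test).mean())
```

With only 80 examples, a linear model has few parameters to fit per class, which is one reason such a classifier can be stable across runs.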

B2 · Context relevance ranking

80%

Cosine similarity on de-meaned 50D encoding. Stable across all runs.
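Ranking by cosine similarity on a de-meaned encoding can be sketched as follows. It assumes "de-meaned" means subtracting a shared mean vector from every encoding before comparison; the function name, the three-dimensional toy vectors (the benchmark uses 50D), and the mean itself are invented for illustration.

```python
import numpy as np

def rank_by_relevance(query, candidates, mean_vector):
    """Rank candidate encodings by cosine similarity to the query after de-meaning."""
    q = np.asarray(query, dtype=float) - mean_vector
    C = np.asarray(candidates, dtype=float) - mean_vector
    q = q / np.linalg.norm(q)
    C = C / np.linalg.norm(C, axis=1, keepdims=True)
    scores = C @ q
    order = np.argsort(-scores)  # most relevant candidate first
    return order, scores

# Toy encodings that share a large common offset, as raw embeddings often do.
mean_vector = np.array([5.0, 5.0, 5.0])
candidates = np.array([
    [6.0, 5.0, 5.0],  # same direction as the query once the mean is removed
    [5.0, 6.0, 5.0],  # nearly orthogonal to the query
    [4.0, 5.0, 5.0],  # opposite direction
])
query = np.array([7.0, 5.1, 5.0])

order, scores = rank_by_relevance(query, candidates, mean_vector)
```

Without the subtraction, all three candidates would have a raw cosine above 0.98 with the query, because the shared offset dominates every vector. Removing the mean exposes the differences that matter for ranking.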

B3 · Semantic similarity

80%

Tier separation HIGH > MEDIUM > LOW confirmed. Stable.

B5 · Analogy completion

40% / 65%

Top-1 / top-3 via vector arithmetic. Stable.
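Analogy completion by vector arithmetic is usually done with the offset method: for "a is to b as c is to ?", compute b - a + c and return the nearest remaining words. The sketch below shows top-1 and top-3 retrieval under that assumption. The vocabulary and its hand-built three-dimensional vectors are hypothetical, not the benchmark's data.

```python
import numpy as np

# Hand-built toy vectors (royalty, gender, other), for illustration only.
VECTORS = {
    "king":   np.array([1.0, 1.0, 0.0]),
    "queen":  np.array([1.0, -1.0, 0.0]),
    "man":    np.array([0.0, 1.0, 0.0]),
    "woman":  np.array([0.0, -1.0, 0.0]),
    "prince": np.array([0.8, 1.0, 0.3]),
    "apple":  np.array([0.0, 0.0, 1.0]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def complete_analogy(a, b, c, k=3):
    """Solve 'a is to b as c is to ?' and return the top-k candidate words."""
    target = VECTORS[b] - VECTORS[a] + VECTORS[c]
    candidates = [w for w in VECTORS if w not in (a, b, c)]  # skip the inputs
    candidates.sort(key=lambda w: cosine(VECTORS[w], target), reverse=True)
    return candidates[:k]

top3 = complete_analogy("man", "king", "woman", k=3)
top1 = top3[0]
```

Here `top1` is "queen". Top-3 accuracy counts an item as correct when the expected word appears anywhere in `top3`, which is why it sits above the top-1 figure.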

B6 · Memory retrieval

70% · 897k/s

Accuracy at 20 items. 897,000 queries per second on one CPU core.

B7 · Agent routing

85%

Linear classifier trained on 80 examples. Stable.

The Ceiling

What the architecture cannot yet do and why

The architecture is honest about its limits. B4 antonym detection achieved 25% top-1 on abstract axes. This is not a failure of the architecture. It is a finding about what distributional embeddings encode.

Abstract conceptual dimensions (moral, epistemic, social) do not exhibit clean geometric structure in distributional word embeddings because they are culturally contingent rather than physically forced. The ceiling is documented precisely, not hidden. It motivates the mathematical track: mathematics provides a calibration reference in the cleanest possible domain, and operations discovered there are tested as fixes for the language ceiling.

