Data-Driven Semantic Axes: ICA Reveals That Independent Opposition Dimensions in Word Embeddings Do Not Match Human Conceptual Categories

Author

Tiffney Bare
Independent AI Researcher

Year

2026

Series

KAIA Research Series
Language Track

Status

Published

Abstract

Independent Component Analysis (ICA) applied to WordNet antonym difference vectors discovers semantic axes that are 42 percent more orthogonal than human-designed axes. The ICA-discovered directions include specificity, causality, directness, and change, and they are more geometrically independent than the human-designed valence, moral, and truth axes. The mean off-diagonal Gram matrix value drops from 0.210 for human-designed axes to 0.123 for ICA-derived axes, and the maximum entanglement falls from 0.744 to 0.430. The paper documents the most significant entanglement finding of the research series: the moral and truth axes have a Gram cosine of 0.744, meaning the two axes are strongly aligned rather than independent in GloVe embedding space. The paper argues that this provides a mechanistic, geometric explanation for sycophantic behavior in large language models: the training signal cannot distinguish truthfulness from social approval because the embedding space does not separate them. The finding was reached independently, before the author encountered convergent research in three separate fields: cognitive science, neuroscience, and LLM interpretability.
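The entanglement figures quoted in the abstract are summary statistics over the off-diagonal entries of a Gram matrix of axis directions. The sketch below illustrates one way such statistics could be computed. It is not the paper's code: the function names, the toy 3-dimensional vectors, and the choice to take absolute cosines are all assumptions made for illustration, and no GloVe data is used.

```python
import math
from itertools import combinations


def unit(vector):
    """Scale a vector to unit length so dot products become cosines."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("axis vector has zero length")
    return [x / norm for x in vector]


def offdiag_gram_stats(axes):
    """Return (mean, max) of |cosine| over all distinct pairs of axes.

    These are the off-diagonal entries of the Gram matrix of the
    unit-normalized axes. Taking absolute values is an assumption here.
    """
    units = [unit(a) for a in axes]
    cosines = [
        abs(sum(p * q for p, q in zip(u, v)))
        for u, v in combinations(units, 2)
    ]
    if not cosines:
        raise ValueError("need at least two axes")
    return sum(cosines) / len(cosines), max(cosines)


# Hypothetical toy axes (made-up numbers, not real embedding directions).
toy_axes = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],  # shares a component with the first axis
    [0.0, 0.0, 1.0],  # orthogonal to both
]
mean_abs, max_abs = offdiag_gram_stats(toy_axes)

# Relative reductions implied by the rounded figures in the abstract.
mean_reduction = (0.210 - 0.123) / 0.210
max_reduction = (0.744 - 0.430) / 0.744
```

In the toy set only the first two axes overlap, so the maximum equals their cosine and the mean is one third of it. Using the rounded figures from the abstract, the relative reduction works out to about 41 percent for the mean and about 42 percent for the maximum.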

DOI: 10.5281/zenodo.20214364

Cite this paper

Bare, T. (2026). Data-Driven Semantic Axes: ICA Reveals That Independent Opposition Dimensions in Word Embeddings Do Not Match Human Conceptual Categories. KAIA Research Series, Paper 3. Zenodo. https://doi.org/10.5281/zenodo.20214364
