<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>First Workshop on Computational Design and Computer-aided Creativity</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Marching The Manifold - Visualizing Semantic Relationships in CLIP's Latent Space</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tobias Bongartz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hochschule Bielefeld</institution>,
          <addr-line>Lampingstraße 3, 33649 Bielefeld</addr-line>,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>23</volume>
      <abstract>
        <p>Latent spaces form the mathematical foundation of modern AI image generation systems, yet their high-dimensional complexity remains inaccessible to the design practitioners who could benefit most from understanding them. The Bachelor thesis "Marching The Manifold" transforms the abstract 768-dimensional embedding space of CLIP (Contrastive Language-Image Pre-training) [1] into an interactive, galaxy-like 3D visualization in which 235,886 embedded words are semantically clustered. The web interface is built with Vite, leveraging JavaScript and HTML alongside React and Three.js for real-time rendering. A custom-developed Latent Navigator tool, programmed in Python, employs beam search with FAISS [2]-optimized nearest-neighbor calculations to identify meaningful pathways through this space, which users explore via a purpose-built physical controller. As users traverse these pathways, image sets pre-generated with the diffusion models FLUX.1 [dev] [3] and PixelWave Flux.1-dev 03 [4] show corresponding imagery at each point of the pathway, revealing the continuous semantic transitions between concepts. The resulting system visualizes semantic relationships as navigable paths through the latent space. Transitions between concepts manifest as mostly coherent visual transformations in the generated images, while the celestial metaphor provides users with an intuitive frame of reference. This visualization approach bridges the gap between technical AI implementation and creative practice by enabling designers to develop intuitive mental models of latent spaces through direct exploration. The project demonstrates how thoughtful interaction design can transform abstract mathematical structures into accessible tools for creative exploration.</p>
      </abstract>
      <kwd-group>
        <kwd>Latent Space Navigation</kwd>
        <kwd>Computational Visualization</kwd>
        <kwd>CLIP Embeddings</kwd>
        <kwd>Physical Computing</kwd>
        <kwd>AI Transparency</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Video Documentation</title>
      <p>A video documentation can be seen at: https://youtu.be/L29iY3emqsA</p>
    </sec>
    <sec id="sec-2">
      <title>2. The Custom Controller</title>
      <p>A 3D rendering of the physical controller can be seen at: https://youtu.be/Vok3NN-SeLs</p>
    </sec>
    <sec id="sec-3">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author used Perplexity and Claude 3.7 Sonnet Thinking in order to improve writing style and to check grammar and spelling. After using these tools, the author reviewed and edited the content as needed and takes full responsibility for the publication's content.</p>
    </sec>
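The abstract describes a Latent Navigator that runs beam search with FAISS-optimized nearest-neighbor lookups to find pathways between concepts in CLIP's word-embedding space. The following is a minimal NumPy-only sketch of that idea, not the thesis implementation: the brute-force cosine search here stands in for a FAISS index (e.g. `IndexFlatIP`), and the function name, scoring weights, and parameters are illustrative assumptions.

```python
import numpy as np

def beam_search_path(emb, start, goal, beam_width=3, steps=8):
    """Find a semantic pathway of word indices from `start` to `goal`.

    `emb` is a (num_words, dim) matrix of unit-normalized embeddings.
    At each step we interpolate a target between the start and goal
    vectors and extend each beam with the nearest unvisited words.
    (Illustrative stand-in for a FAISS-accelerated search.)
    """
    beams = [(0.0, [start])]  # (cumulative score, path so far)
    for t in range(1, steps + 1):
        # Interpolation point sliding from start toward goal
        alpha = t / steps
        target = (1 - alpha) * emb[start] + alpha * emb[goal]
        target = target / np.linalg.norm(target)
        candidates = []
        for score, path in beams:
            last = emb[path[-1]]
            # Favor words close to BOTH the previous word and the target;
            # a FAISS inner-product index would perform this lookup.
            sims = emb @ (0.5 * last + 0.5 * target)
            sims[path] = -np.inf  # never revisit words on this path
            for idx in np.argsort(sims)[-beam_width:]:
                candidates.append((score + sims[idx], path + [int(idx)]))
        # Keep only the highest-scoring partial paths
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    best = beams[0][1]
    if best[-1] != goal:
        best.append(goal)  # force the path to terminate at the goal word
    return best

# Usage with random stand-in embeddings (the thesis uses 768-d CLIP vectors)
rng = np.random.default_rng(0)
emb = rng.normal(size=(50, 8))
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
path = beam_search_path(emb, start=0, goal=1)
```

The blended scoring term is one simple way to keep consecutive steps semantically close while still making progress toward the goal; the pre-generated image sets would then be keyed to the word indices along the returned path.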
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] <string-name><given-names>Alec</given-names> <surname>Radford</surname></string-name>, Ilya Sutskever, Jong Wook Kim, Gretchen Krueger, and Sandhini Agarwal, "<article-title>CLIP: Connecting Text and Images</article-title>" (blog), January 5, <year>2021</year>, https://openai.com/index/clip/.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] <string-name><given-names>Matthijs</given-names> <surname>Douze</surname></string-name> et al., "<article-title>The Faiss Library</article-title>" (arXiv, September 6, <year>2024</year>), https://doi.org/10.48550/arXiv.2401.08281.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] "black-forest-labs/FLUX.1-dev · Hugging Face," accessed January 1, <year>2025</year>, https://huggingface.co/black-forest-labs/FLUX.1-dev.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] "PixelWave - FLUX.1-Dev 03 | Flux Checkpoint | Civitai," December 14, <year>2024</year>, https://civitai.com/models/141592/pixelwave.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>