The project aims to make latent space exploration in Stable Diffusion more accessible and playful through an intuitive interface that lets users control the "re-noising" and "denoising" processes, exposing the typically black-boxed diffusion steps for creative manipulation.
Background
In the field of generative AI and image synthesis, Stable Diffusion models operate by manipulating images in latent space through processes of adding and removing noise. However, these internal mechanisms are typically hidden from users, limiting their ability to control and understand the image generation process creatively.
Problem
The main issue lies in making the complex latent space manipulation processes accessible and intuitive for users while maintaining creative control.
Challenge
Current AI image generation interfaces are unintuitive and lack meaningful creative workflows. While optimized for efficiency, they leave users without visual intuition or a sense of connection to the generation process.
Research Questions
How can we expose hidden steps of the generative process?
What interface paradigms best support latent space manipulation?
How does gamification impact user engagement?
Experiments
I am interested in exposing the "black box" of the diffusion model: its latent space. The bidirectional controls of "re-noising" (DDIM inversion) and "denoising" serve as the vehicle of image generation.
pixel manipulation 1
pixel manipulation 2
re-noise and denoise
prompt steering
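The re-noise/denoise pairing can be illustrated with the deterministic DDIM update in plain NumPy. This is a toy sketch, not the project's implementation: the schedule values are assumptions, and the noise predictor is replaced by a constant field (standing in for the U-Net) so that the inversion round trip is exact.

```python
import numpy as np

# Toy alpha-bar schedule over T steps (linear-beta, DDPM-style) -- an assumption.
T = 50
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def eps_model(x, t):
    # Stand-in for the U-Net noise predictor; a constant field keeps
    # the DDIM round trip exact in this toy setting.
    return np.full_like(x, 0.3)

def ddim_denoise_step(x_t, t):
    """One deterministic DDIM step t -> t-1 ("denoising")."""
    eps = eps_model(x_t, t)
    ab_t, ab_prev = alpha_bar[t], alpha_bar[t - 1]
    x0_pred = (x_t - np.sqrt(1 - ab_t) * eps) / np.sqrt(ab_t)
    return np.sqrt(ab_prev) * x0_pred + np.sqrt(1 - ab_prev) * eps

def ddim_invert_step(x_prev, t):
    """One inversion step t-1 -> t ("re-noising"), reusing the same update."""
    eps = eps_model(x_prev, t - 1)
    ab_t, ab_prev = alpha_bar[t], alpha_bar[t - 1]
    x0_pred = (x_prev - np.sqrt(1 - ab_prev) * eps) / np.sqrt(ab_prev)
    return np.sqrt(ab_t) * x0_pred + np.sqrt(1 - ab_t) * eps

# Round trip: re-noise a "latent" several steps, then denoise back.
x = np.random.default_rng(0).normal(size=(4, 4))
z = x.copy()
for t in range(1, 11):
    z = ddim_invert_step(z, t)   # push toward noise
for t in range(10, 0, -1):
    z = ddim_denoise_step(z, t)  # pull back toward the image
print(np.max(np.abs(z - x)))     # near-zero reconstruction error
```

With a real noise predictor the round trip is only approximate, which is exactly what makes the re-noise depth a creative control rather than a lossless undo.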
Interaction Design
I want to focus on "continuous play" to bring gamified elements into text-based image generation and latent space exploration. The left two images show the "1D canvas", which involves horizontal and vertical movements. The right two images show the "2D canvas", which involves a 360-degree, slingshot-style interaction.
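One way to read the slingshot gesture is as a mapping from a 2D drag vector to generation controls: the drag angle gives a 360-degree aim, and the drag length sets a strength value. This is a hypothetical sketch of that mapping; the function name and `max_drag` parameter are assumptions, not part of the project's code.

```python
import math

def slingshot_to_controls(dx, dy, max_drag=200.0):
    # Drag angle in [0, 2*pi): the 360-degree aim of the slingshot.
    angle = math.atan2(dy, dx) % (2 * math.pi)
    # Drag length, clamped and normalized to [0, 1] as a strength value.
    magnitude = min(math.hypot(dx, dy), max_drag)
    strength = magnitude / max_drag
    return angle, strength

angle, strength = slingshot_to_controls(100.0, 0.0)
print(round(strength, 2))  # half of max_drag -> 0.5
```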
UX Design
I designed a few interfaces; after comparing them, the two shown below stood out. I selected Design 2 for its borderless canvas, which eliminates visual constraints and creates a more immersive experience. The refined aesthetic reduces visual noise, letting users focus entirely on exploring and interacting with generated images without unnecessary interface elements competing for attention.
interface design for best exploration experiences
Frontend
The left window (1D canvas) offers a linear, procedural view for clarity of flow and process. The right window (2D canvas) provides a holistic view of all generated images, enhancing the gameplay experience.
frontend mapping
Backend
Multiple APIs are called in a single generation process, and new images are stored, mapped, and displayed back to the canvases.
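The store-map-display flow can be sketched as a small orchestration function. Everything here is hypothetical: the API calls are stubs, and the function and store names are assumptions chosen only to illustrate the described sequence of calls per generation.

```python
import uuid

def call_prompt_api(image_id):
    # Stub for an auto-captioning / prompt-generation API call.
    return f"auto prompt for {image_id}"

def call_diffusion_api(prompt, noise_strength):
    # Stub for the Stable Diffusion generation API call.
    return {"pixels": None, "prompt": prompt, "strength": noise_strength}

IMAGE_STORE = {}  # image_id -> generated image record
CANVAS_MAP = {}   # image_id -> (x, y) position on the 2D canvas

def generate(source_image_id, noise_strength, position):
    prompt = call_prompt_api(source_image_id)           # API call 1
    image = call_diffusion_api(prompt, noise_strength)  # API call 2
    image_id = str(uuid.uuid4())
    IMAGE_STORE[image_id] = image                       # store the new image
    CANVAS_MAP[image_id] = position                     # map it to the canvas
    return image_id                                     # frontend displays it

new_id = generate("seed-image", noise_strength=0.4, position=(120, 80))
print(CANVAS_MAP[new_id])
```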
backend implementation
annotated generation
functional website
Outcome
The project successfully reimagines latent space exploration through an intuitive, game-like interface that exposes Stable Diffusion's underlying processes. While the current implementation demonstrates the potential of direct manipulation and auto-generated prompts for creative control, there's room to improve the balance between power and accessibility.
Future work will focus on expanding evaluation metrics, optimizing prompt generation and backend performance, implementing branching history for multiple creative paths, and developing better tools to visualize noise-image relationships. Collaborative features could further enrich the creative exploration process.