This is an extension for stable-diffusion-webui which visualizes the embedding vector generated by CLIP.
- Input a prompt as you like.
- Click the `Run` button.
- Wait a second.
This figure shows the embedding vector for each token. Each vector has 768 (for SDv1) or 1024 (for SDv2) dimensions.
This figure shows correlations between tokens. The calculation is carried out as follows:

1. Compute an embedding vector `v` from the given prompt. `v` typically has dimension (77, 768). For xattn, `v` is converted by the `to_k` linear layer.
2. For each token `t`, create a new prompt with `t` replaced by the padding token. Then compute its embedding vector `v_{t}`.
3. Let `d_{t} = v - v_{t}`.
4. Let `d_{t,n}` be the nth row vector of `d_{t}`. `d_{t,n}` is a 768 (or 1024)-dimensional vector representing `t`'s effect on the nth token. Then compute `|d_{t,n}|`, where `|x|` is the norm of a vector `x`.
5. Repeat steps 2 to 4 for all `t` in the given prompt.
By default, the padding token is `_</w>` (ID=318).
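The calculation above can be sketched in a few lines of Python. This is only an illustration and not the extension's actual code: `token_influence`, `toy_encode`, and the integer token IDs are hypothetical names made up for the example, the toy encoder stands in for CLIP (it returns 2-dimensional rows instead of 768 or 1024), and the `to_k` conversion for xattn is left out. The Euclidean norm is assumed for `|x|`.

```python
import math
from typing import Callable, List, Sequence

Embedding = List[List[float]]  # one row vector per token position


def token_influence(
    tokens: Sequence[int],
    encode: Callable[[Sequence[int]], Embedding],
    pad_token: int = 318,  # default padding token ID mentioned above
) -> List[List[float]]:
    """Return a matrix m where m[t][n] = |d_{t,n}|.

    `encode` is a hypothetical stand-in for the text encoder: it maps a
    token sequence to one embedding row per position.
    """
    v = encode(tokens)  # step 1: embedding of the original prompt
    result = []
    for t in range(len(tokens)):
        # Step 2: replace token t with the padding token and re-encode.
        replaced = list(tokens)
        replaced[t] = pad_token
        v_t = encode(replaced)
        # Steps 3 and 4: d_t = v - v_t, then the norm of each row of d_t.
        row = [
            math.sqrt(sum((a - b) ** 2 for a, b in zip(v_n, v_t_n)))
            for v_n, v_t_n in zip(v, v_t)
        ]
        result.append(row)
    return result


def toy_encode(tokens: Sequence[int]) -> Embedding:
    """Toy encoder (an assumption for this example, not CLIP).

    Row n is [token ID, running sum of IDs up to n], so a token can only
    affect its own position and later ones.
    """
    rows, running = [], 0.0
    for tok in tokens:
        running += tok
        rows.append([float(tok), running])
    return rows
```

Calling `token_influence([1, 2, 3], toy_encode)` returns a 3x3 matrix in which row `t` holds the effect of token `t` on every position. With this toy encoder, entries for positions before `t` are zero, because replacing a token cannot change the rows that come before it.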