@woct0rdho woct0rdho commented Dec 19, 2025

I've implemented Jina CLIP v2, which is used by the NewBie image model. Before this PR, ComfyUI already supported NewBie's DiT (see #11172) and the gemma-3-4b-it text encoder (in Lumina2). With this PR, we can run NewBie with full functionality in native ComfyUI.

The implementation of Jina CLIP v2 lives in a single .py file, similar to the existing BERT and Llama/Gemma implementations. The weights and the tokenizer are also packaged in a single safetensors file. I've tested that it produces the same clip_text_pooled as the official Jina CLIP v2, up to floating-point error.
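For anyone who wants to repeat that kind of check, here is a minimal sketch of comparing two pooled embeddings within a tolerance. It is not the test code from this PR: the function names and the thresholds (`atol`, `min_cos`) are assumptions for illustration, and the vectors are plain Python lists rather than tensors.

```python
import math

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def embeddings_match(a, b, atol=1e-3, min_cos=0.999):
    """Hypothetical check: thresholds are illustrative, not taken from the PR."""
    return max_abs_diff(a, b) <= atol and cosine_similarity(a, b) >= min_cos
```

In practice `a` would be the clip_text_pooled from this implementation and `b` the one from the official model, both flattened to lists. The tolerance needs to be looser when running in fp16 or bf16.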

Here is an image generated using this PR, with a simple workflow in it:

[Image: ComfyUI_00039_]

If I understand correctly, both Gemma and Jina need the system prompt written in the CLIPTextEncode node:

You are an assistant designed to generate high-quality anime images with the highest degree of image-text alignment based on xml format textual prompts. <Prompt Start>

otherwise the generated image will be garbage.
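To avoid forgetting the system prompt, the text for the CLIPTextEncode node can be assembled by a small helper. This is only a sketch: `build_newbie_prompt` is a hypothetical name, joining with a single space is an assumption, and the XML tag in the usage example is made up for illustration.

```python
# System prompt quoted verbatim from above.
SYSTEM_PROMPT = (
    "You are an assistant designed to generate high-quality anime images "
    "with the highest degree of image-text alignment based on xml format "
    "textual prompts. <Prompt Start>"
)

def build_newbie_prompt(user_prompt, system_prompt=SYSTEM_PROMPT):
    """Prepend the system prompt to the user's XML prompt.

    The single-space separator is an assumption, not something the
    model card or this PR specifies.
    """
    return f"{system_prompt} {user_prompt.strip()}"

# Example with a made-up tag; paste the result into CLIPTextEncode.
text = build_newbie_prompt("<character>1girl</character>")
```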

After this PR is merged, I can open another PR to support the checkpoint loader (all-in-one model).

@comfyanonymous comfyanonymous merged commit 4c432c1 into comfyanonymous:master Dec 20, 2025
10 checks passed
@woct0rdho woct0rdho deleted the newbie branch December 20, 2025 06:02