
Conversation

leejet (Owner) commented Jan 15, 2026

Flux.2 klein 4B

.\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\flux-2-klein-4b.safetensors --vae ..\..\ComfyUI\models\vae\flux2_ae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\qwen_3_4b.safetensors -p "a lovely cat" --cfg-scale 1.0 --steps 4 -v --offload-to-cpu --diffusion-fa
(output image)

Flux.2 klein 9B

.\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\flux-2-klein-9b.safetensors --vae ..\..\ComfyUI\models\vae\flux2_ae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\qwen_3_8b.safetensors -p "a lovely cat" --cfg-scale 1.0 --steps 4 -v --offload-to-cpu --diffusion-fa
(output image)

Flux.2 klein 4B edit

.\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\flux-2-klein-4b.safetensors --vae ..\..\ComfyUI\models\vae\flux2_ae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\qwen_3_4b.safetensors -r .\kontext_input.png -p "change 'flux.cpp' to 'klein.cpp'" --cfg-scale 1.0 --sampling-method euler -v --diffusion-fa --offload-to-cpu --steps 4
(output image)

Flux.2 klein 9B edit

.\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\flux-2-klein-9b.safetensors --vae ..\..\ComfyUI\models\vae\flux2_ae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\qwen_3_8b.safetensors -r .\kontext_input.png -p "change 'flux.cpp' to 'klein.cpp'" --cfg-scale 1.0 --sampling-method euler -v --diffusion-fa --offload-to-cpu --steps 4
(output image)

leejet (Owner, Author) commented Jan 16, 2026

There are still some issues with the Flux.2 klein support. Padding needs to be applied during tokenization and the attention_mask must be used in llm.hpp, but llm.hpp’s handling of the attention_mask currently appears to have problems: when the attention_mask is enabled, the results become NaN. This is the same issue seen with LongCat Image. I am still investigating and working on a fix.
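
For reference, a minimal sketch (in Python, purely illustrative, not the actual llm.hpp code) of one common way an additive attention_mask produces NaN: if a row ends up fully masked, softmax is taken over nothing but -inf.

    import numpy as np

    def masked_softmax(scores, mask):
        # additive mask convention: 0 for "attend", -inf for padded positions
        x = scores + mask
        x = x - x.max(axis=-1, keepdims=True)  # a fully masked row gives -inf - (-inf) = nan here
        e = np.exp(x)
        return e / e.sum(axis=-1, keepdims=True)

    scores = np.zeros((2, 4))
    mask = np.array([[0.0, 0.0, -np.inf, -np.inf],               # padded row: fine
                     [-np.inf, -np.inf, -np.inf, -np.inf]])      # fully masked row: NaN
    print(masked_softmax(scores, mask))  # second row comes out as all nan (numpy also emits a RuntimeWarning)

Whether this is the exact failure mode in llm.hpp is only an assumption; it is simply the textbook way masked attention goes NaN.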

stduhpf (Contributor) commented Jan 16, 2026

So the --clip-on-cpu workaround should also work there?

leejet (Owner, Author) commented Jan 16, 2026

It doesn’t work on my side.

leejet (Owner, Author) commented Jan 16, 2026

I think I’ve correctly fixed the attention_mask issue.

leejet (Owner, Author) commented Jan 16, 2026

The quality of Flux.2 klein 4B doesn’t seem as good as z-image turbo.

Flux.2 klein 4b

.\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\flux-2-klein-4b.safetensors --vae ..\..\ComfyUI\models\vae\flux2_ae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\qwen_3_4b.safetensors -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" --cfg-scale 5.0 --steps 4 -v --offload-to-cpu --diffusion-fa -v -H 1024 -W 512 --rng cpu
(output image)

Flux.2 klein base 4b

.\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\flux-2-klein-base-4b.safetensors --vae ..\..\ComfyUI\models\vae\flux2_ae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\qwen_3_4b.safetensors -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" --cfg-scale 5.0 --steps 20 -v --offload-to-cpu --diffusion-fa -v -H 1024 -W 512 --rng cpu
(output image)

Green-Sky (Contributor) commented Jan 16, 2026

> The quality of Flux.2 klein 4B doesn’t seem as good as z-image turbo.
>
> Flux.2 klein 4b
>
> (command quoted from the previous comment)

Not sure about cfg, but they use guidance_scale=1.0 for the distilled (non-base) model.

Also they use guidance_scale=4.0 and num_inference_steps=50 for the base model.

(reference: the 4B variants on Hugging Face)

edit: a cfg of 5 seems comparatively high for models that take larger LLM embedding inputs.
edit2: logger.warning(f"Guidance scale {guidance_scale} is ignored for step-wise distilled models.") hmm

edit3:

    def do_classifier_free_guidance(self):
        return self._guidance_scale > 1 and not self.config.is_distilled

So cfg should be 1 for the distilled model.
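
For context, here is a minimal sketch of the standard classifier-free guidance combination (names are illustrative, not taken from this PR); it shows why guidance_scale = 1 is effectively "CFG off" and why the pipeline can skip the unconditional pass for the distilled model.

    def apply_cfg(noise_uncond, noise_cond, guidance_scale):
        # guidance_scale == 1.0 reduces this to noise_cond, so the
        # unconditional prediction is not needed at all in that case
        return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

Presumably --cfg-scale 1.0 in sd.cpp behaves the same way, running only the conditional pass per step.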

leejet (Owner, Author) commented Jan 16, 2026

> Not sure about cfg, but they use guidance_scale=1.0 for the distilled (non-base) model.

They changed the README on Hugging Face. When I first checked it, the distilled model was also using guidance_scale = 4.0. After changing guidance_scale to 1.0f, the image quality did improve a bit, but it’s still not as good as z-image turbo.

https://huggingface.co/black-forest-labs/FLUX.2-klein-4B/commit/5e67da950fce4a097bc150c22958a05716994cea

.\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\flux-2-klein-4b.safetensors --vae ..\..\ComfyUI\models\vae\flux2_ae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\qwen_3_4b.safetensors -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" --cfg-scale 1.0 --steps 4 -v --offload-to-cpu --diffusion-fa -v -H 1024 -W 512 --rng cpu
(output image)

leejet (Owner, Author) commented Jan 16, 2026

> Also they use guidance_scale=4.0 and num_inference_steps=50 for the base model.

In fact, many Hugging Face examples for base models use relatively large step counts, like 40–50 — for example, SDXL uses 40 — but in practice, using around 20 steps often already gives good results.

This is the result with 50 steps. The quality has improved somewhat, but not by much.

.\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\flux-2-klein-base-4b.safetensors --vae ..\..\ComfyUI\models\vae\flux2_ae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\qwen_3_4b.safetensors -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" --cfg-scale 5.0 --steps 50 -v --offload-to-cpu --diffusion-fa -v -H 1024 -W 512 --rng cpu
(output image)

Green-Sky (Contributor) commented:

@leejet you talk about guidance scale, but your command only shows the cfg scale change. Or did you code the guidance scale?

Green-Sky (Contributor) commented:

Oh, and have you tried reference image(s)? This is a clear advantage over e.g. z-image.

leejet (Owner, Author) commented Jan 16, 2026

> @leejet you talk about guidance scale, but your command only shows the cfg scale change. Or did you code the guidance scale?

guidance_scale in diffusers == --cfg-scale in sd.cpp

leejet (Owner, Author) commented Jan 16, 2026

> Oh, and have you tried reference image(s)? This is a clear advantage over e.g. z-image.

Here I’m comparing the performance for T2I. Using a reference image means it’s image editing, which is a different task. Currently, z-image turbo does not support image editing.

Green-Sky (Contributor) commented:

> @leejet you talk about guidance scale, but your command only shows the cfg scale change. Or did you code the guidance scale?
>
> guidance_scale in diffusers == --cfg-scale in sd.cpp

Guidance scale as defined in [Classifier-Free Diffusion Guidance]

You are right, I did not know that.

> Oh, and have you tried reference image(s)? This is a clear advantage over e.g. z-image.
>
> Here I’m comparing the performance for T2I. Using a reference image means it’s image editing, which is a different task. Currently, z-image turbo does not support image editing.

Yes, I was asking because you did not show any examples yet. :)

leejet (Owner, Author) commented Jan 16, 2026

> Yes, I was asking because you did not show any examples yet. :)

I’ve updated some examples of image editing. You can take a look. I think the overall quality of the image edits is pretty good.

leejet merged commit 9565c7f into master on Jan 17, 2026. 13 checks passed.