Conversation

@ilopezluna ilopezluna commented Jan 15, 2026

Based on #544
I started a new branch because the previous PR relied on go-containerregistry. Given that, it was easier to start from scratch, using the previous PR as a reference, than to update the old one.

In this PR, we also add support for packaging Diffusers models. For now, we only support the DDUF format (similar to GGUF: a single file containing everything), as it was the easiest to implement. However, support for other formats can be added later without much difficulty.
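Since DDUF is described here as "a single file containing everything" (it is a zip-based archive bundling a diffusers model directory), a packaging tool can sanity-check candidate files before packaging them. A minimal sketch, assuming a DDUF archive is an uncompressed zip containing the model's `model_index.json` (the function name and the `model_index.json` check are illustrative assumptions, not the PR's actual detection logic):

```python
import zipfile


def looks_like_dduf(path: str) -> bool:
    """Heuristic DDUF check (assumption: DDUF is a zip archive that
    bundles a diffusers model directory, including model_index.json)."""
    if not path.lower().endswith(".dduf"):
        return False
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        # A diffusers model directory is indexed by model_index.json.
        return "model_index.json" in zf.namelist()
```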

The backend is currently enabled only on Linux, but it also works on Darwin. That said, it requires a Python runtime, so we should first design a solution that works well in Docker Desktop.

I have packaged and pushed ignaciolopezluna020/stable-diffusion:latest and ignaciolopezluna020/stable-diffusion:Q4.

You can try it by:

Building the image:

```
docker build --target final-diffusers -t model-runner:diffusers .
```

Then running it:

```
docker run -it --rm --gpus all -p 13434:13434 -e MODEL_RUNNER_PORT=13434 model-runner:diffusers
```

Pulling the model:

```
MODEL_RUNNER_HOST=http://localhost:13434 docker model pull ignaciolopezluna020/stable-diffusion:Q4
```

And generating an image:

```
curl -s -X POST http://localhost:13434/engines/diffusers/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ignaciolopezluna020/stable-diffusion:Q4",
    "prompt": "A cat sitting on a couch",
    "size": "512x512"
  }' | jq -r '.data[0].b64_json' | base64 -d > cat.png
```
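The same flow as the curl | jq | base64 pipeline, as a small stdlib-only Python client. This is a sketch: the endpoint path and response shape are taken from the curl example above, and the function names are ours:

```python
import base64
import json
import urllib.request


def generate_image(host: str, model: str, prompt: str, size: str = "512x512") -> bytes:
    """POST an OpenAI Images API-style request to the diffusers engine
    and return the first generated image as raw PNG bytes."""
    payload = json.dumps({"model": model, "prompt": prompt, "size": size}).encode()
    req = urllib.request.Request(
        f"{host}/engines/diffusers/v1/images/generations",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return decode_first_image(body)


def decode_first_image(body: dict) -> bytes:
    # Responses look like {"data": [{"b64_json": "<base64-encoded PNG>"}]}.
    return base64.b64decode(body["data"][0]["b64_json"])
```

Writing the returned bytes to `cat.png` mirrors the `base64 -d > cat.png` step.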

Note: the image is built from the llama.cpp one; it does not include vLLM.

@gemini-code-assist

Summary of Changes

Hello @ilopezluna, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the system's capabilities by integrating a new backend specifically designed for Stable Diffusion models. It introduces support for the DDUF format, allowing for streamlined packaging and deployment of Diffusers-based image generation models. The new backend exposes an OpenAI Images API-compatible interface, making it easier to interact with image generation functionalities. This change lays the groundwork for future enhancements in multimodal model support, particularly in the realm of generative AI for images.

Highlights

  • New Stable Diffusion Backend: Introduces a new backend for Stable Diffusion models, enabling image generation capabilities within the system.
  • DDUF Format Support: Adds support for packaging and running Diffusers models using the DDUF (Diffusers Unified Format), a single-file archive similar to GGUF.
  • CLI package Command Extension: The package command in the CLI now includes a --dduf <path> option, allowing users to package DDUF files into Docker model OCI artifacts.
  • OpenAI Images API Compatibility: A new FastAPI-based Python server is implemented for Diffusers, providing endpoints compatible with the OpenAI Images API for image generation requests.
  • Docker Image Integration: The Dockerfile has been updated to include a dedicated build stage for Diffusers, installing necessary Python dependencies like diffusers, torch, transformers, and fastapi into a virtual environment.
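The "OpenAI Images API Compatibility" highlight implies the Python server must produce the standard response envelope that the curl example's `jq -r '.data[0].b64_json'` consumes. A minimal, dependency-free sketch of that envelope (the helper name is ours; the PR's actual server is FastAPI-based and is not reproduced here):

```python
import base64
import time


def images_response(images: list) -> dict:
    """Build an OpenAI Images API-compatible response body:
    a Unix "created" timestamp plus one base64-encoded entry
    per generated image, under the "data" key."""
    return {
        "created": int(time.time()),
        "data": [
            {"b64_json": base64.b64encode(img).decode("ascii")}
            for img in images
        ],
    }
```

A FastAPI route handler would run the diffusers pipeline, encode each output image to PNG bytes, and return this structure as JSON.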



@gemini-code-assist bot left a comment

Code Review

This pull request introduces a new backend for Stable Diffusion models, leveraging the diffusers library. It adds support for packaging models in the DDUF (Diffusers Unified Format) via a new --dduf flag in the package command. The changes span the Dockerfile (which builds the new environment), the CLI (packaging), and the core model distribution and inference scheduling components, integrating the new format and backend. The implementation includes a new Python-based server for handling image generation requests, compatible with the OpenAI Images API.

My review focuses on improving the Docker build process for better reproducibility and efficiency, and points out a key inconsistency in the backend implementation regarding model naming that should be addressed. The changes are otherwise well-structured and comprehensive.

@ilopezluna ilopezluna changed the title Add stable diffusion backend [WIP] Add stable diffusion backend Jan 15, 2026
@ilopezluna ilopezluna changed the title [WIP] Add stable diffusion backend Add stable diffusion backend Jan 15, 2026
@ilopezluna ilopezluna marked this pull request as ready for review January 15, 2026 12:51
@sourcery-ai bot left a comment

Hey - I've found 2 issues, and left some high level feedback:

  • In the Dockerfile you changed LD_LIBRARY_PATH from /app/lib:$LD_LIBRARY_PATH to just /app/lib; this will discard any LD_LIBRARY_PATH set by the runtime (including GPU driver paths), so consider appending instead of overwriting to avoid subtle runtime linking issues.
  • The default error message in bundle.Unpack still says "neither GGUF nor safetensors" even though DDUF/diffusers is now supported; updating this message to include the new format will make debugging unsupported bundles less confusing.
## Individual Comments

### Comment 1
<location> `pkg/distribution/internal/bundle/unpack.go:45-47` </location>
<code_context>
+		if err := unpackDDUF(bundle, model); err != nil {
+			return nil, fmt.Errorf("unpack DDUF file: %w", err)
+		}
 	default:
 		return nil, fmt.Errorf("no supported model weights found (neither GGUF nor safetensors)")
 	}
</code_context>

<issue_to_address>
**suggestion:** Error message in Unpack is now stale and doesn’t mention diffusers/DDUF format.

Given that `detectModelFormat` can now return `types.FormatDiffusers` when DDUF files are present, this fallback message is outdated. If we hit the default branch for a diffusers model (e.g., due to misconfiguration), the error will incorrectly claim only GGUF/safetensors are supported. Please update the message to also mention diffusers/DDUF support.

```suggestion
	default:
		return nil, fmt.Errorf("no supported model weights found (expected GGUF, safetensors, or diffusers/DDUF)")
	}
```
</issue_to_address>

### Comment 2
<location> `pkg/inference/platform/platform.go:21-24` </location>
<code_context>
 	return runtime.GOOS == "linux"
 }
+
+// SupportsDiffusers returns true if diffusers is supported on the current platform.
+// Diffusers is supported on Linux (for Docker/CUDA) and macOS (for MPS/Apple Silicon).
+func SupportsDiffusers() bool {
+	// return runtime.GOOS == "linux" || runtime.GOOS == "darwin"
+	return runtime.GOOS == "linux" // Support for macOS disabled for now until we design a solution to distribute it via Docker Desktop.
+}
</code_context>

<issue_to_address>
**nitpick:** SupportsDiffusers comment and implementation diverge regarding macOS support.

The comment says diffusers is supported on Linux and macOS, but the function only returns true on Linux (with the darwin branch commented out). This mismatch can confuse callers and maintainers. Please either update the comment to Linux-only for now, or note that macOS support is intentionally disabled until the distribution approach is finalized.
</issue_to_address>


In the Dockerfile:

```
ENV MODEL_RUNNER_PORT=12434
ENV LLAMA_SERVER_PATH=/app/bin
ENV HOME=/home/modelrunner
ENV MODELS_PATH=/models
```

@ilopezluna (author) replied:

I've removed $LD_LIBRARY_PATH because I didn't find any usage. Please let me know if I'm missing anything here 🙏

@ilopezluna ilopezluna requested a review from a team January 15, 2026 13:03
```go
ddufFilename := filepath.Base(ddufPaths[0])
// Ensure the filename has the .dduf extension for proper detection by the diffusers server
if !strings.HasSuffix(strings.ToLower(ddufFilename), ".dduf") {
	ddufFilename = ddufFilename + ".dduf"
}
```
A reviewer asked:
Is this really needed? Why don't we error out if it doesn't have .dduf?

@ilopezluna (author) replied on Jan 15, 2026:

This is during unpacking into a bundle. We use the SHA of the blob as the filename, so there is no .dduf extension. The diffusers loader fails if the provided file has no .dduf extension :(

@doringeman left a comment:

We need this so we don't display that warning on pull:

```diff
diff --git a/pkg/distribution/distribution/client.go b/pkg/distribution/distribution/client.go
index d153e96d..e1815d15 100644
--- a/pkg/distribution/distribution/client.go
+++ b/pkg/distribution/distribution/client.go
@@ -638,9 +638,9 @@ func (c *Client) GetBundle(ref string) (types.ModelBundle, error) {
 
 func GetSupportedFormats() []types.Format {
 	if platform.SupportsVLLM() {
-		return []types.Format{types.FormatGGUF, types.FormatSafetensors}
+		return []types.Format{types.FormatGGUF, types.FormatSafetensors, types.FormatDiffusers}
 	}
-	return []types.Format{types.FormatGGUF}
+	return []types.Format{types.FormatGGUF, types.FormatDiffusers}
 }
 
 func checkCompat(image types.ModelArtifact, log *logrus.Entry, reference string, progressWriter io.Writer) error {
```

```
MODEL_RUNNER_HOST=http://localhost:13434 docker model pull ignaciolopezluna020/stable-diffusion:Q4
Warning: vLLM backend currently only implemented for x86_64 NVIDIA platforms
a1670237613e: Pull complete [==================================================>]  2.828GB/2.828GB
Model pulled successfully
```


@doringeman

You should add the new image-generation mode to the following files as well (search for completion/embedding/reranking):

  • ./cmd/cli/commands/configure_flags.go
  • ./cmd/cli/commands/compose_test.go
  • ./pkg/inference/scheduling/loader.go
