
feat: add ACE-Step-1.5 support #1310

Closed
rmatif wants to merge 2 commits into leejet:master from rmatif:ace-step


rmatif (Contributor) commented Mar 2, 2026

This PR adds initial support for ACE-Step-1.5.

Example CLI invocation:

PROMPT="$(cat prompt.txt)"; LYRICS="$(cat lyrics.txt)";
./build/bin/sd-cli \
  -M audio_gen \
  -m /models/ace_step_1.5_turbo_aio.safetensors \
  -p "$PROMPT" \
  --lyrics "$LYRICS" \
  --sampling-method euler --scheduler simple --cfg-scale 1 --flow-shift 3 --steps 8 \
  --duration 120 --timesignature 4 \
  --language en --keyscale "G major" --bpm 120 \
  --lm-seed 32 -s 32 --fa \
  -o output.wav

This generates 120 s of audio in under 12 s on an RTX 4090.

Download the model: https://huggingface.co/Comfy-Org/ace_step_1.5_ComfyUI_files/blob/main/checkpoints/ace_step_1.5_turbo_aio.safetensors

For now, the ggml submodule must point to this branch: https://github.com/rmatif/ggml/tree/ace

Since this is a text-to-music model that introduces significant changes and a lot of new code, I'm wondering whether it's preferable to integrate it into stable-diffusion.cpp or to create a separate repo dedicated to it (e.g. ace.cpp). @leejet, any thoughts?

Output example:

output.mp4

daniandtheweb (Contributor) commented

First of all, great job; this is an amazing addition.
If I may, I'd like to share my thoughts on this.
Given that the project already handles both image and video generation, I think adding audio generation could be an appropriate choice. Moreover, ComfyUI, currently the largest generation project, already supports it alongside the other standard generation modes.

wbruna (Contributor) commented Mar 3, 2026

@LostRuins has been adding support to Koboldcpp for the past few days, via ServeurpersoCom/acestep.cpp.

LostRuins (Contributor) commented

My 2c is that ACE-Step is probably out of scope for this project, since it also requires an LM pass for the planner phase and deals with audio rather than images/video. But I might be biased, seeing that I've already got ACE-Step working via acestep.cpp; it seems unnecessary to reinvent the wheel.

leejet (Owner) commented Mar 3, 2026

Great job! However, I think this goes a bit beyond the intended scope of sd.cpp. It might be better to make it a separate repository, perhaps called ace-step.cpp.

rmatif (Contributor, Author) commented Mar 3, 2026

Alright, then I'll close this PR. Since https://github.com/ServeurpersoCom/acestep.cpp exists (which I wasn't aware of when I started working on this), I'll drop this and may contribute there if I have something relevant to add.

rmatif closed this Mar 3, 2026

5 participants