Multi-modal Generative Media Skills for AI Agents (Claude Code, Cursor, Gemini CLI). High-quality image, video, and audio generation powered by muapi.ai.

SamurAIGPT/Generative-Media-Skills

🎭 Generative Media Skills for AI Agents

The Ultimate Multimodal Toolset for Claude Code, Cursor, and Gemini CLI.
A high-performance, schema-driven architecture for AI agents to generate, edit, and display professional-grade images, videos, and audio.

🚀 Get Started | 🎨 Expert Library | ⚙️ Core Primitives | 📖 Reference


✨ Key Features

  • 🤖 Agent-Native Design — Standardized terminal scripts with clean JSON outputs for seamless integration into agentic workflows.
  • 🧠 Expert Knowledge Layer — Domain-specific skills that bake in professional cinematography, atomic design, and branding logic.
  • ⚡ Dynamic Schema-Driven — Powered by schema_data.json, scripts automatically resolve the latest models, endpoints, and valid parameters.
  • 🖼️ Direct Media Display — Use the --view flag to automatically download and open generated media in your system viewer.
  • 📁 Local File Support — Auto-upload images, videos, faces, and audio from your local machine to the CDN for processing.
  • 🌈 100+ AI Models — One-click access to Midjourney v7, Flux Pro, Kling 3.0, Veo3, Suno V5, and more.

🏗️ Scalable Architecture

This repository is split into a Core layer and a Library layer so that LLMs can find the relevant skill quickly without wading through unrelated scripts:

⚙️ Core Primitives (/core)

The raw infrastructure for interacting with the muapi.ai engine.

  • core/media/ — High-fidelity Generation (Image, Video, Audio)
  • core/edit/ — Advanced Editing (Lipsync, Upscale, Effects)
  • core/platform/ — Setup & Polling Utilities
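
As a rough illustration of what a polling utility does, the sketch below waits for a job to finish with a bounded number of status checks. It is not the code in core/platform/: `fetch_status`, `JOB_STATUS`, `POLL_CALLS`, and the status names are hypothetical stand-ins, and the stub simply reports completion on the third check.

```shell
#!/usr/bin/env bash
# Sketch of a poll-until-done loop. All names here are hypothetical;
# fetch_status stands in for whatever call reports a job's state.

POLL_CALLS=0

# Stub: pretends the job completes on the third check.
fetch_status() {
  POLL_CALLS=$((POLL_CALLS + 1))
  if [ "$POLL_CALLS" -ge 3 ]; then
    JOB_STATUS="completed"
  else
    JOB_STATUS="processing"
  fi
}

# poll_job MAX_ATTEMPTS INTERVAL_SECONDS
# Returns 0 when the job completes, 1 if it fails, 2 on timeout.
poll_job() {
  local max_attempts
  local interval
  local attempt
  max_attempts="$1"
  interval="$2"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    fetch_status
    case "$JOB_STATUS" in
      completed) return 0 ;;
      failed)    return 1 ;;
    esac
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  return 2
}

# Interval of 0 keeps the demo instant; a real loop would wait between checks.
if poll_job 5 0; then
  echo "job finished after $POLL_CALLS checks"
else
  echo "job did not finish" >&2
fi
```

Distinct return codes for failure and timeout let a calling agent decide whether to retry or report an error.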

📚 Expert Library (/library)

High-value skills that translate creative intent into technical directives.

  • Cinema Director (/library/motion/cinema-director/) — Technical film direction & cinematography.
  • Nano-Banana (/library/visual/nano-banana/) — Reasoning-driven image generation (Gemini 3 Style).
  • UI Designer (/library/visual/ui-design/) — High-fidelity mobile/web mockups (Atomic Design).
  • Logo Creator (/library/visual/logo-creator/) — Minimalist vector branding (Geometric Primitives).

🧠 Self-Optimizing Skills

Every expert skill in the Library includes a Prompt Optimization Protocol. The protocol lets an LLM (such as Claude or Gemini) use its own reasoning to expand a simple user request into a detailed technical brief before it calls the generation scripts.


🚀 Quick Start

1. Install the Skills

# Install all skills to your AI agent
npx skills add SamurAIGPT/Generative-Media-Skills --all

# Or install a specific skill
npx skills add SamurAIGPT/Generative-Media-Skills --skill muapi-media-generation

# List available skills
npx skills add SamurAIGPT/Generative-Media-Skills --list

# Install to specific agents
npx skills add SamurAIGPT/Generative-Media-Skills --all -a claude-code -a cursor

2. Configure Your API Key

# Get your key at https://muapi.ai/dashboard
bash core/platform/setup.sh --add-key "YOUR_MUAPI_KEY"

3. Run an Expert Skill with Direct Display

Generate a high-fidelity image and open it immediately using the --view flag.

# Use Nano-Banana reasoning to generate a 2K masterpiece from a local image
bash library/visual/nano-banana/scripts/generate-nano-art.sh \
  --file ./my-source-image.jpg \
  --subject "a glass hummingbird" \
  --style "macro photography" \
  --resolution "2k" \
  --view
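
To run the same skill over several subjects, a small wrapper can loop over them. The sketch below is hypothetical and not part of the repository: `run_nano` and `DRY_RUN` are illustrative names, it uses only flags shown above, and it assumes the script also works without `--file`. By default it prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical batch wrapper. DRY_RUN=1 (the default) only prints commands.
DRY_RUN="${DRY_RUN:-1}"
SCRIPT="library/visual/nano-banana/scripts/generate-nano-art.sh"

# run_nano SUBJECT STYLE
run_nano() {
  if [ "$DRY_RUN" = "1" ]; then
    # Show the command that would be executed.
    printf 'bash %s --subject "%s" --style "%s" --resolution "2k"\n' "$SCRIPT" "$1" "$2"
  else
    bash "$SCRIPT" --subject "$1" --style "$2" --resolution "2k"
  fi
}

for subject in "a glass hummingbird" "a paper crane"; do
  run_nano "$subject" "macro photography"
done
```

Set `DRY_RUN=0` from the repository root to execute the generations for real.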

4. Direct a Cinematic Scene

cd library/motion/cinema-director
# Create a 10-second 'epic' reveal without audio
bash scripts/generate-film.sh \
  --subject "a cybernetic dragon over Tokyo" \
  --intent "epic" \
  --model "kling-v3.0-pro" \
  --duration 10 \
  --no-audio \
  --view

📖 Schema Reference

This repository includes a streamlined schema_data.json that core scripts use at runtime to:

  • Validate Model IDs: Ensures the requested model exists.
  • Resolve Endpoints: Automatically maps model names to API endpoints.
  • Check Parameters: Validates supported aspect_ratio, resolution, and duration values.
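
The three checks above can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: it does not read schema_data.json (whose layout is not shown here) and instead hard-codes a tiny lookup table. The endpoint paths, the `example-image-model` ID, the allowed durations, and all function names are placeholders.

```shell
#!/usr/bin/env bash
# Sketch of schema-style validation with a hard-coded, hypothetical table.

# resolve_endpoint MODEL
# Prints a placeholder endpoint; returns 1 for unknown models.
resolve_endpoint() {
  case "$1" in
    kling-v3.0-pro)      echo "/placeholder/video-endpoint" ;;
    example-image-model) echo "/placeholder/image-endpoint" ;;
    *) return 1 ;;
  esac
}

# is_valid_duration MODEL SECONDS
# The allowed values are assumptions made for this example.
is_valid_duration() {
  case "$1:$2" in
    kling-v3.0-pro:5|kling-v3.0-pro:10) return 0 ;;
    *) return 1 ;;
  esac
}

# validate_request MODEL DURATION
# Prints the endpoint on success; prints an error to stderr and returns 1 otherwise.
validate_request() {
  local model
  local duration
  local endpoint
  model="$1"
  duration="$2"
  if ! endpoint=$(resolve_endpoint "$model"); then
    echo "error: unknown model '$model'" >&2
    return 1
  fi
  if ! is_valid_duration "$model" "$duration"; then
    echo "error: duration '$duration' not supported by '$model'" >&2
    return 1
  fi
  echo "$endpoint"
}

validate_request "kling-v3.0-pro" 10
```

Failing before any request is sent gives the calling agent an immediate, readable error instead of a rejected API call.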

🔧 Compatibility

Optimized for the next generation of AI development environments:

  • Claude Code: Direct terminal execution via tools.
  • Gemini CLI / Cursor / Windsurf: Seamless integration as local scripts.
  • MCP: Each skill is Model Context Protocol-ready for universal agent usage.

📄 License

MIT © 2026
