Skip to content

feat(skills): support loading skills from GitHub/Git URLs#2091

Open
dgallitelli wants to merge 3 commits intostrands-agents:mainfrom
dgallitelli:feat/skills-github-url-loading
Open

feat(skills): support loading skills from GitHub/Git URLs#2091
dgallitelli wants to merge 3 commits intostrands-agents:mainfrom
dgallitelli:feat/skills-github-url-loading

Conversation

@dgallitelli
Copy link
Copy Markdown
Contributor

@dgallitelli dgallitelli commented Apr 8, 2026

Summary

Closes #2090

  • Add support for remote Git repository URLs as skill sources in AgentSkills and Skill.from_url()
  • Support https://, git@, and ssh:// URLs with optional @ref suffix for branch/tag pinning (e.g., https://github.com/org/skill@v1.0.0)
  • Repositories are shallow-cloned (--depth 1) and cached locally at ~/.cache/strands/skills/ (configurable via cache_dir parameter)

Usage

from strands import Agent, AgentSkills

# Load skill directly from a GitHub repo
plugin = AgentSkills(skills=["https://github.com/dgallitelli/aws-data-agent-skill"])

# With version pinning
plugin = AgentSkills(skills=["https://github.com/org/skill-repo@v1.0.0"])

# Mixed with local skills
plugin = AgentSkills(skills=[
    "https://github.com/org/skill-repo",
    "./local-skills/my-skill",
])

# Or use the Skill classmethod directly
from strands import Skill
skills = Skill.from_url("https://github.com/org/skill-repo@main")

Changes

File Change
_url_loader.py (new) URL parsing, git clone with shallow depth, hash-based caching
skill.py Added Skill.from_url() classmethod
agent_skills.py Updated _resolve_skills() to detect URL strings; added cache_dir parameter
test_url_loader.py (new) 22 tests for URL detection, parsing, cache keys, and clone behavior
test_skill.py 6 tests for Skill.from_url()
test_agent_skills.py 4 tests for URL resolution in AgentSkills

Design decisions

  • Git clone via subprocess: No new dependencies — uses git which is universally available. Handles auth naturally (SSH keys, credential helpers, tokens).
  • Shallow clone: --depth 1 minimizes bandwidth and disk for skill repos.
  • Hash-based caching: sha256(url + ref)[:16] as cache directory name. Repeated loads are instant.
  • Graceful degradation: Failed URL clones log a warning and skip (consistent with how nonexistent filesystem paths are handled).
  • No new dependencies: Only stdlib modules (subprocess, hashlib, shutil).

Test plan

  • All 171 skill tests pass (32 new tests added)
  • Ruff lint and format pass
  • Pre-existing mypy error is unchanged (unrelated @hook decorator typing)
  • Manual test with a real public skill repo (e.g., aws-data-agent-skill)
  • Manual test with a nested real public skill repo (e.g., obra/superpowers/brainstorming)

Davide Gallitelli and others added 2 commits April 9, 2026 00:06
Add support for remote Git repository URLs as skill sources in
AgentSkills, enabling teams to share and reference skills directly
from GitHub without manual cloning.

Closes strands-agents#2090

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parse GitHub web URLs like /tree/<ref>/path to extract the clone URL,
branch, and subdirectory path. This enables loading skills from
subdirectories within mono-repos.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@dgallitelli dgallitelli force-pushed the feat/skills-github-url-loading branch from 01a4fc3 to 83966f6 Compare April 9, 2026 00:06
@github-actions github-actions bot added size/l and removed size/l labels Apr 9, 2026
@mkmeral
Copy link
Copy Markdown
Contributor

mkmeral commented Apr 9, 2026

/strands review

@mkmeral mkmeral self-assigned this Apr 9, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 94.11765% with 6 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/strands/vended_plugins/skills/_url_loader.py 95.00% 2 Missing and 2 partials ⚠️
src/strands/vended_plugins/skills/agent_skills.py 81.81% 1 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!


logger = logging.getLogger(__name__)

_DEFAULT_CACHE_DIR = Path.home() / ".cache" / "strands" / "skills"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: _DEFAULT_CACHE_DIR = Path.home() / ".cache" / "strands" / "skills" is evaluated at module import time. In containerized environments (Lambda, Docker) where HOME may not be set, Path.home() raises RuntimeError.

Suggestion: Evaluate this lazily, e.g., inside clone_skill_repo when cache_dir is None. You could use a sentinel or simply inline Path.home() / ".cache" / "strands" / "skills" inside the function:

cache_dir = cache_dir or Path.home() / ".cache" / "strands" / "skills"

This also respects XDG_CACHE_HOME if you want to be a good citizen on Linux:

import os
cache_dir = cache_dir or Path(os.environ.get("XDG_CACHE_HOME", Path.home() / ".cache")) / "strands" / "skills"

_DEFAULT_CACHE_DIR = Path.home() / ".cache" / "strands" / "skills"

# Patterns that indicate a string is a URL rather than a local path
_URL_PREFIXES = ("https://", "http://", "git@", "ssh://")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: http:// is included in URL prefixes, allowing skill repos to be cloned over plaintext HTTP. This exposes users to man-in-the-middle attacks where a malicious actor could inject arbitrary code into the cloned skill.

Suggestion: Remove "http://" from _URL_PREFIXES, or at minimum log a warning when an http:// URL is detected in clone_skill_repo. Git itself warns about this, but the SDK should guide users toward secure defaults (tenet: "the obvious path is the happy path").

key = cache_key(url, ref)
target = cache_dir / key

if not target.exists():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: Race condition — if two threads/processes call clone_skill_repo concurrently with the same URL, both pass the not target.exists() check and attempt to clone into the same directory. One will fail or produce a corrupt state.

Suggestion: Use an atomic pattern: clone to a temporary directory first, then os.rename() (atomic on the same filesystem) to the target path. If the target already exists by the time rename happens, just delete the temp clone. Something like:

import tempfile

with tempfile.TemporaryDirectory(dir=cache_dir) as tmp:
    tmp_target = Path(tmp) / "repo"
    # ... clone into tmp_target ...
    try:
        tmp_target.rename(target)
    except OSError:
        # Another process won the race — that's fine, use their clone
        pass

return hashlib.sha256(key_input.encode()).hexdigest()[:16]


def clone_skill_repo(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: There is no mechanism to refresh a cached clone. When no ref is specified, the default branch is cloned. If the remote repo is updated, the user gets stale content indefinitely with no obvious way to refresh (other than manually deleting ~/.cache/strands/skills/).

Suggestion: Consider adding a force_refresh: bool = False parameter (or documenting the cache invalidation story clearly). At minimum, document in the docstring that users should delete the cache directory or pin a ref for reproducibility. This is especially important since the cache key is a hash — users can't easily find and delete the right directory.

List of Skill instances loaded from the repository.

Raises:
RuntimeError: If the repository cannot be cloned or ``git`` is not
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: The Raises section documents RuntimeError but the method also raises ValueError (via is_url check on line 377). Additionally, calling Skill.from_url() with a non-URL raises ValueError, but using the same string through AgentSkills(skills=["./path"]) treats it as a local path. This inconsistency could confuse users.

Suggestion: Add ValueError to the Raises section in the docstring.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2026

Issue: This PR introduces new public API surface (Skill.from_url() classmethod, cache_dir parameter on AgentSkills.__init__). Per the API Bar Raising guidelines, this should have the needs-api-review label since it adds a new public classmethod that customers will use directly.

Key questions an API reviewer should evaluate:

  1. Should from_url return list[Skill] (inconsistent with from_file which returns a single Skill)? The list return type is due to the possibility of multi-skill repos, but this creates an asymmetric API. Consider whether a separate from_url_directory would be cleaner, or if from_url should raise when multiple skills are found unless the user explicitly opts in.
  2. Is cache_dir the right parameter name and placement? It's only relevant when URL sources are used but appears as a top-level AgentSkills.__init__ parameter alongside unrelated parameters like state_key and max_resource_files.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2026

Issue: The _url_loader module is not mentioned in the directory structure in AGENTS.md. Per the AGENTS.md instructions: "After making changes that affect the directory structure (adding new directories, moving files, or adding significant new files), you MUST update this directory structure section."

Suggestion: Add _url_loader.py under the vended_plugins/skills/ section in the AGENTS.md directory tree.

from ._url_loader import clone_skill_repo, is_url, parse_url_ref

if not is_url(url):
raise ValueError(f"url=<{url}> | not a valid remote URL")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: The from_url ValueError uses an f-string in the exception message: f"url=<{url}> | not a valid remote URL". This is consistent with the structured logging style used elsewhere, but the AGENTS.md specifically says "Don't use f-strings in logging calls." Since this is an exception message (not a logging call), this is acceptable — but the logging calls throughout the module correctly use %s format, which is good.

However, the RuntimeError raised in clone_skill_repo at line 189 also uses an f-string: f"url=<{url}>, ref=<{ref}> | failed to clone...". This is fine since it's an exception message, not a log statement.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2026

Review Summary

Assessment: Request Changes

This is a well-structured feature that fills a real user need. The design decisions (subprocess git, shallow clone, hash-based caching) are sound and well-documented in the PR description. The test coverage is thorough with 32 new tests.

Review Categories
  • Security: http:// URL support exposes users to MITM risks; Path.home() at import time can fail in containers. The subprocess usage is properly handled with shell=False.
  • Reliability: Race condition in cache directory creation when concurrent processes clone the same URL. No cache invalidation mechanism for unpinned refs means stale content with no obvious path to refresh.
  • API Design: This PR adds public API surface (Skill.from_url(), cache_dir on AgentSkills) and should have the needs-api-review label. The from_url returning list[Skill] is asymmetric with from_file returning Skill — worth discussing with an API reviewer.
  • Documentation: No documentation PR linked — this adds new public API that users need to know about. AGENTS.md directory tree and set_available_skills docstring need updating.

The core implementation is solid — addressing the security and reliability items above would make this ready to merge.

- Security: remove http:// support (MITM risk), only allow https/ssh/git@
- Reliability: atomic clone via tempdir + rename to prevent race conditions
- Reliability: evaluate cache dir lazily (not at import time) for containers
- Usability: respect XDG_CACHE_HOME for cache directory
- Usability: add force_refresh parameter to re-clone stale unpinned refs
- Docs: add ValueError to Skill.from_url() Raises section
- Docs: add _url_loader.py to AGENTS.md directory tree
- Tests: add coverage for XDG_CACHE_HOME, force_refresh, race condition,
  http:// rejection (184 tests total)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot removed the size/l label Apr 10, 2026
@dgallitelli
Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review! I've addressed all the feedback in the latest push:

Security

  • Removed http:// support — only https://, ssh://, and git@ are now accepted. Plaintext HTTP is rejected by is_url().

Reliability

  • Lazy cache dir evaluation_DEFAULT_CACHE_DIR is no longer a module-level constant. The cache path is now computed inside clone_skill_repo() via _default_cache_dir(), which also respects XDG_CACHE_HOME.
  • Atomic clone pattern — Clones now go to a tempfile.mkdtemp() first, then are atomically renamed to the target. If another process wins the race, we use their clone and clean up the temp dir.

Usability

  • force_refresh parameter — Added to both clone_skill_repo() and Skill.from_url(). When True, deletes the cached clone and re-fetches. Documented in the docstring that users should pin a ref for reproducibility or use force_refresh=True for unpinned refs.

Docs

  • Added ValueError to the Raises section in Skill.from_url() docstring.
  • Updated AGENTS.md directory tree to include _url_loader.py.

Tests

  • 184 tests total (up from 179), covering XDG_CACHE_HOME, force_refresh, race condition handling, and http:// rejection.

Re: API design (from_url returning list[Skill])

This is intentional — a URL may point to a repo that is a single skill (has SKILL.md at root) or contains multiple skills (subdirectories with SKILL.md). Returning list[Skill] handles both cases uniformly. This mirrors how AgentSkills(skills=["./path"]) already handles both single-skill dirs and parent dirs transparently. Happy to discuss alternatives if an API reviewer has a different preference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AgentSkills: support loading skills from GitHub URLs

2 participants