[ML] Add EuroBERT/Jina v5 ops to graph validation allowlist #3015

Open
edsavage wants to merge 4 commits into elastic:main from edsavage:feature/jina-v5-ops

edsavage (Contributor) commented Mar 29, 2026

Summary

Adds 4 ops required by the Jina Embeddings v5 model architecture (EuroBERT + LoRA adapters):

| Op | Used for |
|---|---|
| aten::sin | Rotary position embeddings (RoPE): sine component |
| aten::cos | Rotary position embeddings (RoPE): cosine component |
| aten::rsqrt | RMSNorm (EuroBERT uses RMSNorm instead of LayerNorm) |
| aten::silu | SiLU/Swish activation (EuroBERT uses SiLU instead of GELU) |
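As a rough illustration of where each op comes from, the plain-Python sketch below implements the three building blocks on lists of floats. It is not EuroBERT's code: the `eps` default, the RoPE base of 10000 and the adjacent-pair rotation are assumptions (many Hugging Face implementations rotate the two halves of the vector instead, using the same angles). In a PyTorch version, `torch.sin`/`torch.cos`, `torch.rsqrt` and `F.silu` are the calls that would surface as the four ops in the table.

```python
import math
from typing import List


def silu(x: float) -> float:
    """SiLU/Swish: x * sigmoid(x). PyTorch's F.silu traces to aten::silu."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    # Rewritten for negative x so math.exp never overflows.
    e = math.exp(x)
    return x * e / (1.0 + e)


def rms_norm(xs: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """RMSNorm: scale by 1/sqrt(mean(x^2) + eps); no mean subtraction, no bias.

    The reciprocal square root is what appears as aten::rsqrt when the
    PyTorch version is written with torch.rsqrt (eps default is an assumption).
    """
    mean_sq = sum(x * x for x in xs) / len(xs)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)  # rsqrt(mean_sq + eps)
    return [x * inv_rms * w for x, w in zip(xs, weight)]


def rope(xs: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotary position embedding over adjacent pairs (assumed convention).

    Pair k (elements 2k and 2k+1) is rotated by the angle
    position * base**(-2k/d); its cos/sin are the aten::cos/aten::sin ops.
    """
    d = len(xs)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even dimension")
    out: List[float] = []
    for k in range(d // 2):
        theta = position * base ** (-2 * k / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = xs[2 * k], xs[2 * k + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out
```

Because RoPE is a pure rotation, it leaves vector norms unchanged and is the identity at position 0; RMSNorm, unlike LayerNorm, never subtracts the mean, which is why no mean/variance pair appears alongside `aten::rsqrt`.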

Required for elastic/eland#818, which adds support for importing Jina v5 models into Elasticsearch.

The ops were verified locally by tracing jinaai/jina-embeddings-v5-text-nano with the LoRA retrieval adapter merged.

Cross-repo dependencies

Test plan

  • CI passes (allowlist drift test)
  • Add Jina v5 to validation_models.json and reference_model_ops.json
  • Verify traced model passes graph validation locally
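One way to picture the allowlist drift check is as a set comparison between golden per-model op lists and the allowlist. The sketch below assumes `reference_model_ops.json` maps a model id to a list of op names; that layout and the helper names (`load_reference_ops`, `check_allowlist_drift`, `unused_allowlist_entries`) are hypothetical, and the real CI test may be structured differently.

```python
import json
from typing import Dict, Iterable, List, Set


def load_reference_ops(path: str) -> Dict[str, List[str]]:
    """Read a golden ops file, assumed (hypothetically) to map model id -> op names."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def check_allowlist_drift(
    reference_ops: Dict[str, List[str]], allowlist: Iterable[str]
) -> Dict[str, Set[str]]:
    """Return, per model, any recorded ops that the allowlist does not cover."""
    allowed = set(allowlist)
    uncovered: Dict[str, Set[str]] = {}
    for model_id, ops in reference_ops.items():
        missing = set(ops) - allowed
        if missing:
            uncovered[model_id] = missing
    return uncovered


def unused_allowlist_entries(
    reference_ops: Dict[str, List[str]], allowlist: Iterable[str]
) -> Set[str]:
    """Allowlisted ops that no reference model uses (worth reviewing, not fatal)."""
    used: Set[str] = set().union(*reference_ops.values())
    return set(allowlist) - used
```

In a CI test, a non-empty result from `check_allowlist_drift` would fail the build and name the model and ops involved; adding a RoPE/RMSNorm/SiLU model to the golden file without the matching allowlist entries is exactly the case it would flag.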

Made with Cursor

Jina Embeddings v5 is based on EuroBERT, which uses a different
architecture from the BERT family:
- RoPE (rotary position embeddings) → aten::sin, aten::cos
- RMSNorm (instead of LayerNorm) → aten::rsqrt
- SiLU activation (instead of GELU) → aten::silu

Required for Eland PR elastic/eland#818, which adds support for
importing Jina v5 models into Elasticsearch.

Made-with: Cursor

prodsecmachine commented Mar 29, 2026

Snyk checks have passed. No issues have been found so far.

| Scan Engine | Critical | High | Medium | Low | Total (0) |
|---|---|---|---|---|---|
| Open Source Security | 0 | 0 | 0 | 0 | 0 issues |
| Licenses | 0 | 0 | 0 | 0 | 0 issues |


aten::sin and aten::cos are now in the allowlist (needed by
EuroBERT/Jina v5 for rotary position embeddings), so tests that
used them as examples of "unrecognised" ops now fail.

- Replace torch.sin with torch.logit in synthetic test modules
- Update malicious model tests to check for ops that remain
  unrecognised (aten::tan, aten::exp) rather than sin/cos
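The failure described in this commit can be surfaced more directly by checking the tests' own precondition: the ops used as "unrecognised" examples must not be in the allowlist. `ALLOWED_OPS`, `NEGATIVE_EXAMPLE_OPS` and `assert_still_forbidden` are hypothetical names, and the allowlist here is a tiny stand-in rather than the real one; only the op names in `NEGATIVE_EXAMPLE_OPS` come from the commit.

```python
from typing import Iterable, Set

# Hypothetical stand-in for the real allowlist, which is much larger.
ALLOWED_OPS: Set[str] = {
    "aten::sin", "aten::cos", "aten::rsqrt", "aten::silu", "aten::add",
}

# Ops the negative tests rely on being unrecognised (names from the commit).
NEGATIVE_EXAMPLE_OPS = ("aten::logit", "aten::tan", "aten::exp")


def assert_still_forbidden(ops: Iterable[str], allowlist: Set[str]) -> None:
    """Fail fast if an op used as a negative example has become allowlisted."""
    now_allowed = sorted(op for op in ops if op in allowlist)
    if now_allowed:
        raise AssertionError(
            "negative-example ops are now allowlisted; choose different ops "
            f"for the rejection tests: {now_allowed}"
        )
```

Run as a guard at the start of the negative tests, a future allowlist addition would fail with a message naming the op, rather than showing up as a confusing "validation unexpectedly passed" failure.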

Made-with: Cursor
…logit)

Regenerate malicious_hidden_in_submodule.pt with aten::logit+clamp so
graph validation still fails when aten::sin is allowed for EuroBERT/Jina.
Update dev-tools/generate_malicious_models.py and test comments.
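The `malicious_hidden_in_submodule.pt` fixture only works if validation looks inside nested submodules. The toy sketch below shows that traversal over a hypothetical `ModuleNode` tree rather than real TorchScript objects (actual tooling might instead walk a scripted module's `inlined_graph`), and why the fixture had to change: once `aten::sin` is allowlisted, a hidden `aten::sin` no longer trips validation, while a hidden `aten::logit` still does. The allowlist contents used with it are a stand-in.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ModuleNode:
    """Hypothetical stand-in for a module: its own ops plus named children."""
    ops: List[str] = field(default_factory=list)
    children: Dict[str, "ModuleNode"] = field(default_factory=dict)


def collect_ops(module: ModuleNode) -> Set[str]:
    """Gather ops from a module and, recursively, from all of its submodules."""
    found = set(module.ops)
    for child in module.children.values():
        found |= collect_ops(child)
    return found


def validate(module: ModuleNode, allowlist: Set[str]) -> Set[str]:
    """Return unrecognised ops anywhere in the module tree (empty set = pass)."""
    return collect_ops(module) - allowlist
```

Checking only the top-level module's `ops` would miss both hidden ops; the recursion is what makes the fixture meaningful.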

Made-with: Cursor
edsavage marked this pull request as ready for review April 1, 2026 21:11
…te_code

Add jinaai/jina-embeddings-v5-text-nano to reference_models.json,
validation_models.json, and the golden reference_model_ops.json with
its 36 traced ops (all verified to be covered by the allowlist).

Pass trust_remote_code=True in torchscript_utils.py so models with
custom code (like Jina v5 / EuroBERT) can be loaded by the extraction
and validation tooling.
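A minimal sketch of the loading change, assuming Hugging Face `transformers` is the loader: `load_for_tracing` and its injectable `loader` parameter are hypothetical (the injection only makes the helper testable offline), and the one detail taken from the commit is passing `trust_remote_code=True`.

```python
from typing import Any, Callable, Optional


def load_for_tracing(model_id: str, loader: Optional[Callable[..., Any]] = None) -> Any:
    """Load a model whose architecture ships as custom code in its repository.

    trust_remote_code=True lets transformers import and run the modelling code
    stored alongside the weights (needed for models that ship custom code),
    so it should only be enabled for repositories you trust.
    """
    if loader is None:
        # Imported lazily so the helper can be exercised without transformers.
        from transformers import AutoModel

        loader = AutoModel.from_pretrained
    return loader(model_id, trust_remote_code=True)
```

Without the flag, `from_pretrained` refuses to load a repository that requires custom modelling code, which is why the extraction and validation tooling could not handle Jina v5 before this change.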

Made-with: Cursor

wwang500 left a comment


LGTM, thanks Ed. 👍
