refactor(tests): distinguish -fail from -unsupported, auto-detect show specs by Stevengre · Pull Request #1015 · runtimeverification/mir-semantics

Stevengre · 2026-04-03T13:31:34Z

Summary

Rename 20 prove-rs test files from -fail to -unsupported to distinguish tests that fail because the semantics doesn't support a feature yet from tests where the program itself is expected to fail
Remove manually maintained PROVE_SHOW_SPECS list — all -fail and -unsupported tests now automatically verify show output
Remove assert!(false) from symbolic-args test (main passes with concrete args, only eats_all_args is unsupported)
Switch CLI test fixtures (test_cli_show_statistics_and_leaves, test_cli_show_minimize_proof) to use symbolic-structs-fail.eats_struct_args for stable proof trees with symbolic branching
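
The suffix-based auto-detection described above could look roughly like the sketch below. The helper name `needs_show_check` and the `SHOW_SUFFIXES` constant are hypothetical and not taken from test_integration.py; only the `-fail` and `-unsupported` suffixes and the `symbolic-structs-fail` file name come from this PR, and the other file names are made up.

```python
from pathlib import Path

# Suffixes named in the PR summary; the constant itself is hypothetical.
SHOW_SUFFIXES = ('-fail', '-unsupported')


def needs_show_check(rs_file: Path) -> bool:
    """Return True when the file stem ends in -fail or -unsupported."""
    # str.endswith accepts a tuple, so one call covers both suffixes.
    return rs_file.stem.endswith(SHOW_SUFFIXES)


if __name__ == '__main__':
    # Only the first name appears in the PR; the others are illustrative.
    for name in ('symbolic-structs-fail.rs', 'example-unsupported.rs', 'example.rs'):
        print(name, needs_show_check(Path(name)))
```

Deriving the decision from the file name means there is no separate list to keep in sync when a test is added or renamed.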

Test plan

cd kmir && uv run pytest src/tests/integration/test_integration.py -k "unsupported or fail" -q
cd kmir && uv run pytest src/tests/integration/test_cli.py -k "statistics_and_leaves or minimize_proof" -q

…(missing semantics) Rename 19 prove-rs test files from `-fail` to `-unsupported` to clarify that they fail because the semantics doesn't support the feature yet, not because the program itself is expected to fail.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove the manually maintained PROVE_SHOW_SPECS list. All -fail and -unsupported tests now automatically verify show output; passing tests only assert proof success. Remove 5 show expected files for passing tests that no longer need show verification.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove artificial `assert!(false)` from main — it passes with concrete args. Keep only `eats_all_args` as proof entry (stuck on raw pointer deref and slice ops). Update test_cli.py references accordingly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Switch test_cli_show_statistics_and_leaves and test_cli_show_minimize_proof to use symbolic-structs-fail with eats_struct_args entry. This provides a stable proof tree with symbolic branching (splits), unlike the previous symbolic-args fixture whose main entry now passes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63bead9e61

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9d506364bd

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2c3bb1d05b

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 98c68be3c6

dkcumming

I like this! I have a few comments that would be nice to consider and/or address, but I don't want to get in the way of this merging, so I'm approving!

dkcumming · 2026-04-21T20:54:00Z

            test-args: '-k test_verify_rust_std'
            parallel: 6
-            timeout: 60
+            timeout: 90


I think this should be a redundant change since #1069 improved performance, although I haven't seen a repeat on runner 10:

https://github.com/runtimeverification/mir-semantics/actions/runs/24694894839/job/72225410402?pr=1069

https://github.com/runtimeverification/mir-semantics/actions/runs/24697098086/job/72232173183?pr=1071

Not necessarily requesting a change, just noting that it should be consistently under 60 now.

dkcumming · 2026-04-21T20:59:21Z

+def _normalize_show_output(text: str) -> str:
+    text = _normalize_symbol_hashes(text)
+    text = re.sub(r'(?m)^(\s*(?:[│┃┊]\s*)?span: )\d+$', r'\1<span>', text)
+    text = re.sub(r'(?m)^\s*>> message: .*\n?', '', text)
+    return text.rstrip('\r\n')


Is there a reason we want to remove the decoded message? If a different failure errors with a message and a span, would it not be helpful to have that information? Is having the information creating a problem?

I am always deeply suspicious of Claude using regexes to cover things up. At the very least this should be adequately documented to explain what it is doing.
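
One way to address the documentation request is to annotate each substitution. The sketch below mirrors the two regexes from the diff; the call to `_normalize_symbol_hashes` is left out because its body is not shown in this thread, the function name `normalize_show_output_sketch` is hypothetical, and the sample text is made up for illustration.

```python
import re


def normalize_show_output_sketch(text: str) -> str:
    """Hypothetical, commented variant of the normalization in the diff."""
    # Replace a numeric span id at the end of a line with a fixed
    # placeholder. The optional group allows one tree-drawing character
    # (│, ┃ or ┊) before "span: ".
    text = re.sub(r'(?m)^(\s*(?:[│┃┊]\s*)?span: )\d+$', r'\1<span>', text)
    # Drop whole lines carrying a decoded message, including the trailing
    # newline if there is one.
    text = re.sub(r'(?m)^\s*>> message: .*\n?', '', text)
    # Trim trailing line breaks so files compare equal regardless of how
    # they end.
    return text.rstrip('\r\n')


# Made-up sample resembling show output.
SAMPLE = '│   span: 123\n  >> message: assertion failed\n│   (5 steps)\n'

if __name__ == '__main__':
    print(normalize_show_output_sketch(SAMPLE))
```

Note that the message pattern only allows whitespace before `>> message:`, so a message line prefixed with a tree-drawing character would be left in place.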

dkcumming · 2026-04-21T21:07:09Z

    smir_json_result = cwd / rs_file.with_suffix('.smir.json').name
    run_process_2(command, cwd=cwd)
+    resolved_smir_json_result = cwd / rs_file.resolve().with_suffix('.smir.json').name
+    if not smir_json_result.is_file() and resolved_smir_json_result.is_file():
+        resolved_smir_json_result.rename(smir_json_result)


I couldn't understand the point of this until I queried Claude. It might be nice to add a comment mentioning that this is addressing symlinks.
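
To illustrate the symlink case, here is a minimal sketch assuming the tool names its output after the resolved target of a symlinked `.rs` file. The helper `adopt_resolved_output`, the `demo` function, and the file names are hypothetical; only the rename logic follows the diff.

```python
import tempfile
from pathlib import Path


def adopt_resolved_output(cwd: Path, rs_file: Path) -> Path:
    """Rename output named after the symlink target to the expected name."""
    expected = cwd / rs_file.with_suffix('.smir.json').name
    # resolve() follows symlinks, so this is the name derived from the target.
    resolved = cwd / rs_file.resolve().with_suffix('.smir.json').name
    if not expected.is_file() and resolved.is_file():
        resolved.rename(expected)
    return expected


def demo() -> bool:
    with tempfile.TemporaryDirectory() as tmp:
        src, out = Path(tmp) / 'src', Path(tmp) / 'out'
        src.mkdir()
        out.mkdir()
        real = src / 'real.rs'
        real.write_text('fn main() {}\n')
        link = src / 'link.rs'
        link.symlink_to(real)
        # Simulate a tool that wrote its output under the target's name.
        (out / 'real.smir.json').write_text('{}')
        result = adopt_resolved_output(out, link)
        return result.name == 'link.smir.json' and result.is_file()


if __name__ == '__main__':
    print(demo())
```

When `rs_file` is not a symlink, both names coincide and the helper does nothing.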

Stevengre force-pushed the codex/20260403-assert-inhabited branch from 73287c4 to 4d2b75a on April 13, 2026 07:55

Stevengre and others added 3 commits April 13, 2026 08:02

Stevengre changed the title ~~fix(intrinsics): prioritize assert_inhabited failure~~ refactor(tests): distinguish -fail from -unsupported, auto-detect show specs Apr 13, 2026

Stevengre requested review from dkcumming and mariaKt April 13, 2026 08:40

Stevengre self-assigned this Apr 13, 2026

Stevengre marked this pull request as ready for review April 13, 2026 08:40

style: fix black formatting in test_integration.py

91fa4f3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector Bot reviewed Apr 13, 2026

Comment thread kmir/src/tests/integration/test_integration.py Outdated

Stevengre force-pushed the codex/20260403-assert-inhabited branch from 4cf2769 to 9d50636 on April 13, 2026 11:58

chatgpt-codex-connector Bot reviewed Apr 13, 2026

Comment thread kmir/src/tests/integration/test_integration.py

Stevengre force-pushed the codex/20260403-assert-inhabited branch 2 times, most recently from 2f06156 to 2c3bb1d on April 13, 2026 13:07

chatgpt-codex-connector Bot reviewed Apr 13, 2026

Comment thread kmir/src/tests/integration/test_integration.py

fix(tests): keep snapshots for -fail and -unsupported

98c68be

Stevengre force-pushed the codex/20260403-assert-inhabited branch from 2c3bb1d to 98c68be on April 13, 2026 14:28

chatgpt-codex-connector Bot reviewed Apr 13, 2026

Comment thread kmir/src/tests/integration/test_integration.py

ci: extend verify-rust-std timeout

580ebe3

Stevengre requested review from F-WRunTime, ehildenb and palinatolmach and removed request for F-WRunTime April 16, 2026 02:02

Merge branch 'master' into codex/20260403-assert-inhabited

5d69415

dkcumming approved these changes Apr 21, 2026

Stevengre wants to merge 8 commits into master from codex/20260403-assert-inhabited
