refactor(tests): distinguish -fail from -unsupported, auto-detect show specs#1015
refactor(tests): distinguish -fail from -unsupported, auto-detect show specs#1015
Conversation
…(missing semantics) Rename 19 prove-rs test files from `-fail` to `-unsupported` to clarify that they fail because the semantics doesn't support the feature yet, not because the program itself is expected to fail. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
73287c4 to
4d2b75a
Compare
Remove the manually maintained PROVE_SHOW_SPECS list. All -fail and -unsupported tests now automatically verify show output; passing tests only assert proof success. Remove 5 show expected files for passing tests that no longer need show verification. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove artificial `assert!(false)` from main — it passes with concrete args. Keep only `eats_all_args` as proof entry (stuck on raw pointer deref and slice ops). Update test_cli.py references accordingly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Switch test_cli_show_statistics_and_leaves and test_cli_show_minimize_proof to use symbolic-structs-fail with eats_struct_args entry. This provides a stable proof tree with symbolic branching (splits), unlike the previous symbolic-args fixture whose main entry now passes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 63bead9e61
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
4cf2769 to
9d50636
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9d506364bd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
2f06156 to
2c3bb1d
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2c3bb1d05b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
2c3bb1d to
98c68be
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 98c68be3c6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
dkcumming
left a comment
There was a problem hiding this comment.
I like this! I have the comments that would be nice to consider and/or address, but I don't want to get in the way of this merging so approving!
| test-args: '-k test_verify_rust_std' | ||
| parallel: 6 | ||
| timeout: 60 | ||
| timeout: 90 |
There was a problem hiding this comment.
I think this should be a redundant change since #1069 improved performance, although I haven't seen a repeat on runner 10:
- https://github.com/runtimeverification/mir-semantics/actions/runs/24694894839/job/72225410402?pr=1069
- https://github.com/runtimeverification/mir-semantics/actions/runs/24697098086/job/72232173183?pr=1071
Not necessarily requesting a change just noting that it should be less than 60 consistently now.
| def _normalize_show_output(text: str) -> str: | ||
| text = _normalize_symbol_hashes(text) | ||
| text = re.sub(r'(?m)^(\s*(?:[│┃┊]\s*)?span: )\d+$', r'\1<span>', text) | ||
| text = re.sub(r'(?m)^\s*>> message: .*\n?', '', text) | ||
| return text.rstrip('\r\n') |
There was a problem hiding this comment.
Is there a reason we want to remove the decoded message? If we have a different match with a fail that errors with a message and a span, would it not be helpful to have that information? Is having the information creating a problem?
I am always deeply suspicious of claude using regex covering things up. At the very least this should be adequately documented to explain what it is doing.
| smir_json_result = cwd / rs_file.with_suffix('.smir.json').name | ||
| run_process_2(command, cwd=cwd) | ||
| resolved_smir_json_result = cwd / rs_file.resolve().with_suffix('.smir.json').name | ||
| if not smir_json_result.is_file() and resolved_smir_json_result.is_file(): | ||
| resolved_smir_json_result.rename(smir_json_result) |
There was a problem hiding this comment.
I couldn't understand the point of this until I queried claude. It might be nice to mention that this is addressing symlinks.
Summary
-failto-unsupportedto distinguish tests that fail because the semantics doesn't support a feature yet from tests where the program itself is expected to failPROVE_SHOW_SPECSlist — all-failand-unsupportedtests now automatically verify show outputassert!(false)fromsymbolic-argstest (main passes with concrete args, onlyeats_all_argsis unsupported)test_cli_show_statistics_and_leaves,test_cli_show_minimize_proof) to usesymbolic-structs-fail.eats_struct_argsfor stable proof trees with symbolic branchingTest plan
cd kmir && uv run pytest src/tests/integration/test_integration.py -k "unsupported or fail" -qcd kmir && uv run pytest src/tests/integration/test_cli.py -k "statistics_and_leaves or minimize_proof" -q