
B&B CPU determinism#798

Merged
rapids-bot[bot] merged 445 commits intoNVIDIA:release/26.02from
aliceb-nv:determinism
Feb 10, 2026

Conversation

@aliceb-nv
Contributor

@aliceb-nv aliceb-nv commented Jan 26, 2026

This PR introduces a deterministic execution mode for the parallel branch-and-bound MIP solver.
When determinism_mode = CUOPT_MODE_DETERMINISTIC, the solver guarantees bitwise-identical results across runs regardless of thread scheduling variations, as long as the runs execute on the same platform with the same environment (same glibc, etc.).

The approach chosen is referred to as Bulk Synchronous Parallel (BSP) in the literature: an execution model where computation proceeds in discrete horizons of virtual time (work units) separated by synchronization barriers.
To cope with the inherent nondeterminism of wall-clock time, caused by factors such as cache state, OS scheduling, and CPU throttling, progress is measured not in time but in "work units": a deterministic approximation of an algorithm's execution time derived from known factors such as memory operations, problem features, and sparsity.
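As a rough illustration of the idea (the names below are invented for this sketch, not cuOpt's actual types), a work-unit counter charges progress by deterministic quantities such as bytes moved, never by elapsed time:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical work-unit counter: progress is charged per deterministic
// quantity (bytes moved, nonzeros processed), never per wall-clock tick,
// so the count is identical across runs.
struct work_counter_t {
  std::uint64_t byte_loads  = 0;
  std::uint64_t byte_stores = 0;

  // Convert byte traffic into abstract work units with a fixed calibration
  // constant; the same instruction stream always yields the same value.
  double work_units() const
  {
    constexpr double bytes_per_unit = 1e6;  // arbitrary calibration constant
    return static_cast<double>(byte_loads + byte_stores) / bytes_per_unit;
  }
};

// Example: charging a sparse matrix-vector product by nonzero count
// (a sketch; real accounting would distinguish value/index widths).
void charge_spmv(work_counter_t& wc, std::size_t nnz, std::size_t elem_bytes)
{
  wc.byte_loads += nnz * elem_bytes * 2;  // one value + one index per nonzero
  wc.byte_stores += nnz * elem_bytes;     // one accumulator write per nonzero
}
```

Because the charge depends only on problem data, two runs that execute the same logical operations report exactly the same virtual time, regardless of how the OS schedules them.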

Workers explore the tree independently and locally within a horizon, collecting events (branching decisions, solutions, pseudo-cost updates) along the way without exchanging them yet.
At each horizon boundary, events are sorted by work-unit timestamp with tie-breaking, the history is replayed in deterministic order to update global state, and, if needed, nodes are redistributed across workers to balance load (this replaces the ramp-up mechanism).
Workers operate on local snapshots of shared state (pseudo-costs, upper bound, LP iteration count) taken at horizon start to avoid read races. This trades some accuracy in decisions (pruning may rely on a higher lower bound than in nondeterministic mode) for determinism.
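The horizon-boundary merge could look roughly like the sketch below (illustrative types only, not the actual structures in the PR): each worker records events with a deterministic timestamp, and at the barrier all events are sorted by (timestamp, worker id, sequence) so the replay order is independent of which thread finished first.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative event record: a deterministic work-unit timestamp plus stable
// tie-breakers, so the sort order never depends on thread scheduling.
struct bb_event {
  double work_timestamp;   // virtual time at which the event occurred
  int worker_id;           // tie-break 1: stable worker index
  std::uint64_t sequence;  // tie-break 2: per-worker emission order
  double payload;          // e.g. a new incumbent objective
};

// At the horizon barrier, all workers' events are merged and sorted by
// (timestamp, worker_id, sequence); replaying them in this order updates
// global state identically on every run.
std::vector<bb_event> merge_horizon(std::vector<std::vector<bb_event>> per_worker)
{
  std::vector<bb_event> merged;
  for (auto& events : per_worker) {
    merged.insert(merged.end(), events.begin(), events.end());
  }
  std::sort(merged.begin(), merged.end(), [](const bb_event& a, const bb_event& b) {
    if (a.work_timestamp != b.work_timestamp) { return a.work_timestamp < b.work_timestamp; }
    if (a.worker_id != b.worker_id) { return a.worker_id < b.worker_id; }
    return a.sequence < b.sequence;
  });
  return merged;
}
```

The tie-break fields are what make the ordering a strict total order; without them, two events with equal virtual timestamps could replay in either order depending on the run.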

Support for start CPU heuristics is present, based on a "producer" model: CPUFJ starts immediately and begins producing solutions stamped with work-unit timestamps, which the B&B thread consumes once the appropriate time is reached. A form of one-way synchronization is implemented to prevent the CPUFJ thread from falling behind and producing solutions in the past from the perspective of the B&B thread: CPUFJ is allowed to run ahead, and is biased to run faster than B&B to avoid unnecessary syncs.
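A minimal sketch of the consuming side of this producer model, with hypothetical names: the B&B thread only accepts heuristic solutions whose work-unit stamp its own virtual clock has already passed, leaving later solutions queued.

```cpp
#include <deque>
#include <vector>

// Hypothetical stamped solution emitted by the heuristic producer thread.
struct stamped_solution {
  double work_timestamp;  // producer's virtual clock at emission
  double objective;
};

// The B&B thread consumes only solutions stamped at or before its own virtual
// clock; later solutions stay queued. Because both clocks advance in work
// units rather than wall time, the merge point is identical on every run.
std::vector<stamped_solution> consume_ready(std::deque<stamped_solution>& queue,
                                            double bb_clock)
{
  std::vector<stamped_solution> ready;
  while (!queue.empty() && queue.front().work_timestamp <= bb_clock) {
    ready.push_back(queue.front());
    queue.pop_front();
  }
  return ready;
}
```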

B&B BFS workers, diving workers, and a start CPUFJ thread are supported in deterministic mode. GPU heuristics are disabled in this mode and will be incorporated into the deterministic framework in the future.

Benchmark results

B&B alone, 10min

  • Nondeterministic B&B alone (default): 174 feasible, 38.3% primal gap
  • Deterministic B&B alone: 167 feasible, 40.3% primal gap
  • Deterministic B&B+CPUFJ: 184 feasible, 35.9% primal gap

Regression testing against the main branch:
With changes: 228 feasible, 11.9% primal gap
Baseline: 224 feasible, 13.2% primal gap (difference likely due to nondeterminism/noise)

Summary by CodeRabbit

  • New Features

    • Work-limit control, a deterministic solve mode, and configurable random seed; new CLI flags to set work-limit and enable determinism.
  • Infrastructure

    • Enhanced profiling, memory and timing instrumentation; work-unit synchronization and scheduler added to coordinate deterministic runs; deterministic B&B/workers and replay scaffolding integrated.
  • Tests

    • New deterministic reproducibility tests ensuring consistent termination, objectives, and solutions across repeated runs.

nguidotti and others added 30 commits January 9, 2026 11:56
…LP iterations and reliable threshold for reliability branching.
…would not be accounted for when iterations aren't multiples for LOG_FEATURE_ITERATIONS
@aliceb-nv
Contributor Author

/ok to test 80037ef

@copy-pr-bot

copy-pr-bot bot commented Feb 9, 2026

/ok to test 80037ef

@aliceb-nv, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

@aliceb-nv
Contributor Author

/ok to test f8021de

@aliceb-nv
Contributor Author

Thanks a lot for your reviewing work Chris!

Regarding 1) - there is no global heap. All live nodes exist within a single worker at all times, and may be shuffled around workers if imbalances are detected. Nodes are fathomed when they're popped from the local worker heap and compared against the upper bound, and during syncs as well to avoid distributing useless nodes when doing work stealing. But I think this can be removed, I will run some tests and change if appropriate.
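For readers following along, the fathoming rule described here amounts to a bound comparison; a sketch with hypothetical names:

```cpp
// Illustrative fathoming rule (names invented for this sketch): a node is
// discarded when its lower bound cannot improve on the incumbent upper bound.
struct node_t {
  double lower_bound;
};

bool fathomed(const node_t& node, double upper_bound, double tol = 1e-6)
{
  return node.lower_bound >= upper_bound - tol;
}
```

As described above, this check runs both when a node is popped from a worker's local heap and during syncs, so already-useless nodes are not redistributed by work stealing.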

I completely agree that the PR could use some more refactoring work to avoid duplication/scattering of code and logic. Let's discuss that during the sync👍 I tried to balance clean code structure with "getting it done" in time, but it's a difficult balance to get right

For instrumented vectors: yes, I think this should be the correct approach. The ins_vectors were a stopgap, made with the intention of phasing them out in favor of proper work estimates. They mostly serve to establish a baseline for now, but ideally they will all be removed in time.
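The idea behind the instrumented vectors, sketched with made-up names (the real ins_vector is cuOpt-internal and looks different): every access is counted so work-unit estimates can be derived from memory traffic instead of timing.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch only: a vector wrapper that counts byte traffic on each access,
// giving a deterministic, scheduling-independent baseline for work estimates.
template <typename T>
class ins_vector_sketch {
 public:
  explicit ins_vector_sketch(std::size_t n) : data_(n) {}

  const T& load(std::size_t i)
  {
    byte_loads += sizeof(T);  // deterministic accounting, no clock involved
    return data_[i];
  }

  void store(std::size_t i, const T& v)
  {
    byte_stores += sizeof(T);
    data_[i] = v;
  }

  std::uint64_t byte_loads  = 0;
  std::uint64_t byte_stores = 0;

 private:
  std::vector<T> data_;
};
```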

I've run benchmarks on both the MIP solves and the miplib2017 root relaxations to make sure I didn't cause any regression; I am sending you the results via Slack.

Thanks a ton again!


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@cpp/src/dual_simplex/branch_and_bound.hpp`:
- Line 333: Rename the misspelled private method
determistic_collect_diving_solutions_and_update_psuedocosts to
deterministic_collect_diving_solutions_and_update_pseudocosts everywhere: update
the declaration (the method name in branch_and_bound.hpp), its
definition/implementation, and all call sites (including any unit tests or
comments) to use the corrected spelling; ensure any forward declarations or
references (e.g., in the class that owns this method) are updated so builds and
linkage remain consistent.

In `@cpp/src/dual_simplex/deterministic_workers.hpp`:
- Around line 370-383: enqueue_dive_node currently always pushes nodes even when
presolver reports infeasible; change the code to check the result of
this->node_presolver.bounds_strengthening(...) and early-return if it indicates
failure so we don't detach/copy or push the node into dive_queue. Specifically,
in enqueue_dive_node use the boolean return of bounds_strengthening (or adjust
to capture its failure signal) right after calling it and before calling
node->detach_copy() or dive_queue.push_back(...); if it returns false, simply
return without enqueuing.

In `@cpp/src/mip/presolve/third_party_presolve.cpp`:
- Around line 716-719: Add a null-guard for papilo_post_solve_storage_ before
constructing/using papilo::Postsolve and calling post_solver.undo: check if
papilo_post_solve_storage_ is null (similar to the existing status_to_skip
guard) and skip/return early when it is null to avoid dereferencing it in
papilo_post_solve_storage_->getNum() and in the undo call; ensure the guard is
placed before the lines that construct post_solver and call post_solver.undo so
undo() is never invoked with a null papilo_post_solve_storage_.

In `@cpp/tests/linear_programming/c_api_tests/c_api_test.c`:
- Around line 1461-1577: In test_deterministic_bb ensure the test fails early if
the first run is not a successful solve: after capturing first_status in
test_deterministic_bb (where first_status and first_objective are set), add a
check that first_status equals the solver's success termination constant (e.g.,
CUOPT_TERMINATION_STATUS_OPTIMAL) and if not set status =
CUOPT_VALIDATION_ERROR, print a clear message using
termination_status_to_string(first_status), destroy solution and goto DONE; this
guards against consistent but unsuccessful termination across runs and uses
symbols first_status, first_objective, termination_status_to_string,
CUOPT_VALIDATION_ERROR, and test_deterministic_bb to locate where to add the
check.
🧹 Nitpick comments (8)
cpp/src/dual_simplex/triangle_solve.hpp (2)

101-117: U.x.byte_loads undercounts by one diagonal read per active column.

In each active column, U_x[col_end] (the diagonal, line 106) is read but not included in nnz_processed (line 111 counts only col_end - col_start, the off-diagonal range). So U.x.byte_loads at line 117 misses one sizeof(f_t) per active column.

This is the mirror of the slight overcount in lower_triangular_solve for L.i. Neither affects determinism (the accounting is consistent across runs), but if you want accurate work-unit estimates, consider:

Proposed fix
-      nnz_processed += col_end - col_start;
+      nnz_processed += col_end - col_start + 1;  // +1 for diagonal U_x[col_end]

And split the U.i accounting if you want it exact (off-diagonal only):

-  U.i.byte_loads += nnz_processed * sizeof(i_t);
-  U.x.byte_loads += nnz_processed * sizeof(f_t);
+  // nnz_processed now includes diagonal; U_i is only indexed for off-diag
+  // For simplicity, the small overcount on U.i is acceptable.
+  U.i.byte_loads += nnz_processed * sizeof(i_t);
+  U.x.byte_loads += nnz_processed * sizeof(f_t);

18-24: Consider replacing macros with constexpr inline functions.

FLIP, UNFLIP, MARKED, and MARK are classic CSparse-style macros. Replacing them with constexpr inline functions would give type safety and avoid the usual macro pitfalls (double evaluation, namespace pollution). Low priority since these are well-established idioms.
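For illustration, constexpr equivalents might look like the following, assuming the usual CSparse-style definitions (FLIP(i) = -(i)-2, MARK flips the workspace entry in place):

```cpp
// Possible constexpr replacements for the CSparse-style macros; type-safe,
// no double evaluation, and confined to whatever namespace they live in.
template <typename I>
constexpr I flip(I i)
{
  return -i - 2;  // involution: flip(flip(i)) == i
}

template <typename I>
constexpr I unflip(I i)
{
  return i < 0 ? flip(i) : i;
}

template <typename I>
constexpr bool marked(const I* w, I j)
{
  return w[j] < 0;
}

template <typename I>
constexpr void mark(I* w, I j)
{
  w[j] = flip(w[j]);
}
```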

cpp/include/cuopt/linear_programming/mip/solver_settings.hpp (2)

10-19: Duplicate #include <vector>.

<vector> is included on both Line 10 and Line 19.

Proposed fix
 #include <vector>

 #include <cuopt/linear_programming/constants.h>
 #include <cuopt/linear_programming/pdlp/pdlp_hyper_params.cuh>
 #include <cuopt/linear_programming/utilities/internals.hpp>

 #include <raft/core/device_span.hpp>
 #include <rmm/device_uvector.hpp>

-#include <vector>

87-87: Missing Doxygen comment for work_limit.

determinism_mode (Line 112–120) and seed (Line 122–127) have Doxygen doc blocks, but work_limit does not. For consistency and since this is a public-facing header, consider adding one.

cpp/src/mip/solver.cu (1)

269-270: Remove commented-out debug line.

Line 270 (// context.work_unit_scheduler_.verbose = true;) is a debug artifact.

cpp/src/mip/solve.cu (1)

171-173: Double std::getenv call — store the result once.

std::getenv("CUOPT_MIP_HIDE_SOLUTION") is called twice. While harmless in practice, this is unnecessary and slightly fragile.

Proposed fix
-  int hidesol =
-    std::getenv("CUOPT_MIP_HIDE_SOLUTION") ? atoi(std::getenv("CUOPT_MIP_HIDE_SOLUTION")) : 0;
+  const char* hidesol_env = std::getenv("CUOPT_MIP_HIDE_SOLUTION");
+  int hidesol = hidesol_env ? atoi(hidesol_env) : 0;
benchmarks/linear_programming/cuopt/run_mip.cpp (1)

140-153: Consider a settings/config struct for run_single_file.

The function now has 13 parameters. Wrapping them in a struct would improve readability and reduce the chance of argument-ordering mistakes at call sites. Not blocking, just a suggestion for a future cleanup pass.
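A possible shape for such a struct, with illustrative field names (not the benchmark's actual parameters):

```cpp
#include <string>

// Hypothetical config struct for run_single_file; field names and defaults
// are invented for this sketch.
struct run_config_t {
  std::string mps_path;
  double time_limit  = 600.0;  // seconds
  double work_limit  = -1.0;   // < 0: disabled
  int num_threads    = 1;
  int seed           = 0;
  bool deterministic = false;
};

// A call site then reads unambiguously, e.g.:
//   run_config_t cfg;
//   cfg.mps_path      = "model.mps";
//   cfg.deterministic = true;
//   run_single_file(cfg);
```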

cpp/src/dual_simplex/bb_event.hpp (1)

46-73: Consider widening event_sequence to avoid overflow in long runs.

With large B&B trees, a 32-bit int can wrap and break deterministic ordering. Using std::uint64_t (and matching types in deterministic workers) would make this robust.

🔧 Proposed change
-  int event_sequence;
+  std::uint64_t event_sequence;

void determinism_assign_diving_nodes();

// Collect and merge diving solutions at sync
void determistic_collect_diving_solutions_and_update_psuedocosts();

⚠️ Potential issue | 🟡 Minor

Two typos in method name: determistic → deterministic, psuedocosts → pseudocosts.

This is a private method declaration, so fixing it now is straightforward before it spreads to more call sites.

Proposed fix
-  void determistic_collect_diving_solutions_and_update_psuedocosts();
+  void deterministic_collect_diving_solutions_and_update_pseudocosts();

Comment on lines 370 to 383
void enqueue_dive_node(mip_node_t<i_t, f_t>* node,
const lp_problem_t<i_t, f_t>& original_lp,
const simplex_solver_settings_t<i_t, f_t>& settings)
{
dive_queue_entry_t<i_t, f_t> entry;
entry.resolved_lower = original_lp.lower;
entry.resolved_upper = original_lp.upper;
std::vector<bool> bounds_changed(original_lp.num_cols, false);
node->get_variable_bounds(entry.resolved_lower, entry.resolved_upper, bounds_changed);
this->node_presolver.bounds_strengthening(
settings, bounds_changed, entry.resolved_lower, entry.resolved_upper);
entry.node = node->detach_copy();
dive_queue.push_back(std::move(entry));
}

⚠️ Potential issue | 🟡 Minor

Skip enqueue when bounds strengthening reports infeasible.

Right now, infeasible nodes are still queued for diving, which wastes work. Consider early‑returning when bounds_strengthening(...) fails.

🔧 Proposed fix
-    this->node_presolver.bounds_strengthening(
-      settings, bounds_changed, entry.resolved_lower, entry.resolved_upper);
+    if (!this->node_presolver.bounds_strengthening(
+          settings, bounds_changed, entry.resolved_lower, entry.resolved_upper)) {
+      return;
+    }

Comment on lines +716 to +719
papilo::Postsolve<f_t> post_solver{Msg, papilo_post_solve_storage_->getNum()};

bool is_optimal = false;
-auto status = post_solver.undo(reduced_sol, full_sol, papilo_post_solve_storage_, is_optimal);
+auto status = post_solver.undo(reduced_sol, full_sol, *papilo_post_solve_storage_, is_optimal);

⚠️ Potential issue | 🟡 Minor

No null-guard before dereferencing papilo_post_solve_storage_.

If undo() is ever called before apply() (or after apply() returned nullopt due to infeasibility), papilo_post_solve_storage_ will be null and the dereference on Lines 716/719 will crash. Consider adding an early check, similar to the status_to_skip guard on Line 699.

Proposed defensive check
   if (status_to_skip) { return; }
+  cuopt_expects(papilo_post_solve_storage_ != nullptr,
+                error_type_t::RuntimeError,
+                "Postsolve storage is not initialized; was presolve applied?");
   std::vector<f_t> primal_sol_vec_h(primal_solution.size());

Comment on lines 1461 to 1577
cuopt_int_t test_deterministic_bb(const char* filename,
                                  cuopt_int_t num_runs,
                                  cuopt_int_t num_threads,
                                  cuopt_float_t time_limit,
                                  cuopt_float_t work_limit)
{
  cuOptOptimizationProblem problem = NULL;
  cuOptSolverSettings settings = NULL;
  cuopt_float_t first_objective = 0.0;
  cuopt_int_t first_status = -1;
  cuopt_int_t status;
  cuopt_int_t run;

  printf("Testing deterministic B&B: %s with %d threads, %d runs\n", filename, num_threads, num_runs);

  status = cuOptReadProblem(filename, &problem);
  if (status != CUOPT_SUCCESS) {
    printf("Error reading problem: %d\n", status);
    goto DONE;
  }

  status = cuOptCreateSolverSettings(&settings);
  if (status != CUOPT_SUCCESS) {
    printf("Error creating solver settings: %d\n", status);
    goto DONE;
  }

  status = cuOptSetIntegerParameter(settings, CUOPT_MIP_DETERMINISM_MODE, CUOPT_MODE_DETERMINISTIC);
  if (status != CUOPT_SUCCESS) {
    printf("Error setting determinism mode: %d\n", status);
    goto DONE;
  }

  status = cuOptSetIntegerParameter(settings, CUOPT_NUM_CPU_THREADS, num_threads);
  if (status != CUOPT_SUCCESS) {
    printf("Error setting num threads: %d\n", status);
    goto DONE;
  }

  status = cuOptSetFloatParameter(settings, CUOPT_TIME_LIMIT, time_limit);
  if (status != CUOPT_SUCCESS) {
    printf("Error setting time limit: %d\n", status);
    goto DONE;
  }

  status = cuOptSetFloatParameter(settings, CUOPT_WORK_LIMIT, work_limit);
  if (status != CUOPT_SUCCESS) {
    printf("Error setting work limit: %d\n", status);
    goto DONE;
  }

  int seed = rand();
  printf("Seed: %d\n", seed);

  for (run = 0; run < num_runs; run++) {
    cuOptSolution solution = NULL;
    cuopt_float_t objective;
    cuopt_int_t termination_status;

    status = cuOptSetIntegerParameter(settings, CUOPT_RANDOM_SEED, seed);
    if (status != CUOPT_SUCCESS) {
      printf("Error setting seed: %d\n", status);
      goto DONE;
    }

    status = cuOptSolve(problem, settings, &solution);
    if (status != CUOPT_SUCCESS) {
      printf("Error solving problem on run %d: %d\n", run, status);
      cuOptDestroySolution(&solution);
      goto DONE;
    }

    status = cuOptGetObjectiveValue(solution, &objective);
    if (status != CUOPT_SUCCESS) {
      printf("Error getting objective value on run %d: %d\n", run, status);
      cuOptDestroySolution(&solution);
      goto DONE;
    }

    status = cuOptGetTerminationStatus(solution, &termination_status);
    if (status != CUOPT_SUCCESS) {
      printf("Error getting termination status on run %d: %d\n", run, status);
      cuOptDestroySolution(&solution);
      goto DONE;
    }

    printf("Run %d: status=%s (%d), objective=%f\n",
           run,
           termination_status_to_string(termination_status),
           termination_status,
           objective);

    if (run == 0) {
      first_objective = objective;
      first_status = termination_status;
    } else {
      if (first_status != termination_status) {
        printf("Determinism failure: run %d termination status %d differs from run 0 status %d\n",
               run,
               termination_status,
               first_status);
        status = CUOPT_VALIDATION_ERROR;
        cuOptDestroySolution(&solution);
        goto DONE;
      }
      if (first_objective != objective) {
        printf("Determinism failure: run %d objective %f differs from run 0 objective %f\n",
               run,
               objective,
               first_objective);
        status = CUOPT_VALIDATION_ERROR;
        cuOptDestroySolution(&solution);
        goto DONE;
      }
    }
    cuOptDestroySolution(&solution);
  }

⚠️ Potential issue | 🟡 Minor

Ensure the determinism test also validates a “successful” solve.
Right now, consistent failure statuses (e.g., numerical error) would still pass the determinism check. Add a guard to fail early if the first run doesn’t produce a successful termination status.

🧪 Suggested enhancement
     if (run == 0) {
       first_objective = objective;
       first_status    = termination_status;
+      if (first_status == CUOPT_TERMINATION_STATUS_NUMERICAL_ERROR ||
+          first_status == CUOPT_TERMINATION_STATUS_INFEASIBLE ||
+          first_status == CUOPT_TERMINATION_STATUS_UNBOUNDED) {
+        printf("Determinism test requires a successful solve; got status %d\n", first_status);
+        status = CUOPT_VALIDATION_ERROR;
+        cuOptDestroySolution(&solution);
+        goto DONE;
+      }
     } else {
As per coding guidelines: “Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems)”.

@aliceb-nv
Contributor Author

/ok to test 7c92978

@aliceb-nv
Contributor Author

Benchmark results are as follows: (H100 Eos, 28 threads (hyperthreading))

Baseline Run:

    Feasible count: 224
    Gap <= 0.1%: 109
    Average gap: 0.132204
    Optimal (MIP gap < 1e-4): 43
    Average MIP gap: 0.486274
    Geomean MIP gap: 0.305156

Compared Run (determinism):

    Feasible count: 228
    Gap <= 0.1%: 110
    Average gap: 0.118907
    Optimal (MIP gap < 1e-4): 42
    Average MIP gap: 0.459080
    Geomean MIP gap: 0.296428

@aliceb-nv
Contributor Author

/ok to test 3cfc0c7

@aliceb-nv
Contributor Author

/ok to test ac0ca00

CUOPT_LOG_INFO("Objective offset %f scaling_factor %f",
problem.presolve_data.objective_offset,
problem.presolve_data.objective_scaling_factor);
CUOPT_LOG_INFO("Model fingerprint: 0x%x", problem.get_fingerprint());
Contributor


Nice! Printing a fingerprint is very useful for users and developers, for debugging model changes.

Contributor

@chris-maes chris-maes left a comment


Thanks for all the incredible hard work Alice. It's very exciting to see a deterministic multithreaded branch and bound. Next step: bringing in deterministic GPU heuristics.

Thanks as well for addressing my comments and explaining the concepts to me.

@aliceb-nv
Contributor Author

/ok to test 98b8ddd

@rg20
Contributor

rg20 commented Feb 10, 2026

/merge

@rapids-bot rapids-bot bot merged commit 76d154c into NVIDIA:release/26.02 Feb 10, 2026
165 of 169 checks passed

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change


5 participants