
B&B CPU determinism#798

Merged
rapids-bot[bot] merged 445 commits intoNVIDIA:release/26.02from
aliceb-nv:determinism
Feb 10, 2026

Conversation

@aliceb-nv
Contributor

@aliceb-nv aliceb-nv commented Jan 26, 2026

This PR introduces a deterministic execution mode for the parallel branch-and-bound MIP solver.
When determinism_mode = CUOPT_MODE_DETERMINISTIC, the solver guarantees bitwise-identical results across runs regardless of thread scheduling variations, as long as the runs execute on the same platform with the same environment (same glibc, etc.).

The approach chosen is referred to as Bulk Synchronous Parallel (BSP) in the literature: an execution model where computation proceeds in discrete horizons of virtual time (work units) separated by synchronization barriers.
To cope with the inherent nondeterminism of wall-clock time, caused by factors such as cache state, OS scheduling, and CPU throttling, progress is measured not in time but in "work units": a deterministic approximation of an algorithm's execution time derived from known factors such as memory operations, problem features, and sparsity.
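As a rough illustration of the idea (the names below are invented for this sketch, not cuOpt's actual types), a work-unit counter charges progress by deterministic quantities such as bytes moved, never by elapsed time:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical work-unit counter: progress is charged per deterministic
// quantity (bytes moved, nonzeros processed), never per wall-clock tick,
// so the count is identical across runs.
struct work_counter_t {
  std::uint64_t byte_loads  = 0;
  std::uint64_t byte_stores = 0;

  // Convert byte traffic into abstract work units with a fixed calibration
  // constant; the same instruction stream always yields the same value.
  double work_units() const
  {
    constexpr double bytes_per_unit = 1e6;  // arbitrary calibration constant
    return static_cast<double>(byte_loads + byte_stores) / bytes_per_unit;
  }
};

// Example: charging a sparse matrix-vector product by nonzero count
// (a sketch; real accounting would distinguish value/index widths).
void charge_spmv(work_counter_t& wc, std::size_t nnz, std::size_t elem_bytes)
{
  wc.byte_loads += nnz * elem_bytes * 2;  // one value + one index per nonzero
  wc.byte_stores += nnz * elem_bytes;     // one accumulator write per nonzero
}
```

Because the charge depends only on problem data, two runs that execute the same logical operations report exactly the same virtual time, regardless of how the OS schedules them.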

Workers explore the tree independently and locally within a horizon, collecting events (branching decisions, solutions, pseudo-cost updates) along the way without exchanging them yet.
At each horizon boundary, events are sorted by work-unit timestamp with tie-breaking, the history is replayed in deterministic order to update global state, and, if needed, nodes are redistributed across workers to balance load (this replaces the ramp-up mechanism).
Workers operate on local snapshots of shared state (pseudo-costs, upper bound, LP iteration count) taken at horizon start to avoid read races. This trades some accuracy in decisions (pruning may rely on a higher lower bound than in nondeterministic mode) for determinism.
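The horizon-boundary merge could look roughly like the sketch below (illustrative types only, not the actual structures in the PR): each worker records events with a deterministic timestamp, and at the barrier all events are sorted by (timestamp, worker id, sequence) so the replay order is independent of which thread finished first.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative event record: a deterministic work-unit timestamp plus stable
// tie-breakers, so the sort order never depends on thread scheduling.
struct bb_event {
  double work_timestamp;   // virtual time at which the event occurred
  int worker_id;           // tie-break 1: stable worker index
  std::uint64_t sequence;  // tie-break 2: per-worker emission order
  double payload;          // e.g. a new incumbent objective
};

// At the horizon barrier, all workers' events are merged and sorted by
// (timestamp, worker_id, sequence); replaying them in this order updates
// global state identically on every run.
std::vector<bb_event> merge_horizon(std::vector<std::vector<bb_event>> per_worker)
{
  std::vector<bb_event> merged;
  for (auto& events : per_worker) {
    merged.insert(merged.end(), events.begin(), events.end());
  }
  std::sort(merged.begin(), merged.end(), [](const bb_event& a, const bb_event& b) {
    if (a.work_timestamp != b.work_timestamp) { return a.work_timestamp < b.work_timestamp; }
    if (a.worker_id != b.worker_id) { return a.worker_id < b.worker_id; }
    return a.sequence < b.sequence;
  });
  return merged;
}
```

The tie-break fields are what make the ordering a strict total order; without them, two events with equal virtual timestamps could replay in either order depending on the run.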

Support for start CPU heuristics is present, based on a "producer" model: CPUFJ starts immediately and begins producing solutions stamped with work-unit timestamps, which the B&B thread consumes once the appropriate time is reached. A form of one-way synchronization is implemented to prevent the CPUFJ thread from falling behind and producing solutions in the past from the perspective of the B&B thread: CPUFJ is allowed to run ahead, and is biased to run faster than B&B to avoid unnecessary syncs.
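A minimal sketch of the consuming side of this producer model, with hypothetical names: the B&B thread only accepts heuristic solutions whose work-unit stamp its own virtual clock has already passed, leaving later solutions queued.

```cpp
#include <deque>
#include <vector>

// Hypothetical stamped solution emitted by the heuristic producer thread.
struct stamped_solution {
  double work_timestamp;  // producer's virtual clock at emission
  double objective;
};

// The B&B thread consumes only solutions stamped at or before its own virtual
// clock; later solutions stay queued. Because both clocks advance in work
// units rather than wall time, the merge point is identical on every run.
std::vector<stamped_solution> consume_ready(std::deque<stamped_solution>& queue,
                                            double bb_clock)
{
  std::vector<stamped_solution> ready;
  while (!queue.empty() && queue.front().work_timestamp <= bb_clock) {
    ready.push_back(queue.front());
    queue.pop_front();
  }
  return ready;
}
```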

B&B BFS workers, diving workers, and a start CPUFJ thread are supported in deterministic mode. GPU heuristics are disabled in this mode and will be incorporated into the deterministic framework in the future.

Benchmark results

B&B alone, 10min

  • Nondeterministic B&B alone (default): 174 feasible, 38.3% primal gap
  • Deterministic B&B alone: 167 feasible, 40.3% primal gap
  • Deterministic B&B+CPUFJ: 184 feasible, 35.9% primal gap

Regression testing against the main branch:
With changes: 228 feasible, 11.9% primal gap
Baseline: 224 feasible, 13.2% primal gap (difference likely due to nondeterminism/noise)

Summary by CodeRabbit

  • New Features

    • Work-limit control, a deterministic solve mode, and configurable random seed; new CLI flags to set work-limit and enable determinism.
  • Infrastructure

    • Enhanced profiling, memory and timing instrumentation; work-unit synchronization and scheduler added to coordinate deterministic runs; deterministic B&B/workers and replay scaffolding integrated.
  • Tests

    • New deterministic reproducibility tests ensuring consistent termination, objectives, and solutions across repeated runs.

nguidotti and others added 30 commits January 9, 2026 11:56
…LP iterations and reliable threshold for reliability branching.
…would not be accounted for when iterations aren't multiples for LOG_FEATURE_ITERATIONS
@aliceb-nv
Contributor Author

/ok to test 80037ef

@copy-pr-bot

copy-pr-bot bot commented Feb 9, 2026

/ok to test 80037ef

@aliceb-nv, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

@aliceb-nv
Contributor Author

/ok to test f8021de

@aliceb-nv
Contributor Author

Thanks a lot for your reviewing work Chris!

Regarding 1) - there is no global heap. All live nodes exist within a single worker at all times, and may be shuffled around workers if imbalances are detected. Nodes are fathomed when they're popped from the local worker heap and compared against the upper bound, and during syncs as well to avoid distributing useless nodes when doing work stealing. But I think this can be removed, I will run some tests and change if appropriate.
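For readers following along, the fathoming rule described here amounts to a bound comparison; a sketch with hypothetical names:

```cpp
// Illustrative fathoming rule (names invented for this sketch): a node is
// discarded when its lower bound cannot improve on the incumbent upper bound.
struct node_t {
  double lower_bound;
};

bool fathomed(const node_t& node, double upper_bound, double tol = 1e-6)
{
  return node.lower_bound >= upper_bound - tol;
}
```

As described above, this check runs both when a node is popped from a worker's local heap and during syncs, so already-useless nodes are not redistributed by work stealing.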

I completely agree that the PR could use some more refactoring work to avoid duplication/scattering of code and logic. Let's discuss that during the sync👍 I tried to balance clean code structure with "getting it done" in time, but it's a difficult balance to get right

For instrumented vectors: yes, I think this should be the correct approach. The ins_vectors were a stopgap, made with the intention of phasing them out in favor of proper work estimates. They mostly serve to establish a baseline for now, but ideally they will all be removed in time.
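The idea behind the instrumented vectors, sketched with made-up names (the real ins_vector is cuOpt-internal and looks different): every access is counted so work-unit estimates can be derived from memory traffic instead of timing.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch only: a vector wrapper that counts byte traffic on each access,
// giving a deterministic, scheduling-independent baseline for work estimates.
template <typename T>
class ins_vector_sketch {
 public:
  explicit ins_vector_sketch(std::size_t n) : data_(n) {}

  const T& load(std::size_t i)
  {
    byte_loads += sizeof(T);  // deterministic accounting, no clock involved
    return data_[i];
  }

  void store(std::size_t i, const T& v)
  {
    byte_stores += sizeof(T);
    data_[i] = v;
  }

  std::uint64_t byte_loads  = 0;
  std::uint64_t byte_stores = 0;

 private:
  std::vector<T> data_;
};
```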

I've run benchmarks on both the MIP solves and the miplib2017 root relaxations to make sure I didn't cause any regression; I am sending you the results via Slack.

Thanks a ton again!


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@cpp/src/dual_simplex/branch_and_bound.hpp`:
- Line 333: Rename the misspelled private method
determistic_collect_diving_solutions_and_update_psuedocosts to
deterministic_collect_diving_solutions_and_update_pseudocosts everywhere: update
the declaration (the method name in branch_and_bound.hpp), its
definition/implementation, and all call sites (including any unit tests or
comments) to use the corrected spelling; ensure any forward declarations or
references (e.g., in the class that owns this method) are updated so builds and
linkage remain consistent.

In `@cpp/src/dual_simplex/deterministic_workers.hpp`:
- Around line 370-383: enqueue_dive_node currently always pushes nodes even when
presolver reports infeasible; change the code to check the result of
this->node_presolver.bounds_strengthening(...) and early-return if it indicates
failure so we don't detach/copy or push the node into dive_queue. Specifically,
in enqueue_dive_node use the boolean return of bounds_strengthening (or adjust
to capture its failure signal) right after calling it and before calling
node->detach_copy() or dive_queue.push_back(...); if it returns false, simply
return without enqueuing.

In `@cpp/src/mip/presolve/third_party_presolve.cpp`:
- Around line 716-719: Add a null-guard for papilo_post_solve_storage_ before
constructing/using papilo::Postsolve and calling post_solver.undo: check if
papilo_post_solve_storage_ is null (similar to the existing status_to_skip
guard) and skip/return early when it is null to avoid dereferencing it in
papilo_post_solve_storage_->getNum() and in the undo call; ensure the guard is
placed before the lines that construct post_solver and call post_solver.undo so
undo() is never invoked with a null papilo_post_solve_storage_.

In `@cpp/tests/linear_programming/c_api_tests/c_api_test.c`:
- Around line 1461-1577: In test_deterministic_bb ensure the test fails early if
the first run is not a successful solve: after capturing first_status in
test_deterministic_bb (where first_status and first_objective are set), add a
check that first_status equals the solver's success termination constant (e.g.,
CUOPT_TERMINATION_STATUS_OPTIMAL) and if not set status =
CUOPT_VALIDATION_ERROR, print a clear message using
termination_status_to_string(first_status), destroy solution and goto DONE; this
guards against consistent but unsuccessful termination across runs and uses
symbols first_status, first_objective, termination_status_to_string,
CUOPT_VALIDATION_ERROR, and test_deterministic_bb to locate where to add the
check.
🧹 Nitpick comments (8)
cpp/src/dual_simplex/triangle_solve.hpp (2)

101-117: U.x.byte_loads undercounts by one diagonal read per active column.

In each active column, U_x[col_end] (the diagonal, line 106) is read but not included in nnz_processed (line 111 counts only col_end - col_start, the off-diagonal range). So U.x.byte_loads at line 117 misses one sizeof(f_t) per active column.

This is the mirror of the slight overcount in lower_triangular_solve for L.i. Neither affects determinism (the accounting is consistent across runs), but if you want accurate work-unit estimates, consider:

Proposed fix
-      nnz_processed += col_end - col_start;
+      nnz_processed += col_end - col_start + 1;  // +1 for diagonal U_x[col_end]

And split the U.i accounting if you want it exact (off-diagonal only):

-  U.i.byte_loads += nnz_processed * sizeof(i_t);
-  U.x.byte_loads += nnz_processed * sizeof(f_t);
+  // nnz_processed now includes diagonal; U_i is only indexed for off-diag
+  // For simplicity, the small overcount on U.i is acceptable.
+  U.i.byte_loads += nnz_processed * sizeof(i_t);
+  U.x.byte_loads += nnz_processed * sizeof(f_t);

18-24: Consider replacing macros with constexpr inline functions.

FLIP, UNFLIP, MARKED, and MARK are classic CSparse-style macros. Replacing them with constexpr inline functions would give type safety and avoid the usual macro pitfalls (double evaluation, namespace pollution). Low priority since these are well-established idioms.
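For illustration, constexpr equivalents might look like the following, assuming the usual CSparse-style definitions (FLIP(i) = -(i)-2, MARK flips the workspace entry in place):

```cpp
// Possible constexpr replacements for the CSparse-style macros; type-safe,
// no double evaluation, and confined to whatever namespace they live in.
template <typename I>
constexpr I flip(I i)
{
  return -i - 2;  // involution: flip(flip(i)) == i
}

template <typename I>
constexpr I unflip(I i)
{
  return i < 0 ? flip(i) : i;
}

template <typename I>
constexpr bool marked(const I* w, I j)
{
  return w[j] < 0;
}

template <typename I>
constexpr void mark(I* w, I j)
{
  w[j] = flip(w[j]);
}
```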

cpp/include/cuopt/linear_programming/mip/solver_settings.hpp (2)

10-19: Duplicate #include <vector>.

<vector> is included on both Line 10 and Line 19.

Proposed fix
 #include <vector>

 #include <cuopt/linear_programming/constants.h>
 #include <cuopt/linear_programming/pdlp/pdlp_hyper_params.cuh>
 #include <cuopt/linear_programming/utilities/internals.hpp>

 #include <raft/core/device_span.hpp>
 #include <rmm/device_uvector.hpp>

-#include <vector>

87-87: Missing Doxygen comment for work_limit.

determinism_mode (Line 112–120) and seed (Line 122–127) have Doxygen doc blocks, but work_limit does not. For consistency and since this is a public-facing header, consider adding one.

cpp/src/mip/solver.cu (1)

269-270: Remove commented-out debug line.

Line 270 (// context.work_unit_scheduler_.verbose = true;) is a debug artifact.

cpp/src/mip/solve.cu (1)

171-173: Double std::getenv call — store the result once.

std::getenv("CUOPT_MIP_HIDE_SOLUTION") is called twice. While harmless in practice, this is unnecessary and slightly fragile.

Proposed fix
-  int hidesol =
-    std::getenv("CUOPT_MIP_HIDE_SOLUTION") ? atoi(std::getenv("CUOPT_MIP_HIDE_SOLUTION")) : 0;
+  const char* hidesol_env = std::getenv("CUOPT_MIP_HIDE_SOLUTION");
+  int hidesol = hidesol_env ? atoi(hidesol_env) : 0;
benchmarks/linear_programming/cuopt/run_mip.cpp (1)

140-153: Consider a settings/config struct for run_single_file.

The function now has 13 parameters. Wrapping them in a struct would improve readability and reduce the chance of argument-ordering mistakes at call sites. Not blocking, just a suggestion for a future cleanup pass.
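A possible shape for such a struct, with illustrative field names (not the benchmark's actual parameters):

```cpp
#include <string>

// Hypothetical config struct for run_single_file; field names and defaults
// are invented for this sketch.
struct run_config_t {
  std::string mps_path;
  double time_limit  = 600.0;  // seconds
  double work_limit  = -1.0;   // < 0: disabled
  int num_threads    = 1;
  int seed           = 0;
  bool deterministic = false;
};

// A call site then reads unambiguously, e.g.:
//   run_config_t cfg;
//   cfg.mps_path      = "model.mps";
//   cfg.deterministic = true;
//   run_single_file(cfg);
```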

cpp/src/dual_simplex/bb_event.hpp (1)

46-73: Consider widening event_sequence to avoid overflow in long runs.

With large B&B trees, a 32-bit int can wrap and break deterministic ordering. Using std::uint64_t (and matching types in deterministic workers) would make this robust.

🔧 Proposed change
-  int event_sequence;
+  std::uint64_t event_sequence;

void determinism_assign_diving_nodes();

// Collect and merge diving solutions at sync
void determistic_collect_diving_solutions_and_update_psuedocosts();

⚠️ Potential issue | 🟡 Minor

Two typos in method name: determistic → deterministic, psuedocosts → pseudocosts.

This is a private method declaration, so fixing it now is straightforward before it spreads to more call sites.

Proposed fix
-  void determistic_collect_diving_solutions_and_update_psuedocosts();
+  void deterministic_collect_diving_solutions_and_update_pseudocosts();

Comment on lines 370 to 383
void enqueue_dive_node(mip_node_t<i_t, f_t>* node,
const lp_problem_t<i_t, f_t>& original_lp,
const simplex_solver_settings_t<i_t, f_t>& settings)
{
dive_queue_entry_t<i_t, f_t> entry;
entry.resolved_lower = original_lp.lower;
entry.resolved_upper = original_lp.upper;
std::vector<bool> bounds_changed(original_lp.num_cols, false);
node->get_variable_bounds(entry.resolved_lower, entry.resolved_upper, bounds_changed);
this->node_presolver.bounds_strengthening(
settings, bounds_changed, entry.resolved_lower, entry.resolved_upper);
entry.node = node->detach_copy();
dive_queue.push_back(std::move(entry));
}

⚠️ Potential issue | 🟡 Minor

Skip enqueue when bounds strengthening reports infeasible.

Right now, infeasible nodes are still queued for diving, which wastes work. Consider early‑returning when bounds_strengthening(...) fails.

🔧 Proposed fix
-    this->node_presolver.bounds_strengthening(
-      settings, bounds_changed, entry.resolved_lower, entry.resolved_upper);
+    if (!this->node_presolver.bounds_strengthening(
+          settings, bounds_changed, entry.resolved_lower, entry.resolved_upper)) {
+      return;
+    }

Comment on lines +716 to +719
papilo::Postsolve<f_t> post_solver{Msg, papilo_post_solve_storage_->getNum()};

bool is_optimal = false;
-auto status = post_solver.undo(reduced_sol, full_sol, papilo_post_solve_storage_, is_optimal);
+auto status = post_solver.undo(reduced_sol, full_sol, *papilo_post_solve_storage_, is_optimal);

⚠️ Potential issue | 🟡 Minor

No null-guard before dereferencing papilo_post_solve_storage_.

If undo() is ever called before apply() (or after apply() returned nullopt due to infeasibility), papilo_post_solve_storage_ will be null and the dereference on Lines 716/719 will crash. Consider adding an early check, similar to the status_to_skip guard on Line 699.

Proposed defensive check
   if (status_to_skip) { return; }
+  cuopt_expects(papilo_post_solve_storage_ != nullptr,
+                error_type_t::RuntimeError,
+                "Postsolve storage is not initialized; was presolve applied?");
   std::vector<f_t> primal_sol_vec_h(primal_solution.size());

Comment on lines 1461 to 1577
cuopt_int_t test_deterministic_bb(const char* filename,
                                  cuopt_int_t num_runs,
                                  cuopt_int_t num_threads,
                                  cuopt_float_t time_limit,
                                  cuopt_float_t work_limit)
{
  cuOptOptimizationProblem problem = NULL;
  cuOptSolverSettings settings = NULL;
  cuopt_float_t first_objective = 0.0;
  cuopt_int_t first_status = -1;
  cuopt_int_t status;
  cuopt_int_t run;

  printf("Testing deterministic B&B: %s with %d threads, %d runs\n", filename, num_threads, num_runs);

  status = cuOptReadProblem(filename, &problem);
  if (status != CUOPT_SUCCESS) {
    printf("Error reading problem: %d\n", status);
    goto DONE;
  }

  status = cuOptCreateSolverSettings(&settings);
  if (status != CUOPT_SUCCESS) {
    printf("Error creating solver settings: %d\n", status);
    goto DONE;
  }

  status = cuOptSetIntegerParameter(settings, CUOPT_MIP_DETERMINISM_MODE, CUOPT_MODE_DETERMINISTIC);
  if (status != CUOPT_SUCCESS) {
    printf("Error setting determinism mode: %d\n", status);
    goto DONE;
  }

  status = cuOptSetIntegerParameter(settings, CUOPT_NUM_CPU_THREADS, num_threads);
  if (status != CUOPT_SUCCESS) {
    printf("Error setting num threads: %d\n", status);
    goto DONE;
  }

  status = cuOptSetFloatParameter(settings, CUOPT_TIME_LIMIT, time_limit);
  if (status != CUOPT_SUCCESS) {
    printf("Error setting time limit: %d\n", status);
    goto DONE;
  }

  status = cuOptSetFloatParameter(settings, CUOPT_WORK_LIMIT, work_limit);
  if (status != CUOPT_SUCCESS) {
    printf("Error setting work limit: %d\n", status);
    goto DONE;
  }

  int seed = rand();
  printf("Seed: %d\n", seed);

  for (run = 0; run < num_runs; run++) {
    cuOptSolution solution = NULL;
    cuopt_float_t objective;
    cuopt_int_t termination_status;

    status = cuOptSetIntegerParameter(settings, CUOPT_RANDOM_SEED, seed);
    if (status != CUOPT_SUCCESS) {
      printf("Error setting seed: %d\n", status);
      goto DONE;
    }

    status = cuOptSolve(problem, settings, &solution);
    if (status != CUOPT_SUCCESS) {
      printf("Error solving problem on run %d: %d\n", run, status);
      cuOptDestroySolution(&solution);
      goto DONE;
    }

    status = cuOptGetObjectiveValue(solution, &objective);
    if (status != CUOPT_SUCCESS) {
      printf("Error getting objective value on run %d: %d\n", run, status);
      cuOptDestroySolution(&solution);
      goto DONE;
    }

    status = cuOptGetTerminationStatus(solution, &termination_status);
    if (status != CUOPT_SUCCESS) {
      printf("Error getting termination status on run %d: %d\n", run, status);
      cuOptDestroySolution(&solution);
      goto DONE;
    }

    printf("Run %d: status=%s (%d), objective=%f\n",
           run,
           termination_status_to_string(termination_status),
           termination_status,
           objective);

    if (run == 0) {
      first_objective = objective;
      first_status = termination_status;
    } else {
      if (first_status != termination_status) {
        printf("Determinism failure: run %d termination status %d differs from run 0 status %d\n",
               run,
               termination_status,
               first_status);
        status = CUOPT_VALIDATION_ERROR;
        cuOptDestroySolution(&solution);
        goto DONE;
      }
      if (first_objective != objective) {
        printf("Determinism failure: run %d objective %f differs from run 0 objective %f\n",
               run,
               objective,
               first_objective);
        status = CUOPT_VALIDATION_ERROR;
        cuOptDestroySolution(&solution);
        goto DONE;
      }
    }
    cuOptDestroySolution(&solution);
  }

⚠️ Potential issue | 🟡 Minor

Ensure the determinism test also validates a “successful” solve.
Right now, consistent failure statuses (e.g., numerical error) would still pass the determinism check. Add a guard to fail early if the first run doesn’t produce a successful termination status.

🧪 Suggested enhancement
     if (run == 0) {
       first_objective = objective;
       first_status    = termination_status;
+      if (first_status == CUOPT_TERMINATION_STATUS_NUMERICAL_ERROR ||
+          first_status == CUOPT_TERMINATION_STATUS_INFEASIBLE ||
+          first_status == CUOPT_TERMINATION_STATUS_UNBOUNDED) {
+        printf("Determinism test requires a successful solve; got status %d\n", first_status);
+        status = CUOPT_VALIDATION_ERROR;
+        cuOptDestroySolution(&solution);
+        goto DONE;
+      }
     } else {
As per coding guidelines: “Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems)”.

@aliceb-nv
Contributor Author

/ok to test 7c92978

@aliceb-nv
Contributor Author

Benchmark results are as follows: (H100 Eos, 28 threads (hyperthreading))

Baseline Run:

    Feasible count: 224
    Gap <= 0.1%: 109
    Average gap: 0.132204
    Optimal (MIP gap < 1e-4): 43
    Average MIP gap: 0.486274
    Geomean MIP gap: 0.305156

Compared Run (determinism):

    Feasible count: 228
    Gap <= 0.1%: 110
    Average gap: 0.118907
    Optimal (MIP gap < 1e-4): 42
    Average MIP gap: 0.459080
    Geomean MIP gap: 0.296428

@aliceb-nv
Contributor Author

/ok to test 3cfc0c7

@aliceb-nv
Contributor Author

/ok to test ac0ca00

CUOPT_LOG_INFO("Objective offset %f scaling_factor %f",
problem.presolve_data.objective_offset,
problem.presolve_data.objective_scaling_factor);
CUOPT_LOG_INFO("Model fingerprint: 0x%x", problem.get_fingerprint());
Contributor


Nice! Printing a fingerprint is very useful for users and developers, for debugging model changes.

Contributor

@chris-maes chris-maes left a comment


Thanks for all the incredible hard work Alice. It's very exciting to see a deterministic multithreaded branch and bound. Next step: bringing in deterministic GPU heuristics.

Thanks as well for addressing my comments and explaining the concepts to me.

@aliceb-nv
Contributor Author

/ok to test 98b8ddd

@rg20
Contributor

rg20 commented Feb 10, 2026

/merge

@rapids-bot rapids-bot bot merged commit 76d154c into NVIDIA:release/26.02 Feb 10, 2026
165 of 169 checks passed

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change


5 participants