B&B CPU determinism #798
Conversation
…LP iterations and reliable threshold for reliability branching.
…would not be accounted for when iterations aren't multiples of LOG_FEATURE_ITERATIONS
/ok to test 80037ef

@aliceb-nv, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

/ok to test f8021de
Thanks a lot for your reviewing work, Chris!

Regarding 1): there is no global heap. All live nodes exist within a single worker at all times, and may be shuffled around workers if imbalances are detected. Nodes are fathomed when they're popped from the local worker heap and compared against the upper bound, and during syncs as well, to avoid distributing useless nodes when doing work stealing. But I think this can be removed; I will run some tests and change it if appropriate.

I completely agree that the PR could use some more refactoring work to avoid duplication/scattering of code and logic. Let's discuss that during the sync 👍 I tried to balance clean code structure with "getting it done" in time, but it's a difficult balance to get right.

For instrumented vectors: yes, I think this should be the correct approach. The ins_vectors were a "stopgap" made with the intention of phasing them out in time for proper work estimates. They mostly serve to establish a baseline for now, but ideally they will all be removed in time.

I've run benchmarks on both the MIP solves and the miplib2017 root relaxations to make sure I didn't cause any regression; I am sending you the results via Slack.

Thanks a ton again!
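The pop-and-fathom pattern described in the comment above can be sketched as follows. This is a minimal illustration with hypothetical names (`node_t`, `pop_unfathomed`); the actual cuOpt types and tolerances differ.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical node: 'lower_bound' is the LP relaxation bound at this node.
struct node_t {
  double lower_bound;
};
struct node_cmp {
  bool operator()(const node_t& a, const node_t& b) const
  {
    return a.lower_bound > b.lower_bound;  // min-heap on lower bound
  }
};

// Pop the best node from a worker-local heap; fathom (discard) any node
// whose bound cannot improve on the incumbent upper bound.
inline bool pop_unfathomed(std::priority_queue<node_t, std::vector<node_t>, node_cmp>& heap,
                           double upper_bound,
                           double tol,
                           node_t& out)
{
  while (!heap.empty()) {
    node_t n = heap.top();
    heap.pop();
    if (n.lower_bound < upper_bound - tol) {  // still promising: keep it
      out = n;
      return true;
    }
    // otherwise fathomed: bound >= incumbent, node is discarded
  }
  return false;
}
```

The same comparison can run again at sync time, before work stealing, so that only still-promising nodes are redistributed.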
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@cpp/src/dual_simplex/branch_and_bound.hpp`:
- Line 333: Rename the misspelled private method
determistic_collect_diving_solutions_and_update_psuedocosts to
deterministic_collect_diving_solutions_and_update_pseudocosts everywhere: update
the declaration (the method name in branch_and_bound.hpp), its
definition/implementation, and all call sites (including any unit tests or
comments) to use the corrected spelling; ensure any forward declarations or
references (e.g., in the class that owns this method) are updated so builds and
linkage remain consistent.
In `@cpp/src/dual_simplex/deterministic_workers.hpp`:
- Around line 370-383: enqueue_dive_node currently always pushes nodes even when
presolver reports infeasible; change the code to check the result of
this->node_presolver.bounds_strengthening(...) and early-return if it indicates
failure so we don't detach/copy or push the node into dive_queue. Specifically,
in enqueue_dive_node use the boolean return of bounds_strengthening (or adjust
to capture its failure signal) right after calling it and before calling
node->detach_copy() or dive_queue.push_back(...); if it returns false, simply
return without enqueuing.
In `@cpp/src/mip/presolve/third_party_presolve.cpp`:
- Around line 716-719: Add a null-guard for papilo_post_solve_storage_ before
constructing/using papilo::Postsolve and calling post_solver.undo: check if
papilo_post_solve_storage_ is null (similar to the existing status_to_skip
guard) and skip/return early when it is null to avoid dereferencing it in
papilo_post_solve_storage_->getNum() and in the undo call; ensure the guard is
placed before the lines that construct post_solver and call post_solver.undo so
undo() is never invoked with a null papilo_post_solve_storage_.
In `@cpp/tests/linear_programming/c_api_tests/c_api_test.c`:
- Around line 1461-1577: In test_deterministic_bb ensure the test fails early if
the first run is not a successful solve: after capturing first_status in
test_deterministic_bb (where first_status and first_objective are set), add a
check that first_status equals the solver's success termination constant (e.g.,
CUOPT_TERMINATION_STATUS_OPTIMAL) and if not set status =
CUOPT_VALIDATION_ERROR, print a clear message using
termination_status_to_string(first_status), destroy solution and goto DONE; this
guards against consistent but unsuccessful termination across runs and uses
symbols first_status, first_objective, termination_status_to_string,
CUOPT_VALIDATION_ERROR, and test_deterministic_bb to locate where to add the
check.
🧹 Nitpick comments (8)
cpp/src/dual_simplex/triangle_solve.hpp (2)
101-117: `U.x.byte_loads` undercounts by one diagonal read per active column.

In each active column, `U_x[col_end]` (the diagonal, line 106) is read but not included in `nnz_processed` (line 111 counts only `col_end - col_start`, the off-diagonal range). So `U.x.byte_loads` at line 117 misses one `sizeof(f_t)` per active column.

This is the mirror of the slight overcount in `lower_triangular_solve` for `L.i`. Neither affects determinism (the accounting is consistent across runs), but if you want accurate work-unit estimates, consider:

Proposed fix

```diff
- nnz_processed += col_end - col_start;
+ nnz_processed += col_end - col_start + 1;  // +1 for diagonal U_x[col_end]
```

And split the `U.i` accounting if you want it exact (off-diagonal only):

```diff
- U.i.byte_loads += nnz_processed * sizeof(i_t);
- U.x.byte_loads += nnz_processed * sizeof(f_t);
+ // nnz_processed now includes diagonal; U.i is only indexed for off-diag.
+ // For simplicity, the small overcount on U.i is acceptable.
+ U.i.byte_loads += nnz_processed * sizeof(i_t);
+ U.x.byte_loads += nnz_processed * sizeof(f_t);
```
18-24: Consider replacing macros with `constexpr` inline functions.

`FLIP`, `UNFLIP`, `MARKED`, and `MARK` are classic CSparse-style macros. Replacing them with `constexpr` inline functions would give type safety and avoid the usual macro pitfalls (double evaluation, namespace pollution). Low priority since these are well-established idioms.

cpp/include/cuopt/linear_programming/mip/solver_settings.hpp (2)
10-19: Duplicate `#include <vector>`.

`<vector>` is included on both Line 10 and Line 19.

Proposed fix

```diff
 #include <vector>
 #include <cuopt/linear_programming/constants.h>
 #include <cuopt/linear_programming/pdlp/pdlp_hyper_params.cuh>
 #include <cuopt/linear_programming/utilities/internals.hpp>
 #include <raft/core/device_span.hpp>
 #include <rmm/device_uvector.hpp>
-#include <vector>
```
87-87: Missing Doxygen comment for `work_limit`.

`determinism_mode` (Lines 112-120) and `seed` (Lines 122-127) have Doxygen doc blocks, but `work_limit` does not. For consistency, and since this is a public-facing header, consider adding one.

cpp/src/mip/solver.cu (1)
269-270: Remove commented-out debug line.

Line 270 (`// context.work_unit_scheduler_.verbose = true;`) is a debug artifact.

cpp/src/mip/solve.cu (1)
171-173: Double `std::getenv` call: store the result once.

`std::getenv("CUOPT_MIP_HIDE_SOLUTION")` is called twice. While harmless in practice, this is unnecessary and slightly fragile.

Proposed fix

```diff
-  int hidesol =
-    std::getenv("CUOPT_MIP_HIDE_SOLUTION") ? atoi(std::getenv("CUOPT_MIP_HIDE_SOLUTION")) : 0;
+  const char* hidesol_env = std::getenv("CUOPT_MIP_HIDE_SOLUTION");
+  int hidesol             = hidesol_env ? atoi(hidesol_env) : 0;
```

benchmarks/linear_programming/cuopt/run_mip.cpp (1)
140-153: Consider a settings/config struct for `run_single_file`.

The function now has 13 parameters. Wrapping them in a struct would improve readability and reduce the chance of argument-ordering mistakes at call sites. Not blocking, just a suggestion for a future cleanup pass.
cpp/src/dual_simplex/bb_event.hpp (1)
46-73: Consider widening `event_sequence` to avoid overflow in long runs.

With large B&B trees, a 32-bit `int` can wrap and break deterministic ordering. Using `std::uint64_t` (and matching types in the deterministic workers) would make this robust.

🔧 Proposed change

```diff
-  int event_sequence;
+  std::uint64_t event_sequence;
```
```cpp
  void determinism_assign_diving_nodes();

  // Collect and merge diving solutions at sync
  void determistic_collect_diving_solutions_and_update_psuedocosts();
```
Two typos in method name: determistic → deterministic, psuedocosts → pseudocosts.
This is a private method declaration, so fixing it now is straightforward before it spreads to more call sites.
Proposed fix

```diff
- void determistic_collect_diving_solutions_and_update_psuedocosts();
+ void deterministic_collect_diving_solutions_and_update_pseudocosts();
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
  void deterministic_collect_diving_solutions_and_update_pseudocosts();
```
🤖 Prompt for AI Agents
In `@cpp/src/dual_simplex/branch_and_bound.hpp` at line 333, Rename the misspelled
private method determistic_collect_diving_solutions_and_update_psuedocosts to
deterministic_collect_diving_solutions_and_update_pseudocosts everywhere: update
the declaration (the method name in branch_and_bound.hpp), its
definition/implementation, and all call sites (including any unit tests or
comments) to use the corrected spelling; ensure any forward declarations or
references (e.g., in the class that owns this method) are updated so builds and
linkage remain consistent.
```cpp
  void enqueue_dive_node(mip_node_t<i_t, f_t>* node,
                         const lp_problem_t<i_t, f_t>& original_lp,
                         const simplex_solver_settings_t<i_t, f_t>& settings)
  {
    dive_queue_entry_t<i_t, f_t> entry;
    entry.resolved_lower = original_lp.lower;
    entry.resolved_upper = original_lp.upper;
    std::vector<bool> bounds_changed(original_lp.num_cols, false);
    node->get_variable_bounds(entry.resolved_lower, entry.resolved_upper, bounds_changed);
    this->node_presolver.bounds_strengthening(
      settings, bounds_changed, entry.resolved_lower, entry.resolved_upper);
    entry.node = node->detach_copy();
    dive_queue.push_back(std::move(entry));
  }
```
Skip enqueue when bounds strengthening reports infeasible.
Right now, infeasible nodes are still queued for diving, which wastes work. Consider early‑returning when bounds_strengthening(...) fails.
🔧 Proposed fix

```diff
-    this->node_presolver.bounds_strengthening(
-      settings, bounds_changed, entry.resolved_lower, entry.resolved_upper);
+    if (!this->node_presolver.bounds_strengthening(
+          settings, bounds_changed, entry.resolved_lower, entry.resolved_upper)) {
+      return;
+    }
```

🤖 Prompt for AI Agents
In `@cpp/src/dual_simplex/deterministic_workers.hpp` around lines 370 - 383,
enqueue_dive_node currently always pushes nodes even when presolver reports
infeasible; change the code to check the result of
this->node_presolver.bounds_strengthening(...) and early-return if it indicates
failure so we don't detach/copy or push the node into dive_queue. Specifically,
in enqueue_dive_node use the boolean return of bounds_strengthening (or adjust
to capture its failure signal) right after calling it and before calling
node->detach_copy() or dive_queue.push_back(...); if it returns false, simply
return without enqueuing.
```diff
   papilo::Postsolve<f_t> post_solver{Msg, papilo_post_solve_storage_->getNum()};

   bool is_optimal = false;
-  auto status = post_solver.undo(reduced_sol, full_sol, papilo_post_solve_storage_, is_optimal);
+  auto status = post_solver.undo(reduced_sol, full_sol, *papilo_post_solve_storage_, is_optimal);
```
No null-guard before dereferencing papilo_post_solve_storage_.
If undo() is ever called before apply() (or after apply() returned nullopt due to infeasibility), papilo_post_solve_storage_ will be null and the dereference on Lines 716/719 will crash. Consider adding an early check, similar to the status_to_skip guard on Line 699.
Proposed defensive check

```diff
   if (status_to_skip) { return; }
+  cuopt_expects(papilo_post_solve_storage_ != nullptr,
+                error_type_t::RuntimeError,
+                "Postsolve storage is not initialized; was presolve applied?");
   std::vector<f_t> primal_sol_vec_h(primal_solution.size());
```

🤖 Prompt for AI Agents
In `@cpp/src/mip/presolve/third_party_presolve.cpp` around lines 716 - 719, Add a
null-guard for papilo_post_solve_storage_ before constructing/using
papilo::Postsolve and calling post_solver.undo: check if
papilo_post_solve_storage_ is null (similar to the existing status_to_skip
guard) and skip/return early when it is null to avoid dereferencing it in
papilo_post_solve_storage_->getNum() and in the undo call; ensure the guard is
placed before the lines that construct post_solver and call post_solver.undo so
undo() is never invoked with a null papilo_post_solve_storage_.
```c
cuopt_int_t test_deterministic_bb(const char* filename,
                                  cuopt_int_t num_runs,
                                  cuopt_int_t num_threads,
                                  cuopt_float_t time_limit,
                                  cuopt_float_t work_limit)
{
  cuOptOptimizationProblem problem = NULL;
  cuOptSolverSettings settings     = NULL;
  cuopt_float_t first_objective    = 0.0;
  cuopt_int_t first_status         = -1;
  cuopt_int_t status;
  cuopt_int_t run;

  printf("Testing deterministic B&B: %s with %d threads, %d runs\n", filename, num_threads, num_runs);

  status = cuOptReadProblem(filename, &problem);
  if (status != CUOPT_SUCCESS) {
    printf("Error reading problem: %d\n", status);
    goto DONE;
  }

  status = cuOptCreateSolverSettings(&settings);
  if (status != CUOPT_SUCCESS) {
    printf("Error creating solver settings: %d\n", status);
    goto DONE;
  }

  status = cuOptSetIntegerParameter(settings, CUOPT_MIP_DETERMINISM_MODE, CUOPT_MODE_DETERMINISTIC);
  if (status != CUOPT_SUCCESS) {
    printf("Error setting determinism mode: %d\n", status);
    goto DONE;
  }

  status = cuOptSetIntegerParameter(settings, CUOPT_NUM_CPU_THREADS, num_threads);
  if (status != CUOPT_SUCCESS) {
    printf("Error setting num threads: %d\n", status);
    goto DONE;
  }

  status = cuOptSetFloatParameter(settings, CUOPT_TIME_LIMIT, time_limit);
  if (status != CUOPT_SUCCESS) {
    printf("Error setting time limit: %d\n", status);
    goto DONE;
  }

  status = cuOptSetFloatParameter(settings, CUOPT_WORK_LIMIT, work_limit);
  if (status != CUOPT_SUCCESS) {
    printf("Error setting work limit: %d\n", status);
    goto DONE;
  }

  int seed = rand();
  printf("Seed: %d\n", seed);

  for (run = 0; run < num_runs; run++) {
    cuOptSolution solution = NULL;
    cuopt_float_t objective;
    cuopt_int_t termination_status;

    status = cuOptSetIntegerParameter(settings, CUOPT_RANDOM_SEED, seed);
    if (status != CUOPT_SUCCESS) {
      printf("Error setting seed: %d\n", status);
      goto DONE;
    }

    status = cuOptSolve(problem, settings, &solution);
    if (status != CUOPT_SUCCESS) {
      printf("Error solving problem on run %d: %d\n", run, status);
      cuOptDestroySolution(&solution);
      goto DONE;
    }

    status = cuOptGetObjectiveValue(solution, &objective);
    if (status != CUOPT_SUCCESS) {
      printf("Error getting objective value on run %d: %d\n", run, status);
      cuOptDestroySolution(&solution);
      goto DONE;
    }

    status = cuOptGetTerminationStatus(solution, &termination_status);
    if (status != CUOPT_SUCCESS) {
      printf("Error getting termination status on run %d: %d\n", run, status);
      cuOptDestroySolution(&solution);
      goto DONE;
    }

    printf("Run %d: status=%s (%d), objective=%f\n",
           run,
           termination_status_to_string(termination_status),
           termination_status,
           objective);

    if (run == 0) {
      first_objective = objective;
      first_status    = termination_status;
    } else {
      if (first_status != termination_status) {
        printf("Determinism failure: run %d termination status %d differs from run 0 status %d\n",
               run,
               termination_status,
               first_status);
        status = CUOPT_VALIDATION_ERROR;
        cuOptDestroySolution(&solution);
        goto DONE;
      }
      if (first_objective != objective) {
        printf("Determinism failure: run %d objective %f differs from run 0 objective %f\n",
               run,
               objective,
               first_objective);
        status = CUOPT_VALIDATION_ERROR;
        cuOptDestroySolution(&solution);
        goto DONE;
      }
    }
    cuOptDestroySolution(&solution);
  }
```
Ensure the determinism test also validates a “successful” solve.
Right now, consistent failure statuses (e.g., numerical error) would still pass the determinism check. Add a guard to fail early if the first run doesn’t produce a successful termination status.
🧪 Suggested enhancement

```diff
     if (run == 0) {
       first_objective = objective;
       first_status    = termination_status;
+      if (first_status == CUOPT_TERMINATION_STATUS_NUMERICAL_ERROR ||
+          first_status == CUOPT_TERMINATION_STATUS_INFEASIBLE ||
+          first_status == CUOPT_TERMINATION_STATUS_UNBOUNDED) {
+        printf("Determinism test requires a successful solve; got status %d\n", first_status);
+        status = CUOPT_VALIDATION_ERROR;
+        cuOptDestroySolution(&solution);
+        goto DONE;
+      }
     } else {
```

🤖 Prompt for AI Agents
In `@cpp/tests/linear_programming/c_api_tests/c_api_test.c` around lines 1461 -
1577, In test_deterministic_bb ensure the test fails early if the first run is
not a successful solve: after capturing first_status in test_deterministic_bb
(where first_status and first_objective are set), add a check that first_status
equals the solver's success termination constant (e.g.,
CUOPT_TERMINATION_STATUS_OPTIMAL) and if not set status =
CUOPT_VALIDATION_ERROR, print a clear message using
termination_status_to_string(first_status), destroy solution and goto DONE; this
guards against consistent but unsuccessful termination across runs and uses
symbols first_status, first_objective, termination_status_to_string,
CUOPT_VALIDATION_ERROR, and test_deterministic_bb to locate where to add the
check.
/ok to test 7c92978

Benchmark results are as follows (Baseline Run and Compared Run (determinism) results attached as images):

/ok to test 3cfc0c7

/ok to test ac0ca00
```cpp
  CUOPT_LOG_INFO("Objective offset %f scaling_factor %f",
                 problem.presolve_data.objective_offset,
                 problem.presolve_data.objective_scaling_factor);
  CUOPT_LOG_INFO("Model fingerprint: 0x%x", problem.get_fingerprint());
```
Nice! Printing a fingerprint is very useful for users and developers, for debugging model changes.
chris-maes
left a comment
Thanks for all the incredible hard work Alice. It's very exciting to see a deterministic multithreaded branch and bound. Next step: bringing in deterministic GPU heuristics.
Thanks as well for addressing my comments and explaining the concepts to me.
/ok to test 98b8ddd

/merge

Merged 76d154c into NVIDIA:release/26.02
This PR introduces a deterministic execution mode for the parallel branch-and-bound MIP solver.
When `determinism_mode = CUOPT_MODE_DETERMINISTIC`, the solver guarantees bitwise-identical results across runs regardless of thread-scheduling variations, as long as the runs are on the same platform with the same environment (same glibc, etc.).

The chosen approach is referred to in the literature as Bulk Synchronous Parallel (BSP): an execution model in which computation proceeds in discrete horizons of virtual time (work units) separated by synchronization barriers.
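The BSP structure can be sketched roughly as below. All names (`worker_t`, `run_horizon`) and the per-step work cost are illustrative assumptions; this is a sketch of the execution model, not the actual solver code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative BSP superstep: each worker advances its own deterministic
// "virtual clock" (work units) and buffers events locally; workers only
// exchange state at the barrier between horizons.
struct worker_t {
  double clock = 0.0;            // work units consumed so far
  std::vector<int> event_buffer; // events produced within the current horizon
};

// Run one horizon: every worker computes until its clock reaches 'horizon',
// then a barrier merges the buffers in fixed worker order. Because the merge
// depends only on worker index and buffered data, the result is independent
// of which OS thread finished first in wall-clock time.
inline std::vector<int> run_horizon(std::vector<worker_t>& workers, double horizon)
{
  for (std::size_t wid = 0; wid < workers.size(); ++wid) {
    worker_t& w = workers[wid];
    while (w.clock < horizon) {
      w.event_buffer.push_back(static_cast<int>(wid)); // placeholder "event"
      w.clock += 1.0;                                  // each step has a known work cost
    }
  }
  std::vector<int> merged;  // barrier: deterministic merge
  for (worker_t& w : workers) {
    merged.insert(merged.end(), w.event_buffer.begin(), w.event_buffer.end());
    w.event_buffer.clear();
  }
  return merged;
}
```

In a real implementation each worker would run on its own thread between barriers; the sketch runs them sequentially since the merged result is the same either way.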
To cope with the inherent nondeterminism of wall-clock time caused by factors such as cache state, OS scheduling, and CPU throttling, progress is measured not in time but in "work units": approximations of an algorithm's execution time derived from known, deterministic quantities such as memory operations, problem features, and sparsity.
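A work-unit counter of this kind might look like the following sketch. The counters mirror the `byte_loads`-style accounting discussed in the review, but the weights and helper names here are illustrative assumptions, not cuOpt's actual calibration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical work-unit model: estimate cost from deterministic counters
// (bytes moved, flops performed) instead of wall-clock time.
struct work_counter_t {
  std::size_t byte_loads  = 0;
  std::size_t byte_stores = 0;
  std::size_t flops       = 0;

  // Convert counters to work units with fixed weights: identical inputs
  // always yield identical work, regardless of caching or CPU frequency.
  double work_units() const
  {
    constexpr double load_w = 1.0, store_w = 1.0, flop_w = 0.5;  // assumed weights
    return load_w * static_cast<double>(byte_loads) +
           store_w * static_cast<double>(byte_stores) +
           flop_w * static_cast<double>(flops);
  }
};

// Example: account for one sparse triangular-solve column of nnz entries.
inline void account_column(work_counter_t& c,
                           std::size_t nnz,
                           std::size_t idx_size,
                           std::size_t val_size)
{
  c.byte_loads += nnz * (idx_size + val_size);  // read index + value arrays
  c.byte_stores += nnz * val_size;              // scatter into the solution
  c.flops += 2 * nnz;                           // multiply + add per entry
}
```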
Workers explore the tree independently and locally within a horizon, and collect events (branching decisions, solutions, pseudo-cost updates) during the process without exchanging them yet.
At each horizon boundary, events are sorted by work-unit timestamp with tie-breaking, history is replayed in deterministic order to update global state, and if need be nodes are redistributed across workers to balance load (which replaces the ramp-up mechanism).
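The deterministic replay at a horizon boundary can be illustrated as below: a sketch assuming a (work-unit time, worker id, sequence) total order. The field names are hypothetical, though the per-worker sequence counter echoes `event_sequence` from `bb_event.hpp`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical B&B event: ordered first by work-unit timestamp, then by
// (worker_id, sequence) so that ties break identically on every run.
struct bb_event_t {
  double work_unit_time;
  int worker_id;
  std::uint64_t sequence;  // per-worker monotone counter
  double value;            // payload, e.g. a pseudo-cost sample
};

inline bool event_before(const bb_event_t& a, const bb_event_t& b)
{
  if (a.work_unit_time != b.work_unit_time) return a.work_unit_time < b.work_unit_time;
  if (a.worker_id != b.worker_id) return a.worker_id < b.worker_id;
  return a.sequence < b.sequence;
}

// At a horizon boundary, gather all per-worker buffers, sort with the total
// order above, and replay: global-state updates happen in the same order
// no matter which thread finished first in real time.
inline void replay_events(std::vector<bb_event_t>& events, double& global_state)
{
  std::sort(events.begin(), events.end(), event_before);
  for (const bb_event_t& e : events)
    global_state = 0.5 * (global_state + e.value);  // placeholder, order-sensitive update
}
```

The placeholder update is deliberately order-sensitive: it stands in for any accumulation (pseudo-cost averaging, bound tightening) whose floating-point result depends on operand order.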
Workers operate on local snapshots of shared state (pseudo-costs, upper bound, LP iteration count) taken at horizon start to avoid read races. This trades some accuracy in decisions (pruning may rely on a higher lower bound than in nondeterministic mode) for determinism.
Support for start CPU heuristics is present, based on a "producer" model: CPUFJ starts immediately and begins producing solutions with work-unit timestamps, which are consumed by the B&B thread once the appropriate virtual time is reached. A form of one-way synchronization is implemented to prevent the CPUFJ thread from "falling behind" and producing solutions in the past from the perspective of the B&B thread. CPUFJ is allowed to run ahead, and is biased to run faster than B&B to avoid unnecessary syncs.
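The timestamp gating on the consumer side can be sketched like this (hypothetical names; the producer-side throttling that keeps CPUFJ ahead is omitted):

```cpp
#include <cassert>
#include <deque>

// Hypothetical producer model: the heuristic thread stamps each solution
// with its own work-unit clock; the B&B thread only consumes solutions
// whose timestamp its own virtual time has reached, so a fast or slow
// heuristic thread cannot change *when* a solution takes effect.
struct stamped_solution_t {
  double work_unit_time;
  double objective;
};

// Drain solutions whose timestamp is <= the B&B thread's current virtual
// time, improving the incumbent in timestamp order. Future-stamped
// solutions stay queued; the heuristic is allowed to run ahead.
inline int consume_ready(std::deque<stamped_solution_t>& queue, double bb_time, double& incumbent)
{
  int consumed = 0;
  while (!queue.empty() && queue.front().work_unit_time <= bb_time) {
    if (queue.front().objective < incumbent) incumbent = queue.front().objective;
    queue.pop_front();
    ++consumed;
  }
  return consumed;
}
```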
B&B BFS workers, diving workers, and a start CPUFJ thread are supported in deterministic mode. GPU Heuristics are disabled in this mode and will be incorporated into the deterministic framework in the future.
Benchmark results
B&B alone, 10min
Regression testing against the main branch:
With changes: 228 feasible, 11.9% primal gap
Baseline: 224 feasible, 13.2% primal gap (difference likely due to nondeterminism/noise)
Summary by CodeRabbit
New Features
Infrastructure
Tests