Updated CRV coincidence cluster finder#1772

Merged
oksuzian merged 10 commits into Mu2e:main from ehrlich-uva:CrvCalibration
Mar 29, 2026

Conversation

@ehrlich-uva
Contributor

@ehrlich-uva ehrlich-uva commented Mar 19, 2026

Update cluster finder algorithm.

@FNALbuild
Collaborator

Hi @ehrlich-uva,
You have proposed changes to files in these packages:

  • CRVReco
  • RecoDataProducts

which require these tests: build.

@Mu2e/write, @Mu2e/fnalbuild-users have access to CI actions on main.

⌛ The following tests have been triggered for 98c3b17: build (Build queue - API unavailable)

About FNALbuild. Code review on Mu2e/Offline.

@FNALbuild
Collaborator

☀️ The build tests passed at 98c3b17.

| Test | Result | Details |
|------|--------|---------|
| test with | Command did not list any other PRs to include | |
| merge | Merged 98c3b17 at c570352 | |
| build (prof) | Log file. | Build time: 04 min 05 sec |
| ceSimReco | Log file. | |
| g4test_03MT | Log file. | |
| transportOnly | Log file. | |
| POT | Log file. | |
| g4study | Log file. | |
| cosmicSimReco | Log file. | |
| cosmicOffSpill | Log file. | |
| ceSteps | Log file. | |
| ceDigi | Log file. | |
| muDauSteps | Log file. | |
| ceMix | Log file. | |
| rootOverlaps | Log file. | |
| g4surfaceCheck | Log file. | |
| trigger | Log file. | Return Code 1. |
| check_cmake | Log file. | |
| FIXME, TODO | TODO (0) FIXME (0) in 3 files | |
| clang-tidy | ➡️ 2 errors 76 warnings | |
| whitespace check | no whitespace errors found | |

N.B. These results were obtained from a build of this Pull Request at 98c3b17 after being merged into the base branch at c570352.

For more information, please check the job page here.
Build artifacts are deleted after 5 days. If this is not desired, select Keep this build forever on the job page.

@oksuzian
Collaborator

I'll review this pull request by examining its details and the code changes. Here is my detailed review of PR #1772. This is a significant refactoring of the CRV coincidence cluster finder that changes the algorithm for pairing hits from both readout sides of CRV counters. Below are the issues I found, organized by severity.


🔴 Bugs / Potential Correctness Issues

1. Time averaging in clusterProperties uses only the first/last map entry per readout side

In CRVReco/src/CrvCoincidenceFinder_module.cc, the new code computes PE-weighted average times like this:

double hitTime0=hitTimes[0].begin()->second/hitPEs[0].begin()->second;
double hitTime1=hitTimes[1].rbegin()->second/hitPEs[1].rbegin()->second;

This divides only the ->second values, which is correct in terms of C++ operator precedence. However, the hitPEs and hitTimes maps are keyed by float (longitudinal position), and entries are accumulated via +=. The code picks begin() for side 0 and rbegin() for side 1 to find "the longitudinal positions farthest away from each other." But if multiple counters contribute to the same readout side at different longitudinal positions, only the first/last map entry is used — the PE-weighted time from other entries is silently ignored. The comment even acknowledges this ("the situation is actually overconstraint, we ignore the other positions"). This could yield incorrect avgHitTime and avgHitPos for clusters spanning multiple counters on the same side.

2. filterHits now only compares hits from the same readout side when computing PE sums for adjacent counters

In the old code, filterHits iterated over all hits (both SiPM sides) to accumulate PEs from adjacent counters. In the new code, filterHits is called separately per readout side, and iterHitAdjacent iterates over the same hitsFiltered list (which is side-specific). This means the PE sum for the PE threshold check now only includes hits from one readout side. If the threshold logic previously relied on combining PEs from both sides of the same counter, this could reject hits that should pass.

3. Using float as std::map key for hitPEs and hitTimes

std::array<std::map<float,double>, CRVId::nSidesPerBar> hitPEs;
std::array<std::map<float,double>, CRVId::nSidesPerBar> hitTimes;

Using float as a map key is fragile because floating-point equality comparisons can fail due to precision. Two hits that should logically be at the same longitudinal position (same counter, same side) could end up in different map entries if there are tiny floating-point rounding differences in the computation longitudinalPos = counterPos[lengthDirection] ± halfLength. This would corrupt the PE-weighted time averaging.

4. Potential dangling reference: CrvCoincidence.hh still referenced in CalPatRec/inc/HlPrint.hh

The PR deletes RecoDataProducts/inc/CrvCoincidence.hh and removes it from classes.h and classes_def.xml. However, CalPatRec/inc/HlPrint.hh still has a forward declaration:

  class CrvCoincidence;

While a forward declaration alone won't cause a build error (unless it's used somewhere), this is dead code that references a now-deleted class. If any downstream code tries to #include and use CrvCoincidence, it will fail. A search shows this reference may appear in other files too.


🟡 Logic / Design Concerns

5. One-side-only fallback can overwrite with the wrong value

In the single-readout-side fallback:

if(!hitTimes[0].empty())
{
  // ... computes avgHitTime from side 0
}
if(!hitTimes[1].empty())
{
  // ... computes avgHitTime from side 1 — overwrites the side-0 result!
}

Both if blocks are independent (not else if). If somehow both sides have hits but the earlier two-side check failed (because hitPosOrigin was outside the counter), execution falls through to the else branch (one-side fallback), where both if blocks execute. The second if silently overwrites avgHitTime computed from side 0. This should likely be else if to use only one side, or the logic should be restructured — though in practice the else branch should only be reached when one side is empty.

6. clusterMaxTimeDifference recomputed by scanning _sectorMap every iteration

Inside the per-sector-type loop, clusterMaxTimeDifference is computed by scanning all entries of _sectorMap:

float clusterMaxTimeDifference=0;
for(auto const &sectorMapEntry : _sectorMap)
{
  if(sectorMapEntry.second.sectorType==crvSectorType && sectorMapEntry.second.maxTimeDifference>clusterMaxTimeDifference)
    clusterMaxTimeDifference=sectorMapEntry.second.maxTimeDifference;
}

This is repeated for maxHalfLength and readout-side detection as well, giving O(number of sector types × sector-map entries) work per event. It would be cleaner to precompute these values once per sector type.


🟢 Minor / Style Issues

7. Typos in comments

  • "diving" should be "dividing" (appears multiple times):
    //find avg PE-weighted time by diving by total PEs
    
  • "orign" should be "origin":
    //calculate the time of the hit orign from both sides
    
  • "overconstraint" should be "overconstrained":
    //if we have more than one position at a readout side, the situation is actually overconstraint
    

8. Mixed float and double types

The code inconsistently uses float for some variables (clusterMaxTimeDifference, maxHalfLength, timeDifferenceOppositeSides) and double for others in the same computation chain. While not a bug, this could lead to subtle precision losses and makes the code harder to reason about.

9. readoutSide parameter type mismatch

filterHits takes unsigned int readoutSide, but it's compared against SiPM%CRVId::nSidesPerBar which is int % int = int. The implicit signed-to-unsigned comparison is safe here but could be made explicit.


Summary

| Severity | Issue | Location |
|----------|-------|----------|
| 🔴 Bug | float map keys can split same-counter hits | clusterProperties |
| 🔴 Bug | filterHits PE threshold now only checks one side | filterHits |
| 🔴 Bug | Deleted CrvCoincidence.hh still forward-declared | HlPrint.hh |
| 🟡 Logic | One-side fallback can silently overwrite avgHitTime | clusterProperties |
| 🟡 Logic | Multi-counter clusters use only first/last map entry | clusterProperties |
| 🟡 Perf | Repeated _sectorMap scans per sector type | produce() |
| 🟢 Style | Typos in comments ("diving", "orign", "overconstraint") | Throughout |
| 🟢 Style | Inconsistent float/double mixing | Throughout |

@ehrlich-uva
Contributor Author

Responses:

  1. This is the desired result.
  2. This is the desired result.
  3. Fixed.
  4. Will be done at a later time.
  5. This situation (hits at both counter sides in the else branch) won't happen.
  6. Done.
  7. Fixed.
  8. Will be addressed at a later time.
  9. No change required.

@ehrlich-uva ehrlich-uva marked this pull request as ready for review March 21, 2026 18:37
@oksuzian
Collaborator

I'll review this pull request by examining its details and the code changes. Here is my review of PR #1772 — "Updated CRV coincidence cluster finder" by @ehrlich-uva. This is a substantial refactor (+174 / −425 across 9 files) that changes the algorithm for pairing hits from both readout sides of CRV counters. I've identified several issues and concerns:


🔴 Potential Bugs

1. Per-side time averaging in clusterProperties uses only the extreme map entries

double hitTime0=hitTimes[0].begin()->second/hitPEs[0].begin()->second;
double hitTime1=hitTimes[1].rbegin()->second/hitPEs[1].rbegin()->second;

This divides hitTimes[side].begin()->second by hitPEs[side].begin()->second which computes a per-position average. However, hitPEs and hitTimes are keyed by longitudinalPos (um), so begin() and rbegin() give the entry at the smallest and largest longitudinal coordinate respectively. This only works correctly when there is exactly one entry per side in the map. If multiple counters contribute hits to the same readout side at different longitudinal positions, the code silently ignores all but the extremes. The comment says "we ignore the other positions" acknowledging this, but there is no flag or warning emitted to downstream consumers.

2. Sign convention issue for longitudinalPos offset

int side=hit->_SiPM%CRVId::nSidesPerBar;
if(side==0) longitudinalPos-=halfLength;
else longitudinalPos+=halfLength;

This converts the counter center position to the SiPM position by subtracting/adding halfLength. The assumption is that side 0 SiPMs sit at counterCenter - halfLength and side 1 at counterCenter + halfLength. If this convention isn't universally true for all CRV sector geometries (e.g., if some sectors have flipped orientation), this would silently produce wrong hit origin calculations. This should be verified against the geometry definitions.

3. hitPosOrigin boundary check may be inverted for certain geometries

if(hitPosOrigin>hitPos0 && hitPosOrigin<hitPos1)

This assumes hitPos0 < hitPos1 (i.e., side 0 is at a smaller coordinate than side 1). Given the offset logic above (side==0 → subtract, side==1 → add) and a positive halfLength, that ordering actually holds for any counter center coordinate, positive or negative. It can only fail if the side convention from issue 2 is flipped for some sector geometry, in which case the check would always fail and fall into the fallback branch. A symmetric check is robust either way:

if(hitPosOrigin > std::min(hitPos0,hitPos1) && hitPosOrigin < std::max(hitPos0,hitPos1))

4. One-sided hit time calculation can be overwritten

if(!hitTimes[0].empty())
{
  // ... sets avgHitTime from side 0
}
if(!hitTimes[1].empty())
{
  // ... overwrites avgHitTime from side 1
}

In the "only one readout side has hits" branch, both if blocks use non-exclusive if (not else if). If by some edge case both sides have entries (which shouldn't happen since we're in the else branch of !hitTimes[0].empty() && !hitTimes[1].empty()), the side-1 result would silently overwrite side-0. This is logically safe due to the outer condition, but using else if would make the intent clearer and prevent future regressions.


🟡 Design Concerns

5. _sectorTypeMap populated via default-initialized struct comparison

if(_sectorTypeMap[s.sectorType].maxTimeDifference<s.maxTimeDifference) _sectorTypeMap[s.sectorType].maxTimeDifference=s.maxTimeDifference;
if(_sectorTypeMap[s.sectorType].maxHalfLength<s.counterHalfLength) _sectorTypeMap[s.sectorType].maxHalfLength=s.counterHalfLength;

Using operator[] on the map will default-construct a sectorTypeProperties if the key doesn't exist (with maxTimeDifference=0, maxHalfLength=0). This works correctly since the initial zero values will always be less than real values, but it's fragile — if someone later changes the defaults or if negative values are ever valid, this breaks. Consider using find() + explicit initialization or try_emplace.

6. Floating-point keys via int truncation — potential precision issue

int halfLength=static_cast<int>(std::round(_sectorMap.at(crvSector).counterHalfLength*1000.0));
int longitudinalPos=static_cast<int>(std::round(counterPos[lengthDirection]*1000.0));

Using micrometer-precision int keys is clever to avoid floating-point map key issues. However, if two hits at the same readout side have positions that differ by less than about half a micrometer, they round to the same int key after the halfLength offset and are combined into the same map entry. That is the intended behavior for same-counter hits, and genuinely distinct counters will never sit that close together, but it's worth a comment or assertion.

7. Deletion of CrvCoincidence class — breaking change for downstream consumers

The file RecoDataProducts/inc/CrvCoincidence.hh is entirely removed, along with its dictionary entries. Any downstream code or analysis module that reads CrvCoincidenceCollection from ROOT files will break. Ensure no other modules in Mu2e/Offline or external packages depend on this class. A search for usages would be prudent.


🟢 Minor / Style Issues

8. Mixed float / double types

float clusterMaxTimeDifference=_sectorTypeMap.at(crvSectorType).maxTimeDifference;
float maxHalfLength=_sectorTypeMap.at(crvSectorType).maxHalfLength;
float timeDifferenceOppositeSides = 2.0*maxHalfLength/_fiberSignalSpeed;

_fiberSignalSpeed is double, maxTimeDifference is stored as double in the struct, but local variables are float. This introduces unnecessary precision loss. Use double consistently.

9. Blank line left after removing #cluster settings block

In prolog_v11.fcl and prolog_v12.fcl, the removal of the cluster settings lines leaves an empty line between crvRecoPulsesModuleLabel and #sector-specific coincidence settings. Minor cosmetic issue.

10. filterHits has O(n²) complexity

The filterHits method iterates over all hits for each hit (O(n²)). With the addition of a readout-side filter, hits from the wrong side are still iterated over in the inner loop (just skipped). Pre-partitioning hits by side before calling filterHits (which is already done at the caller level) means the inner loop in filterHits could also skip the redundant side check if the input list is already pre-filtered. Currently the side check is applied to the outer hit but the inner loop iterates the original unfiltered vector.


Summary

| Severity | Issue | Location |
|----------|-------|----------|
| 🔴 Bug | Boundary check hitPosOrigin>hitPos0 && hitPosOrigin<hitPos1 assumes coordinate ordering | CrvCoincidenceFinder_module.cc, clusterProperties |
| 🔴 Bug | Sign convention for SiPM position offset needs geometry verification | CrvCoincidenceFinder_module.cc, clusterProperties |
| 🟡 Design | Silent fallback when hit origin is outside counter — no flag/warning | CrvCoincidenceFinder_module.cc, clusterProperties |
| 🟡 Design | Removed CrvCoincidence class — verify no downstream dependencies | RecoDataProducts/ |
| 🟡 Design | float vs double mixed precision | CrvCoincidenceFinder_module.cc |
| 🟢 Minor | O(n²) inner loop in filterHits iterates wrong-side hits | CrvCoincidenceFinder_module.cc |

The core algorithm change (separating readout sides for initial clustering, then recombining for final clustering using fiber signal travel time) is sound in concept, but the side-convention assumption behind the hitPosOrigin boundary check (issues #2 and #3) is the most likely source of real bugs.

@ehrlich-uva
Contributor Author

Responses:

  1. I added a flag in the data product.
  2. This sign convention is used for the entire code.
  3. I added an additional check.
  4. This situation (hits at both counter sides in the else branch) won't happen.
  5. Ok.
  6. Ok.
  7. CrvCoincidence has not been used in a long time.
  8. I changed the floats and doubles.
  9. Ok.
  10. Ok.

@ehrlich-uva ehrlich-uva marked this pull request as draft March 23, 2026 23:16
@oksuzian
Collaborator

I'll review the pull request by examining the PR details and the changed files on the head branch. The key file has been renamed from CrvCoincidenceClusterFinder_module.cc to CrvCoincidenceFinder_module.cc.

## Review of PR #1772: Updated CRV Coincidence Cluster Finder

PR Summary: This is a draft PR by @ehrlich-uva with 8 commits, 11 changed files (+223/−433 lines). It renames the old two-step CrvCoincidence + CrvCoincidenceClusterFinder pipeline into a single consolidated CrvCoincidenceFinder module, and changes the algorithm that pairs hits from both readout sides of CRV counters.


Issues Found

1. 🔴 Potential Division-by-Zero in Slope Calculation

double slope=(PEs*sumXY-sumX*sumY)/(PEs*sumYY-sumY*sumY);

If all hits in a cluster are at the same _y (thickness direction) coordinate, the denominator PEs*sumYY - sumY*sumY becomes zero. While the assertion on line 455 (assert(layerSet.size()>1)) suggests multiple layers, floating-point precision with very close _y values could still lead to a near-zero denominator producing ±inf or NaN slopes.

2. 🔴 Division-by-Zero in checkCombination Slope Computation

slopes[d]=(layerIterators[d+1]->_x-layerIterators[d]->_x)/(layerIterators[d+1]->_y-layerIterators[d]->_y);

If two hits from different layers happen to be at the same _y position (e.g., due to geometry edge cases or rounding), this produces a division by zero. There is no guard against this.

3. 🟡 assert(layerSet.size()>1) in Production Code

assert(PEs>0);
assert(layerSet.size()>1);

Using assert() in production/framework code is risky because assertions are typically disabled in optimized builds (-DNDEBUG). If the invariant is ever violated, the code would silently produce garbage results instead of failing. Consider using throw or a proper error-handling mechanism.

4. 🟡 SiPM-2<0 Comparison with Signed/Unsigned Types

size_t testChannel = currentChannel + (SiPM-2<0 ? 2 : -2);

currentChannel is size_t (unsigned). When SiPM-2 >= 0, the expression adds -2 to an unsigned value. This works due to unsigned wraparound arithmetic, but it's fragile and compiler-dependent in terms of warnings. The (SiPM-2<0 ? 2 : -2) itself is fine since SiPM is int, but the addition of the signed result (-2) to size_t is a signed/unsigned mismatch.

5. 🟡 Comparison of std::vector::size() with Signed _bigClusterThreshold

if(hits.size()>_bigClusterThreshold)

_bigClusterThreshold is declared as size_t (line 79) but the config parameter bigClusterThreshold is an int (line 58). If a negative value is accidentally provided in the FCL, it would be implicitly cast to a very large size_t, effectively disabling the big-cluster bypass entirely.

6. 🟡 FCL Config Mismatch — Removed Parameters Still Referenced in Older Prolog Files

The new CrvCoincidenceFinder module no longer accepts crvCoincidenceCheckModuleLabel, maxDistance, maxTimeDifference, deadTimeWindowStartMargin, deadTimeWindowEndMargin, timeWindowStart, or timeWindowEnd (which were part of the old CrvCoincidenceClusterFinder config). However, older prolog files (prolog_v03.fcl, prolog_v05.fcl, prolog_v06.fcl, prolog_v08.fcl) still define CrvCoincidenceClusterFinder with the old parameter set including module_type: CrvCoincidenceClusterFinder. If these prolog files are used, the old module type will fail at runtime since CrvCoincidenceClusterFinder no longer exists as a registered module.

7. 🟡 Missing minSlope in SectorConfig for Older FCL Epilog Files

The new SectorConfig struct requires a minSlope parameter (line 42), but the epilog files like epilog_extracted_v02.fcl and epilog_run1a_v01.fcl don't define minSlope in their sector configurations. This will cause FCL validation errors if those epilog files are used with the new module.

8. 🟢 Typo in Comment

//used for the initial clustering process (to keep the number of
//hit combinations down that need to be checked for coincidendes).

Minor: "coincidendes" → "coincidences" (line 140).

9. 🟢 Comment says "missinge"

fhicl::Sequence<int> compensateChannelStatus{Name("compensateChannelStatus"), Comment("compensate missinge pulses for channels with the following channel statuses")};

Minor: "missinge" → "missing" (line 62).


Summary

| Severity | Count | Description |
|----------|-------|-------------|
| 🔴 High | 2 | Potential division-by-zero in slope calculations |
| 🟡 Medium | 4 | Assert in production code, signed/unsigned mismatches, FCL compatibility issues |
| 🟢 Low | 2 | Typos in comments |

The most important items to address before merging are the division-by-zero risks in the slope calculations and ensuring FCL backward compatibility with older prolog/epilog files that still reference the removed module types and parameters.

@ehrlich-uva
Contributor Author

  1. No change needed.
    This cannot happen, since we always require more than one layer. So all hits at the same y cannot happen. That's why the slope is defined in this way.
  2. No change needed.
    This cannot happen either, since two hits from different layers will always have a different y.
  3. They are more a debug check for something that actually can't happen. Still changed these two asserts to throws.
  4. No change needed.
The expression subtracts 2 only if SiPM>=2. If SiPM>=2, then currentChannel is also >=2 due to the calculation in the previous line, so the result can never be less than 0 even if it were a signed integer. Also, testChannel is passed to a function in the next line whose argument is an unsigned int.
  5. Changed, but wouldn't have caused a problem in reality.
  6. No change needed.
    These old prologs won't be used with the current code.
  7. No change needed.
    These old epilogs won't be used with the current code. Also, epilog_run1a_v01.fcl has the minSlope parameter.
  8. Changed typo.
  9. Changed typo.

@ehrlich-uva ehrlich-uva marked this pull request as ready for review March 26, 2026 04:42
@oksuzian
Collaborator

@FNALbuild run build test

@FNALbuild
Collaborator

⌛ The following tests have been triggered for f7edd99: build (Build queue - API unavailable)

@FNALbuild
Collaborator

☀️ The build tests passed at f7edd99.

| Test | Result | Details |
|------|--------|---------|
| test with | Command did not list any other PRs to include | |
| merge | Merged f7edd99 at 5b02c13 | |
| build (prof) | Log file. | Build time: 04 min 07 sec |
| ceSimReco | Log file. | |
| g4test_03MT | Log file. | |
| transportOnly | Log file. | |
| POT | Log file. | |
| g4study | Log file. | |
| cosmicSimReco | Log file. | |
| cosmicOffSpill | Log file. | |
| ceSteps | Log file. | |
| ceDigi | Log file. | |
| muDauSteps | Log file. | |
| ceMix | Log file. | |
| rootOverlaps | Log file. | |
| g4surfaceCheck | Log file. | |
| trigger | Log file. | Return Code 1. |
| check_cmake | Log file. | |
| FIXME, TODO | ➡️ TODO (3) FIXME (0) in 7 files | |
| clang-tidy | ➡️ 6 errors 288 warnings | |
| whitespace check | no whitespace errors found | |

N.B. These results were obtained from a build of this Pull Request at f7edd99 after being merged into the base branch at 5b02c13.

For more information, please check the job page here.
Build artifacts are deleted after 5 days. If this is not desired, select Keep this build forever on the job page.

@oksuzian oksuzian merged commit b4e91ff into Mu2e:main Mar 29, 2026
14 checks passed