KonradBreitsprecherBkd commented Jan 29, 2026

For a specific setup, the PubSub communication is failing. Minimal conditions to reproduce the bug are:

  • Two publishers, one subscriber (all running in autonomous, async mode)
  • The publishers carry the optional label L1
  • The single subscriber carries the optional label L1 and the optional label L2
  • The publishers are started first, then the subscriber is started
// Start the two publishers first
.\SilKitDemoPublisher.exe -Aa -n Pub1 -f
.\SilKitDemoPublisher.exe -Aa -n Pub2 -f

// Then start the subscriber
.\SilKitDemoSubscriber.exe -aA

// Bug: No reception on the subscriber

The bug can be traced back to the SpecificDiscoveryStore. This is the lookup algorithm that is supposed to improve the matching of publishers and subscribers with labels.
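
To make the expectation explicit, here is a minimal, self-contained sketch of why the setup above should result in a match. It assumes the usual label semantics: an optional label matches if the key is absent on the counterpart or present with the same value, and a mandatory label requires the same key and value on the counterpart. `Label`, `LabelSet`, `LabelsMatch`, `ReproSetupMatches` and the values `V1`/`V2` are hypothetical names for illustration only; this is not the SIL Kit API and not the code of the SpecificDiscoveryStore.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified model of labels (not the SIL Kit types).
enum class Kind { Optional, Mandatory };

struct Label
{
    std::string value;
    Kind kind;
};

using LabelSet = std::map<std::string, Label>; // label key -> label

// Assumed rule: every label in `own` must be acceptable to `other`.
static bool OneWayMatch(const LabelSet& own, const LabelSet& other)
{
    for (const auto& entry : own)
    {
        const auto it = other.find(entry.first);
        if (it == other.end())
        {
            // Key missing on the counterpart: fine for optional labels only.
            if (entry.second.kind == Kind::Mandatory)
                return false;
            continue;
        }
        if (it->second.value != entry.second.value)
            return false; // same key, different value
    }
    return true;
}

// Both sides have to accept each other's labels.
bool LabelsMatch(const LabelSet& a, const LabelSet& b)
{
    return OneWayMatch(a, b) && OneWayMatch(b, a);
}

// The setup from the description: publisher {L1}, subscriber {L1, L2}, all optional.
bool ReproSetupMatches()
{
    LabelSet publisher;
    publisher["L1"] = Label{"V1", Kind::Optional};

    LabelSet subscriber;
    subscriber["L1"] = Label{"V1", Kind::Optional};
    subscriber["L2"] = Label{"V2", Kind::Optional};

    return LabelsMatch(publisher, subscriber);
}
```

Under these assumptions `ReproSetupMatches()` returns true, because L2 is optional and simply absent on the publisher side. So the subscriber would be expected to receive data from both publishers.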

MariusBgm self-assigned this Feb 9, 2026
Signed-off-by: Marius Börschig <[email protected]>
Signed-off-by: Marius Börschig <[email protected]>
Signed-off-by: Marius Börschig <[email protected]>
MariusBgm force-pushed the bug/pubsub_labelgroups branch from a0df15e to e4edc7f February 10, 2026 10:17
fix data races

Signed-off-by: Marius Börschig <[email protected]>
reorder allReceived atomic bool

Signed-off-by: Marius Börschig <[email protected]>
auto& not_label_nodes = keyNode.notLabelMap[l.key].nodes;

size_t relevantNodeCount = fit_nodes.size() + not_label_nodes.size();
if (relevantNodeCount < matchCount)
Contributor

So this just matches the last label to have any nodes in the cluster instead of the one with the fewest nodes?

KonradBreitsprecher Feb 12, 2026

I think we need both the matchCount condition and > 0. The bug was that the outgreedylabel found here in the specific label setup (reproduced by the test) did not contain any handlers (of the subscriber), and thus the follow-up logic to finish the pubsub connection never happened.

Maybe the dict here should never have been populated to get into this situation, but at least the > 0 check prevents the bug.

Also, for "symmetry reasons" there might be the same situation for mandatory labels a few lines above.

To stir it up a little more, we might need at least one person who proudly says "I understand what's happening here"; otherwise we have a black box algorithm in a central unit that was "introduced for performance reasons by a former colleague". If there is no xkcd, we should make one...
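
As a rough illustration of the combined condition discussed here (smallest candidate set, but never an empty one), the selection could look like the sketch below. It is a simplified stand-in under assumptions: `LabelCandidates` and `PickGreedyLabel` are hypothetical names, and the two counters only mirror `fit_nodes.size()` and `not_label_nodes.size()` from the snippet under review. It is not the actual SpecificDiscoveryStore code.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical per-label summary of the candidate nodes.
struct LabelCandidates
{
    std::string key;
    std::size_t fitNodes;      // nodes carrying the same key and value
    std::size_t notLabelNodes; // nodes that do not carry the key at all
};

// Returns the index of the label with the smallest non-empty candidate set,
// or -1 if every label has an empty candidate set.
int PickGreedyLabel(const std::vector<LabelCandidates>& labels)
{
    std::size_t matchCount = std::numeric_limits<std::size_t>::max();
    int best = -1;

    for (std::size_t i = 0; i < labels.size(); ++i)
    {
        const std::size_t relevantNodeCount = labels[i].fitNodes + labels[i].notLabelNodes;

        // Both conditions together: skip empty sets, keep the minimum.
        if (relevantNodeCount > 0 && relevantNodeCount < matchCount)
        {
            matchCount = relevantNodeCount;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

With only `relevantNodeCount < matchCount`, a label with zero candidates would win the minimum, which is the situation described above. With only `> 0`, the last non-empty label would win regardless of size. Whether an empty set should ever reach this point is a separate question that the sketch does not answer.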

Collaborator

> I think we need both the matchCount condition and > 0. The bug was that the outgreedylabel found here in the specific label setup (reproduced by the test) did not contain any handlers (of the subscriber), and thus the follow-up logic to finish the pubsub connection never happened.
>
> Maybe the dict here should never have been populated to get into this situation, but at least the > 0 check prevents the bug.
>
> Also, for "symmetry reasons" there might be the same situation for mandatory labels a few lines above.
>
> To stir it up a little more, we might need at least one person who proudly says "I understand what's happening here"; otherwise we have a black box algorithm in a central unit that was "introduced for performance reasons by a former colleague". If there is no xkcd, we should make one...

I fully agree. I think I'll try to make a clean-room implementation of the label matching code next week.
This PR is not a showstopper for the upcoming 5.0.3 release, though.

MariusBgm added the bug (Something isn't working) label Feb 12, 2026