-
Notifications
You must be signed in to change notification settings - Fork 44
Bug in SpecificDiscoveryStore breaks PubSub communication #294
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Konrad Breitsprecher <[email protected]>
Signed-off-by: Konrad Breitsprecher <[email protected]>
Signed-off-by: Marius Börschig <[email protected]>
Signed-off-by: Marius Börschig <[email protected]>
Signed-off-by: Marius Börschig <[email protected]>
a0df15e to
e4edc7f
Compare
fix data races Signed-off-by: Marius Börschig <[email protected]>
reorder allReceived atomic bool Signed-off-by: Marius Börschig <[email protected]>
| auto& not_label_nodes = keyNode.notLabelMap[l.key].nodes; | ||
|
|
||
| size_t relevantNodeCount = fit_nodes.size() + not_label_nodes.size(); | ||
| if (relevantNodeCount < matchCount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this just matches the last label to have any nodes in the cluster instead of the one with the least nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need the matchCount condition and > 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need the matchCount condition and > 0. The bug was that the outgreedylabel that was found here in the specific label setup (reproduced by the test) did not contain any handers (of the subscriber) and thus the follow up logic to finish the pubsub connection never happened.
Maybe the dict here should never have been populated to get into this situation, at least the >0 prevents the bug.
Also, for "symmetry reasons" there might be the same situation for mandatory Labels a few lines above.
To stir it up a little more, we might need at least one person who proudly says "I understand what's happening here" otherwise we have a black box algorithm in a central unit that was "introduced for performance reasons by a former colleague". If there is no xkcd we should make one...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need the matchCount condition and > 0. The bug was that the outgreedylabel that was found here in the specific label setup (reproduced by the test) did not contain any handers (of the subscriber) and thus the follow up logic to finish the pubsub connection never happened.
Maybe the dict here should never have been populated to get into this situation, at least the >0 prevents the bug.
Also, for "symmetry reasons" there might be the same situation for mandatory Labels a few lines above.
To stir it up a little more, we might need at least one person who proudly says "I understand what's happening here" otherwise we have a black box algorithm in a central unit that was "introduced for performance reasons by a former colleague". If there is no xkcd we should make one...
I fully agree - I think i'll try to make a clean room implementation of the label matching code next week.
This PR is no show stopper for the upcoming 5.0.3 release, though.
For a specific setup, the PubSub communication is failing. Minimal conditions to reproduce the bug are:
L1L1and the optional labelL2The bug can be traced back to the
SpecificDiscoveryStore. This is the lookup algorithm that is supposed to improve the matching of publishers and subscribers with labels.