Conversation
The purpose of this change is to remove a special-case for Sourcegraph.com. If a user's query has a repo filter like `repo:github.com/foo`, we rewrite it to `repo:^github.com/foo` (note the `^`). This makes the DB query faster because it only needs to search at the beginning of the string. This optimization only happens on Sourcegraph.com because technically someone could have a repo at (say) `https://github.com/owner/foo.github.com` or `https://myothercodehost.example.com/foo.github.com/bar`, and they could query `repo:github.com` intending to match one of those. We wanted to avoid that behavior on customer instances. Those are rare edge cases, and the user could work around this rewriting with `repo:.*github.com` if truly needed. We now make this rewriting only occur if there is a trailing slash, which mitigates the (anyway likely zero) impact. Making this rewriting behavior consistent removes a dotcom edge case and could improve repository search performance for common queries with a tiny hypothetical impact on the behavior. Note that GitHub Pages repositories are not affected here because those are named `foo.github.io` not `foo.github.com` (https://docs.github.com/en/pages/getting-started-with-github-pages/creating-a-github-pages-site#creating-a-repository-for-your-site).
ed276b1 to
d0e4376
Compare
keegancsmith
left a comment
There was a problem hiding this comment.
hmm, I'm not sure about this to be honest. This is an optimization that is not generic and is only useful for instances with millions of entries in the repos table. To be honest the whole of that function should be gated behind dotcom specific?
@keegancsmith could you explain more? What could go wrong if we applied this to Enterprise instances? Overall I like the idea of removing "dot com" specific functionality as much as possible. |
See the descriptions third paragraph. I also am fully on board for removing dotcom specific behaviour. After some more thought maybe the tradeoff on removing dotcom specific behaviour is worth the corner case? |
The purpose of this change is to remove a special-case for Sourcegraph.com.
If a user's query has a repo filter like
repo:github.com/foo, we rewrite it torepo:^github.com/foo(note the^). This makes the DB query faster because it only needs to search at the beginning of the string.This optimization only happens on Sourcegraph.com because technically someone could have a repo at (say)
https://github.com/owner/foo.github.comorhttps://myothercodehost.example.com/foo.github.com/bar, and they could queryrepo:github.comintending to match one of those. We wanted to avoid that behavior on customer instances. Those are rare edge cases, and the user could work around this rewriting withrepo:.*github.comif truly needed. We now make this rewriting only occur if there is a trailing slash, which mitigates the (anyway likely zero) impact.Making this rewriting behavior consistent removes a dotcom edge case and could improve repository search performance for common queries with a tiny hypothetical impact on the behavior.
Note that GitHub Pages repositories are not affected here because those are named
foo.github.ionotfoo.github.com(https://docs.github.com/en/pages/getting-started-with-github-pages/creating-a-github-pages-site#creating-a-repository-for-your-site).Test plan
CI