Skip to content

Implement bot detection and verification#362

Open
jazairi wants to merge 3 commits intomainfrom
use-410-turnstile-gem
Open

Implement bot detection and verification#362
jazairi wants to merge 3 commits intomainfrom
use-410-turnstile-gem

Conversation

@jazairi
Copy link
Contributor

@jazairi jazairi commented Feb 26, 2026

Why these changes are being introduced:

We need a preliminary solution to manage bot traffic in production.

Relevant ticket(s):

How this addresses that need:

  • Implements a Bot Detector service that uses Crawler Detect to identify bots to refer to Cloudflare Turnstile.
  • Implements a Turnstile Controller using the rails-cloudfire-turnstile gem for verification challenges.

Side effects of this change:

The new form may need additional styling. For now, I just added our button styles.

Developer

Accessibility
  • ANDI or WAVE has been run in accordance to our guide.
  • This PR contains no changes to the view layer.
  • New issues flagged by ANDI or WAVE have been resolved.
  • New issues flagged by ANDI or WAVE have been ticketed (link in the Pull Request details above).
  • No new accessibility issues have been flagged.
New ENV
  • All new ENV is documented in README.
  • All new ENV has been added to Heroku Pipeline, Staging and Prod.
  • ENV has not changed.
Approval beyond code review
  • UXWS/stakeholder approval has been confirmed.
  • UXWS/stakeholder review will be completed retroactively.
  • UXWS/stakeholder review is not needed.
Additional context needed to review

This introduces rails-cloudflare-turnstile and crawler_detect as dependencies.

Code Reviewer

Code
  • I have confirmed that the code works as intended.
  • Any CodeClimate issues have been fixed or confirmed as
    added technical debt.
Documentation
  • The commit message is clear and follows our guidelines
    (not just this pull request message).
  • The documentation has been updated or is unnecessary.
  • New dependencies are appropriate or there were no changes.
Testing
  • There are appropriate tests covering any new functionality.
  • No additional test coverage is required.

@jazairi jazairi force-pushed the use-410-turnstile-gem branch from d4f048e to 460d738 Compare February 26, 2026 17:09
@jazairi jazairi had a problem deploying to timdex-ui-pi-use-410-tu-bsn4kh February 26, 2026 17:09 Failure
@jazairi
Copy link
Contributor Author

jazairi commented Feb 26, 2026

Looks like the coveralls outage is ongoing. I'd like to wait until that's resolved to merge, but I think we can start code review in the meantime.

@JPrevost JPrevost self-assigned this Feb 27, 2026
Copy link
Member

@JPrevost JPrevost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you configure the PR build to have this feature enabled and functioning? I want to easily be able to confirm my normal user agent is not blocked from anything but if I change my user agent to a known bot I should be challenged and then be able to pass the challenge to continue.

end

def bot_detection_enabled?
if defined?(Feature)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this check? If the feature is not enabled, why are we checking if the feature is enabled?

Wouldn't Feature.enabled?(:bot_detection) work for this entire method with no internal conditions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, yeah, good point. It's excessively cautious, since we've gated feature validation elsewhere.

head :not_found unless Feature.enabled?(:bot_detection)
end

def handle_forbidden
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some docstrings to explain what is going on in this method (even though it is private)? It's unclear to me if this is the normal path a failed cloud flare turnstile takes or if it is a problem with our tokens, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was envisioning an issue with the tokens -- will clarify.

render :show, status: :unprocessable_entity
end

def safe_return_path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some docstrings to explain what is going on in this method (even though it is private)?

private

def ensure_bot_detection_enabled
head :not_found unless Feature.enabled?(:bot_detection)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are assuming this should never happen, maybe we should log to Sentry if it does? My assumption is this is handling a case in which a user agent is requesting the turnstile route even though we have it disabled which should not happen under normal circumstances.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the idea, but now I'm wondering if we even care whether user agents are requesting the turnstile route. I might just drop this for clarity's sake.

detector = CrawlerDetect.new(ua)
detector.crawler?
rescue StandardError => e
Rails.logger.warn("BotDetector: crawler_detect failed for UA '#{ua}': #{e.message}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think warn is the right level here. I'd lean towards info or debug with a slight preference towards debug (i.e. I don't really want this in production logs, but if we want to keep them in for a bit until they get annoying I'm okay with info).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion on this. Happy to go with debug for now.

# SearchController `results` action (path `/results`) and other search-related
# paths to be search result pages. This keeps the rule simple and conservative.
path = request.path.to_s
return true if path.start_with?('/search') || path.start_with?('/results') || path.include?('/search')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check these routes? I don't believe /search is a route at all.

I feel like both /results and /record are the paths to block

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Protect all the routes, especially the shadow routes!

@qltysh
Copy link

qltysh bot commented Mar 2, 2026

❌ 2 blocking issues (2 total)

Tool Category Rule Count
rubocop Lint Class has too many lines. [289/100] 1
rubocop Style Line is too long. [121/120] 1

class Feature
# List of all valid features in the application
VALID_FEATURES = %i[geodata boolean_picker oa_always simulate_search_latency tab_primo_all tab_timdex_all
VALID_FEATURES = %i[bot_detection geodata boolean_picker oa_always simulate_search_latency tab_primo_all tab_timdex_all
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line is too long. [121/120] [rubocop:Layout/LineLength]

@jazairi jazairi had a problem deploying to timdex-ui-pi-use-410-tu-bsn4kh March 2, 2026 18:09 Failure
jazairi added 2 commits March 2, 2026 10:11
Why these changes are being introduced:

We need a preliminary solution to manage bot traffic in production.

Relevant ticket(s):

- https://mitlibraries.atlassian.net/browse/USE-410

How this addresses that need:

- Implements a Bot Detector service that uses Crawler Detect to identify
bots to refer to Cloudflare Turnstile.
- Implements a Turnstile Controller using the `rails-cloudfire-turnstile`
gem for verification challenges.

Side effects of this change:

The new form may need additional styling. For now, I just added our
button styles.
@jazairi jazairi force-pushed the use-410-turnstile-gem branch from 76f6402 to fb87d65 Compare March 2, 2026 18:39
@jazairi jazairi temporarily deployed to timdex-ui-pi-use-410-tu-bsn4kh March 2, 2026 18:41 Inactive
@coveralls
Copy link

coveralls commented Mar 2, 2026

Pull Request Test Coverage Report for Build 22592486169

Details

  • 37 of 37 (100.0%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.05%) to 98.188%

Totals Coverage Status
Change from base Build 22584961347: 0.05%
Covered Lines: 1355
Relevant Lines: 1380

💛 - Coveralls

@jazairi jazairi force-pushed the use-410-turnstile-gem branch from fb87d65 to 97274a2 Compare March 2, 2026 18:41
@jazairi jazairi temporarily deployed to timdex-ui-pi-use-410-tu-bsn4kh March 2, 2026 18:43 Inactive
@jazairi jazairi temporarily deployed to timdex-ui-pi-use-410-tu-bsn4kh March 2, 2026 18:46 Inactive
@jazairi jazairi force-pushed the use-410-turnstile-gem branch from 42b13a4 to 5c844c9 Compare March 2, 2026 19:32
@jazairi jazairi temporarily deployed to timdex-ui-pi-use-410-tu-bsn4kh March 2, 2026 19:33 Inactive
@jazairi jazairi force-pushed the use-410-turnstile-gem branch from 5c844c9 to f04ae32 Compare March 2, 2026 19:36
@jazairi jazairi temporarily deployed to timdex-ui-pi-use-410-tu-bsn4kh March 2, 2026 19:36 Inactive
- Explicity require Feature model in initializer
- Update CrawlerDetect method name
- Remove trailing whitespace from Bot Detector test
- Switch from OpenStruct to Struct
- Update test assertions

Test Turnstile root path fallback and fix typo in CrawlerDetect instantiation
@jazairi
Copy link
Contributor Author

jazairi commented Mar 2, 2026

@JPrevost I think this is ready for review again. I pushed two separate commits, one two respond to code review feedback, and the other for linting and cleanup. When I enabled the feature in the PR build, I learned that my method name for CrawlerDetect was wrong, so that's fixed. 🤦 I'd flipped the bots? boolean to test locally, and all the stubs had the wrong name, too -- a good case for VCR, perhaps.

In any case, here's how I've tested the feature:

  1. Set my UA to Googlebot in dev tools.
  2. Run a search and confirm that the challenge view shows.
  3. Update your UA to something non-bot and confirm that you can pass the challenge. (In my experience, it does not let you pass the challenge if using a bot UA.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants