Populate sender email and recipients in threads output by cpinto · Pull Request #78 · basecamp/hey-cli

cpinto · 2026-04-14T12:05:49Z

Summary

- `hey threads <id> --json` was returning an empty `creator.email_address` and an empty `recipients` array for every entry. The HTML parser captured only the sender's display name and discarded the rest of the sender link.
- Scrape the sender email from the `entry__sender-email` element inside the sender anchor.
- Extract per-entry recipients by slicing the HTML between entry anchors and reusing the existing `fullRecipientsRe` + `extractEmails` helpers. Dedupe recipients by email so a repeat in the HTML does not produce duplicate contacts.
- Additive only: existing callers of `ParseTopicEntriesHTML` keep working, and previously empty fields are now populated.

Why it matters: the empty recipients list makes any "reply all" flow downstream impossible without re-fetching and re-parsing the topic page, because there is no way to know who else was addressed on the entry.

Test plan

go build ./...
go test ./internal/htmlutil/... ./internal/cmd/...
hey threads <real-thread-id> --json — verified creator.email_address and recipients[].email_address are populated on all entries (previously empty)
hey threads <real-thread-id> (styled) — output unchanged

🤖 Generated with Claude Code

Summary by cubic

Populates creator.email_address and per-entry recipients in hey threads <id> --json so reply-all flows have the data they need. No changes to styled output; existing callers keep working.

Bug Fixes
- Scrapes sender email from the `entry__sender-email` element inside the sender anchor and maps it by entry id.
- Extracts per-entry recipients by slicing between entry anchors and using fullRecipientsRe + extractEmails; dedupes by email.
- Keeps ParseTopicEntriesHTML signature and behavior additive; previously empty fields are now populated.

Written for commit eeb083d. Summary will update on new commits.

`hey threads <id> --json` was returning empty `creator.email_address` and `recipients` for every entry because the HTML parser only captured the sender's display name and ignored the rest of the sender link. Scrape the sender email from the `entry__sender-email` element inside each sender anchor, and extract per-entry recipients by slicing the HTML between entry anchors and reusing the existing `fullRecipientsRe` + `extractEmails` helpers. Dedupe recipients by address so a repeat in the HTML doesn't produce duplicate contacts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cubic-dev-ai

No issues found across 1 file

Copilot

Pull request overview

This PR improves the HTML-based thread entry parsing so hey threads <id> --json includes each entry’s sender email and recipients, enabling downstream flows (like “reply all”) without re-fetching the topic page.

Changes:

Scrape creator.email_address for each entry from the sender markup (entry__sender-email).
Populate per-entry recipients[] by extracting emails from the entry-scoped entry__full-recipients section and deduping by email.

Copilot · 2026-04-14T12:09:36Z

 var (
 	entryBlockRe     = regexp.MustCompile(`(?s)data-entry-id="(\d+)"`)
 	senderRe         = regexp.MustCompile(`id="sender_entry_(\d+)"[^>]*>\s*([^<]+?)\s*<`)
+	senderEmailRe    = regexp.MustCompile(`(?s)sender_entry_(\d+).*?entry__sender-email[^>]*><span[^>]*>[^<]*</span>([^<]+)<`)


senderEmailRe is very loosely scoped: it matches sender_entry_(\d+) and then uses .*? with DOTALL to find the next entry__sender-email anywhere later in the document. If any sender block is missing the expected entry__sender-email markup (or if sender_entry_### appears outside the sender element), this can mis-associate an email with the wrong entry ID. Consider tightening the regex to anchor on id="sender_entry_(\d+)" and constrain the match to within the sender element (e.g., stop at </a>), or extract the sender block first and then parse the email within that substring.

Suggested change

-	senderEmailRe = regexp.MustCompile(`(?s)sender_entry_(\d+).*?entry__sender-email[^>]*><span[^>]*>[^<]*</span>([^<]+)<`)

+	senderEmailRe = regexp.MustCompile(`(?s)id="sender_entry_(\d+)"[^>]*>.*?entry__sender-email[^>]*><span[^>]*>[^<]*</span>\s*([^<]+)\s*</a>`)
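
The other option mentioned in the comment, extracting the sender block first and then parsing the email within that substring, might look roughly like the sketch below. It is not the PR's code: `senderBlockRe`, `emailInBlockRe`, `senderEmails`, and the fixture markup are hypothetical, and the sketch assumes the sender element is an `<a>` with no nested anchors. The unscoped pattern from the diff is included only for comparison.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// looseRe is the pattern from the diff; it may run past the end of a sender element.
var looseRe = regexp.MustCompile(`(?s)sender_entry_(\d+).*?entry__sender-email[^>]*><span[^>]*>[^<]*</span>([^<]+)<`)

// senderBlockRe (hypothetical) captures the entry ID and the inner HTML of one
// sender anchor. Assumption: the sender element is an <a> without nested anchors.
var senderBlockRe = regexp.MustCompile(`(?s)<a\b[^>]*\bid="sender_entry_(\d+)"[^>]*>(.*?)</a>`)

// emailInBlockRe (hypothetical) is the tail of the diff's pattern, applied to a
// single sender block instead of the whole document.
var emailInBlockRe = regexp.MustCompile(`entry__sender-email[^>]*><span[^>]*>[^<]*</span>([^<]+)<`)

// Made-up fixture: entry 2 has no entry__sender-email markup.
const fixtureHTML = `
<a id="sender_entry_1" href="#">Alice <span class="entry__sender-email"><span>&lt;</span>alice@example.com<span>&gt;</span></span></a>
<a id="sender_entry_2" href="#">Bob</a>
<a id="sender_entry_3" href="#">Carol <span class="entry__sender-email"><span>&lt;</span>carol@example.com<span>&gt;</span></span></a>
`

// looseSenderEmails applies the unscoped pattern to the whole document.
func looseSenderEmails(html string) map[string]string {
	out := map[string]string{}
	for _, m := range looseRe.FindAllStringSubmatch(html, -1) {
		out[m[1]] = strings.TrimSpace(m[2])
	}
	return out
}

// senderEmails isolates each sender anchor first, then looks for the email
// only inside that anchor's inner HTML.
func senderEmails(html string) map[string]string {
	out := map[string]string{}
	for _, block := range senderBlockRe.FindAllStringSubmatch(html, -1) {
		em := emailInBlockRe.FindStringSubmatch(block[2])
		if em == nil {
			continue // no email markup in this sender block
		}
		out[block[1]] = strings.TrimSpace(em[1])
	}
	return out
}

func main() {
	fmt.Println("loose: ", looseSenderEmails(fixtureHTML))
	fmt.Println("scoped:", senderEmails(fixtureHTML))
}
```

On this made-up fixture the unscoped pattern attributes Carol's address to entry 2, while the scoped version leaves entry 2 without an email and keeps entries 1 and 3 correct.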

Copilot · 2026-04-14T12:09:36Z

+	for i, eid := range entryIDs {
+		anchor := fmt.Sprintf(`id="entry_%s"`, eid)
+		start := strings.Index(html, anchor)
+		if start < 0 {
+			continue
+		}
+		end := len(html)
+		if i+1 < len(entryIDs) {
+			nextAnchor := fmt.Sprintf(`id="entry_%s"`, entryIDs[i+1])
+			if n := strings.Index(html[start:], nextAnchor); n > 0 {
+				end = start + n
+			}
+		}


The recipients extraction loop does a full strings.Index(html, anchor) scan for every entry ID, making parsing O(n*m) over the HTML size. Since entryIDs are already in document order, consider tracking the current offset (search from the previous anchor forward) or precomputing anchor indices once, then slice based on those positions. This keeps performance predictable for large threads and also avoids accidentally matching an earlier occurrence of the same anchor substring.

Copilot · 2026-04-14T12:09:36Z

+	// Associate recipients with entries by slicing between entry anchors.
+	entryRecipients := map[string][]models.Contact{}
+	for i, eid := range entryIDs {
+		anchor := fmt.Sprintf(`id="entry_%s"`, eid)
+		start := strings.Index(html, anchor)
+		if start < 0 {
+			continue
+		}
+		end := len(html)
+		if i+1 < len(entryIDs) {
+			nextAnchor := fmt.Sprintf(`id="entry_%s"`, entryIDs[i+1])
+			if n := strings.Index(html[start:], nextAnchor); n > 0 {
+				end = start + n
+			}
+		}
+		m := fullRecipientsRe.FindStringSubmatch(html[start:end])
+		if m == nil {
+			continue
+		}
+		seen := map[string]bool{}
+		for _, addr := range extractEmails(m[1]) {
+			if seen[addr] {
+				continue
+			}
+			seen[addr] = true
+			entryRecipients[eid] = append(entryRecipients[eid], models.Contact{EmailAddress: addr})
+		}
+	}


New behavior is being added to ParseTopicEntriesHTML (sender email scraping, per-entry recipients parsing + deduping), but there are currently no unit tests covering this HTML parsing. Adding a focused test with a minimal HTML fixture would help prevent silent regressions when HEY’s markup changes again (e.g., ensure creator.email_address and recipients[].email_address populate as expected per entry).
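
A fixture-based check along those lines could start from something like the sketch below. Because the signature of `ParseTopicEntriesHTML` and the bodies of `fullRecipientsRe` and `extractEmails` are not shown in this PR, the versions here are stand-ins written only for illustration, and the fixture markup is invented. `recipientsByEntry` follows the loop from the diff but returns plain addresses instead of `models.Contact` values. In the repository, the same fixture and expectations would presumably move into a `_test.go` file that calls the real parser.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Stand-ins (assumptions): the real fullRecipientsRe and extractEmails are not
// shown in the PR, so these are simplified substitutes.
var fullRecipientsRe = regexp.MustCompile(`(?s)entry__full-recipients[^>]*>(.*?)</div>`)
var emailRe = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)

func extractEmails(s string) []string { return emailRe.FindAllString(s, -1) }

// recipientsByEntry mirrors the loop from the diff, slicing between entry
// anchors and deduping addresses within each entry.
func recipientsByEntry(html string, entryIDs []string) map[string][]string {
	out := map[string][]string{}
	for i, eid := range entryIDs {
		anchor := fmt.Sprintf(`id="entry_%s"`, eid)
		start := strings.Index(html, anchor)
		if start < 0 {
			continue
		}
		end := len(html)
		if i+1 < len(entryIDs) {
			next := fmt.Sprintf(`id="entry_%s"`, entryIDs[i+1])
			if n := strings.Index(html[start:], next); n > 0 {
				end = start + n
			}
		}
		m := fullRecipientsRe.FindStringSubmatch(html[start:end])
		if m == nil {
			continue
		}
		seen := map[string]bool{}
		for _, addr := range extractEmails(m[1]) {
			if seen[addr] {
				continue
			}
			seen[addr] = true
			out[eid] = append(out[eid], addr)
		}
	}
	return out
}

// Invented fixture: entry 1 repeats an address, entry 3 has no recipients block.
const fixture = `
<article id="entry_1"><div class="entry__full-recipients">bob@example.com, carol@example.com, bob@example.com</div></article>
<article id="entry_2"><div class="entry__full-recipients">alice@example.com</div></article>
<article id="entry_3"><p>no recipients block</p></article>
`

// checkFixture compares the parsed recipients against the expected per-entry lists.
func checkFixture() error {
	got := recipientsByEntry(fixture, []string{"1", "2", "3"})
	want := map[string][]string{
		"1": {"bob@example.com", "carol@example.com"},
		"2": {"alice@example.com"},
	}
	if len(got) != len(want) {
		return fmt.Errorf("got %d entries with recipients, want %d", len(got), len(want))
	}
	for id, w := range want {
		if strings.Join(got[id], ",") != strings.Join(w, ",") {
			return fmt.Errorf("entry %s: got %v, want %v", id, got[id], w)
		}
	}
	return nil
}

func main() {
	if err := checkFixture(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```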

Copilot AI review requested due to automatic review settings April 14, 2026 12:05

github-actions Bot added the bug Something isn't working label Apr 14, 2026

Copilot started reviewing on behalf of cpinto April 14, 2026 12:06

cubic-dev-ai Bot reviewed Apr 14, 2026

Copilot AI reviewed Apr 14, 2026

cpinto wants to merge 1 commit into basecamp:main from cpinto:fix/threads-populate-sender-email-and-recipients
