Populate sender email and recipients in threads output #78

Open
cpinto wants to merge 1 commit into basecamp:main from cpinto:fix/threads-populate-sender-email-and-recipients

Conversation

@cpinto (Contributor) commented Apr 14, 2026

Summary

  • hey threads <id> --json was returning empty creator.email_address and an empty recipients array for every entry. The HTML parser only captured the sender's display name and discarded the rest of the sender link.
  • Scrape the sender email from the <span class="entry__sender-email"> element inside the sender anchor.
  • Extract per-entry recipients by slicing HTML between entry anchors and reusing the existing fullRecipientsRe + extractEmails helpers. Dedupe recipients by email so a repeat in the HTML does not produce duplicate contacts.
  • Additive only — existing callers of ParseTopicEntriesHTML keep working; empty fields become populated.

Why it matters: the empty recipients list makes any "reply all" flow downstream impossible without re-fetching and re-parsing the topic page, because there is no way to know who else was addressed on the entry.
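
As an illustration of the downstream use case, here is a minimal sketch of how a consumer might build a reply-all list from the JSON output. Only the two paths `creator.email_address` and `recipients[].email_address` come from this PR; the top-level array of entries, the `contact` and `entry` types, and the `replyAll` helper are assumptions made for the example and may not match the real output shape.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// contact and entry mirror only the two JSON paths named in this PR.
// The surrounding shape (a top-level array of entries) is an assumption.
type contact struct {
	EmailAddress string `json:"email_address"`
}

type entry struct {
	Creator    contact   `json:"creator"`
	Recipients []contact `json:"recipients"`
}

// replyAll (hypothetical helper) returns the sender plus the recipients of
// the last entry, without self, empty addresses, or duplicates, in
// first-seen order.
func replyAll(raw []byte, self string) ([]string, error) {
	var entries []entry
	if err := json.Unmarshal(raw, &entries); err != nil {
		return nil, err
	}
	if len(entries) == 0 {
		return nil, nil
	}
	last := entries[len(entries)-1]
	seen := map[string]bool{self: true, "": true}
	var out []string
	for _, c := range append([]contact{last.Creator}, last.Recipients...) {
		if seen[c.EmailAddress] {
			continue
		}
		seen[c.EmailAddress] = true
		out = append(out, c.EmailAddress)
	}
	return out, nil
}

func main() {
	raw := []byte(`[{"creator":{"email_address":"ann@example.com"},
	  "recipients":[{"email_address":"me@example.com"},{"email_address":"bob@example.com"}]}]`)
	to, err := replyAll(raw, "me@example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(to) // [ann@example.com bob@example.com]
}
```

With the pre-fix output, both fields are empty, so a helper like this would return nothing and the caller would have to re-fetch the topic page.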

Test plan

  • go build ./...
  • go test ./internal/htmlutil/... ./internal/cmd/...
  • hey threads <real-thread-id> --json — verified creator.email_address and recipients[].email_address are populated on all entries (previously empty)
  • hey threads <real-thread-id> (styled) — output unchanged

🤖 Generated with Claude Code


Summary by cubic

Populates creator.email_address and per-entry recipients in hey threads <id> --json so reply-all flows have the data they need. No changes to styled output; existing callers keep working.

  • Bug Fixes
    • Scrapes sender email from <span class="entry__sender-email"> inside the sender anchor and maps it by entry id.
    • Extracts per-entry recipients by slicing between entry anchors and using fullRecipientsRe + extractEmails; dedupes by email.
    • Keeps ParseTopicEntriesHTML signature and behavior additive; previously empty fields are now populated.

Written for commit eeb083d. Summary will update on new commits.

`hey threads <id> --json` was returning empty `creator.email_address`
and `recipients` for every entry because the HTML parser only captured
the sender's display name and ignored the rest of the sender link.

Scrape the sender email from the `<span class="entry__sender-email">`
inside each sender anchor, and extract per-entry recipients by slicing
the HTML between entry anchors and reusing the existing
`fullRecipientsRe` + `extractEmails` helpers. Dedupe recipients by
address so a repeat in the HTML doesn't produce duplicate contacts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 14, 2026 12:05
@github-actions bot added the bug (Something isn't working) label Apr 14, 2026

@cubic-dev-ai bot (Contributor) left a comment

No issues found across 1 file

Copilot AI (Contributor) left a comment

Pull request overview

This PR improves the HTML-based thread entry parsing so hey threads <id> --json includes each entry’s sender email and recipients, enabling downstream flows (like “reply all”) without re-fetching the topic page.

Changes:

  • Scrape creator.email_address for each entry from the sender markup (entry__sender-email).
  • Populate per-entry recipients[] by extracting emails from the entry-scoped entry__full-recipients section and deduping by email.

var (
	entryBlockRe  = regexp.MustCompile(`(?s)data-entry-id="(\d+)"`)
	senderRe      = regexp.MustCompile(`id="sender_entry_(\d+)"[^>]*>\s*([^<]+?)\s*<`)
	senderEmailRe = regexp.MustCompile(`(?s)sender_entry_(\d+).*?entry__sender-email[^>]*><span[^>]*>[^<]*</span>([^<]+)<`)

Copilot AI Apr 14, 2026

senderEmailRe is very loosely scoped: it matches sender_entry_(\d+) and then uses .*? with DOTALL to find the next entry__sender-email anywhere later in the document. If any sender block is missing the expected entry__sender-email markup (or if sender_entry_### appears outside the sender element), this can mis-associate an email with the wrong entry ID. Consider tightening the regex to anchor on id="sender_entry_(\d+)" and constrain the match to within the sender element (e.g., stop at </a>), or extract the sender block first and then parse the email within that substring.

Suggested change
- senderEmailRe = regexp.MustCompile(`(?s)sender_entry_(\d+).*?entry__sender-email[^>]*><span[^>]*>[^<]*</span>([^<]+)<`)
+ senderEmailRe = regexp.MustCompile(`(?s)id="sender_entry_(\d+)"[^>]*>.*?entry__sender-email[^>]*><span[^>]*>[^<]*</span>\s*([^<]+)\s*</a>`)
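
The second option mentioned in the comment, extracting the sender block first and then parsing the email inside it, could look roughly like the sketch below. The names `senderBlockRe`, `emailInBlockRe`, and `senderEmails` are hypothetical, and the fixture markup is inferred from the regexes in this PR rather than copied from a real HEY page.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// senderBlockRe captures one sender anchor: the entry id and the inner
	// HTML. The lazy .*? stops at the first </a>, so a match cannot run past
	// the end of the anchor it started in.
	senderBlockRe = regexp.MustCompile(`(?s)<a\b[^>]*\bid="sender_entry_(\d+)"[^>]*>(.*?)</a>`)
	// emailInBlockRe mirrors the tail of the PR's senderEmailRe and is only
	// ever applied to a single anchor's inner HTML.
	emailInBlockRe = regexp.MustCompile(`entry__sender-email[^>]*><span[^>]*>[^<]*</span>([^<]+)<`)
)

// senderEmails maps entry id to sender email. Entries whose anchor has no
// email span are simply absent from the result.
func senderEmails(html string) map[string]string {
	out := map[string]string{}
	for _, m := range senderBlockRe.FindAllStringSubmatch(html, -1) {
		if e := emailInBlockRe.FindStringSubmatch(m[2]); e != nil {
			out[m[1]] = strings.TrimSpace(e[1])
		}
	}
	return out
}

func main() {
	// Entry 1 has no email span; entry 2 does.
	html := `<a id="sender_entry_1" href="#">Ann</a>
<a id="sender_entry_2" href="#">Bob <span class="entry__sender-email"><span>&lt;</span>bob@example.com<span>&gt;</span></span></a>`
	fmt.Println(senderEmails(html)) // map[2:bob@example.com]
}
```

On this fixture the original loosely scoped pattern would start at `sender_entry_1` and lazily scan forward to the next `entry__sender-email`, attributing Bob's address to entry 1. Whether the email text is immediately followed by `</a>` depends on the real markup, which is not shown here, so the sketch keeps the original `<` terminator.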

Comment on lines +112 to +124
for i, eid := range entryIDs {
	anchor := fmt.Sprintf(`id="entry_%s"`, eid)
	start := strings.Index(html, anchor)
	if start < 0 {
		continue
	}
	end := len(html)
	if i+1 < len(entryIDs) {
		nextAnchor := fmt.Sprintf(`id="entry_%s"`, entryIDs[i+1])
		if n := strings.Index(html[start:], nextAnchor); n > 0 {
			end = start + n
		}
	}

Copilot AI Apr 14, 2026

The recipients extraction loop does a full strings.Index(html, anchor) scan for every entry ID, making parsing O(n*m) over the HTML size. Since entryIDs are already in document order, consider tracking the current offset (search from the previous anchor forward) or precomputing anchor indices once, then slice based on those positions. This keeps performance predictable for large threads and also avoids accidentally matching an earlier occurrence of the same anchor substring.
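
A minimal sketch of the offset-tracking idea, assuming (as the comment does) that the ids arrive in document order. The helper name `entrySpans` and the fixture are hypothetical; the anchor format `id="entry_<id>"` is taken from the loop above.

```go
package main

import (
	"fmt"
	"strings"
)

// entrySpans returns, for each id whose anchor is found, the [start, end)
// byte range of its section in html. Each search resumes just past the
// previous anchor instead of rescanning from the beginning.
func entrySpans(html string, ids []string) map[string][2]int {
	starts := make([]int, len(ids))
	pos := 0
	for i, id := range ids {
		anchor := `id="entry_` + id + `"`
		n := strings.Index(html[pos:], anchor)
		if n < 0 {
			starts[i] = -1 // anchor missing; leave pos alone so later ids still match
			continue
		}
		starts[i] = pos + n
		pos = starts[i] + len(anchor)
	}
	// Walk backwards so each section ends where the next found anchor begins.
	spans := make(map[string][2]int, len(ids))
	end := len(html)
	for i := len(ids) - 1; i >= 0; i-- {
		if starts[i] < 0 {
			continue
		}
		spans[ids[i]] = [2]int{starts[i], end}
		end = starts[i]
	}
	return spans
}

func main() {
	html := `<div id="entry_1">a</div><div id="entry_2">b</div>`
	ids := []string{"1", "2"}
	spans := entrySpans(html, ids)
	for _, id := range ids {
		s := spans[id]
		fmt.Printf("%s: %q\n", id, html[s[0]:s[1]])
	}
}
```

When every anchor is present, the forward pass touches each byte of the HTML roughly once. A missing anchor still costs one scan to the end of the document, so the worst case is not strictly linear.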

Comment on lines +110 to +137
// Associate recipients with entries by slicing between entry anchors.
entryRecipients := map[string][]models.Contact{}
for i, eid := range entryIDs {
	anchor := fmt.Sprintf(`id="entry_%s"`, eid)
	start := strings.Index(html, anchor)
	if start < 0 {
		continue
	}
	end := len(html)
	if i+1 < len(entryIDs) {
		nextAnchor := fmt.Sprintf(`id="entry_%s"`, entryIDs[i+1])
		if n := strings.Index(html[start:], nextAnchor); n > 0 {
			end = start + n
		}
	}
	m := fullRecipientsRe.FindStringSubmatch(html[start:end])
	if m == nil {
		continue
	}
	seen := map[string]bool{}
	for _, addr := range extractEmails(m[1]) {
		if seen[addr] {
			continue
		}
		seen[addr] = true
		entryRecipients[eid] = append(entryRecipients[eid], models.Contact{EmailAddress: addr})
	}
}

Copilot AI Apr 14, 2026

New behavior is being added to ParseTopicEntriesHTML (sender email scraping, per-entry recipients parsing + deduping), but there are currently no unit tests covering this HTML parsing. Adding a focused test with a minimal HTML fixture would help prevent silent regressions when HEY’s markup changes again (e.g., ensure creator.email_address and recipients[].email_address populate as expected per entry).
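
One possible shape for such a test is sketched below as a standalone program. The fixture is invented: only the attribute and class names come from this PR, and the nesting is a guess. The `parsed` type, the `want` table, and the `mismatches` helper are hypothetical; a real test would adapt the result of `ParseTopicEntriesHTML` (whose signature is not shown here) into this form and fail on any reported mismatch.

```go
package main

import (
	"fmt"
	"strings"
)

// fixture is a made-up, minimal stand-in for a HEY topic page. Entry 1
// repeats a recipient on purpose to exercise deduplication.
const fixture = `
<article id="entry_1" data-entry-id="1">
  <a id="sender_entry_1">Ann <span class="entry__sender-email"><span>&lt;</span>ann@example.com<span>&gt;</span></span></a>
  <div class="entry__full-recipients">bob@example.com, bob@example.com</div>
</article>
<article id="entry_2" data-entry-id="2">
  <a id="sender_entry_2">Bob <span class="entry__sender-email"><span>&lt;</span>bob@example.com<span>&gt;</span></span></a>
  <div class="entry__full-recipients">ann@example.com</div>
</article>`

// parsed is the subset of an entry the check cares about.
type parsed struct {
	ID         string
	Sender     string
	Recipients []string
}

// want lists the expected values for the fixture above.
var want = []parsed{
	{ID: "1", Sender: "ann@example.com", Recipients: []string{"bob@example.com"}},
	{ID: "2", Sender: "bob@example.com", Recipients: []string{"ann@example.com"}},
}

// mismatches returns one human-readable line per difference.
func mismatches(got, want []parsed) []string {
	if len(got) != len(want) {
		return []string{fmt.Sprintf("got %d entries, want %d", len(got), len(want))}
	}
	var out []string
	for i, w := range want {
		g := got[i]
		if g.ID != w.ID || g.Sender != w.Sender {
			out = append(out, fmt.Sprintf("entry %s: got id=%q sender=%q, want sender=%q", w.ID, g.ID, g.Sender, w.Sender))
		}
		if strings.Join(g.Recipients, ",") != strings.Join(w.Recipients, ",") {
			out = append(out, fmt.Sprintf("entry %s: got recipients %v, want %v", w.ID, g.Recipients, w.Recipients))
		}
	}
	return out
}

func main() {
	// Stand-in for the pre-fix behaviour: ids only, no emails.
	oldParse := func(string) []parsed { return []parsed{{ID: "1"}, {ID: "2"}} }
	for _, m := range mismatches(oldParse(fixture), want) {
		fmt.Println(m)
	}
}
```

Run as written, the stand-in parser reports a sender and a recipients mismatch for each entry, which is the failure such a test would have flagged before this change.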

Labels

bug Something isn't working

2 participants