
[WIP] EC-1710: Acceptance test profiling instrumentation#3229

Open
st3penta wants to merge 2 commits into conforma:main from st3penta:EC-1710-acceptance-profiling

Conversation

@st3penta
Contributor

Summary

  • Temporary profiling branch for spike EC-1710 (acceptance test performance investigation)
  • NOT intended for merge -- will be closed after data collection
  • Purpose: collect runtime profiling data from CI acceptance test runs

What this does

Adds a profile package to the acceptance tests that instruments the test suite to collect:

  • Per-scenario wall-clock timing -- identifies the slowest scenarios and features
  • Container startup statistics -- tracks creation time for registry, git, and wiremock containers (count, total, avg, min, max)
  • Kind cluster creation sub-phases -- breaks down cluster startup into node creation, config application, CLI image build, and task bundle build
  • Peak container concurrency -- tracks how many containers are active simultaneously

The profiling report is printed to stderr at the end of the test run, so it will be visible in the GitHub Actions job logs for the acceptance test runs.

Files changed

File                                  Change
acceptance/profile/profile.go         New profiling package
acceptance/acceptance_test.go         Init/report hooks, per-scenario timing
acceptance/kubernetes/kind/kind.go    Kind cluster phase instrumentation
acceptance/registry/registry.go       Registry container timing
acceptance/git/git.go                 Git container timing
acceptance/wiremock/wiremock.go       Wiremock container timing

Test plan

  • CI acceptance tests run successfully with profiling enabled
  • Profiling report appears in the acceptance test job's stderr output
  • Collected data is sufficient to identify performance bottlenecks

🤖 Generated with Claude Code

Adds a profile package to the acceptance tests that tracks per-scenario
wall-clock timing, container startup timing by type (registry, git,
wiremock), Kind cluster creation sub-phases, and peak container
concurrency. The profiling report is printed to stderr at the end of
the test run, making it visible in GitHub Actions logs.

This is temporary instrumentation for the EC-1710 performance spike.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Apr 13, 2026

Important

Review skipped

Ignore keyword(s) in the title.

⛔ Ignored keywords (1)
  • WIP

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 54552d69-3c6f-425a-868e-09629fb57889

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@qodo-code-review
Contributor

Review Summary by Qodo

Add acceptance test profiling instrumentation for EC-1710

✨ Enhancement


Walkthroughs

Description
• Add profiling instrumentation to acceptance tests for performance analysis
• Track per-scenario wall-clock timing to identify slowest test scenarios
• Instrument container startup statistics (registry, git, wiremock) with count, total, avg, min, max metrics
• Break down Kind cluster creation into sub-phases (node creation, config, CLI image, task bundle)
• Monitor peak container concurrency and output profiling report to stderr
Diagram

```mermaid
flowchart LR
  Init["profile.Init()"]
  ScenarioStart["ScenarioStart()"]
  ContainerTiming["BeginContainer()"]
  PhaseTiming["Begin()"]
  ScenarioEnd["ScenarioEnd()"]
  Report["Report()"]

  Init --> ScenarioStart
  ScenarioStart --> ContainerTiming
  ScenarioStart --> PhaseTiming
  ContainerTiming --> ScenarioEnd
  PhaseTiming --> ScenarioEnd
  ScenarioEnd --> Report

  Report -->|Output| Stderr["stderr report"]
  Report -->|Output| JsonL["events.jsonl"]
  Report -->|Output| ReportTxt["report.txt"]
```


File Changes

1. acceptance/profile/profile.go ✨ Enhancement +333/-0

New profiling package for test instrumentation

• New profiling package with runtime instrumentation for acceptance tests
• Tracks per-scenario timing, container startup statistics, and Kind cluster phases
• Records phase durations, container creation metrics (count, total, avg, min, max)
• Monitors peak container concurrency and generates formatted profiling report
• Outputs results to stderr and JSON event log in /tmp/ec-profile/



2. acceptance/acceptance_test.go ✨ Enhancement +7/-0

Integrate profiling hooks into test suite

• Import new profile package
• Call profile.Init() at test suite start and profile.Report() at end
• Add ScenarioStart() hook in Before to mark scenario beginning
• Add ScenarioEnd() hook in After to record scenario duration
• Ensure Report() is called on early exit paths



3. acceptance/kubernetes/kind/kind.go ✨ Enhancement +16/-0

Instrument Kind cluster creation phases

• Import profile package
• Instrument Kind cluster creation with phase timing markers
• Track sub-phases: node creation, config application, CLI image build, task bundle build
• Wrap each phase with Begin()/end() calls to measure duration
• Record total cluster creation time



4. acceptance/registry/registry.go ✨ Enhancement +4/-0

Add registry container timing instrumentation

• Import profile package
• Wrap registry container creation with BeginContainer() timing
• Record container startup duration from creation to port mapping
• Handle error cases to ensure timing is recorded



5. acceptance/git/git.go ✨ Enhancement +4/-0

Add git container timing instrumentation

• Import profile package
• Wrap git container creation with BeginContainer() timing
• Record container startup duration from creation to port mapping
• Handle error cases to ensure timing is recorded



6. acceptance/wiremock/wiremock.go ✨ Enhancement +5/-0

Add wiremock container timing instrumentation

• Import profile package
• Wrap wiremock container creation with BeginContainer() timing
• Record container startup duration from creation to port mapping
• Handle error cases to ensure timing is recorded




@qodo-code-review
Contributor

qodo-code-review bot commented Apr 13, 2026

Code Review by Qodo

🐞 Bugs (3)   📘 Rule violations (0)   📎 Requirement gaps (0)
🐞 ≡ Correctness (2)  ➹ Performance (1)



Action required

1. Container concurrency miscount 🐞
Description
profile.BeginContainer increments activeContainers but nothing ever decrements it, so the report’s
“Peak concurrent containers”/“Final active containers” reflect cumulative startup completions (and
can include failed startups) rather than true active container concurrency.
Code

acceptance/profile/profile.go[R124-159]

+// BeginContainer starts timing a container creation. Call the returned function when ready.
+func BeginContainer(containerType string) func() {
+	if !enabled {
+		return func() {}
+	}
+	start := time.Now()
+	return func() {
+		dur := time.Since(start)
+		mu.Lock()
+		stats, ok := containerTypes[containerType]
+		if !ok {
+			stats = &containerStats{Min: dur}
+			containerTypes[containerType] = stats
+		}
+		stats.Count++
+		stats.Total += dur
+		if dur < stats.Min {
+			stats.Min = dur
+		}
+		if dur > stats.Max {
+			stats.Max = dur
+		}
+		mu.Unlock()
+
+		current := activeContainers.Add(1)
+		mu.Lock()
+		if current > peakContainers {
+			peakContainers = current
+		}
+		mu.Unlock()
+
+		logEvent(map[string]any{
+			"event": "container", "type": containerType,
+			"duration_ms": dur.Milliseconds(), "ts": time.Now().Format(time.RFC3339),
+		})
+	}
+}
Evidence
BeginContainer increments activeContainers and updates peakContainers, but there is no corresponding
decrement anywhere, while the report prints these values as active/concurrent container counts.
Additionally, call sites invoke endContainer() even on GenericContainer error, which still
increments activeContainers and records stats for a container that never started.

acceptance/profile/profile.go[124-160]
acceptance/profile/profile.go[225-257]
acceptance/git/git.go[181-198]
acceptance/registry/registry.go[110-127]
acceptance/wiremock/wiremock.go[219-236]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`BeginContainer` increments `activeContainers` but the value is never decremented, yet the report labels it as “active”/“concurrent” containers. Error paths also call `endContainer()` which increments the counters even when `GenericContainer` fails.

## Issue Context
The current code only has a “startup finished” callback and does not have any hook for container stop/teardown, so it cannot accurately represent *active lifetime* container concurrency unless you either (a) add a decrement hook on teardown, or (b) rename/report a different metric (e.g., “containers started” or “concurrent container startups”).

## Fix Focus Areas
- acceptance/profile/profile.go[124-160]
- acceptance/profile/profile.go[225-257]
- acceptance/git/git.go[183-197]
- acceptance/registry/registry.go[112-126]
- acceptance/wiremock/wiremock.go[221-235]

## Suggested fix approach
- Stop counting failed startups: only record container stats and increment any counters after a successful startup.
- Either:
 - **Option A (recommended for correctness):** change the API to support both success/failure and teardown:
   - `BeginContainer(type) (finish func(success bool), stopped func())`
   - `finish(true)` records duration and increments an `activeContainers` gauge; `stopped()` decrements.
 - **Option B:** if no teardown hook exists, rename the metric to match behavior (e.g., `containersStarted`) and remove “active/concurrent” wording from the report.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

2. Container timing includes MappedPort 🐞
Description
The container startup timer is stopped after MappedPort(), so the measured “startup” duration
includes additional Docker API calls beyond container start and can be inflated/variable across
runs.
Code

acceptance/registry/registry.go[R112-125]

+	endContainer := profile.BeginContainer("registry")
	registry, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: req,
		Started:          true,
		Logger:           logger,
	})
	if err != nil {
+		endContainer()
		return ctx, err
	}

	port, err := registry.MappedPort(ctx, "5000/tcp")
+	endContainer()
	if err != nil {
Evidence
All three container call sites start timing before GenericContainer, but stop timing only after
MappedPort returns; this includes extra work beyond container creation/start. The same end function
is also invoked on GenericContainer error paths, further mixing failed and successful timings.

acceptance/registry/registry.go[110-127]
acceptance/git/git.go[181-198]
acceptance/wiremock/wiremock.go[219-236]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Container startup timing is recorded until after `MappedPort()` completes, which measures more than just container startup and can skew comparisons.

## Issue Context
If the intent is “time-to-container-ready”, then the metric should be named accordingly. If the intent is “container startup time”, the timer should end immediately after `GenericContainer(... Started: true ...)` succeeds.

## Fix Focus Areas
- acceptance/git/git.go[183-197]
- acceptance/registry/registry.go[112-126]
- acceptance/wiremock/wiremock.go[221-235]

## Suggested fix approach
- Move `endContainer()` to immediately after `GenericContainer` succeeds.
- Do **not** call `endContainer()` on `GenericContainer` error (or pass a `success=false` flag if you adopt a new API).
- If you still want a separate metric for “time until port is mapped/ready”, add a second timer around `MappedPort()` with a distinct name.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Event logging fsync overhead 🐞
Description
logEvent calls outputFile.Sync() on every event, adding environment-dependent I/O latency that can
materially distort the phase/scenario/container timings being collected.
Code

acceptance/profile/profile.go[R95-104]

+func logEvent(data map[string]any) {
+	if outputFile == nil {
+		return
+	}
+	b, _ := json.Marshal(data)
+	fileMu.Lock()
+	outputFile.Write(append(b, '\n'))
+	outputFile.Sync()
+	fileMu.Unlock()
+}
Evidence
Each event does a synchronous write + Sync under a mutex, and Begin/BeginContainer/ScenarioEnd call
logEvent, so this overhead is incurred repeatedly throughout the test run; errors from
Marshal/Write/Sync are also ignored so failures are silent.

acceptance/profile/profile.go[95-104]
acceptance/profile/profile.go[106-122]
acceptance/profile/profile.go[124-160]
acceptance/profile/profile.go[170-188]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Calling `Sync()` per event can add non-trivial overhead (especially on disk-backed /tmp), inflating the timings you’re trying to measure.

## Issue Context
Durability on every line is usually unnecessary for CI profiling logs; buffering and flushing at the end typically provides enough safety with far less overhead.

## Fix Focus Areas
- acceptance/profile/profile.go[95-104]
- acceptance/profile/profile.go[324-333]

## Suggested fix approach
- Wrap `outputFile` with `bufio.Writer` and write JSONL lines to the buffer.
- Remove per-event `Sync()`; instead `Flush()` (and optionally `Sync()`) once in `doReport()` before closing.
- Check and handle marshal/write/flush errors (at least best-effort log to stderr once) to avoid silently losing profiling output.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



@codecov

codecov bot commented Apr 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag         Coverage Δ
acceptance   55.18% <ø> (+<0.01%) ⬆️
generative   17.90% <ø> (ø)
integration  26.65% <ø> (ø)
unit         69.01% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.


The profiling report written to stderr by go test was not captured in
GitHub Actions job logs. Add a step to print the report file and a step
to upload the profiling directory as an artifact so the data is always
accessible after CI runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>