Skip to content

Remainder of the OTel Exporter changes#43

Merged
JoshDreamland merged 43 commits intomainfrom
OTelExport-Tests
Mar 5, 2026
Merged

Remainder of the OTel Exporter changes#43
JoshDreamland merged 43 commits intomainfrom
OTelExport-Tests

Conversation

@JoshDreamland
Copy link
Copy Markdown
Contributor

Contains all remaining OTel Exporter changes, including GUC plumbing (OTelExport branch) and tests.

JoshDreamland and others added 24 commits February 24, 2026 11:55
- Add opentelemetry-cpp as submodule and wire into CMake build
  (built before clickhouse-cpp to avoid duplicate abseil targets)
- Add Findabsl.cmake shim so clickhouse-cpp reuses gRPC's abseil
- Add GUCs: pg_stat_ch.use_otel, otel_endpoint, hostname
- Fix OTel exporter: correct hostname GUC binding, string type fixes,
  rename factory to MakeOpenTelemetryExporter()
- Decouple stats_exporter from clickhouse-cpp (no direct #include)
- Update regression test expected output for new GUCs
When system gRPC is installed, OTel's grpc.cmake finds it via
find_package(gRPC CONFIG QUIET) and skips FetchContent. This means
abseil is never built as a module, so the absl::int128 target
doesn't exist when clickhouse-cpp needs it.

Set CMAKE_DISABLE_FIND_PACKAGE_gRPC=TRUE so OTel always uses
FetchContent for gRPC + abseil, guaranteeing the targets exist.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…Docker

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Test both ./configure and Meson paths end-to-end; both pass 10/10
- Switch to Meson (cleaner, no PKG_CONFIG_PATH/bison/flex PATH exports needed)
- Meson checks IPC::Run at configure time, so PERL5LIB must be set in shell
  profile rather than just at test-run time — document this clearly
- Drop -j$(nproc) from ninja invocation (not available on macOS; ninja
  auto-parallelises anyway)
- Point testing.md TAP section to docker-setup.md for prerequisites

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This interface is compatible with OpenTelemetry-style logging and aggregation instrumentation.

Ideally, this change itself is a no-op.
This is an implementation of `StatsExporter` that emits statistics to OpenTelemetry as a combination of Dimensions (for tag data and string data) and Measurements (for numerical data). All columns are also emitted as log records, whether they are dimensions, measurements, or neither.

Additionally, adds OpenTelemetry and absl as modules (courtesy of Claude).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- t/024_otel_export.pl: OTel export integration tests (basic export,
  batch sizing, immediate flush, metric labels, stats accuracy,
  connection failure handling)
- t/025_otel_reconnect.pl: OTel reconnection and failure recovery tests
- t/psch.pm: Add OTel helper functions (psch_init_node_with_otel,
  psch_otelcol_available, psch_get_otel_histogram_total, etc.)
- docker/docker-compose.otel.yml: OTel collector container definition
- docker/otel/collector-config.yaml: Collector config with Prometheus
  exporter on :9091 for test assertions
- scripts/run-tests.sh: Add otel test type that auto-starts the
  collector if not running
- mise.toml: Add test:otel, otel:start, otel:stop, otel:logs tasks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- scripts/run-tests.sh: auto-prepend ~/perl5/lib/perl5 to PERL5LIB
  so cpanm-installed IPC::Run is found without shell profile changes
- t/024_otel_export.pl: update metric names to include _unit suffix
  (OTel SDK appends unit to instrument name); replace target_info
  service_name assertion with job label check (target_info is not
  emitted by this collector config)
- t/025_otel_reconnect.pl: remove send_failures assertion in the
  destructive subtest; OTel CommitBatch failures increment the
  exporter-internal consecutive_failures counter but do not call
  PschRecordExportFailure, so the shmem counter stays 0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add missing <cstdlib>, <type_traits>, <unistd.h> includes
- Fix unsigned comparison warning: use if constexpr(is_signed_v<T>)
  to guard val < 0 check in HistogramColumn::Append()
- Add GUCs: pg_stat_ch.exporter_backend (clickhouse/opentelemetry)
  and pg_stat_ch.otel_endpoint (default localhost:4317)
- Wire GUC into PschExporterInit(), replacing `if (false)` dead code
- Fix hostname resolution: use gethostname() with HOSTNAME env fallback
- Replace placeholder "description"/"unit" in OTel instrument creation
- Document ignored `len` parameter in MetricFixedString
- Optimize CounterColumn::Crunch() to avoid copying current_row_tags
Copilot AI review requested due to automatic review settings March 4, 2026 15:57
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds OpenTelemetry (OTLP gRPC) export support alongside the existing ClickHouse exporter, including new GUCs, build plumbing, Docker-based collector setup, and TAP integration tests.

Changes:

  • Add OTel exporter implementation and runtime selection via pg_stat_ch.use_otel
  • Add OTel collector Docker/Compose config + new TAP tests for export and reconnect behavior
  • Refactor ClickHouse exporter into its own compilation unit; update build system and docs

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
third_party/opentelemetry-cpp Adds OTel C++ SDK as a submodule dependency.
test/regression/expected/guc.out Updates regression expectations for new OTel/hostname GUCs.
t/psch.pm Adds TAP helper functions for OTel collector + Prometheus assertions.
t/024_otel_export.pl New TAP test for basic OTel export, labels, and stats invariants.
t/025_otel_reconnect.pl New TAP test for collector restart/reconnect behavior.
src/export/stats_exporter.cc Selects exporter at init time and factors ClickHouse exporter out.
src/export/otel_exporter.h Declares the OTel exporter factory.
src/export/otel_exporter.cc Implements OTel logs+metrics export via OTLP gRPC.
src/export/clickhouse_exporter.h Declares the ClickHouse exporter factory.
src/export/clickhouse_exporter.cc Moves ClickHouse exporter implementation out of stats_exporter.cc.
src/config/guc.h Adds new GUC declarations for OTel.
src/config/guc.cc Adds new GUC definitions and registration (use_otel, endpoint, hostname).
scripts/run-tests.sh Adds otel test target and PERL5LIB bootstrapping.
mise.toml Adds mise tasks to run OTel tests and manage collector container.
include/config/guc.h Exposes new OTel GUCs to external includes.
docs/testing.md Updates ClickHouse test docs; references Docker/TAP setup doc.
docs/docker-setup.md Adds Docker + TAP-enabled Postgres setup documentation.
docker/otel/collector-config.yaml Adds collector config for OTLP receiver + Prometheus exporter.
docker/docker-compose.otel.yml Adds compose file for local OTel collector used by tests.
cmake/Findabsl.cmake Adds CMake shim for find_package(absl) with vendored abseil.
README.md Updates testing docs pointers and Docker prerequisites wording.
CMakeLists.txt Builds OTel SDK first (grpc/absl), then clickhouse-cpp; links OTel libs.
.gitmodules Registers OTel C++ SDK submodule.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +155 to +158
if (val < 0) {
LogNegativeValue(name, static_cast<int64_t>(val));
stash_val = 0;
} else {
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LogNegativeValue is called here but there’s no visible declaration/definition in this translation unit (and the diff shows a new LogNegativeValue in stats_exporter.cc without a header declaration). This will fail to compile unless it’s declared via an included header. Consider moving LogNegativeValue into otel_exporter.cc (e.g., in the anonymous namespace) or adding a proper declaration in a shared header and ensuring it links from exactly one TU.

Copilot uses AI. Check for mistakes.
Comment on lines +194 to +195
system("docker compose -f $compose_file up -d") == 0
or die "Failed to start OTel collector container";
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses system with a single string, which invokes a shell and interpolates $compose_file. If PROJECT_DIR (or the path) contains spaces or shell metacharacters, this can break or be exploited. Prefer the multi-arg system('docker','compose','-f',$compose_file,'up','-d') form (and similarly for down -v).

Copilot uses AI. Check for mistakes.
use psch;

# Skip if Docker not available
if (!system("docker ps >/dev/null 2>&1") == 0) {
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Docker availability check relies on !system(...) == 0, which is correct but hard to read due to operator precedence. Please rewrite as system(...) != 0 (or capture the exit code explicitly) to avoid confusion and make intent clear. Same pattern appears in t/025_otel_reconnect.pl.

Suggested change
if (!system("docker ps >/dev/null 2>&1") == 0) {
if (system("docker ps >/dev/null 2>&1") != 0) {

Copilot uses AI. Check for mistakes.
Comment on lines +216 to +220
string NoStringViews(std::string_view x) { return string{x}; }

private:
OTelExporter* exp;
std::string name;
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For string-like attributes, OpenTelemetry C++ commonly stores attribute values as non-owning string views. NoStringViews(std::string_view) returns a temporary std::string, which may be converted/stored as an attribute string view and then dangle after the call. Consider storing a stable std::string member (similar to CounterColumn::stash_val) and passing that stable storage to SetAttribute, or use an owning attribute type (if available in your OTel SDK version).

Suggested change
string NoStringViews(std::string_view x) { return string{x}; }
private:
OTelExporter* exp;
std::string name;
const std::string& NoStringViews(std::string_view x) {
stash_val.assign(x);
return stash_val;
}
private:
OTelExporter* exp;
std::string name;
std::string stash_val;

Copilot uses AI. Check for mistakes.
Comment on lines +337 to +340
} catch (const std::exception& e) {
// PschLog(LogLevel::Warning, "pg_stat_ch: OTel init failed: %s", e.what());
return false;
}
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On initialization failure, the exception is swallowed and the function returns false with no server-visible error. This makes diagnosing endpoint/config issues difficult. Please emit an elog(WARNING, ...) (or your project’s logging helper) with e.what() and consider recording an export failure so send_failures/stats reflect the error.

Copilot uses AI. Check for mistakes.
Comment on lines 8 to 12
| `tap` | Perl TAP tests (stress, concurrent, lifecycle) | PG built with `-Dtap_tests=enabled` ([see below](#tap-tests)) | `./scripts/run-tests.sh <pg_install> tap` |
| `isolation` | Race condition tests | None | `mise run test:isolation` |
| `stress` | High-load stress test with pgbench | None | `mise run test:stress` |
| `clickhouse` | ClickHouse integration tests | Docker | `mise run test:clickhouse` |
| `clickhouse` | ClickHouse integration tests | Docker + Docker Compose + TAP-enabled PG ([see setup guide](docker-setup.md)) | `./scripts/run-tests.sh ../postgres/install_tap clickhouse` |
| `all` | Run all tests | None (skips TAP if unavailable) | `mise run test:all` |
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs list clickhouse but not the newly added otel test type (added in scripts/run-tests.sh and mise.toml). Please add an otel row describing prerequisites (Docker + TAP-enabled PG) and how to run it.

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,113 @@
# Configuring Docker for ClickHouse Tests
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc now effectively covers Docker + TAP-enabled PostgreSQL setup that also applies to OTel integration tests. Consider updating the title (and/or introductory paragraph) to reflect that it’s a shared prerequisite doc for both ClickHouse and OTel tests, so readers don’t assume it’s ClickHouse-only.

Copilot uses AI. Check for mistakes.
JoshDreamland and others added 12 commits March 4, 2026 12:07
- Move GetAHostname before the anonymous namespace for external linkage
- Remove the def() helper; inline the ternary at its one call site
- Add test/unit/hostname_test.cc with 5 GTest cases for GetAHostname
- Wire up GoogleTest via FetchContent (opt-in: PSCH_BUILD_UNIT_TESTS=ON)
- Add mise task test:unit (builds into build_unit/)
- Ignore build_unit/ in .gitignore

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: David E. Wheeler <46604+theory@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Add missing <cstdint> and <cstdlib> includes to hostname_test.cc
- Fix CMakeLists comment: tests need OTel/PG headers but no running services

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 4, 2026 20:46
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

use psch;

# Skip if Docker not available
if (!system("docker ps >/dev/null 2>&1") == 0) {
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Docker availability check relies on !system(...) == 0, which is easy to misread and depends on operator precedence. Using system(...) != 0 (or reusing psch_otelcol_available() for a single skip path) would be clearer and less error-prone.

Suggested change
if (!system("docker ps >/dev/null 2>&1") == 0) {
if (system("docker ps >/dev/null 2>&1") != 0) {

Copilot uses AI. Check for mistakes.
Comment on lines +11 to +12
# Prefix all metric names with "pg_stat_ch_" so that instrument names
# like "duration_us" become "pg_stat_ch_duration_us_count" etc.
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says instrument names like duration_us become pg_stat_ch_duration_us_count, but the tests/assertions in this PR query metrics like pg_stat_ch_duration_us_unit_*. Please align this comment with the actual exported Prometheus metric names so future test/debug work isn’t misled.

Suggested change
# Prefix all metric names with "pg_stat_ch_" so that instrument names
# like "duration_us" become "pg_stat_ch_duration_us_count" etc.
# Prefix all metric names with "pg_stat_ch_"; instrument names like
# "duration_us" are exported as "pg_stat_ch_duration_us_unit_*" etc.

Copilot uses AI. Check for mistakes.
Comment on lines +208 to +212
my $project_dir = $ENV{PROJECT_DIR} // '.';
my $compose_file = "$project_dir/docker/docker-compose.otel.yml";

system("docker compose -f $compose_file down -v");
}
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: system("docker compose -f $compose_file down -v") uses the shell and can mis-handle paths with spaces/metacharacters. Use the multi-arg system form so the compose file path is passed safely as a single argument.

Copilot uses AI. Check for mistakes.
Comment on lines +256 to +257
if ($line =~ /^\Q${metric_base}\E_count(?:\{[^}]*\})?\s+(\d+(?:\.\d+)?)/) {
$total += $1;
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Prometheus text exposition format can emit numbers in scientific notation (e.g., 1e+06) as well as special values (NaN, +Inf). The current regex only matches plain decimals, so this helper can silently return 0/undercount if the formatter uses exponent notation. Consider parsing with a float regex that supports exponents (or using a numeric parser) so the helper is robust.

Suggested change
if ($line =~ /^\Q${metric_base}\E_count(?:\{[^}]*\})?\s+(\d+(?:\.\d+)?)/) {
$total += $1;
if ($line =~ /^\Q${metric_base}\E_count(?:\{[^}]*\})?\s+([+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?|NaN|[+-]?Inf)/) {
my $value = $1;
next if $value eq 'NaN' || $value eq '+Inf' || $value eq '-Inf';
$total += $value;

Copilot uses AI. Check for mistakes.
Comment on lines +117 to +118
if ($line =~ /^pg_stat_ch_duration_us_unit_sum(?:\{[^}]*\})?\s+(\d+(?:\.\d+)?)/) {
$sum += $1;
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex used to parse Prometheus metric values only matches simple decimals, but Prometheus exposition may use scientific notation. If _sum is formatted as 1.23e+04, this will be missed and the test may fail/flap. Consider updating the value parsing to accept exponent notation (and optionally NaN/+Inf).

Suggested change
if ($line =~ /^pg_stat_ch_duration_us_unit_sum(?:\{[^}]*\})?\s+(\d+(?:\.\d+)?)/) {
$sum += $1;
if ($line =~ /^pg_stat_ch_duration_us_unit_sum(?:\{[^}]*\})?\s+([+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?|NaN|[+-]Inf)/) {
my $value = $1;
next if $value eq 'NaN' || $value eq '+Inf' || $value eq '-Inf';
$sum += $value;

Copilot uses AI. Check for mistakes.
@JoshDreamland JoshDreamland merged commit 00b7ada into main Mar 5, 2026
11 checks passed
@JoshDreamland JoshDreamland deleted the OTelExport-Tests branch March 5, 2026 02:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants