Remainder of the OTel Exporter changes#43
Conversation
- Add opentelemetry-cpp as submodule and wire into CMake build (built before clickhouse-cpp to avoid duplicate abseil targets) - Add Findabsl.cmake shim so clickhouse-cpp reuses gRPC's abseil - Add GUCs: pg_stat_ch.use_otel, otel_endpoint, hostname - Fix OTel exporter: correct hostname GUC binding, string type fixes, rename factory to MakeOpenTelemetryExporter() - Decouple stats_exporter from clickhouse-cpp (no direct #include) - Update regression test expected output for new GUCs
When system gRPC is installed, OTel's grpc.cmake finds it via find_package(gRPC CONFIG QUIET) and skips FetchContent. This means abseil is never built as a module, so the absl::int128 target doesn't exist when clickhouse-cpp needs it. Set CMAKE_DISABLE_FIND_PACKAGE_gRPC=TRUE so OTel always uses FetchContent for gRPC + abseil, guaranteeing the targets exist.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…Docker Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Test both ./configure and Meson paths end-to-end; both pass 10/10 - Switch to Meson (cleaner, no PKG_CONFIG_PATH/bison/flex PATH exports needed) - Meson checks IPC::Run at configure time, so PERL5LIB must be set in shell profile rather than just at test-run time — document this clearly - Drop -j$(nproc) from ninja invocation (not available on macOS; ninja auto-parallelises anyway) - Point testing.md TAP section to docker-setup.md for prerequisites Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This interface is compatible with OpenTelemetry-style logging and aggregation instrumentation. Ideally, this change itself is a no-op.
This is an implementation of `StatsExporter` that emits statistics to OpenTelemetry as a combination of Dimensions (for tag data and string data) and Measurements (for numerical data). All columns are also emitted as log records, whether they are dimensions, measurements, or neither. Additionally, adds OpenTelemetry and absl as modules (courtesy of Claude).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- t/024_otel_export.pl: OTel export integration tests (basic export, batch sizing, immediate flush, metric labels, stats accuracy, connection failure handling) - t/025_otel_reconnect.pl: OTel reconnection and failure recovery tests - t/psch.pm: Add OTel helper functions (psch_init_node_with_otel, psch_otelcol_available, psch_get_otel_histogram_total, etc.) - docker/docker-compose.otel.yml: OTel collector container definition - docker/otel/collector-config.yaml: Collector config with Prometheus exporter on :9091 for test assertions - scripts/run-tests.sh: Add otel test type that auto-starts the collector if not running - mise.toml: Add test:otel, otel:start, otel:stop, otel:logs tasks Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- scripts/run-tests.sh: auto-prepend ~/perl5/lib/perl5 to PERL5LIB so cpanm-installed IPC::Run is found without shell profile changes - t/024_otel_export.pl: update metric names to include _unit suffix (OTel SDK appends unit to instrument name); replace target_info service_name assertion with job label check (target_info is not emitted by this collector config) - t/025_otel_reconnect.pl: remove send_failures assertion in the destructive subtest; OTel CommitBatch failures increment the exporter-internal consecutive_failures counter but do not call PschRecordExportFailure, so the shmem counter stays 0 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add missing <cstdlib>, <type_traits>, <unistd.h> includes - Fix unsigned comparison warning: use if constexpr(is_signed_v<T>) to guard val < 0 check in HistogramColumn::Append() - Add GUCs: pg_stat_ch.exporter_backend (clickhouse/opentelemetry) and pg_stat_ch.otel_endpoint (default localhost:4317) - Wire GUC into PschExporterInit(), replacing `if (false)` dead code - Fix hostname resolution: use gethostname() with HOSTNAME env fallback - Replace placeholder "description"/"unit" in OTel instrument creation - Document ignored `len` parameter in MetricFixedString - Optimize CounterColumn::Crunch() to avoid copying current_row_tags
This reverts commit c96a007.
There was a problem hiding this comment.
Pull request overview
Adds OpenTelemetry (OTLP gRPC) export support alongside the existing ClickHouse exporter, including new GUCs, build plumbing, Docker-based collector setup, and TAP integration tests.
Changes:
- Add OTel exporter implementation and runtime selection via
pg_stat_ch.use_otel - Add OTel collector Docker/Compose config + new TAP tests for export and reconnect behavior
- Refactor ClickHouse exporter into its own compilation unit; update build system and docs
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| third_party/opentelemetry-cpp | Adds OTel C++ SDK as a submodule dependency. |
| test/regression/expected/guc.out | Updates regression expectations for new OTel/hostname GUCs. |
| t/psch.pm | Adds TAP helper functions for OTel collector + Prometheus assertions. |
| t/024_otel_export.pl | New TAP test for basic OTel export, labels, and stats invariants. |
| t/025_otel_reconnect.pl | New TAP test for collector restart/reconnect behavior. |
| src/export/stats_exporter.cc | Selects exporter at init time and factors ClickHouse exporter out. |
| src/export/otel_exporter.h | Declares the OTel exporter factory. |
| src/export/otel_exporter.cc | Implements OTel logs+metrics export via OTLP gRPC. |
| src/export/clickhouse_exporter.h | Declares the ClickHouse exporter factory. |
| src/export/clickhouse_exporter.cc | Moves ClickHouse exporter implementation out of stats_exporter.cc. |
| src/config/guc.h | Adds new GUC declarations for OTel. |
| src/config/guc.cc | Adds new GUC definitions and registration (use_otel, endpoint, hostname). |
| scripts/run-tests.sh | Adds otel test target and PERL5LIB bootstrapping. |
| mise.toml | Adds mise tasks to run OTel tests and manage collector container. |
| include/config/guc.h | Exposes new OTel GUCs to external includes. |
| docs/testing.md | Updates ClickHouse test docs; references Docker/TAP setup doc. |
| docs/docker-setup.md | Adds Docker + TAP-enabled Postgres setup documentation. |
| docker/otel/collector-config.yaml | Adds collector config for OTLP receiver + Prometheus exporter. |
| docker/docker-compose.otel.yml | Adds compose file for local OTel collector used by tests. |
| cmake/Findabsl.cmake | Adds CMake shim for find_package(absl) with vendored abseil. |
| README.md | Updates testing docs pointers and Docker prerequisites wording. |
| CMakeLists.txt | Builds OTel SDK first (grpc/absl), then clickhouse-cpp; links OTel libs. |
| .gitmodules | Registers OTel C++ SDK submodule. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (val < 0) { | ||
| LogNegativeValue(name, static_cast<int64_t>(val)); | ||
| stash_val = 0; | ||
| } else { |
There was a problem hiding this comment.
LogNegativeValue is called here but there’s no visible declaration/definition in this translation unit (and the diff shows a new LogNegativeValue in stats_exporter.cc without a header declaration). This will fail to compile unless it’s declared via an included header. Consider moving LogNegativeValue into otel_exporter.cc (e.g., in the anonymous namespace) or adding a proper declaration in a shared header and ensuring it links from exactly one TU.
| system("docker compose -f $compose_file up -d") == 0 | ||
| or die "Failed to start OTel collector container"; |
There was a problem hiding this comment.
This uses system with a single string, which invokes a shell and interpolates $compose_file. If PROJECT_DIR (or the path) contains spaces or shell metacharacters, this can break or be exploited. Prefer the multi-arg system('docker','compose','-f',$compose_file,'up','-d') form (and similarly for down -v).
| use psch; | ||
|
|
||
| # Skip if Docker not available | ||
| if (!system("docker ps >/dev/null 2>&1") == 0) { |
There was a problem hiding this comment.
The Docker availability check relies on !system(...) == 0, which is correct but hard to read due to operator precedence. Please rewrite as system(...) != 0 (or capture the exit code explicitly) to avoid confusion and make intent clear. Same pattern appears in t/025_otel_reconnect.pl.
| if (!system("docker ps >/dev/null 2>&1") == 0) { | |
| if (system("docker ps >/dev/null 2>&1") != 0) { |
| string NoStringViews(std::string_view x) { return string{x}; } | ||
|
|
||
| private: | ||
| OTelExporter* exp; | ||
| std::string name; |
There was a problem hiding this comment.
For string-like attributes, OpenTelemetry C++ commonly stores attribute values as non-owning string views. NoStringViews(std::string_view) returns a temporary std::string, which may be converted/stored as an attribute string view and then dangle after the call. Consider storing a stable std::string member (similar to CounterColumn::stash_val) and passing that stable storage to SetAttribute, or use an owning attribute type (if available in your OTel SDK version).
| string NoStringViews(std::string_view x) { return string{x}; } | |
| private: | |
| OTelExporter* exp; | |
| std::string name; | |
| const std::string& NoStringViews(std::string_view x) { | |
| stash_val.assign(x); | |
| return stash_val; | |
| } | |
| private: | |
| OTelExporter* exp; | |
| std::string name; | |
| std::string stash_val; |
| } catch (const std::exception& e) { | ||
| // PschLog(LogLevel::Warning, "pg_stat_ch: OTel init failed: %s", e.what()); | ||
| return false; | ||
| } |
There was a problem hiding this comment.
On initialization failure, the exception is swallowed and the function returns false with no server-visible error. This makes diagnosing endpoint/config issues difficult. Please emit an elog(WARNING, ...) (or your project’s logging helper) with e.what() and consider recording an export failure so send_failures/stats reflect the error.
| | `tap` | Perl TAP tests (stress, concurrent, lifecycle) | PG built with `-Dtap_tests=enabled` ([see below](#tap-tests)) | `./scripts/run-tests.sh <pg_install> tap` | | ||
| | `isolation` | Race condition tests | None | `mise run test:isolation` | | ||
| | `stress` | High-load stress test with pgbench | None | `mise run test:stress` | | ||
| | `clickhouse` | ClickHouse integration tests | Docker | `mise run test:clickhouse` | | ||
| | `clickhouse` | ClickHouse integration tests | Docker + Docker Compose + TAP-enabled PG ([see setup guide](docker-setup.md)) | `./scripts/run-tests.sh ../postgres/install_tap clickhouse` | | ||
| | `all` | Run all tests | None (skips TAP if unavailable) | `mise run test:all` | |
There was a problem hiding this comment.
The docs list clickhouse but not the newly added otel test type (added in scripts/run-tests.sh and mise.toml). Please add an otel row describing prerequisites (Docker + TAP-enabled PG) and how to run it.
docs/docker-setup.md
Outdated
| @@ -0,0 +1,113 @@ | |||
| # Configuring Docker for ClickHouse Tests | |||
There was a problem hiding this comment.
This doc now effectively covers Docker + TAP-enabled PostgreSQL setup that also applies to OTel integration tests. Consider updating the title (and/or introductory paragraph) to reflect that it’s a shared prerequisite doc for both ClickHouse and OTel tests, so readers don’t assume it’s ClickHouse-only.
- Move GetAHostname before the anonymous namespace for external linkage - Remove the def() helper; inline the ternary at its one call site - Add test/unit/hostname_test.cc with 5 GTest cases for GetAHostname - Wire up GoogleTest via FetchContent (opt-in: PSCH_BUILD_UNIT_TESTS=ON) - Add mise task test:unit (builds into build_unit/) - Ignore build_unit/ in .gitignore Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: David E. Wheeler <46604+theory@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Add missing <cstdint> and <cstdlib> includes to hostname_test.cc - Fix CMakeLists comment: tests need OTel/PG headers but no running services Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| use psch; | ||
|
|
||
| # Skip if Docker not available | ||
| if (!system("docker ps >/dev/null 2>&1") == 0) { |
There was a problem hiding this comment.
This Docker availability check relies on !system(...) == 0, which is easy to misread and depends on operator precedence. Using system(...) != 0 (or reusing psch_otelcol_available() for a single skip path) would be clearer and less error-prone.
| if (!system("docker ps >/dev/null 2>&1") == 0) { | |
| if (system("docker ps >/dev/null 2>&1") != 0) { |
| # Prefix all metric names with "pg_stat_ch_" so that instrument names | ||
| # like "duration_us" become "pg_stat_ch_duration_us_count" etc. |
There was a problem hiding this comment.
The comment says instrument names like duration_us become pg_stat_ch_duration_us_count, but the tests/assertions in this PR query metrics like pg_stat_ch_duration_us_unit_*. Please align this comment with the actual exported Prometheus metric names so future test/debug work isn’t misled.
| # Prefix all metric names with "pg_stat_ch_" so that instrument names | |
| # like "duration_us" become "pg_stat_ch_duration_us_count" etc. | |
| # Prefix all metric names with "pg_stat_ch_"; instrument names like | |
| # "duration_us" are exported as "pg_stat_ch_duration_us_unit_*" etc. |
| my $project_dir = $ENV{PROJECT_DIR} // '.'; | ||
| my $compose_file = "$project_dir/docker/docker-compose.otel.yml"; | ||
|
|
||
| system("docker compose -f $compose_file down -v"); | ||
| } |
There was a problem hiding this comment.
Same as above: system("docker compose -f $compose_file down -v") uses the shell and can mis-handle paths with spaces/metacharacters. Use the multi-arg system form so the compose file path is passed safely as a single argument.
| if ($line =~ /^\Q${metric_base}\E_count(?:\{[^}]*\})?\s+(\d+(?:\.\d+)?)/) { | ||
| $total += $1; |
There was a problem hiding this comment.
The Prometheus text exposition format can emit numbers in scientific notation (e.g., 1e+06) as well as special values (NaN, +Inf). The current regex only matches plain decimals, so this helper can silently return 0/undercount if the formatter uses exponent notation. Consider parsing with a float regex that supports exponents (or using a numeric parser) so the helper is robust.
| if ($line =~ /^\Q${metric_base}\E_count(?:\{[^}]*\})?\s+(\d+(?:\.\d+)?)/) { | |
| $total += $1; | |
| if ($line =~ /^\Q${metric_base}\E_count(?:\{[^}]*\})?\s+([+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?|NaN|[+-]?Inf)/) { | |
| my $value = $1; | |
| next if $value eq 'NaN' || $value eq '+Inf' || $value eq '-Inf'; | |
| $total += $value; |
| if ($line =~ /^pg_stat_ch_duration_us_unit_sum(?:\{[^}]*\})?\s+(\d+(?:\.\d+)?)/) { | ||
| $sum += $1; |
There was a problem hiding this comment.
The regex used to parse Prometheus metric values only matches simple decimals, but Prometheus exposition may use scientific notation. If _sum is formatted as 1.23e+04, this will be missed and the test may fail/flap. Consider updating the value parsing to accept exponent notation (and optionally NaN/+Inf).
| if ($line =~ /^pg_stat_ch_duration_us_unit_sum(?:\{[^}]*\})?\s+(\d+(?:\.\d+)?)/) { | |
| $sum += $1; | |
| if ($line =~ /^pg_stat_ch_duration_us_unit_sum(?:\{[^}]*\})?\s+([+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?|NaN|[+-]Inf)/) { | |
| my $value = $1; | |
| next if $value eq 'NaN' || $value eq '+Inf' || $value eq '-Inf'; | |
| $sum += $value; |
Contains all remaining OTel Exporter changes, including GUC plumbing (OTelExport branch) and tests.