Skip to content

Improve config reload error reporting with severity-aware task logs#13090

Open
brbzull0 wants to merge 5 commits intoapache:masterfrom
brbzull0:config-reload-error-reporting
Open

Improve config reload error reporting with severity-aware task logs#13090
brbzull0 wants to merge 5 commits intoapache:masterfrom
brbzull0:config-reload-error-reporting

Conversation

@brbzull0
Copy link
Copy Markdown
Contributor

@brbzull0 brbzull0 commented Apr 15, 2026

Unify diagnostic logging (diags.log) with ConfigContext task logs (traffic_ctl config status) via CfgLoad* macros.
Each task log entry now carries a DiagsLevel severity, enabling --min-level filtering in traffic_ctl and severity tags in output ([Note], [Err], etc). Reload summaries are logged to diags.log after a grace period, with detailed per-subtask dumps under the config.reload debug tag.

the entire idea is to get the following sort of visibility:

Successful reload

$ traffic_ctl config status

✔ Reload [success] — rldtk-1739808000000
  Started : 2026 Apr 14 12:00:00.123
  Finished: 2026 Apr 14 12:00:00.368
  Duration: 245ms

  ✔ 11 success  ◌ 0 in-progress  ✗ 0 failed  (11 total)

  Tasks:
   ✔ ip_allow.yaml ·································  18ms
      [Note]  ip_allow.yaml loading ...
      [Note]  ip_allow.yaml finished loading
   ✔ ssl_client_coordinator ·························   2ms
   │  [Note]  SSL configs reloaded
   ├─ ✔ SSLConfig ···································   0ms
   │     [Note]  SSLConfig loading ...
   │     [Dbg]   Reload SSLConfig
   │     [Note]  SSLConfig reloaded
   ├─ ✔ SNIConfig ···································   1ms
   │     [Note]  sni.yaml loading ...
   │     [Note]  sni.yaml finished loading
   └─ ✔ SSLCertificateConfig ························ 119ms
         [Note]  (ssl) ssl_multicert.yaml loading ...
         [Note]  (ssl) ssl_multicert.yaml finished loading
   ✔ ssl_ticket_key ·································   1ms
      [Note]  SSL ticket key loading ...
      [Note]  SSL ticket key reloaded
   ...

Failed reload — SNI parse error

$ traffic_ctl config status -t hotfix-cert

✗ Reload [fail] — hotfix-cert
  Started : 2026 Apr 14 14:30:10.500
  Finished: 2026 Apr 14 14:30:10.810
  Duration: 310ms

  ✔ 9 success  ◌ 0 in-progress  ✗ 2 failed  (11 total)

  Tasks:
   ✔ ip_allow.yaml ·································  18ms
      [Note]  ip_allow.yaml loading ...
      [Note]  ip_allow.yaml finished loading
   ✗ ssl_client_coordinator ·························  85ms  ✗ FAIL
   │  [Note]  SSL configs reloaded
   ├─ ✔ SSLConfig ···································  10ms
   │     [Note]  SSLConfig loading ...
   │     [Dbg]   Reload SSLConfig
   │     [Note]  SSLConfig reloaded
   ├─ ✗ SNIConfig ···································  12ms  ✗ FAIL
   │     [Note]  sni.yaml loading ...
   │     [Err]   sni.yaml failed to load
   └─ ✔ SSLCertificateConfig ························  13ms
         [Note]  (ssl) ssl_multicert.yaml loading ...
         [Note]  (ssl) ssl_multicert.yaml finished loading
   ...

Failed reload — partial SSL cert failures

$ traffic_ctl config status -t ssl-bulk-partial

✗ Reload [fail] — ssl-bulk-partial
  Started : 2026 Apr 14 12:29:19.586
  Finished: 2026 Apr 14 12:29:19.700
  Duration: 114ms

  ✔ 5 success  ◌ 0 in-progress  ✗ 2 failed  (8 total)

  Tasks:
   ✗ ssl_client_coordinator ·························  106ms  ✗ FAIL
   │  [Note]  SSL configs reloaded
   ├─ ✔ SSLConfig ···································    1ms
   │     [Note]  SSLConfig loading ...
   │     [Dbg]   Reload SSLConfig
   │     [Note]  SSLConfig reloaded
   ├─ ✔ SNIConfig ···································    0ms
   │     [Note]  sni.yaml loading ...
   │     [Note]  sni.yaml finished loading
   └─ ✗ SSLCertificateConfig ························  105ms  ✗ FAIL
         [Note]  (ssl) ssl_multicert.yaml loading ...
         [Err]   Failed to load certificate 'cert-05.pem' at item 5
         [Err]   Failed to load certificate 'cert-09.pem' at item 9
         [Err]   Failed to load certificate 'cert-17.pem' at item 17
         [Err]   (ssl) ssl_multicert.yaml failed to load
   ✔ ssl_ticket_key ·································    0ms
      [Note]  SSL ticket key loading ...
      [Note]  SSL ticket key reloaded

--min-level filtering — errors and warnings only

$ traffic_ctl config status -t ssl-bulk-partial --min-level warning

✗ Reload [fail] — ssl-bulk-partial
  ...
  Tasks:
   ✗ ssl_client_coordinator ·························  106ms  ✗ FAIL
   ├─ ✔ SSLConfig ···································    1ms
   ├─ ✔ SNIConfig ···································    0ms
   └─ ✗ SSLCertificateConfig ························  105ms  ✗ FAIL
         [Err]   Failed to load certificate 'cert-05.pem' at item 5
         [Err]   Failed to load certificate 'cert-09.pem' at item 9
         [Err]   Failed to load certificate 'cert-17.pem' at item 17
         [Err]   (ssl) ssl_multicert.yaml failed to load
   ✔ ssl_ticket_key ·································    0ms

diags.log — one-line reload summaries

NOTE: Config reload [my-token] completed successfully: 3 tasks succeeded (3 total)
WARNING: Config reload [my-token] finished with failures: 1 succeeded, 1 failed (3 total) — run: traffic_ctl config status -t my-token

diags.log — detailed dump via config.reload debug tag

DIAG: (config.reload)   [fail] ssl_client_coordinator
DIAG: (config.reload)     [Note] SSL configs reloaded
DIAG: (config.reload)     [success] SSLConfig
DIAG: (config.reload)       [Note] SSLConfig loading ...
DIAG: (config.reload)       [Dbg] Reload SSLConfig
DIAG: (config.reload)       [Note] SSLConfig reloaded
DIAG: (config.reload)     [fail] SNIConfig
DIAG: (config.reload)       [Note] sni.yaml loading ...
DIAG: (config.reload)       [Err] sni.yaml failed to load
DIAG: (config.reload)   [success] ssl_ticket_key
DIAG: (config.reload)     [Note] SSL ticket key loading ...
DIAG: (config.reload)     [Note] SSL ticket key reloaded

Severity tags

Tag Level Source
[Dbg] Debug CfgLoadDbg macro
[Note] Note CfgLoadInProgress, CfgLoadComplete, CfgLoadLog(DL_Note), ctx.in_progress(text), ctx.complete(text)
[Warn] Warning CfgLoadFail(DL_Warning), CfgLoadLog(DL_Warning)
[Err] Error CfgLoadFail(DL_Error), ctx.fail(text)

Fixes: #12963

Damian Meden added 3 commits April 15, 2026 12:12
Unify diagnostic logging (diags.log) with ConfigContext task logs
(traffic_ctl config status) via CfgLoad* macros. Each task log
entry now carries a DiagsLevel severity, enabling --min-level
filtering in traffic_ctl and severity tags in output ([Note],
[Err], etc). Reload summaries are logged to diags.log after a
grace period, with detailed per-subtask dumps under the
config.reload debug tag.

Fixes: apache#12963
@brbzull0 brbzull0 self-assigned this Apr 15, 2026
@brbzull0 brbzull0 changed the title Config reload error reporting Improve config reload error reporting with severity-aware task logs Apr 15, 2026
@brbzull0 brbzull0 added this to the 11.0.0 milestone Apr 15, 2026
@brbzull0 brbzull0 marked this pull request as ready for review April 16, 2026 15:55
@bryancall bryancall requested review from cmcfarlen and Copilot April 20, 2026 22:39
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR unifies configuration reload diagnostics between diags.log and traffic_ctl config status by introducing severity-aware task logs and shared CfgLoad* logging macros, plus adding CLI filtering for displayed log severities.

Changes:

  • Add CfgLoad* macros to emit a single formatted message to both diags and ConfigContext task logs, including errata-aware failures.
  • Extend reload task logs to carry DiagsLevel severity end-to-end (task storage, YAML encode/decode, traffic_ctl rendering with tags, and --min-level filtering).
  • Add post-reload summary logging (and optional detailed tree dump under the config.reload debug tag) after a grace period.

Reviewed changes

Copilot reviewed 43 out of 43 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/traffic_server/traffic_server.cc Switch records reload reporting to CfgLoad* macros.
src/traffic_ctl/traffic_ctl.cc Add config status --min-level CLI option.
src/traffic_ctl/jsonrpc/ctrl_yaml_codecs.h Decode new structured log entries {level,text} with backward compatibility for string logs.
src/traffic_ctl/jsonrpc/CtrlRPCRequests.h Add LogEntry{DiagsLevel,text} to reload response model.
src/traffic_ctl/CtrlPrinters.h Track printer _min_level for filtering task logs.
src/traffic_ctl/CtrlPrinters.cc Render severity tags and apply --min-level filtering in tree output.
src/traffic_ctl/CtrlCommands.cc Parse --min-level and configure ConfigReloadPrinter.
src/records/unit_tests/test_ConfigReloadTask.cc Update tests for structured log entries (entry.text).
src/proxy/logging/YamlLogConfig.cc Route YAML logging config parse errors through reload context logging macros.
src/proxy/logging/LogConfig.cc Convert logging reconfigure messages to unified reload/diags logging and adjust “missing file” handling.
src/proxy/http/remap/UrlRewrite.cc Thread ConfigContext through remap load/build and emit unified reload log messages.
src/proxy/http/remap/RemapYamlConfig.cc Add ConfigContext to YAML remap parsing and log parse failures via reload macros.
src/proxy/http/remap/RemapConfig.cc Add ConfigContext to remap.config parsing and log parse/open errors via reload macros.
src/proxy/http/PreWarmConfig.cc Minor reload completion message tweak.
src/proxy/ReverseProxy.cc Convert remap reload to pass ConfigContext and use CfgLoad* macros.
src/proxy/ParentSelection.cc Pass ConfigContext into parent table load and use unified logging.
src/proxy/IPAllow.cc Convert ip_allow reload messaging to CfgLoad* macros, including errata-aware failures.
src/proxy/ControlMatcher.cc Add ConfigContext plumb-through for table builds and log parsing errors through reload macros.
src/proxy/CacheControl.cc Pass ConfigContext into cache control table load and use unified logging.
src/mgmt/config/ConfigReloadTrace.cc Add reload summary logging and debug-dump of subtask tree; store severity with log entries.
src/mgmt/config/ConfigContext.cc Add severity-aware ctx.log(level, ...) and ctx.log(errata); assign implicit severities on state messages.
src/iocore/net/quic/QUICConfig.cc Plumb ConfigContext into QUIC config initialization and log SSL init failures into task logs.
src/iocore/net/SSLUtils.cc Improve SSL multicert error detail (include cert name when available).
src/iocore/net/SSLSNIConfig.cc Plumb ConfigContext through SNI reload and emit unified reload messages.
src/iocore/net/SSLConfig.cc Plumb ConfigContext into SSL config and ticket key reload paths; emit severity-tagged task logs.
src/iocore/net/QUICMultiCertConfigLoader.cc Emit QUIC multicert reload progress/fail/complete via CfgLoad* macros and log errata into task logs.
src/iocore/net/P_SSLConfig.h Update SSL config APIs to accept optional ConfigContext.
src/iocore/dns/SplitDNS.cc Convert SplitDNS reload messages to unified macros and pass ConfigContext into table load.
src/iocore/cache/P_CacheHosting.h Plumb ConfigContext into cache hosting table build APIs.
src/iocore/cache/CacheHosting.cc Convert cache hosting load logs to unified macros and accept ConfigContext for better reload visibility.
src/iocore/cache/Cache.cc Pass reload ConfigContext into CacheHostTable build and emit unified completion message.
include/records/YAMLConfigReloadTaskEncoder.h Encode reload logs as structured YAML entries with level and text.
include/proxy/http/remap/UrlRewrite.h Update public signatures to accept optional ConfigContext.
include/proxy/http/remap/RemapYamlConfig.h Add optional ConfigContext to YAML remap parsing interfaces.
include/proxy/http/remap/RemapConfig.h Add optional ConfigContext to remap.config parsing interfaces.
include/proxy/ControlMatcher.h Add optional ConfigContext to matcher build interfaces and constructor.
include/mgmt/config/ConfigReloadTrace.h Change task log storage to LogEntry{level,text} and add summary/dump helpers.
include/mgmt/config/ConfigContextDiags.h New header providing unified diags + reload-task logging macros (CfgLoad*).
include/mgmt/config/ConfigContext.h Add severity-aware logging APIs and include diag-level definitions.
include/iocore/net/quic/QUICConfig.h Update QUIC config initialization API to accept optional ConfigContext.
include/iocore/net/SSLSNIConfig.h Update SNI config initialization APIs to accept optional ConfigContext.
doc/developer-guide/config-reload-framework.en.rst Document unified macros, severity-aware task logs, filtering, and diags summaries/dumps.
doc/appendices/command-line/traffic_ctl.en.rst Document --min-level and updated config status output examples.


.. code-block:: text

NOTE: Config reload [my-token] completed successfully: 3 tasks succeeded (3 total)
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documented success summary line here ("completed successfully: 3 tasks succeeded (3 total)") doesn’t match the current implementation in ConfigReloadTask::log_reload_summary() (which logs "completed: %d/%d tasks succeeded"). Please align the documentation with the actual output, or adjust the code to emit the documented message format.

Suggested change
NOTE: Config reload [my-token] completed successfully: 3 tasks succeeded (3 total)
NOTE: Config reload [my-token] completed: 3/3 tasks succeeded

Copilot uses AI. Check for mistakes.
Comment on lines +300 to +306
ConfigReloadTask::log_reload_summary(State final_state)
{
if (!_info.main_task || !is_terminal(final_state) || _summary_logged) {
return;
}
_summary_logged = true;

Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConfigReloadTask::log_reload_summary reads/writes _info (e.g., _info.main_task, _info.sub_tasks, _info.token) and _summary_logged without holding _mutex. Other methods in this class treat _info as protected by _mutex, and readers (e.g., status queries) can run concurrently, so this risks data races. Consider taking a lock (or copying via get_info() under lock) and making _summary_logged atomic or also guarded by the same lock.

Copilot uses AI. Check for mistakes.
}

if (final_state == State::SUCCESS) {
Note("Config reload [%s] completed: %d/%d tasks succeeded", _info.token.c_str(), success_count, total);
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The success summary log text here ("completed: %d/%d tasks succeeded") does not match the documented example in doc/developer-guide/config-reload-framework.en.rst ("completed successfully: 3 tasks succeeded (3 total)"). Either update the log format to match the docs/examples or adjust the documentation so operators can grep/recognize the exact message.

Suggested change
Note("Config reload [%s] completed: %d/%d tasks succeeded", _info.token.c_str(), success_count, total);
Note("Config reload [%s] completed successfully: %d tasks succeeded (%d total)", _info.token.c_str(), success_count,
total);

Copilot uses AI. Check for mistakes.
Comment on lines +175 to +179
/// Entries from log() carry a DiagsLevel; entries from state-change methods
/// (in_progress, complete, fail) use DL_Undefined (the task state itself
/// conveys severity).
struct LogEntry {
DiagsLevel level{DL_Undefined}; ///< DL_Undefined for state-change messages
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment says state-change methods (in_progress/complete/fail) store log entries with DL_Undefined, but ConfigContext now records DL_Note for in_progress/complete messages and DL_Error for fail(reason) messages (see src/mgmt/config/ConfigContext.cc). Please update the comment to reflect the new severity behavior so it stays consistent with the implementation.

Suggested change
/// Entries from log() carry a DiagsLevel; entries from state-change methods
/// (in_progress, complete, fail) use DL_Undefined (the task state itself
/// conveys severity).
struct LogEntry {
DiagsLevel level{DL_Undefined}; ///< DL_Undefined for state-change messages
/// Entries from log() carry a DiagsLevel; state-change methods use
/// DL_Note for in_progress/complete messages and DL_Error for fail()
/// messages.
struct LogEntry {
DiagsLevel level{DL_Undefined}; ///< log() supplies the level; state changes use DL_Note or DL_Error

Copilot uses AI. Check for mistakes.
Comment on lines +271 to +281
if (!min_level.empty()) {
static const std::unordered_map<std::string_view, DiagsLevel> level_map = {
{"debug", DL_Debug },
{"note", DL_Note },
{"warning", DL_Warning},
{"error", DL_Error },
};
if (auto it = level_map.find(min_level); it != level_map.end()) {
_printer->as<ConfigReloadPrinter>()->set_min_level(it->second);
} else {
_printer->write_output("Invalid --min-level value. Use: debug, note, warning, error");
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--min-level parsing is currently case-sensitive (level_map.find(min_level)), but the CLI docs describe the values as case-insensitive. Consider normalizing the input (e.g., lowercase) before lookup (and possibly trimming) so --min-level Warning works as documented.

Copilot uses AI. Check for mistakes.
Emergency("Can't initialize the SSL client, HTTPS in remap rules will not function");
} else {
SSLError("Can't initialize the SSL client, HTTPS in remap rules will not function");
CfgLoadLog(ctx, DL_Warning, "Can't initialize the SSL client, HTTPS in remap rules will not function");
Copy link

Copilot AI Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logs the same message twice to diags: once via SSLError(...) (which already emits to the diagnostics system) and again via CfgLoadLog(...) (which also emits to diags via DiagsError). If the goal is to get the message into the reload task log, consider using ctx.log(DL_Warning, ...) (task-log-only) instead of CfgLoadLog, or otherwise avoid duplicating the diags entry.

Suggested change
CfgLoadLog(ctx, DL_Warning, "Can't initialize the SSL client, HTTPS in remap rules will not function");
ctx.log(DL_Warning, "Can't initialize the SSL client, HTTPS in remap rules will not function");

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ATS Reload migration task. Improve error reporting

2 participants