Skip to content

Conversation

@freemandealer
Copy link
Contributor

@freemandealer freemandealer commented Jan 16, 2026

Use shared_ptr instead of a raw pointer to extend the lifespan of the ioctx until the readers are destroyed
to avoid use-after-free crash.

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

use shared_ptr instead of raw pointer to manage lifespan
of ioctx in storage layer.

Signed-off-by: zhengyu <[email protected]>
@Thearas
Copy link
Contributor

Thearas commented Jan 16, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@freemandealer
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 32771 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b6e8f6952cfb1184fa62c9bae14b599a3f43c277, data reload: false

------ Round 1 ----------------------------------
q1	17616	4228	4074	4074
q2	2154	407	251	251
q3	9968	1267	740	740
q4	10210	842	318	318
q5	7584	2165	1953	1953
q6	202	182	141	141
q7	945	810	663	663
q8	9278	1499	1206	1206
q9	5034	4613	4560	4560
q10	6746	1817	1420	1420
q11	541	321	290	290
q12	705	761	614	614
q13	17783	3912	3157	3157
q14	296	310	272	272
q15	592	530	505	505
q16	712	706	638	638
q17	742	803	548	548
q18	6890	6594	7192	6594
q19	1197	1079	744	744
q20	471	416	281	281
q21	3416	2748	2777	2748
q22	1166	1108	1054	1054
Total cold run time: 104248 ms
Total hot run time: 32771 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4577	4361	4411	4361
q2	335	410	346	346
q3	2319	2824	2519	2519
q4	1477	1955	1422	1422
q5	4498	4430	4351	4351
q6	226	167	123	123
q7	1982	1973	1895	1895
q8	2658	2471	2423	2423
q9	7336	7409	7084	7084
q10	2527	2799	2306	2306
q11	567	473	464	464
q12	724	750	625	625
q13	3604	3845	3089	3089
q14	271	276	274	274
q15	525	487	469	469
q16	622	690	626	626
q17	1173	1273	1316	1273
q18	7308	7371	7445	7371
q19	910	859	872	859
q20	1881	2032	1818	1818
q21	4603	4363	4206	4206
q22	1093	1054	1007	1007
Total cold run time: 51216 ms
Total hot run time: 48911 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 177988 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b6e8f6952cfb1184fa62c9bae14b599a3f43c277, data reload: false

query5	4846	645	528	528
query6	385	247	225	225
query7	4244	483	274	274
query8	371	269	262	262
query9	8771	3167	3232	3167
query10	498	447	337	337
query11	15320	15191	14955	14955
query12	187	124	117	117
query13	1276	527	420	420
query14	6378	3167	2962	2962
query14_1	2882	2671	2792	2671
query15	212	196	181	181
query16	996	519	498	498
query17	1099	713	597	597
query18	2613	466	357	357
query19	239	232	200	200
query20	137	119	121	119
query21	227	147	131	131
query22	3990	4200	4139	4139
query23	16305	15725	15406	15406
query23_1	15567	15633	15563	15563
query24	7193	1585	1206	1206
query24_1	1241	1220	1213	1213
query25	568	481	436	436
query26	1303	276	167	167
query27	2754	461	298	298
query28	4502	2349	2362	2349
query29	759	567	440	440
query30	330	251	214	214
query31	847	650	598	598
query32	94	78	78	78
query33	556	367	318	318
query34	955	920	566	566
query35	759	805	672	672
query36	958	975	907	907
query37	159	104	93	93
query38	2738	2683	2726	2683
query39	775	765	769	765
query39_1	720	754	709	709
query40	221	143	123	123
query41	73	68	68	68
query42	111	107	113	107
query43	491	510	453	453
query44	1418	806	804	804
query45	192	188	184	184
query46	894	974	608	608
query47	1504	1549	1399	1399
query48	339	350	265	265
query49	614	453	359	359
query50	666	285	221	221
query51	3812	3799	3830	3799
query52	106	109	139	109
query53	293	330	278	278
query54	310	282	263	263
query55	90	84	84	84
query56	337	327	334	327
query57	1084	1022	947	947
query58	281	272	266	266
query59	2325	2356	2301	2301
query60	352	362	341	341
query61	166	162	161	161
query62	413	365	324	324
query63	306	273	259	259
query64	4851	1328	1028	1028
query65	3829	3766	3768	3766
query66	1441	450	337	337
query67	15813	15756	15673	15673
query68	2612	1176	832	832
query69	470	373	347	347
query70	1051	990	936	936
query71	337	323	309	309
query72	5454	3373	3597	3373
query73	684	757	356	356
query74	8883	8829	8619	8619
query75	2820	2868	2442	2442
query76	2311	1090	694	694
query77	383	405	326	326
query78	10034	10093	9265	9265
query79	1327	973	623	623
query80	1367	607	520	520
query81	574	268	240	240
query82	1029	152	117	117
query83	376	269	260	260
query84	267	124	108	108
query85	938	538	478	478
query86	437	284	318	284
query87	2906	2878	2765	2765
query88	3884	2882	2839	2839
query89	392	375	350	350
query90	1930	186	170	170
query91	191	189	148	148
query92	84	74	71	71
query93	1214	971	591	591
query94	663	331	307	307
query95	611	353	324	324
query96	735	539	254	254
query97	2358	2423	2351	2351
query98	214	209	204	204
query99	620	604	556	556
Total cold run time: 252603 ms
Total hot run time: 177988 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 27.5 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b6e8f6952cfb1184fa62c9bae14b599a3f43c277, data reload: false

query1	0.06	0.05	0.05
query2	0.12	0.05	0.05
query3	0.26	0.09	0.09
query4	1.61	0.12	0.11
query5	0.28	0.25	0.27
query6	1.15	0.68	0.65
query7	0.04	0.02	0.02
query8	0.06	0.04	0.04
query9	0.59	0.50	0.50
query10	0.58	0.56	0.57
query11	0.16	0.10	0.11
query12	0.16	0.12	0.12
query13	0.60	0.59	0.58
query14	0.96	0.95	0.96
query15	0.81	0.80	0.80
query16	0.40	0.42	0.41
query17	1.06	1.11	1.01
query18	0.24	0.22	0.21
query19	1.99	1.77	1.85
query20	0.02	0.01	0.02
query21	15.46	0.30	0.14
query22	5.19	0.06	0.05
query23	16.10	0.28	0.11
query24	1.07	0.63	0.60
query25	0.09	0.08	0.07
query26	0.16	0.15	0.15
query27	0.07	0.06	0.05
query28	3.25	1.09	0.91
query29	12.51	4.15	3.34
query30	0.28	0.14	0.13
query31	2.82	0.66	0.40
query32	3.25	0.57	0.47
query33	3.09	3.03	3.09
query34	16.19	5.07	4.44
query35	4.54	4.44	4.46
query36	0.67	0.52	0.52
query37	0.12	0.07	0.06
query38	0.08	0.05	0.04
query39	0.06	0.03	0.03
query40	0.18	0.14	0.14
query41	0.09	0.03	0.03
query42	0.05	0.03	0.03
query43	0.05	0.04	0.04
Total cold run time: 96.52 s
Total hot run time: 27.5 s

@hello-stephen
Copy link
Contributor

BE UT Coverage Report

Increment line coverage 17.53% (27/154) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.99% (19000/35859)
Line Coverage 39.03% (176089/451166)
Region Coverage 33.60% (136290/405675)
Branch Coverage 34.66% (58971/170157)

@dataroaring
Copy link
Contributor

Code Review: fix fix IOContext Use-After-Free

Summary

This PR addresses a use-after-free crash by changing IOContext lifetime management from raw pointers to shared_ptr. The core issue is that IOContext objects were being destroyed while file readers still held references to them.

★ Insight ─────────────────────────────────────
This is a classic C++ lifetime management problem. When async operations (like prefetch) capture raw pointers to stack-allocated objects, the objects may be destroyed before the async operation completes. Using shared_ptr with weak capture or explicit lifetime extension is the standard solution.
─────────────────────────────────────────────────


Critical Issues

1. Incomplete Fix - IOContext Contains Pointers (HIGH)

The backward-compatible API copies IOContext to the heap:

// buffered_reader.cpp:853-856
if (io_ctx != nullptr) {
    // Old API: best-effort safety by copying the IOContext onto the heap.
    io_ctx_holder = std::make_shared<IOContext>(*io_ctx);
}

However, IOContext contains pointer members that point to other objects:

// From io_common.h (inferred from internal_service.cpp changes)
io_ctx->file_cache_stats = &file_cache_statis;  // raw pointer
io_ctx->file_reader_stats = &file_reader_stats; // raw pointer

Copying IOContext copies these pointers, not the objects they point to. If the original FileCacheStatistics or FileReaderStats objects are destroyed, the copied IOContext now has dangling pointers.

Evidence: The internal_service.cpp change shows these need to also be heap-allocated:

auto file_cache_statis = std::make_shared<io::FileCacheStatistics>();
auto file_reader_stats = std::make_shared<io::FileReaderStats>();
io_ctx->file_cache_stats = file_cache_statis.get();
io_ctx->file_reader_stats = file_reader_stats.get();

Recommendation:

  • Document that the "old API" copy is NOT safe if IOContext contains pointers to stack-allocated stats objects
  • Consider making IOContext store shared_ptr to stats instead of raw pointers
  • Or add validation/assertion in the copy path

2. Dual-Ownership Pattern is Error-Prone (MEDIUM)

The pattern of keeping both _io_ctx_holder (shared_ptr) and _io_ctx (raw pointer) is fragile:

std::shared_ptr<const IOContext> _io_ctx_holder;
const IOContext* _io_ctx = nullptr;

With initialization like:

_io_ctx_holder(std::move(io_ctx_holder)),
_io_ctx(_io_ctx_holder.get()),

Risk: If _io_ctx_holder is moved from or reset, _io_ctx becomes dangling.

Recommendation: Either:

  • Always access through _io_ctx_holder.get() (no raw pointer member), or
  • Use a getter method that returns _io_ctx_holder.get()

3. Inconsistent Constructor Patterns (MEDIUM)

The constructors have inconsistent handling:

CsvReader:

CsvReader(..., io::IOContext* io_ctx, std::shared_ptr<io::IOContext> io_ctx_holder = nullptr)
    : ..., _io_ctx(io_ctx), _io_ctx_holder(std::move(io_ctx_holder)) {
    if (_io_ctx == nullptr && _io_ctx_holder) {
        _io_ctx = _io_ctx_holder.get();
    }
}

This allows callers to pass mismatched parameters (raw pointer to one object, shared_ptr to another).

Recommendation: Use a single parameter - either always shared_ptr or a factory method that handles both cases.


Code Quality Issues

4. Code Duplication in _create_file_reader

Multiple files have this duplicated pattern:

io::FileReaderSPtr file_reader;
if (_io_ctx_holder) {
    file_reader = DORIS_TRY(io::DelegateReader::create_file_reader(
            ..., std::static_pointer_cast<const io::IOContext>(_io_ctx_holder), ...));
} else {
    file_reader = DORIS_TRY(io::DelegateReader::create_file_reader(
            ..., _io_ctx, ...));
}

Recommendation: Centralize this in a helper or always use the shared_ptr overload (which already handles nullptr).

5. Missing Documentation

The PR description lacks:

  • Issue number (what crash was observed?)
  • Stack trace or reproduction steps
  • Which code paths triggered the UAF

Recommendation: Add details for future debugging reference.


Positive Aspects

  1. Backward Compatibility: The old API signature is preserved with best-effort safety
  2. Consistent Pattern: All file readers (CSV, JSON, ORC, Parquet) are updated
  3. Benchmark Results: No performance regression observed (TPC-H, TPC-DS, ClickBench results look normal)

Suggestions

Short-term

  1. Add comments explaining when to use shared_ptr vs raw pointer API
  2. Add assertion in copy path to warn about non-null stats pointers
  3. Consider deprecating the raw pointer API

Long-term

  1. Refactor IOContext to own its stats objects via shared_ptr
  2. Remove the dual-pointer pattern in favor of single shared_ptr member

Verdict

The fix addresses the immediate UAF issue, but the solution has gaps that could lead to similar issues:

  1. The "copy IOContext" backward-compat path doesn't deep-copy pointer members - this can still cause UAF if stats objects are stack-allocated
  2. The dual-ownership pattern (_io_ctx + _io_ctx_holder) is fragile

Recommendation: Address issue #1 before merging, or at minimum add clear documentation that the old API requires heap-allocated stats objects.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 21, 2026
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@freemandealer
Copy link
Contributor Author

run NonConcurrent

@freemandealer
Copy link
Contributor Author

run nonConcurrent

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 68.39% (106/155) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.10% (25337/35142)
Line Coverage 58.88% (265325/450584)
Region Coverage 53.64% (220024/410186)
Branch Coverage 55.35% (94600/170918)

Copy link
Contributor

@dataroaring dataroaring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gavinchou gavinchou merged commit 7b91c49 into apache:master Jan 30, 2026
32 of 33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/3.1.x dev/4.0.x dev/4.0.x-conflict reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants