[fix](cloud) Fix auto-start functionality when encountering TVF and external queries #59963

deardeng · 2026-01-16T07:14:53Z

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

hello-stephen · 2026-01-16T07:15:03Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

dataroaring · 2026-01-20T17:49:53Z

run buildall

doris-robot · 2026-01-20T18:26:03Z

TPC-H: Total hot run time: 31165 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 626070b977a32d935d16091cd1657dd893488994, data reload: false

------ Round 1 ----------------------------------
q1	17598	4213	4051	4051
q2	2024	346	237	237
q3	10177	1269	696	696
q4	10208	799	315	315
q5	7554	2069	1794	1794
q6	186	169	135	135
q7	912	794	660	660
q8	9282	1379	1070	1070
q9	4965	4546	4540	4540
q10	6779	1804	1410	1410
q11	515	312	279	279
q12	689	734	596	596
q13	17786	3838	3071	3071
q14	285	290	279	279
q15	584	512	499	499
q16	676	679	641	641
q17	648	776	532	532
q18	6804	6424	6346	6346
q19	1473	967	593	593
q20	372	362	237	237
q21	2937	2394	2210	2210
q22	1056	1027	974	974
Total cold run time: 103510 ms
Total hot run time: 31165 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4155	4079	4031	4031
q2	322	388	330	330
q3	2115	2599	2213	2213
q4	1292	1749	1362	1362
q5	4068	3931	3985	3931
q6	205	172	126	126
q7	1871	1839	1675	1675
q8	2812	2444	2522	2444
q9	7439	7178	7202	7178
q10	2586	2797	2312	2312
q11	556	485	453	453
q12	726	752	643	643
q13	3602	4093	3490	3490
q14	326	342	287	287
q15	537	510	491	491
q16	664	695	656	656
q17	1175	1383	1382	1382
q18	7975	7849	8028	7849
q19	842	807	813	807
q20	1984	2082	1959	1959
q21	4797	4550	4161	4161
q22	1106	1045	976	976
Total cold run time: 51155 ms
Total hot run time: 48756 ms

doris-robot · 2026-01-20T18:36:47Z

TPC-DS: Total hot run time: 173793 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 626070b977a32d935d16091cd1657dd893488994, data reload: false

query5	4439	624	479	479
query6	329	225	213	213
query7	4230	468	264	264
query8	351	250	242	242
query9	8757	2900	2880	2880
query10	500	373	325	325
query11	15187	15154	14953	14953
query12	173	114	112	112
query13	1255	483	377	377
query14	5577	3012	2788	2788
query14_1	2673	2706	2609	2609
query15	198	195	168	168
query16	962	394	457	394
query17	1071	627	543	543
query18	2414	420	318	318
query19	213	214	185	185
query20	115	117	111	111
query21	214	136	115	115
query22	4095	4005	4050	4005
query23	15879	15634	15370	15370
query23_1	15358	15484	15429	15429
query24	7179	1513	1154	1154
query24_1	1142	1159	1193	1159
query25	501	419	383	383
query26	1245	263	148	148
query27	2782	441	273	273
query28	4604	2177	2152	2152
query29	751	546	431	431
query30	322	249	213	213
query31	764	614	558	558
query32	87	79	75	75
query33	531	353	322	322
query34	893	881	523	523
query35	730	776	685	685
query36	882	901	836	836
query37	136	102	90	90
query38	2672	2653	2622	2622
query39	797	784	750	750
query39_1	712	713	722	713
query40	220	139	131	131
query41	74	72	69	69
query42	108	104	103	103
query43	432	424	435	424
query44	1321	748	754	748
query45	189	200	185	185
query46	854	947	573	573
query47	1409	1457	1376	1376
query48	320	334	259	259
query49	626	435	351	351
query50	610	268	205	205
query51	3754	3701	3745	3701
query52	119	108	105	105
query53	287	320	273	273
query54	318	283	267	267
query55	82	85	84	84
query56	321	327	321	321
query57	1029	1005	954	954
query58	282	263	259	259
query59	2045	2245	2068	2068
query60	344	345	329	329
query61	173	171	167	167
query62	389	354	322	322
query63	293	271	265	265
query64	5040	1340	927	927
query65	3888	3742	3738	3738
query66	1442	415	303	303
query67	15522	15517	15448	15448
query68	2398	1095	757	757
query69	446	351	325	325
query70	1003	839	947	839
query71	322	310	280	280
query72	5288	3106	3223	3106
query73	605	721	321	321
query74	8733	8724	8672	8672
query75	2735	2812	2436	2436
query76	2266	1049	664	664
query77	350	398	307	307
query78	9697	9896	9184	9184
query79	1203	906	576	576
query80	1377	571	492	492
query81	545	261	233	233
query82	1012	146	108	108
query83	333	247	240	240
query84	254	116	91	91
query85	886	501	415	415
query86	499	292	287	287
query87	2855	2847	2786	2786
query88	3471	2593	2550	2550
query89	385	346	335	335
query90	1986	173	163	163
query91	171	159	133	133
query92	75	70	69	69
query93	1093	915	537	537
query94	644	316	290	290
query95	576	387	310	310
query96	624	505	232	232
query97	2327	2395	2298	2298
query98	214	212	201	201
query99	619	547	496	496
Total cold run time: 246094 ms
Total hot run time: 173793 ms

doris-robot · 2026-01-20T18:41:47Z

ClickBench: Total hot run time: 26.95 s

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 626070b977a32d935d16091cd1657dd893488994, data reload: false

query1	0.05	0.04	0.05
query2	0.10	0.04	0.05
query3	0.27	0.08	0.08
query4	1.61	0.12	0.10
query5	0.28	0.25	0.25
query6	1.14	0.66	0.66
query7	0.04	0.03	0.03
query8	0.06	0.04	0.04
query9	0.55	0.48	0.49
query10	0.56	0.56	0.55
query11	0.15	0.10	0.09
query12	0.14	0.11	0.11
query13	0.60	0.58	0.58
query14	0.96	0.95	0.95
query15	0.78	0.78	0.78
query16	0.42	0.42	0.40
query17	1.07	1.07	1.02
query18	0.23	0.21	0.20
query19	1.94	1.80	1.79
query20	0.01	0.02	0.01
query21	15.41	0.27	0.14
query22	4.87	0.06	0.05
query23	15.94	0.28	0.10
query24	1.07	0.64	0.60
query25	0.10	0.04	0.07
query26	0.14	0.13	0.13
query27	0.07	0.06	0.05
query28	4.85	1.06	0.88
query29	12.53	3.86	3.16
query30	0.28	0.13	0.11
query31	2.83	0.61	0.39
query32	3.25	0.55	0.44
query33	2.98	3.00	3.04
query34	15.91	5.11	4.42
query35	4.46	4.44	4.42
query36	0.64	0.50	0.48
query37	0.11	0.07	0.07
query38	0.07	0.04	0.03
query39	0.05	0.03	0.03
query40	0.17	0.15	0.13
query41	0.09	0.04	0.03
query42	0.04	0.02	0.02
query43	0.04	0.04	0.03
Total cold run time: 96.86 s
Total hot run time: 26.95 s

hello-stephen · 2026-01-20T19:04:11Z

FE UT Coverage Report

Increment line coverage 21.49% (26/121) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-01-20T23:04:06Z

FE Regression Coverage Report

Increment line coverage 33.88% (41/121) 🎉
Increment coverage report
Complete coverage report

deardeng

Code Review Summary

This PR addresses an important issue with auto-start functionality when encountering TVF and external queries. The refactoring improves code quality, but there are some critical issues that need to be addressed:

Critical Issues

- Semantic Mismatch in Timeout Configuration (RewriteJob.java:45, SimpleJobScheduler.java:45)
  - Config.auto_start_wait_to_resume_times represents a retry count, not seconds
  - Using it directly as a timeout in seconds is semantically incorrect
  - While it works with the default value (300), it's confusing and error-prone
- Warmup Flag Not Reset (CloudSystemInfoService.java:1619)
  - The warmup flag is set to true but never reset
  - This could cause subsequent queries to be incorrectly treated as warmup queries
  - Should use try-finally to ensure cleanup
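
To illustrate the try-finally suggestion for the warmup flag, here is a minimal sketch. It is not the code in CloudSystemInfoService.java: the class WarmupFlagSketch, the thread-local storage of the flag, and the method runAsWarmup are all hypothetical names chosen for the example. It only shows the pattern of restoring the flag even when the guarded work throws.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for the state that holds the warmup flag; not a Doris class. */
final class WarmupFlagSketch {
    // Assumption: the flag is modelled as per-thread state for this sketch.
    private static final ThreadLocal<Boolean> WARMUP = ThreadLocal.withInitial(() -> false);

    static boolean isWarmup() {
        return WARMUP.get();
    }

    /** Runs the task with the flag set and restores the previous value even if the task throws. */
    static <T> T runAsWarmup(Supplier<T> task) {
        boolean previous = WARMUP.get();
        WARMUP.set(true);
        try {
            return task.get();
        } finally {
            // Cleanup always runs, so later queries on this thread do not inherit the flag.
            WARMUP.set(previous);
        }
    }

    public static void main(String[] args) {
        System.out.println(runAsWarmup(WarmupFlagSketch::isWarmup)); // prints true
        System.out.println(isWarmup());                              // prints false
    }
}
```

Restoring the previous value rather than hard-coding false keeps the helper safe if such calls are ever nested.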

Positive Aspects

- Good code refactoring: extraction of helper methods improves readability
- Excellent log optimization to reduce spam
- Better use of ThreadLocalRandom for thread safety
- Improved error handling with parseClusterStatusOrNull
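
For context on the ThreadLocalRandom point: java.util.Random is itself thread-safe, but a single shared instance can become a contention point under concurrent use, which ThreadLocalRandom avoids. The sketch below shows one plausible use, adding jitter to a polling delay. The class JitterSketch, the method jitteredDelayMillis, and the idea that the random value is used for a delay are assumptions made for illustration, not something the PR confirms.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper; the name and purpose are assumptions for this sketch. */
final class JitterSketch {
    /** Returns a delay in the half-open range [baseMillis, baseMillis + maxJitterMillis). */
    static long jitteredDelayMillis(long baseMillis, long maxJitterMillis) {
        if (baseMillis < 0 || maxJitterMillis <= 0) {
            throw new IllegalArgumentException("baseMillis must be >= 0 and maxJitterMillis > 0");
        }
        // ThreadLocalRandom.current() returns the calling thread's own generator,
        // so concurrent callers do not compete for one shared seed.
        return baseMillis + ThreadLocalRandom.current().nextLong(maxJitterMillis);
    }

    public static void main(String[] args) {
        System.out.println(jitteredDelayMillis(100, 50)); // some value from 100 to 149
    }
}
```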

Minor Issues

Field naming inconsistency in SummaryProfile (isWarmUp vs isWarmup methods)

Please address the critical issues before merging.
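
One way to resolve the retry-count versus seconds mismatch is to make the unit explicit by multiplying the retry count by a per-retry interval. The sketch below assumes this approach; ResumeTimeoutSketch, waitBudget, and the retryInterval parameter are hypothetical, and the actual interval between retries in Doris is not stated in this PR.

```java
import java.time.Duration;

/** Hypothetical helper that turns a retry count into an explicit time budget. */
final class ResumeTimeoutSketch {
    /**
     * retryTimes plays the role of a value such as Config.auto_start_wait_to_resume_times;
     * retryInterval is an assumed pause between retries, not a real Doris setting.
     */
    static Duration waitBudget(int retryTimes, Duration retryInterval) {
        if (retryTimes < 0 || retryInterval.isNegative()) {
            throw new IllegalArgumentException("retryTimes and retryInterval must be non-negative");
        }
        return retryInterval.multipliedBy(retryTimes);
    }

    public static void main(String[] args) {
        // With 300 retries and an assumed one-second interval the budget is 300 seconds,
        // which is why using the count directly happens to work with the default.
        System.out.println(waitBudget(300, Duration.ofSeconds(1)).getSeconds());
        // With a different interval the two values diverge.
        System.out.println(waitBudget(300, Duration.ofMillis(500)).toMillis());
    }
}
```

Passing a Duration instead of a bare number means a caller cannot silently treat a count as seconds.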

fe/fe-core/src/main/java/org/apache/doris/nereids/jobs/scheduler/SimpleJobScheduler.java

fe/fe-core/src/main/java/org/apache/doris/cloud/system/CloudSystemInfoService.java

fe/fe-core/src/main/java/org/apache/doris/common/profile/SummaryProfile.java

fe/fe-core/src/main/java/org/apache/doris/nereids/jobs/rewrite/RewriteJob.java

fe/fe-core/src/main/java/org/apache/doris/cloud/system/CloudSystemInfoService.java

deardeng · 2026-01-27T12:52:45Z

run buildall

doris-robot · 2026-01-27T13:14:49Z

TPC-H: Total hot run time: 32960 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 15be4cfa573ee53bab1da68275af46405228b1c9, data reload: false

------ Round 1 ----------------------------------
q1	17639	5364	5085	5085
q2	2048	319	188	188
q3	10186	1313	754	754
q4	10216	871	319	319
q5	7534	2186	1905	1905
q6	201	185	150	150
q7	870	741	612	612
q8	9255	1397	1063	1063
q9	5257	4826	4803	4803
q10	6750	1968	1573	1573
q11	501	278	272	272
q12	361	371	229	229
q13	17805	4067	3182	3182
q14	234	232	227	227
q15	880	844	810	810
q16	667	671	623	623
q17	630	785	517	517
q18	6781	6514	7381	6514
q19	1114	1037	713	713
q20	450	372	255	255
q21	3059	2264	2170	2170
q22	1105	1111	996	996
Total cold run time: 103543 ms
Total hot run time: 32960 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5596	5543	5598	5543
q2	265	347	261	261
q3	2355	2861	2453	2453
q4	1505	1903	1465	1465
q5	4721	4508	4531	4508
q6	231	178	139	139
q7	2053	2040	1741	1741
q8	2690	2549	2452	2452
q9	7543	7403	7539	7403
q10	2820	2985	2575	2575
q11	541	472	455	455
q12	646	713	592	592
q13	3942	4204	3238	3238
q14	281	286	266	266
q15	841	789	786	786
q16	653	682	638	638
q17	1077	1262	1330	1262
q18	7564	7293	7345	7293
q19	867	850	807	807
q20	1957	2056	1889	1889
q21	4549	4251	4091	4091
q22	1040	1017	955	955
Total cold run time: 53737 ms
Total hot run time: 50812 ms

doris-robot · 2026-01-27T13:31:35Z

ClickBench: Total hot run time: 28.32 s

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 15be4cfa573ee53bab1da68275af46405228b1c9, data reload: false

query1	0.05	0.04	0.04
query2	0.10	0.05	0.05
query3	0.26	0.08	0.08
query4	1.60	0.11	0.12
query5	0.27	0.25	0.24
query6	1.16	0.69	0.67
query7	0.03	0.03	0.02
query8	0.05	0.04	0.04
query9	0.58	0.50	0.50
query10	0.56	0.55	0.55
query11	0.14	0.10	0.10
query12	0.14	0.10	0.10
query13	0.64	0.61	0.61
query14	1.08	1.05	1.07
query15	0.87	0.87	0.87
query16	0.39	0.39	0.43
query17	1.13	1.11	1.14
query18	0.22	0.21	0.20
query19	2.05	1.98	2.08
query20	0.02	0.01	0.02
query21	15.40	0.26	0.14
query22	5.10	0.04	0.04
query23	16.01	0.27	0.12
query24	1.49	0.36	0.44
query25	0.06	0.05	0.06
query26	0.14	0.13	0.13
query27	0.08	0.04	0.06
query28	3.76	1.16	0.97
query29	12.61	3.89	3.14
query30	0.28	0.13	0.11
query31	2.82	0.64	0.40
query32	3.23	0.59	0.51
query33	3.33	3.23	3.21
query34	16.03	5.33	4.72
query35	4.75	4.81	4.78
query36	0.66	0.50	0.49
query37	0.11	0.07	0.07
query38	0.08	0.04	0.04
query39	0.05	0.03	0.03
query40	0.20	0.17	0.16
query41	0.09	0.03	0.04
query42	0.05	0.03	0.03
query43	0.05	0.03	0.03
Total cold run time: 97.72 s
Total hot run time: 28.32 s

hello-stephen · 2026-01-27T15:47:36Z

FE Regression Coverage Report

Increment line coverage 33.88% (41/121) 🎉
Increment coverage report
Complete coverage report

github-actions · 2026-02-03T10:43:25Z

PR approved by at least one committer and no changes requested.

github-actions · 2026-02-03T10:43:27Z

PR approved by anyone and no changes requested.

dataroaring · 2026-02-08T16:43:21Z

run buildall

doris-robot · 2026-02-08T18:06:59Z

TPC-H: Total hot run time: 30771 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e462c91746f52c55a7bf98be67ea3e0127351e1b, data reload: false

------ Round 1 ----------------------------------
q1	17637	4468	4306	4306
q2	2033	361	261	261
q3	10124	1304	726	726
q4	10192	780	306	306
q5	7554	2175	1957	1957
q6	193	178	144	144
q7	869	741	611	611
q8	9271	1388	1148	1148
q9	4738	4663	4653	4653
q10	6840	1929	1540	1540
q11	506	315	289	289
q12	376	374	220	220
q13	17792	4063	3233	3233
q14	240	241	228	228
q15	887	813	796	796
q16	684	709	618	618
q17	688	797	538	538
q18	6886	5813	6259	5813
q19	1197	1054	640	640
q20	536	537	427	427
q21	2879	2039	2054	2039
q22	370	320	278	278
Total cold run time: 102492 ms
Total hot run time: 30771 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4552	4501	4575	4501
q2	274	344	244	244
q3	2402	2822	2399	2399
q4	1419	1893	1433	1433
q5	4689	4733	4798	4733
q6	219	179	138	138
q7	1969	1960	1798	1798
q8	2573	2484	2533	2484
q9	7562	7547	7373	7373
q10	2849	2991	2601	2601
q11	552	476	528	476
q12	864	714	644	644
q13	3848	4402	3266	3266
q14	281	288	260	260
q15	818	788	800	788
q16	663	681	653	653
q17	1079	1271	1321	1271
q18	7423	7449	7460	7449
q19	872	832	817	817
q20	2019	2019	1890	1890
q21	4634	4301	4101	4101
q22	576	554	516	516
Total cold run time: 52137 ms
Total hot run time: 49835 ms

doris-robot · 2026-02-08T18:23:43Z

ClickBench: Total hot run time: 28.41 s

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e462c91746f52c55a7bf98be67ea3e0127351e1b, data reload: false

query1	0.05	0.05	0.05
query2	0.09	0.04	0.04
query3	0.26	0.09	0.08
query4	1.61	0.10	0.11
query5	0.28	0.25	0.25
query6	1.17	0.66	0.66
query7	0.02	0.02	0.02
query8	0.06	0.04	0.04
query9	0.56	0.48	0.49
query10	0.56	0.57	0.54
query11	0.14	0.10	0.10
query12	0.14	0.11	0.11
query13	0.64	0.62	0.61
query14	1.06	1.05	1.06
query15	0.88	0.87	0.87
query16	0.41	0.38	0.40
query17	1.16	1.13	1.17
query18	0.24	0.22	0.22
query19	2.06	2.01	2.06
query20	0.02	0.01	0.02
query21	15.40	0.24	0.15
query22	5.01	0.06	0.05
query23	15.77	0.28	0.10
query24	2.31	0.31	0.31
query25	0.09	0.06	0.06
query26	0.14	0.13	0.13
query27	0.06	0.06	0.06
query28	3.83	1.15	0.97
query29	12.58	3.87	3.15
query30	0.28	0.13	0.11
query31	2.82	0.62	0.41
query32	3.24	0.59	0.49
query33	3.32	3.33	3.29
query34	16.04	5.46	4.71
query35	4.72	4.80	4.82
query36	0.65	0.50	0.50
query37	0.11	0.07	0.07
query38	0.07	0.04	0.04
query39	0.05	0.03	0.03
query40	0.20	0.17	0.15
query41	0.08	0.04	0.03
query42	0.04	0.02	0.03
query43	0.04	0.03	0.04
Total cold run time: 98.26 s
Total hot run time: 28.41 s

hello-stephen · 2026-02-08T21:32:59Z

FE Regression Coverage Report

Increment line coverage 33.88% (41/121) 🎉
Increment coverage report
Complete coverage report

deardeng requested review from dataroaring, gavinchou and w41ter as code owners January 16, 2026 07:14

gavinchou added dev/3.1.x dev/4.0.x labels Jan 21, 2026

deardeng commented Jan 27, 2026

gavinchou approved these changes Feb 3, 2026

github-actions bot added the approved Indicates a PR has been approved by one committer. label Feb 3, 2026

github-actions bot added the reviewed label Feb 3, 2026

deardeng added 3 commits February 8, 2026 08:43

[fix](cloud) Fix auto-start functionality when encountering TVF and e…

f97025c

…xternal queries

fix

6d454ae

add tvf case

e462c91

dataroaring force-pushed the fix-auto-start-1 branch from 15be4cf to e462c91 Compare February 8, 2026 16:43

5 participants