Skip to content

HIVE-28773: CBO fails when a materialized view is dropped but its pre-compiled plan remains in the registry.#6322

Open
kasakrisz wants to merge 11 commits intoapache:masterfrom
kasakrisz:HIVE-28773-master-mv-reg2
Open

HIVE-28773: CBO fails when a materialized view is dropped but its pre-compiled plan remains in the registry.#6322
kasakrisz wants to merge 11 commits intoapache:masterfrom
kasakrisz:HIVE-28773-master-mv-reg2

Conversation

@kasakrisz
Copy link
Contributor

What changes were proposed in this pull request?

  • The materialized view registry is initialized by loading all MVs from HMS that are enabled for query rewrite and precompiled. Later, a scheduled thread refreshes the registry periodically in the same way. This patch adds logic to remove MVs from the registry that no longer exist in HMS or have been disabled for query rewrite since the last refresh.
  • Rename MaterializedViewCache to MaterializedViewMap because it is not a cache.
  • Modify the way MVs are stored in the map: remove the database grouping and add the database name as a key prefix to the MV name. Since all MVs are queried and added to the planner during query rewrite, this approach aligns better with the most common function of the registry/map.

Why are the changes needed?

Dropped MVs may casue CBO failure. See jira HIVE-28773 for details

Does this PR introduce any user-facing change?

Yes. CBO failure is intermittent in clusters with multiple HS2s and depending on the query the whole compilation process could fail. This patch fixes these intermittent issues.

Is the change a dependency upgrade?

No.

How was this patch tested?

mvn test -Dtest=TestQueryRewrite,TestHiveMaterializedViewRegistry -pl itests/hive-unit -Pitests
mvn test -Dtest=TestMaterializedViewMap -pl ql

@sonarqubecloud
Copy link

import static org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.executeStatementOnDriver;
import static org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.executeStatementOnDriverSilently;

public class TestQueryRewrite extends CompactorOnTezTest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why extend from Compaction base test class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't need anything specific from compaction but CompactorOnTezTest has utility methods like

executeStatementOnDriver
executeStatementOnDriverSilently

ASTNode astNode, Set<TableName> tablesUsed, Supplier<String> validTxnsList, HiveTxnManager txnMgr) throws HiveException {

List<HiveRelOptMaterialization> materializedViews =
Collection<HiveRelOptMaterialization> materializedViews =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not keep List?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no strong motivation behind this change. None of the List specific functionality is used.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request fixes CBO (Cost-Based Optimizer) failures that occur when materialized views are dropped from HMS but their pre-compiled plans remain in the HiveMaterializedViewsRegistry. This is particularly problematic in multi-HS2 clusters where one HS2 instance drops an MV while another instance still has it cached.

Changes:

  • Added logic to the registry refresh mechanism to remove MVs that no longer exist in HMS or have been disabled for query rewrite
  • Renamed MaterializedViewsCache to MaterializedViewMap to better reflect its purpose
  • Refactored MV storage from nested maps (database → table → materialization) to a flat map with composite keys (database+table → materialization)

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
ql/src/java/org/apache/hadoop/hive/ql/metadata/MaterializedViewMap.java Renamed from MaterializedViewsCache and refactored to use flat map structure with composite keys instead of nested maps
ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMaterializedViewsRegistry.java Added cleanup logic in refresh() to remove stale MVs and changed API signatures from EnumSet to Set
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java Enhanced error handling with try-catch-finally to gracefully handle exceptions during MV rewriting
ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveRelOptMaterialization.java Changed isSupported() parameter from EnumSet to Set for API flexibility
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java Updated method signature to use Collection instead of List for flexibility
ql/src/test/org/apache/hadoop/hive/ql/metadata/TestMaterializedViewMap.java Renamed test class and updated test assertions to work with Collection instead of List
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestQueryRewrite.java New integration test validating that queries fall back to original plans when MVs are dropped or disabled
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestHiveMaterializedViewRegistry.java New integration test validating registry refresh behavior for adding, updating, and removing MVs
ql/src/test/results/clientpositive/llap/materialized_view_partitioned_2.q.out Test result changes reflecting optimizer choosing partition_mv_1 instead of partition_mv_3 (both valid choices)
Comments suppressed due to low confidence (4)

ql/src/java/org/apache/hadoop/hive/ql/metadata/MaterializedViewMap.java:65

  • There's a potential thread safety issue with the concurrent modification of the ArrayList in sqlToMaterializedView. While ConcurrentHashMap is thread-safe for map operations, the ArrayList stored as values is not thread-safe. Multiple threads could simultaneously call putIfAbsent or refresh, leading to concurrent calls to materializationList.add() on the same ArrayList. This could result in lost updates, ArrayIndexOutOfBoundsException, or corrupted list state. Consider using CopyOnWriteArrayList or Collections.synchronizedList for the lists stored in sqlToMaterializedView, or ensure proper synchronization around list operations.
    ql/src/java/org/apache/hadoop/hive/ql/metadata/MaterializedViewMap.java:147
  • There's a thread safety issue with the ArrayList modification. While the call to remove() on line 143 is within the computeIfPresent callback (which provides atomicity for the map operation), the list itself is not thread-safe. If another thread is simultaneously accessing the same list through get() or refresh(), there could be a ConcurrentModificationException or inconsistent reads. The same applies to lines 88, 94-95 in the refresh method. Consider using CopyOnWriteArrayList or Collections.synchronizedList to protect the list from concurrent modifications.
    ql/src/java/org/apache/hadoop/hive/ql/metadata/MaterializedViewMap.java:77
  • The genKey method concatenates dbName and viewName without a delimiter, which can lead to key collisions. For example, database "db" with view "name" would produce the same key "dbname" as database "d" with view "bname". This could cause incorrect behavior where different materialized views are treated as the same entry. Consider adding a delimiter between the database and view names.
    ql/src/java/org/apache/hadoop/hive/ql/metadata/MaterializedViewMap.java:53
  • The comment on line 51 is outdated. It states "Key is the database name. Value a map from the qualified name to the view object" but this is no longer accurate. The map structure changed from a nested map (db -> table -> materialization) to a flat map with a composite key. The comment should be updated to reflect that the key is now a composite of database name and view name.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

// We replace if the existing MV is not newer
Table existingMVTable = HiveMaterializedViewUtils.extractTable(existingMV);
remaining.remove(new TableName(
existingMVTable.getCatName(), existingMVTable.getDbName(), existingMVTable.getTableName()));
Copy link

Copilot AI Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an inconsistency in how the 'remaining' set is populated versus how items are removed from it. On line 186, new MVs are added with a hardcoded catalog name "hive", but on line 195-196, existing MVs are removed using the actual catalog name from the table (existingMVTable.getCatName()). If the catalog name is not "hive", the removal won't find a match, and these MVs will be incorrectly dropped from the registry. Consider using the actual catalog name consistently or verifying that all MVs in the registry have the same catalog name.

Suggested change
existingMVTable.getCatName(), existingMVTable.getDbName(), existingMVTable.getTableName()));
"hive", existingMVTable.getDbName(), existingMVTable.getTableName()));

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the hardcoded "hive" literal when populating the remaining set.

Comment on lines 69 to 80
@Test
public void testQueryIsNotRewrittenWhenMVIsDropped() throws Exception {

// Simulate a multi HS2 cluster.
// Drop the MV using a direct API call to HMS. This is similar what happening when the drop MV is executed by
// another HS2.
// In this case the MV is not removed from HiveMaterializedViewsRegistry of HS2 which runs the explain query.
msClient.dropTable(DB, MV1);

List<String> result = execSelectAndDumpData("explain cbo select a, b from " + TABLE1 + " where a > 0", driver, "");
Assert.assertEquals(ORIGINAL_QUERY_PLAN, result);
}
Copy link

Copilot AI Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test testQueryIsNotRewrittenWhenMVIsDropped simulates a multi-HS2 scenario where an MV is dropped via HMS but the registry is not updated. However, the test doesn't explicitly call HiveMaterializedViewsRegistry.get().refresh(Hive.get()) to verify that the refresh mechanism correctly removes the dropped MV from the registry. Without this explicit refresh call, the test may not be validating the core functionality added by this PR. Consider adding a refresh call before the query execution to ensure the registry cleanup logic is properly tested.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed the test testQueryIsNotRewrittenWhenMVIsDroppedButTheRegistryIsNotRefreshed because the goal is to confirm that the dropped MV doesn't block the compiler even if the registry is not up-to-date.

Comment on lines 82 to 93
@Test
public void testQueryIsNotRewrittenWhenMVIsDisabledForRewrite() throws Exception {
Table mvTable = Hive.get().getTable(DB, MV1);
mvTable.setRewriteEnabled(false);

EnvironmentContext environmentContext = new EnvironmentContext();
environmentContext.putToProperties(StatsSetupConst.DO_NOT_UPDATE_STATS, StatsSetupConst.TRUE);
Hive.get().alterTable(mvTable, false, environmentContext, true);

List<String> result = execSelectAndDumpData("explain cbo select a, b from " + TABLE1 + " where a > 0", driver, "");
Assert.assertEquals(ORIGINAL_QUERY_PLAN, result);
}
Copy link

Copilot AI Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test testQueryIsNotRewrittenWhenMVIsDisabledForRewrite disables an MV for rewrite but doesn't explicitly call HiveMaterializedViewsRegistry.get().refresh(Hive.get()) to trigger the registry update. The test may be relying on implicit behavior or scheduled refresh. Consider adding an explicit refresh call to ensure the test validates that the registry correctly removes MVs disabled for rewrite, which is a key feature of this PR.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed the test testQueryIsNotRewrittenWhenMVIsDisabledForRewriteButTheRegistryIsNotRefreshed because the goal is to confirm that the disabled MV doesn't block the compiler even if the registry is not up-to-date.

if (initialized.get()) {
Set<TableName> remaining = getRewritingMaterializedViews().stream()
.map(materialization -> new TableName(
"hive", materialization.qualifiedTableName.get(0), materialization.qualifiedTableName.get(1)))
Copy link
Member

@deniskuzZ deniskuzZ Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you hardcoding the catalog?
i think better to use fromString(final String name, final String defaultCatalog, final String defaultDatabase)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored this part to extract the Table object represents the MV from the wrapped TableScan of the MV. So Table.getFullTableName is used.

LOG.error("Problem connecting to the metastore when initializing the view registry", e);

for (TableName tableName : remaining) {
dropMaterializedView(tableName.getDb(), tableName.getTable());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not pass TableName?
remaining.foreach(this::dropMaterializedView)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored to use TableName

EnumSet<RewriteAlgorithm> scope) {
HiveRelOptMaterialization materialization = materializedViewsCache.get(dbName, viewName);
public HiveRelOptMaterialization getRewritingMaterializedView(
String dbName, String viewName, Set<RewriteAlgorithm> scope) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use TableName

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored to use TableName

}
return dbMap;
private String genKey(Table materializedViewTable) {
return genKey(materializedViewTable.getDbName(), materializedViewTable.getTableName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it better to add method to Table obj like getFullName or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept the genKey method to indicate that a map key is generated but the new method is using TableName.toString()


if (dbMap.isEmpty()) {
materializedViews.remove(materializedViewTable.getDbName());
// Delete only if the create time for the input materialized view table and the table
Copy link
Member

@deniskuzZ deniskuzZ Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

materializedViewTable.equals(oldTable) where is the create time comparison, maybe update the comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

SELECT * FROM src_txn_2 where key > 224 and key < 226
PREHOOK: type: QUERY
PREHOOK: Input: default@partition_mv_3
PREHOOK: Input: default@partition_mv_1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why table name is changed? did we pick some older entry from the registry?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This last query can be served from 3 MVs out of 4. The reason why it is changed becasue the MVs are stored in an unsorted map. I updated the test: drop 2 MVs of 3 before this query to avoid flakiness.

kasakrisz and others added 7 commits February 18, 2026 11:46
…ompactor/TestHiveMaterializedViewRegistry.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ompactor/TestQueryRewrite.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ompactor/TestQueryRewrite.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants