HIVE-28773: CBO fails when a materialized view is dropped but its pre-compiled plan remains in the registry.#6322
Conversation
|
| import static org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.executeStatementOnDriver; | ||
| import static org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.executeStatementOnDriverSilently; | ||
|
|
||
| public class TestQueryRewrite extends CompactorOnTezTest { |
There was a problem hiding this comment.
why extend from Compaction base test class?
There was a problem hiding this comment.
I don't need anything specific from compaction but CompactorOnTezTest has utility methods like
executeStatementOnDriver
executeStatementOnDriverSilently
| ASTNode astNode, Set<TableName> tablesUsed, Supplier<String> validTxnsList, HiveTxnManager txnMgr) throws HiveException { | ||
|
|
||
| List<HiveRelOptMaterialization> materializedViews = | ||
| Collection<HiveRelOptMaterialization> materializedViews = |
There was a problem hiding this comment.
There is no strong motivation behind this change. None of the List specific functionality is used.
There was a problem hiding this comment.
Pull request overview
This pull request fixes CBO (Cost-Based Optimizer) failures that occur when materialized views are dropped from HMS but their pre-compiled plans remain in the HiveMaterializedViewsRegistry. This is particularly problematic in multi-HS2 clusters where one HS2 instance drops an MV while another instance still has it cached.
Changes:
- Added logic to the registry refresh mechanism to remove MVs that no longer exist in HMS or have been disabled for query rewrite
- Renamed
MaterializedViewsCachetoMaterializedViewMapto better reflect its purpose - Refactored MV storage from nested maps (database → table → materialization) to a flat map with composite keys (database+table → materialization)
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| ql/src/java/org/apache/hadoop/hive/ql/metadata/MaterializedViewMap.java | Renamed from MaterializedViewsCache and refactored to use flat map structure with composite keys instead of nested maps |
| ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMaterializedViewsRegistry.java | Added cleanup logic in refresh() to remove stale MVs and changed API signatures from EnumSet to Set |
| ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java | Enhanced error handling with try-catch-finally to gracefully handle exceptions during MV rewriting |
| ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveRelOptMaterialization.java | Changed isSupported() parameter from EnumSet to Set for API flexibility |
| ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java | Updated method signature to use Collection instead of List for flexibility |
| ql/src/test/org/apache/hadoop/hive/ql/metadata/TestMaterializedViewMap.java | Renamed test class and updated test assertions to work with Collection instead of List |
| itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestQueryRewrite.java | New integration test validating that queries fall back to original plans when MVs are dropped or disabled |
| itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestHiveMaterializedViewRegistry.java | New integration test validating registry refresh behavior for adding, updating, and removing MVs |
| ql/src/test/results/clientpositive/llap/materialized_view_partitioned_2.q.out | Test result changes reflecting optimizer choosing partition_mv_1 instead of partition_mv_3 (both valid choices) |
Comments suppressed due to low confidence (4)
ql/src/java/org/apache/hadoop/hive/ql/metadata/MaterializedViewMap.java:65
- There's a potential thread safety issue with the concurrent modification of the ArrayList in sqlToMaterializedView. While ConcurrentHashMap is thread-safe for map operations, the ArrayList stored as values is not thread-safe. Multiple threads could simultaneously call putIfAbsent or refresh, leading to concurrent calls to materializationList.add() on the same ArrayList. This could result in lost updates, ArrayIndexOutOfBoundsException, or corrupted list state. Consider using CopyOnWriteArrayList or Collections.synchronizedList for the lists stored in sqlToMaterializedView, or ensure proper synchronization around list operations.
ql/src/java/org/apache/hadoop/hive/ql/metadata/MaterializedViewMap.java:147 - There's a thread safety issue with the ArrayList modification. While the call to remove() on line 143 is within the computeIfPresent callback (which provides atomicity for the map operation), the list itself is not thread-safe. If another thread is simultaneously accessing the same list through get() or refresh(), there could be a ConcurrentModificationException or inconsistent reads. The same applies to lines 88, 94-95 in the refresh method. Consider using CopyOnWriteArrayList or Collections.synchronizedList to protect the list from concurrent modifications.
ql/src/java/org/apache/hadoop/hive/ql/metadata/MaterializedViewMap.java:77 - The genKey method concatenates dbName and viewName without a delimiter, which can lead to key collisions. For example, database "db" with view "name" would produce the same key "dbname" as database "d" with view "bname". This could cause incorrect behavior where different materialized views are treated as the same entry. Consider adding a delimiter between the database and view names.
ql/src/java/org/apache/hadoop/hive/ql/metadata/MaterializedViewMap.java:53 - The comment on line 51 is outdated. It states "Key is the database name. Value a map from the qualified name to the view object" but this is no longer accurate. The map structure changed from a nested map (db -> table -> materialization) to a flat map with a composite key. The comment should be updated to reflect that the key is now a composite of database name and view name.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
.../src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestHiveMaterializedViewRegistry.java
Outdated
Show resolved
Hide resolved
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestQueryRewrite.java
Outdated
Show resolved
Hide resolved
| // We replace if the existing MV is not newer | ||
| Table existingMVTable = HiveMaterializedViewUtils.extractTable(existingMV); | ||
| remaining.remove(new TableName( | ||
| existingMVTable.getCatName(), existingMVTable.getDbName(), existingMVTable.getTableName())); |
There was a problem hiding this comment.
There's an inconsistency in how the 'remaining' set is populated versus how items are removed from it. On line 186, new MVs are added with a hardcoded catalog name "hive", but on line 195-196, existing MVs are removed using the actual catalog name from the table (existingMVTable.getCatName()). If the catalog name is not "hive", the removal won't find a match, and these MVs will be incorrectly dropped from the registry. Consider using the actual catalog name consistently or verifying that all MVs in the registry have the same catalog name.
| existingMVTable.getCatName(), existingMVTable.getDbName(), existingMVTable.getTableName())); | |
| "hive", existingMVTable.getDbName(), existingMVTable.getTableName())); |
There was a problem hiding this comment.
Removed the hardcoded "hive" literal when populating the remaining set.
| @Test | ||
| public void testQueryIsNotRewrittenWhenMVIsDropped() throws Exception { | ||
|
|
||
| // Simulate a multi HS2 cluster. | ||
| // Drop the MV using a direct API call to HMS. This is similar what happening when the drop MV is executed by | ||
| // another HS2. | ||
| // In this case the MV is not removed from HiveMaterializedViewsRegistry of HS2 which runs the explain query. | ||
| msClient.dropTable(DB, MV1); | ||
|
|
||
| List<String> result = execSelectAndDumpData("explain cbo select a, b from " + TABLE1 + " where a > 0", driver, ""); | ||
| Assert.assertEquals(ORIGINAL_QUERY_PLAN, result); | ||
| } |
There was a problem hiding this comment.
The test testQueryIsNotRewrittenWhenMVIsDropped simulates a multi-HS2 scenario where an MV is dropped via HMS but the registry is not updated. However, the test doesn't explicitly call HiveMaterializedViewsRegistry.get().refresh(Hive.get()) to verify that the refresh mechanism correctly removes the dropped MV from the registry. Without this explicit refresh call, the test may not be validating the core functionality added by this PR. Consider adding a refresh call before the query execution to ensure the registry cleanup logic is properly tested.
There was a problem hiding this comment.
Renamed the test testQueryIsNotRewrittenWhenMVIsDroppedButTheRegistryIsNotRefreshed because the goal is to confirm that the dropped MV doesn't block the compiler even if the registry is not up-to-date.
| @Test | ||
| public void testQueryIsNotRewrittenWhenMVIsDisabledForRewrite() throws Exception { | ||
| Table mvTable = Hive.get().getTable(DB, MV1); | ||
| mvTable.setRewriteEnabled(false); | ||
|
|
||
| EnvironmentContext environmentContext = new EnvironmentContext(); | ||
| environmentContext.putToProperties(StatsSetupConst.DO_NOT_UPDATE_STATS, StatsSetupConst.TRUE); | ||
| Hive.get().alterTable(mvTable, false, environmentContext, true); | ||
|
|
||
| List<String> result = execSelectAndDumpData("explain cbo select a, b from " + TABLE1 + " where a > 0", driver, ""); | ||
| Assert.assertEquals(ORIGINAL_QUERY_PLAN, result); | ||
| } |
There was a problem hiding this comment.
The test testQueryIsNotRewrittenWhenMVIsDisabledForRewrite disables an MV for rewrite but doesn't explicitly call HiveMaterializedViewsRegistry.get().refresh(Hive.get()) to trigger the registry update. The test may be relying on implicit behavior or scheduled refresh. Consider adding an explicit refresh call to ensure the test validates that the registry correctly removes MVs disabled for rewrite, which is a key feature of this PR.
There was a problem hiding this comment.
Renamed the test testQueryIsNotRewrittenWhenMVIsDisabledForRewriteButTheRegistryIsNotRefreshed because the goal is to confirm that the disabled MV doesn't block the compiler even if the registry is not up-to-date.
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestQueryRewrite.java
Outdated
Show resolved
Hide resolved
| if (initialized.get()) { | ||
| Set<TableName> remaining = getRewritingMaterializedViews().stream() | ||
| .map(materialization -> new TableName( | ||
| "hive", materialization.qualifiedTableName.get(0), materialization.qualifiedTableName.get(1))) |
There was a problem hiding this comment.
are you hardcoding the catalog?
i think better to use fromString(final String name, final String defaultCatalog, final String defaultDatabase)
There was a problem hiding this comment.
I refactored this part to extract the Table object represents the MV from the wrapped TableScan of the MV. So Table.getFullTableName is used.
| LOG.error("Problem connecting to the metastore when initializing the view registry", e); | ||
|
|
||
| for (TableName tableName : remaining) { | ||
| dropMaterializedView(tableName.getDb(), tableName.getTable()); |
There was a problem hiding this comment.
why not pass TableName?
remaining.foreach(this::dropMaterializedView)
There was a problem hiding this comment.
Refactored to use TableName
| EnumSet<RewriteAlgorithm> scope) { | ||
| HiveRelOptMaterialization materialization = materializedViewsCache.get(dbName, viewName); | ||
| public HiveRelOptMaterialization getRewritingMaterializedView( | ||
| String dbName, String viewName, Set<RewriteAlgorithm> scope) { |
There was a problem hiding this comment.
Refactored to use TableName
| } | ||
| return dbMap; | ||
| private String genKey(Table materializedViewTable) { | ||
| return genKey(materializedViewTable.getDbName(), materializedViewTable.getTableName()); |
There was a problem hiding this comment.
isn't it better to add method to Table obj like getFullName or something?
There was a problem hiding this comment.
I kept the genKey method to indicate that a map key is generated but the new method is using TableName.toString()
|
|
||
| if (dbMap.isEmpty()) { | ||
| materializedViews.remove(materializedViewTable.getDbName()); | ||
| // Delete only if the create time for the input materialized view table and the table |
There was a problem hiding this comment.
materializedViewTable.equals(oldTable) where is the create time comparison, maybe update the comment?
| SELECT * FROM src_txn_2 where key > 224 and key < 226 | ||
| PREHOOK: type: QUERY | ||
| PREHOOK: Input: default@partition_mv_3 | ||
| PREHOOK: Input: default@partition_mv_1 |
There was a problem hiding this comment.
why table name is changed? did we pick some older entry from the registry?
There was a problem hiding this comment.
This last query can be served from 3 MVs out of 4. The reason why it is changed becasue the MVs are stored in an unsorted map. I updated the test: drop 2 MVs of 3 before this query to avoid flakiness.
…-compiled plan remains in the registry.
…ompactor/TestHiveMaterializedViewRegistry.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ompactor/TestQueryRewrite.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ompactor/TestQueryRewrite.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
58830c2 to
ceffbdb
Compare



What changes were proposed in this pull request?
MaterializedViewCachetoMaterializedViewMapbecause it is not a cache.Why are the changes needed?
Dropped MVs may casue CBO failure. See jira HIVE-28773 for details
Does this PR introduce any user-facing change?
Yes. CBO failure is intermittent in clusters with multiple HS2s and depending on the query the whole compilation process could fail. This patch fixes these intermittent issues.
Is the change a dependency upgrade?
No.
How was this patch tested?