HIVE-27328: Acid dirCache is not invalidated in TezAMs while dropping table #6309

Merged
abstractdog merged 2 commits into apache:master from abstractdog:HIVE-27328 on Feb 18, 2026

Conversation

@abstractdog (Contributor) commented Feb 9, 2026

What changes were proposed in this pull request?

See the Jira for the initial analysis.
The patch introduces a table createTime check when using the acid dir cache. This create_time property is propagated to the TezAM in the vertex-level JobConf, so that the AM can invalidate stale entries that belong to the previous instance of the same table (before DROP).
An RPC call-based solution wouldn't work because HS2 has no such interface to the AMs, and introducing such functionality to the TezClient/DagClient would be an epic hack, so it's not an option.
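
To make the idea concrete, here is a minimal sketch of a cache that remembers the table create time alongside each entry and drops the entry when a lookup arrives with a different create time. It is not the actual AcidUtils code: the class name, the String payload and the HashMap are all hypothetical, and the `!=` comparison follows the suggestion made later in the review.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the acid dir cache; not the actual AcidUtils code. */
final class CreateTimeAwareCache {

  /** Cached value plus the create time of the table instance it was computed for. */
  private static final class Entry {
    final int tableCreateTime;
    final String dirInfo;

    Entry(int tableCreateTime, String dirInfo) {
      this.tableCreateTime = tableCreateTime;
      this.dirInfo = dirInfo;
    }
  }

  private final Map<String, Entry> cache = new HashMap<>();

  void put(String path, int tableCreateTime, String dirInfo) {
    cache.put(path, new Entry(tableCreateTime, dirInfo));
  }

  /** Returns the cached value, or null if it is absent or belongs to an older table instance. */
  String get(String path, int currentTableCreateTime) {
    Entry entry = cache.get(path);
    if (entry == null) {
      return null; // first lookup: nothing to invalidate, nothing to log
    }
    if (entry.tableCreateTime != currentTableCreateTime) {
      cache.remove(path); // the table was dropped and re-created since the entry was stored
      return null;
    }
    return entry.dirInfo;
  }

  public static void main(String[] args) {
    CreateTimeAwareCache cache = new CreateTimeAwareCache();
    cache.put("/warehouse/test_part", 1770653453, "dirs of the first table instance");
    System.out.println(cache.get("/warehouse/test_part", 1770653453)); // same instance: hit
    System.out.println(cache.get("/warehouse/test_part", 1770653502)); // re-created: miss
  }
}
```

The two create times in `main` are the ones from the AM log line in the test section of this description.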

Why are the changes needed?

Because a stale cache can cause problems that are hard to investigate and that were not addressed by HIVE-26060.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manually tested with minihs2.

mvn clean install -Dtest=StartMiniHS2Cluster -DminiHS2.clusterType=llap -DminiHS2.run=true -DminiHS2.usePortsFromConf=true -T 1C -Denforcer.skip=true -pl itests/hive-unit -pl itests/util -Pitests -nsu -DminiHS2.isMetastoreRemote=true

set hive.explain.user=false;
set hive.query.results.cache.enabled=false;
set hive.fetch.task.conversion=none;

set hive.support.concurrency=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

set hive.exec.orc.split.strategy=BI;


CREATE TABLE test_part(id int) PARTITIONED BY(dt string) CLUSTERED BY (id) INTO 2 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional'='true');
INSERT OVERWRITE TABLE test_part PARTITION (dt) SELECT 1, '1';

SELECT * FROM test_part;

DROP TABLE test_part;


CREATE TABLE test_part(id int) PARTITIONED BY(dt string) CLUSTERED BY (id) INTO 2 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional'='true');
INSERT OVERWRITE TABLE test_part PARTITION (dt) SELECT 1, '1';

SELECT * FROM test_part;

Then look for the log entry in the AM log:

find . -name "syslog" | xargs grep "invalidating entry"
...
2026-02-09T08:12:00,509 INFO [ORC_GET_SPLITS #1] io.AcidUtils: Table default.test_part was recreated (at: 1770653502) since it was stored in acid cache (at: 1770653453), invalidating entry

@abstractdog abstractdog changed the title HIVE-27328: Acid dirCache is not invalidated in TezAMs while dropping… HIVE-27328: Acid dirCache is not invalidated in TezAMs while dropping table Feb 9, 2026
@abstractdog (Contributor, Author) commented Feb 10, 2026

Oh, this causes a lot of qtest noise because the create_time value is added to the TableDesc properties; I need to think about this.
I think I can put it directly into the MapWork's conf somehow, so that it is consumed during split generation.

UPDATE: solved from MapWork

@ayushtkn (Member) left a comment

Thanks @abstractdog for chasing this. I dropped a few minor comments; the rest looks good.

Comment on lines 5112 to 5115
* @param tableCreateTime
* table creation time to store, represented as a string
*/
public static void setTableCreateTime(Configuration conf, Table table) {

this param doesn't exist

public static void setTableCreateTime(Configuration conf, Table table) {
Objects.requireNonNull(table, "Cannot get table create time. Table object is expected to be non-null.");
String tableCreateTime = String.valueOf(table.getCreateTime());
String fullTableName = String.format("%s.%s", table.getDbName(), table.getTableName());

Can we use:

table.getFullyQualifiedName();

or

TableName.getDbTable(table.getDbName(), table.getTableName());

Objects.requireNonNull(table, "Cannot get table create time. Table object is expected to be non-null.");
String tableCreateTime = String.valueOf(table.getCreateTime());
String fullTableName = String.format("%s.%s", table.getDbName(), table.getTableName());
conf.set(String.format("%s.%s", fullTableName, CREATE_TIME), tableCreateTime);

We could do setInt, rather than converting it into String and then setting

    conf.setInt(String.format("%s.%s", fullTableName, CREATE_TIME), table.getCreateTime());

Comment on lines 5138 to 5140
public static int getTableCreateTime(Configuration conf, String tableName) {
String createTime = conf.get(String.format("%s.%s", tableName, CREATE_TIME));
return createTime == null ? 0 : Integer.parseInt(createTime);

use

conf.getInt(...)
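
For illustration, a small sketch of the round trip these two comments are about: the create time is stored under a per-table key and read back with a default of 0 when the key is absent. To keep it runnable without Hadoop on the classpath, `java.util.Properties` stands in for `Configuration` here; with the real class, `conf.setInt(key, value)` and `conf.getInt(key, 0)` would replace the string conversion. The class name and the `"create_time"` value of `CREATE_TIME` are assumptions, since the constant's value is not shown in this thread.

```java
import java.util.Properties;

/** Sketch only: Properties stands in for org.apache.hadoop.conf.Configuration. */
final class TableCreateTimeConfSketch {

  // Assumed value; the actual CREATE_TIME constant is not shown in the review thread.
  static final String CREATE_TIME = "create_time";

  /** Builds the per-table key in the "db.table.<CREATE_TIME>" shape used by the patch. */
  static String key(String dbName, String tableName) {
    return String.format("%s.%s.%s", dbName, tableName, CREATE_TIME);
  }

  static void setTableCreateTime(Properties conf, String dbName, String tableName, int createTime) {
    conf.setProperty(key(dbName, tableName), Integer.toString(createTime));
  }

  /** Returns 0 when no create time was propagated for the table. */
  static int getTableCreateTime(Properties conf, String dbName, String tableName) {
    String value = conf.getProperty(key(dbName, tableName));
    return value == null ? 0 : Integer.parseInt(value);
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    setTableCreateTime(conf, "default", "test_part", 1770653502);
    System.out.println(getTableCreateTime(conf, "default", "test_part"));
    System.out.println(getTableCreateTime(conf, "default", "unknown_table"));
  }
}
```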


// Check whether the table was re-created after being stored in the cache.
// The value null check avoids a noisy log message during the initial lookup, when no cache entry exists.
if (value != null && tableCreateTimeInCache < tableCreateTime) {

The value is an int. I'm wondering about the integer overflow case; should we just do

tableCreateTimeInCache != tableCreateTime
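
One way to read the overflow concern, sketched under the assumption that the create time is epoch seconds held in an int: once the seconds value no longer fits, it wraps to a negative number, so a `<` comparison would treat the re-created table as older than the cached one and keep the stale entry, while `!=` still detects the change. The class and method names below are hypothetical.

```java
/** Sketch comparing the two staleness checks when the int create time wraps around. */
final class CreateTimeComparisonSketch {

  static boolean staleByLessThan(int createTimeInCache, int currentCreateTime) {
    return createTimeInCache < currentCreateTime;
  }

  static boolean staleByNotEquals(int createTimeInCache, int currentCreateTime) {
    return createTimeInCache != currentCreateTime;
  }

  public static void main(String[] args) {
    int createTimeInCache = Integer.MAX_VALUE - 10;
    // 49 seconds later the epoch-seconds value exceeds Integer.MAX_VALUE and wraps to a negative int.
    int currentCreateTime = (int) (Integer.MAX_VALUE + 39L);

    System.out.println(staleByLessThan(createTimeInCache, currentCreateTime));  // false: stale entry kept
    System.out.println(staleByNotEquals(createTimeInCache, currentCreateTime)); // true: entry invalidated
  }
}
```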

mapWork.configureJobConf(jobConf);

// Then the table's create time should be present in the JobConf
String fullTableName = dbName + "." + tableName;

String fullTableName = TableName.getDbTable(dbName, tableName);


@Override
public void configureJobConf(JobConf job) {
Table table = getConf().getTableMetadata();

Can getConf() be null? If there is such a chance, maybe we can do:

  public void configureJobConf(JobConf job) {
    if (getConf() != null && getConf().getTableMetadata() != null) {
      Utilities.setTableCreateTime(job, getConf().getTableMetadata());
    }
  }

@abstractdog (Contributor, Author) commented

thanks @ayushtkn for your comments, all made sense, fixed them!

@ayushtkn (Member) left a comment

LGTM

@abstractdog abstractdog merged commit 7060d94 into apache:master Feb 18, 2026
4 checks passed