MDEV-36436 Assertion "unexpected references" == 0 after ALTER IGNORE TABLE#4636
MDEV-36436 Assertion "unexpected references" == 0 after ALTER IGNORE TABLE#4636Thirunarayanan wants to merge 1 commit into10.11from
Conversation
|
|
835e1e1 to
15db6fb
Compare
…TABLE Problem: ========= An assertion failure occurs in InnoDB during consecutive ALTER TABLE operations using the COPY algorithm. The crash happens during the table rename phase because the source table (dict_table_t::n_ref_count) is non-zero, despite the thread holding an exclusive Metadata Lock (MDL). Reason: ======== When ALTER IGNORE TABLE is executed via the COPY algorithm, it generates undo logs for every row inserted into the intermediate table (e.g., #sql-alter-...). The background Purge Thread, responsible for cleaning up these undo logs, attempts to take an MDL on the table to prevent the table from being dropped while in use. Race condition: ================== First ALTER: Creates #sql-alter-, copies data, and renames it to t1. Purge Activation: The Purge thread picks up the undo logs from step 1. It takes an MDL on the temporary name (#sql-alter-) and increments the table's n_ref_count. Identity Shift: InnoDB renames the physical table object to t1, but the Purge thread still holds a reference to this object. Second ALTER: Starts a new copy process. When it attempts to rename the "new" t1 to a backup name, it checks if n_ref_count == 0. Because the Purge thread is still "pinning" the object to clean up logs from the first ALTER, the count is > 0, triggering the assertion failure. Solution: ======== ALTER IGNORE TABLE needs row-level undo logging to easily roll back the last inserted row in case of duplicate key errors. By discarding the last undo log record after inserting each row, purge will not process any log records generated by ALTER IGNORE TABLE, preventing unexpected access from the purge subsystem during subsequent DDL operations. Make skip_alter_undo (1-bit) to (2-bit enum) to support different ALTER operation modes: - NO_UNDO (0): Normal mode with standard undo logging - SKIP_UNDO (1): ALTER mode that skips undo logging - IGNORE_UNDO (2): ALTER IGNORE mode that rewrites undo blocks trx_undo_report_row_operation(): Add ALTER IGNORE undo rewriting logic. Store old undo record info before writing new records for IGNORE_UNDO mode. Reset undo top_offset to maintain only latest insert undo row_ins_clust_index_entry_low(): Prevent ALTER IGNORE from using bulk insert optimization. since it requires to write undo log for each row operation that's incompatible with existing bulk operation.
15db6fb to
dad4b3e
Compare
| /** Overwrite the old undo log for each insert for | ||
| ALTER IGNORE TABLE .. ALGORITHM= COPY */ | ||
| HA_EXTRA_BEGIN_ALTER_IGNORE_COPY |
There was a problem hiding this comment.
This file should describe things from the SQL layer point of view. The implementation details of storage engines do not belong here. Because this is not part of any persistent data structure, I think that we should place this constant right after the most closely related one, like this:
/** Start writing rows during ALTER TABLE...ALGORITHM=COPY. */
HA_EXTRA_BEGIN_ALTER_COPY,
/** Start writing rows during ALTER IGNORE TABLE...ALGORITHM=COPY. */
HA_EXTRA_BEGIN_ALTER_IGNORE_COPY,| if (thd->lex->duplicates == DUP_ERROR) | ||
| table->file->extra(thd->lex->ignore | ||
| ? HA_EXTRA_BEGIN_ALTER_IGNORE_COPY | ||
| : HA_EXTRA_BEGIN_ALTER_COPY); |
There was a problem hiding this comment.
TAB should not be used for indentation.
I would suggest to replace a run-time condition with arithmetics:
static_assert(int{HA_EXTRA_BEGIN_ALTER_IGNORE_COPY} == int{HA_EXTRA_BEGIN_ALTER_COPY} + 1, "");
table->file->extra(ha_extra_function(int{HA_EXTRA_BEGIN_ALTER_COPY} + thd->lex->ignore));I checked that in sql/sql_lex.h the type of LEX::ignore is bool:1, so only the values 0 and 1 are possible.
| to->file->extra(ignore | ||
| ? HA_EXTRA_BEGIN_ALTER_IGNORE_COPY | ||
| : HA_EXTRA_BEGIN_ALTER_COPY); |
There was a problem hiding this comment.
static_assert(int{HA_EXTRA_BEGIN_ALTER_IGNORE_COPY} == int{HA_EXTRA_BEGIN_ALTER_COPY} + 1, "");
table->file->extra(ha_extra_function(int{HA_EXTRA_BEGIN_ALTER_COPY} + ignore));| /* Abort the ALTER COPY operation. For ALTER IGNORE, this | ||
| prevents undo logs from being added to the purge thread, | ||
| allowing proper cleanup of the temporary undo state. */ | ||
| to->file->extra(HA_EXTRA_ABORT_ALTER_COPY); |
There was a problem hiding this comment.
This comment would better belong to ha_innobase::extra(). But the comment of HA_EXTRA_ABORT_ALTER_COPY needs to be revised to say that it must be used if an operation that invoked extra(HA_EXTRA_BEGIN_ALTER_COPY) or extra(HA_EXTRA_BEGIN_ALTER_IGNORE_COPY) is aborted.
| trx->undo_no = 0; | ||
| trx_undo_try_truncate(*trx); |
There was a problem hiding this comment.
This should really be done in a member function of trx_t. In that way, we’d also avoid adding `#include "trx0undo.h" to this file. The member function should start as follows:
ut_ad(undo_no <= 1);
undo_no= 0;Can we assert anything about mod_tables?
| /** Undo log handling modes for ALTER TABLE operations */ | ||
| enum skip_alter_undo_t { | ||
| /** Normal mode - standard undo logging */ | ||
| NORMAL_UNDO = 0, | ||
| /** ALTER mode - skip undo logging */ | ||
| NO_UNDO = 1, | ||
| /** ALTER IGNORE mode - rewrite undo blocks to | ||
| maintain only latest insert undo log */ | ||
| IGNORE_UNDO = 2 | ||
| }; | ||
|
|
||
| /** Mode for handling undo logs during ALTER TABLE operations. | ||
| Set during copy alter operations or partition/subpartition operations. | ||
| When set, controls undo log behavior for row operations in the table. | ||
| For ALTER IGNORE, this enables rewriting old insert undo blocks | ||
| to maintain only the latest insert undo log. | ||
| This variable is set and unset during extra(), or during the | ||
| process of altering partitions */ | ||
| unsigned skip_alter_undo:1; | ||
| unsigned skip_alter_undo:2; |
There was a problem hiding this comment.
public static constexpr unsinged constants could better align with the unsigned skip_alter_undo. Either way is fine.
The comments for NO_UNDO and IGNORE_UNDO as well as the comment about skip_alter_undo should more specifically refer to ALTER [IGNORE] TABLE...ALGORITHM=COPY. This will not be consulted in ha_innobase::inplace_alter_table(); it never wrote row-level undo log records.
| && index->table->skip_alter_undo != | ||
| dict_table_t::IGNORE_UNDO) { | ||
|
|
||
| ut_ad(!index->table->skip_alter_undo); |
There was a problem hiding this comment.
Can we retain the assertion in some form?
| bool ignore_ddl= | ||
| (index->table->skip_alter_undo == dict_table_t::IGNORE_UNDO); |
There was a problem hiding this comment.
I’d write
const bool alter_ignore_table=Or even better, check the variable directly when needed, after checking the cheaper conditions that do not involve dereferencing two pointers.
| if (ignore_ddl && !undo->empty() && | ||
| undo->old_offset <= undo->top_offset) { | ||
|
|
||
| undo->top_offset = undo->old_offset; | ||
| undo->top_undo_no = 0; | ||
| mtr.write<2>(*undo_block, | ||
| undo_block->page.frame + TRX_UNDO_PAGE_HDR + | ||
| TRX_UNDO_PAGE_FREE, undo->old_offset); | ||
| } |
There was a problem hiding this comment.
I would suggest to move this to the else branch of the if (is_temp) earlier in this function. We should compare the cheapest conditions first, before checking index->table->skip_alter_undo == dict_table_t::IGNORE_UNDO.
It seems that we could substitute the undo->empty() with const bool empty{!*pundo}; before invoking trx_undo_assign_low<false>().
Let’s keep in mind that this function is very time critical, because it is invoked at the start of each row operation that is inserting, updating, or deleting a record.
| /* For ALTER IGNORE, store the current record position | ||
| as the rollback point for the next operation */ | ||
| if (ignore_ddl && undo->empty()) { | ||
| undo->old_offset = offset; | ||
| } |
There was a problem hiding this comment.
The offset can only grow by 1. Can we avoid checking this condition twice in the normal code path? In an error handling code path we could reset undo->old_offset-- if it is really necessary. (Consider what would happen on a subsequent rollback of the transaction.)
Problem:
An assertion failure occurs in InnoDB during consecutive ALTER TABLE operations using the COPY algorithm. The crash happens during the table rename phase because the
source table (dict_table_t::n_ref_count) is non-zero, despite the thread holding an exclusive Metadata Lock (MDL).
Reason:
When ALTER IGNORE TABLE is executed via the COPY algorithm, it generates undo logs for every row inserted into the intermediate table (e.g., #sql-alter-...). The background Purge Thread, responsible for cleaning up these undo logs, attempts to take an MDL on the table to prevent the table from being dropped while in use.
Race condition:
First ALTER: Creates #sql-alter-, copies data, and renames it to t1.
Purge Activation: The Purge thread picks up the undo logs from step 1. It takes an MDL on the temporary name (#sql-alter-) and increments the table's n_ref_count.
Identity Shift: InnoDB renames the physical table object to t1, but the Purge thread still holds a reference to this object.
Second ALTER: Starts a new copy process. When it attempts to rename the "new" t1 to a backup name, it checks if n_ref_count == 0. Because the Purge thread is still "pinning" the object to clean up logs from the first ALTER, the count is > 0, triggering the assertion failure.
Solution:
ALTER IGNORE TABLE needs row-level undo logging to easily roll back the last inserted row in case of duplicate key errors. By discarding the last undo log record after inserting each row, purge will not process any log records generated by ALTER IGNORE TABLE, preventing unexpected access from the purge subsystem during subsequent DDL operations.
Make skip_alter_undo (1-bit) to (2-bit enum)
to support different ALTER operation modes:
trx_undo_report_row_operation(): Add ALTER IGNORE undo rewriting logic. Store old undo record info before writing new records for IGNORE_UNDO mode. Reset undo top_offset to maintain only latest insert undo