Skip to content
/ server Public

MDEV-36436 Assertion "unexpected references" == 0 after ALTER IGNORE TABLE#4636

Open
Thirunarayanan wants to merge 1 commit into10.11from
10.11-MDEV-36436
Open

MDEV-36436 Assertion "unexpected references" == 0 after ALTER IGNORE TABLE#4636
Thirunarayanan wants to merge 1 commit into10.11from
10.11-MDEV-36436

Conversation

@Thirunarayanan
Copy link
Member

Problem:

An assertion failure occurs in InnoDB during consecutive ALTER TABLE operations using the COPY algorithm. The crash happens during the table rename phase because the
source table (dict_table_t::n_ref_count) is non-zero, despite the thread holding an exclusive Metadata Lock (MDL).

Reason:

When ALTER IGNORE TABLE is executed via the COPY algorithm, it generates undo logs for every row inserted into the intermediate table (e.g., #sql-alter-...). The background Purge Thread, responsible for cleaning up these undo logs, attempts to take an MDL on the table to prevent the table from being dropped while in use.

Race condition:

First ALTER: Creates #sql-alter-, copies data, and renames it to t1.

Purge Activation: The Purge thread picks up the undo logs from step 1. It takes an MDL on the temporary name (#sql-alter-) and increments the table's n_ref_count.

Identity Shift: InnoDB renames the physical table object to t1, but the Purge thread still holds a reference to this object.

Second ALTER: Starts a new copy process. When it attempts to rename the "new" t1 to a backup name, it checks if n_ref_count == 0. Because the Purge thread is still "pinning" the object to clean up logs from the first ALTER, the count is > 0, triggering the assertion failure.

Solution:

ALTER IGNORE TABLE needs row-level undo logging to easily roll back the last inserted row in case of duplicate key errors. By discarding the last undo log record after inserting each row, purge will not process any log records generated by ALTER IGNORE TABLE, preventing unexpected access from the purge subsystem during subsequent DDL operations.

Make skip_alter_undo (1-bit) to (2-bit enum)
to support different ALTER operation modes:

  • NO_UNDO (0): Normal mode with standard undo logging
  • SKIP_UNDO (1): ALTER mode that skips undo logging
  • IGNORE_UNDO (2): ALTER IGNORE mode that rewrites undo blocks

trx_undo_report_row_operation(): Add ALTER IGNORE undo rewriting logic. Store old undo record info before writing new records for IGNORE_UNDO mode. Reset undo top_offset to maintain only latest insert undo

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@Thirunarayanan Thirunarayanan requested a review from dr-m February 9, 2026 13:14
@Thirunarayanan Thirunarayanan force-pushed the 10.11-MDEV-36436 branch 3 times, most recently from 835e1e1 to 15db6fb Compare February 10, 2026 10:00
…TABLE

Problem:
=========
An assertion failure occurs in InnoDB during consecutive
ALTER TABLE operations using the COPY algorithm. The crash
happens during the table rename phase because the
source table (dict_table_t::n_ref_count) is non-zero, despite
the thread holding an exclusive Metadata Lock (MDL).

Reason:
========
When ALTER IGNORE TABLE is executed via the COPY algorithm,
it generates undo logs for every row inserted into the
intermediate table (e.g., #sql-alter-...). The background Purge
Thread, responsible for cleaning up these undo logs, attempts
to take an MDL on the table to prevent the table from
being dropped while in use.

Race condition:
==================
First ALTER: Creates #sql-alter-, copies data, and renames it to t1.

Purge Activation: The Purge thread picks up the undo logs from step 1.
It takes an MDL on the temporary name (#sql-alter-) and increments
the table's n_ref_count.

Identity Shift: InnoDB renames the physical table object to t1, but
the Purge thread still holds a reference to this object.

Second ALTER: Starts a new copy process. When it attempts to rename
the "new" t1 to a backup name, it checks if n_ref_count == 0.
Because the Purge thread is still "pinning" the object to
clean up logs from the first ALTER, the count is > 0,
triggering the assertion failure.

Solution:
========
ALTER IGNORE TABLE needs row-level undo logging to easily
roll back the last inserted row in case of duplicate key errors.
By discarding the last undo log record after inserting each row,
purge will not process any log records generated by
ALTER IGNORE TABLE, preventing unexpected access from the purge
subsystem during subsequent DDL operations.

Make skip_alter_undo (1-bit) to (2-bit enum)
to support different ALTER operation modes:

- NO_UNDO (0): Normal mode with standard undo logging
- SKIP_UNDO (1): ALTER mode that skips undo logging
- IGNORE_UNDO (2): ALTER IGNORE mode that rewrites undo blocks

trx_undo_report_row_operation(): Add ALTER IGNORE undo
rewriting logic. Store old undo record info before writing
new records for IGNORE_UNDO mode. Reset undo top_offset
to maintain only latest insert undo

row_ins_clust_index_entry_low(): Prevent ALTER IGNORE from
using bulk insert optimization. since it requires to write
undo log for each row operation that's incompatible with
existing bulk operation.
Comment on lines +233 to +235
/** Overwrite the old undo log for each insert for
ALTER IGNORE TABLE .. ALGORITHM= COPY */
HA_EXTRA_BEGIN_ALTER_IGNORE_COPY
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should describe things from the SQL layer point of view. The implementation details of storage engines do not belong here. Because this is not part of any persistent data structure, I think that we should place this constant right after the most closely related one, like this:

  /** Start writing rows during ALTER TABLE...ALGORITHM=COPY. */
  HA_EXTRA_BEGIN_ALTER_COPY,
  /** Start writing rows during ALTER IGNORE TABLE...ALGORITHM=COPY. */
  HA_EXTRA_BEGIN_ALTER_IGNORE_COPY,

Comment on lines +5168 to +5171
if (thd->lex->duplicates == DUP_ERROR)
table->file->extra(thd->lex->ignore
? HA_EXTRA_BEGIN_ALTER_IGNORE_COPY
: HA_EXTRA_BEGIN_ALTER_COPY);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TAB should not be used for indentation.

I would suggest to replace a run-time condition with arithmetics:

static_assert(int{HA_EXTRA_BEGIN_ALTER_IGNORE_COPY} == int{HA_EXTRA_BEGIN_ALTER_COPY} + 1, "");
table->file->extra(ha_extra_function(int{HA_EXTRA_BEGIN_ALTER_COPY} + thd->lex->ignore));

I checked that in sql/sql_lex.h the type of LEX::ignore is bool:1, so only the values 0 and 1 are possible.

Comment on lines +12354 to +12356
to->file->extra(ignore
? HA_EXTRA_BEGIN_ALTER_IGNORE_COPY
: HA_EXTRA_BEGIN_ALTER_COPY);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static_assert(int{HA_EXTRA_BEGIN_ALTER_IGNORE_COPY} == int{HA_EXTRA_BEGIN_ALTER_COPY} + 1, "");
table->file->extra(ha_extra_function(int{HA_EXTRA_BEGIN_ALTER_COPY} + ignore));

Comment on lines +12459 to +12462
/* Abort the ALTER COPY operation. For ALTER IGNORE, this
prevents undo logs from being added to the purge thread,
allowing proper cleanup of the temporary undo state. */
to->file->extra(HA_EXTRA_ABORT_ALTER_COPY);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment would better belong to ha_innobase::extra(). But the comment of HA_EXTRA_ABORT_ALTER_COPY needs to be revised to say that it must be used if an operation that invoked extra(HA_EXTRA_BEGIN_ALTER_COPY) or extra(HA_EXTRA_BEGIN_ALTER_IGNORE_COPY) is aborted.

Comment on lines +15919 to +15920
trx->undo_no = 0;
trx_undo_try_truncate(*trx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should really be done in a member function of trx_t. In that way, we’d also avoid adding `#include "trx0undo.h" to this file. The member function should start as follows:

  ut_ad(undo_no <= 1);
  undo_no= 0;

Can we assert anything about mod_tables?

Comment on lines +2225 to +2243
/** Undo log handling modes for ALTER TABLE operations */
enum skip_alter_undo_t {
/** Normal mode - standard undo logging */
NORMAL_UNDO = 0,
/** ALTER mode - skip undo logging */
NO_UNDO = 1,
/** ALTER IGNORE mode - rewrite undo blocks to
maintain only latest insert undo log */
IGNORE_UNDO = 2
};

/** Mode for handling undo logs during ALTER TABLE operations.
Set during copy alter operations or partition/subpartition operations.
When set, controls undo log behavior for row operations in the table.
For ALTER IGNORE, this enables rewriting old insert undo blocks
to maintain only the latest insert undo log.
This variable is set and unset during extra(), or during the
process of altering partitions */
unsigned skip_alter_undo:1;
unsigned skip_alter_undo:2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public static constexpr unsinged constants could better align with the unsigned skip_alter_undo. Either way is fine.

The comments for NO_UNDO and IGNORE_UNDO as well as the comment about skip_alter_undo should more specifically refer to ALTER [IGNORE] TABLE...ALGORITHM=COPY. This will not be consulted in ha_innobase::inplace_alter_table(); it never wrote row-level undo log records.

Comment on lines +2789 to -2786
&& index->table->skip_alter_undo !=
dict_table_t::IGNORE_UNDO) {

ut_ad(!index->table->skip_alter_undo);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we retain the assertion in some form?

Comment on lines +1921 to +1922
bool ignore_ddl=
(index->table->skip_alter_undo == dict_table_t::IGNORE_UNDO);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d write

const bool alter_ignore_table=

Or even better, check the variable directly when needed, after checking the cheaper conditions that do not involve dereferencing two pointers.

Comment on lines +1938 to +1946
if (ignore_ddl && !undo->empty() &&
undo->old_offset <= undo->top_offset) {

undo->top_offset = undo->old_offset;
undo->top_undo_no = 0;
mtr.write<2>(*undo_block,
undo_block->page.frame + TRX_UNDO_PAGE_HDR +
TRX_UNDO_PAGE_FREE, undo->old_offset);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to move this to the else branch of the if (is_temp) earlier in this function. We should compare the cheapest conditions first, before checking index->table->skip_alter_undo == dict_table_t::IGNORE_UNDO.

It seems that we could substitute the undo->empty() with const bool empty{!*pundo}; before invoking trx_undo_assign_low<false>().

Let’s keep in mind that this function is very time critical, because it is invoked at the start of each row operation that is inserting, updating, or deleting a record.

Comment on lines +2023 to +2027
/* For ALTER IGNORE, store the current record position
as the rollback point for the next operation */
if (ignore_ddl && undo->empty()) {
undo->old_offset = offset;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The offset can only grow by 1. Can we avoid checking this condition twice in the normal code path? In an error handling code path we could reset undo->old_offset-- if it is really necessary. (Consider what would happen on a subsequent rollback of the transaction.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants