MDEV-38315 Optimize NULL only reads for wide tables in core engines by alexaustin007 · Pull Request #4627 · MariaDB/server

alexaustin007 · 2026-02-06T02:01:41Z

Implement NULL-only read optimization for InnoDB, Aria, and MyISAM.

SQL layer:

Handle WHERE, HAVING, JOIN ON, and subqueries.
Clear null-only marks when column value is needed elsewhere.
Optimization is enabled only for handlers with HA_CAN_NULL_ONLY.
Observability:
Add session status metric Handler_null_only_columns.
InnoDB/Aria/MyISAM consume null_set and skip value materialization
for null-only columns.
Federated is not covered in this patch.
Tests:
Extend targeted tests:
- maria.maria_null_only
Cover WHERE, JOIN ON, EXISTS-subquery, and negative "value needed"
cases.
InnoDB test also validates physical impact via blob-page read metrics.

This PR adds NULL-only read optimization for wide rows where query logic
needs only NULL-ness, not the column value.

Aria

Not covered:

Federated
Scenarios covered
JOIN ... ON ... AND col IS NULL
EXISTS (subquery ... col IS NULL)
Negative cases where the value is needed (for example LENGTH(col),
CRC32(col)), where optimization must not apply.

Conditions / gating

SQL computes a per-table NULL-only bitmap (null_set).
Columns are marked only for direct-field NULL predicates.
Optimization is used only if handler advertises HA_CAN_NULL_ONLY.

Observability

Added session status variable: Handler_null_only_columns.
Meaning: number of columns that remain in NULL-only mode for a
statement after all SQL-layer filtering.
Tests assert:
- > 0 for positive NULL-only scenarios
- = 0 for value-needed scenarios.

Engine-level validation

InnoDB:
- correctness + Handler_null_only_columns
- physical effect validation via innodb_metrics blob-page counters.
Aria/MyISAM:
- correctness + Handler_null_only_columns assertions.

Tests updated

innodb.innodb_null_only
maria.maria_null_only
myisam.myisam_null_only

dr-m

Here is an initial review, focusing on the InnoDB part, and not reviewing the changes in row0sel.cc in detail.

dr-m · 2026-02-06T05:17:51Z

mysql-test/suite/innodb/t/innodb_null_only.test

+--disable_query_log
+--disable_warnings
+SET GLOBAL innodb_monitor_disable='module_buffer_page';
+SET GLOBAL innodb_monitor_reset_all='module_buffer_page';
+SET GLOBAL innodb_monitor_enable='module_buffer_page';
+--enable_warnings
+
+CREATE TABLE t (
+  id INT PRIMARY KEY,
+  b MEDIUMBLOB NULL,
+  c INT
+) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;
+
+INSERT INTO t VALUES
+  (1, REPEAT('a', 100000), 1),
+  (2, NULL, 2),
+  (3, REPEAT('b', 100000), 3),
+  (4, NULL, 4);
+
+--let $restart_noprint=2
+--let $restart_parameters=--innodb-buffer-pool-load-at-startup=0 --innodb-buffer-pool-dump-at-shutdown=0
+--source include/restart_mysqld.inc
+--let $restart_parameters=
+--disable_query_log


Why is the server being restarted after the data load? Restarting the server can significantly increase the execution time.

The ROW_FORMAT=DYNAMIC is redundant; this test should work with any value of innodb_default_row_format.

Hiding the queries from the output will make the .result file harder to read.

It is worth noting that in InnoDB, all of VARCHAR, TEXT, BLOB, MEDIUMBLOB and so on are being treated in the same way. I see that we are writing to the column b values that will never fit in a single InnoDB page. Because the maximum length of an InnoDB index record is 16383 bytes, shorter BLOB columns would work as well. Why does the column c exist and why does it allow NULL even if we are only inserting NOT NULL values?

The server restart is intentional for test correctness so physical BLOB page reads from information_schema.innodb_metrics. Without a cold start after every data load, BLOB pages may still be in buffer pool and the assertion can pass falsely. I already removed unnecessary restarts and kept only those needed for I/O validation.

Why does the column c exist and why does it allow NULL even if we are only inserting NOT NULL values?
Good point, fixed in the latest update

dr-m · 2026-02-06T05:19:08Z

mysql-test/suite/innodb/t/innodb_null_only.test

+CREATE TABLE t_red (
+  id INT PRIMARY KEY,
+  b MEDIUMBLOB NULL,
+  c INT
+) ENGINE=InnoDB ROW_FORMAT=REDUNDANT;


Instead of repeating the test for each ROW_FORMAT, you should make use of the following:

--source include/innodb_default_row_format.inc

dr-m · 2026-02-06T05:27:51Z

storage/innobase/handler/ha_innodb.cc


+	templ->null_only = false;
+	if (!templ->is_virtual
+	    && bitmap_is_set(&table->null_set, field->field_index)) {
+		templ->null_only = true;
+		if (prebuilt->template_type == ROW_MYSQL_WHOLE_ROW
+		    || prebuilt->select_lock_type == LOCK_X
+		    || prebuilt->pk_filter
+		    || prebuilt->in_fts_query) {
+			templ->null_only = false;
+		}
+	}


This is introducing a large number of frequently executed conditional branches. The conditions on prebuilt could be initialized to a new bool parameter of the function, which the caller would pass as a local variable that is initialized outside the loop, like this:

const bool maybe_null_only= whole_row || prebuilt->select_lock_type == LOCK_X || prebuilt->pk_filter || prebuilt->in_fts_query;

Side note: It looks like the prebuilt->template_type could have been shrunk to a single bit in ab01901.

Then, in this function, inside the pre-existing if (!templ->is_virtual) block we can use the following simple assignment:

templ->null_only = maybe_null_only && bitmap_is_set(&table->null_set, field->field_index);

dr-m · 2026-02-06T05:33:26Z

storage/innobase/row/row0sel.cc

+			if (templ->mysql_null_bit_mask) {
+				mysql_rec[templ->mysql_null_byte_offset]
+					&= static_cast<byte>
+					(~templ->mysql_null_bit_mask);
+			}


Why the condition? We could just unconditionally perform the assignment; it should be smaller and faster.

Which test case in the regression test suite is exercising this code path (index condition pushdown)?

dr-m · 2026-02-06T05:37:05Z

storage/innobase/include/row0mysql.h

@@ -447,6 +447,7 @@ struct mysql_row_templ_t {
 					type and this field is != 0, then
 					it is an unsigned integer type */
 	ulint	is_virtual;		/*!< if a column is a virtual column */
+	bool	null_only;		/*!< only NULL status is required */


Please check the data structure with GDB ptype/o and try to rearrange the fields and change some data types so that we get better locality of reference and a smaller footprint. I would assume that null_only and mysql_null_bit_mask will be accessed together. The ulint (an alias of size_t) is an overkill and some old legacy that we could fix. For example, would byte be a more suitable data type of mysql_null_bit_mask?

Addressed, Changed mysql_null_bit_mask from ulint to byte

gkodinov

This is a preliminary review.
Marko has already provided some feedback on the innodb part. When done with the preliminary review I'll ask for a reviewer from optimizer too.

Please fill in the commit message according to the coding style.

Also, please add details of your implementation: what SEs are covered, what scenarios, what conditions etc.

gkodinov · 2026-02-13T12:42:06Z

sql/handler.h

@@ -199,6 +192,7 @@ enum chf_create_flags {
 #define HA_HAS_OLD_CHECKSUM    (1ULL << 24)
 /* Table data are stored in separate files (for lower_case_table_names) */
 #define HA_FILE_BASED	       (1ULL << 26)
+#define HA_CAN_NULL_ONLY       (1ULL << 27)


It's probably left empty for some historical reason. I'd avoid refilling it, unless I can prove that the historical thing doesn't apply anymore.

bit 27 was old HA_NO_VARCHAR (unused) and removed in commit 6a6cc8a653a
Let me know if my assumption is wrong.

gkodinov · 2026-02-13T12:53:04Z

mysql-test/suite/myisam/t/myisam_null_only.test

+  (3, REPEAT('b', 10000)),
+  (4, NULL);
+
+SELECT COUNT(*) FROM t_v WHERE v IS NULL;


This is probably running through the new code. But how do you measure this? IMHO you need some observability here to verify that the new code path is indeed taken.

Updated the tests to see observability that the new code path is used

gkodinov · 2026-02-13T12:55:05Z

mysql-test/suite/maria/maria_null_only.test

+  (3, REPEAT('y', 200000)),
+  (4, NULL);
+
+SELECT COUNT(*) FROM t_v WHERE v IS NULL;


You need some observability here too to ensure that the new path is being taken. I can imagine a couple of status metrics:

a metric to measure how many columns does the optimizer put in "is null" mode.

a metric to show if the SE supports this

a metric per SE to demonstrate if the SE actually uses the data.

You also need tests for all scenarios: a JOIN with an ON clauses and a subquery in addition to the one you already have (a WHERE clause).

Added tests for all scenarios. Haven't done the "metric per SE using the data" since it requires extra path instrumentation increasing complexity and the current coverage proves optimizer selection + correctness for all engines and physical effect for InnoDB
However if needed per-SE observability in this PR, I can add those counters in a follow up patch.
@gkodinov Requesting to review. Thanks

alexaustin007 · 2026-02-13T22:16:41Z

This is a preliminary review. Marko has already provided some feedback on the innodb part. When done with the preliminary review I'll ask for a reviewer from optimizer too.

Please fill in the commit message according to the coding style.

Also, please add details of your implementation: what SEs are covered, what scenarios, what conditions etc.

Added in the description above

MDEV-38315 Optimize NULL only reads for wide tables in core engines

782c896

dr-m reviewed Feb 6, 2026

View reviewed changes

MDEV-38315 Optimize NULL only reads for InnoDB, Aria and MyISAM

2e3b05a

gkodinov added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Feb 9, 2026

gkodinov self-assigned this Feb 13, 2026

gkodinov requested changes Feb 13, 2026

View reviewed changes

MDEV-38315 Add NULL-only read optimization for core engines

8954531

alexaustin007 requested a review from gkodinov February 13, 2026 22:16

Uh oh!

Conversation

alexaustin007 commented Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Conditions / gating

Observability

Engine-level validation

Tests updated

Uh oh!

dr-m left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gkodinov left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alexaustin007 commented Feb 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

alexaustin007 commented Feb 6, 2026 •

edited

Loading