[C] Fix negative block size validation in datafile reader#3623
[C] Fix negative block size validation in datafile reader#3623martin-g merged 2 commits intoapache:mainfrom
Conversation
The file_read_block_count() function in datafile.c reads block size
using zigzag encoding, which can produce negative numbers from
malicious Avro container files. These negative values were passed
directly to avro_malloc(), causing allocation failures.
This patch adds validation to reject negative block size values with
a clear error message before attempting memory allocation.
Bug: Negative block size from varint decoding causes
allocation-size-too-big when cast to size_t
Impact: DoS via crafted .avro file
Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This pull request adds validation to prevent crashes from malformed Avro container files that contain negative block size values. The fix addresses a security vulnerability where negative values from zigzag-encoded data could be passed to memory allocation functions, causing allocation errors or undefined behavior.
Changes:
- Added validation in
file_read_block_count()to reject negative block size values - Returns
EINVALwith a descriptive error message when invalid block size is detected - Prevents allocation-size-too-big errors from AddressSanitizer
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (len < 0) { | ||
| avro_set_error("Invalid block size: %" PRId64, len); | ||
| return EINVAL; | ||
| } |
There was a problem hiding this comment.
Consider adding a regression test that verifies the negative block size validation. While the fix was verified with AddressSanitizer fuzzing, a unit test with a malformed Avro file containing a negative block size would help prevent regressions. This could follow the pattern of other test files like test_avro_1237.c which test handling of malformed Avro files.
|
Thanks for the reply! Here is the context of the crash we found: DescriptionNegative block size in Avro container file (OCF) causes Version
Steps to ReproduceMethod 1: Using avrocat (Easiest)Step 1: Build Avro C library with AddressSanitizer: git clone https://github.com/apache/avro.git
cd avro/lang/c
mkdir build && cd build
cmake .. \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_C_FLAGS="-fsanitize=address -g -O1" \
-DCMAKE_EXE_LINKER_FLAGS="-fsanitize=address"
make -j$(nproc)Step 2: Create the malicious Avro container file (83 bytes): # One-liner to create poc.avro
echo '4f626a0104166176726f2e736368656d611e7b2274797065223a226e756c6c227d146176726f2e636f646563086e756c6c000102030405060708090a0b0c0d0e0f10000102030405060708090a0b0c0d0e0f10' | xxd -r -p > poc.avroStep 3: Trigger the crash: ./src/avrocat poc.avroMethod 2: Using the fuzzerStep 1: Build Avro C library with AddressSanitizer (same as above). Step 2: Save the fuzzer code below as Step 3: Build the fuzzer: clang -g -O1 -fsanitize=address,fuzzer \
-I../src \
datafile_fuzzer.c \
-L./src -lavro \
-Wl,-rpath,./src \
-o datafile_fuzzerStep 4: Create PoC and run: echo '4f626a0104166176726f2e736368656d611e7b2274797065223a226e756c6c227d146176726f2e636f646563086e756c6c000102030405060708090a0b0c0d0e0f10000102030405060708090a0b0c0d0e0f10' | xxd -r -p > poc.avro
./datafile_fuzzer poc.avroExpected BehaviorThe Avro C library should validate that block size is non-negative and return an error for malformed files. Actual BehaviorRoot Cause AnalysisIn static int file_read_block_count(avro_file_reader_t r)
{
int64_t len;
...
check_prefix(rval, enc->read_long(r->reader, &len),
"Cannot read file block size: ");
if (!r->current_blockdata) {
r->current_blockdata = (char *) avro_malloc(len); // BUG: len can be negative!
...
}
}Control flow for crash file:
File structure of PoC: Impact
Attack vectors:
Suggested Fix
Add validation for negative block size in static int file_read_block_count(avro_file_reader_t r)
{
int64_t len;
...
check_prefix(rval, enc->read_long(r->reader, &len),
"Cannot read file block size: ");
if (len < 0) {
avro_set_error("Invalid block size: %" PRId64, len);
return EINVAL;
}
if (!r->current_blockdata) {
r->current_blockdata = (char *) avro_malloc(len);
...
}
}FuzzerThis issue was discovered using a custom Avro datafile fuzzer: /*
* Copyright 2026 O2Lab @ Texas A&M University
*
* Fuzzer for Avro C DataFile Reader
* Target: avro_file_reader_fp() and avro_file_reader_read_value()
*/
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <avro.h>
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
if (size < 4) {
return 0;
}
/* Write fuzz data to a temporary file */
char template[] = "/tmp/avro_fuzz_XXXXXX";
int fd = mkstemp(template);
if (fd < 0) {
return 0;
}
ssize_t written = write(fd, data, size);
if (written != (ssize_t)size) {
close(fd);
unlink(template);
return 0;
}
lseek(fd, 0, SEEK_SET);
FILE *fp = fdopen(fd, "rb");
if (fp == NULL) {
close(fd);
unlink(template);
return 0;
}
avro_file_reader_t reader = NULL;
avro_value_iface_t *iface = NULL;
avro_value_t value;
int rc;
rc = avro_file_reader_fp(fp, template, 0, &reader);
if (rc != 0 || reader == NULL) {
fclose(fp);
unlink(template);
return 0;
}
avro_schema_t schema = avro_file_reader_get_writer_schema(reader);
if (schema == NULL) {
avro_file_reader_close(reader);
fclose(fp);
unlink(template);
return 0;
}
iface = avro_generic_class_from_schema(schema);
if (iface == NULL) {
avro_schema_decref(schema);
avro_file_reader_close(reader);
fclose(fp);
unlink(template);
return 0;
}
memset(&value, 0, sizeof(value));
rc = avro_generic_value_new(iface, &value);
if (rc != 0) {
avro_value_iface_decref(iface);
avro_schema_decref(schema);
avro_file_reader_close(reader);
fclose(fp);
unlink(template);
return 0;
}
/* Read up to 100 values */
for (int i = 0; i < 100; i++) {
rc = avro_file_reader_read_value(reader, &value);
if (rc != 0) {
break;
}
avro_value_reset(&value);
}
avro_value_decref(&value);
avro_value_iface_decref(iface);
avro_schema_decref(schema);
avro_file_reader_close(reader);
fclose(fp);
unlink(template);
return 0;
} |
This addresses reviewer feedback about handling allocation failures. When block_size is a very large positive number, malloc/realloc will fail and return NULL. Without NULL checks, this would lead to a null pointer dereference. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
@martin-g All changes are done and passed all test cases. Please LMK if you have any concern. |
|
Thank you, @OwenSanzas ! |
|
@martin-g Thanks for the guidance. You're the hero! |
* [C] Fix negative block size validation in datafile reader
The file_read_block_count() function in datafile.c reads block size
using zigzag encoding, which can produce negative numbers from
malicious Avro container files. These negative values were passed
directly to avro_malloc(), causing allocation failures.
This patch adds validation to reject negative block size values with
a clear error message before attempting memory allocation.
Bug: Negative block size from varint decoding causes
allocation-size-too-big when cast to size_t
Impact: DoS via crafted .avro file
Co-Authored-By: Claude <noreply@anthropic.com>
* Add NULL checks after malloc/realloc in file_read_block_count
This addresses reviewer feedback about handling allocation failures.
When block_size is a very large positive number, malloc/realloc
will fail and return NULL. Without NULL checks, this would lead
to a null pointer dereference.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
Summary
Fix missing validation for negative block size values in
file_read_block_count()function indatafile.c.Problem
The block size is read using zigzag encoding which can decode to negative numbers from malicious Avro container files. These negative values were passed directly to
avro_malloc(), causing:int64_tis cast tosize_t.avrofilesChanges
len < 0check infile_read_block_count()before allocationEINVALwith descriptive error message on invalid inputTesting
Verified with AddressSanitizer fuzzing - crash no longer reproduces.