Description
Currently, bytearray.resize is not thread-safe, unlike (as far as I can tell) the other bytearray operations.
The problem seems to be that bytearray.resize (which wraps bytearray_resize_impl) calls PyByteArray_Resize (which locks) and then does its own work (setting the new bytes to 0) without holding the lock. If another thread calls resize at the same time, one of the threads may end up writing to memory that no longer belongs to the bytearray.
To be thread-safe, resize would need to hold the bytearray's lock for the duration of the whole resize operation.
Example of a problematic sequence:
- Main thread initialises a bytearray with 10 elements
- T1 calls resize(1000), completes the PyByteArray_Resize call
- T2 calls resize(10), completes the PyByteArray_Resize call
- T1 proceeds to filling (or continues to fill) the "new buffer space" with 0's, but most of it (in our case 1000 - 10) is now invalid memory as the underlying buffer has been replaced with a smaller one.
(Note: there are several paths to PyByteArray_Resize, some of which keep the same existing buffer and just change the "apparent size" of the bytearray, but going from 1000 to 10 crosses the n / 2 threshold at which a new, smaller buffer is allocated and replaces the existing one through PyBytes_FromStringAndSize.)
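The sequence above can be replayed deterministically with a toy model in pure Python. This is only a sketch: ToyByteArray, realloc, zero_fill and replay_race are hypothetical names invented for illustration, the list stands in for the C buffer, and an IndexError stands in for the out-of-bounds write that C would perform silently.

```python
import threading


class ToyByteArray:
    """Hypothetical model of the object; not CPython's real layout."""

    def __init__(self, size):
        self.lock = threading.Lock()  # stands in for the per-object lock
        self.buf = [0] * size         # stands in for the underlying buffer

    def realloc(self, new_size):
        """Models PyByteArray_Resize: swap the buffer while holding the lock."""
        with self.lock:
            old_size = len(self.buf)
            grown = [None] * max(0, new_size - old_size)  # None = uninitialised
            self.buf = self.buf[:new_size] + grown
            return old_size

    def zero_fill(self, start, end):
        """Models the zero-fill that runs after the lock has been released."""
        for i in range(start, end):
            # An IndexError here plays the role of an out-of-bounds write in C.
            self.buf[i] = 0


def replay_race():
    """Replay the T1/T2 interleaving from the list above in a single thread."""
    ba = ToyByteArray(10)
    old_size = ba.realloc(1000)  # T1: resize(1000), buffer swap done
    ba.realloc(10)               # T2: resize(10), smaller buffer swapped in
    try:
        ba.zero_fill(old_size, 1000)  # T1 resumes with its stale target size
    except IndexError:
        return "T1 wrote past the end of the current buffer"
    return "no out-of-bounds write"


print(replay_race())
```

The model ignores the n / 2 threshold and always swaps the buffer, so it only illustrates the ordering problem, not the exact allocation behaviour.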
Proposed fix
Add a @critical_section to resize, as other bytearray methods already do.
I have a branch in my fork PR #145714 that implements this fix and that I can open once the bug is confirmed.
After making this change, my reproducer does not crash anymore.
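As a rough illustration of why holding the lock across the whole operation helps, here is a pure-Python sketch in which the buffer swap and the zero-fill share one lock. LockedToyByteArray and stress are hypothetical names, and a threading.Lock stands in for the per-object critical section; this is not the actual C change.

```python
import threading


class LockedToyByteArray:
    """Hypothetical stand-in for a bytearray whose resize is one critical section."""

    def __init__(self, size):
        self._lock = threading.Lock()
        self._buf = [0] * size

    def resize(self, new_size):
        # The lock covers both the buffer swap and the zero-fill, so no other
        # thread can replace the buffer between the two steps.
        with self._lock:
            old_size = len(self._buf)
            grown = [None] * max(0, new_size - old_size)  # None = uninitialised
            self._buf = self._buf[:new_size] + grown
            for i in range(old_size, new_size):
                self._buf[i] = 0

    def snapshot(self):
        with self._lock:
            return list(self._buf)


def stress(toy, iterations=200, workers=4):
    """Run concurrent grow/shrink cycles and collect any IndexError raised."""
    errors = []

    def worker():
        try:
            for _ in range(iterations):
                toy.resize(1_000)
                toy.resize(1)
        except IndexError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


toy = LockedToyByteArray(100)
print(stress(toy))
```

Because every thread's last call is resize(1), the toy always ends with a single element, and the zero-fill never runs against a buffer that another thread has already replaced.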
Reproducer
Running the following on the latest main, free-threaded build with TSan...
    from threading import Thread

    ba = bytearray(100)

    def f():
        for _ in range(100_000):
            try:
                ba.resize(10_000)
                ba.resize(1)
            except (BufferError, ValueError):
                pass

    threads = [Thread(target=f) for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()

... gives me ...
==================
WARNING: ThreadSanitizer: data race (pid=67656)
Write of size 8 at 0x00012491b848 by thread T2:
#0 bytearray_resize_lock_held bytearrayobject.c:285 (python.exe:arm64+0x10006875c)
#1 bytearray_resize bytearrayobject.c.h:628 (python.exe:arm64+0x100072d6c)
#2 _PyEval_EvalFrameDefault generated_cases.c.h:4041 (python.exe:arm64+0x10028fa58)
#3 _PyEval_Vector ceval.c:2132 (python.exe:arm64+0x1002856a8)
#4 _PyFunction_Vectorcall call.c (python.exe:arm64+0x10008e0f0)
#5 method_vectorcall classobject.c:74 (python.exe:arm64+0x1000923ac)
#6 context_run context.c:727 (python.exe:arm64+0x1002d0d64)
#7 method_vectorcall_FASTCALL_KEYWORDS descrobject.c:421 (python.exe:arm64+0x1000a5258)
#8 PyObject_Vectorcall call.c:327 (python.exe:arm64+0x10008daac)
#9 _Py_VectorCallInstrumentation_StackRefSteal ceval.c:769 (python.exe:arm64+0x100286174)
#10 _PyEval_EvalFrameDefault generated_cases.c.h:1817 (python.exe:arm64+0x1002929f0)
#11 _PyEval_Vector ceval.c:2132 (python.exe:arm64+0x1002856a8)
#12 _PyFunction_Vectorcall call.c (python.exe:arm64+0x10008e0f0)
#13 method_vectorcall classobject.c:74 (python.exe:arm64+0x1000923ac)
#14 _PyObject_Call call.c:348 (python.exe:arm64+0x10008dd68)
#15 PyObject_Call call.c:373 (python.exe:arm64+0x10008dddc)
#16 thread_run _threadmodule.c:387 (python.exe:arm64+0x10043afe8)
#17 pythread_wrapper thread_pthread.h:234 (python.exe:arm64+0x10037b2a0)
Previous read of size 8 at 0x00012491b848 by thread T1:
#0 bytearray_resize bytearrayobject.c.h:628 (python.exe:arm64+0x100072e08)
#1 _PyEval_EvalFrameDefault generated_cases.c.h:4041 (python.exe:arm64+0x10028fa58)
#2 _PyEval_Vector ceval.c:2132 (python.exe:arm64+0x1002856a8)
#3 _PyFunction_Vectorcall call.c (python.exe:arm64+0x10008e0f0)
#4 method_vectorcall classobject.c:74 (python.exe:arm64+0x1000923ac)
#5 context_run context.c:727 (python.exe:arm64+0x1002d0d64)
#6 method_vectorcall_FASTCALL_KEYWORDS descrobject.c:421 (python.exe:arm64+0x1000a5258)
#7 PyObject_Vectorcall call.c:327 (python.exe:arm64+0x10008daac)
#8 _Py_VectorCallInstrumentation_StackRefSteal ceval.c:769 (python.exe:arm64+0x100286174)
#9 _PyEval_EvalFrameDefault generated_cases.c.h:1817 (python.exe:arm64+0x1002929f0)
#10 _PyEval_Vector ceval.c:2132 (python.exe:arm64+0x1002856a8)
#11 _PyFunction_Vectorcall call.c (python.exe:arm64+0x10008e0f0)
#12 method_vectorcall classobject.c:74 (python.exe:arm64+0x1000923ac)
#13 _PyObject_Call call.c:348 (python.exe:arm64+0x10008dd68)
#14 PyObject_Call call.c:373 (python.exe:arm64+0x10008dddc)
#15 thread_run _threadmodule.c:387 (python.exe:arm64+0x10043afe8)
#16 pythread_wrapper thread_pthread.h:234 (python.exe:arm64+0x10037b2a0)
Thread T2 (tid=26268866, running) created by main thread at:
...
Thread T1 (tid=26268865, running) created by main thread at:
...
SUMMARY: ThreadSanitizer: data race bytearrayobject.c:285 in bytearray_resize_lock_held
==================
==================
WARNING: ThreadSanitizer: data race (pid=67656)
Write of size 8 at 0x00012491b848 by thread T2:
#0 bytearray_resize_lock_held bytearrayobject.c:285 (python.exe:arm64+0x10006875c)
#1 bytearray_resize bytearrayobject.c.h:628 (python.exe:arm64+0x100072d98)
#2 _PyEval_EvalFrameDefault generated_cases.c.h:4041 (python.exe:arm64+0x10028fa58)
#3 _PyEval_Vector ceval.c:2132 (python.exe:arm64+0x1002856a8)
#4 _PyFunction_Vectorcall call.c (python.exe:arm64+0x10008e0f0)
#5 method_vectorcall classobject.c:74 (python.exe:arm64+0x1000923ac)
#6 context_run context.c:727 (python.exe:arm64+0x1002d0d64)
#7 method_vectorcall_FASTCALL_KEYWORDS descrobject.c:421 (python.exe:arm64+0x1000a5258)
#8 PyObject_Vectorcall call.c:327 (python.exe:arm64+0x10008daac)
#9 _Py_VectorCallInstrumentation_StackRefSteal ceval.c:769 (python.exe:arm64+0x100286174)
#10 _PyEval_EvalFrameDefault generated_cases.c.h:1817 (python.exe:arm64+0x1002929f0)
#11 _PyEval_Vector ceval.c:2132 (python.exe:arm64+0x1002856a8)
#12 _PyFunction_Vectorcall call.c (python.exe:arm64+0x10008e0f0)
#13 method_vectorcall classobject.c:74 (python.exe:arm64+0x1000923ac)
#14 _PyObject_Call call.c:348 (python.exe:arm64+0x10008dd68)
#15 PyObject_Call call.c:373 (python.exe:arm64+0x10008dddc)
#16 thread_run _threadmodule.c:387 (python.exe:arm64+0x10043afe8)
#17 pythread_wrapper thread_pthread.h:234 (python.exe:arm64+0x10037b2a0)
Previous read of size 8 at 0x00012491b848 by thread T3:
#0 bytearray_resize bytearrayobject.c.h:628 (python.exe:arm64+0x100072e08)
#1 _PyEval_EvalFrameDefault generated_cases.c.h:4041 (python.exe:arm64+0x10028fa58)
#2 _PyEval_Vector ceval.c:2132 (python.exe:arm64+0x1002856a8)
#3 _PyFunction_Vectorcall call.c (python.exe:arm64+0x10008e0f0)
#4 method_vectorcall classobject.c:74 (python.exe:arm64+0x1000923ac)
#5 context_run context.c:727 (python.exe:arm64+0x1002d0d64)
#6 method_vectorcall_FASTCALL_KEYWORDS descrobject.c:421 (python.exe:arm64+0x1000a5258)
#7 PyObject_Vectorcall call.c:327 (python.exe:arm64+0x10008daac)
#8 _Py_VectorCallInstrumentation_StackRefSteal ceval.c:769 (python.exe:arm64+0x100286174)
#9 _PyEval_EvalFrameDefault generated_cases.c.h:1817 (python.exe:arm64+0x1002929f0)
#10 _PyEval_Vector ceval.c:2132 (python.exe:arm64+0x1002856a8)
#11 _PyFunction_Vectorcall call.c (python.exe:arm64+0x10008e0f0)
#12 method_vectorcall classobject.c:74 (python.exe:arm64+0x1000923ac)
#13 _PyObject_Call call.c:348 (python.exe:arm64+0x10008dd68)
#14 PyObject_Call call.c:373 (python.exe:arm64+0x10008dddc)
#15 thread_run _threadmodule.c:387 (python.exe:arm64+0x10043afe8)
#16 pythread_wrapper thread_pthread.h:234 (python.exe:arm64+0x10037b2a0)
Thread T2 (tid=26268866, running) created by main thread at:
...
Thread T3 (tid=26268867, running) created by main thread at:
...
SUMMARY: ThreadSanitizer: data race bytearrayobject.c:285 in bytearray_resize_lock_held
==================
ThreadSanitizer:DEADLYSIGNAL
==67656==ERROR: ThreadSanitizer: SEGV on unknown address 0x0000000000f0 (pc 0x00010276beb4 bp 0x0001719ee340 sp 0x0001719ee310 T26268868)
==67656==The signal is caused by a READ memory access.
==67656==Hint: address points to the zero page.
#0 _PyForIter_VirtualIteratorNext ceval.c:3715 (python.exe:arm64+0x1002a3eb4)
#1 _PyEval_EvalFrameDefault generated_cases.c.h:5842 (python.exe:arm64+0x10029944c)
#2 _PyEval_Vector ceval.c:2132 (python.exe:arm64+0x1002856a8)
#3 _PyFunction_Vectorcall call.c (python.exe:arm64+0x10008e0f0)
#4 method_vectorcall classobject.c:74 (python.exe:arm64+0x1000923ac)
#5 context_run context.c:727 (python.exe:arm64+0x1002d0d64)
#6 method_vectorcall_FASTCALL_KEYWORDS descrobject.c:421 (python.exe:arm64+0x1000a5258)
#7 PyObject_Vectorcall call.c:327 (python.exe:arm64+0x10008daac)
#8 _Py_VectorCallInstrumentation_StackRefSteal ceval.c:769 (python.exe:arm64+0x100286174)
#9 _PyEval_EvalFrameDefault generated_cases.c.h:1817 (python.exe:arm64+0x1002929f0)
#10 _PyEval_Vector ceval.c:2132 (python.exe:arm64+0x1002856a8)
#11 _PyFunction_Vectorcall call.c (python.exe:arm64+0x10008e0f0)
#12 method_vectorcall classobject.c:74 (python.exe:arm64+0x1000923ac)
#13 _PyObject_Call call.c:348 (python.exe:arm64+0x10008dd68)
#14 PyObject_Call call.c:373 (python.exe:arm64+0x10008dddc)
#15 thread_run _threadmodule.c:387 (python.exe:arm64+0x10043afe8)
#16 pythread_wrapper thread_pthread.h:234 (python.exe:arm64+0x10037b2a0)
#17 __tsan_thread_start_func <null> (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x2f678)
#18 _pthread_start <null> (libsystem_pthread.dylib:arm64e+0x6c04)
#19 thread_start <null> (libsystem_pthread.dylib:arm64e+0x1ba4)
==67656==Register values:
x[0] = 0x00000001054f8000 x[1] = 0x00001000000001e0 x[2] = 0x000000004df51dff x[3] = 0x00000001719ee490
x[4] = 0x0000000000000003 x[5] = 0x0000000000000000 x[6] = 0x0000000000000005 x[7] = 0x0000000000000000
x[8] = 0x000000000f080d9b x[9] = 0x000000000df51dff x[10] = 0x000000000df51d00 x[11] = 0x000000010f9607c0
x[12] = 0x00001000000001e0 x[13] = 0x0000000000000000 x[14] = 0x0000000000000000 x[15] = 0x000000000000000e
x[16] = 0x0000000199a1ac18 x[17] = 0x000000010345c9f8 x[18] = 0x0000000000000000 x[19] = 0x0000000132060140
x[20] = 0x0000000122100000 x[21] = 0x00000001054fc1a8 x[22] = 0x0000000000000000 x[23] = 0x0000000000000001
x[24] = 0x00000001054fc210 x[25] = 0x0000000122100000 x[26] = 0x000000013207019c x[27] = 0x0000000102acbde0
x[28] = 0x00000001054fc1a8 fp = 0x00000001719ee340 lr = 0x000000010276beb4 sp = 0x00000001719ee310
ThreadSanitizer can not provide additional info.
SUMMARY: ThreadSanitizer: SEGV ceval.c:3715 in _PyForIter_VirtualIteratorNext
==67656==ABORTING
zsh: abort ./python.exe