@jinpzhanAMD jinpzhanAMD commented Jan 19, 2026

Summary
This PR fixes a startup crash in the bundled mimalloc on Windows observed in TLS-slot-heavy processes. The crash occurred during mimalloc initialization when allocating “fast TLS user slots”.

Problem
mimalloc’s Windows fast-path TLS model relies on directly accessing the TEB TlsSlots[64] array. The current implementation assumes that TlsAlloc() returns a TLS index that fits into the first 63 entries.
In environments where many DLLs/layers already consume TLS indices, TlsAlloc() can return an index >= 63, which cannot be addressed via the TEB's fixed 64-slot array. This results in _mi_error_message(EFAULT, ...) and a crash during mimalloc initialization.

Solution

  • If the allocated TLS indices are >= 63, fall back to the Win32 TLS APIs (TlsGetValue/TlsSetValue) for correctness.
  • If TlsAlloc() fails with TLS_OUT_OF_INDEXES, fall back to compiler TLS (__declspec(thread)) to store the default/cached theap pointers, avoiding reliance on OS TLS indices.
  • Update the inline accessors so that reads and writes consistently follow the selected fallback mode.
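
The fallback order described in the list above can be sketched as a small mode-selection function. This is only an illustration, not the code in this PR: every `sketch_` name is hypothetical, and the threshold of 63 is taken from the description above. The only Win32 fact assumed is that `TLS_OUT_OF_INDEXES` has the value `0xFFFFFFFF`.

```c
#include <stdint.h>

// Hypothetical names throughout; this is not mimalloc's actual code.
#define SKETCH_FAST_LIMIT      63u          // indices >= 63 leave the fast path (per the PR text)
#define SKETCH_OUT_OF_INDEXES  0xFFFFFFFFu  // same value as Win32 TLS_OUT_OF_INDEXES

typedef enum sketch_tls_mode_e {
  SKETCH_TLS_FAST,       // read/write the TEB TlsSlots entry directly
  SKETCH_TLS_WIN32_API,  // go through TlsGetValue/TlsSetValue
  SKETCH_TLS_COMPILER    // store the pointer in a __declspec(thread) variable
} sketch_tls_mode_t;

// Pick the access mode from the value that TlsAlloc() returned.
static sketch_tls_mode_t sketch_select_mode(uint32_t tls_index) {
  if (tls_index == SKETCH_OUT_OF_INDEXES) return SKETCH_TLS_COMPILER;
  if (tls_index >= SKETCH_FAST_LIMIT) return SKETCH_TLS_WIN32_API;
  return SKETCH_TLS_FAST;
}
```

In a design like this the mode would be chosen once at initialization, and the inline accessors would branch on it so that reads and writes always agree on where the pointer lives.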

Behavioral impact
Prevents mimalloc initialization from aborting in TLS-index-constrained processes.
Preserves the fast path when possible; uses progressively safer fallbacks only when required.
Expected minor performance impact only in the fallback modes.

Collaborator

daanx commented Jan 31, 2026

Hi @jinpzhanAMD -- thanks again for putting in the work on this PR! The original code was indeed counting on not more than 64 TlsAlloc's and that is clearly too low a limit. I looked at the PR but we need to write this code quite carefully as it is all meant to improve the fast path over regular thread_local variables. I pushed a commit that now uses the Tls expansion slots when needed and this should work up to 1088 TlsAlloc's (which is the maximum anyway).

I think this will also address your other PR with the recursive initialization: that was an artifact of the initial limit as the initial slot was now containing an arbitrary value instead of NULL. Let me know how it goes!
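
For readers following along, the expansion-slot idea mentioned above can be sketched with a mock TEB. This is a hedged illustration, not the pushed commit: the `sketch_` names are hypothetical, the struct models only the two relevant fields, and it assumes the usual Windows layout of 64 direct slots plus a per-thread expansion array of 1024 entries (64 + 1024 = 1088) that may not be allocated yet.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DIRECT_SLOTS     64u    // TlsSlots[64] in the TEB
#define SKETCH_EXPANSION_SLOTS  1024u  // 64 + 1024 = 1088 indices in total

// Mock of the two TEB fields involved; the real TEB has many more fields.
typedef struct sketch_teb_s {
  void*  tls_slots[SKETCH_DIRECT_SLOTS];
  void** tls_expansion_slots;  // assumed NULL until the thread first needs it
} sketch_teb_t;

// Return the address of the slot for `index`, or NULL if it is out of
// range or the expansion array has not been allocated for this thread.
static void** sketch_slot_addr(sketch_teb_t* teb, uint32_t index) {
  if (index < SKETCH_DIRECT_SLOTS) return &teb->tls_slots[index];
  index -= SKETCH_DIRECT_SLOTS;
  if (index >= SKETCH_EXPANSION_SLOTS) return NULL;
  if (teb->tls_expansion_slots == NULL) return NULL;
  return &teb->tls_expansion_slots[index];
}
```

A NULL result in this sketch marks the case where a real accessor would have to take a slower path, for example by calling TlsSetValue so that the OS allocates the expansion array.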
