Fix crash due to asio object lifetime and thread safety issue#551
Fix crash due to asio object lifetime and thread safety issue#551BewareMyPower wants to merge 5 commits intoapache:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses a crash caused by ASIO completion handlers running during/after Client::close() when there are still in-flight network operations (notably pending async_connect), which can lead to use-after-free of socket / io_context internals during shutdown.
Changes:
- Adds tracking of in-flight (non-timer) async operations in
ClientConnectionand exposes apendingOperations()counter. - Adjusts connection shutdown behavior to avoid closing the socket while the TCP connect is still pending, deferring close handling to
handleTcpConnected. - Makes
ConnectionPool::close()wait (up to 5s) for each connection’s pending operations to complete before clearing the pool.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/PulsarFriend.h | Uses C++17 lock-guard deduction when locking ClientConnection internals in tests. |
| lib/MockServer.h | Updates mutex locking to use C++17 lock-guard deduction for consistency. |
| lib/ConnectionPool.cc | Waits for pending async ops per connection during pool shutdown to reduce shutdown-time crashes. |
| lib/ClientConnection.h | Introduces pendingOperations_ counter, exposes accessor, and changes mutex type to recursive_mutex. |
| lib/ClientConnection.cc | Instruments connect/handshake/read/write paths for pending-op tracking and modifies shutdown/connect completion handling. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
6e3901c to
94b70bf
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
Fixes: #550
Motivation
ClientTest.testCloseClientis flaky that sometimes it could cause segmentation fault. This is becauseclient.close()could be called quickly aftercreateProducerAsync. In this case, theasync_connectoperation might not complete. In theclosemethod,socket_->close()is called, which could complete theasync_connectwithoperation_aborted. In asio, when the completion handler (callback) is called while the caller (socket) is destroyed, it will crash because the callback refers the caller internally.There are several issues.
Issue 1: Thread safety issue when accessing asio objects (e.g. socket)
ip::tcp::socket::closeis called byClientConnection::close, which could be called in a separated thread created byClientImpl::handleCloseor directly byClientImpl::shutdown. However, it's unsafe to accesssocketconcurrently, see https://think-async.com/Asio/asio-1.36.0/doc/asio/reference/ip__tcp/socket.htmlssl::streamhas the same issue, that could be why the operations ontlsSocket_were bound tostrand_before. Actually, it's also unsafe to callasync_writedirectly onsocket_because it could be called byProducerImpl::sendMessagein a different thread from the event loop thread.Issue 2: The lifetime of asio objects might be shorter than the callbacks
#317 might not fix the null access issue correctly. It introduced several crash issues (#325, #326) by capturing a
weak_ptr. The issue is thatConnectionPool::closedon't wait until all pending asynchronous operations are done. Then, after theClientConnectionobject is destroyed, the asio objects are also destroyed. However, theio_contextcould still try accessing them when destroying the internal callback queue.Modifications
shared_from_this()in all callbacks of asynchronous operations.close, dispatch thesocket::close()call in event loop thread. Forssl::streamobject, callasync_shutdown. The previous close onlower_layeris actually equivalent with closing the socket.ConnectionPool::close, wait until all pending operations are done by checking the reference count ofClientConnectionPeriodicalTaskwith a simple timer and cancel it inclose