[uclibc-ng-devel] bug: stdio funcs, thread cancellation with futex hold
dalias at libc.org
Fri Sep 27 17:03:06 CEST 2019
On Thu, Sep 26, 2019 at 05:06:51PM +0200, Yann Sionneau wrote:
> I would like to know if there is consensus on the fact that this
> program hanging on a deadlock is a uClibc-ng bug:
> I've asked people to try this on several arch/libc combination, it
> works well (no hang) on:
> * x86_64/glibc
> * x86_64/musl
> * armv7l/musl
> Also, I've run the puts.c test (linked above) on 3 archs with
> uClibc-ng, it fails (hangs) on:
> * or1k/uClibc-ng
> * mips32r6/uClibc-ng
> * k1c/uClibc-ng (port not completely published yet)
> See failure logs + strace -f of what happens when hanging: https://mypads.framapad.org/mypads/?/mypads/group/uclibc-ng-10be37ap/pad/view/puts-issue-qt62e74n
> My understanding of the issue is that:
> puts, and possibly other libc functions, are taking a lock ( https://elixir.bootlin.com/uclibc-ng/latest/source/libc/stdio/puts.c#L17
> ) and end up calling write() which is a cancellation point. ( https://elixir.bootlin.com/uclibc-ng/latest/source/libc/stdio/_stdio.h#L150
> So, if a thread is canceled, is asynchronous mode (which is the
> default one), and the cancelation is triggered by the write() inside
> the puts(), then the thread will unwind and exit without unlocking
> the puts lock.
> Then, any other thread calling puts() will hang indefinetely (and
> hang other threads if it hangs with locks held...).
> My understanding of what can be done to fix this issue:
> 1/ Either make puts a non cancelation point (see man 7 pthreads,
> puts is not listed in mandatory cancelation point, only in "may").
> For instance it is not a cancelation point in glibc.
> 2/ Or keep puts as a cancelation point and fix the puts code so that
> it releases the lock upon cancelation (using
> pthread_cleanup_push/pop for instance)
> In case people think this is indeed a bug, here are examples of code
> fixes that I have in mind, please don't hesitate to comment or/and
> propose something else:
> 1/ https://pastebin.com/ePsWJzdi
> 2/ https://pastebin.com/5EA4RedS
> One problem of those fixes is that we need to identify all libc
> functions that take a lock and call a cancellable function and apply
> such kind of fixes... This is not easy and a bit painful.
I think your analysis is correct here. On top of that, though, uclibc
has the broken, inherently-racy cancellation implementation inherited
from glibc. See:
As such, even if you fix the above bug, it will be unsafe to use it,
and critically unsafe unless you block cancellation around at least
resource-freeing operations like close. I'd go so far as to say that,
if uclibc can't fix this, it should ignore cancellation points which
are resource-freeing operations (close, maybe a few others).
Resource-allocating ones are also problematic but "just" resource
leaks at worst; maybe cancellation should be ignored for them too.
More information about the devel