[uclibc-ng-devel] bug: stdio funcs, thread cancellation with futex hold

Rich Felker dalias at libc.org
Fri Sep 27 17:03:06 CEST 2019


On Thu, Sep 26, 2019 at 05:06:51PM +0200, Yann Sionneau wrote:
> Hello,
> 
> I would like to know if there is consensus on the fact that this
> program hanging on a deadlock is a uClibc-ng bug:
> https://pastebin.com/11qLsTW5
> 
> I've asked people to try this on several arch/libc combination, it
> works well (no hang) on:
> 
> * x86_64/glibc
> 
> * x86_64/musl
> 
> * armv7l/musl
> 
> Also, I've run the puts.c test (linked above) on 3 archs with
> uClibc-ng, it fails (hangs) on:
> 
> * or1k/uClibc-ng
> 
> * mips32r6/uClibc-ng
> 
> * k1c/uClibc-ng (port not completely published yet)
> 
> See failure logs + strace -f of what happens when hanging: https://mypads.framapad.org/mypads/?/mypads/group/uclibc-ng-10be37ap/pad/view/puts-issue-qt62e74n
> 
> 
> My understanding of the issue is that:
> 
> puts, and possibly other libc functions, are taking a lock ( https://elixir.bootlin.com/uclibc-ng/latest/source/libc/stdio/puts.c#L17
> ) and end up calling write() which is a cancellation point. ( https://elixir.bootlin.com/uclibc-ng/latest/source/libc/stdio/_stdio.h#L150
> )
> 
> So, if a thread is canceled, is asynchronous mode (which is the
> default one), and the cancelation is triggered by the write() inside
> the puts(), then the thread will unwind and exit without unlocking
> the puts lock.
> 
> Then, any other thread calling puts() will hang indefinetely (and
> hang other threads if it hangs with locks held...).
> 
> My understanding of what can be done to fix this issue:
> 
> 1/ Either make puts a non cancelation point (see man 7 pthreads,
> puts is not listed in mandatory cancelation point, only in "may").
> For instance it is not a cancelation point in glibc.
> 
> 2/ Or keep puts as a cancelation point and fix the puts code so that
> it releases the lock upon cancelation (using
> pthread_cleanup_push/pop for instance)
> 
> In case people think this is indeed a bug, here are examples of code
> fixes that I have in mind, please don't hesitate to comment or/and
> propose something else:
> 
> 1/ https://pastebin.com/ePsWJzdi
> 
> 2/ https://pastebin.com/5EA4RedS
> 
> 
> One problem of those fixes is that we need to identify all libc
> functions that take a lock and call a cancellable function and apply
> such kind of fixes... This is not easy and a bit painful.

I think your analysis is correct here. On top of that, though, uclibc
has the broken, inherently-racy cancellation implementation inherited
from glibc. See:

- https://sourceware.org/bugzilla/show_bug.cgi?id=12683
- https://ewontfix.com/2/
- https://ewontfix.com/4/
- https://ewontfix.com/16/

As such, even if you fix the above bug, it will be unsafe to use it,
and critically unsafe unless you block cancellation around at least
resource-freeing operations like close. I'd go so far as to say that,
if uclibc can't fix this, it should ignore cancellation points which
are resource-freeing operations (close, maybe a few others).
Resource-allocating ones are also problematic but "just" resource
leaks at worst; maybe cancellation should be ignored for them too.

Rich


More information about the devel mailing list