From: karmadon@my-dejanews.com (karmadon@my-dejanews.com)
Subject: nanosleep/sleep/SIGCONT bug
Newsgroups: comp.os.linux.development.system
Date: 1998/11/16

Hi,

On 2.0.x kernel with glibc.2.x try the following:

    $ sleep 100
    ^Z
    $ fg

sleep will return immediately, instead of sleeping for the remaining interval.
This did not happen with older libc.5.x

The bug lies in the implementation of sleep(2) in glibc, which does it through
nanosleep(2) call. (older libc used alarm(2) and a signal handler).

nanosleep (or sys_nanosleep) in kernel/sched.c returns -EINTR when
interrupted:

    expire = timespectojiffies(&t) + (t.tv_sec || t.tv_nsec) + jiffi
    current->timeout = expire;
    current->state = TASK_INTERRUPTIBLE;
    schedule();

    if (expire > jiffies) {
            if (rmtp) {
                    jiffiestotimespec(expire - jiffies -
                                      (expire > jiffies + 1), &t);
                    memcpy_tofs(rmtp, &t, sizeof(struct timespec));
            }
            return -EINTR;
    }

However, from the logic in arch/i386/kernel/signal.c, I can see that a
function which wants to be re-started after a benign signal such as SIGCONT,
SIGWINCH, SIGCHLD should return -ERESTARTNOHAND, like the sys_pause()
implementation a few hundred lines above in kernel/sched.c:

    asmlinkage int sys_pause(void)
    {
            current->state = TASK_INTERRUPTIBLE;
            schedule();
            return -ERESTARTNOHAND;
    }

I re-compiled the kernel with the return changed to -ERESTARTNOHAND, and,
lo and behold, everything works properly now.

Could some of you, system gurus, take a look at it?
Thanks,
Igor

--
Igor Shpigelman           "Homo sapiens are always capable of thinking,
Yet Another UNIX Hacker    but not always able to think" A.B.Strugatski

From: Linus Torvalds (torvalds@transmeta.com)
Subject: Re: nanosleep/sleep/SIGCONT bug
Newsgroups: comp.os.linux.development.system
Date: 1998/11/16

In article <72q4hp$59d$1@nnrp1.dejanews.com>,
 <karmadon@my-dejanews.com> wrote:
>
>I re-compiled the kernel with the return changed to -ERESTARTNOHAND, and,
>lo and behold, everything works properly now.
>
>Could some of you, system gurus, take a look at it?

Your change makes the thing restart. Which is a good thing in general.

HOWEVER, it doesn't address the fact that the "nanosleep()" system call by
design simply is not restartable. With your change, it will now restart
with the same timeout it had originally, unless you have made sure to alias
the original and the result timers.

As such, suddenly "nanosleep()" isn't nanosleep() any more, but "sleep for
arbitrarily long if certain signals happen". Which may be the right
behaviour for your application, but not in general.

The right thing to do is to _not_ use "nanosleep()" for sleeping, and that
requires a glibc change.

		Linus
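The point about aliasing the original and the result timers can be sketched in user space. This is a minimal sketch, not code from the thread: sleep_fully is a hypothetical helper name, and it assumes a POSIX nanosleep() that writes the unslept time to its second argument when it fails with EINTR. Because the same buffer serves as both request and remainder, each retry waits only for what is left rather than for the original interval.

```c
#define _POSIX_C_SOURCE 199309L
#include <errno.h>
#include <time.h>

/* Hypothetical helper (not from the thread): sleep for the full interval,
 * resuming after signal interruptions.
 * Returns 0 on success, -1 on any error other than EINTR. */
int sleep_fully(time_t seconds, long nanoseconds)
{
    struct timespec ts;

    ts.tv_sec = seconds;
    ts.tv_nsec = nanoseconds;

    /* Request and remainder share one buffer: on EINTR the unslept time
     * is written back into ts, so the next call waits only that long. */
    while (nanosleep(&ts, &ts) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}
```

A caller would write sleep_fully(10, 0) instead of sleep(10). Each retry may round the remaining time up slightly, so the total can run a little long.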
From: Erik Westlin (westlin@msi.se)
Subject: Re: nanosleep/sleep/SIGCONT bug
Newsgroups: comp.os.linux.development.system
Date: 1998/11/18

karmadon@my-dejanews.com wrote:
>
> Hi,
>
> On 2.0.x kernel with glibc.2.x try the following:
>
> $ sleep 100
> ^Z
> $ fg
>
> sleep will return immediately, instead of sleeping for the remaining interval.
> This did not happen with older libc.5.x
>
> The bug lies in the implementation of sleep(2) in glibc, which does it through
> nanosleep(2) call. (older libc used alarm(2) and a signal handler).
>

I suspect this is causing trouble in the dce-rpc 1.1 package.
Maybe it would be possible to replace the sleep(2) in glibc2 with the
one in libc5? If so maybe you could post it if you have the code
at hand.

------------------------------------------------------------------------
Erik Westlin
Manne Siegbahn Laboratory
email: westlin@msi.se
From: Matthew Hannigan (mlh@zipper.zip.com.au)
Subject: Re: nanosleep/sleep/SIGCONT bug
Newsgroups: comp.os.linux.development.system
Date: 1998/11/19

In article <365286AF.4306@msi.se>, Erik Westlin <westlin@msi.se> wrote:
>karmadon@my-dejanews.com wrote:
>>
>> Hi,
>>
>> On 2.0.x kernel with glibc.2.x try the following:
>>
>> $ sleep 100
>> ^Z
>> $ fg
>>
>> sleep will return immediately, instead of sleeping for the remaining interval.
>> This did not happen with older libc.5.x
>>
>> The bug lies in the implementation of sleep(2) in glibc, which does it through
>> nanosleep(2) call. (older libc used alarm(2) and a signal handler).
>>
>
>I suspect this is causing trouble in the dce-rpc 1.1 package.
>Maybe it would be possible to replace the sleep(2) in glibc2 with the
>one in libc5? If so maybe you could post it if you have the code
>at hand.
>

H.J.Lu just sent some nanosleep code for this on the linux-kernel
mailing list.

--
-Matt Hannigan
From: Linus Torvalds (torvalds@transmeta.com)
Subject: Re: nanosleep/sleep/SIGCONT bug
Newsgroups: comp.os.linux.development.system
Date: 1998/11/19

In article <730611$rj4$1@the-fly.zip.com.au>,
Matthew Hannigan <mlh@zipper.zip.com.au> wrote:
>
>H.J.Lu just sent some nanosleep code for this on the linux-kernel
>mailing list.

I don't think that is enough.

What hjl did was a special case for SIGCHLD only. It doesn't fix the fact
that nanosleep() simply _cannot_ be restarted. You can hide some of the
problems (SIGCHLD) by letting zombies stay around, but there is no way you
can hide the basic restartability issue.

I think the only fix is to go back to the libc-5 implementation.

Alternatively, the interface to "nanosleep()" inside the kernel can be
completely revamped so that there is only one buffer that holds both the
incoming and the outgoing values, so that nanosleep can be restarted with
the proper timeout.

For example, doing something like this is likely to be ok:

	nanosleep(&timespec, &timespec);

where the _modified_ timespec is required to be the _same_ as the incoming
timespec, and then you change "sys_nanosleep()" to do what the original
email in this thread suggested, ie make it return -ERESTARTNOHAND. Then it
correctly handles the case of being restarted.

But then it's not the POSIX nanosleep any more.

Hjl, please go back to using the old "sleep()", because the current glibc
one is buggy (even with your SIGCHLD changes). Users who use "nanosleep()"
directly are supposed to restart by hand - but "sleep()" cannot do this
correctly with nanosleep().

		Linus

---
[ hjl, in case you didn't follow it, the problem is a program that does
  sleep(10) and is suspended with ^Z only to be re-woken with "fg" - at
  which point it returns immediately. Which is wrong. ]
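For comparison, the alarm(2)-plus-signal-handler style of sleep mentioned in this thread might look roughly like the following. This is a sketch under stated assumptions, not the libc-5 source: alarm_sleep and on_alarm are hypothetical names, the code clobbers any alarm the caller already had pending, and unlike a real sleep() it keeps waiting through other handled signals instead of returning the unslept time. The relevant property is that the timer keeps running in real time while the process is stopped, so after ^Z and fg the process waits only for whatever remains.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <unistd.h>

/* Set by the SIGALRM handler once the timer has expired. */
static volatile sig_atomic_t alarm_fired;

static void on_alarm(int signo)
{
    (void)signo;
    alarm_fired = 1;
}

/* Hypothetical sketch: sleep for `seconds` using alarm() and a handler.
 * Always returns 0, because it waits until the alarm has fired. */
unsigned int alarm_sleep(unsigned int seconds)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;

    if (seconds == 0)
        return 0;

    /* Block SIGALRM so it cannot be delivered before sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    sigprocmask(SIG_BLOCK, &block, &old_mask);

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGALRM, &sa, &old_sa);

    alarm_fired = 0;
    alarm(seconds);

    /* Wait with SIGALRM unblocked; other handled signals wake us,
     * but the loop goes back to waiting until the alarm has fired. */
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGALRM);
    while (!alarm_fired)
        sigsuspend(&wait_mask);

    /* Restore the caller's SIGALRM disposition and signal mask. */
    sigaction(SIGALRM, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return 0;
}
```

Blocking SIGALRM before arming the timer and unblocking it only inside sigsuspend() closes the window in which the alarm could fire before the process starts waiting.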