#1 ✓ resolved
Echo Nolan

Exits immediately on "thread blocked indefinitely" exception

Reported by Echo Nolan | May 31st, 2009 @ 12:05 AM

-- stm-testframework-testcase.hs
import Control.Concurrent.STM
import Test.Framework
import Test.Framework.Providers.HUnit

main = defaultMain [testCase "should fail" (atomically retry)]

Compiling and running this:

enolan at enolan-laptop in ~/code 
$ ghc --make stm-testframework-testcase.hs 
[1 of 1] Compiling Main             ( stm-testframework-testcase.hs, stm-testframework-testcase.o )
Linking stm-testframework-testcase ...

enolan at enolan-laptop in ~/code 
$ ./stm-testframework-testcase 
stm-testframework-testcase: thread blocked indefinitely

Running it in interpreted mode:

enolan at enolan-laptop in ~/code 
$ runhaskell stm-testframework-testcase.hs 
should fail: [Failed]
ERROR: thread blocked indefinitely

         Test Cases  Total      
 Passed  0           0          
 Failed  1           1          
 Total   1           1

After that, my cursor no longer blinks. Expected behavior is for interpreted and compiled modes to both print the summary, and for neither to kill my cursor blink.

Calling error "foo" instead of atomically retry gives correct behavior in compiled code, but the same cursor blink trouble. I don't know if the cursor blinking issue is related or not.

Comments and changes to this ticket

  • Max Bolingbroke

    Max Bolingbroke June 10th, 2009 @ 05:37 PM

    Thanks for the report.

    This really is a weird bug! I've gone through the code closing a number of loopholes in my handling of exceptions, but despite actually catching this exception perfectly, the thread still dies.

    What's more, when exactly it dies is nondeterministic. Here is the output of my version of test-framework augmented with tons of print statements to try and figure out what is going on:

    mbolingbroke@mb566 ~/Programming/Complete/test-framework/example
    $ cabal build && dist/build/test-framework-example/test-framework-example
    Preprocessing executables for test-framework-example-0.2.1...
    Building test-framework-example-0.2.1...
    Mehhh: # The example test case
    "Executing action" # Thread pool tells me it's doing something
    "ENTER" # Enter a bracket wrapped around testcase execution
    "Before" # Inside the bracket, about to perform test case
    "After" # Performed - exception presumably raised by now
    "Evaled" # After Exception.evaluate the result for good measure
    test-framework-example: thread blocked indefinitely # Printed by the RTS
    "Got ERROR" # Yep, the result from HUnit is Just (False, "thread blocked indefinitely") - as you would expect
    "thread blocked indefinitely"
    
    mbolingbroke@mb566 ~/Programming/Complete/test-framework/example
    $ dist/build/test-framework-example/test-framework-example
    Mehhh:
    "Executing action"
    "ENTER"
    "Before"
    "After"
    "Evaled"
    test-framework-example: thread blocked indefinitely
    "Got ERROR"
    "thread blocked indefinitely"
    "EXIT" # Doesn't happen in the other run - this is printed on LEAVING that bracket I mentioned above
    "really done" # Printed after the whole bracket is done, but before I Exception.evaluate the value which must be TestCaseError - but that must be in WHNF, so that can't be causing us to die
    

    Very very confusing.

  • Max Bolingbroke

    Max Bolingbroke June 10th, 2009 @ 05:48 PM

    Here is a minimal reproduction without using test-framework:

    import Test.HUnit.Lang
    import Control.Concurrent.STM
    
    import Control.Exception
    
    import Control.Concurrent
    import Control.Concurrent.MVar
    
    main = do
      mv <- newEmptyMVar
      forkIO $ do
            r <- performTestCase (atomically retry)
            print "Yeaahh!"
            print r
            evaluate r
            putMVar mv r
      print "Cool, let's see what we get"
      r <- takeMVar mv
      print r
      print "Am I printed?"
    
    mbolingbroke@mb566 ~/Junk
    $ ./Repro 
    "Cool, let's see what we get"
    "Yeaahh!"
    Just (False,"thread blocked indefinitely")
    Repro: thread blocked indefinitely
    

    Expected output would include "Am I printed?".

    I think this is an RTS bug, so I'm going to file it on the GHC Trac and see what they say:

    http://hackage.haskell.org/trac/ghc/ticket/3291

  • Max Bolingbroke

    Max Bolingbroke June 17th, 2009 @ 10:21 PM

    Hmm, tricky.

    I can make the test actually hang by adding:

    myThreadId >>= newStablePtr

    to the thread executing the test. This is because the GC can then no longer detect the indefinite block. Alternatively, if I add that code to the thread spawning the workers, that thread won't hang at that point, but the main thread will hang later: the worker thread will die and hence never send back the result of the action being run on the channel the main thread listens on.

    Soooo. Not sure what to do about this to make it robust. Really we need this blocking exception to be cancellable....
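
    A minimal sketch of the StablePtr trick described above, assuming only `base`. The helper name `keepingThreadReachable` is hypothetical and not part of test-framework. It holds a StablePtr to the current ThreadId while an action runs, so the GC treats the thread as reachable, and frees the pointer afterwards.

```haskell
import Control.Concurrent (myThreadId)
import Control.Exception (bracket)
import Foreign.StablePtr (freeStablePtr, newStablePtr)

-- | Hypothetical helper: run an action while a StablePtr to the current
-- ThreadId exists. The StablePtr acts as a GC root, so the thread is never
-- considered unreachable (and so is never sent a "blocked indefinitely"
-- exception) until the pointer is freed again.
keepingThreadReachable :: IO a -> IO a
keepingThreadReachable body =
  bracket (myThreadId >>= newStablePtr) freeStablePtr (const body)
```

    The trade-off is the one noted above: a thread protected this way that really does block forever will hang instead of receiving the exception.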

  • Echo Nolan

    Echo Nolan June 19th, 2009 @ 03:19 AM

    I've fixed this. Doing the stableptr trick in the thread spawning the workers works. The main thread doesn't hang later because the worker thread doesn't die, it just gets and handles the exception properly. The semantics of this are lame, however. But I had fun nonetheless.

  • Max Bolingbroke

    Max Bolingbroke June 19th, 2009 @ 03:39 AM

    That's kind of surprising. From talking to Simon Marlow today, only the exception handlers of the thread being killed are guaranteed to run, so I wouldn't have thought that the thread pool worker would be guaranteed to get a chance to write back a WorkerItem into the shared channel - hence my comment above.

    If it does appear to work, I need to find a new understanding of the RTS to understand this :)

    Thanks for the investigation!

  • Echo Nolan

    Echo Nolan June 19th, 2009 @ 05:29 AM

    Hi again.
    Didn't realize you were in Cambridge. Seems everyone I talk to about Haskell is either a doctor or will be one soon. Anyway, I think you're a bit confused. The thread pool worker is where the exception should be caught. In an ideal world, nowhere else would get one. The myThreadId >>= newStablePtr trick makes the current thread (in my patch the current thread = the worker spawner) reachable, and by extension makes any threads blocked on vars the current thread holds references to also reachable. So no new understanding of the RTS needed :) Unless I'm the one who's confused. Obviously, Simon is one of the people who knows the most about GHC's RTS, so if he contradicts me, never mind. But I think it doesn't matter whether the other threads' exception handlers would get called, because our stableptr trick makes the exception never get thrown.

    Regards, Echo.

  • Max Bolingbroke

    Max Bolingbroke June 19th, 2009 @ 11:44 AM

    Echo,

    The problem is that the write to the channel happens on the worker thread. So what normally happens is:

    Main thread:
    1) Creates channel for workers to communicate results on
    2) Spawns N worker threads
    3) Waits for M WorkerItems on the channel to correlate with the actions being run

    Worker threads:
    4) Run test
    5) Write result to channel as WorkerItem
    6) Loop until out of actions

    Main thread:
    7) Got all results! Done

    I completely agree that we can stop the main thread dying if we use the StablePtr trick. However, if we do this and then we get the exception on the worker thread at 4) then we never reach 5) and hence the main thread can never reach 7). Hence I think a correct fix would be to install an exception handler writing an appropriate WorkerItem to the channel (so the main thread can reach 7), and also cause the threadpool to respawn the dying worker (to prevent e.g. running out of threads sucking stuff from the pool).

    Alternatively, it might be cleaner to spawn ANOTHER thread within the worker threads just for the purpose of running the test, and do the StablePtr hack on the threads created by the threadpool.
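
    The first option above (a handler on the worker that always writes a result to the channel) can be sketched roughly as follows. This is an illustration under assumptions, not test-framework's actual code: the `WorkerItem` type here is a hypothetical stand-in, `worker`, `runPool` and `blockForever` are made-up names, and the spawning thread is kept reachable with the StablePtr trick.

```haskell
import Control.Concurrent (forkIO, myThreadId)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, modifyMVar, newEmptyMVar, newMVar, takeMVar)
import Control.Exception (SomeException, bracket, try)
import Control.Monad (replicateM, replicateM_)
import Foreign.StablePtr (freeStablePtr, newStablePtr)

-- | Hypothetical stand-in for a pool result: action index plus outcome.
data WorkerItem a = WorkerItem Int (Either String a)
  deriving (Eq, Show)

-- | Example action that blocks forever on an MVar nobody else can reach.
blockForever :: IO a
blockForever = newEmptyMVar >>= takeMVar

showException :: SomeException -> String
showException = show

-- | Steps 4-6: take an action, run it, always write a WorkerItem, loop.
worker :: MVar [(Int, IO a)] -> Chan (WorkerItem a) -> IO ()
worker queue results = do
  next <- modifyMVar queue $ \q -> return $ case q of
    []       -> ([], Nothing)
    (x : xs) -> (xs, Just x)
  case next of
    Nothing -> return ()                    -- out of actions
    Just (i, action) -> do
      r <- try action                       -- exceptions become a Left
      writeChan results (WorkerItem i (either (Left . showException) Right r))
      worker queue results

-- | Steps 1-3 and 7: create the channel, spawn n workers, collect one
-- WorkerItem per action. The StablePtr keeps this thread reachable.
runPool :: Int -> [IO a] -> IO [WorkerItem a]
runPool n actions =
  bracket (myThreadId >>= newStablePtr) freeStablePtr $ \_ -> do
    queue   <- newMVar (zip [0 ..] actions)
    results <- newChan
    replicateM_ n (forkIO (worker queue results))
    replicateM (length actions) (readChan results)
```

    For example, `runPool 2 [return 1, blockForever, return 3]` should come back with three items, the middle one being a `Left`. Because the worker catches the exception and keeps looping, this sketch does not need to respawn it.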

    Does that make sense, or am I talking nonsense?

  • Echo Nolan

    Echo Nolan June 19th, 2009 @ 05:52 PM

    I think you're talking nonsense :)

    Here's some output from a program that triggers the bug in question, with my above patch to test-framework installed:

    enolan at enolan-laptop in ~/code/whiteout [master*]
    $ ./dist/build/runTests/runTests
    Internal.BEncode:
      bencodeRoundTrip: [OK, passed 100 tests]
    Internal.Peer.Messages:
      handshakeRoundTrip: [OK, passed 100 tests]
      peerMsgRoundTrip: [OK, passed 100 tests]
    Network.Whiteout:
      verify:
        singleFileShouldSucceed: [Failed]
    ERROR: thread blocked indefinitely
        singleFileShouldFail: [Failed]
    ERROR: thread blocked indefinitely
        multiFileShouldSucceed: [Failed]
    ERROR: thread blocked indefinitely
        multiFileShouldFail: [Failed]
    ERROR: thread blocked indefinitely

             Properties  Test Cases  Total
     Passed  3           0           3
     Failed  0           4           4
     Total   3           4           7

    The BlockedIndefinitely doesn't necessarily kill the thread. If the exception handler on the stack in the worker thread catches it, it can return a pure value, and send it to the result channel.
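
    As a rough illustration of that point, here is a sketch assuming a recent `base`, where the exceptions are named `BlockedIndefinitelyOnMVar` and `BlockedIndefinitelyOnSTM`. The `Outcome` type and the names `runCatchingDeadlock` and `blockForever` are hypothetical and are not test-framework's or HUnit's API.

```haskell
import Control.Concurrent.MVar (newEmptyMVar, takeMVar)
import Control.Exception
  ( BlockedIndefinitelyOnMVar
  , BlockedIndefinitelyOnSTM
  , Handler (..)
  , catches
  )

-- | Hypothetical pure result of running a test.
data Outcome = Passed | Errored String
  deriving (Eq, Show)

-- | Example test that blocks forever on an MVar nobody else can reach.
blockForever :: IO ()
blockForever = newEmptyMVar >>= takeMVar

-- | Run a test. If the RTS throws a "blocked indefinitely" exception at
-- this thread, the handler turns it into a pure Outcome and the thread
-- carries on instead of dying.
runCatchingDeadlock :: IO () -> IO Outcome
runCatchingDeadlock test =
  (test >> return Passed) `catches`
    [ Handler (\e -> return (Errored (show (e :: BlockedIndefinitelyOnMVar))))
    , Handler (\e -> return (Errored (show (e :: BlockedIndefinitelyOnSTM))))
    ]
```

    A worker could then send the `Outcome` of `runCatchingDeadlock blockForever` to the result channel like any other result.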

  • Max Bolingbroke

    Max Bolingbroke June 28th, 2009 @ 03:14 PM

    • State changed from “new” to “resolved”

    OK, you've convinced me :-)

    I've applied your patch along with some other improvements to exception handling - the result is on Hackage as 0.2.4.

    Thanks a bunch!
