Hacker News new | past | comments | ask | show | jobs | submit login

With this type of thread management they're required to trap into the kernel to acquire a mutex. There are obviously severe performance issues with this which partly explains some of the blocking and performance issues in Androids garbage collector

Android runs on Linux. Linux uses Futexes (Fast Userspace Mutexes) for locking abstractions like semaphores and mutexes. From Wikipedia -

"A futex consists of a kernelspace wait queue that is attached to an aligned integer in userspace. Multiple processes or threads operate on the integer entirely in user space (using atomic operations to avoid interfering with one another), and only resort to relatively expensive system calls to request operations on the wait queue (for example to wake up waiting processes, or to put the current process on the wait queue). A properly programmed futex-based lock will not use system calls except when the lock is contended; since most operations do not require arbitration between processes, this will not happen in most cases."




Android's garbage collection is handled by Dalvik.


Huh?

thread management they're required to trap into the kernel to acquire a mutex.

Of course GC is handled by the VM. That's not the point here though - the point I thought you made was that the GC uses mutexes and they're required to trap into the kernel. (At least that's how it reads above) That's not the case on Linux. GC uses mutexes which are implemented as futexes[1] that stay in user space for most of the time.

[1] http://www.mail-archive.com/uclibc@uclibc.org/msg02787.html


HeapWorker.c uses pthreads. doing a quick check on one of the functions dvmLockMutex (in Thread.h) uses pthread_mutex_lock.

// Grab a plain mutex.

INLINE void dvmLockMutex(pthread_mutex_t pMutex) {

    int cc __attribute__ ((__unused__)) = 
pthread_mutex_lock(pMutex);

    assert(cc == 0);
}

and here's pthread_mutex_lock.

int pthread_mutex_lock(pthread_mutex_t mutex) { if (mutex->kind == PTHREAD_MUTEX_NORMAL) { if (atomic_exchange(&mutex->lock, 1) != 0) { while (atomic_exchange(&mutex->lock, -1) != 0) {

        if (wait(mutex->event, INFINITE) != 0) return EINVAL;

      }

    }

  }
  else

  {

    pthread_t self = pthread_self();

    if (atomic_exchange(&mutex->lock, 1) == 0)
    {
      mutex->recursion = 1;

      mutex->owner = self;

    }

    else

    {

      if (pthread_equal(mutex->owner, self))

      {
        if (mutex->kind == PTHREAD_MUTEX_RECURSIVE)
          mutex->recursion++;
        else
          return EDEADLK;
      }
      else
      {
        while (atomic_exchange(&mutex->lock, -1) != 0)
        {
          if (wait(mutex->event, INFINITE) != 0) return EINVAL;
          mutex->recursion = 1;
          mutex->owner = self;
        }
      }
    }
  }

  return 0;
}

and here's an excerpt of dalvik/vm/Thread.c

Notes on Threading

All threads are native pthreads. All threads, except the JDWP debugger thread, are visible to code running in the VM and to the debugger. (We don't want the debugger to try to manipulate the thread that listens for instructions from the debugger.) Internal VM threads are in the "system" ThreadGroup, all others are in the "main" ThreadGroup, per convention.

The GC only runs when all threads have been suspended.

...


Regarding the pthread_mutex_lock code you posted: I'd say this shows that in the non-contention case there is no kernel call involved, assuming the used mutexes are of kind PTHREAD_MUTEX_NORMAL; atomic_exchange is likely entirely in user space[1] and thus only if the mutex is already locked will pthread_mutex_lock reach "wait" (wait_event ?) which is probably the kernel call; i.e. those are futexes.

Although I don't understand how the memory management makes use of mutexes (once a (any?) thread realizes that the GC needs to be run, how does it wait until all threads have reached a safe point?), and I don't have time to check ATM.

[1] http://www.insomniacgames.com/tech/articles/0807/files/multi... mentions the pthread_mutex_lock and unlock code including an implementation of atomic_exchange


It is somewhat difficult to read this code (due to the formatting), but this seems like a futex to me: if the lock is not being contended, the atomic_exchange() will take the lock without entering kernel space. If you want to argue with blinkingled, you should look at whether this specific lock is under contention or not, not paste large blocks of code that just demonstrate and prove his argument. ;P


yeah pthreads is implemented using futex. I don't care about the implementation details. It doesn't matter since it's still lock-based (albeit more efficient due to fewer system calls). My original point is, threading is still lock-based. Replacing your lock-based code with queues eliminates many of the penalties associated with locks and also simplifies your remaining code. Instead of using a lock to protect a shared resource, you can instead create a queue to serialize the tasks that access that resource. Queues do not impose the same penalties as locks. For example, queueing a task does not require trapping into the kernel to acquire a mutex (or however mutex is implemented)


Acquiring a mutex on Linux involves reading an integer from memory.

Also, the way a queue prevents two consumers from popping off the same element is the same way a mutex is implemented.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: