Android消息机制浅析

本文主要介绍Android的消息机制，而了解消息机制必先了解Handler，Looper，MessageQueue。

Android的消息机制主要是Handler的运行原理，弄清楚了Handler是如何运行的也就大致清除了Android的消息机制是如何运行的。而Handler内部中又紧紧包含了Looper和MessageQueue，因此我们可以根据Handler源码依次顺藤摸瓜，弄清Looper和MessageQueue原理，从而明白消息机制。但我们需要先了解Handler，Looper，MessageQueue，Message是什么。

问题
Handler
Looper
MessageQueue
消息传递流程
Native层中的消息机制
解决

问题

Q1:Handler发消息给子线程，Looper如何启动？
Q2:为什么在子线程创建Handler会抛异常？
Q3:Handler为什么loop是死循环？
Q4:解释下在单线程模型中Message、Handler、MessageQueue、Looper之间的关系。
Q5:Handler的作用有哪些？

Handler

为了了解Handler，我们先来看一下Handler的初始化函数做了哪些操作。

    public Handler(Callback callback, boolean async) {
        ......
        mLooper = Looper.myLooper();
        if (mLooper == null) {
            throw new RuntimeException(
                "Can't create handler inside thread that has not called Looper.prepare()"); // 说明了如果Handler在没有Looper的线程中创建则会抛出异常
        }
        mQueue = mLooper.mQueue;
        mCallback = callback;
        mAsynchronous = async;
    }

可以看到Handler中存储了当前线程的Looper，同时可以看到如果该线程中如果没有Looper的话，创建Handler的时候便会抛出异常。Handler中除了Looper，还存储了Looper中的消息队列。

Looper

在Handler的构造函数中我们看到了获取了Looper对象存入其中，那我们先来看下myLooper是怎么获取的。

    public static @Nullable Looper myLooper() {
        return sThreadLocal.get();
    }

此处的sThreadLocal 是一个ThreadLocal对象，存储着当前线程的Looper。ThreadLocal是一个在线程之间存储数据的类。内部采用了Map映射的方法。<key(当前ThreadLocal对象本身), Value(存储的数据)>

MessageQueue

消息队列其实并不是采用数据结构中的队列来实现的，采用了一个单链表的形式来存储消息的。MessageQueue中有一个Message类型的局部变量mMessage，是一个单链表的形式来存储消息的。

消息传递流程

Handler 有两种方式可以向MessageQueue 发送消息，一种是post ，另外一种是sendMessage。但其实post 实际上也是通过sendMessage来发送消息的。我们可以来看下源码。

    public final boolean post(Runnable r)
    {
       return  sendMessageDelayed(getPostMessage(r), 0);
    }
    
    private static Message getPostMessage(Runnable r) {
        Message m = Message.obtain();
        m.callback = r;
        return m;
    }

我们可以看到Runnable 通过getPostMessage来将Runnable封装成了一个Message 然后再发送出去。我们再来看下sendMessage。

    public final boolean sendMessage(Message msg)
    {
        return sendMessageDelayed(msg, 0);
    }

sendMessage直接将message 发送出去，而post 我们通过源码查看，发现其本质也是和sendMessage一样发送Message。所以我们可以再来看一下Handler 如何发送消息。

    public boolean sendMessageAtTime(Message msg, long uptimeMillis) {
        MessageQueue queue = mQueue;
        if (queue == null) {
            RuntimeException e = new RuntimeException(
                    this + " sendMessageAtTime() called with no mQueue");
            Log.w("Looper", e.getMessage(), e);
            return false;
        }
        return enqueueMessage(queue, msg, uptimeMillis);
    }

Handler 最终调用sendMessageAtTime来发送消息的，我们可以看到其实Handler执行了enqueueMessage函数，我们再来看下这个函数。

    private boolean enqueueMessage(MessageQueue queue, Message msg, long uptimeMillis) {
        msg.target = this;
        if (mAsynchronous) {
            msg.setAsynchronous(true);
        }
        return queue.enqueueMessage(msg, uptimeMillis);
    }

这个函数把Handler和Message中的target绑定在一起了，然后将message插入了消息队列中。我们紧接着到MessageQueue文件中看下enqueueMessage代码。

    boolean enqueueMessage(Message msg, long when) {
        ......
        synchronized (this) {
            ......
            Message p = mMessages;
            boolean needWake;
            if (p == null || when == 0 || when < p.when) {
                // 消息队列中没有消息了，则把插入的消息插入头部。
                // New head, wake up the event queue if blocked.
                msg.next = p;
                mMessages = msg;
                needWake = mBlocked;
            } else {
                // 如果消息队列中还有消息，那么将从中间插入
                // 插入顺序按照消息延迟时间在消息队列中依次排序。
                // Inserted within the middle of the queue.  Usually we don't have to wake
                // up the event queue unless there is a barrier at the head of the queue
                // and the message is the earliest asynchronous message in the queue.
                needWake = mBlocked && p.target == null && msg.isAsynchronous();
                Message prev;
                for (;;) {
                    prev = p;
                    p = p.next;
                   //如果消息执行时间晚于当前消息，则把链表节点向后移再继续判断
                    if (p == null || when < p.when) {
                        break;
                    }
                    if (needWake && p.isAsynchronous()) {
                        needWake = false;
                    }
                }
                msg.next = p; // invariant: p == prev.next
                prev.next = msg;
            }

            // We can assume mPtr != 0 because mQuitting is false.
            if (needWake) {
                // 唤醒消息队列
                nativeWake(mPtr);
            }
        }
        return true;
    }

在if else 中的判断中，if 中如果消息队列中没有消息了，那么将我们插入的消息作为消息头，并且需要去唤醒Looper。在else 中则是将msg按执行时间顺序插入到消息队列中。（因为本人算法不好，所以是用笔写写来理清的else中的逻辑）enqueueMessage执行完之后消息便插入MessageQueue中了。知道了消息队列如何存消息的，所以接下来我们需要看一下消息队列是如何取数据的。如何来看一下MessageQueue的next()方法。

Message next() {
        ......
        
        for (;;) {
            if (nextPollTimeoutMillis != 0) {
                Binder.flushPendingCommands();
            }
    
            // 阻塞线程nextPollTimeoutMillis时间
            nativePollOnce(ptr, nextPollTimeoutMillis);

            synchronized (this) {
                // Try to retrieve the next message.  Return if found.
                final long now = SystemClock.uptimeMillis();
                Message prevMsg = null;
                Message msg = mMessages;
                if (msg != null && msg.target == null) {
                    // Stalled by a barrier.  Find the next asynchronous message in the queue.
                    do {
                        prevMsg = msg;
                        msg = msg.next;
                    } while (msg != null && !msg.isAsynchronous());
                }
                if (msg != null) {
                    if (now < msg.when) {
                        // 消息晚于当前时间时，计算出堵塞时间。
                        // Next message is not ready.  Set a timeout to wake up when it is ready.
                        nextPollTimeoutMillis = (int) Math.min(msg.when - now, Integer.MAX_VALUE);
                    } else {
                        // 从头部拿到消息
                        // Got a message.
                        mBlocked = false;
                        if (prevMsg != null) {
                            prevMsg.next = msg.next;
                        } else {
                            mMessages = msg.next;
                        }
                        msg.next = null;
                        if (DEBUG) Log.v(TAG, "Returning message: " + msg);
                        msg.markInUse();
                        // 拿到消息之后才返回
                        return msg;
                    }
                } else {
                    // No more messages.
                    nextPollTimeoutMillis = -1;
                }

                // Process the quit message now that all pending messages have been handled.
                if (mQuitting) {
                    dispose();
                    return null;
                }
                ......
        }
    }

nativePollOnce函数会堵塞nextPollTimeoutMillis时间的线程，然后唤醒把消息从消息队列头部取出返回。释放CPU资源进入休眠状态，直到下个消息到达或者有事务发生(设置的nextPollTimeoutMillis时间到了)时，才会通过pipe管道写入数据来唤醒主线程工作。这里涉及到的是Linux的pipe/epoll机制，epoll机制是一种I/O多路复用机制，可以同时监听多个描述符，当某个描述符就绪时，则立刻通知相应程序来进行读或写操作，本质同步I/O，即读写是堵塞的。

然后我们要看一下Looper中loop是把从MessageQueue中取出消息来处理的。

    public static void loop() {
        final Looper me = myLooper();
        if (me == null) {
            throw new RuntimeException("No Looper; Looper.prepare() wasn't called on this thread.");
        }
        final MessageQueue queue = me.mQueue;
        ......

        for (;;) {
            // 把消息从MessageQueue取出
            Message msg = queue.next(); // might block
            if (msg == null) {
                // No message indicates that the message queue is quitting.
                return;
            }
            ......
            try {
                // 把消息给发送它的Handler调用dispatchMessage去处理。
                msg.target.dispatchMessage(msg);
                end = (slowDispatchThresholdMs == 0) ? 0 : SystemClock.uptimeMillis();
            } finally {
                if (traceTag != 0) {
                    Trace.traceEnd(traceTag);
                }
            }
            ......
            // 消息使用完之后回收
            msg.recycleUnchecked();
        }
    }

loop中从当前Looper 的消息队列中不断拿消息，然后将消息丢给msg.target的diapatchMessage来处理，而之前Handler 的enqueueMessage中有介绍过，Handler 把数据给消息队列之前将自己存入了Message 的 target 中，所以此处是把消息给发送它的Handler来处理，这样消息的处理就转移到了创建Handler的线程中来了。（而我们的Handler一般都是通过getMainLooper来创立的，即主线程的Handler，即将消息传入到主线程来处理了）

接着，我们再来看下Handler 是如何处理消息的。

    public void dispatchMessage(Message msg) {
        if (msg.callback != null) {
            handleCallback(msg);
        } else {
            if (mCallback != null) {
                if (mCallback.handleMessage(msg)) {
                    return;
                }
            }
            handleMessage(msg);
        }
    }
    
    private static void handleCallback(Message message) {
        message.callback.run();
    }

首先看消息中的callback是不是空，而我们有说过Handler post Runnable的时候我们有通过getPostMessage把Runnable封装成消息，封装的时候就将Runnable存入了消息的callback中，所以此处消息中的callback就是我们之前post的Runnable对象，在handlerCallback中我们看到将Runnable运行在了主线程。另外一种情况，如果我们是直接发送消息的那么我们就会调用mCallback.handlerMessage来处理消息，而handleMessage为我们创建Handler的时候重写的函数，这样也将消息在主线程中处理了。

Native层中的消息机制

我们已经从framework层面分析了Android的消息机制，现在我们尝试深入一些从Native层中分析之前我们提到过的一些问题。例如MessageQueue中如何通过nativePollOnce()来阻塞线程以及如何通过nativeWake()来唤醒线程。但了解这些之前，我们应该首先了解Native中是如何创建消息机制的。

Native中也存在MessageQueue和Looper，我们先看一下在Java层中MessageQueue的构造函数。

    MessageQueue(boolean quitAllowed) {
        mQuitAllowed = quitAllowed;
        mPtr = nativeInit();
    }

可以看到，通过调用nativeInit()来完成Native层中消息机制的创建，并且返回一个long值。我们来看下这个native方法。这个方法在android_os_MessageQueue.cpp中。

static jlong android_os_MessageQueue_nativeInit(JNIEnv* env, jclass clazz) {
    NativeMessageQueue* nativeMessageQueue = new NativeMessageQueue();
    if (!nativeMessageQueue) {
        jniThrowRuntimeException(env, "Unable to allocate native queue");
        return 0;
    }

    nativeMessageQueue->incStrong(env);
    return reinterpret_cast<jlong>(nativeMessageQueue);
}

native层中直接new了一个NativeMessageQueue对象赋值给一个指针（不是很熟悉C++，可能描述的不太准确，请指教），然后通过reinterpret_case将这个指针转化为相对偏移量long值，返回给java层。之后java层可通过mPtr这个偏移量在native中直接找到这个对象指针。而我们再来看下new NativeMessageQueue()这个构造函数做了些什么吧。

NativeMessageQueue::NativeMessageQueue() :
        mPollEnv(NULL), mPollObj(NULL), mExceptionObj(NULL) {
    mLooper = Looper::getForThread();
    if (mLooper == NULL) {
        mLooper = new Looper(false);
        Looper::setForThread(mLooper);
    }
}

可以看到，里面新建了一个Looper,并将它与当前线程绑定。Looper的创建过程中主要是调用了rebuildEpollLocked()来使用Linux中的epoll机制的。

void Looper::rebuildEpollLocked() {
    // Close old epoll instance if we have one.
    if (mEpollFd >= 0) {
#if DEBUG_CALLBACKS
        ALOGD("%p ~ rebuildEpollLocked - rebuilding epoll set", this);
#endif
        mEpollFd.reset();
    }

    // Allocate the new epoll instance and register the wake pipe.
    // 1
    mEpollFd.reset(epoll_create1(EPOLL_CLOEXEC));
    LOG_ALWAYS_FATAL_IF(mEpollFd < 0, "Could not create epoll instance: %s", strerror(errno));

    struct epoll_event eventItem;
    memset(& eventItem, 0, sizeof(epoll_event)); // zero out unused members of data field union
    eventItem.events = EPOLLIN;
    eventItem.data.fd = mWakeEventFd.get();
    // 2
    int result = epoll_ctl(mEpollFd.get(), EPOLL_CTL_ADD, mWakeEventFd.get(), &eventItem);
    LOG_ALWAYS_FATAL_IF(result != 0, "Could not add wake event fd to epoll instance: %s",
                        strerror(errno));

    for (size_t i = 0; i < mRequests.size(); i++) {
        const Request& request = mRequests.valueAt(i);
        struct epoll_event eventItem;
        request.initEventItem(&eventItem);

        int epollResult = epoll_ctl(mEpollFd.get(), EPOLL_CTL_ADD, request.fd, &eventItem);
        if (epollResult < 0) {
            ALOGE("Error adding epoll events for fd %d while rebuilding epoll set: %s",
                  request.fd, strerror(errno));
        }
    }
}

注释1处，使用Linux系统函数epoll_create1()来创建epoll实例并返回新创建的文件描述符。注释2中使用epoll_ctl对事件进行监听。当文件描述符写入事件的时候就会唤醒线程。至此完成了Native层中的消息机制的初始化，所以Android的消息机制初始化是先在Java中创建Looper，然后Java中的Looper创建的过程中又会创建Java中的MessageQueue，Java中的MessageQueue创建时会调用native方法nativeInit()创建native中的MessageQueue,native中的MessageQueue又会创建Native中的Looper,在Looper中使用Linux的epoll机制来监听事件。

接着我们知道了native中的消息机制的创建后再来看下，Android中是如何在没有消息时阻塞线程以及在消息来临时唤醒线程。首先我们来回顾一下Looper的loop()方法的其中一段。

    public static void loop() {
        final Looper me = myLooper();
        if (me == null) {
            throw new RuntimeException("No Looper; Looper.prepare() wasn't called on this thread.");
        }
        final MessageQueue queue = me.mQueue;

        // Make sure the identity of this thread is that of the local process,
        // and keep track of what that identity token actually is.
        Binder.clearCallingIdentity();
        final long ident = Binder.clearCallingIdentity();

        for (;;) {
            // 1
            Message msg = queue.next(); // might block
            if (msg == null) {
                // No message indicates that the message queue is quitting.
                return;
            }
            ...
        }
    }

在注释1处，我们可以看到调用了next()方法来从消息队列中取消息，再看一下右边的注释说可能阻塞。那我们就再来看下这个next()函数。

    Message next() {
        // Return here if the message loop has already quit and been disposed.
        // This can happen if the application tries to restart a looper after quit
        // which is not supported.
        final long ptr = mPtr;
        if (ptr == 0) {
            return null;
        }

        int pendingIdleHandlerCount = -1; // -1 only during first iteration
        int nextPollTimeoutMillis = 0;
        for (;;) {
            if (nextPollTimeoutMillis != 0) {
                Binder.flushPendingCommands();
            }

            // 1
            nativePollOnce(ptr, nextPollTimeoutMillis);
            //...
        }
    }

next()中调用了注释1中的nativePollOnce(ptr, nextPollTimeoutMillis)函数，这个函数就是阻塞线程的函数，其中ptr参数我们之前在MessageQueue的初始化中知道，native中可以通过这个偏移值long值找到native中的MessageQueue指针。而nextPollTimeoutMillis则是阻塞线程的时间。我们同样到android_os_MessageQueue.cpp中来看下。

static void android_os_MessageQueue_nativePollOnce(JNIEnv* env, jobject obj,
        jlong ptr, jint timeoutMillis) {
    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    nativeMessageQueue->pollOnce(env, obj, timeoutMillis);
}

其中调用了pollOnce()函数。该方法中调用了Looper中的pollOnce(timeoutMillis)函数。最终调用了Looper中的pollInner(timeoutMillis)函数。

int Looper::pollInner(int timeoutMillis) {
#if DEBUG_POLL_AND_WAKE
    ALOGD("%p ~ pollOnce - waiting: timeoutMillis=%d", this, timeoutMillis);
#endif
    //......

    // We are about to idle.
    mPolling = true;

    struct epoll_event eventItems[EPOLL_MAX_EVENTS];
    int eventCount = epoll_wait(mEpollFd.get(), eventItems, EPOLL_MAX_EVENTS, timeoutMillis);
    
    //......
    return result;
}

这个函数很长，我们只看关键的这一部分。调用了系统函数epoll_wait来阻塞线程。当timeoutMillis大于0时，将阻塞至多timeoutMillis毫秒，直到文件描述符上有事件发现，或者直到捕获到一个信号为止。

参数 timeout 用来确定 epoll_wait()的阻塞行为，有如下几种。

如果 timeout 等于-1，调用将一直阻塞，直到兴趣列表中的文件描述符上有事件产生，或者直到捕获到一个信号为止。
如果 timeout 等于0，执行一次非阻塞式的检查，看兴趣列表中的文件描述符上产生了哪个事件。
如果 timeout 大于0，调用将阻塞至多 timeout 毫秒，直到文件描述符上有事件发生，或者直到捕获到一个信号为止。

此时线程进入了阻塞状态，Java层中的消息队列目前没有需要即时处理的消息。线程唤醒会有两个条件，第一个是线程阻塞了timeoutMillis毫秒之后唤醒，因为消息队列中最早的一条消息在timeoutMillis之后需要处理了。第二个就是消息队列来了新消息，也就是说MessageQueue中调用了enqueueMessage(message, when)插入消息。同样，我们重新看一下这个函数。

    boolean enqueueMessage(Message msg, long when) {
        if (msg.target == null) {
            throw new IllegalArgumentException("Message must have a target.");
        }
        if (msg.isInUse()) {
            throw new IllegalStateException(msg + " This message is already in use.");
        }

        synchronized (this) {
            //......

            // We can assume mPtr != 0 because mQuitting is false.
            if (needWake) {
                nativeWake(mPtr);
            }
        }
        return true;
    }

消息插入消息队列的过程省略了，我们看下关键的这个nativeWake(mPtr)函数。由于线程处于阻塞状态，所以needWake为true，调用该方法。此方法依旧在android_os_MessageQueue.cpp中。nativeWake(mPtr)最终调用的是native的Looper中的wake()方法。

void Looper::wake() {
#if DEBUG_POLL_AND_WAKE
    ALOGD("%p ~ wake", this);
#endif

    uint64_t inc = 1;
    // 1
    ssize_t nWrite = TEMP_FAILURE_RETRY(write(mWakeEventFd.get(), &inc, sizeof(uint64_t)));
    if (nWrite != sizeof(uint64_t)) {
        if (errno != EAGAIN) {
            LOG_ALWAYS_FATAL("Could not write wake signal to fd %d (returned %zd): %s",
                             mWakeEventFd.get(), nWrite, strerror(errno));
        }
    }
}

注释1处调用了系统函数write()来对文件描述符写入文件。同时TEMP_FAILURE_RETRY宏定义失败的话则重试。这里对文件描述符写入文件用来唤醒线程，因为之前采用了epoll_ctl进行了监听。所以线程唤醒之后，阻塞在Java层中的Looper中的loop()函数又可以继续执行，处理消息了。

至此，我们已经从Handler发送消息到MessageQueue，然后Looper从MessageQueue中取消息进行处理的全部流程都梳理了一遍，这也就是Android消息机制的运行主线，同时我们也了解了Native中的消息机制以及采用Linuex中的epoll机制来对线程进行阻塞唤醒来节省CPU资源。

接下来我们可以解决我们开头提的一些问题了。

解决

Q1:Handler发消息给子线程，Looper如何启动？
A1:Looper通过Looper中的静态方法loop()进行消息处理无线循环。会不断地从消息队列中取信息，当获取不到消息时在消息队列中会调用nativePollOnce阻塞当前线程，直到可以拿到消息时唤醒线程，将消息取出给Handler处理。
Q2:为什么在子线程创建Handler会抛异常？

A2:从Handler构造函数中可以得到答案。如下

      mLooper = Looper.myLooper();
      if (mLooper == null) {
          throw new RuntimeException(
              "Can't create handler inside thread " + Thread.currentThread()
                      + " that has not called Looper.prepare()");
      }

子线程创建Handler时会先拿当前线程的Looper，而子线程中没有调用Looper.prepare()来创造Looper，故会抛出异常。
Q3:Handler为什么loop的死循环不会产生ANR？
A3:首先需要先了解ANR产生的原因。在Android中如果主线程（UI线程）进行耗时操作会引发ANR异常，产生ANR异常原因一般有两种：
1.当前的事件没有机会得到处理。（主线程正在处理上一个事件，没有及时或者looper被某种原因阻塞了）
2.当前事件正在处理，但没有及时完成。
例如若onCreate()中进行了耗时操作，导致点击、触摸等不响应，就会产生ANR。为了避免ANR异常，需要在子线程中进行耗时操作，需要更新UI时，发送消息给主线程让主线程进行操作。

主线程中的Looper的loop中的死循环一直轮询向消息队列中拿消息，那死循环是不是会产生ANR呢？可以先看一下主线程ActivityThread中的运行代码

public static void main(String[] args) {
  Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER, "ActivityThreadMain");
  ...
  Looper.loop();
 
  throw new RuntimeException("Main thread loop unexpectedly exited");
}

如果主线程的main方法中不进行loop的死循环，那应用运行完毕了就直接结束关闭了。所以ActivityThread的main()方法就是做消息损坏，一旦退出消息循环，那么应用也就退出了。Android是由事件驱动的，Looper.loop()不断接收消息队列中的事件处理事件。每一个点击触摸和Activity的生命周期都是在loop()控制之下，如果它停止了，应用就结束了。只能是loop()内的某一个消息或者对消息的处理阻塞了loop(),而不是loop()阻塞了消息。也就是说在loop()内的Activity在5s之内没有响应事件，这时就会产生ANR。
Q4:解释下在单线程模型中Message、Handler、MessageQueue、Looper之间的关系。
A4:1.Handler
发送消息到消息队列，并且执行从消息队列中取出的消息。
2.Looper
每个线程只有一个Looper，并且Looper中包含了消息队列，Looper中会进行loop()消息循环，不断地从消息队列中取出消息来给Handler进行处理。
3.MessageQueue
消息队列用来存放Handler发送过来的消息，一个消息队列中可包含多个消息Message。当Looper创建的时候会自动创建消息队列存储在Looper中。
4.Message
消息对象。建议通过Message.obtain()从对象池中取出数据，并且在使用完消息之后会通过recycleUnchecked进行将消息回收到对象池中。
Q5:Handler的作用有哪些？
A5:1.用于线程中通信，在子线程中处理好了耗时操作之后通知主线程进行UI的更新。
2.可用过延时。推送未来某个时间点将要执行的Runnable或Message到消息队列中去。

Blog