Squashing bugs in multithreaded Android code with CheckThread

Writing correct multithreaded code is difficult, and writing Android apps is no exception. Like many mobile platforms, Android’s UI framework is single threaded and requires the application developer to manage threads with no thread-safe guarantee. If your app is more complicated than “Hello, World!” you can’t escape writing multithreaded code. For example, to build a smooth and responsive UI, you will have to move long running operations like network and disk IO to background threads and then return to the UI thread to update the UI.

Thankfully, Android provides some tools to make this easier such as the AsyncTask utility class and the StrictMode API. You can also use good software development practices such as adhering to strict code style and requiring careful code review of code that involve the UI thread. Unfortunately, these approaches require diligence, are prone to human error, or only catch errors at runtime.

CheckThread for Android

CheckThread is an open source project authored by Joe Conti that provides annotations and a simple static analysis tool for verifying certain contracts in multithreaded programs. It’s not a brand new project and it’s not very difficult to use, but it hasn’t had a very high adoption for Android apps. It offers an automated alternative to exclusively using comments and code review to ensuring no bugs related to the UI thread are introduced in your code. The annotations provided by CheckThread are: @ThreadSafe, @NotThreadSafe, @ThreadConfined

ThreadSafe and NotThreadSafe are described in Java Concurrency in Practice, and CheckThread enforces the same semantics that book defines. For the purposes of this blog post, the only annotation that we’ll be using is ThreadConfined.

Thread confinement is a general property of restricting data or code to access from only a single thread. A data structure confined to the stack is inherently thread confined. A method that is only ever called by a single thread is also thread confined. In Android, updates to the UI must be confined to the UI thread. In very concrete terms, this implies that any method that mutates the state of a View should only be called from the UI thread. If this policy is violated, the Android framework may throw a RuntimeException, but also may simply produce undefined behavior, depending on the specific nature of the update to the UI.

CheckThread supports defining thread policies in XML files, so while it would be possible, it’s not necessary to download the source of the Android framework code and manually add annotations to it. Instead, we can simply define a general thread policy to apply to Android framework classes.

Time for an Example

The following example demonstrates how to declare a thread policy in XML, annotate a simple program and run the CheckThread analyzer to catch a couple of bugs.

CheckThread’s XML syntax supports patterns and wildcards which allows you to concisely define policies for Android framework code. In this example we define two Android specific policies. In general this file would contain more definitions for other Android framework classes and could also contain definitions for your own code.

The first policy declares that all methods in Activity and its subclasses that begin with the prefix “on” should be confined to the main thread. Since CheckThread has no built-in concept of the Android framework or of the Activity class we need to inform the static analyzer which thread will call these methods.

The second policy declares that all methods in classes ending with “View” should be confined to the main thread. The intention is to prevent calling any code that modifies that UI from any other thread than the UI thread. This is a little bit conservative since there are some methods in Android View classes that have no side-effects, but it will do for now.

https://gist.github.com/4113656

Having defined the thread policy, we can turn to our application code. The example app is the rough beginnings of a Hacker News app. It fetches the RSS feed for the front page, parses the titles and displays them in a LinearLayout.

This first version is naive; it does network IO and XML parsing in Activity.onCreate. This error will definitely be caught by StrictMode, and will likely just crash the app on launch, so it would be caught early in development, but it would be even better if it were caught before the app was even run.

https://gist.github.com/4113662

In this code, we make a call to the static method getHttp in the IO utility class. The details of this class and method are not important, but since it does network IO, it should be called from off the UI thread. We’ve annotated the entire class as follows:

https://gist.github.com/4113669

This annotation tells CheckThread that all the methods in this class should be called from the “other” thread.

Finally, we’re ready to run the static analyzer. CheckThread provides several ways to run the analysis tool, including Eclipse and Intellij plugins, but we’ll just use the Ant tasks on the command line. This is what CheckThread outputs after we run the analyzer:

https://gist.github.com/4113676

As you can see, CheckThread reports an error: we’re calling a method that should be confined to the “other” thread from a method that’s confined to “MAIN”. One solution to this problem is to start a new thread to do network IO on. We create an anonymous subclass of java.util.Thread and override run, inside of which we fetch the RSS feed, parse it and update the UI.

https://gist.github.com/4113683

We’ve annotated the run method to be confined to the “other” thread. CheckThread will use this to validate the call to IO.getHttp. After running the analyzer again, we discover that there’s a new error reported:

https://gist.github.com/4113686

This time, the error is caused by calling the method setText on a TextView from a different thread than the UI thread. This error is generated by the combination of our thread policy defined in XML and the annotation on the run method.

Instead, we could call the Activity.runOnUiThread with a new Runnable. The Runnable’s run method is annotated to be confined to the UI thread, since we’re passing it to a framework method that will call it from the UI thread.

https://gist.github.com/4113689

Finally, CheckThread reports no errors to us. Of course that doesn’t mean that the code is bug free, static analysis of any kind has limits. We’ve just gotten some small assurance that the contracts defined in the XML policy and annotations will be held. While this example is simple, and the code we’re analyzing would be greatly simplified by using an AsyncTask, it does demonstrate the class of errors that CheckThread is designed to catch. The complete sample project is published on Github.

The Pros and Cons of Annotations

One drawback that is probably immediately obvious is the need to add annotations to a lot of your code. Specifically, CheckThread’s static analysis is relatively simple, and doesn’t construct a complete call graph of your code. This means that the analyzer will not generate a warning for the code below:

https://gist.github.com/4113695

While this may appear to be a significant problem, it’s my opinion that in practice it is not actually a deal breaker. Java already requires that the programmer write most types in code. This is seen by some as a drawback of Java (and is often cited incorrectly as a drawback of static typing in general). However there are real advantages to annotating code with type signatures, and even proponents of languages with powerful type inference will admit this, since it’s usually recommended to write the type of “top-level” or publicly exported functions even if the compiler can infer the type without any hint.

The annotations that CheckThread uses are similar; they describe an important part of a method’s contract, that is whether it is thread safe or should be confined to a specific thread. Requiring the programmer to write those annotations elevates the challenge of writing correct multithreaded code to be at the forefront of the programmer’s mind, requiring that some thought be put into each method’s contract. The use of automated static analysis makes it less likely that a comment will become stale and describe a method as thread safe when it is not.

The Future of Static Analysis

The good news is that the future of static analysis tools designed to catch multithreaded bugs is looking very bright. A recent paper published by Sai Zhang, Hao Lü, and Michael D. Ernst at the University of Washington describes a more powerful approach to analyzing multithreaded GUI programs. Their work targets Android applications as well as Java programs written using other GUI frameworks. The analyzer described in their paper specifically does construct a complete call graph of the program being analyzed. In addition, it doesn’t require any annotations by the programmer and also addresses the use of reflection in building the call graph, which Android specifically uses to inflate layouts from XML. This work was published only this past summer, and the tool itself is underdocumented at the moment, but I recommend that anyone interested in this area read the paper which outlines their work quite clearly.

 

Standard

Write Once, Compile Twice, Run Anywhere

Many Java developers use a development environment different from the target deployment environment.  At Flurry, we have developers running OS X, Windows, and Linux, and everyone is able to contribute without thinking much about the differences of their particular operating system, or the fact that the code will be deployed on a different OS.

The secret behind this flexibility is how developed the Java toolchain has become. One tool (Eclipse)  in particular has Eclipsed the rest and become the dominant IDE for Java developers. Eclipse is free, with integrations like JUnit support, and a host of really great plugins making it the de facto standard in Java development, displacing IntelliJ and other options.  In fact, entry level developers rarely even think about the compilation step, because Eclipse’s autocompilation keeps your code compiled every time you save a file.

There’s Always a Catch

Unfortunately no technology is magical and while this set up rarely causes issues, it can. One interesting case arises when the developer is using the Eclipse 1.6 compiler compliance and the target environment uses Sun’s 1.6 JDK compiler.  For example at Flurry, during development we rely on Eclipse’s JDT Compiler, but the code we ship gets compiled for deployment on a Jenkins server by Ant using Sun’s JDK compiler. Note that both the developer and continuous integration environment are building for Java 6, but using different compilers. 

Until recently this never came up as an issue as the Eclipse and Sun compilers, even when running on different operating systems, behave almost identically.  However, we have been running into some interesting (and frustrating) compiler issues that are essentially changing “Write Once, Run Anywhere” into “Write Once, Compile Using Your Target JDK, Run Anywhere.”  We have valid 1.6 Java code using generics, which compiles fine under Eclipse, but won’t compile using Sun’s javac.

Let’s See an Example

An example of the code in question is below. Note that it meets the Java specification and should be a valid program. In fact, in Eclipse using Java 1.6 compiler compliance the code compiles, but won’t compile using Sun’s 1.6 JDK javac.

https://gist.github.com/4035987

Compiling this code using javac in the Sun 1.6 JDK gives this compiler error:

https://gist.github.com/4036089

“Write Once, Run Anywhere” never made any promises about using different compilers, but the fact that our toolchain was using a different compiler than our build server never bore much thought until now.

Possible Solutions

The obvious solution is to have all developers work on the same environment as where the code will be deployed, but this would defer developers from using their preferred environment and impact productivity by constraining our development options. Possible solutions we have kicked around :

  1. Have ant compile using the Eclipse incremental compiler, (using flags  -Dbuild.compiler=org.eclipse.jdt.core.JDTCompilerAdapter and of course -Dant.build.javac.target=1.6). This side steps the problem by forcing the continuous integration system to use the same compiler as developer laptops, but is not ideal as this was never an intended use of the Eclipse compiler. 
  2. Move to the 1.7 JDK for compilation, using a target 1.6 bytecode. This solves this particular issue, but what happens in the future?
  3. Change the code to compile under Sun’s JDK. This is not a bad option but will cost some speed of development found in the simplicity of Eclipse’s built in system. 

My experience has been that Eclipse is a well worn path in the Java world, and its a little surprising that this hasn’t come up before for us given the heavy use of generics (although there are lots of generics issues which have been logged over at bugs.sun.com, like http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6302954 which has come up for us as a related issue – the “no unique maximal instance exists” error message). 

Switching to use the Eclipse compiler for our deployable artifacts would be an unorthodox move, and I’m curious if anyone out there reading this has done that, and if so, with what results.

We had a discussion internally and the consensus was that moving to 1.7 for compilation using a target 1.6 bytecode (#2) should work, but would potentially open us up to bugs in 1.7 (and would require upgrading some internal utility machines to support this).  We aren’t quite ready to make the leap to Java 7, although its probably time to start considering the move. 

For now, we coded around the compiler issue, but its coming up for us regularly now, and we are kicking around the ideas on how to resolve.  In the near term, for the projects that run into this generics compile issue, developers are back to using good ole ant to check if their code will “really” compile.  Its easy to forget how convenient autocompilation has become, and the fact that it isn’t really the same as the build server’s compiler.

Standard