Interview Strategies at Flurry

To meet the demands of the market, Flurry has grown rapidly. Our numbers have roughly doubled each year, and we plan to continue this trajectory for the foreseeable future. Because of this pattern, we spend a significant amount of time interviewing candidates. Here is our approach.


Synopsis


Our process is designed to help us discover as much as we can about each candidate as efficiently as possible. We begin with a phone screen, which is a low-effort, low-cost handshake between us and the candidate. This gives us a basic sense of their programming knowledge and their compatibility with us. Then, we follow up with a code test that evaluates their ability to design and implement a solution to the proposed challenge. Finally, we conduct two onsite interviews: one covers the design decisions made in their code test and their technical abilities, and the other covers their creative problem-solving skills and delves into their past experiences.

During the two interviews, we assess the following:

  • Communication skills

  • Problem solving skills

  • Design ability

  • Team fit



Giving Hints

Most candidates will get stuck solving a problem at least once during the interviews. This is our opportunity to direct the conversation with a hint. A good hint can sometimes be very difficult to give. If an interviewee has chosen to answer a problem solving question in a way that is unusual but not necessarily incorrect, and then gets stuck, it is up to us to think ahead and supply a hint that helps them get to where they want to go. This can be especially challenging if the interviewee knows more about the subject than we do. The alternative is to give a hint that sets the candidate onto a path that leads to a known good solution, but this has several drawbacks. First, it wastes time because the interviewee must start again. Second, the candidate often still has their original idea in mind, which distracts them. Finally, pointing someone in a completely different direction is jarring and throws them off, especially if they start analyzing whether they have jeopardized their chances at an offer by getting the question wrong. Ideally, a good hint will provoke a thoughtful discussion about how to arrive at a solution. This also introduces a collaborative element to the interview and gives the interviewee a chance to teach us something. By digging in like this, we gain a wealth of information about their ability to communicate and how they approach problems.

 

Reviewing the Code Test

The code test tells us several things about the candidate:

  • How they write code in a natural setting

  • How they structure blocks of code – is it organized into logical units of work?

  • How they test their code – are their tests concise and relevant? What do the tests cover? Is their code designed to be testable?

  • How they design their algorithms – what approach does the candidate take?

Reviewing the test with the candidate mimics an actual code review. We discuss the compromises they made, such as performance, readability, and testability. We also explore the design decisions the candidate made to gain insight into how they prioritize their development practices. This also gives us a sense of how they respond to feedback: do they give justifications for their design decisions? Do they readily acknowledge mistakes?

 

Digging Deeper

Any candidate can memorize the answers to common interview questions, but eliciting such canned responses will not give us useful information about their abilities as a programmer. Therefore, we choose questions that lead to interesting discussion. For a hypothetical example, take the simple question of “What is a tree set?” If the candidate talks about the O(log n) find/insert/remove operations, we could follow up with “Since a tree is faster than a linked list for common operations, what are some reasons to use a linked list instead of a tree set?” This forces the candidate to explain their thought process, which is much more valuable than knowing whether the candidate can regurgitate runtime efficiencies of common structures.

Listening Well

At Flurry, we care about a candidate’s communication skills and fit just as much as their technical abilities. Much of this information is gleaned during the technical portions. If a technical question is too hard or unexpected, we can look at an interviewee’s coping mechanisms: do they panic? What is their thought process: does it jump around, or does it build logically from a set of premises? If a question is too easy, how does the candidate respond? Are they arrogant or dismissive? These are the keys that tell us whether an interviewee is confident about what they know and don’t know and is able to think for themselves. This kind of information can come from anywhere at any time, so we make sure to stay attentive and take notes regardless of what is happening. By the end of the process, several different interviewers must come to a single conclusion, and if anybody thinks the candidate is not qualified, we pass on them.

Describing Flurry to the Interviewee

Anyone familiar with the process knows that interviews are just as much about helping the candidate learn about the company as they are evaluating the candidate. Since Flurry values intelligence and honesty, we want to attract the same qualities in our candidates. To that end, during the interview process, we are truthful about Flurry’s strengths and weaknesses and present a clear picture of what it will be like working here. The people interviewing the candidate are their future colleagues, and interviews take place in the same location as their future work environment. By the end of it, a candidate should have a good idea what it is like to work here, should they accept an offer.

 

 

Author: Jon Miller


APNS Test Harness

As Dr. Evil found out after spending a few decades in a cryogenic freezer, perspectives on quantity change very quickly. Ever since the explosion of mobile apps and the growing number of services that can deal with humongous amounts of data, we need to re-define our concepts of what ‘Big Data’ means. This goes for developers who want to write the Next Great App, as well as those who want to write services to support it.

One of the best ways of connecting with mobile application customers is via remote push notifications. This is available on both Android (GCM) and iOS (APNS). These services allow developers to send messages directly to their users and this is an extremely valuable tool to announce updates, send personalized messages and engage directly with the audience. Google and Apple provide services that developers can send push messages to and they in turn deliver those messages to their users.


The Problem

It’s not unusual for apps these days to have on the order of millions and even tens of millions of users. Testing a push notification backend at that scale can be extremely hard. Sure, you can set up a few test devices to receive messages, but how do you know how long it would take your backend to send out a large number of push messages to the Google and Apple servers? Also, you don’t want to risk being throttled or completely blacklisted by either of those services by sending a ton of test data their way.

The Solution

The solution is to write a mock server that’s able to simulate the Google/Apple Push Notification Service, and a test client to hit it with requests.

The Google service is completely REST based, so a script that executes a lot of curl requests in a loop can do that job. It’s also fairly straightforward to write a simple HTTP server that accepts POSTs and sends back either a 200 or some sort of error code.

Apple’s APNS, however, presents a few challenges. It uses a binary format documented by Apple. Since the protocol is binary, you need to write some sort of mock client that can generate messages in the specified format. At Flurry, we’ve been playing around with Node.js to build scalable services, and it’s fairly straightforward to set up an Apple APNS test client and server.

The Client

https://gist.github.com/rahuloak/4949310

The client.connect() method connects to the mock server and generates test data. The Buffer object in Node is used to pack the data into a binary format to send it over the wire. Although the protocol lets you specify a token size, the token size has been set to 64 bytes in the client since that’s typically the token length that gets generated. Also, in our experience, the APNS server actually rejects tokens that aren’t exactly 64 bytes long. The generateToken() method generates 64 byte hex tokens randomly. The payload is simple and static in this example. The createBuffer method can generate data in both the simple and enhanced format.
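For readers who want to see the byte layout spelled out, here is a minimal sketch of the packing step, written in Java rather than Node.js. It is not the code from the gist above: the class and method names are hypothetical, and it assumes the legacy binary layout (a one-byte command, big-endian two-byte length prefixes, then the token and payload, with a four-byte identifier and four-byte expiry added in the enhanced format). The token length is left to the caller, so the 64-byte size discussed above is a choice made in the usage example, not something the sketch enforces.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Random;

public class ApnsFrameSketch {
    private static final String HEX = "0123456789abcdef";

    // Random hex token of the requested length, encoded as ASCII bytes.
    static byte[] generateToken(int length, Random rng) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(HEX.charAt(rng.nextInt(HEX.length())));
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    // Simple format: command 0, token length, token, payload length, payload.
    static byte[] simpleFrame(byte[] token, byte[] payload) {
        // ByteBuffer is big-endian by default, which matches network byte order.
        ByteBuffer buf = ByteBuffer.allocate(1 + 2 + token.length + 2 + payload.length);
        buf.put((byte) 0);
        buf.putShort((short) token.length);
        buf.put(token);
        buf.putShort((short) payload.length);
        buf.put(payload);
        return buf.array();
    }

    // Enhanced format: command 1, identifier, expiry, then the same fields.
    static byte[] enhancedFrame(int identifier, int expirySeconds, byte[] token, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + 4 + 2 + token.length + 2 + payload.length);
        buf.put((byte) 1);
        buf.putInt(identifier);
        buf.putInt(expirySeconds);
        buf.putShort((short) token.length);
        buf.put(token);
        buf.putShort((short) payload.length);
        buf.put(payload);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] token = generateToken(64, new Random());
        byte[] payload = "{\"aps\":{\"alert\":\"hello\"}}".getBytes(StandardCharsets.UTF_8);
        System.out.println("simple frame bytes:   " + simpleFrame(token, payload).length);
        System.out.println("enhanced frame bytes: " + enhancedFrame(1, 0, token, payload).length);
    }
}
```

Varying the payload size and the number of frames written per connection is then a matter of calling these helpers in a loop before flushing the socket.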
 
What good is a client without a server, you ask? So without further ado, here’s the mock server to go with the test client.

The Server

https://gist.github.com/rahuloak/4949381

After accepting a request, the server buffers everything into an array and then reads the buffers one by one. APNS has an error protocol, but this server only sends a 0 on success and a 1 otherwise. Quick caveat: the server stores data in a variable until it gets a FIN from the client (on ‘end’) and only then processes the data. The {allowHalfOpen:true} option is therefore required on createServer, so that the server’s side of the connection is not closed automatically when the client’s FIN arrives and the server can still write its status byte.
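The "read the buffers one by one" step can be sketched as follows, again in Java and again not taken from the gist. The class name and the 0/1 return convention mirror the description above, and the frame layout is the same assumed legacy layout (command byte, optional identifier and expiry, then length-prefixed token and payload). The sketch only checks that each frame is well formed; it does nothing with the contents.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

public class ApnsMockParserSketch {
    // Returns 0 if every frame in the buffered data is well formed, 1 otherwise.
    static int process(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        try {
            while (buf.hasRemaining()) {
                byte command = buf.get();
                if (command == 1) {
                    buf.getInt(); // identifier (enhanced format only)
                    buf.getInt(); // expiry (enhanced format only)
                } else if (command != 0) {
                    return 1;     // unknown command byte
                }
                skip(buf, buf.getShort() & 0xFFFF); // token
                skip(buf, buf.getShort() & 0xFFFF); // payload
            }
            return 0;
        } catch (BufferUnderflowException e) {
            return 1; // the data ended in the middle of a frame
        }
    }

    // Advances past a field, failing if the field is longer than what is left.
    private static void skip(ByteBuffer buf, int length) {
        if (length > buf.remaining()) {
            throw new BufferUnderflowException();
        }
        buf.position(buf.position() + length);
    }
}
```

Replacing the skip calls with a write to a datastore, or adding a delay per frame, is one way to approximate the cost of real processing.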

This setup is fairly basic, but it is useful for many reasons. Firstly, the client could be used to generate fake tokens and send them to any server that would accept them (just don’t do it to the APNS server, even in sandbox mode). The data in the payload in the above example is static, but playing around with the size of the data as well as the number of blocks sent per request helps identify the optimal size of data that you would want to send over the wire. At the moment, the server does nothing with the data, but saving it to some database or simply adding a sleep in the server would be a good indicator of estimated time to send a potentially large number of push messages. There are a number of variables that could be changed to try and estimate the performance of the system and set a benchmark of how long it would take to send a large batch of messages.

Happy testing!


Tech Women

4th grade budding girl geek in the making 2nd row 2nd girl from the left

 
I grew up in a small suburb of New York fascinated with math and sciences. 3-2-1 Contact was my all-time favorite show then and getting their magazine was such a joy. As a young girl it was fun to try out the BASIC programs they published, programming with a joystick and running them on my Atari system (Yes programming with a joystick or paddle is just as useful as the MacBook Wheel.) It seemed like a no brainer to dive into computers when I started college. Women in my family were commonly in the sciences, so entering my college CS program was a bit of a culture shock for me; I could actually count all the women in my class year on one hand!
 
After graduating and working at a range of tech companies as a Quality Assurance Engineer, from big players to small startups, I’ve always had the desire to give back to the tech community. Only recently, however, did I find the right avenue. One day a co-worker of mine shared a link with me about the TechWomen program. From their website:
TechWomen brings emerging women leaders in Science, Technology, Engineering and Mathematics  from the Middle East and Africa together with their counterparts in the United States for a professional mentorship and exchange program. TechWomen connects and supports the next generation of women entrepreneurs in these fields by providing them access and opportunity to advance their careers and pursue their dreams.
 As soon as I read that, I applied right away.  This was exactly the type of program I was looking for to help share what I’ve learned.

 
It must have been written in the stars, as I was accepted as a mentor in the program. I was matched with Heba Hosny, who is an emerging leader from Egypt. She works as a QA Engineer at Vimov, an Alexandria-based mobile application company. During her three-week internship at Flurry, she was involved in the process of testing the full suite of Flurry products.

During Heba’s stay with us she was like a sponge, soaking up the knowledge to learn what it takes to build and run a fast-paced, successful company in Silicon Valley. In her own words,

“EVERYBODY likes to go behind the scenes. Getting backstage access to how Flurry manages their analytics business was an eye opening experience for me. I was always curious to see how Flurry makes this analytics empire, being behind the curtains with them for just a month has been very inspiring for me to the extent that some of what Flurry does has became pillars of how I daily work as a tester for Flurry analytics product used by the company I work for.

In a typical Internship, you join one company and at the end of the day you find yourself sitting in the corner with no new valuable information. You have no ability to even contact a senior guy to have a chat with him. Well, my internship at Flurry was the total OPPOSITE of that.

The Flurry workplace is different. In Flurry managers, even C levels, are sitting alongside engineering, business development, marketing, sales, etc. This open environment allowed me to meet with company CEO, CTO, managers, and even sitting next to the analytics manager.

 In short, an internship at Flurry for me was like a company wide in-depth journey of how you can run a superb analytics shop and what it’s like to deal with HUGE amounts of data like what Flurry works with .”

Working with Heba during her internship was a great experience. The experience of hosting an emerging leader was very fruitful. In QA we were able to implement some of the new tools Heba introduced to us, such as the test case management tool Tarantula. Heba also gave us the opportunity to learn more about her culture and gave members of our staff a chance to practice their Arabic. The San Francisco Bay Area is a very diverse place but this is the first chance many of us have gotten to hear a first hand account of the Arab Spring.

From our experience in the tech field, it’s obvious that the industry suffers from a noticeable lack of strong female leadership at the top. It’s time that women who value both a rich home life and a fulfilling career explore the tech startup world. Participating in programs such as TechWomen will help in this regard. These programs benefit not only the mentee and mentor, but the industry as a whole. Mentees who gain experience in Silicon Valley tech companies will pay it forward to the next generations of tech women in their communities by sharing their experiences. Mentors in the program not only learn from their mentees but are able to create a sense of community to help make sure the mentee has a successful internship. Company-wise, participating in programs like TechWomen brings tremendous exposure to Flurry outside of the mobile community. As we enrich more women’s lives in the tech field, we can share even more experiences to help inspire young women and girls to know it’s possible to touch the Silicon Valley dream, no matter where in the world they are.

For more information:


The Benefits of Good Cabling Practices

An organized rack makes a world of difference in tracing and replacing cables, easily removing hardware, and most importantly increasing airflow. When you adopt good cabling habits, your hardware runs cooler and more efficiently, and your cables stay healthy for longer. You also prevent premature hardware failures caused by heat retention. Good cabling practices may not sound important, but they do make a difference. A well-cabled rack is also nice to look at or show off to your friends/enemies.

When cabling, here are some practices Flurry lives by:

Label everything

There has never been a situation where you’ve heard someone say, “I wish I hadn’t labeled this.” Labeling just makes sense. Spend the extra time to label both ends of the network and power cables. Your sanity will thank you. If you’re really prepared, print out the labels on a sheet ahead of time so they’ll be ready to use.

Cable length

When selecting cable length, there are two schools of thought. There are those who want exact lengths and those who prefer a little extra slack. The majority of messy cabling jobs are from selecting improper cable lengths so use shorter cables where possible. A good option is custom made cables. You get the length that you need without any excess. This option is usually expensive in either time or money. The other option is to purchase standard length cables. Assuming that you have a 42U rack, the furthest distance between two network ports is a little over six feet. In our rack build outs, we’ve had great results using standard five foot network cables for our server to switch connections. 
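The arithmetic behind those lengths is easy to check. The sketch below assumes the standard rack unit height of 1.75 inches; the class and method names are made up for illustration, and it only accounts for vertical distance, not the horizontal run to the side of the rack.

```java
public class RackCableLength {
    static final double INCHES_PER_RACK_UNIT = 1.75; // standard rack unit height

    // Vertical span, in feet, covered by the given number of rack units.
    static double heightFeet(int rackUnits) {
        return rackUnits * INCHES_PER_RACK_UNIT / 12.0;
    }

    public static void main(String[] args) {
        // Full height of a 42U rack: the upper bound on vertical port-to-port distance.
        System.out.printf("42U rack height: %.3f ft%n", heightFeet(42));
        // With the switch mounted mid-rack, the farthest server is about 21U away.
        System.out.printf("21U run:         %.3f ft%n", heightFeet(21));
    }
}
```

A 42U rack works out to 6.125 feet, which matches the "little over six feet" figure, and a switch mounted near the middle of the rack halves the longest vertical run to roughly three feet. That leaves a five foot cable with slack for routing along the side of the rack.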

Cable management arms

When purchasing servers, some manufacturers provide a cable management arm with your purchase. These arms allow you to pull out your server without unplugging any cables. In exchange for this benefit, they add bulk, retain heat, and restrict airflow. If you have them, we suggest that you don’t use them. Under most circumstances, you would unplug all cables before you pull out a server anyway.

No sharp bends

Cables do require a bit of care when being handled. A cable’s integrity can suffer from any sharp bends, so try to avoid them. In the past, we have seen port speed negotiation problems and intermittent network issues caused by damaged network cables.

Use mount points

As you group cables together, utilize anchor points inside of the rack to minimize stress on cable ends. Prolonged stress on the cable ends can break both the cable and the socket it’s plugged into. Power ends are also known to come unplugged: the weight of bundled power cables can gradually pull a plug out of its outlet. Using anchor points will help relieve the stress directed at the outlet.


 

Less sharing

Isolate different types of cables (power, network, kvm, etc) into different runs. Separating cable types will allow for easy access and changes. Bundled power cables can cause electromagnetic interference on surrounding cables so it would be wise to separate power from network cables. If you must keep copper network and power cables close together, try to keep them at right angles. Standing at the back of the rack, network cables are positioned on the left hand side of the rack while power cables are generally on the right in our setup.

Lots and lots of velcro

We saw the benefits of velcro cable ties very early on. They have a lot of favorable qualities that plastic zip ties do not. They’re easy to add, remove, and retie. They’re also great for mounting bundled cables onto anchor points inside of the racks. If your velcro ties come with a slotted end, do not give in to the urge to thread the velcro through the slot. It’s annoying to unwrap and rethread. Don’t be shy about cutting the velcro to length, either; using just the right length of velcro can make it easier to bundle and re-bundle cables.

Now that you have these tips in mind, let’s get started on cabling a Flurry rack.

1. Facing the back of a 42U rack, add a 48-port switch in about the middle of the rack (position 21U, the 21st from the bottom). Once you have all your servers racked, it’s time for the fun part: cabling. Let’s start with the network.

2. From the topmost server, connect the network cable to the top left port of your switch, which should be port 1.

3. As you go down the rack, connect the network cables on the top row of ports from left to right on the switch (usually odd numbered ports). Stop when you’ve reached the switch.


4. Using the velcro cable ties, gather together the cables in a group of ten and bundle the cabled groups with the cable ties. Keep the bundle on the left hand side of the rack. You will have one group of ten and one group of eleven that form into one bundled cable.


5. For the bottom set of servers, start with the lowest server (rack position 1U) and connect the network cable to the bottom left most port on the switch.

6. Starting from the bottom up, connect the network cables on the bottom row of ports from left to right on the switch (usually even numbered ports).


7. Doing the same as the top half of the rack, gather together the cables in a group of ten and bundle the cabled groups with the cable ties. Keep these bundles on the left hand side of the rack. You’ll end up with two bundles of ten that form into one bundled cable. Look pretty decent?

8. Now, let’s get to power cabling. In this scenario, we will have three power distribution units (PDUs), one on the left and two on the right side of the rack. Starting from the top of the rack, velcro together five power cables and plug them into the PDU strip on the left side of the rack from the top down.


9. Take another two sets of four bundled power cables and plug them into the other pdu strips on the right hand side also following the top to bottom convention. You should end up with a balanced distribution of power plugs.


10. Take a bundle of six power cables and plug them into the pdu strip on the left hand side.

11. Take another two sets of four power cables and plug them into the two pdu strips on the right hand side.


12. Starting from the bottom up, bundle the power cables in groups of five. You will end up with two sets of five power cables and a bundle of four.

13. Plug the bundle of four power cables into the pdu on the left hand side.


At this point, you can take a step back and admire your work. Hopefully, it looks sort of like this:

[Images: Img_1942_43, Img_1942_52]

Good cabling can be an art form. As in any artistic endeavor, it takes a lot of time, patience, skill, and some imagination. There is no one size fits all solution, but hopefully this post will provide you with some great ideas on your next rack build out.


Exploring Dynamic Loading of Custom Filters in HBase

Any program that pulls data from a large HBase table containing terabytes of data spread over many nodes will need to put a bit of thought into the retrieval of this data. Failure to do this may mean waiting for and subsequently processing a lot of unnecessary data, to the point where it renders this program (whether a single-threaded client or a MapReduce job) useless. HBase’s Scan API helps in this aspect. It configures the parameters of the data retrieval, including the columns to include, start and stop rows and batch sizing.
 
The Scan can also include a filter, which can be the most impactful way to improve the performance of scans of an HBase table. This filter is applied to a table and screens out unwanted rows from being included in a result set. A well-designed filter is performant and minimizes the data scanned and returned to the client. There are many useful Filters that come standard with HBase, but sometimes the best solution is to use a custom Filter tuned to your HTable’s schema.

Before your custom filter can be used, it has to be compiled, packaged in a jar, and deployed to all the regionservers. Restarting the HBase cluster is necessary for the regionservers to pick up the code in their classpaths. Therein lies the problem: an HBase restart takes a non-trivial amount of time (although rolling restarts mitigate that somewhat), and the downtime is significant with a cluster as big as Flurry’s.

This is where dynamic filters come in. The word ‘dynamic’ refers to the on-demand loading of these custom filters, just like loading external modules at runtime in a typical application server or web server. In this post, we will explore an approach that makes this possible in the cluster.

How It Works Now
Before we dive into the workings of dynamically loading filters, let’s see how regular custom filters work.

Assuming the custom filter has already been deployed to a jar in the regionservers’ classpath, the client can simply use the filter, e.g. in a full table scan, like this

https://gist.github.com/4227003

This filter will have to be pushed to the regionservers to be run server-side. The sequence of how the custom filter gets replicated on the regionservers is as follows:

  1. The client serializes the Filter class name and its state into a byte stream by calling the CustomFilter’s write(DataOutput) method.
  2. The client directs the byte array to the regionservers that will be part of the scan.
  3. Server-side, the RegionScanner re-creates the Scan, including the filter, using the byte stream. Part of the stream is the filter’s class name, and the default classloader uses this fully qualified name to load the class using Class.forName().
  4. The filter is then instantiated using its empty constructor and configured using the rest of the filter byte array, via the filter’s readFields(DataInput) method (see org.apache.hadoop.hbase.client.Scan for details).
  5. The filter is then applied as a predicate on each row.

(1) myfilter.jar containing our custom filter resides locally in the regionservers’ classpath
(2) the custom filter is instantiated using the default Classloader
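The write/readFields round trip in the steps above can be sketched without any HBase dependency. Everything in this example is a stand-in: SimpleWritable is a hypothetical interface that mimics the shape of Hadoop’s Writable, and PrefixFilter is a made-up filter, so this is an illustration of the mechanism rather than HBase’s actual code path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class FilterRoundTripSketch {
    // Stand-in for Hadoop's Writable contract (assumption: same two methods).
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical filter whose only state is a row key prefix.
    public static class PrefixFilter implements SimpleWritable {
        private String prefix = "";
        public PrefixFilter() {}                       // empty constructor used server-side
        public PrefixFilter(String prefix) { this.prefix = prefix; }
        public boolean accept(String rowKey) { return rowKey.startsWith(prefix); }
        public void write(DataOutput out) throws IOException { out.writeUTF(prefix); }
        public void readFields(DataInput in) throws IOException { prefix = in.readUTF(); }
    }

    // Client side: class name first, then the filter's own state.
    static byte[] serialize(SimpleWritable filter) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(filter.getClass().getName());
        filter.write(out);
        out.flush();
        return bytes.toByteArray();
    }

    // Server side: load by name, instantiate with the empty constructor, configure.
    static SimpleWritable deserialize(byte[] data) throws Exception {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        Class<?> clazz = Class.forName(in.readUTF());
        SimpleWritable filter = (SimpleWritable) clazz.getDeclaredConstructor().newInstance();
        filter.readFields(in);
        return filter;
    }
}
```

The Class.forName call is the step that ties the filter to the regionserver’s classpath: if the named class is not already there, deserialization fails.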

Once deployed, this definition of our custom filter is static. We can make an ad hoc query using the combination of filters, but if we need to add, extend or replace a custom filter, it has to be added to the regionserver’s classpath and we have to wait for its next restart before those filters can be used.

There is a faster way.

Dynamic Loading
A key takeaway from the previous section is that Filters are Writables – they are instantiated using the name of the class and then configured by a stream of bytes that the Filter understands. This makes the filter configuration highly customizable and we can use this flexibility to our advantage.

Rather than create a regular Filter, we introduce a ProxyFilter which acts as the extension point through which we can load our custom Filters on demand. At runtime, the ProxyFilter loads the custom filter class itself.

Let’s look at some example code. To start with, there is just a small change we have to make on the client; the ProxyFilter now wraps the Filter or FilterList we want to use in our scan.

https://gist.github.com/4227336

The ProxyFilter passes its own class name to be instantiated on the server side, and serializes the custom filter after.

https://gist.github.com/4227434

On the regionserver the ProxyFilter is initialized in the same way as described in the previous section. The byte stream that follows should minimally contain the filter name and its configuration byte array. In the ProxyFilter’s readFields method, the relevant code looks like this.

https://gist.github.com/4227524

This is very much like how the default Scan re-creates the Filter on the regionserver with one critical difference – it uses a filterModule object to obtain the Class definition of the custom filter. This module retrieves the custom filter Class and returns it to ProxyFilter for instantiation and configuration.

There can be different strategies for retrieving the custom filter class. Our example code copies the jar file from the Hadoop filesystem to a local directory and delegates the loading of the Filter classes from this jar to a custom classloader [3].

To configure the HDFS location where the module looks for the filters.jar, add the following property to hbase-site.xml.

<property>
  <name>flurry.hdfs.filter.jar</name>
  <value>/flurry/filters.jar</value>
</property>
(1) The custom filter jar resides in a predefined directory in HDFS

(2) The proxyfilter.jar containing the extension point needs to reside locally in the regionserver’s classpath
(3) The ProxyFilter is instantiated using the default ClassLoader
(4) If necessary, the rowfilter.jar is downloaded from a preset Path in HDFS. A custom classloader in ProxyFilter proceeds to instantiate the correct filter. Filter interface calls are then delegated to the enclosed filter.

With the ProxyFilter in place, it is now simply a matter of placing or replacing the jar in the Hadoop FS directory to pick up the latest filters.

Reloading Filters
When a new Scan is requested on the server side, this module first checks up on the filter.jar. If this jar is unchanged, the previously loaded Classes are returned. However, if the jar has been updated, the module repeats the process of downloading it from HDFS, creating a new instance of the classloader and reloading the classes from this modified jar. The previous classloader is dereferenced and left to be garbage collected. Restarting the HBase cluster is not required.

The HdfsJarModule keeps track of the latest custom filter definitions using a separate classloader for the different jar versions
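The check-then-reload logic can be sketched in isolation. This is not the HdfsJarModule from the example code: the class below is hypothetical, the jar’s modification time is abstracted behind a LongSupplier, and the expensive "download the jar and build a new classloader" step is abstracted behind a Supplier, so the sketch shows only the caching decision.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public class ReloadingModuleSketch<T> {
    private final LongSupplier modificationTime; // e.g. the jar's mtime in HDFS
    private final Supplier<T> loader;            // e.g. download jar, build classloader
    private long loadedVersion;
    private T current;
    private int loads;

    public ReloadingModuleSketch(LongSupplier modificationTime, Supplier<T> loader) {
        this.modificationTime = modificationTime;
        this.loader = loader;
    }

    // Returns the cached value unless the underlying jar has changed.
    public synchronized T get() {
        long version = modificationTime.getAsLong();
        if (current == null || version != loadedVersion) {
            // The previous value is simply dropped and left to the garbage collector.
            current = loader.get();
            loadedVersion = version;
            loads++;
        }
        return current;
    }

    public synchronized int loadCount() {
        return loads;
    }
}
```

In a real module the type parameter would be a classloader (or a holder of loaded filter classes), and the modification-time lookup is the extra HDFS call per scan.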

Custom classloading and reloading can be a class-linking, ClassCastException minefield, but the risk here is mitigated by the highly specialized use case of Filtering. The filter is instantiated and configured per scan and its object lifecycle limited to the time it takes to do the scan in the regionserver. The example uses the child-first classloader mentioned in a previous post on ClassLoaders that searches for a configured set of URLs before delegating to its parent classloader [2].
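For readers unfamiliar with child-first delegation, here is one common way to write such a classloader on top of URLClassLoader. It is a generic sketch under that assumption, not the classloader from the example code or from the earlier post, and the class name is illustrative.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ChildFirstClassLoader extends URLClassLoader {
    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);   // already defined by this loader?
            if (c == null) {
                try {
                    c = findClass(name);          // search this loader's own URLs first
                } catch (ClassNotFoundException e) {
                    c = super.loadClass(name, false); // fall back to normal parent delegation
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

Because each reload creates a fresh instance of the loader, a class loaded from the old jar and a class with the same name loaded from the new jar are distinct types. That is the source of the ClassCastException risk mentioned above.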

Things to watch out for

  • The example code has a performance overhead as it makes additional calls to HDFS to check for the modification time of the filter jar when a filter is first instantiated. This may be a significant factor for smaller scans. If so, the logic can be changed to check the jar less frequently.
  • The code is also very naïve at this point. Updating the filter.jar in the Hadoop FS while a table scan is happening can have undesired results if the updated filters are not backward compatible. Different versions of the jar can be picked up by the RegionServers for the same scan as they check and instantiate the Filters at different times.
  • Mutable static variables are discouraged in the custom Filter because they will be reinitialized when the class is reloaded dynamically.

Extensions
The example code is just a starting point for more interesting functionality tailored to different use cases. Scans using filters can also be used in MapReduce jobs and coprocessors. A short list of possible ways to extend the code:

  • The most obvious weakness in the example implementation is the ProxyFilter only supports one jar. Extending that to include all jars in a filter directory will be a good start. [4]
  • Different clients may expect certain versions of Filters. Some versioning and bookkeeping logic will be necessary to ensure that the ProxyFilter can serve up the correct filter to each client.
  • Generalize the solution to include MapReduce scenarios that use HBase as the input source. The module can load the custom filters at the start of each map task from the MR job library instead, unloading the filters after the task ends.
  • Support other JVM languages for filtering. We have tried serializing Groovy 1.6 scripts as Filters but performance was several times slower.


Using the ProxyFilter as a generic extension point for custom filters allows us to improve our performance without the hit of restarting our entire HBase cluster.

Footnotes
[1] Class Reloading Basics http://tutorials.jenkov.com/java-reflection/dynamic-class-loading-reloading.html
[2] See our blog post on ClassLoaders for alternate classloader delegation http://tech.flurry.com/using-the-java-classloader-to-write-efficient
[3] The classloader in our example resembles the one described in this ticket https://issues.apache.org/jira/browse/HBASE-1936
[4] A new classloader has just been introduced in hbase-0.92.2 for coprocessors, and it seems to fit perfectly for our dynamic filters https://issues.apache.org/jira/browse/HBASE-6308
[5] Example code https://github.com/vallancelee/hbase-filter/tree/master/customFilters


Squashing bugs in multithreaded Android code with CheckThread

Writing correct multithreaded code is difficult, and writing Android apps is no exception. Like many mobile platforms, Android’s UI framework is single threaded and requires the application developer to manage threads with no thread-safe guarantee. If your app is more complicated than “Hello, World!” you can’t escape writing multithreaded code. For example, to build a smooth and responsive UI, you will have to move long running operations like network and disk IO to background threads and then return to the UI thread to update the UI.

Thankfully, Android provides some tools to make this easier, such as the AsyncTask utility class and the StrictMode API. You can also use good software development practices, such as adhering to a strict code style and requiring careful review of code that involves the UI thread. Unfortunately, these approaches require diligence, are prone to human error, or only catch errors at runtime.

CheckThread for Android

CheckThread is an open source project authored by Joe Conti that provides annotations and a simple static analysis tool for verifying certain contracts in multithreaded programs. It’s not a brand new project and it’s not very difficult to use, but it hasn’t seen much adoption in Android apps. It offers an automated alternative to relying exclusively on comments and code review to ensure that no bugs related to the UI thread are introduced in your code. The annotations provided by CheckThread are @ThreadSafe, @NotThreadSafe, and @ThreadConfined.

ThreadSafe and NotThreadSafe are described in Java Concurrency in Practice, and CheckThread enforces the same semantics that book defines. For the purposes of this blog post, the only annotation that we’ll be using is ThreadConfined.

Thread confinement is the general property of restricting access to data or code to a single thread. A data structure confined to the stack is inherently thread confined. A method that is only ever called by a single thread is also thread confined. In Android, updates to the UI must be confined to the UI thread. In very concrete terms, this implies that any method that mutates the state of a View should only be called from the UI thread. If this policy is violated, the Android framework may throw a RuntimeException, but it may also simply produce undefined behavior, depending on the specific nature of the update to the UI.
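
To make the confinement rule concrete, here is a minimal plain-Java sketch that does not use Android or CheckThread at all. Every name in it (ConfinementSketch, Label, demo) is hypothetical, and the single-thread executor merely stands in for a UI thread. It enforces confinement with a runtime check, which is exactly the kind of error a static analyzer aims to report before the program runs.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ConfinementSketch {

    // Hypothetical stand-in for a View: its mutator is confined to one owner thread.
    static final class Label {
        private final Thread owner;
        private String text = "";

        Label(Thread owner) {
            this.owner = owner;
        }

        void setText(String newText) {
            // Runtime confinement check: reject calls from any thread but the owner.
            if (Thread.currentThread() != owner) {
                throw new IllegalStateException(
                        "setText called from " + Thread.currentThread().getName());
            }
            text = newText;
        }

        String getText() {
            return text;
        }
    }

    // Returns "<outcome of the off-thread call>:<final label text>".
    static String demo() {
        // A single-thread executor plays the role of the UI thread (an assumption
        // made for illustration; Android uses a Looper-based main thread).
        final ExecutorService ui = Executors.newSingleThreadExecutor();
        try {
            Thread uiThread = ui.submit(new Callable<Thread>() {
                public Thread call() {
                    return Thread.currentThread();
                }
            }).get();
            final Label label = new Label(uiThread);

            String outcome;
            try {
                label.setText("from the wrong thread"); // violates confinement
                outcome = "allowed";
            } catch (IllegalStateException e) {
                outcome = "rejected";
            }

            // Hand the update to the owning thread and wait for it to finish.
            ui.submit(new Runnable() {
                public void run() {
                    label.setText("hello");
                }
            }).get();

            return outcome + ":" + label.getText();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            ui.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "rejected:hello"
    }
}
```

The check only fires when the offending call actually executes, so a rarely taken code path can hide a violation for a long time. That gap is the motivation for verifying the same contract statically.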

CheckThread supports defining thread policies in XML files, so while it would be possible, it’s not necessary to download the source of the Android framework code and manually add annotations to it. Instead, we can simply define a general thread policy to apply to Android framework classes.

Time for an Example

The following example demonstrates how to declare a thread policy in XML, annotate a simple program and run the CheckThread analyzer to catch a couple of bugs.

CheckThread’s XML syntax supports patterns and wildcards which allows you to concisely define policies for Android framework code. In this example we define two Android specific policies. In general this file would contain more definitions for other Android framework classes and could also contain definitions for your own code.

The first policy declares that all methods in Activity and its subclasses that begin with the prefix “on” should be confined to the main thread. Since CheckThread has no built-in concept of the Android framework or of the Activity class we need to inform the static analyzer which thread will call these methods.

The second policy declares that all methods in classes ending with “View” should be confined to the main thread. The intention is to prevent calling any code that modifies the UI from any thread other than the UI thread. This is a little bit conservative since there are some methods in Android View classes that have no side-effects, but it will do for now.

https://gist.github.com/4113656

Having defined the thread policy, we can turn to our application code. The example app is the rough beginnings of a Hacker News app. It fetches the RSS feed for the front page, parses the titles and displays them in a LinearLayout.

This first version is naive; it does network IO and XML parsing in Activity.onCreate. This error will definitely be caught by StrictMode, and will likely just crash the app on launch, so it would be caught early in development, but it would be even better if it were caught before the app was even run.

https://gist.github.com/4113662

In this code, we make a call to the static method getHttp in the IO utility class. The details of this class and method are not important, but since it does network IO, it should be called from off the UI thread. We’ve annotated the entire class as follows:

https://gist.github.com/4113669

This annotation tells CheckThread that all the methods in this class should be called from the “other” thread.

Finally, we’re ready to run the static analyzer. CheckThread provides several ways to run the analysis tool, including Eclipse and IntelliJ plugins, but we’ll just use the Ant tasks on the command line. This is what CheckThread outputs after we run the analyzer:

https://gist.github.com/4113676

As you can see, CheckThread reports an error: we’re calling a method that should be confined to the “other” thread from a method that’s confined to “MAIN”. One solution to this problem is to start a new thread for the network IO. We create an anonymous subclass of java.lang.Thread and override run, inside of which we fetch the RSS feed, parse it and update the UI.

https://gist.github.com/4113683

We’ve annotated the run method to be confined to the “other” thread. CheckThread will use this to validate the call to IO.getHttp. After running the analyzer again, we discover that there’s a new error reported:

https://gist.github.com/4113686

This time, the error is caused by calling the method setText on a TextView from a different thread than the UI thread. This error is generated by the combination of our thread policy defined in XML and the annotation on the run method.

Instead, we can call Activity.runOnUiThread with a new Runnable. The Runnable’s run method is annotated to be confined to the UI thread, since we’re passing it to a framework method that will call it from the UI thread.

https://gist.github.com/4113689

Finally, CheckThread reports no errors to us. Of course, that doesn’t mean that the code is bug free; static analysis of any kind has limits. We’ve just gotten some small assurance that the contracts defined in the XML policy and annotations will be upheld. While this example is simple, and the code we’re analyzing would be greatly simplified by using an AsyncTask, it does demonstrate the class of errors that CheckThread is designed to catch. The complete sample project is published on GitHub.

The Pros and Cons of Annotations

One drawback that is probably immediately obvious is the need to add annotations to a lot of your code. Specifically, CheckThread’s static analysis is relatively simple, and doesn’t construct a complete call graph of your code. This means that the analyzer will not generate a warning for the code below:

https://gist.github.com/4113695

While this may appear to be a significant problem, it’s my opinion that in practice it is not actually a deal breaker. Java already requires that the programmer write most types in code. This is seen by some as a drawback of Java (and is often cited incorrectly as a drawback of static typing in general). However, there are real advantages to annotating code with type signatures, and even proponents of languages with powerful type inference will admit this, since it’s usually recommended to write the type of “top-level” or publicly exported functions even if the compiler can infer the type without any hint.

The annotations that CheckThread uses are similar; they describe an important part of a method’s contract, that is whether it is thread safe or should be confined to a specific thread. Requiring the programmer to write those annotations elevates the challenge of writing correct multithreaded code to be at the forefront of the programmer’s mind, requiring that some thought be put into each method’s contract. The use of automated static analysis makes it less likely that a comment will become stale and describe a method as thread safe when it is not.

The Future of Static Analysis

The good news is that the future of static analysis tools designed to catch multithreaded bugs is looking very bright. A recent paper published by Sai Zhang, Hao Lü, and Michael D. Ernst at the University of Washington describes a more powerful approach to analyzing multithreaded GUI programs. Their work targets Android applications as well as Java programs written using other GUI frameworks. The analyzer described in their paper specifically does construct a complete call graph of the program being analyzed. In addition, it doesn’t require any annotations by the programmer and also addresses the use of reflection in building the call graph, which Android specifically uses to inflate layouts from XML. This work was published only this past summer, and the tool itself is underdocumented at the moment, but I recommend that anyone interested in this area read the paper which outlines their work quite clearly.

 


Write Once, Compile Twice, Run Anywhere

Many Java developers use a development environment different from the target deployment environment.  At Flurry, we have developers running OS X, Windows, and Linux, and everyone is able to contribute without thinking much about the differences of their particular operating system, or the fact that the code will be deployed on a different OS.

The secret behind this flexibility is how developed the Java toolchain has become. One tool (Eclipse) in particular has Eclipsed the rest and become the dominant IDE for Java developers. Eclipse is free, offers integrations like JUnit support, and has a host of really great plugins, which has made it the de facto standard in Java development, displacing IntelliJ and other options. In fact, entry level developers rarely even think about the compilation step, because Eclipse’s autocompilation recompiles your code every time you save a file.

There’s Always a Catch

Unfortunately, no technology is magical, and while this setup rarely causes issues, it can. One interesting case arises when the developer is using the Eclipse 1.6 compiler compliance and the target environment uses Sun’s 1.6 JDK compiler. For example, at Flurry we rely on Eclipse’s JDT Compiler during development, but the code we ship gets compiled for deployment on a Jenkins server by Ant using Sun’s JDK compiler. Note that both the developer and continuous integration environments are building for Java 6, but using different compilers.

Until recently this never came up as an issue as the Eclipse and Sun compilers, even when running on different operating systems, behave almost identically.  However, we have been running into some interesting (and frustrating) compiler issues that are essentially changing “Write Once, Run Anywhere” into “Write Once, Compile Using Your Target JDK, Run Anywhere.”  We have valid 1.6 Java code using generics, which compiles fine under Eclipse, but won’t compile using Sun’s javac.

Let’s See an Example

An example of the code in question is below. Note that it meets the Java specification and should be a valid program. In fact, the code compiles in Eclipse with Java 1.6 compiler compliance, but it won’t compile with javac from Sun’s 1.6 JDK.

https://gist.github.com/4035987

Compiling this code using javac in the Sun 1.6 JDK gives this compiler error:

https://gist.github.com/4036089

“Write Once, Run Anywhere” never made any promises about using different compilers, but the fact that our toolchain was using a different compiler than our build server never got much thought until now.

Possible Solutions

The obvious solution is to have all developers work in the same environment as the one where the code will be deployed, but this would deter developers from using their preferred environment and impact productivity by constraining our development options. Possible solutions we have kicked around:

  1. Have Ant compile using the Eclipse incremental compiler (using the flags -Dbuild.compiler=org.eclipse.jdt.core.JDTCompilerAdapter and, of course, -Dant.build.javac.target=1.6). This sidesteps the problem by forcing the continuous integration system to use the same compiler as developer laptops, but is not ideal as this was never an intended use of the Eclipse compiler.
  2. Move to the 1.7 JDK for compilation, targeting 1.6 bytecode. This solves this particular issue, but what happens in the future?
  3. Change the code to compile under Sun’s JDK. This is not a bad option, but it will cost some of the development speed that comes from the simplicity of Eclipse’s built-in system.
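
As a rough illustration of what option 3 can look like, one common way to code around a generic inference disagreement between compilers is to spell out the type argument at the call site instead of relying on inference. The sketch below is not the code from the gist above. The class and method names (ExplicitTypeArguments, newBuffer, sortedCopy) are hypothetical, and whether this shape matches our failing code is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ExplicitTypeArguments {

    // Hypothetical helper: T appears only in the return type, so the compiler
    // has to infer it from the context of each call.
    static <T extends Comparable<T>> List<T> newBuffer() {
        return new ArrayList<T>();
    }

    static <U extends Comparable<U>> List<U> sortedCopy(List<U> input) {
        // The shorter form would be: List<U> copy = newBuffer();
        // That form leaves T to inference. Supplying <U> explicitly removes
        // the inference step, so every compiler sees the same types.
        List<U> copy = ExplicitTypeArguments.<U>newBuffer();
        copy.addAll(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sortedCopy(Arrays.asList("b", "c", "a"))); // prints [a, b, c]
    }
}
```

The cost is some extra verbosity at each affected call site, which is part of the development speed this option gives up.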

My experience has been that Eclipse is a well worn path in the Java world, and it’s a little surprising that this hasn’t come up before for us given our heavy use of generics (although there are lots of generics issues which have been logged over at bugs.sun.com, like http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6302954 which has come up for us as a related issue – the “no unique maximal instance exists” error message).

Switching to use the Eclipse compiler for our deployable artifacts would be an unorthodox move, and I’m curious if anyone out there reading this has done that, and if so, with what results.

We had a discussion internally and the consensus was that moving to 1.7 for compilation while targeting 1.6 bytecode (#2) should work, but would potentially open us up to bugs in 1.7 (and would require upgrading some internal utility machines to support this). We aren’t quite ready to make the leap to Java 7, although it’s probably time to start considering the move.

For now, we coded around the compiler issue, but it’s coming up for us regularly now, and we are kicking around ideas on how to resolve it. In the near term, for the projects that run into this generics compile issue, developers are back to using good ole Ant to check if their code will “really” compile. It’s easy to forget how convenient autocompilation has become, and the fact that it isn’t really the same as the build server’s compiler.
