Yesterday, security researchers announced that iPhones and 3G iPads maintain a log of the phone’s position over time. Today, another researcher released a python program to dump the equivalent log from Android devices. While it has been common knowledge that cellular phone carriers store customer location data for a while now, it comes as a surprise to learn that client devices are carrying their own logs as well. These logs are included in backups on the user’s computer and rogue applications could easily steal the data if left unencrypted. Even more troublesome are reports that Michigan State Police are using a device from a company named Cellebrite to extract data from people’s cellular phones during routine traffic stops.
So what data, exactly, is contained in the consolidated.db file? And just how accurate is the positioning data contained within? After unencrypting my iPhone backups in iTunes, I used the information in Pete Warden’s documentation on GitHub to pull the information into a spreadsheet. I had 12,380 records in total starting on 9/3/2010. That’s the day I purchased my iPhone 4. On average, that’s 61 location entries per day! After spending some time with the data, here are some preliminary observations:
1. The log appears to be limited to cell phone tower data, not GPS data. The values in the Horizontal Accuracy column never fall below 500 (meters) which suggests this is approximate location data from cell towers. Moreover, the VerticalAccuracy and Speed columns are always -1, which is what the Significant Location Change service reports when GPS is not in use.
2. The MCC and MNC columns represent some combination of carrier and/or country code. These columns have values of 310 and 410, respectively, whenever I was in the United States. This includes trips to both coasts, so I’m fairly certain these values apply to AT&T anywhere in the country. In Europe, these columns take on different values for each country. It appears that the MCC column might be a country code and the MNC column is a carrier code.
3. Timestamps are only updated after a significant location change. Surprsingly, the same timestamp value is shared across many location entries. This behavior seems rather strange, as it is trivial to grab the current system time when writing locations to the database. Apparently the column was intended to represent the phone’s arrival in a certain area. Generally, travelling a distance of 5 kilometers is enough to trigger a new timestamp but this is not guaranteed. I have many timestamp updates at less than 1km traveled and quite a few updates of over 5km without a timestamp change. It appears that driving somewhere quickly (i.e. using the freeway) is more likely to trigger a timestamp change than by slowly moving to a new location (i.e. on foot). This is not a reliable metric by any means, as I had some blocks of entries with the same timestamp span upwards of 20 miles within a city.
4. A stationary phone can generate many log entries. Building on the idea that timestamps only change when the phone is moved a significant distance, I started mapping blocks of entries with the same timestamp value. It turns out the phone logs quite a few location entries while the phone is technically stationary, even in buildings with great reception. This is not surprising, as received signal strength from nearby towers will vary significantly as I walk through a building. The following image shows location entries while I was in my apartment, which has excellent reception: (Note: I translated all of the values by the same offset for some anonymity)
5. Averaging entries with the same timestamp comes remarkably close to the phone’s actual location, but only if reception is good and cellular towers are dense. Averaging a random sample of consolidate.db entries with late-night timestamps yielded coordinates only 690 meters from the actual location of my apartment. The average of a block of entires on a Saturday gave me coordinates only 630 m from my girlfriend’s house. Both of these locations have great reception and are moderately-dense suburban neighbors. On the other hand, averaging entries during the week put me a full 8.1 km from the actual location of my office. We have terrible reception in my office, so this is not too surprising.
6. 50% of location changes occur after 1 km of [perceived] movement. 33% occur after 500 m of movement. A full 90% occur within 5km. These distances are calculated from the estimated locations in consolidated.db. It would be interesting to log actual GPS location in parallel and compare the two values.
7. Locations with good reception have far more entries than locations with poor reception. I expected the opposite to be true, as buildings with poor reception tend to have higher received signal strength fluctuations and therefore trigger more perceived location changes. Instead, the phone infrequently logs locations whenever reception is poor. This could be a side effect of some built-in mechanism to only log locations when the phone is relatively stationary and receiving a stable signal. As a result, it’s very difficult to pin down the phone’s location if reception is poor.
8. Older entries are less frequent. This one is just a rough observation. It’s entirely possible that I’ve been more mobile lately than when I first bought my iPhone. However, my log has far more entries with timestamps in the past several months than it does from the beginning of the consolidated.db log. This could be due to changes in the iOS software to log more often. It could also indicate that the iOS software prunes older entries over time to keep consolidated.db from growing too large, but I’m not so sure this is the case.
So what does this mean for iPhone users? Not a whole lot, as long as you remember to encrypt your iPhone backup in iTunes to make it a bit more difficult for rogue applications to access. Still, I’m not sure why iOS and Android OS find it necessary to keep location logs. The only logical explanation I’ve heard so far is that it might help speed up future geolocation queries. It will be interesting to see how Apple and Google respond to this, if at all.