
Alexa, Tell Me Your Secrets

Ryan Benson

Jun 5, 2016 • 5 min read

The Amazon Echo is a nifty little device that you communicate with via speech - you can ask it to do various tasks and it verbally replies. You preface each command with the trigger word - either "Alexa", "Amazon", or "Echo". The Echo uses the Alexa Voice Service to handle the verbal interactions, and Alexa really does a great job. I got one of these devices as a gift a few months ago and I've grown to like it quite a bit. Of course, the paranoid side of me was a bit uneasy about a device constantly recording and processing audio from my home looking for the trigger words, but the forensic investigator side of me was very interested in what potential artifacts this new device could create and the insights Alexa could give into real-world things. I'm always very interested in technological artifacts that let investigators escape the digital and get a glimpse into the physical world.

Acquiring Alexa's Data

That being said, I wasn't about to do anything potentially destructive to my Echo (like I said, it was a gift and I would like to keep it in a functioning state), so a teardown like Cheeky4n6Monkey is proposing is out for me. Since many IoT devices are controlled in some fashion via a smartphone, I thought an analysis of data from the Alexa companion iOS app would be a good, non-destructive option.

The first step was to get the Alexa data off my iPhone. I created an iTunes backup of my phone and used iPBD2 to browse it (for a rundown of how to get data off mobile devices, check out Practical Mobile Forensics by Heather Mahalik). In the backup, I found a com.amazon.echo directory, which had a handful of files in it: a preferences plist, binary cookies, and 'LocalData.sqlite'. The SQLite file caught my eye immediately, as many applications store all kinds of interesting information in SQLite DBs. I popped it open in an SQLite viewer and found four tables: ZDATAITEM, Z_METADATA, Z_MODELCACHE, and Z_PRIMARYKEY.

The ZDATAITEM table had the most interesting content; the other three tables only had one row each and appeared to contain configuration and version information. The ZDATAITEM table only had four rows, but two of them looked promising. They were essentially key/value pairs, with ZKEY being either ToDoCollection.TASK or ToDoCollection.SHOPPING_ITEM, and ZVALUE being a long JSON string.
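If you would rather poke at the database from Python than from a GUI viewer, here is a minimal sketch using the standard sqlite3 module. The table and column names (ZDATAITEM, ZKEY, ZVALUE) come from what I saw in my backup; the helper names are my own, and opening the file read-only is just a habit I'd suggest when working with evidence copies.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def open_read_only(db_path):
    """Open the SQLite file read-only so the evidence copy is not modified."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_tables(db_path):
    """Return the names of all tables in the database, sorted."""
    with closing(open_read_only(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def read_data_items(db_path):
    """Return a {ZKEY: ZVALUE} dict for every row in ZDATAITEM."""
    with closing(open_read_only(db_path)) as conn:
        rows = conn.execute("SELECT ZKEY, ZVALUE FROM ZDATAITEM").fetchall()
    return {key: value for key, value in rows}
```

Calling list_tables("LocalData.sqlite") on my copy should return the four tables mentioned above, and read_data_items() gives the raw JSON strings keyed by ZKEY.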

Parsing Alexa's Data

From using my Echo and associated Alexa app, I knew that there are two todo-type lists that users can add things to via voice: the Shopping List and the To-do List. That seems to match up nicely with the two rows in the ZDATAITEM table. Each ZVALUE is an array of JSON objects, with each object containing information about a specific task. Here is an example of one object; I'll detail the fields below.
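To make the structure concrete, here is a sketch of what one such array might look like once loaded in Python. I assembled the object from the individual field examples listed in the sections below, so the combination of values, the field order, and the exact types (for instance, whether version is a number or a string) are assumptions rather than a verbatim record.

```python
import json

# One ZVALUE, reassembled from the per-field examples in this post.
# The pairing of these values in a single object is illustrative only.
SAMPLE_ZVALUE = json.dumps([
    {
        "text": "orange juice",
        "nbestItems": ["orange juice", "juice", "juice to", "add orange juice"],
        "createdDate": 1457219023029,
        "lastLocalUpdatedDate": 1457221754310,
        "lastUpdatedDate": 1457221754558,
        "reminderTime": None,
        "customerId": "A1C9VTA5F7ZW1N",
        "itemId": "A1C9VTA5F7ZW1N#6826a04d-b48e-3128-a1cc-9037bd48ee6d",
        "utteranceId": None,
        "originalAudioId": None,  # populated only for items added by voice
        "complete": True,
        "deleted": False,
        "type": "SHOPPING_ITEM",
        "version": 2,  # assumed numeric
    }
])


def parse_list(zvalue):
    """Turn one ZVALUE JSON string into a list of item dicts."""
    items = json.loads(zvalue)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of list items")
    return items


items = parse_list(SAMPLE_ZVALUE)
print(len(items), items[0]["text"], items[0]["type"])
```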

Item Text

The text field is at the core of the two lists. text is the content of the shopping/task item, as displayed to the user in the app. This is what Alexa determined the user said. The companion field, nbestItems, contains the options that Alexa chose from. In some cases, nbestItems has only one value, identical to the text field. In many instances, however, a whole range of runners-up are in nbestItems. In my data, nbestItems often contains words from before or after the eventual text selection ("Alexa, add orange juice to the shopping list"). I think this field could be interesting in that it has the potential to pick up extraneous words; what if another conversation was going on as the user was speaking to Alexa? Or what if a random conversation happened to contain either "Alexa" or "Echo" and was partially recorded? I think both of these have pretty low chances of 1) actually happening, and 2) being useful to a case, but it's something to think about.

Item Text Examples:

text: "orange juice"
nbestItems: ["orange juice", "juice", "juice to", "add orange juice"]
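One quick way to surface those extraneous words is to collect every word that shows up in nbestItems but not in the final text. This is a rough sketch of that idea; extra_words is a hypothetical helper of mine, and splitting on whitespace is a simplification that ignores punctuation.

```python
def extra_words(item):
    """Return words found in nbestItems that are absent from text, in first-seen order."""
    chosen = set(item["text"].lower().split())
    extras = []
    for candidate in item.get("nbestItems") or []:
        for word in candidate.lower().split():
            if word not in chosen and word not in extras:
                extras.append(word)
    return extras


example = {
    "text": "orange juice",
    "nbestItems": ["orange juice", "juice", "juice to", "add orange juice"],
}
print(extra_words(example))  # prints ['to', 'add']
```

Here the leftovers are just fragments of the command itself ("add", "to"); anything beyond that would be worth a closer look.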

Timestamps

Four timestamp fields exist for each object; however, the reminderTime wasn't populated in any of my test data. I haven't used reminders, so the absence is not terribly surprising, and I would guess that when populated the value would be a millisecond epoch timestamp like the other three. In all the objects I looked at, the lastLocalUpdatedDate was a little before the lastUpdatedDate. I imagine this is because as I check off items on my mobile device, lastLocalUpdatedDate gets triggered. lastUpdatedDate is set a little later, as it may reflect when the update propagates to the server.

Timestamp Examples:

createdDate: 1457219023029
lastLocalUpdatedDate: 1457221754310
lastUpdatedDate: 1457221754558
reminderTime: null
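Converting these values is straightforward if they are indeed milliseconds since the Unix epoch in UTC (my reading of the data, not something Amazon documents as far as I know). A small sketch, with helper names of my own:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ms):
    """Convert a millisecond epoch value to an aware UTC datetime (None stays None)."""
    if ms is None:
        return None
    return EPOCH + timedelta(milliseconds=ms)


def sync_lag_ms(item):
    """Milliseconds between the local update and the later lastUpdatedDate."""
    return item["lastUpdatedDate"] - item["lastLocalUpdatedDate"]


example = {
    "createdDate": 1457219023029,
    "lastLocalUpdatedDate": 1457221754310,
    "lastUpdatedDate": 1457221754558,
    "reminderTime": None,
}
print(ms_to_utc(example["createdDate"]).isoformat())  # 2016-03-05T23:03:43.029000+00:00
print(sync_lag_ms(example))                           # 248
```

For the example above, the item was created at 23:03:43 UTC on March 5, 2016, and the two update timestamps are 248 ms apart.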

IDs

There were four IDs for each object. The customerId was static for all my test data, which makes sense as my account was the only one used. The itemId is a long string, appearing to be the customer ID concatenated with a GUID, with the two separated by a #. The utteranceId was null for all my test cases.

The originalAudioId is where it gets more interesting. The originalAudioId field will be null if the item was not entered by voice (items can also be added/updated/removed by typing them in the mobile app or in a browser on amazon.com). When it is present, it is unique per entry and is used by the Alexa service to access stored audio clips of speech. I played around with the web version of Alexa (alexa.amazon.com) and Fiddler, a debugging proxy. The Alexa app (both on a mobile device and in the browser) allows the user to play back the raw audio of what Alexa heard, in order to let Alexa know if it understood what you said. In the web app, when you click the button to play back an audio clip, the clip is retrieved from pitangui.amazon.com/api/utterance/audio/data?id=<originalAudioId>. There's a whole lot more interesting stuff here with the API and the originalAudioId, but I'll save that for another post.

ID Examples:

customerId: A1C9VTA5F7ZW1N
itemId: A1C9VTA5F7ZW1N#6826a04d-b48e-3128-a1cc-9037bd48ee6d
utteranceId: null
originalAudioId: AB72C63C86AW3:1.0/2016/03/05/23/B0F00615549601C4/03:40::TNIH_2V.14c747fb-52c0-4018-8908-4163f73cb865ZXV/0
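Pulling these IDs apart takes only a few lines. The sketch below splits an itemId on the # and builds the audio URL in the form I saw in Fiddler. The https:// scheme and the percent-encoding of the id parameter are assumptions on my part, and I would expect the request to work only from an authenticated session.

```python
from urllib.parse import quote

AUDIO_ENDPOINT = "https://pitangui.amazon.com/api/utterance/audio/data"  # scheme assumed


def split_item_id(item_id):
    """Split 'customerId#guid' into its two parts; guid is None if there is no '#'."""
    customer_id, sep, guid = item_id.partition("#")
    return customer_id, (guid if sep else None)


def audio_url(original_audio_id):
    """Build the playback URL for a voice-entered item, or None for typed items."""
    if original_audio_id is None:
        return None
    # Encoding ':' and '/' is an assumption; the raw ID may also be accepted.
    return AUDIO_ENDPOINT + "?id=" + quote(original_audio_id, safe="")


customer, guid = split_item_id("A1C9VTA5F7ZW1N#6826a04d-b48e-3128-a1cc-9037bd48ee6d")
print(customer, guid)
```

As an aside, the 2016/03/05/23 segment in the example originalAudioId lines up with the date and hour of the createdDate example, so the ID may embed when the audio was captured; that is only an observation from my own data.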

Status Items

This group of fields looks pretty self-explanatory. complete and deleted were both always either true or false, type was either SHOPPING_ITEM or TASK, and version was '2' in my test data.

Status Examples:

complete: true
deleted: false
type: SHOPPING_ITEM
version: 2

Python Script

I put together a Python script to parse out the objects in the Shopping List and the To-do List. It takes the LocalData.sqlite file as input, reads the two rows of interest from ZDATAITEM, parses the embedded JSON, and writes everything out to an XLSX file. It also prints the text, created, and last updated values for each list item to the console.
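For a feel of the overall flow, here is a condensed sketch of the same steps using only the standard library. It is not the script itself: it writes CSV instead of XLSX to avoid a third-party dependency, and the function names are made up for this illustration.

```python
import csv
import json
import sqlite3
from contextlib import closing
from datetime import datetime, timedelta, timezone

LIST_KEYS = ("ToDoCollection.TASK", "ToDoCollection.SHOPPING_ITEM")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def load_items(db_path):
    """Read the two list rows from ZDATAITEM and return all item dicts."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT ZKEY, ZVALUE FROM ZDATAITEM WHERE ZKEY IN (?, ?)", LIST_KEYS
        ).fetchall()
    items = []
    for _key, value in rows:
        items.extend(json.loads(value))  # each ZVALUE is a JSON array
    return items


def fmt(ms):
    """Format a millisecond epoch value as ISO 8601 UTC, or '' if missing."""
    return "" if ms is None else (EPOCH + timedelta(milliseconds=ms)).isoformat()


def write_csv(items, csv_path):
    """Write every field of every item to a CSV file (CSV stands in for XLSX here)."""
    fields = sorted({key for item in items for key in item})
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        for item in items:
            # Nested values such as nbestItems are stored as JSON text.
            writer.writerow({
                k: json.dumps(v) if isinstance(v, (list, dict)) else v
                for k, v in item.items()
            })


def print_summary(items):
    """Print text, created, and last updated for each list item."""
    for item in items:
        print(item.get("text"), fmt(item.get("createdDate")),
              fmt(item.get("lastUpdatedDate")), sep=" | ")
```

A typical run would be items = load_items("LocalData.sqlite"), followed by write_csv(items, "alexa_lists.csv") and print_summary(items); the output filename is arbitrary.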

I hope you find the script and analysis useful (or at least interesting). Until next time!

Get the script from GitHub:

alexa_todos_parser.py