Yogesh Khatri put out a blog post last week called Parsing Unknown Protobufs with Python. I was very excited about this, since protobufs have intrigued me for quite a while. I've tackled deciphering unknown ones off and on, as I know they are quite common on mobile devices and I suspect there are more in URLs than the few known ones in Google Search.
If you haven't read Yogesh's post (you should), a key concept about protobufs is that they are minimal: the encoded data doesn't carry field names or exact data types, so you have to make guesses when parsing them if you don't know the original data type for each item.
Yogesh created a test protobuf and walked through parsing it using three different tools: protoc (a Google-supplied command line tool), protobuf-decoder (an older Python 2 script), and blackboxprotobuf (a newer Python script). All of them made different assumptions about data types, because they had to; the raw protobuf records only a field number and a "wire type" for each item, and several data types share each wire type, so there isn't enough information to tell which one was originally used.
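To make the guessing concrete, here is a minimal sketch of a raw protobuf walker in plain Python. It is not the code any of the three tools use; the function names (`read_varint`, `zigzag_decode`, `candidate_values`) and the sample bytes are my own, made up for illustration. It only handles top-level fields and wire types 0, 1, 2, and 5, and for each field it lists several equally valid interpretations rather than picking one.

```python
import struct

def read_varint(buf, pos):
    """Read a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear means this was the last byte
            return result, pos
        shift += 7

def zigzag_decode(n):
    """Interpret an unsigned varint as a zigzag-encoded signed integer."""
    return (n >> 1) ^ -(n & 1)

def candidate_values(buf):
    """Yield (field_number, wire_type, guesses) for each top-level field."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint: int, sint, bool, enum all look alike
            raw, pos = read_varint(buf, pos)
            guesses = {"uint": raw, "sint": zigzag_decode(raw)}
        elif wire_type == 1:  # 64-bit: fixed64, sfixed64, or double
            chunk = buf[pos:pos + 8]
            pos += 8
            guesses = {
                "fixed64": struct.unpack("<Q", chunk)[0],
                "sfixed64": struct.unpack("<q", chunk)[0],
                "double": struct.unpack("<d", chunk)[0],
            }
        elif wire_type == 2:  # length-delimited: string, bytes, or a nested message
            length, pos = read_varint(buf, pos)
            chunk = buf[pos:pos + length]
            pos += length
            guesses = {"bytes": chunk}
            try:
                guesses["string"] = chunk.decode("utf-8")
            except UnicodeDecodeError:
                pass  # not valid UTF-8, so leave it as bytes only
        elif wire_type == 5:  # 32-bit: fixed32, sfixed32, or float
            chunk = buf[pos:pos + 4]
            pos += 4
            guesses = {
                "fixed32": struct.unpack("<I", chunk)[0],
                "sfixed32": struct.unpack("<i", chunk)[0],
                "float": struct.unpack("<f", chunk)[0],
            }
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, guesses

# Hypothetical sample: field 1 = varint 150, field 2 = "testing", field 3 = double 1.5
sample = (
    bytes.fromhex("089601")
    + bytes.fromhex("120774657374696e67")
    + b"\x19" + struct.pack("<d", 1.5)
)

for field_number, wire_type, guesses in candidate_values(sample):
    print(field_number, wire_type, guesses)
```

Every entry in `guesses` is a legitimate reading of the same bytes, which is why two tools can disagree without either being wrong.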
His post shows the differences between his original protobuf and how it was parsed with the three tools (again, read his post if you haven't). Yogesh was kind enough to share his test file (tester_pb) with me; I'm going to use it to show two more ways to decode unknown protobufs!
My favorite do-it-all data munging tool is CyberChef. It's super powerful, easy to use, and constantly getting updates. It also can decode protobufs! Here it is decoding tester_pb:
It's interesting that it decodes the protobuf in yet another way, different from the previous three tools. CyberChef didn't do anything incorrectly; it just made a different set of processing assumptions/guesses than the other tools.
What's also great about CyberChef is that it can apply additional transforms to the data before parsing the protobuf. Do you have a base64-encoded protobuf? Or one that is compressed, then base85-encoded, then ROT13'd? No problem, just chain the appropriate operation(s) in CyberChef before 'Protobuf Decode'.
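If you'd rather script that kind of unwrapping, the same idea can be sketched in Python with only the standard library. This is an illustration of the chaining concept, not a recreation of CyberChef's operations: the `unwrap` helper is hypothetical, and I'm assuming zlib for the compression and Python's `b85` alphabet for the base85 step, either of which may differ from what you actually run into.

```python
import base64
import codecs
import zlib

def unwrap(text):
    """Undo ROT13, then base85, then zlib, returning the raw protobuf bytes."""
    unrotated = codecs.decode(text, "rot_13")   # ROT13 is its own inverse
    compressed = base64.b85decode(unrotated)    # assumes Python's b85 alphabet
    return zlib.decompress(compressed)          # assumes zlib compression

# Build a wrapped sample in the opposite order: compress, base85, ROT13.
raw = bytes.fromhex("089601")  # hypothetical protobuf: field 1 = varint 150
wrapped = codecs.encode(
    base64.b85encode(zlib.compress(raw)).decode("ascii"), "rot_13"
)

recovered = unwrap(wrapped)
print(recovered.hex())
```

The layers come off in the reverse of the order they were applied, which is the same order you would stack the operations in a recipe.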
With this latest update, Unfurl can now parse protobufs as well! It's using slightly modified blackboxprotobuf code, so the "assumptions" it makes about the data before displaying it are the same. Here is the same tester_pb being parsed with Unfurl:
However, if you hover over a field, Unfurl tries to explain a bit about wire types and possible other data formats.
Unfurl isn't designed to take files as input, so you can't just drop a protobuf file in it. It supports reading protobufs three ways (right now):
- as a hex-string (example),
- as standard base64-encoded (example),
- and as URL-safe base64-encoded (example).
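Since you can't hand Unfurl a file, you need to turn the protobuf bytes into one of those string forms first. Here is a small sketch of how that might look in Python; the `to_text_forms` helper and the sample bytes are made up for illustration, and in practice you would read the bytes from your own file (for example with `open(path, "rb").read()`).

```python
import base64

def to_text_forms(raw):
    """Return the three text encodings of raw protobuf bytes listed above."""
    return {
        "hex": raw.hex(),
        "base64": base64.b64encode(raw).decode("ascii"),
        "urlsafe_base64": base64.urlsafe_b64encode(raw).decode("ascii"),
    }

# Hypothetical protobuf bytes: field 1 = varint 150
raw = bytes.fromhex("089601")

for name, text in to_text_forms(raw).items():
    print(name, text)
```

Standard and URL-safe base64 only differ when the output contains `+` or `/`, which the URL-safe alphabet replaces with `-` and `_`, so for many short protobufs the two strings are identical.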
I'm excited to have protobuf support in Unfurl. It means that every node that looks like a potential protobuf will get tested as one. I'm hoping to discover more interesting data in URLs this way. If you Unfurl a URL and find some protobuf data, please let me know!
Further Protobuf Reading
- Parsing unknown protobufs with python by Yogesh Khatri
- Just Call Me Buffy the Proto Slayer – An Initial Look into Protobuf Data in Mac and iOS Forensics by Sarah Edwards
- Usagestats on Android 10 (Q) by Yogesh Khatri
- Google Search & Personal Assistant data on android by Yogesh Khatri
- What did I Listen to on Spotify for iOS? by Phill Moore
- Protocol Buffers - Message Structure by Google