Unfurl v2022.11: Social Media Edition

This "social media edition" Unfurl release includes parsing Twitter sharing codes, timestamps from Mastodon and LinkedIn IDs, expanding Substack redirects, & more!

It's been a while, but a new Unfurl release is here! v2022.11 adds new features and has behind-the-scenes changes. With all the attention on Twitter lately, in this post I'm going to highlight changes related to social media websites:

  • Defining Twitter's sharing (s) parameter values (all 71 of them!)
  • Extracting timestamps from Mastodon IDs
  • Decoding multiple types of LinkedIn identifiers
  • Expanding Substack redirect links
  • Parsing common tracking/analytics query string parameters

Get it now, or read on for more details about the new features!

Twitter

Besides the headline-grabbing changes at Twitter, there have been some gradual, less obvious ones as well, including new query string parameters on shared tweet links. A few years ago (maybe 2018?) the s parameter appeared, and people (myself included) began speculating about its purpose. From experimentation, the s values 19, 20, and 21 seemed pretty clear: they meant a sharing source of Android, Twitter Web, and iOS, respectively (and Unfurl parsed them as such).

A few weeks ago, someone was poking at Twitter's JavaScript files and discovered an object with the mappings of 71 values for the sharing codes! They kindly shared this with me (thanks 2xyo!) and I added them to Unfurl.

The codes generally show the combination of device type (iOS, iPhone, Android, web browser) and method (email, WhatsApp, copy) used to share the tweet. I haven't personally seen the majority of these codes in use so I can't say they all are still valid, but then I also haven't shared a tweet from my iPad using LinkedIn (s=71)!

Here's my cleaned-up interpretation of what the s codes mean (links to the original .js files are in the GitHub issue if you're curious).

s Parameter Shared From
01 an Android using SMS
02 an Android using Email
03 an Android using Gmail
04 an Android using Facebook
05 an Android using WeChat
06 an Android using Line
07 an Android using FBMessenger
08 an Android using WhatsApp
09 an Android using Other
10 iOS using Messages or SMS
11 iOS using Email
12 iOS using Other
13 an Android using Download
14 iOS using Download
15 an Android using Hangouts
16 an Android using Twitter DM
17 Twitter Web using Email
18 Twitter Web using Download
19 an Android using Copy
20 Twitter Web using Copy
21 iOS using Copy
22 iOS using Snapchat
23 an Android using Snapchat
24 iOS using WhatsApp
25 iOS using FBMessenger
26 iOS using Facebook
27 iOS using Gmail
28 iOS using Telegram
29 iOS using Line
30 iOS using Viber
31 an Android using Slack
32 an Android using Kakao
33 an Android using Discord
34 an Android using Reddit
35 an Android using Telegram
36 an Android using Instagram
37 an Android using Daum
38 iOS using Instagram
39 iOS using LinkedIn
40 an Android using LinkedIn
41 Gryphon using Copy
42 an iPhone using SMS
43 an iPhone using Email
44 an iPhone using Other
45 an iPhone using Download
46 an iPhone using Copy
47 an iPhone using Snapchat
48 an iPhone using WhatsApp
49 an iPhone using FBMessenger
50 an iPhone using Facebook
51 an iPhone using Gmail
52 an iPhone using Telegram
53 an iPhone using Line
54 an iPhone using Viber
55 an iPhone using Instagram
56 an iPhone using LinkedIn
57 an iPad using SMS
58 an iPad using Email
59 an iPad using Other
60 an iPad using Download
61 an iPad using Copy
62 an iPad using Snapchat
63 an iPad using WhatsApp
64 an iPad using FBMessenger
65 an iPad using Facebook
66 an iPad using Gmail
67 an iPad using Telegram
68 an iPad using Line
69 an iPad using Viber
70 an iPad using Instagram
71 an iPad using LinkedIn
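
As a rough illustration of how a lookup like this can be applied to a shared link, here is a minimal Python sketch. It is not Unfurl's actual code: the function name, the dictionary name, and the example URL are all made up for illustration, and only a handful of the 71 codes are included.

```python
from urllib.parse import urlparse, parse_qs

# Illustrative subset of the table above; a full lookup would hold all 71 codes.
SHARE_SOURCES = {
    "19": "an Android using Copy",
    "20": "Twitter Web using Copy",
    "21": "iOS using Copy",
    "71": "an iPad using LinkedIn",
}


def describe_share_source(url):
    """Return a description of the s parameter, or None if the URL has no s."""
    query = parse_qs(urlparse(url).query)
    values = query.get("s")
    if not values:
        return None
    code = values[0]
    # Fall back to a generic note for codes missing from the subset above.
    return SHARE_SOURCES.get(code, f"unknown sharing code ({code})")


# Hypothetical tweet link, used only to exercise the function.
print(describe_share_source("https://twitter.com/example/status/1234567890?s=20&t=AbCdEf"))
```

Keeping the codes as strings avoids guessing whether single-digit codes appear in URLs with or without the leading zero shown in the table.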

In addition to the s parameter, we've seen t roll out gradually. I saw t on links shared from Android in late 2021 (s=19), then from Twitter Web (s=20) in early 2022, and finally from iOS (s=21) a bit later in 2022. I don't think anyone outside of Twitter knows exactly how the t parameter is constructed, but from my observations it appears to stay consistent per device for a period of time. I shared tweets via numerous methods in August from my phone and the t value was always the same. I ran similar tests again in November, and the t value was again the same across different sharing methods, but it differed from the August value. Maybe a software update or some other change on the device caused a change in the t "fingerprint"? With this in mind, I think seeing the same t value on multiple links suggests the same device was the sharing source. However, different t values could still be from the same device, just over a longer time period.
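
To make the "same t, same device" idea concrete, here is a small Python sketch that groups a set of shared links by their t value. The function name and the example URLs are hypothetical, and a shared t value is only a hint based on my observations, not proof.

```python
from collections import defaultdict
from urllib.parse import urlparse, parse_qs


def group_by_t(urls):
    """Group URLs by their t query parameter; URLs without t are skipped."""
    groups = defaultdict(list)
    for url in urls:
        t_values = parse_qs(urlparse(url).query).get("t")
        if t_values:
            groups[t_values[0]].append(url)
    return dict(groups)


# Hypothetical links; the t values here are invented.
links = [
    "https://twitter.com/example/status/1?s=21&t=AAAA",
    "https://twitter.com/example/status/2?s=24&t=AAAA",
    "https://twitter.com/example/status/3?s=20&t=BBBB",
]
for t_value, shared in group_by_t(links).items():
    print(t_value, len(shared))
```

Links that land in the same group may have been shared from the same device; links in different groups tell you nothing either way.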

Mastodon

This isn't actually a new parser (it's been in Unfurl for a few years), but I figured it would be worth mentioning with the increased interest in Mastodon. Mastodon is similar to Twitter in some respects; one of those is that the URLs of "toots" (Mastodon's version of tweets) contain an embedded timestamp. The long ID at the end of the URL is similar to a Twitter Snowflake:

https://infosec.exchange/web/@RyanDFIR/109306117687853105

Due to the federated nature of Mastodon, an instance could be running on a domain that Unfurl doesn't know about. To avoid false positives, I only have a short allowlist of domains to parse as Mastodon instances. If you know of any others that you'd like to be parsed, let me know.
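
For the curious, here is a minimal Python sketch of pulling the timestamp out of an ID like the one above. It is not Unfurl's code. It assumes the layout Mastodon uses for its generated IDs, where the upper bits hold milliseconds since the Unix epoch and the lowest 16 bits hold sequence data; the function name is made up.

```python
from datetime import datetime, timezone


def mastodon_id_to_datetime(toot_id):
    """Convert a Mastodon status ID to a UTC datetime.

    Assumption: the low 16 bits are sequence data and the remaining
    upper bits are milliseconds since the Unix epoch.
    """
    milliseconds = toot_id >> 16
    return datetime.fromtimestamp(milliseconds / 1000, tz=timezone.utc)


# The ID from the example URL above.
print(mastodon_id_to_datetime(109306117687853105))
```

Instances running other software, or importing posts from elsewhere, may not follow this layout, which is one more reason to be careful about which domains get parsed this way.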

LinkedIn

A while ago, I did some research and discovered how to dissect a TikTok identifier and extract a timestamp. Ollie Boyd figured out that IDs in LinkedIn post URLs had a similar makeup and made a tool to extract those timestamps. I've added this ability to Unfurl:

Unfurl extracting a timestamp from a LinkedIn Post ID

LinkedIn Messaging IDs

It turns out these LinkedIn IDs are used in more places than posts. One place they used to appear was in Messaging threads. When viewing messages on linkedin.com, the URL for each message thread (series of messages with a user) looked like https://www.linkedin.com/messaging/thread/6685980502161199104/. The ID at the end has an embedded timestamp that seemed to line up with when the first message in the thread was sent.
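
Here is a minimal Python sketch of the timestamp extraction, applied to the thread ID above. It is not Unfurl's code. It follows the publicly described technique for LinkedIn post IDs and assumes the first 41 bits of the ID's binary form are milliseconds since the Unix epoch; the function name is made up.

```python
from datetime import datetime, timezone


def linkedin_id_to_datetime(linkedin_id):
    """Convert a numeric LinkedIn ID to a UTC datetime.

    Assumption: the first 41 bits of the binary representation
    (without leading zeros) are milliseconds since the Unix epoch.
    """
    first_41_bits = bin(linkedin_id)[2:][:41]
    milliseconds = int(first_41_bits, 2)
    return datetime.fromtimestamp(milliseconds / 1000, tz=timezone.utc)


# The messaging thread ID from the example URL above.
print(linkedin_id_to_datetime(6685980502161199104))
```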

I've been referencing this in past tense because this isn't the case anymore; message threads now have URLs that look like https://www.linkedin.com/messaging/thread/2-ZTRkNzljZjgtOTRmNC00ZGJkLWJlYTktMDFjOWU4MTgxMjhjXzAxMA==/. These new IDs (which I'm calling "v2" from the 2- at the beginning) are base64-encoded UUIDs with a few characters appended. The above "v2" ID decodes to e4d79cf8-94f4-4dbd-bea9-01c9e818128c_010.
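
The decoding step is simple enough to sketch in a few lines of Python. This is an illustration rather than Unfurl's code; the function name is made up, and using the URL-safe base64 decoder is my assumption (it also accepts the standard alphabet used in the example).

```python
import base64


def decode_v2_thread_id(thread_id):
    """Decode a "v2" LinkedIn thread ID into (uuid_text, suffix)."""
    prefix, _, encoded = thread_id.partition("-")
    if prefix != "2":
        raise ValueError("not a v2 thread ID")
    decoded = base64.urlsafe_b64decode(encoded).decode("ascii")
    # The decoded text is a UUID, an underscore, and a few extra characters.
    uuid_text, _, suffix = decoded.partition("_")
    return uuid_text, suffix


# The "v2" ID from the example URL above.
print(decode_v2_thread_id("2-ZTRkNzljZjgtOTRmNC00ZGJkLWJlYTktMDFjOWU4MTgxMjhjXzAxMA=="))
```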

For those familiar with UUIDs, you may spot that this looks like a UUIDv4 (randomly generated). I went back through my LinkedIn message threads, all the way back to 2009 (wow, I've been on there a long time), and found something interesting. The older message threads had UUIDs that fit the form of UUIDv5 (name-based), while the newer ones fit UUIDv4. Judging from my messages, the switch from UUIDv5 to UUIDv4 happened around early May 2021 (I have a UUIDv5 message thread from 2021-04-26 and a UUIDv4 one from 2021-05-14).

Why am I going on about this? Neither version 4 nor version 5 UUIDs contain any embedded timestamp information (unlike version 1). However, for this particular use case, we can infer that a LinkedIn ID based on UUIDv5 corresponds to a message thread started before 2021-05, while one with a UUIDv4 was started after that. It's a small, rough bit of timing information, but that's what Unfurl is all about: trying to parse all those tiny pieces of knowledge, in the hope that when put together they might paint a clearer picture.
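
As a sketch of that inference, the Python below reads the UUID version from a decoded ID and turns it into a rough statement about timing. The function name and the wording of the results are made up, and the 2021-05 cutoff comes only from my own message threads, so treat the output as a hint.

```python
import uuid


def rough_thread_age(decoded_id):
    """Give a rough timing hint from the UUID version in a decoded v2 thread ID."""
    # Drop the trailing "_010"-style suffix, if present.
    uuid_text = decoded_id.split("_", 1)[0]
    version = uuid.UUID(uuid_text).version
    if version == 5:
        return "UUIDv5: thread likely started before 2021-05"
    if version == 4:
        return "UUIDv4: thread likely started around 2021-05 or later"
    return "no timing inference for this UUID version"


# The decoded "v2" ID from earlier in this section.
print(rough_thread_age("e4d79cf8-94f4-4dbd-bea9-01c9e818128c_010"))
```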

LinkedIn Profile IDs

A few months ago, Jack Crook showed how to decode LinkedIn Profile IDs and use their sequential nature to estimate profile creation time.

These "profile IDs" are different from the other IDs we discussed previously. I thought this technique was really interesting; I've added parsing the ID from base64 to Unfurl. I don't yet do anything with taking that number and estimating the creation time, but that sounds like a neat little project when I find the time.

Tracking URL Parameters

Many websites add URL parameters to links to help with user tracking and analytics. This is not a new practice; we've all seen a bunch of parameters tacked on the end of links. As investigators, we can sometimes use these parameters to infer more information: how a user clicked on a link, what site the link was on, or even when they clicked it.

These parameters are key/value pairs; for example, in utm_source=newsletter, the key is utm_source and the value is newsletter. The values often contain helpful clues (in the example, I'd guess that the link was from an email newsletter). Even in the cases when the values are opaque, we can glean some information from the key. For example, with fbclid=IwAR3Nuy7koMAB1KyVE1NqjcVGqAExIxVjQLSx-01U_e3LHKwSOzf2NsyP0UI, I have no idea (yet!) how to parse anything out of the IwAR3... value, but from the key I can infer the link was from Facebook.
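
Here is a minimal Python sketch of that key-based reasoning. It is not how Unfurl implements it: the function name, the hint text, and the example URL are made up, and the lookup only covers the two keys mentioned above plus other utm_ parameters.

```python
from urllib.parse import urlparse, parse_qsl

# Hints keyed on parameter name; only the keys discussed above are listed.
KEY_HINTS = {
    "utm_source": "names where the link was placed (e.g. a newsletter)",
    "fbclid": "link was likely clicked from Facebook",
}


def describe_tracking_params(url):
    """Return (key, value, hint) tuples for recognized tracking parameters."""
    notes = []
    for key, value in parse_qsl(urlparse(url).query):
        if key in KEY_HINTS:
            notes.append((key, value, KEY_HINTS[key]))
        elif key.startswith("utm_"):
            notes.append((key, value, "other utm_ tracking parameter"))
    return notes


# Hypothetical link combining the two examples from the text.
example = "https://example.com/post?utm_source=newsletter&fbclid=IwAR3Nuy7koMAB1KyVE1NqjcVGqAExIxVjQLSx-01U_e3LHKwSOzf2NsyP0UI"
for note in describe_tracking_params(example):
    print(note)
```

Even when a value stays opaque, as with fbclid, the key alone still produces a usable hint.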

I've added parsing of some of the most common of the tracking/analytics parameters to Unfurl. If you find one you'd like added, please let me know.

Substack

I've seen Substack increase in popularity as well. So far I only subscribe to "The Info Op" by the grugq, but there is a lot of other good content there too. I typically read it via email and noticed that all the links go through Substack redirects. I added expanding of Substack's redirect links to Unfurl; since many of the links are to Twitter/Mastodon and Substack adds utm_* tracking parameters, this enables those parsers to run as well, making some nice Unfurl graphs:

Unfurl parsing a Substack redirect link from an email

Get it!

Those are the major items in this Unfurl release. There are more changes that didn't make it into the blog post; check out the release notes for more. To get Unfurl with these latest updates, you can:

All features work in both the web UI and command line versions (unfurl_app.py & unfurl_cli.py).