A new Unfurl release is here! v2022.02 has been a long time coming and adds new features, including:

  • Parsing for Google Search's aqs parameter
  • Integration with MISP's "warning lists" to enrich domain names
  • Shortlink expansion from 3x more domains
  • Extraction of encoded timestamps from Twitter image filenames
  • Parsing for Brave Search

Get it now, or read on for more details about the new features!

Google Search's aqs Parameter

Google Search's Assisted Query Stats (or aqs) parameter isn't new (it's been around since 2012 from what I can tell). Unlike many other Google Search URL parameters, it isn't a secret - it's (mostly) documented in the Chromium source. Per a comment in the code, AQS' purpose is to log "impressions of all autocomplete matches shown at the query submission time."

So what does that really mean? Consider the following screenshot:

Searching for "unfurl url" in Chrome's Omnibox

In the screenshot, I have typed "unfurl url" into Chrome's "Omnibox" (the address/search box). Chrome is showing me four suggestions relevant to what I have entered:

Suggestion 1: Do a Google Search for the text I entered ("unfurl url")
Suggestions 2-4: Visit relevant pages from my local history - parts of the page title and URL that contain the words I entered are bolded in each suggestion

I ultimately selected the first suggestion and was sent to the Google Search Engine Results Page (SERP) for "unfurl url". The URL had an aqs parameter: aqs=chrome..69i57j69i60l3.7758j0j9. Parsing that URL with Unfurl yields:

Google SERP URL containing an aqs parameter, parsed with Unfurl

What Unfurl parses from the aqs parameter can give quite a bit of insight about what I did to get to that Google SERP:

  • I started on the "New Tab Page" in Chrome
  • I was shown four suggestions ("Autocomplete Matches")
  • The first (index 0) was a Google Search suggestion
  • The second, third, and fourth (indexes 1-3) were URLs from my local history that were related to the text I entered
  • I selected the first suggestion
  • It was 19.794 seconds from when I started typing to when I went to the SERP (this seems long; evidently taking a screenshot slowed me down)

The aqs parameter doesn't capture the content of the suggestions offered to me, but I think you'd agree that what it does log is pretty interesting. The mechanics of unpacking the aqs parameter would be too much for this post, but I may come back to it in a future post. You can also take a look through Unfurl's code for parsing it if you're curious.
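While the full mechanics are out of scope here, the rough shape of the unpacking can be sketched in a few lines. This is a simplified, hypothetical helper (not Unfurl's actual parser), assuming the format suggested by the Chromium source: dot-separated sections, autocomplete match types joined by "j", and an "lN" suffix indicating a match type repeated N times:

```python
def parse_aqs(aqs: str) -> dict:
    """Rough sketch of unpacking an aqs value like
    'chrome..69i57j69i60l3.7758j0j9'. Assumes four dot-separated
    sections: client, selected-match info, match types, and stats."""
    client, selected, matches, stats = aqs.split('.', 3)

    match_types = []
    for m in matches.split('j'):
        # An 'lN' suffix means the preceding match type appeared N times
        if 'l' in m:
            code, _, count = m.partition('l')
            match_types.extend([code] * int(count))
        else:
            match_types.append(m)

    return {
        'client': client,
        'selected': selected,
        'match_types': match_types,   # one entry per suggestion shown
        'stats': stats.split('j'),
    }
```

Running this on the example above yields four match types (one Google Search suggestion code plus a history code repeated three times), which lines up with the four suggestions in the screenshot.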

Enrich Domain Names using MISP Lists

One requested feature was to have some sort of annotation for domain names showing how popular they are. The open source MISP project has a curated set of lists of all sorts, including domain names:

GitHub - MISP/misp-warninglists: Warning lists to inform users of MISP about potential false-positives or other information in indicators

The purpose of these lists is not to label things "good" or "bad", but to add context (a domain is in the top 1K/5K/1M domains, an IP address belongs to GCP, a hash is of EICAR, etc.) that helps in deciding whether something is a false positive.

Unfurl uses the various domain lists to annotate a domain (see below). Check out the link above to misp-warninglists for the full list of their lists (there are a lot).
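The annotation itself is conceptually simple. Each misp-warninglists entry is a JSON file with (among other keys) a "name" and a "list" of values; a minimal sketch of checking a domain against one such file might look like this (annotate_domain is a hypothetical helper, not Unfurl's actual code):

```python
import json


def annotate_domain(domain: str, warninglist_path: str):
    """Return the warning list's name if the domain appears in it,
    else None. Assumes the misp-warninglists JSON layout: a dict
    with at least 'name' and 'list' keys."""
    with open(warninglist_path) as f:
        wl = json.load(f)

    # Compare case-insensitively; domain lists are not always lowercase
    entries = {d.lower() for d in wl['list']}
    if domain.lower() in entries:
        return wl['name']
    return None
```

In practice you would load all the domain-related lists once and check each domain node against every list, attaching any matching list names as context.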

One of those MISP "warninglists" is of domains used for link shortening. Unfurl already supported resolving some shortlinks, but it was a list I had manually pulled together and tested. Adding MISP's list to my own triples the number of shortlink domains Unfurl supports (from 27 to 81).

One other shortlink-related improvement was parsing LinkedIn "slinks", as Brian Krebs calls them:

How Phishers Are Slinking Their Links Into LinkedIn
If you received a link to LinkedIn.com via email, SMS or instant message, would you click it? Spammers, phishers and other ne’er-do-wells are hoping you will, because they’ve long taken advantage of a marketing feature on the business networking site…

Unfurl already resolved LinkedIn shortlinks with the format lnkd.in/xyz123. This involves extracting the shortcode (xyz123 in my fictitious example), creating the intermediary "slink" URL using that shortcode (https://www.linkedin.com/slink?code=xyz123), then finally determining the destination of that shortlink using the Location header. This Unfurl update adds the ability to expand "slinks" directly, in addition to the more typical lnkd.in shortlinks.
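The first of those steps, going from a lnkd.in shortlink to the intermediary "slink" URL, is just string manipulation. A minimal sketch (linkedin_slink_url is a hypothetical helper name, and the shortcode is the fictitious one from above):

```python
from urllib.parse import urlparse


def linkedin_slink_url(shortlink: str) -> str:
    """Build the intermediary LinkedIn 'slink' URL from a lnkd.in
    shortlink, per the steps described above."""
    # The shortcode is the path component of the lnkd.in URL
    code = urlparse(shortlink).path.lstrip('/')
    return f'https://www.linkedin.com/slink?code={code}'
```

The last step, following the slink to its destination, is done by requesting that URL without following redirects and reading the Location header of the response.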

A LinkedIn "slink" mentioned in Krebs' article, parsed with Unfurl

A note on contacting external resources: For many different reasons, I wanted to ensure that Unfurl reached out to external domains as little as possible, but some external resources would be really useful in Unfurl (as in the case of expanding shortlinks). My "middle ground" was to allow Unfurl to contact an allowlist of link shortener services to get the "expanded" link, but not contact the destination. If this doesn't work for you and you'd rather Unfurl not reach out to any external sites, there is a setting to disable all remote lookups.

Recognize and Parse Twitter Image Filenames

Unfurl has parsed the Twitter Snowflakes in tweets since its inception, but I only recently learned that the names Twitter gives to uploaded images also contain a Snowflake! It's mentioned by Dr. Neal Krawetz on his blog way back in 2014 (!):

Name Dropping - The Hacker Factor Blog

It appears different from the Snowflakes used in tweets: it's base64-encoded rather than shown as a decimal (EqmR8DPVEAAd5mv vs 1344769819887865856) and has three extra bytes at the end (I haven't been able to determine their purpose yet). But as with tweets, the timestamp embedded in the Snowflake is consistent with when the object (tweet or image) was created, which in the case of images means the time it was uploaded to Twitter.
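Decoding one of these filenames is straightforward with the standard library. The sketch below (decode_twitter_image_name is a hypothetical helper, not Unfurl's actual function) assumes the layout just described: a base64-encoded value whose first 8 bytes are a big-endian Snowflake, followed by 3 bytes of unknown purpose. The timestamp is the Snowflake's top bits plus Twitter's epoch (1288834974657 ms, i.e. 2010-11-04):

```python
import base64
import datetime

TWITTER_EPOCH_MS = 1288834974657  # 2010-11-04, start of the Snowflake epoch


def decode_twitter_image_name(name: str) -> dict:
    """Decode a Twitter image filename (e.g. 'EqmR8DPVEAAd5mv') into
    its embedded Snowflake and upload timestamp."""
    # Restore base64 padding, which the filename omits
    padded = name + '=' * (-len(name) % 4)
    raw = base64.urlsafe_b64decode(padded)

    # First 8 bytes are the Snowflake; the remaining 3 are unexplained
    snowflake = int.from_bytes(raw[:8], 'big')
    ts_ms = (snowflake >> 22) + TWITTER_EPOCH_MS

    return {
        'snowflake': snowflake,
        'timestamp': datetime.datetime.fromtimestamp(
            ts_ms / 1000, tz=datetime.timezone.utc),
        'extra_bytes': raw[8:],
    }
```

Feeding it the example filename above yields an upload timestamp in late December 2020, consistent with the tweet-style Snowflake shown next to it.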

If we encounter one of these images elsewhere, still with the name Twitter gave it, we have some hints about it: that it came from Twitter and when it was uploaded to Twitter. The odds of an image having a name that can be properly decoded as a Twitter Snowflake, with a reasonable embedded timestamp, and yet not being from Twitter are vanishingly small (unless someone deliberately renamed it).

In the example below, I saved an image from a tweet, then uploaded it to my site (without renaming it). Unfurl indicates that the image might have originally come from Twitter and shows the upload timestamp from the Snowflake.

Parse Brave Search URLs

Lastly, this update adds the ability for Unfurl to parse Brave Search URLs. The parser is relatively basic, at least compared to the Google Search parser (which is massive), but I think it's a good start.
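At its simplest, parsing a Brave Search URL means pulling apart the query string. A minimal sketch, assuming the common q and source parameters (parse_brave_search is a hypothetical helper, not Unfurl's actual parser):

```python
from urllib.parse import urlparse, parse_qs


def parse_brave_search(url: str) -> dict:
    """Extract the search query (and source, if present) from a
    Brave Search URL like https://search.brave.com/search?q=..."""
    qs = parse_qs(urlparse(url).query)
    return {
        'query': qs.get('q', [None])[0],
        'source': qs.get('source', [None])[0],
    }
```

parse_qs handles the URL decoding (plus signs, percent-escapes) so the extracted query reads as the user typed it.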

Brave Search URL parsed with Unfurl

Get it!

Those are the major items in this Unfurl release. There are more changes that didn't make it into the blog post; check out the release notes for the full list. To get Unfurl with these latest updates, grab the latest code from GitHub or update your Python package via pip.

All features work in both the web UI and command line versions (unfurl_app.py & unfurl_cli.py).