Solving Magnet Forensics CTF with Plaso, Timesketch, and Colab
The folks at Magnet Forensics had a conference recently, and as part of it they put together a digital forensics-themed Capture the Flag competition. I wasn't able to attend, but thankfully they released the CTF online a few days after the live competition ended.
It looked like a lot of fun and I wanted to take a crack at it using the open source tools we use/build here at Google: Plaso, Timesketch, and Colab/Python.
Forensics Preprocessing¶
I'm going to focus on how to find the answers to the CTF questions after all the processing has been done, so I'll only quickly summarize the processing steps that got me to the point where this walkthrough picks up.
I started off by processing the provided E01 image with a basic log2timeline command; nothing special added:
ryan:~$ log2timeline.py MUS2019-CTF.plaso MUS-CTF-19-DESKTOP-001.E01
Once that finished, I went to Timesketch, made a new sketch, and uploaded the MUS2019-CTF.plaso file I just made. The .plaso file is a database containing the results of my log2timeline run; Timesketch can read it and provide a nice, collaborative interface for reviewing and exploring that data.
Most of what I'm going to show you is done in Colab by accessing the Timesketch API in Python. You can do most of the steps in the Timesketch web interface directly, but I wanted to demonstrate how you can use Python, Colab, Timesketch, and Plaso together to work a case.
Timesketch & Colab Setup¶
First, you can run this notebook and play along instead of reading it here. The Timesketch GitHub repository has a Colab notebook (Timesketch and Colab) that walks through how to install the client, connect to a server, and explore a sketch using Colab. Please check it out if you want a more thorough explanation of the setup; I'm just going to show the commands you need to run to get it working:
# Install the Timesketch API client if you don't have it
!pip install timesketch-api-client
# Import some things we'll need
from timesketch_api_client import client
import pandas as pd
pd.options.display.max_colwidth = 60
Connect to Timesketch¶
By default, this will connect to the public demo Timesketch server, which David Cowen has graciously allowed to host a copy of the Plaso timeline of the MUS2019-CTF. Thanks Dave!
#@title Client Information { run: "auto"}
SERVER = 'https://demo.timesketch.org' #@param {type: "string"}
USER = 'demo' #@param {type: "string"}
PASSWORD = 'demo' #@param {type: "string"}
ts_client = client.TimesketchApi(SERVER, USER, PASSWORD)
Now that we've connected to the Timesketch server, we need to select the Sketch that has the CTF timeline.
First we'll list the available sketches, then print their names:
sketches = ts_client.list_sketches()
for i, sketch in enumerate(sketches):
    print('[{0:d}] {1:s}'.format(i, sketch.name))
Then we'll select the MUS2019-CTF sketch:
ctf = sketches[0]
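Hard-coding index 0 only works when the CTF sketch happens to be listed first. Here is a minimal sketch of picking it by name instead, assuming only that each sketch object exposes a name attribute (as the loop above relies on). The find_sketch helper and the FakeSketch stand-in are hypothetical names for illustration, not part of the Timesketch client, and the exact sketch name on your server may differ:

```python
from collections import namedtuple

# Stand-in for the objects returned by ts_client.list_sketches(); only the
# `name` attribute is assumed here.
FakeSketch = namedtuple('FakeSketch', ['name'])


def find_sketch(sketches, name):
    """Return the first sketch whose name matches exactly, else raise."""
    for sketch in sketches:
        if sketch.name == name:
            return sketch
    raise LookupError('No sketch named {0!r}'.format(name))


# Made-up sketch list for illustration.
demo_sketches = [FakeSketch('Some other case'), FakeSketch('MUS2019-CTF')]
ctf_demo = find_sketch(demo_sketches, 'MUS2019-CTF')
```

Against a live server, the equivalent call would be something like find_sketch(ts_client.list_sketches(), 'MUS2019-CTF').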
Lastly, I'll briefly explain a few parameters of the explore function, which we'll use heavily when answering questions.
sketch_name.explore() is how we send queries to Timesketch and get results back. query_string, return_fields, and as_pandas are the main parameters I'll be using:
- query_string: This is the same as the query you'd enter if you were using the Timesketch web interface. It's also the first positional parameter, so I'll leave off the query_string= keyword in my queries below for brevity.
- return_fields: Here we specify what fields we want back from Timesketch. This is where we can get really specific using Colab and only get the things we're interested in (which varies depending on what data types we're expecting back).
- as_pandas: This is just a boolean value which tells Timesketch to return a Pandas DataFrame, rather than a dictionary. We'll have this set to True in all our queries, since DataFrames are awesome!
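Since every query in this walkthrough repeats the same field names twice (once in return_fields, once when selecting columns), a small helper can keep the two in sync. This is only a sketch: explore_kwargs is a hypothetical name I made up, and it relies on nothing beyond the return_fields and as_pandas parameters described above.

```python
def explore_kwargs(fields):
    """Build the keyword arguments for explore() from a list of field names."""
    return {
        'return_fields': ','.join(fields),  # Timesketch wants a comma-separated string
        'as_pandas': True,                  # always ask for a DataFrame
    }


FIELDS = ['datetime', 'timestamp_desc', 'data_type', 'inode', 'filename']
kwargs = explore_kwargs(FIELDS)

# With a connected sketch object (here called ctf), usage would look like:
#   ts_results = ctf.explore('inode:102698', **kwargs)
#   ts_results[FIELDS]
```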
Okay, enough setup. Let's get to answering questions!
Questions¶
I grouped the questions from the 'Basic - Desktop' section into three categories: NTFS, TeamViewer, and Registry.
NTFS Questions¶
This first set of questions relates to aspects of NTFS: MFT entries, sequence numbers, USN entries, and VSNs.
As a little refresher, the 64-bit file reference address (or number) is made up of the MFT entry (48 bits) and sequence (16 bits) numbers. We often see this represented as something like 1234-2, with 1234 being the MFT entry number and 2 being the sequence number. Plaso calls the MFT entry number the inode, since that's the more generic term that applies across file systems.
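To make the bit layout concrete, here is a minimal sketch of splitting a 64-bit file reference into its two parts: the MFT entry number sits in the low 48 bits and the sequence number in the high 16 bits. The function names are my own, and the example value is built by hand rather than taken from the image:

```python
def split_file_reference(file_reference):
    """Split a 64-bit NTFS file reference into (mft_entry, sequence)."""
    mft_entry = file_reference & 0xFFFFFFFFFFFF  # low 48 bits
    sequence = file_reference >> 48              # high 16 bits
    return mft_entry, sequence


def format_file_reference(file_reference):
    """Render a file reference in the familiar 'entry-sequence' form."""
    return '{0:d}-{1:d}'.format(*split_file_reference(file_reference))


# Hand-built example: MFT entry 1234 with sequence number 2.
example = (2 << 48) | 1234
print(format_file_reference(example))
```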
Q: What is the name of the file associated with MFT entry number 102698?¶
Since Plaso parses out the MFT entry (or as it calls it, inode) into its own field, let's do a query for all records with that value:
ts_results = ctf.explore(
'inode:102698',
return_fields='datetime,timestamp_desc,data_type,inode,filename',
as_pandas=True)
ts_results[['datetime','timestamp_desc','data_type','inode','filename']]
Multiple results, as is expected since Plaso creates multiple records for different types of timestamps, but they all point to the same filename: /Users/Administrator/Downloads/TeamViewer_Setup.exe
Q: What is the file name that represented MFT entry 60725 with a sequence number of 10?¶
The quick way to answer this is to just search for the MFT entry number (60725) and look for references to sequence number 10 in the message field:
ts_results = ctf.explore(
'60725',
return_fields='datetime,timestamp_desc,data_type,filename,message',
as_pandas=True)
ts_results[['datetime','timestamp_desc','data_type','filename','message']]
That's a bunch of rows, so let's filter it down by searching for messages that contain '60725-10':
ts_results[ts_results.message.str.contains('60725-10')]
That filename is really long and cut off; let's just select that field, then deduplicate using set():
set(ts_results[ts_results.message.str.contains('60725-10')].filename)
Another way to solve this is to query for the file reference number directly. That's not as easy as it sounds, since Plaso stores it in the hex form (I'm working on fixing that). We can work with that though!
Let's do the same query as above, but add the file_reference field:
ts_results = ctf.explore(
'60725',
return_fields='datetime,timestamp_desc,data_type,file_reference,filename,message',
as_pandas=True)
ts_results[['datetime','timestamp_desc','data_type','file_reference','filename','message']]
The file_reference value is not the format we want, since it's hard to tell what the sequence number is. We can convert it to a more useful form though:
# Drop any rows with NaN, since they aren't what we're looking for and will
# break the below function.
ts_results = ts_results.dropna()
pd.options.display.max_colwidth = 110
# Replace the file_reference hex value with the human-readable MFT-Seq version.
# This is basically what Plaso does to display the result in the 'message'
# string we searched for.
ts_results['file_reference'] = ts_results['file_reference'].map(
lambda x: '{0:d}-{1:d}'.format(int(x) & 0xffffffffffff, int(x) >> 48))
ts_results[['datetime','timestamp_desc','data_type','file_reference','filename']]
There. Now we have the file_reference number in an easier-to-read format, and the history of filenames that MFT entry 60725 has had! It's easy to look for the entry with a sequence number of 10 and get our answer.
Q: Which file name represents the USN record where the USN number is 546416480?¶
Like other questions, the quick, generic way to answer is to just search for the unique detail; in this case, search in Timesketch for '546416480'. I'll show the more targeted way below, but it's pretty simple:
ts_results = ctf.explore(
'update_sequence_number:546416480',
return_fields='datetime,timestamp_desc,data_type,update_sequence_number,filename',
as_pandas=True)
ts_results[['datetime','timestamp_desc','data_type','update_sequence_number','filename']]
Q: What is the MFT sequence number associated with the file "\Users\Administrator\Desktop\FTK_Imager_Lite_3.1.1\FTK Imager.exe"?¶
We'll handle this question like other ones involving the file reference address, except in this case we first need to find the MFT entry number (or inode) from the file name. Searching for the whole file path in Timesketch is problematic (slashes among other things), so let's search for the file name and then verify the path is right:
ts_results = ctf.explore(
'FTK Imager.exe',
return_fields='datetime,timestamp_desc,data_type,inode,message',
as_pandas=True)
ts_results[['datetime','timestamp_desc','data_type','inode','message']]
In the second row of the results, we can find the correct path we're looking for in the message and see that the corresponding inode is 99916. We could do another search, similar to how we answered other questions... or we could just look down a few rows for a USN entry that shows: "FTK Imager.exe File reference: 99916-4". There's the answer!
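If you'd rather not eyeball the rows, the entry and sequence numbers can be pulled out of the message text with a regular expression. This is a sketch that assumes the message contains the "File reference: entry-sequence" fragment quoted above; the rest of the message format is not something I'm relying on, and parse_file_reference is a hypothetical helper name:

```python
import re

# Matches the "File reference: 99916-4" style fragment seen in USN messages.
FILE_REFERENCE_RE = re.compile(r'File reference: (\d+)-(\d+)')


def parse_file_reference(message):
    """Return (mft_entry, sequence) from a message, or None if absent."""
    match = FILE_REFERENCE_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = 'FTK Imager.exe File reference: 99916-4'
entry, sequence = parse_file_reference(sample)
```

With a DataFrame of results, the same function could be applied to the message column with map().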
Q: What is the Volume Serial Number of the Desktop's OS volume?¶
I know the VSN can be found in multiple places, but the first one I thought of was as part of a Prefetch file, so let's do it that way.
I'll search for all 'volume creation' Prefetch records, since I don't really care about which particular one, beyond that it's from the OS drive.
ts_results = ctf.explore(
'data_type:"windows:volume:creation"',
return_fields='datetime,timestamp_desc,data_type,device_path,hostname,serial_number,message',
as_pandas=True)
pd.options.display.max_colwidth = 70
ts_results[['datetime','timestamp_desc','data_type','device_path','hostname','serial_number','message']]
You can see the VSN in a readable format at the end of the device_path or in the message string. I'm only seeing one value here, so we don't need to determine which drive was the OS one. If we did, I'd look for some system processes that need to run from the OS drive to get the right VSN.
That's good enough for the question, but let's also convert the serial_number field from an integer to the hex format the answer wants, just to be sure:
'{0:08X}'.format(3438183451)
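Volume serial numbers are often written as two dash-separated groups of four hex digits (for example, in the output of the Windows vol command). A small sketch of producing that form from the integer, in case the answer is expected with the dash; format_vsn is a name I made up:

```python
def format_vsn(serial_number):
    """Format a 32-bit volume serial number as 'XXXX-XXXX'."""
    hex_value = '{0:08X}'.format(serial_number & 0xFFFFFFFF)
    return '{0:s}-{1:s}'.format(hex_value[:4], hex_value[4:])


print(format_vsn(3438183451))
```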
TeamViewer Questions¶
The next group of questions involved TeamViewer, a common remote desktop program.
Q: Which user installed Team Viewer?¶
We can start searching very broadly, then focus in on anything that stands out. Let's just search everything we have for "TeamViewer":
ts_results = ctf.explore(
'TeamViewer',
return_fields='datetime,timestamp_desc,data_type,message',
as_pandas=True)
ts_results[['datetime','timestamp_desc','data_type','message']]
That returned a lot of results (600+). We could page through them all, but why not see if there are any interesting clusters first? That sounds like a job for a visualization!
You can do this multiple ways; I'll do it in Python in a second, but the explanation is a bit complicated. The easier way is to do the search in Timesketch, then go to Charts > Histogram:
And here's how you'd do something similar in Python:
ts_results = ts_results.set_index('datetime')
ts_results['2018':].message.resample('D').count().plot()
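The resample('D').count() call above is just bucketing events by calendar day and counting them. If the pandas version feels opaque, the same idea can be sketched with the standard library. The timestamps below are made up for illustration and are not from the CTF timeline; count_per_day and busiest_days are hypothetical helper names:

```python
from collections import Counter
from datetime import datetime


def count_per_day(timestamps):
    """Count how many ISO 8601 timestamps fall on each calendar day."""
    days = (datetime.fromisoformat(ts).date() for ts in timestamps)
    return Counter(days)


def busiest_days(timestamps, top=3):
    """Return the `top` days with the most events as (date, count) pairs."""
    return count_per_day(timestamps).most_common(top)


# Made-up timestamps, for illustration only.
sample = [
    '2019-02-25T20:39:10',
    '2019-02-25T20:41:55',
    '2019-02-25T20:42:03',
    '2018-07-01T12:00:00',
]
for day, count in busiest_days(sample):
    print(day.isoformat(), count)
```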
Okay, so from the graphs it looks like we have a good cluster at the end of February; let's look closer. I'll slice the results to only show after 2019-02-20:
ts_results = ctf.explore(
'TeamViewer',
return_fields='datetime,timestamp_desc,data_type,filename,message',
as_pandas=True)
ts_results = ts_results.set_index('datetime')
ts_results['2019-02-20':][['timestamp_desc','data_type','filename','message']]
So from this, in a short interval starting 2019-02-25T20:39, we can see:
- a Google search for "teamviewer"
- a visit in Chrome to teamviewer.com,
- then teamviewer.com/en-us/teamviewer-automatic-download/,
- and lastly a bunch of TeamViewer related files being created.
The web browsing and the file creation both happened under the Administrator account (per the path in the filename field), so that's our answer.
Q: How Many Times¶
At least how many times did the teamviewer_desktop.exe run?
Prefetch is a great artifact for "how many times did something run"-type questions, so let's look for Prefetch execution entries for the program in question:
ts_results = ctf.explore(
'data_type:"windows:prefetch:execution" AND teamviewer_desktop.exe',
return_fields='datetime,timestamp_desc,data_type,executable,run_count,message',
as_pandas=True)
ts_results[['datetime','timestamp_desc','data_type','executable','run_count','message']]
Q: Execute Where¶
After looking at the TEAMVIEWER_DESKTOP.EXE prefetch file, which path was the executable in at the time of execution?
We did all the work for this question with the previous query (the answer is in the message string), but we can explicitly query for the path:
ts_results = ctf.explore(
'data_type:"windows:prefetch:execution" AND teamviewer_desktop.exe',
return_fields='datetime,timestamp_desc,data_type,executable,run_count,path',
as_pandas=True)
ts_results[['datetime','timestamp_desc','data_type','executable','run_count','path']]
Registry Questions¶
This last set of questions can be answered using the Windows Registry (and one from event logs).
Lots of registry questions depend on the Current Control Set, so let's verify what it is:
# Escaping fun: We need to escape the backslashes in the key_path once for Timesketch and once for Python, so we'll have triple backslashes (\\\)
ts_results = ctf.explore(
'data_type:"windows:registry:key_value" AND key_path:"HKEY_LOCAL_MACHINE\\\System\\\Select"',
return_fields='datetime,timestamp_desc,data_type,message',
as_pandas=True)
ts_results[['datetime','timestamp_desc','data_type','message']]
From the message, the Current control set is 1.
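A note on the escaping in that query: the string Timesketch actually receives has each backslash in the key path doubled. The triple-backslash literal works because Python leaves an unrecognized escape such as \S untouched, but recent Python versions warn about that. A sketch of building the same query string from a raw string instead (escape_key_path and key_path_query are hypothetical helper names):

```python
def escape_key_path(key_path):
    """Double every backslash so the query parser sees a literal backslash."""
    return key_path.replace('\\', '\\\\')


def key_path_query(key_path):
    """Build a key_path:"..." query term from an unescaped registry path."""
    return 'key_path:"{0:s}"'.format(escape_key_path(key_path))


# A raw string lets us write the registry path exactly as it appears.
query = key_path_query(r'HKEY_LOCAL_MACHINE\System\Select')
print(query)
```

The resulting term can be combined with others as before, for example 'data_type:"windows:registry:key_value" AND ' + query.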
Q: What was the timezone offset at the time of imaging? and What is the timezone of the Desktop¶
I'm combining these, since the answer is in the same query:
ts_results = ctf.explore(
'data_type:"windows:registry:key_value" AND key_path:"HKEY_LOCAL_MACHINE\\\System\\\ControlSet001\\\Control\\\TimeZoneInformation"',
return_fields='datetime,timestamp_desc,data_type,message',
as_pandas=True)
ts_results[['datetime','timestamp_desc','data_type','message']]
The message is really long; let's pull it out:
set(ts_results.message)
The name of the Timezone is in the message string, as is the ActiveTimeBias, which we can use to get the UTC offset:
# The ActiveTimeBias is in minutes, so divide by -60 (I don't know why it's stored negative):
420 / -60
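On the sign: as I understand the Windows documentation, the bias is defined so that UTC = local time + bias, which is why a zone behind UTC has a positive bias and the division uses -60. Here is a sketch of the conversion that also handles offsets that aren't whole hours; utc_offset_from_bias is a name I made up, and it assumes the bias has already been read as a signed number of minutes:

```python
def utc_offset_from_bias(bias_minutes):
    """Convert a Windows time zone bias (in minutes) to a 'UTC+HH:MM' string.

    Windows defines UTC = local time + bias, so the UTC offset is the
    negated bias.
    """
    offset_minutes = -bias_minutes
    sign = '+' if offset_minutes >= 0 else '-'
    hours, minutes = divmod(abs(offset_minutes), 60)
    return 'UTC{0:s}{1:02d}:{2:02d}'.format(sign, hours, minutes)


print(utc_offset_from_bias(420))
```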
Q: When was the Windows OS installed?¶
Plaso actually parses this out as its own data_type, so querying for it is easy:
ts_results = ctf.explore(
'data_type:"windows:registry:installation"',
return_fields='datetime,timestamp_desc,data_type,message',
as_pandas=True)
ts_results[['datetime','timestamp_desc','data_type','message']]
Q: What is the IP address of the Desktop?¶
We already confirmed the Control Set is 001, so let's query for the registry key under that control set that holds the Interface information:
ts_results = ctf.explore(
'key_path:"System\\\ControlSet001\\\Services\\\Tcpip\\\Parameters\\\Interfaces"',
return_fields='datetime,timestamp_desc,data_type,message',
as_pandas=True)
ts_results[['datetime','timestamp_desc','data_type','message']]
There are a few entries, but only the last one has what we want. Reading through it (or using Ctrl+F) we can find the 'IPAddress' is 64.44.141.76.
set(ts_results.message)
Q: Which User Shutdown Windows on February 25th 2019?¶
Event logs seem like a good place to look for this answer, since a shutdown generates a 1074 event in the System event log. From the question, we have a fairly narrow timeframe, so let's slice the results down to that after we do our query:
ts_results = ctf.explore(
'data_type:"windows:evtx:record" AND filename:"System.evtx" AND 1074',
return_fields='datetime,timestamp_desc,data_type,username,message',
as_pandas=True)
ts_results = ts_results.set_index('datetime')
ts_results['2019-02-25':'2019-02-26'][['timestamp_desc','data_type','username','message']]
Wrap Up¶
That's it! Thanks for reading and I hope you found this useful. This walkthrough covered most of the questions from the 'Basic - Desktop' category; I may do other sections as well if there is time/interest. If you found this useful, check out Kristinn's demonstration of Timesketch and Colab.
You can get the free, open source tools I used to solve the CTF:
- Plaso / Log2Timeline: https://github.com/log2timeline/plaso
- Timesketch: https://github.com/google/timesketch
- Colab(oratory): https://colab.sandbox.google.com/notebooks/welcome.ipynb