If Apple did computer forensics…

This is too funny…

“The writeblocker, iBlock, would only image at 1 mb/s and would have a non-replacable internal battery with a 12-month lifespan. When everyone who was going to buy one had done so, they’d release an iBlock ’s’ – this writes at a speed approaching the commercial standard but still has the battery problem. Apple dismiss this as a ‘false negativity point by uncreative people’ and sue anyone publicly criticising it.”

You can find the full post here.

Outlook PST (Personal Folder) File Format Now Available From Microsoft

Microsoft has decided to publish a copy of the Outlook Personal Folder File format (.PST file).  You can view the specification at: http://msdn.microsoft.com/en-us/library/ff385210.aspx
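
With the specification public, simple triage checks can be written against it directly. Below is a minimal sketch of my own (not from Microsoft). It assumes the header layout as I read it in the spec: the bytes "!BDN" at offset 0, the client magic "SM" at offset 8, and a little-endian 16-bit format version (wVer) at offset 10. It treats 14 or 15 as an ANSI file and 23 or higher as a Unicode file. Check these values against the published specification before relying on them.

```python
import struct

PST_MAGIC = b"!BDN"       # dwMagic at offset 0 (assumed from the spec)
PST_CLIENT_MAGIC = b"SM"  # wMagicClient at offset 8 (assumed from the spec)


def pst_flavor(header: bytes) -> str:
    """Classify the first 12 bytes of a file as an ANSI or Unicode PST.

    Returns "ansi", "unicode", or "unknown". Only the magic values and
    the wVer field are checked; header CRCs are NOT validated.
    """
    if len(header) < 12:
        return "unknown"
    if header[0:4] != PST_MAGIC or header[8:10] != PST_CLIENT_MAGIC:
        return "unknown"
    (version,) = struct.unpack_from("<H", header, 10)  # wVer, little-endian
    if version in (14, 15):
        return "ansi"
    if version >= 23:
        return "unicode"
    return "unknown"
```

For example, reading the first 12 bytes of a (hypothetical) backup.pst opened in binary mode and passing them to pst_flavor tells you which format variant the rest of the specification applies to.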

Site Updates

Recently I’ve been making some updates to this site.  Here is a brief list:

New Theme

If you’re looking at the site right now, you’ve probably noticed that the theme has changed.  I had been using Andreas Viklund’s 1024px for a few years, and decided it was time for something new.  I ended up drinking the kool-aid and went with the Thesis theme.  Thesis is a (very) configurable theme framework.  If you run a WordPress blog, you may want to check out Thesis.

Copyright Notice

The content on this site is now licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.  If you’re interested in using the content under a different license, please contact me.

Google Reader Shared Links

You can now see all of the items that I’ve shared in Google Reader, right from this blog.  To find the page, go to Resources -> Google Reader Shared Links at the top of the blog.  If you want to subscribe to the feed for the shared links, visit http://www.google.com/reader/shared/codeforensics.

Teaching Schedule

I regularly teach the Security 508 (Computer Forensics, Investigation, and Response), Security 408 (Computer Forensic Essentials), Security 610 (Reverse Engineering Malware), and Security 553 (Metasploit for Penetration Testers) courses for SANS.  If you’re interested in taking a class from me, you can find my teaching schedule by clicking on About -> Teaching Schedule at the top of the blog.

Twitter

I’ve been using Twitter for several months (and according to some, I’m a Twitter-addict :p).  You can follow (and contact) me on Twitter using @mikemurr.

Microsoft to Release the .PST File Format

@MicrosoftPress tweeted this earlier today: ‘Paul Lorimer, Group Manager, MS Office Interoperability: “…we will be releasing documentation for the .pst file format.” http://ow.ly/wHqE’.

It looks like the specification for the Outlook Personal Folder (.PST) file format will be released under Microsoft’s Open Specification Promise (OSP).  The original blog post is “Roadmap for Outlook Personal Folders (.pst) Documentation” (at the Microsoft Interoperability blog).

Since email can easily play a vital role during an investigation, releasing this specification can give investigators, examiners, analysts, and digital forensic tool developers a better understanding of the evidence at hand.

Computer Forensic Exam of Najibullah Zazi’s Laptop

Earlier today, Jonathan Abolins tweeted about a US DOJ memorandum on detainee Najibullah Zazi.  The memorandum is about the motion the US government filed for a permanent order of detention for Zazi.  Part of the evidence that supports the order of detention comes from a forensic exam of Zazi’s laptop.  I found a few pieces of evidence quite interesting from a digital forensics perspective.

  • Zazi is associated with three separate email accounts.  The memorandum states that one account is “directly subscribed to Zazi”, and “all three accounts contain slight variations of the same password.”
    • While not the best password policy, it could help with attribution.
  • JPEG images of handwritten notes about explosives (manufacture, handling, etc.) were found as email attachments.
    • Keyword searches would probably fail to find this evidence, since the notes are JPEG images.  Are there any digital forensics tools (or plugins/scripts) that support keyword searching of images? (perhaps by OCR?)
  • Browser artifacts were uncovered that suggested Zazi searched for hydrochloric acid.  Additionally, a site for “Lab Safety for Hydrochloric Acid” was bookmarked in two different web browsers.
    • The bookmarking could be useful in demonstrating intent, as users often bookmark sites they wish to remember, and/or return to.  The same bookmark in two different browsers makes this action less likely to be “accidental”.
  • Some of the browser artifacts suggested that Zazi “searched a beauty salon website for hydrocide and peroxide”.  Later, surveillance videos and receipts were used to show that Zazi purchased hydrogen peroxide products from a beauty supply store.  Other persons associated with Zazi also purchased hydrogen peroxide and acetone from three other beauty supply stores.
    • Digital evidence is just one type of evidence.  Here digital evidence (browser artifacts) is combined with physical evidence (surveillance video and receipts) to make the arguments more persuasive.
  • After executing another search warrant (at a later date), Zazi’s laptop was seized again.  The difference is that in the latter seizure, the hard drive was not recovered (it had been removed).
    • This could be considered a rudimentary form of anti-forensics.  You can’t analyze ones and zeros if they aren’t there.

You can view the memorandum here.

The Meaning of LEAK Records

I’ve been pretty quiet lately, largely due to spending time developing LibForensics.  Currently I’m adding support to read Microsoft Windows Internet cache containers (a.k.a. index.dat files).  If you’ve ever dealt with index.dat files before, you’ve probably encountered the mysterious “LEAK” record.  The purpose of this blog post is to explain one way that these records are created.

Background Information

In order to understand how LEAK records are created, it is useful to understand the Microsoft Windows Internet API.  The Microsoft Windows Internet API (WinInet) provides applications with the ability to interact with networked resources, usually over FTP and HTTP.  There are several functions in the WinInet API, including functions to provide caching.  Applications can use the WinInet API caching functions to store local (temporary) copies of files retrieved from the network.  The primary reason to use caching is to speed up future network requests by reading a local copy of a file instead of retrieving it from the network.

A cached file is called a “Temporary Internet File” (TIF).  The WinInet API manages TIFs using cache containers, which are files named index.dat.  There are several WinInet API functions to work with entries in cache containers, including creating a URL cache entry, reading locally cached files, and deleting URL cache entries.  The WinInet API also provides a cache scavenger, which periodically runs and cleans up entries that are marked for deletion.

The cache containers (index.dat files) are almost always associated with Microsoft Internet Explorer.  This is likely because Internet Explorer is one of the most commonly used applications that uses the WinInet API caching capabilities.  However, since the WinInet API is available to any end-user application, any application can use the caching capabilities.  This can pose an issue when attributing a specific entry in a cache container to the program that generated it.

Internally a cache container is composed of a header, followed by one or more records.  There are several different types of records, including URL records (which describe cached URLs), and REDR records (for describing redirects).  A cached URL can have an associated TIF, which is described in the appropriate URL record.
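
To make the “header followed by records” layout concrete, here is a minimal sketch that walks a copy of an index.dat file and counts records by their four-byte signature. It rests on assumptions that are not taken from the WinInet documentation: that the file begins with the text “Client UrlCache MMF Ver”, that records start on 128-byte (0x80) boundaries, that the 32-bit value after the signature is the record’s length in blocks, and that signatures include “URL ”, “REDR”, “LEAK”, and “HASH”. A real parser would follow the HASH tables rather than scanning blindly.

```python
import struct
from collections import Counter

BLOCK_SIZE = 0x80  # assumed record alignment/granularity
HEADER_SIGNATURE = b"Client UrlCache MMF Ver"
RECORD_SIGNATURES = {b"URL ", b"REDR", b"LEAK", b"HASH"}  # assumed set


def count_records(data: bytes) -> Counter:
    """Count record signatures found on 128-byte boundaries."""
    if not data.startswith(HEADER_SIGNATURE):
        raise ValueError("not an index.dat cache container")
    counts = Counter()
    offset = BLOCK_SIZE  # skip the first block, which holds the header text
    while offset + 8 <= len(data):
        sig = data[offset:offset + 4]
        if sig in RECORD_SIGNATURES:
            counts[sig.decode("ascii")] += 1
            (num_blocks,) = struct.unpack_from("<I", data, offset + 4)
            # Jump over the whole record; fall back to a single block if
            # the count looks bogus (zero or running past the end).
            if 0 < num_blocks and offset + num_blocks * BLOCK_SIZE <= len(data):
                offset += num_blocks * BLOCK_SIZE
                continue
        offset += BLOCK_SIZE
    return counts
```

Skipping whole records (rather than checking every block) avoids matching text that happens to sit on a block boundary inside a longer record.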

LEAK Records

Now that we’ve reviewed index.dat files, we’ll see how to create LEAK records.  However, before going further, I want to emphasize that this is just one approach to creating LEAK records.  LEAK records may have uses outside of what is described in this post.

For the impatient: a LEAK record can be generated by attempting to delete a URL cache entry (via DeleteUrlCacheEntry) when the associated temporary internet file (TIF) cannot be deleted.

The last paragraph of the MSDN documentation on the cache scavenger discusses what happens when a cache entry is marked for deletion:

The cache scavenger is shared by multiple processes. When one application deletes a cache entry from its process space by calling DeleteUrlCacheEntry, it is normally deleted on the next cycle of the scavenger. However, when the item that is marked for deletion is in use by another process, the cache entry is marked for deletion by the scavenger, but not deleted until the second process releases it.

To summarize, when the cache scavenger runs and encounters an item that is marked for deletion but is in use by another process, the cache entry is not actually deleted until that process releases it.
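
The rule in the MSDN quote can be expressed as a small state model. This is a toy of my own, not WinInet’s implementation: the class ToyCache and its methods are hypothetical, and the model only captures the documented behaviour that a deleted entry is removed on the next scavenger cycle unless another process is still using it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCache:
    """Toy model of the scavenger rule quoted from MSDN (not WinInet)."""

    entries: set = field(default_factory=set)  # live cache entries (URLs)
    marked: set = field(default_factory=set)   # entries marked for deletion
    in_use: set = field(default_factory=set)   # entries held by another process

    def delete_entry(self, url: str) -> None:
        """Rough analogue of DeleteUrlCacheEntry: only marks the entry."""
        if url in self.entries:
            self.marked.add(url)

    def release(self, url: str) -> None:
        """The other process stops using the entry."""
        self.in_use.discard(url)

    def scavenge(self) -> None:
        """One scavenger cycle: remove marked entries nobody is using."""
        for url in list(self.marked):
            if url not in self.in_use:
                self.entries.discard(url)
                self.marked.discard(url)
```

The model says nothing about what the on-disk record looks like while an entry is stuck in the marked state; that is what the tests in this post examine.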

Another reference to LEAK records can be found at Understanding index.dat Files.  The author describes LEAK as a “Microsoft term for an error”.

Combining these two ideas (deleting a cache entry when it is in use, and LEAK as a term for error), we can come up with a theory: a LEAK record is generated when an error occurs during the deletion of a URL cache entry.  If you’ve ever taken a SANS Security 508 course (Computer Forensics, Investigation, and Response) from me, you’ll probably remember my approach to examinations (and investigations in general): theory (hypothesis) and test.

In order to test the theory, we need to create a series of statements and associated outcomes that would be true if our theory is correct.

At this stage our theory is fairly generic.  To make the theory testable, we need to make it more specific.  This means we will need to determine a series of actions that will result in the generation of a LEAK record.  The first place to look is at the MSDN documentation on the WinInet API.  To save time, rather than walking through all the WinInet API functions, I’ll just reference the relevant ones: CreateUrlCacheEntry, CommitUrlCacheEntry, RetrieveUrlCacheEntryStream, and DeleteUrlCacheEntry.

Looking at this list, there are a few possible ways to generate an error while deleting a URL cache entry:

  1. Create/Commit a URL cache entry, and lock the entry using RetrieveUrlCacheEntryStream.
  2. Create/Commit a URL cache entry and corresponding TIF, and open the TIF.
  3. Create/Commit a URL cache entry and corresponding TIF, and make the TIF read-only.

The general approach is to create (and commit) a URL cache entry, then create a condition that would make deleting the entry fail.

Let’s solidify these into testable theories as “if-then” statements (logical implications) with function calls:

  • IF we create a URL cache entry using CreateUrlCacheEntry and CommitUrlCacheEntry, lock the entry using RetrieveUrlCacheEntryStream, and call DeleteUrlCacheEntry
    • THEN we will see a LEAK record.
  • IF we create a URL cache entry and corresponding TIF using CreateUrlCacheEntry and CommitUrlCacheEntry, open the TIF using open(), and call DeleteUrlCacheEntry
    • THEN we will see a LEAK record.
  • IF we create a URL cache entry and corresponding TIF using CreateUrlCacheEntry and CommitUrlCacheEntry, make the TIF read-only using chmod, and call DeleteUrlCacheEntry
    • THEN we will see a LEAK record.

Theory Testing

The next step is to test our theories.  It is relatively straightforward to translate the if-then statements into code.  In the “Sample Code” section I’ve included a link to a zip file that contains (amongst other things) three Python files, test_leak1.py, test_leak2.py, and test_leak3.py.  Each file implements one of the if-then statements.

Here is the output from running test_leak1.py (in a Windows 2003 virtual machine):

C:\Tools\Python31>python z:\inet_cache\test_leak1.py
Creating URL: http://rand_286715790
Using file: b'C:\\Documents and Settings\\Administrator\\Local Settings\\Temporary Internet Files\\Content.IE5\\81QNCLMB\\CAUJ6C3U'
Locking URL: http://rand_286715790
Deleting URL: http://rand_286715790
ERROR: DeleteUrlCacheEntryA failed with error 0x20: The process cannot access the file because it is being used by another process.

The output from test_leak1.py indicates that there was an error during the call to DeleteUrlCacheEntry.  After copying the associated index.dat file to a Linux system, we can find a reference to http://rand_286715790:

xxd -g 1 -u index.dat.leak1
000ef00: 55 52 4C 20 02 00 00 00 00 00 00 00 00 00 00 00  URL ............
000ef10: 50 A1 F4 DB 08 32 CA 01 00 00 00 00 00 00 00 00  P....2..........
000ef20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000ef30: 60 00 00 00 68 00 00 00 02 00 10 10 80 00 00 00  `...h...........
000ef40: 01 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00  ..@.............
000ef50: 2A 3B B3 5A 02 00 00 00 01 00 00 00 2A 3B B3 5A  *;.Z........*;.Z
000ef60: 00 00 00 00 EF BE AD DE 68 74 74 70 3A 2F 2F 72  ........http://r
000ef70: 61 6E 64 5F 32 38 36 37 31 35 37 39 30 00 AD DE  and_286715790...
000ef80: 43 41 55 4A 36 43 33 55 00 BE AD DE EF BE AD DE  CAUJ6C3U........

The record is still marked as “URL “.  Further examination of the file shows no additional references to http://rand_286715790.  Here is the output from running test_leak2.py (in a Windows 2003 virtual machine):

C:\Tools\Python31>python z:\inet_cache\test_leak2.py
Creating URL: http://rand_3511348668
Opening file: b'C:\\Documents and Settings\\Administrator\\Local Settings\\Temporary Internet Files\\Content.IE5\\81QNCLMB\\CAC23G8H'
Deleting URL: http://rand_3511348668

There was no clear indication that an error occurred.  After copying the index.dat file to a Linux system, we can find a reference to http://rand_3511348668:

xxd -g 1 -u index.dat.leak2
000ef00: 4C 45 41 4B 02 00 00 00 00 00 00 00 00 00 00 00  LEAK............
000ef10: 90 70 17 74 0C 32 CA 01 00 00 00 00 00 00 00 00  .p.t.2..........
000ef20: 00 04 00 00 00 00 00 00 00 00 00 00 00 E7 00 00  ................
000ef30: 60 00 00 00 68 00 00 00 02 00 10 10 80 00 00 00  `...h...........
000ef40: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000ef50: 2A 3B EB 5D 01 00 00 00 00 00 00 00 2A 3B EB 5D  *;.]........*;.]
000ef60: 00 00 00 00 EF BE AD DE 68 74 74 70 3A 2F 2F 72  ........http://r
000ef70: 61 6E 64 5F 33 35 31 31 33 34 38 36 36 38 00 DE  and_3511348668..
000ef80: 43 41 43 32 33 47 38 48 00 BE AD DE EF BE AD DE  CAC23G8H........

This time a LEAK record was created.  Further examination of the file shows no additional references to http://rand_3511348668.  Here is the output from running test_leak3.py (in a Windows 2003 virtual machine):

C:\Tools\Python31>python z:\inet_cache\test_leak3.py
Creating URL: http://rand_1150829499
chmod'ing file: b'C:\\Documents and Settings\\Administrator\\Local Settings\\Temporary Internet Files\\Content.IE5\\81QNCLMB\\CAKB2RNB'
Deleting URL: http://rand_1150829499

Again, there was no clear indication that an error occurred.  After copying the index.dat file to a Linux system, we can find a reference to http://rand_1150829499:

xxd -g 1 -u index.dat.leak3
000ef00: 4C 45 41 4B 02 00 00 00 00 00 00 00 00 00 00 00  LEAK............
000ef10: 00 2B AF B5 0D 32 CA 01 00 00 00 00 00 00 00 00  .+...2..........
000ef20: 00 04 00 00 00 00 00 00 00 00 00 00 00 E7 00 00  ................
000ef30: 60 00 00 00 68 00 00 00 02 00 10 10 80 00 00 00  `...h...........
000ef40: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000ef50: 2A 3B 0A 5F 01 00 00 00 00 00 00 00 2A 3B 0A 5F  *;._........*;._
000ef60: 00 00 00 00 EF BE AD DE 68 74 74 70 3A 2F 2F 72  ........http://r
000ef70: 61 6E 64 5F 31 31 35 30 38 32 39 34 39 39 00 DE  and_1150829499..
000ef80: 43 41 4B 42 32 52 4E 42 00 BE AD DE EF BE AD DE  CAKB2RNB........

As with test_leak2.py, a LEAK record was generated. Further examination of the file shows no additional references to http://rand_1150829499.
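
The three dumps share the same layout, so the interesting fields can be pulled out programmatically. This is a minimal sketch, not LibForensics code. The field offsets are read off the dumps: the URL string sits at the offset stored at 0x34 (0x68) and the filename at the offset stored at 0x3C (0x80). The block count at 0x04, the 128-byte block size, and treating the 8 bytes at 0x10 as a Windows FILETIME are assumptions.

```python
import struct
from datetime import datetime, timedelta

BLOCK_SIZE = 0x80  # assumption: records occupy whole 128-byte blocks

# The test_leak2.py record, copied from the xxd output above.
LEAK2_DUMP = """\
000ef00: 4C 45 41 4B 02 00 00 00 00 00 00 00 00 00 00 00  LEAK............
000ef10: 90 70 17 74 0C 32 CA 01 00 00 00 00 00 00 00 00  .p.t.2..........
000ef20: 00 04 00 00 00 00 00 00 00 00 00 00 00 E7 00 00  ................
000ef30: 60 00 00 00 68 00 00 00 02 00 10 10 80 00 00 00  `...h...........
000ef40: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000ef50: 2A 3B EB 5D 01 00 00 00 00 00 00 00 2A 3B EB 5D  *;.]........*;.]
000ef60: 00 00 00 00 EF BE AD DE 68 74 74 70 3A 2F 2F 72  ........http://r
000ef70: 61 6E 64 5F 33 35 31 31 33 34 38 36 36 38 00 DE  and_3511348668..
000ef80: 43 41 43 32 33 47 38 48 00 BE AD DE EF BE AD DE  CAC23G8H........
"""


def xxd_to_bytes(dump: str) -> bytes:
    """Rebuild raw bytes from `xxd -g 1` lines, ignoring the ASCII column."""
    out = bytearray()
    for line in dump.strip().splitlines():
        _, rest = line.split(":", 1)
        hex_part = rest.split("  ", 1)[0]  # two spaces separate hex from ASCII
        out += bytes.fromhex(hex_part)
    return bytes(out)


def filetime_to_datetime(ft: int) -> datetime:
    # FILETIME counts 100-nanosecond intervals since 1601-01-01 (UTC).
    return datetime(1601, 1, 1) + timedelta(microseconds=ft // 10)


def read_cstring(buf: bytes, start: int) -> str:
    end = buf.index(b"\x00", start)
    return buf[start:end].decode("ascii")


def parse_record(rec: bytes) -> dict:
    """Decode the fields visible in the dumps (offsets are assumptions)."""
    (num_blocks,) = struct.unpack_from("<I", rec, 0x04)
    (filetime,) = struct.unpack_from("<Q", rec, 0x10)
    (url_off,) = struct.unpack_from("<I", rec, 0x34)
    (name_off,) = struct.unpack_from("<I", rec, 0x3C)
    return {
        "signature": rec[0:4].decode("ascii"),
        "size": num_blocks * BLOCK_SIZE,
        "time": filetime_to_datetime(filetime),
        "url": read_cstring(rec, url_off),
        "filename": read_cstring(rec, name_off),
    }
```

Calling parse_record(xxd_to_bytes(LEAK2_DUMP)) yields the LEAK signature, the URL http://rand_3511348668, and the TIF name CAC23G8H, the same values visible in the ASCII column of the dump.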

Given the results, we can assess the correctness of our theories.  Since test_leak1.py did not generate a LEAK record, while test_leak2.py and test_leak3.py did, we can narrow our original theory to TIFs.  Specifically, a LEAK record is generated when DeleteUrlCacheEntry is called and the associated TIF (temporary internet file) cannot be deleted.
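
The narrowing step above can be written down mechanically: each theory predicts a record signature for each test, and only theories whose predictions match every observation survive. The theory labels and the ACTS_ON_TIF mapping (whether a test interferes with the TIF file directly rather than with the cache entry) are my own illustrative framing of the three runs.

```python
# (test script, record signature observed afterwards), one row per run.
OBSERVATIONS = [
    ("test_leak1.py", "URL "),  # entry locked via RetrieveUrlCacheEntryStream
    ("test_leak2.py", "LEAK"),  # TIF held open with open()
    ("test_leak3.py", "LEAK"),  # TIF made read-only with chmod
]

# Does the test act on the TIF itself (rather than on the cache entry)?
ACTS_ON_TIF = {"test_leak1.py": False, "test_leak2.py": True, "test_leak3.py": True}

# Each theory maps a test name to the signature it predicts.
THEORIES = {
    "any error while deleting an entry -> LEAK": lambda test: "LEAK",
    "TIF cannot be deleted -> LEAK": (
        lambda test: "LEAK" if ACTS_ON_TIF[test] else "URL "
    ),
}


def consistent_theories(observations: list[tuple[str, str]]) -> list[str]:
    """Return the theories whose predictions match every observation."""
    return [
        name
        for name, predict in THEORIES.items()
        if all(predict(test) == sig for test, sig in observations)
    ]
```

Because observations are a list rather than a dict, repeated runs of the same script can simply be appended; a single contradicting run is enough to eliminate a theory.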

It is also prudent to note that we only ran the tests once.  In all three tests it is possible that there are other (unknown) variables that we did not account for, and in the latter two tests the unknown variables just happened to work in our favor.  To strengthen the theory that LEAK records occur when a TIF cannot be deleted, we could run the tests multiple times, as well as attempt other methods to make the TIF “undeletable”.

Sample Code

The file test_leak.zip contains the code used to test the theories in this blog post.  The files test_leak1.py, test_leak2.py, and test_leak3.py implement the tests, while inet_cache_lib.py, groups.py, entries.py, and __init__.py are library files used by the test files.  All of the code was designed to run on Python 3.1 on Windows systems, and interfaces with the Windows Internet API via the ctypes module.  The code is licensed under the GPL v3.

You can download the sample code by clicking on the link test_leak.zip.  To install it, unzip test_leak.zip to a directory of your choosing.

The Single Piece of Evidence (SPoE) Myth

Often a crime-drama television show will have a “single piece of evidence” that explains the entire crime and is used to get a guilty conviction. In real life, this situation rarely arises. Instead, typical investigations will uncover many pieces of evidence that are used during trial. Some of the evidence found during an investigation will be more persuasive to a jury, and some will be less persuasive. However, it’s uncommon (and perhaps foolish) for a prosecutor to proceed to court with a single piece of evidence. What is more common is for a prosecutor to proceed to court with multiple pieces of evidence, perhaps one or two of which are likely to be very persuasive.

One topic where the SPoE myth often comes up is anti-forensics. Simply put, anti-forensics is anything that a suspect does to hinder a forensic examination. Many of the sources of information that are used during an investigation (e.g. file system time stamps) can be easily modified. When a new anti-forensic technique is discovered, there is sometimes a tendency to see it as a “silver bullet” that can halt an entire investigation.

The truth is, a single action (e.g. logging in, compiling a program, reading email, etc.) can impact many different aspects of the operating system, especially on a Windows system. Compromising the integrity of a “single piece of evidence” (e.g. the last accessed file system time stamp) is rarely fatal. This is because there are typically a number of places to look to find evidence to support (or deny) some theory.  Removing one piece of evidence may make an argument weaker (or stronger), but rarely does it invalidate the entire argument.

Sometimes the answers are enough, sometimes they’re not

When you watch someone who is new to investigations work a case, one thing that often needs to be explained is the idea that the “smoking gun”, by itself, often isn’t enough. What do I mean by this? Well, not only am I interested in what you found (which is important in its own right), but also in how you found it.

Take, for example, a case where relevant evidence is found in unallocated space. Perhaps the suspect deleted a file that contained relevant evidence. Assume that the file system metadata that tracked which clusters (or blocks, for EXT2/3) were assigned to the file, and in which order, was overwritten. This means that you’ll have to use a data searching technique (e.g. signature finding, guess and check, etc.) to locate the relevant information. There are a number of different techniques that could be used to arrive at your conclusions. The path you took may very well come under scrutiny, to verify the soundness of your logic. In this scenario, not only is the “smoking gun” evidence important, but how you found the evidence (and knew how to “properly” interpret it) is also important.
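
As an illustration of signature finding, here is a minimal sketch of the simplest variant: header/footer carving of JPEG data from a raw blob, such as an extract of unallocated space. It assumes each file is stored contiguously and ends at the first end-of-image marker (FF D9) after its header (FF D8 FF). Fragmented files and JPEGs with embedded thumbnails break that assumption, which is exactly the kind of methodological choice that may come under scrutiny.

```python
JPEG_SOI = b"\xff\xd8\xff"  # start-of-image marker plus the next marker's 0xFF
JPEG_EOI = b"\xff\xd9"      # end-of-image marker


def carve_jpegs(blob: bytes, max_size: int = 10 * 1024 * 1024) -> list:
    """Naive header/footer carving of JPEGs from a raw byte blob.

    Returns (offset, data) pairs. Assumes contiguous storage and takes
    the first end-of-image marker after each header, so fragmented
    files or embedded thumbnails produce truncated results.
    """
    found = []
    start = blob.find(JPEG_SOI)
    while start != -1:
        # Only look for a footer within max_size bytes of the header.
        end = blob.find(JPEG_EOI, start + len(JPEG_SOI), start + max_size)
        if end != -1:
            found.append((start, blob[start:end + len(JPEG_EOI)]))
            next_from = end + len(JPEG_EOI)
        else:
            next_from = start + 1
        start = blob.find(JPEG_SOI, next_from)
    return found
```

Each returned offset documents where in the blob the candidate came from, which helps when you later need to explain how the evidence was found.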

There are times, however, when simply “finding the answer” is good enough. One example that came up today was about passwords for encrypted files. Assume you’re conducting an examination of a system and come across an encrypted file. For whatever reason, the suspect is unavailable. Now assume that you have an image of physical memory (i.e., RAM) and are able to use a tool such as the Volatility Framework or Memparser to analyze the image. During your analysis you find what you believe to be the password to the encrypted file. You can test your hypothesis by simply attempting to decrypt the file. If you are correct, the file will decrypt properly. In this case, the fact that the password worked would likely be good enough. You would still need to properly document your actions; however, they would likely be less important than the outcome.
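
The “theory and test” loop in this example can be sketched in code. This is a hypothetical illustration, not how Volatility or Memparser work: it pulls printable ASCII runs out of a memory image as candidate passwords and hands each one to a caller-supplied try_decrypt function, which stands in for whatever decryption attempt fits the file at hand. Real memory often holds passwords as UTF-16, or embedded in longer strings, so a plain ASCII scan will miss many of them.

```python
import re
from typing import Callable, Iterator, Optional

# Printable-ASCII runs of 6-64 bytes, similar in spirit to `strings`.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{6,64}")


def candidate_strings(memory: bytes) -> Iterator[str]:
    """Yield unique printable ASCII runs from a memory image, in order."""
    seen = set()
    for match in PRINTABLE_RUN.finditer(memory):
        s = match.group().decode("ascii")
        if s not in seen:
            seen.add(s)
            yield s


def find_password(memory: bytes, try_decrypt: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that the caller's decryption check accepts."""
    for candidate in candidate_strings(memory):
        if try_decrypt(candidate):  # the test of each hypothesis
            return candidate
    return None
```

As in the paragraph above, what matters in the end is that the returned password actually decrypts the file; the scan is only a way of generating hypotheses.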

The admissibility vs. weight of digital evidence

There is always a lot of conversation about when digital evidence is and is not admissible. Questions like “are proxy logs admissible?” and “what tools generate admissible evidence?” are focused on the concept of evidence admissibility. Some of the responses to these questions are correct, and some are not. I think the underlying issue with the incorrect answers (at least from what I’ve observed) stems from a confusion of two similar yet distinct legal concepts: evidence admissibility and the weight of evidence.

Caveats and Disclaimers

Before we begin this discussion, I want you to be aware of the following items:

  • I am not a lawyer
  • This is not legal advice
  • Always consult with your legal counsel for legal advice
  • The legal concepts discussed in this blog post are specific to the United States. Other jurisdictions are likely to have similar concepts.
  • Every court case (civil, criminal and otherwise) is decided on a case-by-case basis. This means what is true for one case may not be true for another.

Essentially, evidence admissibility refers to the requirements for evidence to be entered into a court case. The weight of evidence, however, refers to how likely the evidence is to persuade a person (e.g. judge or jury) towards (or against) a given theory.

In the legal system, before evidence can be presented for persuasive use, it must be admitted by the court. If one side or the other raises an objection to the evidence being admitted, a judge will typically listen to arguments from both sides, and come to a decision about whether or not to admit the evidence. The judge will likely consider things like admissibility requirements (listed below), prejudicial effects, etc.

When it comes to court (and I’m going to focus on criminal court) the rules for what is and what is not admissible vary. There are however three common elements:

  1. Authenticity
  2. Relevancy
  3. Reliability

Briefly, authenticity refers to whether or not the evidence is authentic, or “is what it is purported to be.” For example, is the hard drive being entered into evidence as the “suspect drive” actually the drive that was seized from the suspect system? Relevancy refers to whether or not the evidence relates to some issue at hand. Finally, reliability refers to whether or not the evidence meets some “minimum standard of trustworthiness”. Reliability is where concepts such as Daubert/Frye, repeatable and consistent methodology, etc. are used. The oft-quoted “beyond a reasonable doubt” is used as a bar for determining guilt or innocence, not evidence admissibility.

These requirements apply equally well to all types of evidence, including digital evidence. In fact, there are no extra “hoops” that digital evidence has to jump through for admissibility purposes. You’ll also notice that things like chain of custody, MD5 hashes, etc. aren’t on the list. The reason is simple: they aren’t strict legal requirements for evidence admissibility. Devices such as a chain of custody, MD5 hashes, etc. are common examples of how to help meet various admissibility requirements, or how to help strengthen the weight of the evidence, but in and of themselves they are not strictly required by legal statute.
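
Hashing is one of those devices: it helps show that the image being examined is the image that was acquired, but it is not itself a legal requirement. Here is a minimal sketch using Python’s hashlib, hashing in chunks so that large images don’t need to fit in memory. MD5 matches the example in the text; passing algorithm="sha256" works the same way. The file name in the usage note is hypothetical.

```python
import hashlib
from typing import Iterable


def digest_chunks(chunks: Iterable[bytes], algorithm: str = "md5") -> str:
    """Hash a stream of byte chunks and return the hex digest."""
    h = hashlib.new(algorithm)
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def file_digest(path: str, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a (possibly very large) image file in fixed-size chunks."""
    with open(path, "rb") as f:
        return digest_chunks(iter(lambda: f.read(chunk_size), b""), algorithm)


def hashes_match(recorded: str, computed: str) -> bool:
    """Compare a hash noted at acquisition with a freshly computed one."""
    return recorded.strip().lower() == computed.strip().lower()
```

For example, hashes_match(recorded_md5, file_digest("suspect.dd")) compares the value written down at acquisition time with a fresh hash of the working image. A match supports authenticity; a mismatch is something to explain, and, as the discussion of the first myth notes, discrepancies usually go to weight rather than admissibility.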

There are “myths” surrounding evidence admissibility that are common to digital forensics. I’ll focus on the two most common (that I’ve seen):

  1. Digital evidence is easy to modify and can’t be used in court
  2. Only certain types of tools generate admissible evidence

The first myth centers on the idea that digital evidence is often easy to modify (either accidentally or intentionally). This really concerns the reliability requirement of evidence admissibility. The short answer is that digital evidence is admissible. In fact, unless there is specific support for a claim of alteration (e.g. discrepancies in a log file), the opposing side cannot even raise this possibility (at least for admissibility purposes). Even if there are discrepancies, the evidence is likely to still be admitted, with the discrepancies going towards the weight of the evidence rather than admissibility. The exception to this might be if the discrepancies/alterations were so egregious as to undermine a “minimum standard of trustworthiness.”

The second myth is commonly found in the form of the question “What tools are accepted by the courts?” I think a fair number of people really mean “What tools generate results that are admissible in court?” Realize that in this case, “results” would be considered evidence. This scenario is somewhat analogous to a criminalist photographing a physical crime scene and asking the question “What cameras are accepted by the courts?” As long as the camera records an accurate representation of the subject of the photograph, the results should be admissible; that is the “minimum standard of trustworthiness”. To contrast this with weight, realize that different cameras record photographs differently. A 3-megapixel camera will produce different results than a 1-megapixel camera. An attorney could argue about issues surrounding resolution, different algorithms, etc., but this would all go to the weight (persuasive factor) of the evidence, not the admissibility.

Hopefully this clarifies some of the confusion surrounding evidence admissibility. I’d love to hear other people’s comments and thoughts about this, including any additional questions.

CitySec meetup in Los Angeles

For those of you who haven’t already seen CitySec, it’s worth stopping by.  CitySec.org is a site created by Thomas Ptacek (from Matasano Chargen) to facilitate gatherings of information security professionals.  The tone of the meetings appears to be quite relaxed.  To quote “What is a CitySec Meetup?”:

The rule of thumb is, no more structure than is absolutely necessary to get people into a room (where “room” usually means “bar”): if structure (like “name tags” or “surveys”) would even possibly prevent one person from attending the meeting, don’t use it.

For those of us in the greater Los Angeles area, there is a CitySec meetup (LASec) scheduled for 8PM on June 7th at the Westwood Brewing Co (near UCLA).  Here’s a link to the address on Google Maps.  Infosec and beer, a great combination 🙂