The Meaning of LEAK Records

I’ve been pretty quiet lately, largely due to spending time developing LibForensics.  Currently I’m adding support to read Microsoft Windows Internet cache containers (a.k.a. index.dat files).  If you’ve ever dealt with index.dat files before, you’ve probably encountered the mysterious “LEAK” record.  The purpose of this blog post is to explain one way that these records are created.

Background Information

In order to understand how LEAK records are created, it is useful to understand the Microsoft Windows Internet API.  The Microsoft Windows Internet API (WinInet) provides applications with the ability to interact with networked resources, usually over FTP and HTTP.  There are several functions in the WinInet API, including functions to provide caching.  Applications can use the WinInet API caching functions to store local (temporary) copies of files retrieved from the network.  The primary reason to use caching is to speed up future network requests by reading a local copy of a file instead of fetching it from the network.

A cached file is called a “Temporary Internet File” (TIF).  The WinInet API manages TIFs using cache containers, which are files named index.dat.  There are several WinInet API functions to work with entries in cache containers, including creating a URL cache entry, reading locally cached files, and deleting URL cache entries.  The WinInet API also provides a cache scavenger, which periodically runs and cleans up entries that are marked for deletion.

The cache containers (index.dat files) are almost always associated with Microsoft Internet Explorer.  This is likely because Internet Explorer is one of the most commonly used applications that uses the WinInet API caching capabilities.  However, since the WinInet API is available to any end-user application, any application can use the caching capabilities.  This can pose an issue when attributing a specific entry in the cache container to the program that generated it.

Internally a cache container is composed of a header, followed by one or more records.  There are several different types of records, including URL records (which describe cached URLs), and REDR records (for describing redirects).  A cached URL can have an associated TIF, which is described in the appropriate URL record.
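
As a rough illustration of this layout, the sketch below walks a buffer and reports records by their signature.  It assumes (based on the hex dumps later in this post, not on any official specification) that records begin on 0x80-byte boundaries with a four-byte ASCII signature followed by a little-endian 32-bit block count.  BLOCK_SIZE, SIGNATURES, and scan_records are names made up for this example, and this is not the LibForensics implementation.

```python
import struct

BLOCK_SIZE = 0x80  # assumption: records are allocated in 128-byte blocks
SIGNATURES = (b"URL ", b"REDR", b"LEAK")  # record types discussed in this post


def scan_records(data, start=0):
    """Yield (offset, signature, block_count) for each recognised record.

    Hypothetical helper: steps through the buffer one block at a time and
    skips over the blocks claimed by each record it recognises.
    """
    offset = start
    while offset + 8 <= len(data):
        signature = data[offset:offset + 4]
        if signature in SIGNATURES:
            (blocks,) = struct.unpack_from("<I", data, offset + 4)
            yield offset, signature.decode("ascii"), blocks
            offset += max(blocks, 1) * BLOCK_SIZE
        else:
            offset += BLOCK_SIZE


# Synthetic buffer (not a real index.dat): one URL record and one LEAK record.
sample = bytearray(0x200)
sample[0x80:0x88] = b"URL " + struct.pack("<I", 2)
sample[0x180:0x188] = b"LEAK" + struct.pack("<I", 1)
found = list(scan_records(bytes(sample)))
print(found)
```

On the synthetic buffer this reports a URL record at offset 128 and a LEAK record at offset 384.  A real container also has a header and other structures in front of the records, so a real parser would start from the offsets the header provides rather than scanning blindly.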

LEAK Records

Now that we’ve reviewed index.dat files, we’ll see how to create LEAK records.  However, before going further, I want to emphasize that this is just one approach to creating LEAK records.  LEAK records may have uses outside of what is described in this post.

For the impatient: A LEAK record can be generated by attempting to delete a URL cache entry (via DeleteUrlCacheEntry) when the associated temporary internet file (TIF) cannot be deleted.

The last paragraph of the MSDN documentation on the cache scavenger discusses what happens when a cache entry is marked for deletion:

The cache scavenger is shared by multiple processes. When one application deletes a cache entry from its process space by calling DeleteUrlCacheEntry, it is normally deleted on the next cycle of the scavenger. However, when the item that is marked for deletion is in use by another process, the cache entry is marked for deletion by the scavenger, but not deleted until the second process releases it.

To summarize, when the cache scavenger runs and encounters an item that is marked for deletion but is in use by another process, the cache entry is not actually deleted.

Another reference to LEAK records can be found at Understanding index.dat Files.  The author describes LEAK as a “Microsoft term for an error”.

Combining these two ideas (deleting a cache entry when it is in use, and LEAK as a term for error), we can come up with a theory: a LEAK record is generated when an error occurs during the deletion of a URL cache entry.  If you’ve ever taken a SANS Security 508 course (Computer Forensics, Investigation, and Response) from me, you’ll probably remember my approach to examinations (and investigations in general): theory (hypothesis) and test.

In order to test the theory, we need to create a series of statements and associated outcomes that would be true if our theory is correct.

At this stage our theory is fairly generic.  To make the theory testable, we need to make it more specific.  This means we will need to determine a series of actions that will result in the generation of a LEAK record.  The first place to look is at the MSDN documentation on the WinInet API.  To save time, rather than walking through all the WinInet API functions, I’ll just reference the ones used in the rest of this post: CreateUrlCacheEntry, CommitUrlCacheEntry, RetrieveUrlCacheEntryStream, and DeleteUrlCacheEntry.

Looking at this list, there are a few possible ways to generate an error while deleting a URL cache entry:

  1. Create/Commit a URL cache entry, and lock the entry using RetrieveUrlCacheEntryStream.
  2. Create/Commit a URL cache entry and corresponding TIF, and open the TIF.
  3. Create/Commit a URL cache entry and corresponding TIF, and make the TIF read-only.

The general approach is to create (and commit) a URL cache entry, then create a condition that would make deleting the entry fail.

Let’s solidify these into testable theories as “if-then” statements (logical implications) with function calls:

  • IF we create a URL cache entry using CreateUrlCacheEntry and CommitUrlCacheEntry, lock the entry using RetrieveUrlCacheEntryStream, and call DeleteUrlCacheEntry
    • THEN we will see a LEAK record.
  • IF we create a URL cache entry and corresponding TIF using CreateUrlCacheEntry and CommitUrlCacheEntry, open the TIF using open(), and call DeleteUrlCacheEntry
    • THEN we will see a LEAK record.
  • IF we create a URL cache entry and corresponding TIF using CreateUrlCacheEntry and CommitUrlCacheEntry, make the TIF read-only using chmod, and call DeleteUrlCacheEntry
    • THEN we will see a LEAK record.
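
All three statements end with the same call to DeleteUrlCacheEntry.  A minimal sketch of that step via ctypes might look like the following.  This is not the code from test_leak.zip: the wrapper name delete_url_cache_entry and its return convention are invented for this example, and it uses the ANSI variant (DeleteUrlCacheEntryA), which is the one named in the error output later in this post.

```python
import ctypes
import sys

# Win32 error 0x20, the sharing violation reported by test_leak1.py below.
ERROR_SHARING_VIOLATION = 0x20


def delete_url_cache_entry(url):
    """Call DeleteUrlCacheEntryA on a bytes URL.

    Hypothetical wrapper: returns 0 on success, otherwise the Win32 error
    code reported for the failed call.
    """
    if sys.platform != "win32":
        raise OSError("WinInet is only available on Windows")
    wininet = ctypes.WinDLL("wininet", use_last_error=True)
    delete = wininet.DeleteUrlCacheEntryA
    delete.argtypes = [ctypes.c_char_p]  # LPCSTR lpszUrlName
    delete.restype = ctypes.c_int        # Win32 BOOL
    if delete(url):
        return 0
    return ctypes.get_last_error()
```

On Windows, a caller could compare the result against ERROR_SHARING_VIOLATION and pass a non-zero code to ctypes.FormatError to obtain the message text.  Note that a return value of 0 only says the API call reported success; it does not say what was written to index.dat.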

Theory Testing

The next step is to test our theories.  It is relatively straightforward to translate the if-then statements into code.  In the “Sample Code” section I’ve included a link to a zip file that contains (amongst other things) three Python files, test_leak1.py, test_leak2.py, and test_leak3.py.  Each file implements one of the if-then statements.

Here is the output from running test_leak1.py (in a Windows 2003 virtual machine):

C:\Tools\Python31>python z:\inet_cache\test_leak1.py
Creating URL: http://rand_286715790
Using file: b'C:\Documents and Settings\Administrator\Local Settings\Temporary Internet Files\Content.IE5\81QNCLMB\CAUJ6C3U'
Locking URL: http://rand_286715790
Deleting URL: http://rand_286715790
ERROR: DeleteUrlCacheEntryA failed with error 0x20: The process cannot access the file because it is being used by another process.

The output from test_leak1.py indicates that there was an error during the call to DeleteUrlCacheEntry.  After copying the associated index.dat file to a Linux system, we can find a reference to http://rand_286715790:

xxd -g 1 -u index.dat.leak1
...
000ef00: 55 52 4C 20 02 00 00 00 00 00 00 00 00 00 00 00  URL ............
000ef10: 50 A1 F4 DB 08 32 CA 01 00 00 00 00 00 00 00 00  P....2..........
000ef20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000ef30: 60 00 00 00 68 00 00 00 02 00 10 10 80 00 00 00  `...h...........
000ef40: 01 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00  ..@.............
000ef50: 2A 3B B3 5A 02 00 00 00 01 00 00 00 2A 3B B3 5A  *;.Z........*;.Z
000ef60: 00 00 00 00 EF BE AD DE 68 74 74 70 3A 2F 2F 72  ........http://r
000ef70: 61 6E 64 5F 32 38 36 37 31 35 37 39 30 00 AD DE  and_286715790...
000ef80: 43 41 55 4A 36 43 33 55 00 BE AD DE EF BE AD DE  CAUJ6C3U........
...

The record is still marked as “URL “.  Further examination of the file shows no additional references to http://rand_286715790.  Here is the output from running test_leak2.py (in a Windows 2003 virtual machine):

C:\Tools\Python31>python z:\inet_cache\test_leak2.py
Creating URL: http://rand_3511348668
Opening file: b'C:\Documents and Settings\Administrator\Local Settings\Temporary Internet Files\Content.IE5\81QNCLMB\CAC23G8H'
Deleting URL: http://rand_3511348668

There was no clear indication that an error occurred.  After copying the index.dat file to a Linux system, we can find a reference to http://rand_3511348668:

xxd -g 1 -u index.dat.leak2
...
000ef00: 4C 45 41 4B 02 00 00 00 00 00 00 00 00 00 00 00  LEAK............
000ef10: 90 70 17 74 0C 32 CA 01 00 00 00 00 00 00 00 00  .p.t.2..........
000ef20: 00 04 00 00 00 00 00 00 00 00 00 00 00 E7 00 00  ................
000ef30: 60 00 00 00 68 00 00 00 02 00 10 10 80 00 00 00  `...h...........
000ef40: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000ef50: 2A 3B EB 5D 01 00 00 00 00 00 00 00 2A 3B EB 5D  *;.]........*;.]
000ef60: 00 00 00 00 EF BE AD DE 68 74 74 70 3A 2F 2F 72  ........http://r
000ef70: 61 6E 64 5F 33 35 31 31 33 34 38 36 36 38 00 DE  and_3511348668..
000ef80: 43 41 43 32 33 47 38 48 00 BE AD DE EF BE AD DE  CAC23G8H........
...

This time a LEAK record was created.  Further examination of the file shows no additional references to http://rand_3511348668.  Here is the output from running test_leak3.py (in a Windows 2003 virtual machine):

C:\Tools\Python31>python z:\inet_cache\test_leak3.py
Creating URL: http://rand_1150829499
chmod'ing file: b'C:\Documents and Settings\Administrator\Local Settings\Temporary Internet Files\Content.IE5\81QNCLMB\CAKB2RNB'
Deleting URL: http://rand_1150829499

Again, there was no clear indication that an error occurred.  After copying the index.dat file to a Linux system, we can find a reference to http://rand_1150829499:

xxd -g 1 -u index.dat.leak3
...
000ef00: 4C 45 41 4B 02 00 00 00 00 00 00 00 00 00 00 00  LEAK............
000ef10: 00 2B AF B5 0D 32 CA 01 00 00 00 00 00 00 00 00  .+...2..........
000ef20: 00 04 00 00 00 00 00 00 00 00 00 00 00 E7 00 00  ................
000ef30: 60 00 00 00 68 00 00 00 02 00 10 10 80 00 00 00  `...h...........
000ef40: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000ef50: 2A 3B 0A 5F 01 00 00 00 00 00 00 00 2A 3B 0A 5F  *;._........*;._
000ef60: 00 00 00 00 EF BE AD DE 68 74 74 70 3A 2F 2F 72  ........http://r
000ef70: 61 6E 64 5F 31 31 35 30 38 32 39 34 39 39 00 DE  and_1150829499..
000ef80: 43 41 4B 42 32 52 4E 42 00 BE AD DE EF BE AD DE  CAKB2RNB........
...

As with test_leak2.py, a LEAK record was generated.  Further examination of the file shows no additional references to http://rand_1150829499.
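
The dumps can also be read programmatically.  The sketch below decodes the LEAK record bytes copied from the index.dat.leak3 dump above.  The field positions are inferences from these dumps rather than documented facts: the 32-bit value at record offset 0x34 is 0x68, which is where the URL string starts, and the value at 0x3C is 0x80, which is where the TIF file name starts.  The names parse_record and read_cstring are made up for this example.

```python
import struct

# Record bytes copied from the xxd dump of index.dat.leak3 (file offset 0xef00).
DUMP = """
4C 45 41 4B 02 00 00 00 00 00 00 00 00 00 00 00
00 2B AF B5 0D 32 CA 01 00 00 00 00 00 00 00 00
00 04 00 00 00 00 00 00 00 00 00 00 00 E7 00 00
60 00 00 00 68 00 00 00 02 00 10 10 80 00 00 00
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2A 3B 0A 5F 01 00 00 00 00 00 00 00 2A 3B 0A 5F
00 00 00 00 EF BE AD DE 68 74 74 70 3A 2F 2F 72
61 6E 64 5F 31 31 35 30 38 32 39 34 39 39 00 DE
43 41 4B 42 32 52 4E 42 00 BE AD DE EF BE AD DE
"""

URL_OFFSET_FIELD = 0x34       # assumption: uint32 offset of the URL string
FILENAME_OFFSET_FIELD = 0x3C  # assumption: uint32 offset of the TIF file name


def read_cstring(buf, offset):
    """Return the NUL-terminated ASCII string that starts at offset."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end].decode("ascii")


def parse_record(buf):
    """Decode the signature, block count, URL, and file name of one record."""
    (blocks,) = struct.unpack_from("<I", buf, 4)
    (url_offset,) = struct.unpack_from("<I", buf, URL_OFFSET_FIELD)
    (name_offset,) = struct.unpack_from("<I", buf, FILENAME_OFFSET_FIELD)
    return {
        "signature": buf[0:4].decode("ascii"),
        "blocks": blocks,
        "url": read_cstring(buf, url_offset),
        "filename": read_cstring(buf, name_offset),
    }


record = parse_record(bytes.fromhex("".join(DUMP.split())))
print(record)
```

For these bytes the decoded signature is LEAK, the block count is 2, the URL is http://rand_1150829499, and the file name is CAKB2RNB, matching the ASCII column of the dump and the file name printed by test_leak3.py.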

Given the results, we can assess the correctness of our theories.  Since test_leak1.py did not generate a LEAK record, while test_leak2.py and test_leak3.py did, we can narrow our original theory to TIFs: specifically, a LEAK record is generated when DeleteUrlCacheEntry is called and the associated TIF (temporary internet file) cannot be deleted.

It is also prudent to note that we only ran the tests once.  In all three tests it is possible that there are other (unknown) variables that we did not account for, and in the latter two tests the unknown variables just happened to work in our favor.  To strengthen the theory that LEAK records occur when a TIF cannot be deleted, we could run the tests multiple times, as well as attempt other methods to make the TIF file “undeletable”.

Sample Code

The file test_leak.zip contains the code used to test the theories in this blog post.  The files test_leak1.py, test_leak2.py, and test_leak3.py implement the tests, while inet_cache_lib.py, groups.py, entries.py, and __init__.py are library files used by the test files.  All of the code was designed to run on Python 3.1 on Windows systems, and interfaces with the Windows Internet API via the ctypes module.  The code is licensed under the GPL v3.

To install the sample code, download test_leak.zip and unzip it to a directory of your choosing.