Windows Pagefile Collection and Analysis for a Live Forensics Context
Seokhee Lee†, Antonio Savoldi‡∗, Sangjin Lee† and Jongin Lim†
† Center for Information Security Technologies, Korea University, Seoul, Korea
‡ Department for Electronics for Automation, University of Brescia, Via Branze 38, Brescia, Italy
† {gosky7,sangjin,jilim}@korea.ac.kr, ‡ antonio.savoldi@ing.unibs.it
Abstract

The aim of this paper is to present a new tool, the Pagefile Collection Tool (PCT), which can be used to obtain the pagefile from a live Windows-based system. It is well known that, on a live system, the pagefile is locked by the operating system, which uses it as backing store for virtual memory. By relying on the NTFS filesystem specifications, we were able to reconstruct the full pagefile, which a forensic expert can then use to carve out further valuable information in the memory analysis field.

1 Introduction

According to [6], volatile system memory analysis is still at an early stage compared with other branches of digital forensics. It aims at gathering information from the contents of a computer's volatile memory in order to determine which processes were running, when they were started and by whom, what specific activities those processes were performing, and the state of active network connections. Thus, system memory can provide a great deal of information about the system's runtime state at the time of, or just after, an incident. Furthermore, recent real-world attacks show a trend towards memory-only modification whenever possible. As a result, traditional post-mortem analysis techniques are ineffective at revealing the presence of intruders [11].

To address this issue, a number of techniques have been proposed to aid the collection of volatile data; among these are software and hardware approaches. Software methods are subject to Locard's exchange principle [4], which implies that it is impossible to take a forensically sound snapshot of the memory of the system being observed without altering it: the acquisition process must itself be loaded into main memory, so an unmodified, and hence integral, copy of the volatile memory cannot be obtained. Hardware methods require special hardware that is able to retrieve the volatile memory from the system without introducing any new code or relying on potentially untrusted code to perform the extraction. Clearly, in this case, the hardware needs to be installed prior to the incident [3]. Additionally, according to some authors [14], even when the system is probed from outside with dedicated hardware, the integrity of the acquired main memory image can be compromised. Indeed, determining a precise set of conditions that bounds this loss of integrity during volatile RAM collection is still an open research problem.

There have been interesting contributions to the field of volatile memory analysis in a live context. According to [6], in order to recover as many memory pages as possible, it would be valuable to have access to the pagefile of a Windows-based system. Moreover, to be used effectively, the pagefile should be acquired at the same time as the memory, so as to improve the reconstruction of the processes that were in main memory. Collecting the pagefile on a live system is cumbersome, since the Windows operating system has complete control over it and protects it from access. Thus, it would be useful for an examiner to have a tool capable of copying the paging file by parsing the raw filesystem. The Pagefile Collection Tool is aimed at extracting the full pagefile from a live Windows-based system in order to enhance and facilitate forensic analysis. This tool is an extension of our previous work [7].

The remainder of the paper is organized as follows. First, we present the main algorithm of the tool, which extracts the pagefile from a live Windows computer system. Second, we analyse useful and fundamental information extracted from a pagefile, also mentioning the potential problem of leakage of sensitive information. Finally, we provide examples of privacy leakage, together with a brief overview of how to collect the hibernation file.

∗ Corresponding author: Antonio Savoldi. Tel. +39-030-3715511. Email: antonio.savoldi@ing.unibs.it
2 Pagefile Collection

A pagefile can be regarded as a "blob" of memory pages belonging to various processes, and it is an integral part of the virtual memory system [9] [15]. Building the pagefile collection tool requires an in-depth understanding of the NTFS file system. One of the most important concepts in the design of NTFS is that everything, including the filesystem metadata, is allocated to files. The Master File Table (MFT) is the heart of NTFS because it contains information about all files and directories. Every file and directory has at least one entry in the table, and the entries themselves are very simple. They are 1 KB in size, but only the first 42 bytes have a defined purpose. Figure 1 describes the scheme of the NTFS file system [1].

[Figure 1. NTFS Master File Table: on the disk, the boot sector points to the start cluster of the MFT, and the MFT entry of the pagefile points to the start cluster of the pagefile; each MFT entry (Entry 1 to Entry N) consists of an MFT entry header, a sequence of attributes, and unused space.]

The first entry in the table is named $MFT, and it describes the on-disk location of the MFT. The starting location of the MFT is given in the boot sector, which is always located in the first sector of the partition. An MFT entry contains several attributes, which are data structures that store a specific type of data. Every attribute is composed of a header, which contains the meta-information, and of content, which can have any format and any size. NTFS provides two locations where attribute content can be stored. A resident attribute stores its content in the MFT entry, immediately after the attribute header; this works only for small attributes. A non-resident attribute stores its content in external clusters of the filesystem, organized in cluster runs, i.e., sequences of consecutive clusters; each run is documented by its starting cluster address and its length.

Nearly every allocated MFT entry has a $FILE_NAME and a $STANDARD_INFORMATION attribute. The $FILE_NAME attribute contains the file name, size, and temporal information. The $STANDARD_INFORMATION attribute contains temporal, ownership, and security information. The $FILE_NAME attribute is found in two places: in the MFT entry of a file, where it stores the file name and the parent directory information, and in directory indexes. Our technique relies on it to find the pagefile entry among the MFT entries, by comparing the target file name with the name stored in each $FILE_NAME attribute. The MFT entry of a file also has a $DATA attribute, which holds the file content. If the content is larger than roughly 700 bytes, the attribute becomes non-resident and its content is stored in external clusters [1].

A pagefile is, of course, far larger than 700 bytes, so its $DATA attribute is non-resident. Figure 2 gives a detailed view of this attribute, which is useful to understand how the content of a file is stored on the hard disk. The attribute content is described by a run list, in which each run starts with a header byte split into four upper and four lower bits. The four least significant bits contain the size, in bytes, of the run length field, which follows the header byte. The four most significant bits contain the size, in bytes, of the run offset field, which follows the length field. The run length is the number of clusters in the run, and the run offset gives its starting cluster: for the first run it is an absolute cluster address, while for each subsequent run it is a signed value relative to the start of the previous run. A header byte of zero terminates the list. As already mentioned, if the $DATA attribute is non-resident, the file content is distributed over different clusters, which are recorded as different runs. Figure 2 shows a fragmented layout in which the file content spans three runs.

[Figure 2. Run List: run 1 starts at cluster 48 with length 5 (clusters 48-52); run 2 starts at cluster 80 with length 2 (clusters 80-81); run 3 starts at cluster 56 with length 3 (clusters 56-58).]

Following this scheme, we read the relevant sectors of the hard disk and thus determined where the pagefile is located on the physical disk. Finally, we were able to collect the pagefile directly from the hard disk, even while the system was running. Figure 3 shows an explanatory diagram of the implemented tool.

To explain how the program works, we describe its main building blocks step by step. Initially, the tool opens the volume, reads the boot sector, and computes the initial cluster address of the $MFT file.

    OpenHdd(szCmdOut);
        /* Open the raw volume (drivename is a device path such as \\.\C:).
           Read/write sharing must be allowed because Windows is using the volume.
           Check hVolume != INVALID_HANDLE_VALUE before continuing. */
        hVolume = CreateFile(drivename, GENERIC_READ,
                             FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                             OPEN_EXISTING, 0, NULL);

    ReadBoot_cal_mft();
        /* read the 512-byte boot sector (first sector of the volume) */
        ReadFile(hVolume, &boot, 512, &RealRead, NULL);
        /* compute the starting location of the $MFT from the boot sector */
        cal_mft();

After that, it is necessary to locate the initial address of the $MFT and, from there, to find the pagefile entry.
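The first steps of Figure 3, locating the $MFT from the boot sector and recognizing the pagefile entry by its $FILE_NAME attribute, can be sketched in portable C on in-memory buffers. This is a minimal illustration under stated assumptions, not the tool's code: the helper names (`le16`, `le32`, `le64`, `mft_byte_offset`, `entry_has_name`) are hypothetical, raw-disk access is left out (a buffer filled by the ReadFile call above would be passed in), and only the standard NTFS field offsets are used (bytes per sector at 0x0B, sectors per cluster at 0x0D and $MFT cluster at 0x30 of the boot sector; first-attribute offset at 0x14 of an MFT entry; name length and UTF-16 name at 0x40 and 0x42 of the $FILE_NAME content). For brevity, the sketch does not restore the fixup (update sequence) values that NTFS stores at the end of each sector of an MFT entry; a real tool must apply them before parsing.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian readers for on-disk NTFS fields. */
static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t le32(const uint8_t *p) { return le16(p) | ((uint32_t)le16(p + 2) << 16); }
static uint64_t le64(const uint8_t *p) { return le32(p) | ((uint64_t)le32(p + 4) << 32); }

/* Hypothetical helper: byte offset of the $MFT from the start of the volume,
 * or 0 if the buffer is not an NTFS boot sector.
 * Assumes the sectors-per-cluster byte is a plain count. */
uint64_t mft_byte_offset(const uint8_t boot[512])
{
    if (memcmp(boot + 3, "NTFS    ", 8) != 0)     /* OEM ID */
        return 0;
    uint64_t bytes_per_sector    = le16(boot + 0x0B);
    uint64_t sectors_per_cluster = boot[0x0D];
    uint64_t mft_cluster         = le64(boot + 0x30);
    return mft_cluster * sectors_per_cluster * bytes_per_sector;
}

/* Hypothetical helper: 1 if the MFT entry holds a resident $FILE_NAME
 * attribute (type 0x30) whose name equals `want` (ASCII, case-insensitive).
 * Fixups are NOT applied (see the assumption above). */
int entry_has_name(const uint8_t *e, size_t size, const char *want)
{
    size_t want_len = strlen(want);
    if (size < 0x18 || memcmp(e, "FILE", 4) != 0)
        return 0;

    size_t off = le16(e + 0x14);                    /* first attribute */
    while (off + 0x18 <= size) {
        uint32_t type = le32(e + off);
        uint32_t len  = le32(e + off + 4);
        if (type == 0xFFFFFFFFu || len < 0x18 || len > size - off)
            break;                                  /* end marker or corrupt */
        if (type == 0x30 && e[off + 8] == 0) {      /* resident $FILE_NAME */
            size_t c = off + le16(e + off + 0x14);  /* start of content */
            if (c + 0x42 <= size) {
                size_t n = e[c + 0x40];             /* name length in UTF-16 units */
                if (n == want_len && c + 0x42 + 2 * n <= size) {
                    size_t k = 0;
                    while (k < n) {
                        uint16_t ch = le16(e + c + 0x42 + 2 * k);
                        if (ch > 0x7F ||
                            tolower(ch) != tolower((unsigned char)want[k]))
                            break;
                        k++;
                    }
                    if (k == n)
                        return 1;
                }
            }
        }
        off += len;                                 /* next attribute */
    }
    return 0;
}
```

In terms of the fragments above, `mft_byte_offset` roughly corresponds to `cal_mft()`, and `entry_has_name(entry, 1024, "pagefile.sys")` would be evaluated on each 1 KB entry read from that offset until it returns 1.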
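The run-list encoding described for Figure 2 can be decoded with a short loop. The sketch below is illustrative only: `struct data_run` and `decode_run_list` are hypothetical names, it assumes the run list has already been located inside the pagefile's non-resident $DATA attribute, and it applies the NTFS rule that each run offset is a signed value relative to the starting cluster of the previous run.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded run: a sequence of consecutive clusters. */
struct data_run {
    uint64_t start_cluster; /* absolute cluster number (0 for sparse runs) */
    uint64_t length;        /* number of clusters in the run */
    int      sparse;        /* 1 if the run has no offset field */
};

/* Hypothetical helper: decode the run list p[0..n) into runs[0..max_runs).
 * Returns the number of runs, or -1 if the list is malformed or too long. */
int decode_run_list(const uint8_t *p, size_t n,
                    struct data_run *runs, int max_runs)
{
    size_t i = 0;
    int count = 0;
    int64_t lcn = 0;                 /* start cluster of the previous run */

    while (i < n && p[i] != 0x00) {  /* a zero header byte ends the list */
        unsigned len_size = p[i] & 0x0F;        /* low nibble: length field size */
        unsigned off_size = (p[i] >> 4) & 0x0F; /* high nibble: offset field size */
        i++;
        if (len_size == 0 || len_size > 8 || off_size > 8 ||
            len_size + off_size > n - i || count >= max_runs)
            return -1;

        uint64_t length = 0;         /* unsigned little-endian length */
        for (unsigned k = 0; k < len_size; k++)
            length |= (uint64_t)p[i + k] << (8 * k);
        i += len_size;

        uint64_t raw = 0;            /* signed little-endian offset */
        for (unsigned k = 0; k < off_size; k++)
            raw |= (uint64_t)p[i + k] << (8 * k);
        if (off_size > 0 && off_size < 8 && (p[i + off_size - 1] & 0x80))
            raw |= ~(uint64_t)0 << (8 * off_size);  /* sign-extend */
        i += off_size;

        runs[count].length = length;
        runs[count].sparse = (off_size == 0);
        if (off_size > 0) {
            lcn += (int64_t)raw;     /* offset is relative to the previous run */
            runs[count].start_cluster = (uint64_t)lcn;
        } else {
            runs[count].start_cluster = 0;
        }
        count++;
    }
    return count;
}
```

Encoded this way, the three runs of Figure 2 become the bytes 11 05 30, 11 02 20, 11 03 E8, followed by the terminator 00; the last offset byte 0xE8 is -24, which takes the third run back from cluster 80 to cluster 56. A dump loop would then seek to start_cluster multiplied by the cluster size and copy length clusters for each run in turn, 10 clusters in total for this example.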
[Figure 3. Diagram of the implemented collection tool: Start → Open Hard Disk → Read Boot sector → Find $MFT file → Find filename attribute in the $MFT → Filename = "Pagefile.sys"? (No: continue the filename search; Yes: proceed) → Find data attribute of pagefile entry → Calculate start cluster and length of pagefile → Move the file pointer to start cluster of pagefile → Dump pagefile to a file → Close Hard Disk → END.]

    /* seek to the $MFT: lowpos is its sector number (512-byte sectors assumed) */
    __int64 mftpos = (__int64)lowpos * 512;
    __int64 curpos = 0;
    curpos = myFileSeek(hVolume, mftpos, FILE_BEGIN);
    find_pagefile();   /* scan the MFT entries for the pagefile.sys entry */

In the $MFT, every file entry has a filename attribute, so we can find the pagefile entry by comparing the name stored in this attribute with pagefile.sys. Having found the pagefile entry, we extract its data attribute and, from it, compute the sector number where the pagefile starts.

    cal_pagefile_sector();   /* first sector of the pagefile, from its $DATA runs */
    __int64 movepos = (__int64)pagefile_sector * 512;
    curpos = myFileSeek(hVolume, movepos, FILE_BEGIN);

Finally, the pagefile is physically dumped from the hard disk.

    DumpPagefile(argv[2]);   /* argv[2] presumably names the output file */
        /* Copy clusterlength_pagefile clusters from the current position.
           The reads are contiguous, so this assumes the pagefile occupies a
           single run; a fragmented pagefile needs one seek-and-copy per run. */
        for (j = 0; j < clusterlength_pagefile; j++) {
            if (!ReadFile(hVolume, Buffer, clustersize, &RealRead, NULL) || RealRead == 0)
                break;                           /* stop on read error */
            if (fp != NULL)
                fwrite(Buffer, 1, RealRead, fp); /* write only the bytes read */
        }

3 Pagefile Analysis: Results and Issues

One of the goals of volatile memory analysis is to reconstruct as many as possible of the processes that were executing before the collection phase. As stated by [6], the pagefile can enhance this process by providing the complementary memory pages that were swapped out by the memory manager during the ordinary functioning of the operating system. In fact, it is necessary and possible, although not trivial, to determine which process a given page stored in the pagefile belongs to, in order to create evidence by linking the swapped page to the related running process. This step has not been implemented yet; it will be addressed in future research, following the indications of [6]. So far, we have applied standard searching techniques aimed at extracting valuable content, such as passwords, user IDs, credit card numbers, fragments of pictures, keystroke information, messenger chat logs, and the contents of recently used files, such as URLs and text documents. For example, as shown in Figure 5, a web password can be detected within a fixed string schema. This makes it possible to create simple filters that detect this kind of potentially useful data.

To extract such information, we can use at least two different, though standard, approaches: searching for keywords with classical tools, such as the Linux strings utility, or applying data carving algorithms, such as those used by Scalpel [13] or Foremost [5]. Since strings looks for sequences of printable single-byte characters by default, the former method is not suited to searching for Korean words, which use a multi-byte encoding. Thus, we have used a filtering algorithm capable of recognizing such an encoding scheme. This approach can be used, for example, to find passwords, credit card numbers, or even keystroke information. The latter method is certainly applicable, although it suffers from an extremely high false positive rate, that is, the recovered files are misclassified. Indeed, as can be verified by inspection, the sequence of memory pages in the pagefile is almost random. Thus, even if the header of a known file, such as a JPEG, is detected, the body is unlikely to be recovered, because the sequence of pages is very different and highly fragmented.

In our research we analysed about 60 pagefiles from different publicly accessible computers, recovering about 45 sensitive pieces of information of the kinds mentioned above. With regard to password recovery, according to Table 1, if the pagefile size is 768 Mbytes or more, the probability of finding a password is about 66%. Since the memory size of current computer systems is often above 512 Mbytes, we expect sensitive information to be found in the majority of such pagefiles. If a pagefile is stolen, the following problems may arise.

First, critical information such as passwords, identification numbers, and credit card numbers can be stolen. Notably, ordinary users tend to use the same password and user ID to log on to various web sites. With knowledge of the web services normally used by the attacked user, it is possible to steal other valuable private information.

Second, criminals might draw conclusions about users' habits. Indeed, the pagefile contains details about system usage, such as the history of visited web pages. If it is analysed, these habits could be used to build a profile of the user, infringing on the user's privacy.

Third, after a user has read a document, even if some anti-forensic technique [12] is used to erase it carefully, part or all of the document may be left in the pagefile. Thus, important information could be stolen.

[Figure 4. Collection of sensitive information from a process memory dump: an Internet Explorer process in main memory communicates with a web server over a secure channel, and a memory dump of the process is taken.]
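The fixed string schema mentioned above for web passwords lends itself to a very simple filter. The following sketch is a hypothetical illustration, not the filter used in our experiments: `extract_field` is an invented name, and the key `passwd=` is an assumption chosen to match an HTTP form field of the kind shown in Figure 5. The function scans a raw buffer, for example one cluster of the dumped pagefile, for the key and copies the printable value that follows it, stopping at `&` or at the first non-printable byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical filter: search buf[0..len) for `key` and copy the value that
 * follows into out (NUL-terminated, at most out_size - 1 characters).
 * The value ends at '&', at a non-printable byte, or at the end of buf.
 * Returns the offset just past the value, or 0 if no non-empty value is found. */
size_t extract_field(const unsigned char *buf, size_t len, const char *key,
                     char *out, size_t out_size)
{
    size_t klen = strlen(key);
    if (klen == 0 || out_size == 0 || len < klen)
        return 0;

    for (size_t i = 0; i + klen <= len; i++) {
        if (memcmp(buf + i, key, klen) != 0)
            continue;
        size_t j = i + klen, n = 0;
        /* copy printable ASCII up to the next field separator */
        while (j < len && buf[j] >= 0x20 && buf[j] < 0x7F && buf[j] != '&' &&
               n + 1 < out_size)
            out[n++] = (char)buf[j++];
        out[n] = '\0';
        if (n > 0)
            return j;             /* caller can resume searching from here */
    }
    out[0] = '\0';
    return 0;
}
```

Because pagefile pages appear in an almost random order, such a filter is applied page by page rather than relying on file structure; returning the resume offset lets a caller collect every occurrence in a buffer.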
Pagefile size    Memory size      Number of analysed files   Number of passwords found
256M or below    192M or below    4                          0
384M             256M             7                          5
768M             512M             27                         14
1G               768M or above    18                         16

Table 1. Number of passwords discovered in relation to the pagefile size.

[Figure 5. Results of Internet Explorer memory dump analysis: the recovered data include the URL of the web service login page and the request fields userid and passwd, carrying the sample credentials IDSAMPLE and PASSWDSAMPLE in clear text.]
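The 66% figure quoted in Section 3 can be recomputed from Table 1. Reading the last column as the number of pagefiles in which a password was found (an assumption, since the table reports passwords rather than files), the two rows with a pagefile of 768 Mbytes or more give:

```latex
\frac{14}{27} \approx 0.52, \qquad
\frac{16}{18} \approx 0.89, \qquad
\frac{14 + 16}{27 + 18} = \frac{30}{45} \approx 0.667
```

It is the pooled ratio over both rows, not the average of the two per-row ratios (about 0.70), that matches the quoted value.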
4 Leakage of Privacy in Memory System

Most well-known procedures for the collection of volatile information aim at gathering the system state by taking a memory snapshot, which can reveal plentiful information. Interesting solutions to reach this goal have already been described in the standard literature [3]. It is possible, for example, to capture the full memory space of a particular process and look for sensitive pieces of information. To verify this, we analysed the memory space of an Internet Explorer process that had been used to connect to a secure server over an encrypted channel. Many tools can perform this task; we used one called Userdump [10]. By analysing the process memory dump with a simple keyword searching technique, we discovered the user ID and the password used to log on to the remote secure server. As shown in Figures 4 and 5, even though we logged on to the remote server over a secure encrypted channel, the user credentials (user ID and password) are necessarily loaded into main memory before being transferred to the server. This implies that these sensitive pieces of data remain unencrypted in main memory.

It is also possible to recover passwords from a PKI (Public Key Infrastructure) certificate used for authentication and encryption purposes. As shown in Figure 6, the password can indeed be recovered. Such pieces of information may be swapped out from main memory into the pagefile, compromising the user's privacy. Therefore, as suggested in our previous work on the pagefile [8], it would be advisable to adopt a system model that encrypts both main memory and the pagefile, in order to avoid any leakage of sensitive information.

[Figure 6. Results of a certificate (PKI) analysis. Recovered string:
()0021IB0001107798,ou=CHB,ou=personal,o=yessign,c=krcn=yessignCA,ou=LicensedCA,o=yessign,c=krCER.PASSWDSAMPLE]

Another very interesting source of information is the hibernation file [3] [2], which can be considered a full snapshot of the state of the computer system. When the Windows operating system goes into hibernation mode, the power manager saves the compressed content of physical memory, which preserves the state of the system, to a file called Hiberfil.sys in the root directory of the system volume. The file is large enough to hold the uncompressed contents of physical memory, but compression is used to minimize disk I/O and speed up the resume phase. Normally, during the boot process, if a valid Hiberfil.sys file is found, the Windows boot loader NTLDR loads the file's contents into physical memory and transfers control to the kernel, which resumes system operation after hibernation. Hence, this file can be used to carve out plenty of information useful in an investigative context. With the same approach used to recover pagefile.sys, we can collect Hiberfil.sys while preserving its digital integrity. This is therefore an interesting possibility for a digital investigation on a live system.

5 Conclusions and Future Works

We have presented an innovative tool capable of collecting the entire pagefile on a live Windows-based computer system. Such a tool is valuable for forensic practitioners looking for pieces of information that can facilitate and enhance an investigation. Moreover, we have discussed the NTFS structure and the main algorithm, describing the peculiarities of this filesystem that can be used to reconstruct the complete pagefile. Additionally, we have presented the results of the pagefile analysis, noting that the larger the pagefile, the more likely it is to contain sensitive pieces of information. Finally, we have outlined an example of leakage of private information in main memory. As future work, we would like to improve the memory analysis scenario by integrating the pagefile analysis into a volatile memory analysis framework. We will also extend the tool to determine which process a given memory page stored in the pagefile belongs to.

Acknowledgements. This work was supported by the IT R&D program of MIC/IITA [2007-S019-01] (Development of Digital Forensic System for Information Transparency).

References

[1] B. Carrier. File System Forensic Analysis. Addison-Wesley, 2005.
[2] B. Carrier. Filesystem Forensics Analysis. Addison-Wesley, Inc., 2005.
[3] H. Carvey. Windows Forensics Analysis DVD Toolkit. Addison-Wesley, Inc., 2005.
[4] W. Chisum and B. Turvey. Evidence dynamics: Locard's exchange principle and crime reconstruction. Journal of Behavioral Profiling, 1, 2000.
[5] K. Kendall and J. Kornblum. Foremost, 2001. Software available at: http://foremost.sourceforge.net/.
[6] J. Kornblum. Using every part of the buffalo in Windows memory analysis. Digital Investigation, 4:24–29, 2007.
[7] S. Lee, H. Kim, S. Lee, and J. Lim. Digital evidence collection process in integrity and memory information gathering. In Proceedings of Systematic Approaches to Digital Forensic Engineering, First International Workshop, Proc. IEEE, pages 236–247, 2005.
[8] S. Lee, A. Savoldi, S. Lee, and J. Lim. Password recovery using an evidence collection tool and countermeasures. To appear in Proceedings of Intelligent Information Hiding and Multimedia Signal Processing, Proc. IEEE, 2007.
[9] K. Li and P. Hudak. Memory coherence in shared virtual memory systems. ACM Transactions on Computer Systems, pages 321–359, 1989.
[10] Microsoft. User Mode Process Dumper Version 8.1, 2007. Software available at: http://support.microsoft.com/kb/241215.
[11] N. Petroni, A. Walters, T. Fraser, and W. Arbaugh. FATKit: A framework for the extraction and analysis of digital forensic data from volatile system memory. Digital Investigation, 3:197–210, 2006.
[12] S. Rekhis and N. Boudriga. Formal forensic investigation eluding disk-based anti-forensic attacks. In Workshop on Information Security Applications, LNCS Series, Springer-Verlag, 2005.
[13] G. Richard. Scalpel: A frugal, high performance file carver. In Proceedings of Digital Forensics Research Workshop, Proc. IEEE, 2005.
[14] J. Rutkowska. Beyond The CPU: Defeating Hardware Based RAM Acquisition Tools (Part I: AMD case). Black Hat Conference DC, 2007.
[15] A. Silberschatz and P. Galvin. Operating System Concepts. Syngress Publishing, Inc., 2007.