ftguess
decalage2 edited this page Mar 26, 2026 · 1 revision
ftguess is a Python module to determine the type of a file based on its contents (not its extension or filename). It can be used as a command-line tool or as a Python library.
It is part of the python-oletools package.
- Identifies the file type and container format from file content (magic bytes, internal structure)
- Recognises OLE-based formats (Word 97-2003, Excel 97-2003, PowerPoint 97-2003, MSI, ...)
- Recognises OpenXML/ZIP-based formats (Word 2007+, Excel 2007+, PowerPoint 2007+, XPS, ...)
- Recognises RTF, OneNote, PNG, Windows PE executables, and generic ZIP archives
- Reports the application, container type, file type, MIME content-type, and PRONOM PUID
- For OLE files, reports the root CLSID and its known name
- Supports scanning multiple files, recursive directory traversal, and files inside zip archives
- Can be used as a Python library from your own applications
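
To make "identifies the file type from content" concrete, here is a minimal sketch of signature-based sniffing using only the standard library. It is not how ftguess is implemented: the `SIGNATURES` table and the `sniff_container` helper are hypothetical names for illustration, and ftguess also inspects internal structure, which this sketch does not.

```python
# Illustrative signature table; this is NOT the table used by ftguess itself.
SIGNATURES = [
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE/CFB"),
    (b"PK\x03\x04", "ZIP (possibly OpenXML)"),
    (b"{\\rtf", "RTF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"MZ", "MZ executable (possibly PE)"),
]


def sniff_container(data: bytes) -> str:
    """Return a coarse label based only on the leading bytes of data."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "Unknown"


# The file name plays no part: only the content is examined.
print(sniff_container(b"{\\rtf1\\ansi Hello}"))
```

A leading-bytes check like this cannot tell a .docx from a plain .zip, which is why a tool such as ftguess goes further and looks inside the container.
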
| File Type | Extensions |
|---|---|
| MS Word 6-7 | .doc, .dot |
| MS Word 97-2003 | .doc, .dot |
| MS Word 2007+ Document | .docx |
| MS Word 2007+ Macro-Enabled Document | .docm |
| MS Word 2007+ Template | .dotx |
| MS Word 2007+ Macro-Enabled Template | .dotm |
| MS Excel 5.0/95 | .xls, .xlt, .xla |
| MS Excel 97-2003 | .xls, .xlt, .xla |
| MS Excel 2007+ Workbook | .xlsx |
| MS Excel 2007+ Macro-Enabled Workbook | .xlsm |
| MS Excel 2007+ Binary Workbook | .xlsb |
| MS Excel 2007+ Template | .xltx |
| MS Excel 2007+ Macro-Enabled Template | .xltm |
| MS Excel 2007+ Macro-Enabled Add-in | .xlam |
| MS PowerPoint 97-2003 | .ppt, .pps, .pot |
| MS PowerPoint 2007+ Presentation | .pptx |
| MS PowerPoint 2007+ Slideshow | .ppsx |
| MS PowerPoint 2007+ Macro-Enabled Presentation | .pptm |
| MS PowerPoint 2007+ Macro-Enabled Slideshow | .ppsm |
| MS OneNote | .one |
| Windows Installer | .msi |
| XPS | .xps |
| RTF | .rtf, .doc |
| PNG | .png |
| Windows PE Executable / DLL | .exe, .dll, .sys, .scr |
| Generic ZIP Archive | .zip |
| Generic OLE/CFB File | — |
```text
ftguess [options] <filename> [filename2 ...]

  -r           find files recursively in subdirectories
  -z PASSWORD  if the file is a zip archive, open first file from it
               using the provided password
  -f PATTERN   if the file is a zip archive, file(s) to open within it
               (wildcards * and ? supported, default: *)
  -l LEVEL     logging level: debug/info/warning/error/critical
               (default: warning)
```
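
As a rough illustration of how the `*` and `?` wildcards of `-f PATTERN` select members of a zip archive, the sketch below uses the standard library `fnmatch` and `zipfile` modules. It is not ftguess's own code: `matching_members` is a hypothetical helper, and ftguess's exact matching rules may differ (for example, `fnmatch` also accepts `[seq]` character classes). Password handling (`-z`) is left out.

```python
import fnmatch
import io
import zipfile


def matching_members(zip_bytes: bytes, pattern: str = "*") -> list:
    """Return the names of archive members matching a wildcard pattern."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return [n for n in zf.namelist() if fnmatch.fnmatchcase(n, pattern)]


# Build a small in-memory archive to try the patterns on.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("sample1.doc", b"a")
    zf.writestr("sample2.doc", b"b")
    zf.writestr("notes.txt", b"c")
archive = buf.getvalue()

print(matching_members(archive, "*.doc"))        # every .doc member
print(matching_members(archive, "sample?.doc"))  # ? matches one character
print(matching_members(archive))                 # default pattern: everything
```
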
```text
$ ftguess sample.docx
ftguess 0.60.2 on Python 3.10.0 - http://decalage.info/python/oletools
THIS IS WORK IN PROGRESS - Check updates regularly!

File : sample.docx
File Type : MS Word 2007+ Document
Description: MS Word 2007+ Document (.docx)
Application: MS Word
Container : OpenXML
Content-type(s) :
PUID : None
```
Import FileTypeGuesser and pass either a file path or raw bytes:
```python
from oletools.ftguess import FileTypeGuesser, FTYPE, CONTAINER, APP

# From a file path:
ftg = FileTypeGuesser(filepath='document.docx')

# From bytes in memory:
with open('document.docx', 'rb') as f:
    data = f.read()
ftg = FileTypeGuesser(data=data)

# Always close when done:
ftg.close()
```

| Attribute | Description |
|---|---|
| `ftg.ftype` | The matched `FType_*` class (see constants below) |
| `ftg.filetype` | String constant from `FTYPE`, e.g. `'Word2007_DOCX'` |
| `ftg.container` | String constant from `CONTAINER`, e.g. `'OpenXML'` |
| `ftg.application` | String constant from `APP`, e.g. `'MS Word'` |
| `ftg.root_clsid` | Root CLSID for OLE files (string), or `None` |
| `ftg.root_clsid_name` | Human-readable name for the root CLSID, or `None` |
| `ftg.main_part_content_type` | Content-type of the main part for OpenXML files, or `None` |
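
Because the guesser should always be closed, one convenient pattern is to wrap it in `contextlib.closing` and copy the attributes you need before leaving the block. The following is a minimal sketch under that assumption: `FIELDS`, `snapshot` and `guess_file` are hypothetical helper names, not part of oletools, and the import is guarded so the snippet still loads when oletools is not installed.

```python
from contextlib import closing

try:
    from oletools.ftguess import FileTypeGuesser
except ImportError:  # oletools is not installed in this environment
    FileTypeGuesser = None

# Attribute names taken from the table above.
FIELDS = ("filetype", "container", "application",
          "root_clsid", "root_clsid_name", "main_part_content_type")


def snapshot(ftg) -> dict:
    """Copy the documented attributes of a guesser-like object into a dict."""
    return {name: getattr(ftg, name, None) for name in FIELDS}


def guess_file(path: str) -> dict:
    """Guess the type of the file at path and close the guesser afterwards."""
    if FileTypeGuesser is None:
        raise RuntimeError("oletools is required for guess_file()")
    # closing() calls ftg.close() even if snapshot() raises.
    with closing(FileTypeGuesser(filepath=path)) as ftg:
        return snapshot(ftg)
```

With oletools installed, `guess_file('document.docx')` returns a plain dict that stays usable after the guesser has been closed.
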
| Method | Returns True if… |
|---|---|
| `ftg.is_ole()` | The container is OLE/CFB |
| `ftg.is_openxml()` | The container is OpenXML (ZIP-based) |
| `ftg.is_word()` | The file is any MS Word format |
| `ftg.is_excel()` | The file is any MS Excel format |
| `ftg.is_powerpoint()` | The file is any MS PowerPoint format |
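
The helper methods combine naturally into a small triage function. The sketch below is only an illustration: `office_family` and `container_family` are hypothetical names, and they accept any object that offers the methods listed in the table, such as a `FileTypeGuesser` instance.

```python
def office_family(ftg) -> str:
    """Return 'Word', 'Excel', 'PowerPoint' or 'Other' for a guesser-like object."""
    checks = (
        ("Word", ftg.is_word),
        ("Excel", ftg.is_excel),
        ("PowerPoint", ftg.is_powerpoint),
    )
    for label, check in checks:
        if check():
            return label
    return "Other"


def container_family(ftg) -> str:
    """Return 'OLE', 'OpenXML' or 'Other' depending on the container."""
    if ftg.is_ole():
        return "OLE"
    if ftg.is_openxml():
        return "OpenXML"
    return "Other"
```
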
```python
from oletools.ftguess import FileTypeGuesser, FTYPE, CONTAINER

ftg = FileTypeGuesser(filepath='sample.doc')

if ftg.is_ole():
    print('OLE container detected')

if ftg.root_clsid:
    print('Root CLSID:', ftg.root_clsid, '-', ftg.root_clsid_name)

if ftg.filetype == FTYPE.WORD97:
    print('This is a Word 97-2003 document, may contain VBA macros')

if ftg.ftype.may_contain_vba:
    print('This file type may contain VBA macros')

ftg.close()
```

Each `FType_*` class (accessible via `ftg.ftype`) exposes:
| Attribute | Description |
|---|---|
| `name` | Short name of the file type |
| `longname` | Full descriptive name |
| `extensions` | List of typical file extensions |
| `content_types` | List of MIME content-types |
| `PUID` | PRONOM Unique ID |
| `may_contain_vba` | True if VBA macros are possible in this format |
| `may_contain_xlm` | True if XLM (Excel 4) macros are possible |
| `may_contain_ole` | True if embedded OLE objects are possible |
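
The three `may_contain_*` flags can be turned into a short, readable list of what a file type might carry. This is a minimal sketch: `FLAGS` and `possible_active_content` are hypothetical names, and the function works on any object exposing the attributes above, such as `ftg.ftype`.

```python
# Maps the documented flag attributes to a human-readable label.
FLAGS = {
    "may_contain_vba": "VBA macros",
    "may_contain_xlm": "XLM (Excel 4) macros",
    "may_contain_ole": "embedded OLE objects",
}


def possible_active_content(ftype) -> list:
    """List the kinds of active content that the given file type may contain."""
    # A missing attribute is treated as False rather than raising.
    return [label for attr, label in FLAGS.items() if getattr(ftype, attr, False)]
```

For example, `possible_active_content(ftg.ftype)` could be used to decide which further analysis tools to run on a file.
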
Use the FTYPE, CONTAINER, and APP classes for comparisons:
```python
from oletools.ftguess import FTYPE, CONTAINER, APP

# FTYPE examples: FTYPE.WORD97, FTYPE.EXCEL2007_XLSM, FTYPE.RTF, FTYPE.UNKNOWN, ...
# CONTAINER examples: CONTAINER.OLE, CONTAINER.OpenXML, CONTAINER.RTF, CONTAINER.ZIP, ...
# APP examples: APP.MSWORD, APP.MSEXCEL, APP.MSPOWERPOINT, APP.UNKNOWN, ...
```