A powerful web-based tool for extracting hidden metadata from online documents. Perfect for digital investigators, journalists, and security professionals.
- Overview
- Importance in OSINT & Cybersecurity
- Features
- Installation
- Usage
- Document Type Examples
- API Reference
- Contributing
- License
NFOMINER automates the process of extracting hidden metadata from various document types. Simply provide a URL, and the tool downloads, analyzes, and presents all available metadata in an organized format.
| Use Case | Importance |
|---|---|
| Author Identification | Discover who created a document and their contact information |
| Timeline Analysis | Establish creation and modification timelines for evidence |
| Document Authenticity | Verify if documents have been tampered with or modified |
| Source Verification | Confirm the origin of leaked or suspicious documents |
- Threat Intelligence: Analyze malicious documents for attacker fingerprints
- Incident Response: Trace the origin of data breaches through document metadata
- Forensic Analysis: Extract evidence from documents in cybercrime cases
- Policy Enforcement: Detect policy violations through unauthorized document sharing
- Journalism: Verify source documents and whistleblower submissions
- Law Enforcement: Gather evidence from digital documents in investigations
- Corporate Security: Monitor for unauthorized information sharing
- Academic Research: Ensure proper attribution and source verification
- Multi-Format Support: PDF, Word, Excel, Images, and more
- Web-Based Interface: Easy-to-use web application
- Bulk Analysis: Process multiple documents simultaneously
- Export Capabilities: Download results in CSV/JSON format
- REST API: Programmatic access for automation
- Self-Hosted: Complete control over your data and analysis
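The CSV/JSON export can be pictured as serializing one metadata record per analyzed document. The sketch below uses only the standard library; the function names `to_json` and `to_csv` and the record layout are assumptions made for illustration, not NFOMINER's actual export code.

```python
import csv
import io
import json


def to_json(records):
    """Serialize a list of metadata dicts as pretty-printed JSON."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def to_csv(records):
    """Serialize metadata dicts as CSV, using the union of all keys as columns."""
    # Keep keys in first-seen order so the column layout is stable.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical records; real field names depend on the document type.
records = [
    {"url": "https://example.com/a.pdf", "Author": "Author Name"},
    {"url": "https://example.com/b.docx", "Author": "John Doe", "Revision": "3"},
]
csv_text = to_csv(records)
json_text = to_json(records)
```

Documents of different types expose different fields, so the CSV writer fills missing cells with an empty string instead of failing.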
| Format | Metadata Extracted |
|---|---|
| PDF | Author, Creator, Producer, Creation Date, Modification Date, Keywords |
| DOCX | Author, Created Date, Modified Date, Last Modified By, Revision, Title |
| XLSX | Company, Manager, Application, Version, Various custom properties |
| Images | EXIF data, GPS coordinates, Camera make/model, Software used |
| Legacy Office | Title, Subject, Author, Comments, Revision History |
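The DOCX row in the table above corresponds to the core properties stored in `docProps/core.xml` inside the file's ZIP container. As a rough illustration of where those values come from, here is a standard-library sketch; the dependency list suggests the tool itself may rely on python-docx, so treat `read_core_properties` and the `FIELDS` mapping as hypothetical names rather than the tool's code.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by the OOXML core-properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Display label -> element name in docProps/core.xml.
FIELDS = {
    "Author": "dc:creator",
    "Title": "dc:title",
    "Last Modified By": "cp:lastModifiedBy",
    "Revision": "cp:revision",
    "Created": "dcterms:created",
    "Modified": "dcterms:modified",
}


def read_core_properties(data: bytes) -> dict:
    """Return the core properties present in a DOCX/XLSX file given as bytes.

    Raises KeyError if the archive has no docProps/core.xml part.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    result = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            result[label] = element.text
    return result
```

Because XLSX files use the same container layout, the same function should also return the core properties of a spreadsheet; fields such as Company and Application live in a separate part, `docProps/app.xml`.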
- Python 3.8 or higher
- pip package manager
Copy the script from the gist below and save it locally as nfominer.py, for example with nano: https://gist.github.com/techenthusiast167/15538980658e65fef95f3c3e838ce196
Example:
- nano nfominer.py
- Paste the script, then press Ctrl + O, Enter, and Ctrl + X to save and exit
Create and activate a virtual environment first:
virtualenv my_temp_venv
source my_temp_venv/bin/activate
Then install the dependencies inside it:
pip install flask requests PyPDF2 python-docx olefile Pillow exifread pandas
Web Interface:
- Start the server: python nfominer.py
- Open your browser: navigate to http://localhost:5000
- Enter the URL: paste the document URL and select the file type
- Analyze: click "Extract Metadata" to view the results
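If you are unsure which file type to select, the first bytes of the downloaded file usually settle it. The sketch below checks well-known file signatures; `detect_format` and its return labels are hypothetical helpers for illustration, and nothing here implies that NFOMINER performs this detection itself.

```python
import io
import zipfile

# Signature of the OLE2 compound file format used by legacy DOC/XLS files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def detect_format(data: bytes) -> str:
    """Guess a document type from its leading bytes."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(OLE2_MAGIC):
        return "legacy-office"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"PK\x03\x04"):
        # DOCX and XLSX are ZIP archives; tell them apart by their folders.
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = archive.namelist()
        except zipfile.BadZipFile:
            return "zip"
        if any(name.startswith("word/") for name in names):
            return "docx"
        if any(name.startswith("xl/") for name in names):
            return "xlsx"
        return "zip"
    return "unknown"
```

Checking content rather than the URL's extension helps when a link has no extension or a misleading one.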
1. PDF Document Analysis
Example URL: https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf
Extracted Metadata:
"Title": "Dummy PDF file", "Author": "Author Name", "Creator": "PDFKit.NET", "Producer": "PDFKit.NET 3.0 for .NET", "CreationDate": "D:20220115103425+01'00'", "ModDate": "D:20220115103425+01'00'", "Keywords": "dummy, test, pdf"
2. Word Document (DOCX) Analysis
Example URL: https://calibre-ebook.com/downloads/demos/demo.docx
Extracted Metadata:
"Author": "John Doe", "Created": "2022-03-15 14:30:45+00:00", "Modified": "2022-03-20 09:15:22+00:00", "Last Modified By": "Jane Smith", "Revision": "3", "Title": "Quarterly Report Q1 2022", "Company": "Acme Corporation"
3. Image Metadata Extraction
Example URL: https://example.com/sample-image.jpg
Extracted EXIF Data:
"Image Make": "Canon", "Image Model": "Canon EOS 5D Mark IV", "DateTimeOriginal": "2023:06:15 14:35:22", "GPSLatitude": "34.0522", "GPSLongitude": "-118.2437", "Software": "Adobe Photoshop 2023", "Artist": "Photographer Name"
OSINT Value: Geolocation intelligence, device fingerprinting, photographer identification
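EXIF stores GPS positions as degree/minute/second rationals plus a hemisphere reference (N/S or E/W), so decimal values like the ones shown above come from a conversion step. A minimal sketch of that arithmetic, with `dms_to_decimal` as a hypothetical helper and input values chosen only to reproduce the example coordinates:

```python
from fractions import Fraction


def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style degrees/minutes/seconds to signed decimal degrees."""
    value = Fraction(degrees) + Fraction(minutes) / 60 + Fraction(seconds) / 3600
    # Southern and western hemispheres are negative in decimal notation.
    if ref.upper() in ("S", "W"):
        value = -value
    return round(float(value), 6)


latitude = dms_to_decimal(34, 3, Fraction(792, 100), "N")
longitude = dms_to_decimal(118, 14, Fraction(3732, 100), "W")
print(latitude, longitude)  # 34.0522 -118.2437
```

Using `Fraction` keeps the rational values exact until the final conversion to a float.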
4. Excel Spreadsheet Analysis
Example URL: https://file-examples.com/wp-content/storage/2017/02/file_example_XLSX_10.xlsx
Extracted Metadata:
"Author": "Data Analyst Team", "Last Modified By": "mary.johnson@company.com", "Company": "Global Solutions Inc.", "Application": "Microsoft Excel", "Created": "2021-11-10 08:45:00+00:00"
OSINT Value: Email discovery, organizational structure, business context
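Email discovery of this kind can be automated by scanning every metadata value for address-like strings. The sketch below assumes the metadata is available as a plain dict; `find_emails` and the deliberately simple pattern are illustrative and will not match every valid address.

```python
import re

# Simple pattern for common address shapes; not a full RFC 5322 matcher.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(metadata: dict) -> dict:
    """Map each email address found to the metadata fields it appeared in."""
    hits = {}
    for field, value in metadata.items():
        for address in EMAIL_PATTERN.findall(str(value)):
            hits.setdefault(address.lower(), []).append(field)
    return hits


metadata = {
    "Author": "Data Analyst Team",
    "Last Modified By": "mary.johnson@company.com",
    "Company": "Global Solutions Inc.",
}
print(find_emails(metadata))
# {'mary.johnson@company.com': ['Last Modified By']}
```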
5. Older Office Documents (DOC, XLS)
Example URL: https://www.example.com/documents/old-report.doc
Extracted Metadata:
"Title": "Confidential Market Analysis", "Subject": "Competitor Research", "Author": "John Analyst", "Keywords": "competitor, market, research, 2020", "Comments": "For internal use only - do not distribute", "Last Author": "Sarah Manager", "Revision Number": "5", "Application": "Microsoft Word 2010", "Create Time/Date": "2020-08-12 10:30:00", "Last Save Time/Date": "2020-08-15 16:45:00", "Template": "Normal.dotm", "Company": "Strategic Insights Ltd."
OSINT Value:
- Sensitivity indicators: "Confidential", "do not distribute"
- Workflow analysis: multiple revisions and authors
- Temporal analysis: document creation and modification timeline
- Organizational context: company name and employee roles
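Two of these checks are easy to script once the metadata is a dict: flagging sensitivity wording and measuring the time between creation and last save. The sketch below is illustrative only; the term list, the function names, and the timestamp format are assumptions based on the example output above.

```python
from datetime import datetime

# Hypothetical watch list; extend it to match your own policy wording.
SENSITIVE_TERMS = ("confidential", "do not distribute", "internal use only")


def sensitivity_flags(metadata):
    """Return (field, term) pairs for every sensitive term found."""
    flags = []
    for field, value in metadata.items():
        text = str(value).lower()
        for term in SENSITIVE_TERMS:
            if term in text:
                flags.append((field, term))
    return flags


def editing_window(metadata, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the time elapsed between creation and last save."""
    created = datetime.strptime(metadata["Create Time/Date"], fmt)
    saved = datetime.strptime(metadata["Last Save Time/Date"], fmt)
    return saved - created


metadata = {
    "Title": "Confidential Market Analysis",
    "Comments": "For internal use only - do not distribute",
    "Create Time/Date": "2020-08-12 10:30:00",
    "Last Save Time/Date": "2020-08-15 16:45:00",
}
print(sensitivity_flags(metadata))
print(editing_window(metadata))  # 3 days, 6:15:00
```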
Copyright (c) 2024 Tech Enthusiast
Permission is hereby granted...