
This tool extracts hidden metadata from online documents (PDF, Word, Excel, Images) to reveal information about the document's origin, author, and creation details.


techenthusiast167/NFOMINER


NFOMINER - Document Metadata Extraction Tool


A powerful web-based tool for extracting hidden metadata from online documents. Perfect for digital investigators, journalists, and security professionals.



Overview

NFOMINER automates the process of extracting hidden metadata from various document types. Simply provide a URL, and the tool downloads, analyzes, and presents all available metadata in an organized format.

Importance in OSINT & Cybersecurity

Digital Investigations

| Use Case | Importance |
| --- | --- |
| Author Identification | Discover who created a document and their contact information |
| Timeline Analysis | Establish creation and modification timelines for evidence |
| Document Authenticity | Verify if documents have been tampered with or modified |
| Source Verification | Confirm the origin of leaked or suspicious documents |

Cybersecurity Applications

  • Threat Intelligence: Analyze malicious documents for attacker fingerprints
  • Incident Response: Trace the origin of data breaches through document metadata
  • Forensic Analysis: Extract evidence from documents in cybercrime cases
  • Policy Enforcement: Detect policy violations through unauthorized document sharing

Real-World Impact

  • Journalism: Verify source documents and whistleblower submissions
  • Law Enforcement: Gather evidence from digital documents in investigations
  • Corporate Security: Monitor for unauthorized information sharing
  • Academic Research: Ensure proper attribution and source verification

Features

  • Multi-Format Support: PDF, Word, Excel, Images, and more
  • Web-Based Interface: Easy-to-use web application
  • Bulk Analysis: Process multiple documents simultaneously
  • Export Capabilities: Download results in CSV/JSON format
  • REST API: Programmatic access for automation
  • Self-Hosted: Complete control over your data and analysis
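
As an illustration of the CSV/JSON export feature, here is a minimal sketch that uses only the Python standard library. The function names `to_json` and `to_csv` and the record layout (one dict per analyzed document) are assumptions made for this example; they are not taken from the NFOMINER script.

```python
import csv
import io
import json


def to_json(records):
    """Serialize a list of metadata dicts as pretty-printed JSON."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def to_csv(records):
    """Serialize metadata dicts as CSV; the header is the union of all keys."""
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    # restval fills in fields that a given document does not have.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical results for two analyzed documents.
results = [
    {"url": "https://example.com/a.pdf", "Author": "Author Name", "Producer": "PDFKit.NET"},
    {"url": "https://example.com/b.docx", "Author": "John Doe", "Revision": "3"},
]

print(to_csv(results))
print(to_json(results))
```

Because different formats expose different fields, the CSV header is built from every key seen across all records, and missing values are left empty.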

Supported Document Types

| Format | Metadata Extracted |
| --- | --- |
| PDF | Author, Creator, Producer, Creation Date, Modification Date, Keywords |
| DOCX | Author, Created Date, Modified Date, Last Modified By, Revision, Title |
| XLSX | Company, Manager, Application, Version, various custom properties |
| Images | EXIF data, GPS coordinates, Camera make/model, Software used |
| Legacy Office | Title, Subject, Author, Comments, Revision History |
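
The formats in this table can usually be told apart by their leading bytes. This is useful as a cross-check when a URL's extension is missing or misleading. The sketch below is an assumption about how such a check could look, not a description of how NFOMINER detects file types; `sniff_format` and its labels are hypothetical names.

```python
# Well-known file signatures ("magic bytes") for the supported formats.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "ooxml-or-zip"),  # DOCX and XLSX are ZIP containers
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "legacy-office"),  # OLE2: DOC, XLS
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
]


def sniff_format(data: bytes) -> str:
    """Return a coarse format label based on the first bytes of a file."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


print(sniff_format(b"%PDF-1.7\n..."))      # pdf
print(sniff_format(b"PK\x03\x04rest"))     # ooxml-or-zip
print(sniff_format(b"plain text"))         # unknown
```

A ZIP signature alone does not distinguish DOCX from XLSX. Telling them apart requires looking inside the archive, for example for `word/` or `xl/` entries.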

Installation

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Quick Start

Manual Installation

Copy the script from the gist below and save it locally as nfominer.py, for example with nano: https://gist.github.com/techenthusiast167/15538980658e65fef95f3c3e838ce196

Example:

  • nano nfominer.py
  • Paste the script, then press Ctrl + O, Enter, Ctrl + X to save and exit

Install dependencies

pip install flask requests PyPDF2 python-docx olefile Pillow exifread pandas

Usage

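Create and activate a virtual environment (do this before installing the dependencies so that they are installed into it; python3 -m venv my_temp_venv also works if virtualenv is not installed):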

 virtualenv my_temp_venv
 source my_temp_venv/bin/activate 

Web Interface:

  • Start the server: python nfominer.py

  • Open browser: Navigate to http://localhost:5000

  • Enter URL: Paste document URL and select file type

  • Analyze: Click "Extract Metadata" to view results


Document Type Examples

1. PDF Document Analysis

Example URL: https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf

Extracted Metadata:

{
  "Title": "Dummy PDF file",
  "Author": "Author Name",
  "Creator": "PDFKit.NET",
  "Producer": "PDFKit.NET 3.0 for .NET",
  "CreationDate": "D:20220115103425+01'00'",
  "ModDate": "D:20220115103425+01'00'",
  "Keywords": "dummy, test, pdf"
}

OSINT Value: Author identification, software fingerprint, timeline analysis
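
For timeline analysis, PDF date strings such as `D:20220115103425+01'00'` need to be converted into real timestamps. The following sketch parses the common `D:YYYYMMDDHHmmSS` form with an optional UTC offset. `parse_pdf_date` is a hypothetical helper written for this example, not part of NFOMINER or PyPDF2.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYY[MM[DD[HH[mm[SS]]]]] followed by an optional Z or +HH'mm' / -HH'mm' offset.
PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+\-])(?:(\d{2})'?(\d{2})?'?)?)?$"
)


def parse_pdf_date(value):
    """Return a datetime for a PDF date string, or None if it cannot be parsed."""
    match = PDF_DATE.match(value.strip())
    if match is None:
        return None
    year, month, day, hour, minute, second, sign, off_h, off_m = match.groups()
    tz = None  # stays naive when the string carries no offset
    if sign in ("Z", "z"):
        tz = timezone.utc
    elif sign in ("+", "-"):
        delta = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
        tz = timezone(delta if sign == "+" else -delta)
    try:
        return datetime(
            int(year), int(month or 1), int(day or 1),
            int(hour or 0), int(minute or 0), int(second or 0),
            tzinfo=tz,
        )
    except ValueError:
        # Out-of-range components, e.g. month 13.
        return None


created = parse_pdf_date("D:20220115103425+01'00'")
print(created.isoformat())  # 2022-01-15T10:34:25+01:00
```

Once the values are timezone-aware `datetime` objects, creation and modification times from different documents can be sorted and compared directly.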

2. Word Document (DOCX) Analysis

Example URL: https://calibre-ebook.com/downloads/demos/demo.docx

Extracted Metadata:

{
  "Author": "John Doe",
  "Created": "2022-03-15 14:30:45+00:00",
  "Modified": "2022-03-20 09:15:22+00:00",
  "Last Modified By": "Jane Smith",
  "Revision": "3",
  "Title": "Quarterly Report Q1 2022",
  "Company": "Acme Corporation"
}

OSINT Value: Organizational intelligence, workflow analysis, employee identification
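
A DOCX file is a ZIP archive, and most of the fields above live in its `docProps/core.xml` part. The sketch below reads them with the standard library only. It is an illustration under that assumption and not the tool's own implementation (the dependency list suggests NFOMINER relies on python-docx); `read_core_properties` and `FIELDS` are names invented for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the OOXML core properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Display label -> element name inside docProps/core.xml.
FIELDS = {
    "Author": "dc:creator",
    "Title": "dc:title",
    "Last Modified By": "cp:lastModifiedBy",
    "Revision": "cp:revision",
    "Created": "dcterms:created",
    "Modified": "dcterms:modified",
}


def read_core_properties(docx_bytes):
    """Return the core properties found in a DOCX (or XLSX) file as a dict."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        try:
            xml_data = archive.read("docProps/core.xml")
        except KeyError:
            return {}  # the archive has no core properties part
    # Note: for untrusted input, a hardened XML parser is a safer choice.
    root = ET.fromstring(xml_data)
    result = {}
    for label, path in FIELDS.items():
        node = root.find(path, NS)
        if node is not None and node.text:
            result[label] = node.text
    return result
```

Typical use would be `read_core_properties(open("demo.docx", "rb").read())`. The Company field is not part of `core.xml`; it is stored in the extended properties part, `docProps/app.xml`.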

3. Image Metadata Extraction

Example URL: https://example.com/sample-image.jpg

Extracted EXIF Data:

{
  "Image Make": "Canon",
  "Image Model": "Canon EOS 5D Mark IV",
  "DateTimeOriginal": "2023:06:15 14:35:22",
  "GPSLatitude": "34.0522",
  "GPSLongitude": "-118.2437",
  "Software": "Adobe Photoshop 2023",
  "Artist": "Photographer Name"
}

OSINT Value: Geolocation intelligence, device fingerprinting, photographer identification
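
EXIF stores GPS positions as degrees, minutes, and seconds (each a rational number) plus a hemisphere reference, so a conversion step is needed to get decimal coordinates like the ones shown above. The sketch below assumes the raw values are already available as `(numerator, denominator)` pairs. `dms_to_decimal` is a hypothetical helper, and the sample input is constructed to match the example coordinates.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF degrees/minutes/seconds rationals to signed decimal degrees.

    dms is a sequence of three (numerator, denominator) pairs;
    ref is the hemisphere letter: "N", "S", "E" or "W".
    """
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    if ref.upper() in ("S", "W"):
        decimal = -decimal  # south and west are negative
    return round(float(decimal), 6)


lat = dms_to_decimal([(34, 1), (3, 1), (792, 100)], "N")
lon = dms_to_decimal([(118, 1), (14, 1), (3732, 100)], "W")
print(lat, lon)  # 34.0522 -118.2437
```

Using `Fraction` keeps the arithmetic exact until the final conversion to `float`, which avoids small rounding differences between tools.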


4. Excel Spreadsheet Analysis

Example URL: https://file-examples.com/wp-content/storage/2017/02/file_example_XLSX_10.xlsx

Extracted Metadata:

{
  "Author": "Data Analyst Team",
  "Last Modified By": "mary.johnson@company.com",
  "Company": "Global Solutions Inc.",
  "Application": "Microsoft Excel",
  "Created": "2021-11-10 08:45:00+00:00"
}

OSINT Value: Email discovery, organizational structure, business context
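
Email discovery can be automated by scanning every extracted value for address-like strings. The sketch below uses a deliberately simple pattern that will miss some valid addresses and accept some invalid ones. `find_emails` is a hypothetical helper, and the sample dict mirrors the example output above.

```python
import re

# Pragmatic pattern for address-like strings; not a full RFC 5322 validator.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(metadata):
    """Map each metadata field to the email addresses found in its value."""
    hits = {}
    for field, value in metadata.items():
        found = EMAIL.findall(str(value))
        if found:
            hits[field] = found
    return hits


sample = {
    "Author": "Data Analyst Team",
    "Last Modified By": "mary.johnson@company.com",
    "Company": "Global Solutions Inc.",
    "Application": "Microsoft Excel",
}
print(find_emails(sample))  # {'Last Modified By': ['mary.johnson@company.com']}
```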


5. Older Office Documents (DOC, XLS)

Example URL: https://www.example.com/documents/old-report.doc

Extracted Metadata:

{
  "Title": "Confidential Market Analysis",
  "Subject": "Competitor Research",
  "Author": "John Analyst",
  "Keywords": "competitor, market, research, 2020",
  "Comments": "For internal use only - do not distribute",
  "Last Author": "Sarah Manager",
  "Revision Number": "5",
  "Application": "Microsoft Word 2010",
  "Create Time/Date": "2020-08-12 10:30:00",
  "Last Save Time/Date": "2020-08-15 16:45:00",
  "Template": "Normal.dotm",
  "Company": "Strategic Insights Ltd."
}

OSINT Value:

  • Sensitivity indicators: "Confidential", "do not distribute"
  • Workflow analysis: multiple revisions and authors
  • Temporal analysis: document creation and modification timeline
  • Organizational context: company name and employee roles


MIT License

  • Copyright (c) 2024 Tech Enthusiast

  • Permission is hereby granted...
