TransformoDocs: Smart Doc Automation

Uploaded by

itfw2205


SMART INDIA HACKATHON 2024

• Organisation/Ministry - Ministry of Electronics and Information Technology

• Problem Statement ID - SIH1669

• Problem Statement Title - Transformo Docs Application: Empowering Machine-Readable Document Management System.

• Theme - Smart Automation

• Team ID - 8471

• Institute - G.L Bajaj Institute of Technology and Management, Greater Noida, UP

• Institute code (AISHE) - C-46239

• Team Name - CodeXplorers

• Team leader name - Shubhanshu Omer

CodeXplorers IDEA/APPROACH DETAILS

IDEA/SOLUTION:
TransformoDocs is a comprehensive document transformation application designed to restrict non-machine-readable documents and automatically generate machine-readable formats, enhancing automation, accessibility, and data extraction [1].

❖ Document Ingestion Restriction: Blocks non-machine-readable formats (e.g., PDFs, DOCs) from being processed by the system [1,3].
❖ Automatic Conversion: Automatically converts any document (scanned, generated by software, or uploaded) into a machine-readable format [2,3].
❖ Enhanced Searchability: Allows documents to be indexed and searched effectively, enhancing accessibility and efficiency [5,6].

Problem Resolution:
❖ Apply OCR and NLP to turn scanned images and text into structured formats (XML, JSON, CSV) for automated data processing and integration [4,6].
❖ Use a validation engine to filter out non-compliant formats and accept only machine-readable documents.

Unique Value Propositions (UVP):
❖ Dual Functionality: This not only blocks non-machine-readable formats but also converts them into readable documents seamlessly [3,4].
❖ Universal Compatibility: This handles scanned images, PDFs, and software documents, easily integrating into any workflow [2,6].

@SIH Idea submission- Template
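
The ingestion restriction and dual-functionality points above can be illustrated with a small triage step. This is a minimal sketch, not the team's implementation: the function name `triage`, the `Decision` type, and both extension sets are hypothetical, and a real validation engine would likely inspect file contents rather than trust the extension.

```python
from dataclasses import dataclass
from pathlib import Path

# Assumption: XML, JSON and CSV (the structured formats named in the deck)
# are treated as already machine-readable.
MACHINE_READABLE = {".xml", ".json", ".csv"}
# Assumption: illustrative set of formats that need conversion first
# (PDFs, DOCs and scanned images, as mentioned in the deck).
CONVERTIBLE = {".pdf", ".doc", ".docx", ".png", ".jpg", ".jpeg", ".tiff"}


@dataclass
class Decision:
    action: str  # "accept", "convert", or "reject"
    reason: str


def triage(filename: str) -> Decision:
    """Decide what to do with an upload, based on its file extension only."""
    ext = Path(filename).suffix.lower()
    if ext in MACHINE_READABLE:
        return Decision("accept", f"{ext} is already machine-readable")
    if ext in CONVERTIBLE:
        return Decision("convert", f"{ext} must go through OCR/NLP conversion first")
    return Decision("reject", f"unsupported format: {ext or 'no extension'}")
```

Calling `triage("scan.pdf")` returns a decision whose `action` is `"convert"`, which mirrors the idea of blocking a non-machine-readable file from direct processing while still routing it to conversion.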
CodeXplorers TECHNICAL APPROACH

Algorithm Development:
● Natural Language Processing (NLP)
● Optical Character Recognition (OCR)
● Search Algorithms
● Document Layout Analysis

Frontend:
● ReactJs
● TypeScript
● Tailwind CSS

Database:
● PostgreSQL
● MongoDB

Backend:
● Tesseract OCR
● Django

Cloud Services:
● AWS
● Azure Blob Storage

[Figure: Document Management Process Flowchart]
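
To make the conversion step more concrete, the sketch below takes text that is assumed to have already been extracted by an OCR engine such as Tesseract and serializes it to the three structured formats the deck names (JSON, CSV, XML). The OCR call itself is left out, and the "Key: Value" line convention, the function names, and the `document`/`field` XML element names are all assumptions made for illustration.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def parse_fields(ocr_text: str) -> dict[str, str]:
    """Collect 'Key: Value' lines from extracted text; other lines are ignored."""
    fields = {}
    for line in ocr_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def to_json(fields: dict[str, str]) -> str:
    return json.dumps(fields, indent=2)


def to_csv(fields: dict[str, str]) -> str:
    """One header row with the keys, one data row with the values."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields.keys())
    writer.writerow(fields.values())
    return buf.getvalue()


def to_xml(fields: dict[str, str]) -> str:
    root = ET.Element("document")
    for key, value in fields.items():
        # Keys may contain spaces, which are not valid in XML tag names,
        # so each key is stored as a 'name' attribute instead.
        ET.SubElement(root, "field", name=key).text = value
    return ET.tostring(root, encoding="unicode")
```

For example, `parse_fields("Invoice No: 1042\nTotal: 250.00")` yields a two-entry dictionary that can then be passed to any of the three serializers. Real documents would need layout analysis and NLP to find fields that do not follow such a simple pattern.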
CodeXplorers FEASIBILITY AND VIABILITY

FEASIBILITY:
❖ Technical Feasibility: Utilizes OCR and NLP tools like Tesseract, Google Cloud Vision, and AWS Textract for fast document conversion [2,4].
❖ Operational Feasibility: Scales with high document volumes using cloud infrastructure and microservices [5].
❖ Regulatory & Compliance Feasibility: Ensures security of sensitive documents with encryption, access control, and data anonymization [6].
❖ Market Feasibility: Meets growing demand for efficient document management by handling non-machine-readable formats [2].

POTENTIAL CHALLENGES:
❖ Non-Machine-Readable Documents: PDFs and DOCs contain data but are hard to process automatically due to their unstructured nature [5].
❖ Limited Searchability and Insights: Extracting and analyzing information from these documents is often manual, time-consuming, and inefficient [6].
❖ Barrier to Automation: Non-machine-readable documents hinder automated workflows [1].

STRATEGIES:
❖ Utilize established OCR and NLP tools like Tesseract, Google Cloud Vision, or AWS Textract [4].
❖ Develop an intuitive user interface with input from end-users to ensure ease of use [5].
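
The limited-searchability challenge is typically addressed by indexing the extracted text. The deck does not say which search algorithm is used, so the following is only an illustrative sketch of one common option, an in-memory inverted index with AND semantics. The class name `InvertedIndex` and its methods are hypothetical.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Maps each lowercase token to the set of document ids containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _tokens(text):
        # Simplistic tokenizer: runs of ASCII letters and digits only.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        for token in self._tokens(text):
            self._postings[token].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every term in the query."""
        terms = self._tokens(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result
```

Once converted documents are added with `add(doc_id, text)`, a query such as `search("cloud invoice")` returns only the ids of documents that contain both words. A production system would more likely rely on the full-text search features of its database or a dedicated search service.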
CodeXplorers IMPACT AND BENEFITS

POTENTIAL IMPACT:

❖ Positive Impact:
● Enhanced Efficiency: Automates conversion and validation, reducing manual effort [3].
● Cost Savings: Cuts labor costs by reducing manual data entry [6].
● Improved Data Accessibility: Enables faster information retrieval through machine-readable formats [4,5].

❖ Negative Impact:
● Learning Curve: Users may face challenges adapting to new automated processes [5].
● Data Security Risks: Automation could expose sensitive data to vulnerabilities.

BENEFITS:

❖ Improved Efficiency: Automatic conversion of non-machine-readable documents saves time, reducing the need for manual intervention [6].
❖ Enhanced Data Accessibility: Organizations can access the data contained in their documents quickly and efficiently [5].
❖ Greater Compliance: Document conversion ensures that data follows the required formats for regulatory and accessibility standards [4].
❖ Scalability: The application is designed to scale across multiple departments or even organizations, handling large volumes of documents seamlessly [6].
CodeXplorers RESEARCH AND REFERENCES

REFERENCES:

1. Pandey, M., Arora, M., Arora, S., Goyal, C., Gera, V. K., & Yadav, H. (2023). AI-based Integrated Approach for the Development of Intelligent Document Management System (IDMS). Procedia Computer Science, 230, 725-736.

2. Parikh, A. (2023). Information Extraction from Unstructured data using Augmented-AI and Computer Vision. arXiv preprint arXiv:2312.09880.

3. Zhu, M., & Cole, J. M. (2022). PDFDataExtractor: A tool for reading scientific text and interpreting metadata from the typeset literature in the portable document format. Journal of Chemical Information and Modeling, 62(7), 1633-1643.

4. Pudasaini, S., Shakya, S., Lamichhane, S., Adhikari, S., Tamang, A., & Adhikari, S. (2022). Application of NLP for information extraction from unstructured documents. In Expert Clouds and Applications: Proceedings of ICOECA 2021 (pp. 695-704). Springer Singapore.

5. Sage, C., Douzon, T., Aussem, A., Eglin, V., Elghazel, H., Duffner, S., ... & Espinas, J. (2021). Data-efficient information extraction from documents with pre-trained language models. Springer International Publishing.

6. Adnan, K., & Akbar, R. (2019). An analytical study of information extraction from unstructured and multidimensional big data. Journal of Big Data, 6(1), 1-38.
