
TransformoDocs: Smart Doc Automation

Uploaded by

itfw2205

SMART INDIA HACKATHON 2024

• Organisation/Ministry - Ministry of Electronics and Information Technology

• Problem Statement ID - SIH1669

• Problem Statement Title - Transformo Docs Application: Empowering Machine-Readable Document Management System

• Theme - Smart Automation

• Team ID - 8471

• Institute - G.L Bajaj Institute of Technology and Management, Greater Noida, UP

• Institute code (AISHE) - C-46239

• Team Name - CodeXplorers

• Team leader name - Shubhanshu Omer


IDEA/APPROACH DETAILS

IDEA/SOLUTION :
TransformoDocs is a comprehensive document transformation application designed to restrict non-machine-readable documents and automatically generate machine-readable formats, enhancing automation, accessibility, and data extraction [1].

❖ Document Ingestion Restriction: Blocks non-machine-readable formats (e.g., PDFs, DOCs) from being processed by the system [1,3].
❖ Automatic Conversion: Automatically converts any document, whether scanned, generated by software, or uploaded, into a machine-readable format [2,3].
❖ Enhanced Searchability: Allows documents to be indexed and searched effectively, enhancing accessibility and efficiency [5,6].

Problem Resolution :
❖ Apply OCR and NLP to turn scanned images and text into structured formats (XML, JSON, CSV) for automated data processing and integration [4,6].
❖ Use a validation engine to filter out non-compliant formats and accept only machine-readable documents.

Unique Value Propositions (UVP) :
❖ Dual Functionality: Not only blocks non-machine-readable formats but also converts them into machine-readable documents seamlessly [3,4].
❖ Universal Compatibility: Handles scanned images, PDFs, and software-generated documents, integrating easily into any workflow [2,6].
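The ingestion restriction and dual functionality described above could be sketched as a small routing step. This is only an illustration under stated assumptions: the function name `route_upload`, the extension lists, and the three outcomes are hypothetical, and checking the file extension is a simple stand-in for a real validation engine that would inspect file contents.

```python
from pathlib import Path

# Assumed format lists (not from the proposal's implementation): XML, JSON, and
# CSV are treated as machine-readable; PDFs, DOCs, and scanned images are
# treated as inputs that must be converted before processing.
MACHINE_READABLE = {".xml", ".json", ".csv"}
CONVERTIBLE = {".pdf", ".doc", ".docx", ".png", ".jpg", ".jpeg", ".tiff"}


def route_upload(filename: str) -> str:
    """Return 'accept', 'convert', or 'reject' for an uploaded file name."""
    suffix = Path(filename).suffix.lower()
    if suffix in MACHINE_READABLE:
        return "accept"   # already structured, pass straight through
    if suffix in CONVERTIBLE:
        return "convert"  # block direct processing, send to OCR/NLP conversion
    return "reject"       # unknown format, refuse ingestion


if __name__ == "__main__":
    for name in ["invoice.JSON", "scan_001.png", "report.pdf", "notes.xyz"]:
        print(name, "->", route_upload(name))
```

Keeping the decision in one function means the same gate can sit in front of both the upload endpoint and any batch import job.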
@SIH Idea submission- Template
TECHNICAL APPROACH

Algorithm Development :
● Natural Language Processing (NLP)
● Optical Character Recognition (OCR)
● Search Algorithms
● Document Layout Analysis

Frontend :
● ReactJs
● TypeScript
● Tailwind CSS

Backend :
● Tesseract OCR
● Django

Database :
● PostgreSQL
● MongoDB

Cloud Services :
● AWS
● Azure Blob Storage

[Figure: Document Management Process Flowchart]
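As a rough illustration of the conversion stage, the sketch below takes text that is assumed to have already been produced by an OCR step (for example Tesseract) and emits JSON, CSV, and XML. The "Key: value" regular expression is a deliberately simple stand-in for the NLP extraction, and all function names here are hypothetical, not part of the proposal.

```python
import csv
import io
import json
import re
import xml.etree.ElementTree as ET

# Hypothetical "Key: value" line pattern standing in for real NLP extraction.
FIELD_PATTERN = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*:\s*(.+?)\s*$")


def extract_fields(text: str) -> dict:
    """Collect 'Key: value' lines from OCR text into a flat dictionary."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_PATTERN.match(line)
        if match:
            key = match.group(1).strip().lower().replace(" ", "_")
            fields[key] = match.group(2)
    return fields


def to_json(fields: dict) -> str:
    return json.dumps(fields, indent=2)


def to_csv(fields: dict) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields))
    writer.writeheader()
    writer.writerow(fields)
    return buffer.getvalue()


def to_xml(fields: dict) -> str:
    root = ET.Element("document")
    for key, value in fields.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Sample text is made up for illustration only.
    ocr_text = "Invoice Number: 1042\nTotal Amount: 250.00\nThank you for your business"
    fields = extract_fields(ocr_text)
    print(to_json(fields))
    print(to_csv(fields))
    print(to_xml(fields))
```

Lines without a recognizable key, such as the closing sentence in the sample, are skipped rather than guessed at, which keeps the structured output conservative.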
FEASIBILITY AND VIABILITY

FEASIBILITY:
❖ Technical Feasibility: Utilizes OCR and NLP tools like Tesseract, Google Cloud Vision, and AWS Textract for fast document conversion [2,4].
❖ Operational Feasibility: Scales with high document volumes using cloud infrastructure and microservices [5].
❖ Regulatory & Compliance Feasibility: Ensures security of sensitive documents with encryption, access control, and data anonymization [6].
❖ Market Feasibility: Meets growing demand for efficient document management by handling non-machine-readable formats [2].

POTENTIAL CHALLENGES :
❖ Non-Machine-Readable Documents: PDFs and DOCs contain data but are hard to process automatically due to their unstructured nature [5].
❖ Limited Searchability and Insights: Extracting and analyzing information from these documents is often manual, time-consuming, and inefficient [6].
❖ Barrier to Automation: Non-machine-readable documents hinder automated workflows [1].

STRATEGIES:
❖ Utilize established OCR and NLP tools like Tesseract, Google Cloud Vision, or AWS Textract [4].
❖ Develop an intuitive user interface with input from end-users to ensure ease of use [5].
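The data anonymization mentioned under compliance feasibility might look something like the following minimal sketch. The patterns are assumptions chosen for illustration (a generic email pattern and a bare 10-digit phone number) and would miss many real-world formats; the function name `anonymize` and the placeholder tokens are hypothetical.

```python
import re

# Assumed patterns for illustration only; production redaction would need
# broader coverage (names, IDs, addresses, other phone formats).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{10}\b")  # bare 10-digit numbers only


def anonymize(text: str) -> str:
    """Replace email addresses and 10-digit phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


if __name__ == "__main__":
    print(anonymize("Contact a.kumar@example.com or 9876543210."))
```

Running redaction on the extracted text before it is indexed or stored would keep the sensitive values out of the search layer as well.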
IMPACT AND BENEFITS

POTENTIAL IMPACT:

❖ Positive Impact:
● Enhanced Efficiency: Automates conversion and validation, reducing manual effort [3].
● Cost Savings: Cuts labor costs by reducing manual data entry [6].
● Improved Data Accessibility: Enables faster information retrieval through machine-readable formats [4,5].

❖ Negative Impact:
● Learning Curve: Users may face challenges adapting to new automated processes [5].
● Data Security Risks: Automation could expose sensitive data to vulnerabilities.

BENEFITS:

❖ Improved Efficiency: Automatic conversion of non-machine-readable documents saves time, reducing the need for manual intervention [6].
❖ Enhanced Data Accessibility: Organizations can access the data contained in their documents quickly and efficiently [5].
❖ Greater Compliance: Document conversion ensures that data follows the required formats for regulatory and accessibility standards [4].
❖ Scalability: The application is designed to scale across multiple departments or even organizations, handling large volumes of documents seamlessly [6].
RESEARCH AND REFERENCES

REFERENCES:

1. Pandey, M., Arora, M., Arora, S., Goyal, C., Gera, V. K., & Yadav, H. (2023). AI-based Integrated Approach for the Development of Intelligent Document Management System (IDMS). Procedia Computer Science, 230, 725-736.
2. Parikh, A. (2023). Information Extraction from Unstructured data using Augmented-AI and Computer Vision. arXiv preprint arXiv:2312.09880.
3. Zhu, M., & Cole, J. M. (2022). PDFDataExtractor: A tool for reading scientific text and interpreting metadata from the typeset literature in the portable document format. Journal of Chemical Information and Modeling, 62(7), 1633-1643.
4. Pudasaini, S., Shakya, S., Lamichhane, S., Adhikari, S., Tamang, A., & Adhikari, S. (2022). Application of NLP for information extraction from unstructured documents. In Expert Clouds and Applications: Proceedings of ICOECA 2021 (pp. 695-704). Springer Singapore.
5. Sage, C., Douzon, T., Aussem, A., Eglin, V., Elghazel, H., Duffner, S., ... & Espinas, J. (2021). Data-efficient information extraction from documents with pre-trained language models. Springer International Publishing.
6. Adnan, K., & Akbar, R. (2019). An analytical study of information extraction from unstructured and multidimensional big data. Journal of Big Data, 6(1), 1-38.
