Poulami Paul
Madison, WI | LinkedIn | +1 (608) 440-4455 | ppaul5@wisc.edu
EDUCATION
University of Wisconsin-Madison | GPA: 4.0/4.0 Madison, WI
Master's in Computer Science September 2022 – May 2024 (Expected)
Courses: Big Data Systems, Database Management Systems, Data Integration for Data Science, Operating Systems,
Topics in Database Management Systems, Computer Vision
Teaching Assistant: Database Management Systems, Data Science in Python
Vellore Institute of Technology | GPA: 9.42/10.0 Vellore, India
Bachelor of Technology in ECE with Specialization in IoT and Sensors June 2017 – June 2021
Courses: Data Structures and Algorithms, Machine Learning, IoT in Automotive Systems
Achievements: Merit Scholarship recipient; consistently ranked in the top 3 of the department
WORK EXPERIENCE
Lenovo Group Limited Morrisville, NC
Cloud Software Intern (Data Engineering Section) June 2023 – August 2023
• Migrated the existing codebase from Pandas to PySpark, achieving a 50% performance improvement and
optimizing for large-scale data processing on AWS S3.
• Built robust data pipelines that supported proactive resolution of critical system errors, such as Blue Screen of Death (BSOD) crashes.
• Enabled efficient data transformation tailored for predictive machine learning models.
• Leveraged cloud platforms to ensure scalable storage and reliable data management.
• Used Jenkins for CI/CD, improving code quality and reducing development time.
Bank of New York Mellon Pune, India
Data Engineer August 2022 – May 2023
• Built and maintained large-scale data pipelines with Spark and Airflow to deliver regulatory reporting data
• Reduced data processing time by 40% by applying parallel processing techniques
• Developed Basel processing pipelines enabling daily and monthly data delivery
• Designed business-oriented UI dashboards using Plotly Dash and Tableau
Software Engineer Co-op January 2021 – June 2021
• Orchestrated the transition of database and ETL operations to a distributed data environment
• Executed Phase 2 of development to create the system's own reference data and reduce dependency on legacy data
• Automated query creation for data pipelines and ensured data quality through notebook validation
Pixel Solutionz Kolkata, India
Artificial Intelligence Intern April 2020 – May 2020
• Developed Flask APIs for optical character recognition (OCR) of shipping container IDs using Keras, Google Cloud Platform,
and Postman; integrated them into a company application to improve user accessibility
PROJECTS
Entity Matching Research: Zingg vs. Sparkly Performance Analysis Advisor: Prof. AnHai Doan
• Compared the entity matching performance of Zingg and Sparkly (Lucene-based text search), assessing output size,
recall, and precision on various datasets
Benchmarking GFS/HDFS on Modern Storage Systems
• Investigated the underlying storage systems, analyzing how main memory availability and file size distribution
affect the performance of HDFS, an implementation of the Google File System design
Skin Cancer Classifier
• Developed a skin cancer classification system using MobileNet and VGG16 CNN models and deployed it in a web
application with TensorFlow.js, running predictions in the browser for faster results without a client-server architecture
TECHNICAL SKILLS
Languages: Python, Java, Scala, C/C++, JavaScript, HTML, CSS, GraphQL
Big Data & Streaming: Apache Spark, Hadoop, MapReduce, Hive, Flink, Kafka, MQ, Apache Airflow, Amazon EMR, Parquet
Databases & Storage: SQL/NoSQL, MySQL, PostgreSQL, Teradata, Cassandra, MongoDB, Redis, Memcached, Elasticsearch,
Google BigQuery, OLAP Datastores, Data Warehouse, Data Lake
Machine Learning & Analytics: Machine Learning, AI, NLP, TensorFlow, PyTorch, Scikit-Learn, NumPy, Regression Models,
A/B Testing, Data Analysis, Analytics, Data Visualizations, Plotly, Jupyter Notebooks
Cloud & DevOps: GCP, Docker, Kubernetes, Linux, Git, Maven/Gradle, Cloud Networking, TCP/IP
Web: Flask, React, Angular, SOAP/REST APIs, Web Services, Selenium, Backend, Frontend
Data Engineering: Real-Time and Batch Data Processing, Data Ingestion, Data Modelling, Data Infrastructure, Data Governance,
Data Privacy, Data Compliance, Distributed Systems, Computer Architecture, SLA
Tools & Soft Skills: Agile, Jira, Confluence, Collaboration, Communication