
Prathamesh Ghatole: Experience

Book Recommender System | Python, NumPy, Streamlit
∗ Developed a recommendation engine utilizing collaborative filtering and nearest neighbors algorithms for generating personalized book suggestions.
∗ Integrated Streamlit to create an intuitive web interface, empowering users to input book titles and visualize curated recommendations.
∗ Employed pickle for streamlined serialization and deserialization of machine learning models and book data.
Technology Sta

Uploaded by

RAPTER GAMING

Prathamesh Ghatole

Pune, India
+91 8669881189 | prathamesh.s.ghatole@gmail.com | LinkedIn | GitHub
Experience
New Engen Inc. Dec 2023 – Jun 2024
Contributor (Data Engineer Intern) Seattle, USA (Remote)
• Worked with New Engen Inc. to build data pipelines and transformations that handle massive amounts of marketing data for clients such as Jockey, Google Fiber, and Home Depot.
• Tech Stack: Python, dbt, Google Cloud Platform, BigQuery, Airflow, Adverity, Salesforce Intelligence (Datorama), Git, etc.
Google Summer of Code ’23 May 2023 – Nov 2023
Contributor (Data Engineer & Analyst) Barcelona, Spain (Remote)
• Conducted Big Data Analytics using SPARQL, SQL, and Pandas (Python) to identify the data sources with the highest quality and quantity of data from Wikidata & GeoNames.
• Built infrastructure to automate the extraction, transformation, and loading of mission-critical data, eliminating 90% of manual data feeding processes for areas in MusicBrainz.
• Architected and executed an end-to-end data pipeline leveraging Python, PostgreSQL, and Shell Scripting to seamlessly synchronize metadata for 500k+ entities between Wikidata and MusicBrainz.
• Established CI/CD pipelines, deployed services through Docker, devised comprehensive test suites with Pytest, and meticulously managed documentation.
Google Summer of Code ’22 Jun 2022 – Nov 2022
Contributor (Data Engineer & Analyst) Barcelona, Spain (Remote)
• Executed various Data Engineering functions, employing high-performance Python and SQL scripts with PostgreSQL, Pandas, Apache Arrow, and Numba to optimize & transform the Music Listening Histories Dataset.
• Led the overhaul of 611.39 GB (27 billion rows) of music streaming data, originating from 583k+ last.fm users.
• Significantly enhanced Data-Lake efficiency by reducing storage size by 53% and improving read/write speeds by 9%.
• Streamlined Data Analytics and Visualization, created Dashboards, conducted Benchmarking, and handled Report Generation for the “ListenBrainz” project in collaboration with various teams at the MetaBrainz Foundation.
Technical Skills
Domain: Data Analytics, Data Engineering, Web Scraping, Machine Learning, Python Development.
Languages and Tools: Python, SQL, MS Excel, Linux, Git, Docker, Shell Scripting.
Frameworks: PostgreSQL, BigQuery, GitHub, Flask, BeautifulSoup, Pandas, NumPy, scikit-learn, Apache Arrow, Numba,
Multiprocessing, Mechanize.
Dashboarding & Visualization: Tableau, Matplotlib, Plotly, Seaborn, Hugo, HTML, CSS.
Cloud: Microsoft Azure - App Service, Linux VM Compute, ML Workspace.
Course Work: Business Data Management, Data Analytics, Statistics & Mathematics for Data Science, DBMS, Data
Warehousing, Big Data Computing.

Achievements
∗ Among the 967 candidates selected globally from 43,765 applicants for GSoC 2023.
∗ Two-time speaker at the IIT Madras Student Placement Council on Open Source and GSoC, engaging and
educating 3k+ students about OSS.
∗ Elected as President of the Student’s Association of Artificial Intelligence, GHRCEM Pune.
∗ Elected as Vice-President of the IEEE Student’s Chapter, GHRCEM Pune.
∗ Wrote a GSoC guide blog with 35k+ LinkedIn impressions and 3.7k+ views.
∗ Organized multiple college events with 200+ attendees each, achieving an average event rating of 4.59/5.00.
∗ Represented the “Hadar Cluster” (South East Asia) at IEEE Asia Pacific’s CLAP (2021) program.
Education
Indian Institute of Technology, Madras July 2021 – July 2025
BS. Data Science and Applications Chennai, India (Remote)
GH Raisoni College of Engineering and Management, Pune May 2020 – May 2024
BTech. Artificial Intelligence Pune, India
Projects
Lastfm-scraper | Python, Flask, Pandas, Git, Microsoft Azure, REST APIs, GitHub Actions, HTML, CSS.
∗ Lastfm-scraper is a simple platform to scrape, clean, analyze, and download your music listening history for
analytics and machine learning applications from last.fm, a music service to track and organize user music listening
history across multiple devices and streaming services.
∗ Implemented in Python, this project fetches music streaming user data from the last.fm API, processes it with
Pandas, and delivers it in accessible formats like CSV and JSON. It is hosted on Azure App Service and deployed
through a CI/CD pipeline using GitHub Actions.
∗ Domain: Data Wrangling, Data Processing, Cloud, DevOps
∗ Codebase | Demo
Document Topic Modelling | Python, NLTK, Scikit-learn, Gensim.
∗ A simple interactive command line utility to classify text into pre-defined topics using Machine Learning (NLP).
This project is based on the LDA (Latent Dirichlet Allocation) model, and built using Python, Scikit-learn, and
Gensim.
∗ Domain: Machine Learning, NLP.
∗ Codebase
Monthly Budget Tableau Dashboard | Tableau, MS Excel, Python.
∗ A simple and elegant Tableau dashboard to visualize monthly financial spending habits. For this project, I fetched
data from my personal spreadsheet-based budget tracking system hosted on notion.so.
∗ Domain: Data Visualization, Dashboarding.
∗ Demo

Leadership / Extracurricular
Student’s Association of Artificial Intelligence, GHRCEM, Pune Nov 2021 – Mar 2022
President
∗ Operated Human Resources, Planning, and Execution for all events at the Department of AI, GHRCEM, Pune
∗ Hosted events and workshops like “Tech Talks 1.0: Biostatistics w/ Mr. Shariq Mohammed, Boston University”, and
“YOU 2.0: The complete personality upliftment program” with 200+ attendees and 4.59/5.00 average event
ratings.
IEEE Student’s Chapter, GHRCEM, Pune Mar 2022 – Feb 2023
Co-Chair
∗ Hosted several flagship events at IEEE Pune Section - like IEEE CODE-STROM [2022], EAC Funded Cloud and
Data Engineering Workshop [2022]
∗ Represented the “Hadar Cluster” (South East Asia) at IEEE Asia Pacific’s CLAP [2021] program.
Ek Bharat Shrestha Bharat Club, GHRCEM, Pune Sep 2021 – Nov 2021
Project Lead and Speaker
∗ Designed and delivered 5+ inter-state presentations to Aryan Institute of Technology, Bhubaneshwar, Odisha, while
representing GH Raisoni College of Engineering and Management, Pune, Maharashtra.
Music Club, GHRCEM, Pune Aug 2021 – Nov 2021
Vice President
∗ Operated Human Resources, Planning, and execution for 6+ introductory and jamming sessions.
