Smars-Bin-Hu/README.md

👋 Hi there

I'm Bin (Smars) Hu, 27 years old. I was born and raised in China and relocated to Ontario, Canada 🇨🇦. I'm a developer who also loves hitting the gym, hiking, video making, hip-hop dancing, and minimalism.

God helps those who help themselves

Those who help themselves are always helped by others


🔨 Data Engineer | 3 Years of Experience in big data development

💼 Manulife | Canadian Global Insurance Corporation | HKCAS IT Delivery Team | 1 Year 7 Months

💼 G7 | Leading IoT & Big Data Company in China | Data Product Team, Infrastructure R&D | 1 Year

🎓 Master of Science in Big Data Analytics @ Trent University, Ontario, Canada 🇨🇦

📝 My Bio: https://www.smars.online/ (resume, projects, and tech blogs)

📞 Reach me via smarshu@trentu.ca

🚀 Certificates

[Certificate badge images: associate-badge-de, 0_atCqIOGA5HoUwVl0]

🔨 Projects

Simulated an enterprise-level, self-managed, on-premises distributed big data cluster using Docker containers. Integrated components include Hadoop, ZooKeeper, Spark, Hive, MySQL, Airflow, Prometheus, ClickHouse, and Power BI. Developed a data warehouse for an e-commerce backend based on dimensional modeling and built a BI analytics system for reporting and data analysis.


Reproduced a modern enterprise-grade Azure cloud data engineering architecture widely adopted in North America. Leveraged technologies such as Databricks, PySpark, ADLS Gen2, Unity Catalog, Delta Lake, Power BI, and Azure Data Factory (ADF) to develop cloud-native data pipelines on Azure and perform exploratory data analysis (EDA).


💻 Tech Stack

☘️ Languages

Python SQL Java Scala R

☘️ Distributed Computation & Data Warehouse

Apache Hadoop Apache Spark Apache Hive Delta Lake Apache ZooKeeper

☘️ Streaming & Lakehouse Architecture

Apache Flink Apache Kafka Structured Streaming

☘️ Data Engineering Practices

  • Theory: Dimensional Modelling Lakehouse Architecture Schema Evolution

  • Data Orchestration: Apache Airflow ADF Databricks Workflows

  • Data Quality: Delta Live Tables

  • Data Governance: Unity Catalog

  • Data Visualization & BI: Power BI Tableau

  • Data RESTful API Services & Integration: Spring Boot Postman

☘️ Databases: OLAP, OLTP & NoSQL

MySQL Oracle (OLTP)

ClickHouse (OLAP)

Redis Elasticsearch Kibana (NoSQL/Search)

☘️ Cloud-Native Data Engineering, Containerization & Platform Tools

Azure Databricks (Synapse, ADLS Gen2, Databricks, Data Factory)

AWS (S3, Lambda)

☘️ DevOps & Monitoring

Docker Kubernetes Prometheus Grafana

GitLab CI Bitbucket

☘️ Basic Tools

ChatGPT Claude 3.7 DeepSeek Cursor (AI)

Linux Git Apache Maven Anaconda (OS, Version Control, API, Dev environment)

Jira Markdown (Project Management, Doc)

🌐 Social

https://www.linkedin.com/in/smars-hu/ https://www.youtube.com/@smars_hu https://www.instagram.com/smars.hu/

Pinned

  1. EComDWH-BatchDataProcessingPlatform (Public)

    This project aims to build an enterprise-grade offline data warehouse solution based on e-commerce platform order data.

    Python · 164 stars · 20 forks

  2. azure-cloud-datapipeline-EDA (Public)

    A cloud-native data pipeline and visualization project analyzing Formula 1 racing data using Azure, Databricks, Delta Lake, Tableau, and Python for insightful EDA and interactive dashboards.

    Jupyter Notebook · 91 stars · 9 forks