Job description
JD GCP Data Engineering
2 to 4 years' experience in GCP Data Engineering.
Design, develop, and maintain data pipelines using GCP services.
Strong data engineering experience using Python, PySpark, or Spark on Google Cloud.
Should have worked on handling big data.
Strong communication skills.
Experience in Agile methodologies; ETL, ELT, data movement, and data processing skills.
Google Cloud Professional Data Engineer certification will be an added advantage.
Proven analytical skills and a problem-solving attitude.
Ability to effectively function in a cross-team environment.
Primary Skill
GCP data engineering
Programming experience in Python/PySpark, SQL, and Spark on GCP
GCS (Cloud Storage), Composer (Airflow), and BigQuery experience
Experience building data pipelines using the above skills
Pipeline development experience using Dataflow or Dataproc (Apache Beam, etc.); see the sketch after this list
Experience with GCP services and databases such as Cloud SQL, Datastore, Bigtable, Spanner, Cloud Run, Cloud Functions, etc.
Proven analytical skills and a problem-solving attitude.
Excellent Communication Skills
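To make the Dataflow/Dataproc item above concrete, here is a minimal Apache Beam (Python SDK) batch pipeline sketch. The bucket paths are placeholders, and running it on Dataflow would additionally require runner, project, region, and temp-location pipeline options.

```python
# Minimal Apache Beam batch pipeline sketch: read text from GCS,
# drop empty lines, uppercase the rest, and write the result back to GCS.
# Bucket names and paths below are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    # Pass --runner=DataflowRunner plus project/region/temp_location flags
    # on the command line to execute on Dataflow instead of locally.
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("gs://example-bucket/input/*.txt")
            | "DropEmpty" >> beam.Filter(lambda line: line.strip() != "")
            | "Upper" >> beam.Map(str.upper)
            | "Write" >> beam.io.WriteToText("gs://example-bucket/output/result")
        )


if __name__ == "__main__":
    run()
```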
Job description
Responsibilities:
• Strong development skills in Python.
• Writing effective and scalable Python code.
• Strong experience in processing data and drawing insights from large data sets
• Good familiarity with one or more libraries: pandas, NumPy, SciPy etc.
• In-depth knowledge of spaCy and similar NLP libraries like NLTK, textacy, etc. (see the sketch after this list).
• Experience with Python development environments, including but not limited to Jupyter and Google Colab notebooks, and visualization libraries such as Matplotlib, Plotly, and geoplotlib.
• Advanced working knowledge of SQL, experience with relational databases and query authoring, and working familiarity with a variety of databases.
• Experience performing root cause analysis on internal and external data and processes to answer specific
business questions and identify opportunities for improvement.
• Strong analytic skills related to working with unstructured datasets.
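As a small illustration of the pandas and spaCy skills listed above, the sketch below loads a few rows into a DataFrame and extracts named entities. The model name and example data are assumptions for the example.

```python
# Illustrative pandas + spaCy sketch: run named-entity recognition over a
# text column of a DataFrame. Requires the en_core_web_sm model to be
# installed separately (python -m spacy download en_core_web_sm).
import pandas as pd
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline

df = pd.DataFrame({"text": [
    "Google Cloud announced new BigQuery features in London.",
    "The analytics team in Chennai processes sales data daily.",
]})


def extract_entities(text):
    """Return (entity text, entity label) pairs found by spaCy's NER."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


df["entities"] = df["text"].apply(extract_entities)
print(df[["text", "entities"]])
```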
Good to have some exposure to:
• Experience with setting up and maintaining a data warehouse (Google BigQuery, Redshift, Snowflake) and data lakes (GCS, AWS S3, etc.) for an organization
• Experience with relational SQL and NoSQL databases, including Postgres and Cassandra / MongoDB.
• Experience with data pipeline and workflow management tools: Airflow, Dataflow, Dataproc, etc.
• Exposure to any Business Intelligence (BI) tools like Tableau, Dundas, Power BI etc.
• Agile software development methodologies.
• Working in multi-functional, multi-location teams
4+ years' experience developing Big Data & Analytics solutions
Experience building data lake solutions leveraging Google Data Products (e.g. Dataproc, AI
Building Blocks, Looker, Cloud Data Fusion, Dataprep, etc.), Hive, Spark
Experience with relational SQL and NoSQL databases
Experience with Spark (Scala/Python/Java) and Kafka
Work experience using Databricks (Data Engineering and Delta Lake components)
Experience with source control tools such as GitHub and related dev process
Experience with workflow scheduling tools such as Airflow
In-depth knowledge of any scalable cloud vendor (GCP preferred)
Has a passion for data solutions
Strong understanding of data structures and algorithms
Strong understanding of solution and technical design
Has a strong problem-solving and analytical mindset
Experience working with Agile Teams.
Able to influence and communicate effectively, both verbally and in writing, with team members and business stakeholders
Able to quickly pick up new programming languages, technologies, and frameworks
Bachelor's degree in Computer Science
4+ Years of Experience in Data Engineering and building and maintaining large-scale data
pipelines.
Experience with designing and implementing a large-scale Data-Lake on Cloud Infrastructure
Strong technical expertise in Python, SQL, and shell scripting.
Extremely well-versed in Google Cloud Platform, including BigQuery, Cloud Storage, Cloud Composer, Dataproc, Dataflow, and Pub/Sub.
Experience with Big Data Tools such as Hadoop and Apache Spark (Pyspark)
Experience developing DAGs in Apache Airflow 1.10.x or 2.x (see the sketch after this list)
Good problem-solving skills
Detail-oriented
Strong analytical skills working with a large store of databases and tables
Ability to work with geographically diverse teams.
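As an illustration of the Airflow DAG development listed above, here is a minimal Airflow 2.x DAG sketch that loads daily files from Cloud Storage into BigQuery. The bucket, dataset, and table names are placeholders, and the operator assumes the apache-airflow-providers-google package is installed.

```python
# Minimal Airflow 2.x DAG sketch: one daily task that loads newline-delimited
# JSON from GCS into a BigQuery table. All resource names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="gcs_to_bigquery_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load_events = GCSToBigQueryOperator(
        task_id="load_events",
        bucket="example-bucket",
        source_objects=["events/{{ ds }}/*.json"],  # partition folder per execution date
        destination_project_dataset_table="example_project.analytics.events",
        source_format="NEWLINE_DELIMITED_JSON",
        write_disposition="WRITE_APPEND",
    )
```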
Good to Have:
Certification in a GCP service.
Experience with Kubernetes.
Experience with Docker
Experience with CircleCI for deployment
Experience with Great Expectations.
Responsibilities:
Build data and ETL pipelines in GCP.
Support migration of data to the cloud using big data technologies like Spark, Hive, Talend, and Java (a PySpark sketch follows this list).
Interact with customers on a daily basis to ensure smooth engagement.
Responsible for timely and quality deliveries.
Fulfill organizational responsibilities: share knowledge and experience with other groups in the organization and conduct various technical training sessions.
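As a hedged illustration of the Spark-based migration work described above, the PySpark sketch below reads a Hive table, aggregates it, and writes the result to Cloud Storage as Parquet. The table and bucket names are placeholders, and it assumes a Hive metastore and GCS connector are available (as on Dataproc).

```python
# PySpark sketch: read a Hive table, compute a simple daily aggregate,
# and write the curated output to GCS. Resource names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("hive-to-gcs-migration")
    .enableHiveSupport()  # requires a configured Hive metastore, e.g. on Dataproc
    .getOrCreate()
)

orders = spark.table("sales_db.orders")  # source Hive table (placeholder)

daily = (
    orders
    .groupBy("order_date")
    .agg(F.sum("amount").alias("total_amount"))  # simple aggregation example
)

# GCS output path is a placeholder; the GCS connector is preinstalled on Dataproc.
daily.write.mode("overwrite").parquet("gs://example-bucket/curated/daily_orders/")
```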
Job description
Greetings from HCL!
We are looking for a GCP Data Engineer for the Chennai location.
Experience: 4+ years
Skills:
Hands-on experience in Google Cloud (BigQuery)
Strong SQL programming knowledge and hands-on experience on real-time projects.
Good data analysis and problem-solving skills
Good communication skills and a quick learner
If you are interested, please share your resume with jyothiveerabh.akula@hcl.com
Roles and Responsibilities
In this role, the GCP Data Engineer is responsible for the following:
Design, develop, test, and implement technical solutions using GCP data technologies/tools.
Develop data solutions in distributed microservices and full stack systems.
Utilize programming languages like Python and Java, and GCP technologies like BigQuery, Dataproc, Dataflow, Cloud SQL, Cloud Functions, Cloud Run, Cloud Composer, Pub/Sub, and APIs (a Pub/Sub sketch follows this list).
Lead performance engineering and ensure the systems are scalable.
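As a small illustration of the Pub/Sub work referenced above, the sketch below publishes a JSON event with the Python client library. The project ID, topic ID, and message attribute are placeholders.

```python
# Minimal Cloud Pub/Sub publishing sketch using the official Python client.
# Project and topic IDs below are placeholders.
import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("example-project", "orders-events")

event = {"order_id": 123, "status": "CREATED"}
future = publisher.publish(
    topic_path,
    data=json.dumps(event).encode("utf-8"),  # payload must be bytes
    source="checkout-service",               # optional message attribute
)
print("Published message id:", future.result())  # blocks until the publish completes
```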
Desired Candidate Profile
Technology & Engineering Expertise
Overall 5+ years of experience implementing data solutions using cloud/on-prem technologies.
At least 3 years of experience in data pipeline development using GCP cloud technologies.
Proficient in data ingestion, storage, and processing using GCP technologies like BigQuery, Dataproc, Dataflow, Cloud SQL, Cloud Functions, Cloud Run, Cloud Composer, Pub/Sub, and APIs
Proficient in pipeline development using ELT and ETL approaches.
Experience in Microservices implementations on GCP
Knowledge of master data management
Knowledge of Data Catalog, data governance, and data security
Excellent SQL skills
Must be Google certified.
Experience with different development methodologies (RUP | Scrum | XP)
Soft skills
Desired Candidate Profile
We are looking for a GCP Data Engineer, full time or part time.
Needs to be very strong in Python, GCP, Dataflow, BigQuery, data processing, ETL, and APIs.
Candidates with BE/BTech/MCA/MSc having the required experience. Tech stack: BigQuery, any ETL tool (Informatica, Talend, DataStage), Dataflow, Dataproc.
• 3-5 years' experience in data warehouse and data lake implementation
• 1-2 years of experience in Google Cloud Platform (especially BigQuery).
• 1-2 years of working experience converting ETL jobs (in Informatica/Talend/DataStage) into Dataflow or Dataproc and migrating them into a CI/CD pipeline
• Design, develop and deliver data integration/data extraction solutions using IBM DataStage or other ETL
tools and Data Warehouse platforms like Teradata, BigQuery.
• Proficiency in Linux/Unix shell scripting and SQL.
• Knowledge of data modelling, database design, and the data warehousing ecosystem.
• Ability to troubleshoot and solve complex technical problems.
• Excellent analytical and problem-solving skills.
• Knowledge of working in Agile environments.
Essential Skills
3+ years' experience developing large-scale data pipelines in at least one cloud: Azure, AWS, or GCP
Expertise in one or more of the following skills (database + ETL/pipeline + visualization/reporting). Azure: Synapse, ADF, HDInsight; AWS: Redshift, Glue, EMR
Highly proficient in one or more market-leading ETL tools like Informatica, DataStage, SSIS, Talend, etc.
Fundamental knowledge of data warehouse/data mart architecture and modelling
Define and develop data ingest, validation, and transform pipelines.
Fundamental knowledge of distributed data processing and storage
Fundamental knowledge of working with structured, unstructured, and semi-structured data
For the cloud data engineer role, experience with ETL/ELT patterns, preferably using Azure Data Factory and Databricks jobs
Nice to have: on-premise platform understanding covering one or more of the following: Teradata, Cloudera, Netezza, Informatica, DataStage, SSIS, BODS, SAS, Business Objects, Cognos, MicroStrategy, WebFocus, Crystal
Essential Qualification
BE/BTech in Computer Science, Engineering, or a relevant field
Responsibilities:
1. Data Migration: Collaborate with cross-functional teams to migrate data from various sources to
GCP. Develop and implement efficient data migration strategies, ensuring data integrity and security
throughout the process.
2. Data Pipeline Development: Design, develop, and maintain robust data pipelines that extract, transform, and load (ETL) data from different sources into GCP. Implement data quality checks (a minimal sketch follows this list) and ensure scalability, reliability, and performance of the pipelines.
3. Data Management: Build and maintain data models and schemas in GCP, ensuring optimal storage,
organization, and accessibility of data. Collaborate with data scientists and analysts to understand
their data requirements and provide solutions to meet their needs.
4. GCP Data Service Expertise: Utilize your deep understanding of GCP data services, including BigQuery, Dataproc, and other relevant big data services, to architect and implement efficient and scalable data solutions. Stay up to date with the latest advancements in GCP data services and recommend innovative approaches to leverage them.
5. Performance Optimization: Identify and resolve performance bottlenecks within the data pipelines
and GCP data services. Optimize queries, job configurations, and data processing techniques to
improve overall system efficiency.
6. Data Governance and Security: Implement data governance policies, access controls, and data
security measures to ensure compliance with regulatory requirements and protect sensitive data.
Monitor and troubleshoot data-related issues, ensuring high availability and reliability of data
systems.
7. Documentation and Collaboration: Create comprehensive technical documentation, including data flow diagrams, system architecture, and standard operating procedures. Collaborate with cross-functional teams, including data scientists, analysts, and software engineers, to understand their requirements and provide technical expertise.
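As a minimal illustration of the data quality checks mentioned in responsibility 2, the sketch below uses the BigQuery Python client to validate a loaded table after an ETL run. The project, dataset, table, and column names are placeholders.

```python
# Simple post-load data quality check sketch with the BigQuery client:
# fail the run if the target table is empty or contains NULL join keys.
# All resource names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

query = """
    SELECT COUNTIF(customer_id IS NULL) AS null_keys,
           COUNT(*) AS row_count
    FROM `example-project.analytics.orders`
"""
row = next(iter(client.query(query).result()))  # single summary row

if row.row_count == 0:
    raise ValueError("Quality check failed: target table is empty")
if row.null_keys > 0:
    raise ValueError(f"Quality check failed: {row.null_keys} rows with NULL customer_id")
print(f"Quality check passed for {row.row_count} rows")
```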