linkhut

24 Oct 25

Learnings after 4 years of Data Engineering

https://javisantana.com/2024/11/30/learnings-after-4-years-data-eng.html

The author shares key professional and technical lessons from four years of working in data engineering, covering infrastructure, pipelines, and career growth.

by tmfnk 2 months ago

DataExpert-io/data-engineer-handbook: This is a repo with links to everything you'd ever want to learn about data engineering

https://github.com/DataExpert-io/data-engineer-handbook

A comprehensive, open-source guide and curriculum on GitHub that helps aspiring and working data engineers master the skills and tools of the field.

by tmfnk 2 months ago saved 2 times

Mastering PySpark SQL: From Basics to Advanced Querying - NashTech Blog

https://blog.nashtechglobal.com/mastering-pyspark-sql-from-basics-to-advanced-querying/

A tutorial on PySpark SQL that moves from basic data manipulation to advanced querying techniques for large-scale data processing.

by tmfnk 2 months ago

02 Nov 15

The New Data Engineering Ecosystem: Trends and Rising Stars

http://insightdataengineering.com/blog/new-ecosystem/

An interactive map of data engineering tools

by wheresalice 10 years ago

08 Oct 15

spark-jobserver/spark-jobserver

https://github.com/spark-jobserver/spark-jobserver

“Spark as a Service”: Simple REST interface (including HTTPS) for all aspects of job, context management.
Support for Spark SQL, Hive, Streaming Contexts/jobs and custom job contexts! See Contexts.
LDAP Auth support via Apache Shiro integration.
Supports sub-second low-latency jobs via long-running job contexts.
Start and stop job contexts for RDD sharing and low-latency jobs; change resources on restart.
Kill running jobs via stop context and delete job.
Separate jar uploading step for faster job startup.
Asynchronous and synchronous job API. Synchronous API is great for low latency jobs!
Preliminary support for Java (see JavaSparkJob).
Works with Standalone Spark as well as Mesos and yarn-client.
Job and jar info is persisted via a pluggable DAO interface.
Named RDDs to cache and retrieve RDDs by name, improving RDD sharing and reuse among jobs.
Supports Scala 2.10 and 2.11.

by wheresalice 10 years ago

05 Oct 15

Speaker slides: Big Data Conference - Strata + Hadoop World, September 29 - October 1, 2015, New York, NY

http://strataconf.com/big-data-conference-ny-2015/public/schedule/proceedings

by wheresalice 10 years ago

24 Oct 14

Machine-Learning Maestro Michael Jordan on the Delusions of Big Data and Other Huge Engineering Efforts - IEEE Spectrum

http://spectrum.ieee.org/robotics/artificial-intelligence/machinelearning-maestro-michael-jordan-on-the-delusions-of-big-data-and-other-huge-engineering-efforts

by lamnatos 11 years ago

25 Feb 14

Data Privacy, Machine Learning, and the Destruction of Mysterious Humanity - John Foreman, Data Scientist

http://www.john-foreman.com/1/post/2014/02/data-privacy-machine-learning-and-the-destruction-of-mysterious-humanity.html

by lamnatos 12 years ago