Spark Release 0.7.2
Spark 0.7.2 is a maintenance release that contains multiple bug fixes and improvements. You can download it as a source package (4 MB tar.gz) or as prebuilt packages for Hadoop 1 / CDH3 or CDH4 (61 MB tar.gz).
We recommend that all users update to this maintenance release.
The fixes and improvements in this version include:
- Scala version updated to 2.9.3.
- Several improvements to Bagel, including performance fixes and a configurable storage level.
- New API methods: subtractByKey, foldByKey, mapWith, filterWith, foreachPartition, and others.
- A new metrics reporting interface, SparkListener, that collects information about each computation stage, such as task durations and bytes shuffled.
- Several new examples using the Java API, including K-means and computing pi.
- Support for launching multiple worker instances per host in standalone mode.
- Various bug fixes across the board.
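As a rough illustration of two of the new API methods listed above, subtractByKey and foldByKey, here is a minimal sketch of their semantics on ordinary Python lists of (key, value) pairs. It does not use Spark itself: the helper names `subtract_by_key` and `fold_by_key` and the sample data are hypothetical, chosen only for this example. In Spark the work is distributed across partitions and the zero value may be applied more than once, so it should be a neutral element for the combining function.

```python
# Plain-Python model of the semantics only; this is NOT the Spark API.
# `subtract_by_key` and `fold_by_key` are hypothetical helper names.

def subtract_by_key(left, right):
    """Keep the (key, value) pairs of `left` whose key is absent from `right`."""
    right_keys = {key for key, _ in right}
    return [(key, value) for key, value in left if key not in right_keys]


def fold_by_key(pairs, zero, op):
    """Merge the values of each key with `op`, starting from `zero`."""
    merged = {}
    for key, value in pairs:
        # Start from `zero` the first time a key is seen.
        merged[key] = op(merged.get(key, zero), value)
    return merged


# Made-up sample data for illustration.
sales = [("apples", 3), ("pears", 5), ("apples", 4), ("plums", 1)]
discontinued = [("pears", 0)]

# Drop every pair whose key also appears in `discontinued`.
remaining = subtract_by_key(sales, discontinued)

# Sum the remaining values per key; 0 is the neutral element for addition.
totals = fold_by_key(remaining, 0, lambda a, b: a + b)
```

Here `remaining` keeps only the apples and plums pairs, and `totals` holds one summed value per remaining key.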
The following people contributed to this release:
- Jey Kottalam (Maven build, bug fixes, EC2 scripts, packaging the release)
- Andrew Ash (bug fixes, docs)
- Andrey Kouznetsov (bug fixes)
- Andy Konwinski (docs)
- Charles Reiss (bug fixes)
- Christoph Grothaus (bug fixes)
- Erik van Oosten (bug fixes)
- Giovanni Delussu (bug fixes)
- Hiral Patel (bug fixes)
- Holden Karau (error reporting, EC2 scripts)
- Imran Rashid (metrics reporting system)
- Josh Rosen (EC2 scripts)
- Mark Hamstra (new API methods, tests)
- Mikhail Bautin (build)
- Mosharaf Chowdhury (bug fixes)
- Nick Pentreath (Bagel, examples)
- Patrick Wendell (bug fixes)
- Reynold Xin (bug fixes)
- Stephen Haberman (bug fixes, tests, subtractByKey)
- Kalpit Shah (build, multiple workers per host)
- Mike Potts (run scripts)
- Matei Zaharia (Bagel, bug fixes, build)
We thank everyone who helped with this release, and hope to see more contributions from you in the future!