Releases: Eventual-Inc/Daft
v0.5.19
What's Changed 🚀
We have a pretty crazy release this time around. Some especially notable features include:
- Interactive DataFrames in Jupyter Notebooks, with special support for some multimodal types
- An async API for LLM text generation, particularly with OpenAI
- A new `.into_batches` DataFrame API, the modern alternative to `.into_partitions`
- Adding support for the `.offset`/`OFFSET` operator across the engine. Thanks @plotor for the great work!
- Various Flotilla performance and reliability improvements
- Various casting improvements
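For readers new to the two operators highlighted above, here is a minimal plain-Python sketch of their semantics. It is not Daft code and does not reflect how the engine implements them: the helper names `offset` and `into_batches` below are hypothetical stand-ins, and the behavior shown (skip the first `n` rows; regroup rows into batches of at most `batch_size`, with a possibly shorter final batch) is an assumption based on the usual meaning of these operations.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def offset(rows: Iterable[T], n: int) -> Iterator[T]:
    """Skip the first n rows and yield the rest (like SQL OFFSET n)."""
    return islice(rows, n, None)


def into_batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Regroup rows into lists of at most batch_size rows.

    Assumption for illustration: the final batch may be shorter.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    rows = range(10)
    print(list(offset(rows, 3)))
    print(list(into_batches(offset(rows, 3), 4)))
```

Unlike a repartition into a fixed number of partitions, batching by size keeps the amount of data per unit of work roughly constant regardless of the input size, which is often what downstream steps such as model inference want.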
✨ Features
- feat: Async open ai llm generate @colin-ho (#4879)
- feat: Add offset syntax support to SQL @plotor (#4707)
- feat: adds support for SQL GROUP BY column position @rchowell (#4955)
- feat: better dtype type inference @universalmind303 (#4973)
- feat: Casting from Python into struct or list types @srilman (#4957)
- feat: support creating partitioned tables in Iceberg via the Catalog interface. @redpheonixx (#4951)
- feat: implement into_batches operator on flotilla distributed engine @ohbh (#4958)
- feat: literal variants for (pretty much) all types @kevinzwang (#4947)
- feat: Add offset support to Flotilla Engine @plotor (#4918)
- feat: implement into_batches on the swordfish native daft runner @ohbh (#4935)
- feat: Flotilla broadcast join @colin-ho (#4867)
🐛 Bug Fixes
- fix: Always just use actor for flotilla scheduler @colin-ho (#4978)
- fix: Add handle for swordfish runtime stats manager @colin-ho (#4970)
- fix: Dedup lance read required columns @xloya (#4967)
- fix: Don't use wildcard for logical plan match in pushdown rules @colin-ho (#4945)
- fix: Coerce arrow schema for parquet decoding @colin-ho (#4948)
- fix: use associate type for swordfish into_batches operator state @ohbh (#4956)
- fix: raise error on invalid cross join parameters @rchowell (#4952)
- fix: interactive html fixes @colin-ho (#4943)
♻️ Refactor
📖 Documentation
- docs: update links in document processing example @ccmao1130 (#4946)
- docs: improve daft.func documentation and type inference @universalmind303 (#4942)
- docs: fix link for pandas @universalmind303 (#4941)
👷 CI
🔧 Maintenance
👋 New Contributors
- @redpheonixx made their first contribution in #4951
Full Changelog: v0.5.18...v0.5.19
v0.5.18
What's Changed 🚀
✨ Features
- feat: adds column set visitor and use in pushdowns @rchowell (#4929)
- feat: async @daft.func @universalmind303 (#4908)
- feat: Add offset operator support to DataFrame for Ray Runner @plotor (#4706)
- feat: model resource plumbing for inference expressions @rchowell (#4902)
- feat: Flotilla eager limit @colin-ho (#4887)
- feat: implement random repartition in flotilla distributed engine @ohbh (#4893)
- feat: Add support for specifying hash algorithm used in expression.hash() func. @Zyiqin-Miranda (#4640)
🐛 Bug Fixes
- fix: Batch RuntimeSubscriber updates for all nodes @srilman (#4932)
- fix: Column Ordering in UDF & Project Optimizations @srilman (#4923)
- fix: Refactor Progress Bar to be a RuntimeStatSubscriber @srilman (#4837)
🚀 Performance
- perf: scalar udfs use same optimizations as legacy udfs @universalmind303 (#4931)
- perf: improve @daft.func performance by ~27% @universalmind303 (#4920)
♻️ Refactor
- refactor: move literal to daft-core @kevinzwang (#4928)
📖 Documentation
- docs: Generate files for llms.txt @desmondcheongzx (#4937)
- docs: Pull colab examples into pages @desmondcheongzx (#4936)
- docs: Add embeddings generation example @desmondcheongzx (#4934)
- docs: improve navigation for functions doc pages @kevinzwang (#4924)
- docs: getdaft.io → daft.ai @ccmao1130 (#4926)
- docs: add video to examples @ccmao1130 (#4915)
- docs: adds coalesce to docs @rchowell (#4909)
- docs: fix doctest formatting errors @rchowell (#4911)
- docs: add docstrings to I/O & DataFrame methods (issue #4124) @TheOphige (#4854)
🔧 Maintenance
- chore: config isort known_third_party to fix import formatting errors @Jay-ju (#4840)
- chore: add warning on repartition in native runner @kevinzwang (#4910)
Full Changelog: v0.5.17...v0.5.18
v0.5.17
What's Changed 🚀
📖 Documentation
- docs: add examples to docs @ccmao1130 (#4903)
- docs: Fixing Link to Resource Requests from Managing Memory Usage Page @madvart (#4901)
🔧 Maintenance
- chore: change series literal to list @kevinzwang (#4896)
Full Changelog: v0.5.16...v0.5.17
v0.5.16
What's Changed 🚀
✨ Features
- feat: Interactive jupyter display @colin-ho (#4835)
- feat: supports passing a projection kwargs in select @rchowell (#4884)
- feat: Add offset operator support to DataFrame for Native Runner @plotor (#4582)
🐛 Bug Fixes
- fix: Return from streaming sink if channel closed @colin-ho (#4885)
- fix: No empty turbopuffer write @colin-ho (#4897)
- fix: Use planning config instead of env variables during filter pushdowns @desmondcheongzx (#4888)
- fix: azure storage resource url @kevinzwang (#4892)
🚀 Performance
- perf: Add parallel execution for Python UDFs with batched GIL acquisitions @universalmind303 (#4886)
📖 Documentation
- docs: Remove redirect for installation page @desmondcheongzx (#4895)
Full Changelog: v0.5.15...v0.5.16
v0.5.15
What's Changed 🚀
✨ Features
- feat: add openai provider in llm_generate function @huleilei (#4809)
- feat: Use `shuffle_aggregation_default_partitions` in flotilla aggregate @colin-ho (#4869)
- feat: abstract the interface of scan pushdown @Jay-ju (#4772)
- feat: Add `get_or_infer_runner_type` to support getting runner type from context @plotor (#4810)
- feat: support glob multiple path @stayrascal (#4811)
🐛 Bug Fixes
- fix(runtime): Reduce thread name length for compute and I/O threadpools @rohitkulshreshtha (#4877)
- fix: import ParamSpec from typing_extensions only for python < 3.10 @kevinzwang (#4878)
- fix: Allow file:/ schemes in read_iceberg @colin-ho (#4843)
📖 Documentation
- docs: Restructure docs to target users @desmondcheongzx (#4875)
🔧 Maintenance
Full Changelog: v0.5.14...v0.5.15
v0.5.14
What's Changed 🚀
✨ Features
🐛 Bug Fixes
- fix: Fix list sorting when operating on a morsel slice @desmondcheongzx (#4863)
📖 Documentation
- docs: add more docs for UDF concurrency @kevinzwang (#4856)
🔧 Maintenance
- chore: new code path for scalar UDF @kevinzwang (#4814)
Full Changelog: v0.5.13...v0.5.14
v0.5.13
What's Changed 🚀
✨ Features
- feat: Support multiple namespaces in the turbopuffer writer @desmondcheongzx (#4845)
- feat: add support for reading json arrays @universalmind303 (#4844)
- feat: Use daft-decoding for hive value deserialization @colin-ho (#4836)
- feat: use min size column as count pushdown column @Jay-ju (#4681)
- feat: optimize read parquet metadata logic via suffix-range get @stayrascal (#4775)
- feat(turbopuffer): Add client and write kwargs @desmondcheongzx (#4825)
🐛 Bug Fixes
- fix: add cachebusting parameter to huggingface urls @universalmind303 (#4853)
- fix: Handle unserializable exceptions from data sinks @desmondcheongzx (#4858)
- fix: concat chunked array safety @stayrascal (#4852)
- fix: Allow anonymous mode to be used for S3 uploads @desmondcheongzx (#4855)
- fix: read/write embeddings for Parquet and Lance @malcolmgreaves (#4834)
- fix: Fix `File writer must be created before bytes_written can be called` bug in native parquet writer @colin-ho (#4817)
- fix: read/write embeddings for Parquet and Lance @malcolmgreaves (#4812)
- fix: expand ~ to home directory in deltatable read and write @r3stl355 (#4831)
📖 Documentation
- docs: categorize functions and make each function its own page @kevinzwang (#4838)
👷 CI
- ci: update daft domain @ccmao1130 (#4678)
- ci: fail on timeout @colin-ho (#4833)
🔧 Maintenance
⏪ Reverts
- revert: "fix: read/write embeddings for Parquet and Lance" @malcolmgreaves (#4832)
Full Changelog: v0.5.12...v0.5.13
v0.5.12
What's Changed 🚀
✨ Features
- feat: struct Expression.unnest() @kevinzwang (#4815)
🐛 Bug Fixes
- fix: add custom robots.txt @ccmao1130 (#4823)
- fix: update a new expression example in the contributing guide to work @r3stl355 (#4818)
- fix: LSP handling for `daft.context` and `daft.io` @srilman (#4805)
🚀 Performance
📖 Documentation
🔧 Maintenance
- chore: add vscode debug example with env @Jay-ju (#4819)
- chore(deps-dev): bump boto3-stubs[essential,glue,s3,s3tables] from 1.36.20 to 1.38.46 @dependabot[bot] (#4661)
- chore(deps-dev): bump pyarrow from 19.0.1 to 20.0.0 @dependabot[bot] (#4647)
- chore: add DAFT_LOG env indicates log level @Jay-ju (#4807)
- chore: Support specify extra arguments when running `make test` @plotor (#4801)
- chore: remove dbg from tests @rchowell (#4804)
⬆️ Dependencies
- chore(deps-dev): bump boto3-stubs[essential,glue,s3,s3tables] from 1.36.20 to 1.38.46 @dependabot[bot] (#4661)
- chore(deps-dev): bump pyarrow from 19.0.1 to 20.0.0 @dependabot[bot] (#4647)
Full Changelog: v0.5.11...v0.5.12
v0.5.11
What's Changed 🚀
✨ Features
- feat: Spread actor udf tasks @colin-ho (#4802)
- feat: Allow flotilla to autoscale with existing workers @colin-ho (#4765)
- feat: Add a DataFrame.write_turbopuffer method @desmondcheongzx (#4798)
- feat(flotilla): Non-actor UDFs with gpu requirements @colin-ho (#4731)
- feat: Add experimental turbopuffer sink @desmondcheongzx (#4779)
- feat: support suffix range get @stayrascal (#4688)
🐛 Bug Fixes
- fix: Propagate worker failures to scheduler @colin-ho (#4800)
- fix: mark all pending nodes completed when plan completes @joyceerhl (#4799)
- fix(flotilla): Add logical node ids to task context @colin-ho (#4755)
- fix: Improve UDF Error Messaging @srilman (#4788)
- fix: use pc.field in turbopuffer sink @colin-ho (#4790)
- fix(repr): avoid crashing notebook UI when visualizing large strings @joyceerhl (#4789)
- fix: use `/` for objectstorage instead of os.path.join when reading deltalake @universalmind303 (#4785)
- fix: Filter out null ids for turbopuffer sink @colin-ho (#4784)
- fix: Don't recompute df for count rows if results exist @colin-ho (#4778)
- fix: Clarify `json.query` deprecation @colin-ho (#4774)
- fix: document processing notebook and explode expression docs @kevinzwang (#4758)
📖 Documentation
- docs: fix docstrings for map and shift_right @ccmao1130 (#4797)
- docs: add docstrings to Schema methods (issue #4770) @TheOphige (#4792)
- docs: update slack link in runllm widget @ccmao1130 (#4769)
👷 CI
- ci: enable building wheels without LTO @joyceerhl (#4787)
🔧 Maintenance
- chore: Refactor flotilla pipeline node @colin-ho (#4793)
- chore: add docstring template to contributing @ccmao1130 (#4794)
- chore: add note about test cluster usage @colin-ho (#4795)
- chore: add bucket into endpoint for virtual host style @stayrascal (#4740)
- chore: add doc format and install dependency @Jay-ju (#4741)
Full Changelog: v0.5.10...v0.5.11
v0.5.10
What's Changed 🚀
✨ Features
- feat(flotilla): Sort & Top-N @srilman (#4734)
- feat(flotilla): Support Gather Variants of Aggs @srilman (#4701)
- feat: explode expression @kevinzwang (#4751)
- feat(flotilla): Added concat @rohitkulshreshtha (#4748)
- feat: support for deltalake v1.0 @kevinzwang (#4718)
- feat(flotilla): Hash Join @srilman (#4700)
- feat(flotilla): Add monotonically_increasing_id. @rohitkulshreshtha (#4717)
🐛 Bug Fixes
- fix: Ignore partition refs if they are empty @colin-ho (#4766)
- fix: Change dashboard warning to info @colin-ho (#4767)
- fix(flotilla): Add multiple notify tokens to task @colin-ho (#4757)
- fix: Put to_partition_tasks call in try block @colin-ho (#4762)
- fix: fix explode expr doc test @colin-ho (#4764)
- fix: Swap arg order for list_fill @colin-ho (#4761)
- fix: Accept embedding types in `cosine_distance` @colin-ho (#4753)
- fix: populate query graph nodes even if they lack task progress @joyceerhl (#4752)
🚀 Performance
📖 Documentation
- docs: Add document processing notebook tutorial @malcolmgreaves (#4750)
- docs: add some s3config on tos @Jay-ju (#4742)
Full Changelog: v0.5.9...v0.5.10