Releases: Eventual-Inc/Daft
v0.7.1
What's Changed 🚀
✨ Features
- feat: Support pattern filtering for `SHOW TABLES` @aaron-ang (#5423)
- feat(docs): add copy page as markdown button @ykdojo (#5828)
- feat: support overwrite files by native IO @stayrascal (#5728)
- feat: Add support for Series[start:end] @everySympathy (#5815)
- feat: Implement retry-after mechanism for model apis and udfs @colin-ho (#5769)
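To illustrate the general idea behind a retry-after mechanism (#5769), here is a minimal sketch in plain Python. It is not Daft's implementation: `RetryAfterError`, `call_with_retry_after`, and all of their parameters are hypothetical names chosen for this example. The sketch assumes the failing call raises an error that may carry a server-suggested delay in seconds, and falls back to a fixed default delay when it does not.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryAfterError(Exception):
    """Hypothetical error carrying a server-suggested delay in seconds."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__(f"rate limited; retry_after={retry_after}")
        self.retry_after = retry_after


def call_with_retry_after(
    fn: Callable[[], T],
    max_retries: int = 3,
    default_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, waiting between attempts for the delay the server asked for."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RetryAfterError as err:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Prefer the server-suggested delay; otherwise use the default.
            delay = err.retry_after if err.retry_after is not None else default_delay
            sleep(delay)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the waiting behavior can be checked without actually pausing; in normal use the default `time.sleep` applies.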
🐛 Bug Fixes
- fix: using estimate memory bytes at first for display scan task source @stayrascal (#5845)
- fix: Remove pop_all assertion @colin-ho (#5850)
- fix: Check if deletion vector propagation is supported in deltalake @cckellogg (#5829)
- fix: Set default ImageMode in decode_image to RGB @colin-ho (#5827)
- fix: handle FileNotFoundError in read_huggingface fallback @ykdojo (#5831)
- fix: Add overflow protection to memory estimation @yudduy (#5417)
♻️ Refactor
- refactor(arrow2): get field and dtype roundtrips working @universalmind303 (#5849)
📖 Documentation
- docs: improve docstrings of IO read methods for remote URLs @aaron-ang (#5841)
- docs: fix broken links causing CI failure @ykdojo (#5832)
👷 CI
- ci: enable Windows Rust tests on PRs @ykdojo (#5823)
- ci: skip quickstart notebook in notebook-checker workflow @ykdojo (#5804)
🔧 Maintenance
- chore: remove overwrite_files & write_empty_tabular method @stayrascal (#5838)
- chore: Fix a minor ambiguity in the README docs @plotor (#5830)
Full Changelog: v0.7.0...v0.7.1
v0.7.0
What's Changed 🚀
💥 Breaking Changes
- chore!: remove spark connect @universalmind303 (#5743)
✨ Features
- feat(lance): row-level schema evolution support @Jay-ju (#5749)
- feat: Improve model api typing @colin-ho (#5809)
- feat(deltalake): allow users to ignore deletion vectors on read @kevinzwang (#5758)
- feat: Capture UDF argument names and values in structured logging. @rohitkulshreshtha (#5771)
- feat: support split and merge jsonl/ndjson files @caican00 (#5695)
- feat(tools): add markdown-to-notebook converter for documentation @ykdojo (#5691)
- feat: support label_selector specification in ray actor/task creation @Jay-ju (#5042)
- feat: support native csv writer @stayrascal (#5706)
- feat: Update optional dependencies for daft[postgres] @desmondcheongzx (#5586)
- feat(lance): add distributed compaction for Lance @huleilei (#5699)
- feat: Use JSON Serialization for Plans in Subscribers @srilman (#5709)
- feat: extended hash function to take and hash multiple inputs @rahulkodali (#5692)
- feat: Add Apache Gravitino catalog in catalog module @shaofengshi (#5694)
- feat: Better errors when lazy imports fail @samstokes (#5753)
- feat: Added structured logging for UDF errors. @rohitkulshreshtha (#5688)
- feat: Add statistics info to ScanTask when reading Lance dataset @plotor (#5727)
- feat: dynamic batching per operator @universalmind303 (#5676)
- feat: Allow users to disable the suffix range request @TheR1sing3un (#5188)
- feat: Streaming sample by size @colin-ho (#5663)
- feat: Allow dashboard to show query canceled/failed/dead information when query exited abnormally @VOID001 (#5576)
- feat: Add Google AI provider with prompt @everettVT (#5640)
- feat: Add pow expression @kliwongan (#5237)
- feat: No truncate in `.collect` preview @colin-ho (#5632)
- feat: sample api supports precise sampling by size params @caican00 (#5600)
- feat: audio file subtype @universalmind303 (#5602)
- feat(tos): enhance the retry logic to aware response @stayrascal (#5569)
- feat: emit selectivity metric to OTel in swordfish filter op @samstokes (#5584)
- feat: Adding otel logger collector for collecting UDF errors. @rohitkulshreshtha (#5624)
🐛 Bug Fixes
- fix: handle Windows paths and query params in local_path_from_uri @ykdojo (#5819)
- fix: use pytest.importorskip for lance in test_limit_offset @ykdojo (#5818)
- fix: support skip empty json/jsonl files @caican00 (#5660)
- fix: minor doc fix @yuchaoran2011 (#5814)
- fix: CountRows with Limit returns unexpected result when reading Lance dataset @plotor (#5550)
- fix: Check for missing dependencies in OpenAI provider @everettVT (#5747)
- fix: fix btree index invalid issue when reading lance for point lookup @caican00 (#5673)
- fix: Combine deltalake with unity extra @everettVT (#5785)
- fix: enhance unit tests @caican00 (#5787)
- fix: patch CVE-2025-66478 update next dependencies to 16.0.7 @everettVT (#5786)
- fix: use single consolidated progress bar in Jupyter notebooks @ykdojo (#5774)
- fix: CuPy → NumPy needs explicit conversion @Jay-ju (#5680)
- fix: Fix Pydantic cloudpickle serialization in Google Colab @ykdojo (#5705)
- fix: update AI integration tests for new Subscriber interface @ykdojo (#5763)
- fix(optimizer): Prevent limits from being pushed below explodes in non-top-level projections @desmondcheongzx (#5292)
- fix(io): load all splits in read_huggingface fallback path @ykdojo (#5757)
- fix(test): use read_huggingface instead of read_parquet for HF test @ykdojo (#5755)
- fix: add disk cleanup to nightly integration-test-io job @ykdojo (#5711)
- fix: Postgres overwrite table should enable RLS and set up pgvector automatically @desmondcheongzx (#5657)
- fix: make it easier to enable different logging levels @Abyss-lord (#5661)
- fix: Dashboard logo animation. @j3nkii (#5672)
- fix(ci): add disk cleanup to integration-test-ai job @ykdojo (#5733)
- fix: update hypothesis test to use new expression API @ykdojo (#5723)
- fix: Fix type annotation check on Python 3.14 @srilman (#5721)
- fix: add fallback mechanism for HuggingFace datasets without parquet files @ykdojo (#5650)
- fix: Import or skip lance @colin-ho (#5662)
- fix: Add missing trailing slashes to S3-compatible endpoint urls @desmondcheongzx (#5575)
- fix: Add outer try-finally block in executor generator @colin-ho (#5633)
- fix: test_explain @universalmind303 (#5656)
- fix: Unify the naming and type of URI parameter for Lance-related APIs @plotor (#5634)
- fix: Fix blocked and oom issues for scan lance @caican00 (#5592)
- fix: Executing `explain` will panic when ScanTask is empty @plotor (#5582)
- fix: Embed text dropping texts @colin-ho (#5641)
- fix: limit(n) return n rows directly @caican00 (#5597)
- fix: Upgrade to deltalake 1.2.1 @colin-ho (#5580)
- fix: add disk cleanup to integration-test-io-credentialed job @ykdojo (#5610)
- fix: add disk cleanup to doctests job @ykdojo (#5609)
- fix: Hashable identifier @colin-ho (#5598)
🚀 Performance
- perf: Lazy udf worker @colin-ho (#5542)
- perf: Use growable for build side @colin-ho (#5613)
- perf: optimize setting lance schema @Jay-ju (#5704)
♻️ Refactor
- refactor(arrow2): values_iter removals for primitive array @universalmind303 (#5802)
- refactor(arrow2): remove arrow2 from daft-functions-binary @universalmind303 (#5799)
- refactor(arrow2): remove deprecated usages from daft-functions-utf8 @universalmind303 (#5800)
- refactor(arrow2): remove deprecated methods from daft-functions-uri crate @universalmind303 (#5798)
- refactor(arrow2): remove arrow2 from `daft-functions-tokenize` @universalmind303 (#5797)
- refactor(arrow2): rename and deprecate `to_arrow` and `as_arrow` functions @universalmind303 (#5796)
- refactor: write empty dataframe to parquet/json files via native IO @stayrascal (#5682)
- refactor(arrow-rs): Move temporal conversions from arrow2 to arrow-rs @srilman (#5782)
- refactor(arrow-rs): Remove arrow2 Index generics usages @srilman (#5761)
- refactor(arrow2): rename and deprecate .to_arrow methods @universalmind303 (#5789)
- refactor(arrow): use arrow for ffi instead of arrow2 @universalmind303 (#5775)
- refactor(arrow): remove daft-arrow from daft-sql & daft-context crates @universalmind303 (#5773)
- refactor(arrow-rs): Move all validity `daft_arrow::bitmap::Bitmap`s to `daft_arrow::buffer::NullBuffer` @srilman (#5750)
- refactor: abstract MultipartWriter to write data to object store @stayrascal (#5702)
- refactor: Remove Unloaded MicroPartitions @srilman (#5710)
- refactor(arrow-rs): Add `daft-arrow` middleman crate for Rust & Arrow usage @srilman (#5730)
📖 Documentation
- docs: Update slack invite @everettVT (#5813)
- docs: add logging settings @Jay-ju (#5671)
- docs: fix broken Bodo benchmark link @ykdojo (#5762)
- docs: add voice-analytics-example and update index @everettVT (#5737)
- docs: fix broken Lance documentation link @ykdojo (#5724)
- docs: remove redundant About Daft section from README @ykdojo (#5689)
- docs: remove redundant Table of Contents from README @ykdojo (#5684)
- docs: add Daft Cloud mentions to distributed execution docs @ykdojo (#5686)
- docs: fix quickstart connector links formatting @ykdojo (#5687)
- docs: update README to reflect AI/multimodal positioning @ykdojo (#5677)
- docs: Improve mkdocstrings template for Python examples rendering @ykdojo (#5642)
- docs: changed dev url to a live link to prevent 404 @j3nkii (#5669)
- docs: add Python version requirement to README @ykdojo (#5655)
- docs: update index overview page @ykdojo (#5627)
- docs: remove Python tabs from quickstart @ykdojo (#5626)
- docs: update contributor policy, add contributing section, remove old… @madvart (#5251)
- docs: add tip to find your dylib @universalmind303 (#5625)
- docs: add data persistence section to quickstart @ykdojo (#5607)
- docs: revamp quickstart with Amazon product dataset example @ykdojo (#5585)
✅ Tests
- test: fix flaky OpenAI test by using pattern constraint for hex color format @ykdojo (#5808)
- test(io): remove flaky test_read_huggingface_http_urls test @ykdojo (#5795)
👷 CI
- ci: increase unit-test timeout to 75 minutes for macOS @ykdojo (#5731)
- ci: exclude Kaggle from link checker @ykdojo (#5725)
🔧 Maintenance
- chore(deps): bump the minor group across 1 directory with 45 updates @dependabot[bot] (#5734)
- chore: Provide query end state to `RuntimeStatsManager` on query end @colin-ho (#5791)
- chore: update uvlock to remove tensorflow @kevinzwang (#5780)
- chore: Codeowners @colin-ho (#5502)
- chore: Don't install pytorch in iceberg test docker compose @colin-ho (#5781)
- chore: remove arrow dep from common-image @universalmind303 (#5772)
- chore: Pin dependencies @colin-ho (#5667)
- chore!: remove spark connect @universalmind303 (#5743)
- chore: remove ir and proto crates @universalmind303 (#5742)
- chore(deps): bump actions/checkout from 5 to 6 in the all group @dependabot[bot] (#5713)
- chore(deps): bump ctor from 0.5.0 to 0.6.1 @dependabot[bot] (#5717)
- chore: Cleanup additional Ray runner artifacts @srilman (#5714)
- chore: Remove the old Ray Runner @srilman (#5375)
- chore: Remove expression namespaces @colin-ho (#5619)
- chore: Remove runner from context @colin-ho (#5628)
- chore: add deprecation to daft.udf @kevinzwang (#5665)
- chore(deps): bump the all group with 13 updates @dependabot[bot] (#5480)
- chore: update bug report template @universalmind303 (#5652)
- chore: remove checklist from pr template @universalmind303 (#5653)
- chore: Remove deprecated agg methods and series split @colin-ho (#5630)
- chore: remove broken docpublish job from build-docs workflow @ykdojo (#5631)
- chore: Hint users to use `.collect` when printing empty data...
v0.6.14
What's Changed 🚀
✨ Features
- feat: embed text metrics @colin-ho (#5583)
- feat: Add description and attributes to custom udf metrics @colin-ho (#5574)
- feat(flotilla): Aggregate Completed Worker Metrics in StatsManager @srilman (#5531)
- feat: add amplification metric for explode operator in native runner @samstokes (#5565)
🐛 Bug Fixes
- fix: Fix empty dataframe `show` issue @caican00 (#5595)
- fix: Fix openai test metrics fixture @colin-ho (#5593)
- fix: resolve docgen disk space failures by removing unused tools @ykdojo (#5589)
- fix: fix imports in explode.rs @colin-ho (#5573)
- fix: Add support for parsing STRUCT with parentheses syntax @Lucas61000 (#5449)
- fix: Dashboard Verbose Tracing Error @srilman (#5567)
- fix: Skip prompt metrics tests on ray runner @colin-ho (#5564)
📖 Documentation
- docs: Update AI functions usage patterns @everettVT (#5568)
🔧 Maintenance
Full Changelog: v0.6.13...v0.6.14
v0.6.13
What's Changed 🚀
💥 Breaking Changes
- refactor!: Remove support for creating File objects from bytes @universalmind303 (#5556)
✨ Features
- feat: Prompt metrics @colin-ho (#5549)
- feat: Async udf metrics @colin-ho (#5541)
- feat: Support specifying dimensions for text embedding @samstokes (#5543)
- feat: support customized retries error message of S3 request @stayrascal (#5447)
- feat: UDF metrics @colin-ho (#5507)
- feat: Support product @luoyuxia (#5515)
- feat: OTEL Metrics from Swordfish @srilman (#5454)
- feat: add tos object source @stayrascal (#5372)
- feat: Bind the name of the running UDF to the UDFActor @plotor (#5514)
- feat: Support text documents in prompt @colin-ho (#5520)
🐛 Bug Fixes
- fix: sorting on a literal value and aggregation with order-by @kevinzwang (#5547)
- fix: Limit with Offset returns unexpected result when reading Lance dataset @plotor (#5540)
- fix: Add absolute diff threshold to embed_text integration test @colin-ho (#5527)
- fix: test_embed_text_with_none_values with the OpenAI provider fails @desmondcheongzx (#5534)
- fix: Check for numpy dependency in prompt @colin-ho (#5521)
- fix: Handle Nones when embedding text with openai @desmondcheongzx (#5513)
- fix: Fix broken benchmark blog link @colin-ho (#5522)
♻️ Refactor
- refactor!: Remove support for creating File objects from bytes @universalmind303 (#5556)
📖 Documentation
- docs: fix migration guide @kevinzwang (#5563)
- docs: standardize key features casing to sentence case @ykdojo (#5559)
- docs: legacy UDF migration guide @kevinzwang (#5562)
- docs: simplify getting started tip in introduction @ykdojo (#5560)
- docs: update blog icon from bookmark to blog @ykdojo (#5557)
🔧 Maintenance
Full Changelog: v0.6.12...v0.6.13
v0.6.12
What's Changed 🚀
✨ Features
🐛 Bug Fixes
🚀 Performance
📖 Documentation
👷 CI
Full Changelog: v0.6.11...v0.6.12
v0.6.11
What's Changed 🚀
✨ Features
- feat: PostgresCatalog and PostgresTable followups @desmondcheongzx (#5508)
- feat: Add Catalog and Table implementations for PostgreSQL @desmondcheongzx (#5487)
- feat: make maintain_order configurable @stayrascal (#5505)
- feat: chat completions api for prompt function @colin-ho (#5497)
🐛 Bug Fixes
Full Changelog: v0.6.10...v0.6.11
v0.6.10
What's Changed 🚀
✨ Features
- feat: add --addr flag to daft-dashboard cli @VOID001 (#5444)
- feat: Support multiple image and file inputs for prompt function @colin-ho (#5481)
🐛 Bug Fixes
- fix: removes checking model directly for embedding dimensions @rchowell (#5445)
- fix: return-dtype for embed_text/image @universalmind303 (#5496)
- fix: Lower json inflation factor @colin-ho (#5461)
📖 Documentation
- docs: adds daft.func and daft.cls usage with migration page @everettVT (#5475)
🔧 Maintenance
- chore: Drop Python 3.9 @srilman (#5479)
- chore: remove extra from build command @stayrascal (#5493)
Full Changelog: v0.6.9...v0.6.10
v0.6.9
What's Changed 🚀
✨ Features
- feat: experimental vllm provider @kevinzwang (#5443)
- feat: Common Crawl x Daft tutorial using Qwen3 @malcolmgreaves (#5472)
- feat: better display for FileArrays @universalmind303 (#5482)
- feat: add a new subtype of file for video ops @universalmind303 (#5346)
- feat(dashboard): Starting Page for No Queries @srilman (#5452)
- feat: Flotilla OTEL Stats @srilman (#5463)
- feat: JSON Serialization for All Plans @srilman (#5356)
- feat: add mechanism for creating async rust based functions @universalmind303 (#5455)
- feat: Async batch func @colin-ho (#5459)
🐛 Bug Fixes
- fix: Don't update lockfile in package builds @srilman (#5484)
- fix: Undetach flotilla runner @colin-ho (#5473)
🚀 Performance
- perf: make DataType.infer 99.99% faster for core datatypes (image, file) @universalmind303 (#5469)
📖 Documentation
👷 CI
- ci: fix bun install by using node 20 @kevinzwang (#5491)
🔧 Maintenance
Full Changelog: v0.6.8...v0.6.9
v0.6.8
What's Changed 🚀
✨ Features
- feat: Allow images in prompt @colin-ho (#5466)
- feat: add pre-existence checks for lance_data_sink @huleilei (#5381)
- feat: Support setting `actor_udf_ready_timeout` via Env @plotor (#5426)
- feat: retryable udfs @universalmind303 (#5392)
- feat: Add a Bigtable data sink @desmondcheongzx (#5431)
- feat: classify_image expression @universalmind303 (#5428)
- feat: Add support for Metrics tab in quickstart Ray dashboard @jeevb (#5429)
- feat(lance): distributed FTS index creation via Daft UDF with fragment-level parallelism @huleilei (#5236)
- feat: add mimetype detection for daft.file @universalmind303 (#5411)
🐛 Bug Fixes
- fix: report a more reasonable error message when select * from <some_keywords> @VOID001 (#5440)
- fix: Fix async udf with `use_process` @colin-ho (#5457)
- fix: Drop table error in current active session @plotor (#5439)
- fix: add retry on "unable to open file" @kevinzwang (#5442)
- fix: Allow publishing quickstart helm chart to GHCR @jeevb (#5437)
- fix: convert num_rows to int when query count(*) from clickhouse @dujl (#5421)
- fix: Actually clone the repo before publishing quickstart helm chart @jeevb (#5433)
- fix: file reads for huggingface @universalmind303 (#5427)
- fix: Make benchmarking Ray cluster setup commands idempotent @jeevb (#5425)
- fix(lance): correct limit pushdown semantics with filters @huleilei (#5408)
🚀 Performance
- perf: Call async udfs asynchronously @colin-ho (#5451)
- perf: Elide shuffle for window if already partitioned @colin-ho (#5450)
- perf: defer allocation when creating series from literals @universalmind303 (#5391)
- perf: Double workers per udf actor handle @colin-ho (#5415)
♻️ Refactor
- refactor: Make helper function for calling async python functions from rust @colin-ho (#5432)
- refactor: combine sentence_transformers + transformers, and clean up … @universalmind303 (#5422)
📖 Documentation
- docs: adds ai functions, ai providers, contributing, and docstrings with nav @everettVT (#5438)
- docs: warning for Common Crawl dataset API instability @malcolmgreaves (#5436)
- docs: update the example to access S3-compatible services @huleilei (#5405)
- docs(connectors): add connector page for Lance format @huleilei (#5397)
Full Changelog: v0.6.7...v0.6.8
v0.6.7
What's Changed 🚀
💥 Breaking Changes
- feat!: Catch transient errors on turbopuffer writes @desmondcheongzx (#5380)
✨ Features
- feat: add viz for embedding @samster25 (#5419)
- feat!: Catch transient errors on turbopuffer writes @desmondcheongzx (#5380)
- feat(dashboard): Cleanup Queries Page @srilman (#5416)
- feat: Extend hash variants for xxhash @srilman (#5276)
- feat: prompt @colin-ho (#5394)
- feat: Add case function for better SQL-style conditional expressions @rasanpreetsingh3 (#5383)
🐛 Bug Fixes
- fix: Reduce number udfs by 1 in multi udf test @colin-ho (#5414)
- fix: Wrap azure fsspec in pafs.FSSpecHandler @colin-ho (#5412)
- fix(flotilla): Set flotilla actor cpu requests to 1 @colin-ho (#5404)
- fix: Fix prompt integration tests @colin-ho (#5401)
- fix: Fix Operator Finalization in Swordfish Stat Manager @srilman (#5398)
🚀 Performance
♻️ Refactor
📖 Documentation
- docs: Fix some document errors @plotor (#5409)
- docs: update minhash example to use cc dataset @everettVT (#5390)
- docs: fix daft.File usage examples @kevinzwang (#5403)
👷 CI
- ci: Remove Tests for the Old Ray Runner @srilman (#5374)
- ci: disable running tpch profiling on push @kevinzwang (#5384)
🔧 Maintenance
- chore: bump pyo3 dependency @universalmind303 (#5410)
- chore: revert #5383 @kevinzwang (#5396)
- chore: optimize operator naming @Jay-ju (#5204)
Full Changelog: v0.6.6...v0.6.7