pydremio is a Python API wrapper for interacting with Dremio.
It allows you to perform operations on datasets and metadata within Dremio via either the HTTP API or Arrow Flight.
Since Arrow Flight offers significantly better performance, it is the recommended method for data operations.
This repository includes the core library, unit tests, and example code to help you get started.
The wrapper is distributed as a Python wheel (.whl) and can be found in the Releases section.
It is also published to PyPI.
You need Python 3.13 or higher.
pip install pydremio

pip install --upgrade --force-reinstall https://github.com/continental/pydremio/releases/download/v0.3.2/dremio-0.3.2-py3-none-any.whl

If you are behind a corporate firewall and you need a workaround (NOT recommended for use in production!):
pip install --upgrade --force-reinstall \
--trusted-host pypi.org \
--trusted-host files.pythonhosted.org \
--trusted-host github.com \
--trusted-host objects.githubusercontent.com \
--cert False \
https://github.com/continental/pydremio/releases/download/v0.3.2/dremio-0.3.2-py3-none-any.whl

pip install https://github.com/continental/pydremio/releases/download/<version>/dremio-<version>-py3-none-any.whl

python-dotenv == 1.0.1
https://github.com/continental/pydremio/releases/latest/download/dremio-latest-py3-none-any.whl

The simplest way to create a logged-in client instance:
from dremio import Dremio
dremio = Dremio(<hostname>, username=<username>, password=<password>)

Replace the placeholders or, preferably, use environment variables (via a .env file) to avoid storing credentials in code.
Example .env file:
DREMIO_USERNAME="your_username@example.com"
DREMIO_PASSWORD="xyz-your-password-or-pat-xyz"
DREMIO_HOSTNAME="https://your.dremio.host.cloud"

You can then use the convenience method:
from dremio import Dremio
from dotenv import load_dotenv
load_dotenv()
dremio = Dremio.from_env()

By default, pydremio assumes no TLS encryption. If you have set up TLS, use:
from dremio import Dremio
from dotenv import load_dotenv
load_dotenv()
dremio = Dremio.from_env()
dremio.flight_config.tls = True

Alternatively, set it in your .env file:
DREMIO_FLIGHT_TLS=TRUE

More information here: Dremio authentication
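As a rough illustration of the configuration described above, the following sketch reads the same variables using only the standard library and fails early when one is missing. The helper `read_dremio_env`, the `DremioEnv` dataclass, and the set of spellings accepted as true for `DREMIO_FLIGHT_TLS` are assumptions made for this example. They are not part of pydremio, and `Dremio.from_env()` may interpret these values differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical helper for illustration only; not pydremio's implementation.
REQUIRED = ("DREMIO_HOSTNAME", "DREMIO_USERNAME", "DREMIO_PASSWORD")
TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings, compared case-insensitively


@dataclass(frozen=True)
class DremioEnv:
    hostname: str
    username: str
    password: str
    flight_tls: bool


def read_dremio_env(env: Optional[Mapping[str, str]] = None) -> DremioEnv:
    """Collect the DREMIO_* settings, raising KeyError if a required one is absent or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    # An unset DREMIO_FLIGHT_TLS is treated as "no TLS", matching the default described above.
    tls = env.get("DREMIO_FLIGHT_TLS", "").strip().lower() in TRUTHY
    return DremioEnv(
        hostname=env["DREMIO_HOSTNAME"],
        username=env["DREMIO_USERNAME"],
        password=env["DREMIO_PASSWORD"],
        flight_tls=tls,
    )
```

Calling `read_dremio_env()` after `load_dotenv()` is one way to check a .env file for completeness before creating a client.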
- By default, the queries are run with Arrow Flight.
- The reason is that HTTP queries generate a lot of temporary cache data. This cache is kept for a long time and is created again for every query, which may cause high storage costs if you query large tables.
- For small datasets, this may not be a good trade-off in terms of duration. Try run(method='http') instead.
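One way to act on this trade-off is to choose the transport from a rough size estimate. The sketch below is hypothetical: `run_kwargs`, the row estimate, and the 10,000-row threshold are assumptions for illustration, and only the `method='http'` argument comes from the text above. For larger results it returns no arguments, so the default (Arrow Flight) applies.

```python
def run_kwargs(estimated_rows: int, small_threshold: int = 10_000) -> dict:
    """Return keyword arguments for run() based on an estimated result size.

    Hypothetical helper: the threshold is an arbitrary example value,
    not a recommendation from pydremio.
    """
    if estimated_rows < 0:
        raise ValueError("estimated_rows must be non-negative")
    if estimated_rows <= small_threshold:
        # Small result: the HTTP path may finish sooner.
        return {"method": "http"}
    # Large result: pass nothing and keep the default transport.
    return {}
```

With a dataset object, this could be used as `ds.run(**run_kwargs(500))`. A sensible threshold depends on your deployment and should be measured rather than assumed.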
from dremio import Dremio
dremio = Dremio.from_env()
ds = dremio.get_dataset("path.to.vds")
polars_df = ds.run().to_polars()
pandas_df = ds.run().to_pandas()

from dremio import Dremio, NewFolder
folder = dremio.create_folder("path.to.folder")

from dremio import Dremio, NewFolder, AccessControlList, AccessControl
folder = dremio.create_folder("path.to.folder")
user_id = dremio.get_user_by_name('<user_name>')
folder.set_access_for_user(user_id, ['SELECT'])

All models are located in the models/ directory.
Below is an overview of available methods grouped by category.
- login(username: str, password: str) -> str
- auth(auth: str = None, token: str = None) -> Dremio

- get_catalog_by_id(id: UUID) -> CatalogObject
- get_catalog_by_path(path: list[str]) -> CatalogObject
  - Accepts both list format (["space", "dataset"]) and string format ("space/dataset")
- create_catalog_item(item: NewCatalogObject | dict) -> CatalogObject
- update_catalog_item(id: UUID, item: NewCatalogObject | dict) -> CatalogObject
- update_catalog_item_by_path(path: list[str], item: NewCatalogObject | dict) -> CatalogObject
- delete_catalog_item(id: UUID) -> bool
  - Returns True if successful
- copy_catalog_item_by_path(path: list[str], new_path: list[str]) -> CatalogObject
- refresh_catalog(id: UUID) -> CatalogObject
- get_catalog_tree(id: str = None, path: str | list[str] = None)
  - ⚠️ Expensive operation, intended for exploration and mapping only
- get_dataset(path: list[str] | str | None = None, *, id: UUID | None = None) -> Dataset
- create_dataset(path: list[str] | str, sql: str | SQLRequest, type: Literal['PHYSICAL_DATASET', 'VIRTUAL_DATASET'] = 'VIRTUAL_DATASET') -> Dataset
- delete_dataset(path: list[str] | str) -> bool
- copy_dataset(source_path: list[str] | str, target_path: list[str] | str) -> Dataset
- reference_dataset(source_path: list[str] | str, target_path: list[str] | str) -> Dataset

- get_folder(path: list[str] | str | None = None, *, id: UUID | None = None) -> Folder
- create_folder(path: str | list[str]) -> Folder
- delete_folder(path: str | list[str], recursive: bool = True) -> bool
- copy_folder(source_path: list[str] | str, target_path: list[str] | str, *, assume_privileges: bool = True, relative_references: bool = False) -> Folder
- reference_folder(source_path: list[str] | str, target_path: list[str] | str, *, assume_privileges: bool = True) -> Folder
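Many of the methods above accept a path either as a list of segments or as a single string. The sketch below shows one way such input could be normalized into a list before further use. `normalize_path` is a hypothetical helper, not pydremio's own parser. It assumes a single separator character (the slash form by default, with the dotted form available through `sep="."`) and does not handle quoted segments that themselves contain the separator.

```python
from typing import Sequence, Union


def normalize_path(path: Union[str, Sequence[str]], sep: str = "/") -> list[str]:
    """Turn 'space/dataset' or ['space', 'dataset'] into ['space', 'dataset'].

    Hypothetical helper for illustration; pydremio may parse paths differently.
    """
    if isinstance(path, str):
        segments = path.split(sep)
    else:
        segments = list(path)
    # Drop empty segments caused by leading, trailing, or doubled separators.
    cleaned = [segment.strip() for segment in segments if segment.strip()]
    if not cleaned:
        raise ValueError("path must contain at least one segment")
    return cleaned
```

Normalizing once at the boundary keeps the rest of a script working with a single representation.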
Wiki and tags are associated by the ID of the collection item.
The tags object contains an array of tags.
- get_wiki(id: UUID) -> Wiki
- set_wiki(id: UUID, wiki: Wiki) -> Wiki
- get_tags(id: str) -> Tags
- set_tags(id: str, tags: Tags) -> Tags
- sql(sql_request: SQLRequest) -> JobId
- start_job_on_dataset(id: UUID) -> JobId
- get_job_info(id: UUID) -> Job
- cancel_job(id: UUID) -> Job
- get_job_results(id: UUID) -> JobResult
- sql_results(sql_request: SQLRequest) -> Job | JobResult
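The job methods above suggest a submit, poll, fetch workflow: sql returns a job ID, get_job_info reports on the job, and get_job_results retrieves the result. The following sketch shows a generic polling loop for that pattern. It is not pydremio code: `wait_for_state`, the `fetch_state` callable, and the terminal state names are assumptions, and the attribute that holds the state on the Job model is not specified here.

```python
import time
from typing import Callable, Collection

# Assumed names of terminal job states; verify against your Dremio version.
TERMINAL_STATES = frozenset({"COMPLETED", "FAILED", "CANCELED"})


def wait_for_state(
    fetch_state: Callable[[], str],
    *,
    terminal: Collection[str] = TERMINAL_STATES,
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_state() repeatedly until it returns a terminal state.

    Raises TimeoutError if no terminal state is seen before the deadline.
    """
    deadline = clock() + timeout
    while True:
        state = fetch_state()
        if state in terminal:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"job still in state {state!r} after {timeout} s")
        sleep(interval)
```

With pydremio, `fetch_state` could wrap a call to get_job_info for a given job ID and return the state read from the resulting Job object. The sql_results method listed above may already cover the common case, so a loop like this is mainly useful when you need control over the interval or timeout.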
- get_users() -> list[User]
- get_user(id: UUID) -> User
- get_user_by_name(name: str) -> User
- create_user(user: User) -> User
- update_user(id: UUID, user: User) -> User
- delete_user(id: UUID, tag: str) -> bool
  - Returns True if deletion was successful
- Publish to PyPI
- CLI support
Contributions are welcome! Please open issues or pull requests for features, bugs, or improvements.
This project is licensed under the BSD License. See the LICENSE file for details.