fmus-big

A Python library for big data processing with a unified, intuitive API across different data processing frameworks.

Features

Unified API: Work with Pandas, Polars, Dask, or PySpark through one consistent interface
Smart Execution: Automatic switching between eager and lazy evaluation based on data size
Intuitive Interface: Chainable methods such as select, filter, and group_by that read the way data operations are usually described
Optimized Performance: Built-in query optimization and execution planning
Comprehensive I/O: Read and write CSV, Parquet, and JSON files, as well as SQL databases
Powerful Visualization: Create static and interactive visualizations with automatic type detection
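
To make the Smart Execution idea concrete, here is a minimal sketch of choosing an execution mode from file size. It is not fmus-big's actual implementation: the `choose_execution` helper and the 100 MB threshold are hypothetical, picked only for illustration.

```python
import os
import tempfile

# Hypothetical cutoff; the threshold fmus-big really uses is not stated in this README.
LAZY_THRESHOLD_BYTES = 100 * 1024 * 1024  # 100 MB


def choose_execution(path, threshold=LAZY_THRESHOLD_BYTES):
    """Return "lazy" for files at or above the threshold, otherwise "eager"."""
    size = os.path.getsize(path)
    return "lazy" if size >= threshold else "eager"


# Demonstrate with a tiny temporary CSV file.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(b"name,age\nAda,36\n")
    sample_path = tmp.name

small_mode = choose_execution(sample_path)                # default threshold
forced_mode = choose_execution(sample_path, threshold=1)  # artificially low threshold
os.remove(sample_path)

print(small_mode, forced_mode)
```

Passing `execution="lazy"` explicitly, as in the Quick Start, would bypass any such heuristic.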

Installation

# Basic installation with pandas support
pip install fmus-big

# With Polars support (quotes keep shells such as zsh from expanding the brackets)
pip install "fmus-big[polars]"

# With Dask support
pip install "fmus-big[dask]"

# With PySpark support
pip install "fmus-big[spark]"

# Full installation with all features
pip install "fmus-big[all]"

Quick Start

import fmus_big as fb

# Read data from various sources
df = fb.read("data.csv")  # Auto-detects file format
df = fb.read_csv("large_data.csv", execution="lazy")  # Explicit lazy execution
df = fb.read_parquet("s3://bucket/data.parquet")  # Cloud storage

# Perform operations with a consistent API
result = (
    df.select("name", "age", "city")
    .filter("age > 25")
    .group_by("city")
    .aggregate({"age": ["mean", "max", "count"]})
    .order_by("age_mean", ascending=False)
)

# Write results to various formats
result.to_csv("results.csv")
result.to_parquet("results.parquet", partition_cols=["city"])
result.to_json("results.json", orient="records")

# Generic write function
fb.write(result, "results.parquet", compression="snappy")

# Compute and view results
print(result.head())
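
For readers new to this style of chain, the sketch below reproduces what the select, filter, group_by, aggregate, and order_by steps compute, using only the Python standard library. The sample rows and the `summarize_by_city` helper are made up for illustration, and the `age_mean`, `age_max`, and `age_count` column names are inferred from the `order_by("age_mean")` call above rather than confirmed by the library.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample rows standing in for the contents of data.csv.
rows = [
    {"name": "Ada", "age": 36, "city": "London", "email": "ada@example.com"},
    {"name": "Ben", "age": 22, "city": "London", "email": "ben@example.com"},
    {"name": "Cy", "age": 41, "city": "Paris", "email": "cy@example.com"},
    {"name": "Di", "age": 30, "city": "London", "email": "di@example.com"},
    {"name": "Eve", "age": 29, "city": "Paris", "email": "eve@example.com"},
]


def summarize_by_city(records, min_age=25):
    # select: keep only the three columns of interest
    selected = [{k: r[k] for k in ("name", "age", "city")} for r in records]
    # filter: keep rows where age > min_age
    kept = [r for r in selected if r["age"] > min_age]
    # group_by: collect ages per city
    ages_by_city = defaultdict(list)
    for r in kept:
        ages_by_city[r["city"]].append(r["age"])
    # aggregate: mean, max, and count per group
    summary = [
        {"city": city, "age_mean": mean(ages), "age_max": max(ages), "age_count": len(ages)}
        for city, ages in ages_by_city.items()
    ]
    # order_by: highest mean age first
    return sorted(summary, key=lambda r: r["age_mean"], reverse=True)


result_rows = summarize_by_city(rows)
for row in result_rows:
    print(row)
```

The sketch runs eagerly on an in-memory list; a lazy backend would instead record these steps and run them only when results are requested.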

Data I/O Capabilities

fmus-big provides extensive I/O capabilities:

Format Support: Read and write CSV, Parquet, and JSON files, as well as SQL databases
Auto-detection: Automatic format detection based on file extension or path
Partitioned Data: Support for reading and writing partitioned Parquet datasets
SQL Integration: Query and write to SQL databases with a simple API
Backend Optimizations: Format-specific optimizations for each backend

# Read examples
df = fb.read("data.csv")                           # CSV
df = fb.read("data.parquet")                       # Parquet
df = fb.read("data/")                              # Partitioned directory
df = fb.read_sql_query("SELECT * FROM my_table", conn)  # SQL (conn is an open database connection)

# Write examples
df.to_csv("output.csv", delimiter="|")
df.to_parquet("output.parquet", compression="snappy")
df.to_sql("table_name", conn, if_exists="replace")
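
The auto-detection described above can be pictured as a lookup on the file extension, with a trailing slash treated as a partitioned directory. This is only a sketch under those assumptions: `detect_format` and the `EXTENSION_TO_FORMAT` table are hypothetical and are not part of fmus-big's documented API.

```python
from pathlib import PurePosixPath

# Hypothetical mapping; the extensions fmus-big recognizes may differ.
EXTENSION_TO_FORMAT = {
    ".csv": "csv",
    ".parquet": "parquet",
    ".json": "json",
}


def detect_format(path):
    """Guess a data format from a path string."""
    # A trailing slash is treated as a partitioned dataset directory.
    if path.endswith("/"):
        return "partitioned"
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in EXTENSION_TO_FORMAT:
        raise ValueError(f"Cannot detect format for {path!r}")
    return EXTENSION_TO_FORMAT[suffix]


for example in ("data.csv", "data.parquet", "data/", "s3://bucket/data.parquet"):
    print(example, "->", detect_format(example))
```

Raising an error for unknown extensions, rather than guessing, keeps a mistyped path from being silently parsed as the wrong format.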

Visualization Capabilities

fmus-big provides rich visualization capabilities:

Automatic Plot Type: Suggests the best visualization based on your data
Multiple Backends: Supports Matplotlib, Plotly, and Bokeh
Interactive Plots: Create interactive visualizations with tooltips, zooming, and panning
Dashboards: Build interactive dashboards with multiple plots and controls

# Basic visualizations
df.viz.line(x='date', y='value')
df.viz.scatter(x='x', y='y', color='category')
df.viz.bar(x='category', y='value')
df.viz.correlation_matrix()

# Interactive visualizations
df.interactive_viz().scatter_3d(x='x', y='y', z='z', color='category')
df.interactive_viz().bubble(x='x', y='y', size='size', color='category')
df.interactive_viz().geo_map(lat='latitude', lon='longitude', color='value')

# Build a dashboard
dashboard = df.dashboard(title='Sales Dashboard')
dashboard.add_plot('line', x='date', y='sales')
dashboard.add_plot('bar', x='region', y='revenue')
dashboard.add_control('dropdown', label='Region', options=['North', 'South', 'East', 'West'])
dashboard.create().show()
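
As an illustration of how automatic plot-type suggestion might work, the sketch below picks a chart from the kinds of values on each axis, mirroring the line, scatter, and bar examples above. The `suggest_plot` function and its rules are assumptions made for this example, not fmus-big's actual logic.

```python
from datetime import date
from numbers import Number


def suggest_plot(x_values, y_values):
    """Suggest a plot type from the Python types of the x and y values."""

    def kind(values):
        # Dates (including datetimes) count as temporal.
        if all(isinstance(v, date) for v in values):
            return "temporal"
        # Numbers count as numeric; bool is excluded even though it subclasses int.
        if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
            return "numeric"
        return "categorical"

    x_kind, y_kind = kind(x_values), kind(y_values)
    if x_kind == "temporal" and y_kind == "numeric":
        return "line"
    if x_kind == "numeric" and y_kind == "numeric":
        return "scatter"
    if x_kind == "categorical" and y_kind == "numeric":
        return "bar"
    return "table"  # fallback when no chart clearly fits


print(suggest_plot([date(2024, 1, 1), date(2024, 1, 2)], [1.0, 2.5]))
print(suggest_plot([1, 2, 3], [4, 5, 6]))
print(suggest_plot(["North", "South"], [10, 20]))
```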

Why fmus-big?

One API to Learn: Master a single API that works across different backends
Scale Seamlessly: Start with small data on your laptop, then scale to clusters without changing code
Optimized Performance: Automatic query optimization for each backend
Developer Experience: Clear error messages, sensible defaults, and comprehensive documentation

Contributing

Contributions are welcome! Check out the contributing guidelines to get started.

License

This project is licensed under the MIT License.
