Pronounced like fiasco, but with a t instead of a c
(F)ormulas (I)n (AST) (O)ut
Python bindings for fiasto, a language-agnostic, modern parser and lexer for Wilkinson's formulas.
- Parse Wilkinson's Formulas: Convert formula strings into structured JSON metadata
- Tokenize Formulas: Break down formulas into individual tokens with detailed information
- Python Dictionaries: Returns native Python dictionaries for easy integration
- parse_formula(): Takes a Wilkinson's formula string and returns a Python dictionary
- lex_formula(): Tokenizes a formula string and returns a list of Python dictionaries, one per token
Install from PyPI (recommended):
pip install fiasto-py
import fiasto_py
from pprint import pprint
# Parse a formula into structured metadata
print("="*30)
print("Parse Formula")
print("="*30)
result = fiasto_py.parse_formula("y ~ x1 + x2 + (1|group)")
pprint(result, compact=True)
Output:
==============================
Parse Formula
==============================
{'all_generated_columns': ['y', 'x1', 'x2', 'group'],
 'columns': {'group': {'generated_columns': ['group'],
                       'id': 4,
                       'interactions': [],
                       'random_effects': [{'correlated': True,
                                           'grouping_variable': 'group',
                                           'has_intercept': True,
                                           'includes_interactions': [],
                                           'kind': 'grouping',
                                           'variables': []}],
                       'roles': ['GroupingVariable'],
                       'transformations': []},
             'x1': {'generated_columns': ['x1'],
                    'id': 2,
                    'interactions': [],
                    'random_effects': [],
                    'roles': ['FixedEffect'],
                    'transformations': []},
             'x2': {'generated_columns': ['x2'],
                    'id': 3,
                    'interactions': [],
                    'random_effects': [],
                    'roles': ['FixedEffect'],
                    'transformations': []},
             'y': {'generated_columns': ['y'],
                   'id': 1,
                   'interactions': [],
                   'random_effects': [],
                   'roles': ['Response'],
                   'transformations': []}},
 'formula': 'y ~ x1 + x2 + (1|group)',
 'metadata': {'family': None,
              'has_intercept': True,
              'has_uncorrelated_slopes_and_intercepts': False,
              'is_random_effects_model': True}}
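Because the result is an ordinary dictionary, it can be navigated with plain Python. The sketch below groups column names by role. columns_by_role is a hypothetical helper (not part of fiasto-py), and parsed is a trimmed copy of the output above, so the snippet runs without the package installed. It assumes the "id" field follows formula order, as it does in this example.

```python
from collections import defaultdict

# Trimmed copy of the parse_formula("y ~ x1 + x2 + (1|group)") output shown above.
parsed = {
    "columns": {
        "group": {"id": 4, "roles": ["GroupingVariable"]},
        "x1": {"id": 2, "roles": ["FixedEffect"]},
        "x2": {"id": 3, "roles": ["FixedEffect"]},
        "y": {"id": 1, "roles": ["Response"]},
    },
    "metadata": {"has_intercept": True, "is_random_effects_model": True},
}


def columns_by_role(result: dict) -> dict[str, list[str]]:
    """Map each role name to the columns that carry it, ordered by column id."""
    by_role: dict[str, list[str]] = defaultdict(list)
    # Sort on the "id" field so columns come out in the order they were assigned.
    ordered = sorted(result["columns"].items(), key=lambda item: item[1]["id"])
    for name, details in ordered:
        for role in details["roles"]:
            by_role[role].append(name)
    return dict(by_role)


roles = columns_by_role(parsed)
print(roles)  # {'Response': ['y'], 'FixedEffect': ['x1', 'x2'], 'GroupingVariable': ['group']}
```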
import fiasto_py
from pprint import pprint
print("="*30)
print("Lex Formula")
print("="*30)
tokens = fiasto_py.lex_formula("y ~ x1 + x2 + (1|group)")
pprint(tokens, compact=True)
Output:
==============================
Lex Formula
==============================
[{'lexeme': 'y', 'token': 'ColumnName'},
 {'lexeme': '~', 'token': 'Tilde'},
 {'lexeme': 'x1', 'token': 'ColumnName'},
 {'lexeme': '+', 'token': 'Plus'},
 {'lexeme': 'x2', 'token': 'ColumnName'},
 {'lexeme': '+', 'token': 'Plus'},
 {'lexeme': '(', 'token': 'FunctionStart'},
 {'lexeme': '1', 'token': 'One'},
 {'lexeme': '|', 'token': 'Pipe'},
 {'lexeme': 'group', 'token': 'ColumnName'},
 {'lexeme': ')', 'token': 'FunctionEnd'}]
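The token list is just as easy to post-process. As a minimal sketch (column_names and groups_balanced are hypothetical helpers, and tokens is a copy of the output above), the code below collects the distinct column names in order of appearance and checks that every FunctionStart token, which the lexer uses for "(" in this example, has a matching FunctionEnd.

```python
# Copy of the lex_formula("y ~ x1 + x2 + (1|group)") output shown above.
tokens = [
    {"lexeme": "y", "token": "ColumnName"},
    {"lexeme": "~", "token": "Tilde"},
    {"lexeme": "x1", "token": "ColumnName"},
    {"lexeme": "+", "token": "Plus"},
    {"lexeme": "x2", "token": "ColumnName"},
    {"lexeme": "+", "token": "Plus"},
    {"lexeme": "(", "token": "FunctionStart"},
    {"lexeme": "1", "token": "One"},
    {"lexeme": "|", "token": "Pipe"},
    {"lexeme": "group", "token": "ColumnName"},
    {"lexeme": ")", "token": "FunctionEnd"},
]


def column_names(toks: list[dict]) -> list[str]:
    """Distinct ColumnName lexemes, in the order they first appear."""
    seen: list[str] = []
    for tok in toks:
        if tok["token"] == "ColumnName" and tok["lexeme"] not in seen:
            seen.append(tok["lexeme"])
    return seen


def groups_balanced(toks: list[dict]) -> bool:
    """True if FunctionStart/FunctionEnd tokens nest correctly."""
    depth = 0
    for tok in toks:
        if tok["token"] == "FunctionStart":
            depth += 1
        elif tok["token"] == "FunctionEnd":
            depth -= 1
            if depth < 0:  # a closing token with no matching opener
                return False
    return depth == 0


print(column_names(tokens))     # ['y', 'x1', 'x2', 'group']
print(groups_balanced(tokens))  # True
```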
import fiasto_py
import polars as pl
import numpy as np
from pprint import pprint
# Load data
mtcars_path = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"
df = pl.read_csv(mtcars_path)
# Parse formula
formula = "mpg ~ wt + cyl"
result = fiasto_py.parse_formula(formula)
pprint(result)
# Find the response column(s)
response_cols = [
    col for col, details in result["columns"].items()
    if "Response" in details["roles"]
]
# Find non-response columns
preds = [
    col for col, details in result["columns"].items()
    if "Response" not in details["roles"]
]
# Has intercept
has_intercept = result["metadata"]["has_intercept"]
# Prepare data matrices
X = df.select(preds).to_numpy()
y = df.select(response_cols).to_numpy().ravel()
# Add intercept if metadata says so
if has_intercept:
    X_with_intercept = np.column_stack([np.ones(X.shape[0]), X])
else:
    X_with_intercept = X
# Solve normal equations: (X'X)^-1 X'y
XTX = X_with_intercept.T @ X_with_intercept
XTy = X_with_intercept.T @ y
coefficients = np.linalg.solve(XTX, XTy)
# Extract intercept and slopes
if has_intercept:
    intercept = coefficients[0]
    slopes = coefficients[1:]
else:
    intercept = 0.0
    slopes = coefficients
# Calculate R2
y_pred = X_with_intercept @ coefficients
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r_squared = 1 - (ss_res / ss_tot)
# Prep Output
# Combine intercept and slopes into one dict
coef_dict = {"intercept": intercept} | dict(zip(preds, slopes))
# Create a tidy DataFrame
coef_df = pl.DataFrame(
    {
        "term": list(coef_dict.keys()),
        "estimate": list(coef_dict.values())
    }
)
# Print results
print(f"Formula: {formula}")
print(f"R² Score: {r_squared:.3f}")
print(coef_df)
Output:
{'all_generated_columns': ['mpg', 'intercept', 'wt', 'cyl'],
 'all_generated_columns_formula_order': {'1': 'mpg',
                                         '2': 'intercept',
                                         '3': 'wt',
                                         '4': 'cyl'},
 'columns': {'cyl': {'generated_columns': ['cyl'],
                     'id': 3,
                     'interactions': [],
                     'random_effects': [],
                     'roles': ['Identity'],
                     'transformations': []},
             'mpg': {'generated_columns': ['mpg'],
                     'id': 1,
                     'interactions': [],
                     'random_effects': [],
                     'roles': ['Response'],
                     'transformations': []},
             'wt': {'generated_columns': ['wt'],
                    'id': 2,
                    'interactions': [],
                    'random_effects': [],
                    'roles': ['Identity'],
                    'transformations': []}},
 'formula': 'mpg ~ wt + cyl',
 'metadata': {'family': None,
              'has_intercept': True,
              'has_uncorrelated_slopes_and_intercepts': False,
              'is_random_effects_model': False,
              'response_variable_count': 1}}
Formula: mpg ~ wt + cyl
R² Score: 0.830
shape: (3, 2)
┌───────────┬───────────┐
│ term      ┆ estimate  │
│ ---       ┆ ---       │
│ str       ┆ f64       │
╞═══════════╪═══════════╡
│ intercept ┆ 39.686261 │
│ cyl       ┆ -1.507795 │
│ wt        ┆ -3.190972 │
└───────────┴───────────┘
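Two refinements may be worth considering. First, the table lists cyl before wt because preds follows the iteration order of result["columns"]; the all_generated_columns_formula_order field in the printed metadata can be used to keep terms in formula order instead. Second, np.linalg.lstsq solves the least-squares problem without explicitly forming X'X, which is generally more numerically stable than solving the normal equations. The sketch below assumes the result dictionary has the shape printed above. predictors_in_formula_order and fit_ols are hypothetical helpers, and the small dataset is synthetic (y = 1 + 2*x1 - x2 exactly) so the recovered coefficients can be checked.

```python
import numpy as np


def predictors_in_formula_order(result: dict) -> list[str]:
    """Non-response data columns, ordered by all_generated_columns_formula_order."""
    order = result["all_generated_columns_formula_order"]
    # Keys are position strings ("1", "2", ...); sort them numerically, not lexically.
    names = [order[k] for k in sorted(order, key=int)]
    responses = {
        col for col, details in result["columns"].items()
        if "Response" in details["roles"]
    }
    # "intercept" is a generated column, not a data column, so skip it here.
    return [n for n in names if n != "intercept" and n not in responses]


def fit_ols(X: np.ndarray, y: np.ndarray, has_intercept: bool) -> tuple[float, np.ndarray]:
    """Least-squares fit via np.linalg.lstsq; returns (intercept, slopes)."""
    if has_intercept:
        X = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    if has_intercept:
        return float(coef[0]), coef[1:]
    return 0.0, coef


# Trimmed copy of the parse_formula("mpg ~ wt + cyl") output shown above.
parsed = {
    "all_generated_columns_formula_order": {"1": "mpg", "2": "intercept", "3": "wt", "4": "cyl"},
    "columns": {
        "cyl": {"roles": ["Identity"]},
        "mpg": {"roles": ["Response"]},
        "wt": {"roles": ["Identity"]},
    },
}
print(predictors_in_formula_order(parsed))  # ['wt', 'cyl']

# Synthetic data with an exact linear relationship: y = 1 + 2*x1 - x2.
X_demo = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 1.0], [4.0, 2.0]])
y_demo = 1 + 2 * X_demo[:, 0] - X_demo[:, 1]
intercept, slopes = fit_ols(X_demo, y_demo, has_intercept=True)
print(round(intercept, 6), np.round(slopes, 6))
```

In the mtcars example, X = df.select(predictors_in_formula_order(result)).to_numpy() would then yield coefficients listed as intercept, wt, cyl.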
fiasto supports a broad range of Wilkinson's notation, including:
- Basic formulas: y ~ x1 + x2
- Interactions: y ~ x1 * x2
- Smooth terms: y ~ s(z)
- Random effects: y ~ x + (1|group)
- Complex random effects: y ~ x + (1+x|group)
- Multivariate models: mvbind(y1, y2) ~ x + (1|g)
- Non-linear models: y ~ a1 - a2^x, a1 ~ 1, a2 ~ x + (x|g), nl = TRUE
For the complete reference, see the fiasto documentation.
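In standard Wilkinson notation, * is shorthand for the main effects plus all of their interactions, so y ~ x1 * x2 means y ~ x1 + x2 + x1:x2. The function below is only an illustrative sketch of that expansion rule; the name expand_star is hypothetical and this is not how fiasto implements it.

```python
from itertools import combinations


def expand_star(factors: list[str]) -> list[str]:
    """Expand a*b*... into every main effect and interaction term.

    Terms are ordered by interaction size (main effects first), and the
    factors inside each interaction keep their left-to-right order.
    """
    terms: list[str] = []
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            terms.append(":".join(combo))
    return terms


print(expand_star(["x1", "x2"]))       # ['x1', 'x2', 'x1:x2']
print(expand_star(["a", "b", "c"]))    # ['a', 'b', 'c', 'a:b', 'a:c', 'b:c', 'a:b:c']
```

With k factors the expansion has 2^k - 1 terms, which is why three-way and higher * products grow quickly.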
The package is available on PyPI and can be installed with:
pip install fiasto-py
- PyPI Page: pypi.org/project/fiasto-py
- Source Code: github.com/alexhallam/fiasto-py
- Documentation: This README and inline docstrings
parse_formula(formula)
Parse a Wilkinson's formula string and return structured JSON metadata as a Python dictionary.
Parameters:
- formula (str): The formula string to parse
Returns:
- dict: Structured metadata describing the formula
Raises:
- ValueError: If the formula is invalid or parsing fails
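Since invalid formulas raise ValueError, callers that process many user-supplied formulas may prefer a wrapper that reports the problem and moves on. This is a minimal sketch: try_parse and reject_everything are hypothetical names, and the parser argument exists only so the error path can be exercised with a stand-in instead of fiasto_py.

```python
from typing import Callable, Optional


def try_parse(formula: str, parser: Optional[Callable[[str], dict]] = None) -> Optional[dict]:
    """Parse `formula`, returning None (instead of raising) if it is rejected."""
    if parser is None:
        # Imported lazily so the helper can be exercised with a stand-in parser.
        import fiasto_py
        parser = fiasto_py.parse_formula
    try:
        return parser(formula)
    except ValueError as err:
        # parse_formula documents ValueError for invalid formulas.
        print(f"Could not parse {formula!r}: {err}")
        return None


def reject_everything(formula: str) -> dict:
    """Stand-in parser used only to demonstrate the error path."""
    raise ValueError("stand-in rejection")


print(try_parse("y ~", parser=reject_everything))  # prints the message, then None
```

With fiasto-py installed, try_parse("y ~ x1 + x2") uses parse_formula directly.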
lex_formula(formula)
Tokenize a formula string and return JSON describing each token.
Parameters:
- formula (str): The formula string to tokenize
Returns:
- list of dict: Token information (lexeme and token type) for each element in the formula
Raises:
- ValueError: If the formula is invalid or lexing fails
Contributions are welcome! Please feel free to submit a Pull Request.
- fiasto - The underlying Rust library
- PyO3 - Python-Rust bindings
- maturin - Build system for Python extensions
- PyPI - Python Package Index for distribution
This project is licensed under the MIT License - see the LICENSE file for details.