High-performance, Rust-backed columnar kernel for stock / candlestick (OHLCV) time-series data.
volas is a Rust-backed, pandas-shaped DataFrame for live OHLCV pipelines: 242 trading-indicators, incremental O(lookback) refresh, and NumPy/Torch-ready output.
It is not a general-purpose pandas replacement. It is a narrow, fast DataFrame for candlestick / OHLCV workflows: append a new bar, keep indicator columns cached, and refresh only the stale tail.
from volas import read_csv
df = read_csv("btc_1m.csv")
# Cache indicator directives as DataFrame columns.
df["rsi:14"]
df[["macd", "macd.signal", "atr:14"]]
# In a live loop:
df.append(new_bar) # one-row OHLCV frame
df["rsi:14"] # refreshes only the affected tail, O(lookback)
features = df.to_numpy()
- 242 built-in indicators and TA-Lib-compatible directives
- Incremental refresh after append: O(lookback), not O(n)
- Rust kernels, no pandas runtime dependency
- pandas-shaped indexing: .loc / .iloc / .at / read_csv / to_numpy
- NumPy / Torch-ready output
pip install volas
On our reproducible benchmark suite, volas is faster than pandas, polars, stock-pandas and TA-Lib on most live-update indicator workloads.
- pandas-shaped API. The same .loc / .iloc / .at, read_csv, to_numpy and resampling: for OHLCV workflows, change the import and keep your code. It is not a general-purpose pandas replacement. (See what's not covered)
- Fast on live OHLCV indicator workloads, with reproducible benchmarks: see the always-current live benchmark report.
  - On the current published report, volas beats TA-Lib on 139 / 157 covered indicators by the default ratio, reproducible via make benchmark.
  - On incremental update (each new bar), volas is the fastest library across all indicators: ~5× faster than TA-Lib, and up to ~360× faster than pandas.
- Built for the live tick. A new bar touches only the affected tail (O(lookback), not O(n)); indicators refresh in microseconds, never a full recompute.
- Rust inside, NumPy / Torch out. Compiled kernels, zero pandas at runtime; to_numpy() feeds NumPy and torch.Tensor pipelines.
volas is not a general-purpose pandas replacement — for plain dataframe analysis, keep pandas or polars. It is a narrow, fast DataFrame for the case where a new OHLCV bar arrives and indicators must refresh now:
| | pandas | polars | TA-Lib | volas |
|---|---|---|---|---|
| pandas-shaped indexing (.loc / .iloc / .at) | ✅ | ❌ | ❌ | ✅ |
| OHLCV-native indicator directives (df['rsi:14']) | ❌ | ❌ | ✅ | ✅ |
| Indicator cache owned by the frame | ❌ | ❌ | ❌ | ✅ |
| Incremental O(lookback) refresh on a new bar | ❌ | ❌ | ❌ | ✅ |
| Rust-backed kernels, no pandas at runtime | ❌ | ✅ | C | ✅ |
| NumPy / Torch export | ✅ | ✅ | arrays | ✅ |
- Installation
- Quick start
- Usage
- Cumulation and DatetimeIndex
- TimeFrame
- Syntax of directive
- Indexing & selection
- Writing & assignment
- Timezones
- Missing values (volas.NA)
- pandas interop
- Error handling
- Built-in Indicators
- License
- For Developers
pip install volas
Requires Python >= 3.11. Wheels are published for Linux (x86_64 / aarch64), macOS (x86_64 / arm64) and Windows (x86_64). For a local build from source, see For Developers.
Verify the install in 30 seconds, then see the examples/ — each
is self-contained and prints an OK: line:
pip install volas
python examples/00_install_check.py
python examples/03_live_ohlcv_append.py # append a bar, refresh only the stale tail
from volas import DataFrame
df = DataFrame({
'open': [2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
'high': [12.0, 13.0, 14.0, 15.0, 16.0, 17.0],
'low': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
'close': [3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
'volume': [100, 200, 300, 400, 500, 600],
})
# A plain column -> Series
df['close']
# 0 3.0
# 1 4.0
# 2 5.0
# 3 6.0
# 4 7.0
# 5 8.0
# Name: close, dtype: float64
# An indicator directive -> Series (2-period SMA of `close`)
df['ma:2']
# 0 <NA>
# 1 3.5
# 2 4.5
# 3 5.5
# 4 6.5
# 5 7.5
# Name: ma:2, dtype: float64
# A boolean directive -> bool Series, usable as a row mask
bullish = df['close > open']
df[bullish] # DataFrame of the rows where close > open
# Several directives at once -> DataFrame
df[['ma:2', 'ma:3', 'close > open']]
# Zero-copy-ish export to NumPy
df['close'].to_numpy() # 1-D ndarray
df.to_numpy() # 2-D ndarray (rows x columns)
from volas import (
    DataFrame, Series, read_csv, to_datetime, TimeFrame, Timestamp,
)
The sub-sections below follow volas's public surface in order: the DataFrame
class, then its instance methods, its static methods, the other classes, and the
top-level package functions — closing with the rest of the pandas-compatible
API that behaves exactly as it does in pandas. (A top-level name imported from
volas, such as read_csv, is written without a volas.
prefix.)
DataFrame has a pandas-compatible API, so if you are familiar with
pandas.DataFrame, you are already ready to use volas. Unlike pandas, volas is
backed by a Rust kernel and has no pandas runtime dependency.
df = read_csv('stock.csv')
We can use [] (pandas-style indexing, a.k.a. __getitem__ in Python) to select
lower-dimensional slices. In addition to indexing with colname (the column name
of the DataFrame), we can also index by directive.
df[directive] # Gets a Series
df[[directive0, directive1]] # Gets a DataFrame
The following example shows the most basic indexing with [directive]:
df = DataFrame({
'open' : ...,
'high' : ...,
'low' : ...,
'close': [5, 6, 7, 8, 9]
})
df['ma:2']
# 0 <NA>
# 1 5.5
# 2 6.5
# 3 7.5
# 4 8.5
# Name: ma:2, dtype: float64
This computes the 2-period simple moving average of the "close" column.
- data dict[str, list | np.ndarray] | DataFrame: the column data, either a dict mapping each column name to an equal-length list or NumPy array (float, int, bool, datetime64 or string), or another volas DataFrame, which is then copied (like pandas.DataFrame(df)). To attach a DatetimeIndex, parse a column with to_datetime, promote it with set_index, then tag a zone with tz_localize / tz_convert. See Timezones.
- columns Optional[list[str]] = None: select and order the columns to keep, the same projection as df[[...]]. A name not present raises KeyError; an empty list or a duplicate name is rejected, and an absent column is never silently filled.
- time_frame Optional[str | TimeFrame] = None: if set, makes this a tf-aware (cumulating) DataFrame at this bar interval: the given rows are taken as already-final bars at that frame, and later appends fold finer bars into the forming bar. Requires a DatetimeIndex. See Cumulation and DatetimeIndex.
- cumulators Optional[dict[str, str]] = None: per-column aggregator overrides used when folding (e.g. {'amount': 'sum'}); defaults to OHLCV semantics (open=first, high=max, low=min, close=last, volume=sum; any other column last). Only meaningful together with time_frame.
Executes the given directive and returns a numpy ndarray according to the directive.
df['ma:5'] # returns a Series
df.exec('ma:5', create_column=True) # returns a numpy ndarray
# This will only calculate without creating a new column in the dataframe
df.exec('ma:20')
The differences between df[directive] and df.exec(directive) are:
- the former creates a new column for the result of directive as a cache for later use, while df.exec(directive) does not unless we pass the parameter create_column as True
- the former accepts other pandas indexing targets, while df.exec(directive) only accepts a valid volas directive string
- the former returns a Series or DataFrame object, while the latter returns an np.ndarray
Directly gets the column value by key, returning a Series — and never
computes: unlike df[key], which parses an unknown key as an indicator
directive and executes it, get_column only fetches an existing column and
raises KeyError otherwise. Use it whenever the column name comes from
external data (CSV headers, user input, configuration), so a name that happens
to look like a directive (e.g. "ma:5") can never silently trigger a
computation.
If the given key is an alias name, it returns the value of the corresponding
original column. If the column is not found, a KeyError is raised.
df = DataFrame({
'open' : ...,
'high' : ...,
'low' : ...,
'close': [5, 6, 7, 8, 9]
})
df.get_column('close')
# 0 5
# 1 6
# 2 7
# 3 8
# 4 9
# Name: close, dtype: int64
Appends rows of other (a DataFrame or a Row) to the end of the caller in
place, returns the same DataFrame, and applies the DatetimeIndex to the
newly-appended row(s) if possible. Use copy() first when the original frame
must stay unchanged.
If the caller is a tf-aware DataFrame (one built with a time_frame, or
the result of cumulate), append instead folds each finer bar into the
forming bar rather than adding a row — see
Live cumulation.
By default, appending new rows does not update the indicator columns of the new
rows; they stay stale until they are read again or until df.fulfill() is
called (see below).
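The stale-tail idea can be illustrated without volas. The following is a minimal sketch in plain Python, assuming a simple moving average and a hypothetical helper named refresh_sma_tail (it is not part of volas's API, and it is not how the Rust kernels are written): the length of the cached column plays the role of the valid-row cursor, and only the rows past it are computed, each from its last `period` closes.

```python
def refresh_sma_tail(close, cached, period):
    """Extend a cached SMA column so it covers every row of `close`.

    `cached` holds the SMA for the first len(cached) rows; the rows past it
    are the stale tail. Hypothetical helper for illustration only.
    """
    refreshed = list(cached)
    for i in range(len(cached), len(close)):
        if i + 1 < period:
            refreshed.append(None)  # not enough history yet
        else:
            window = close[i + 1 - period:i + 1]  # reads only `period` values
            refreshed.append(sum(window) / period)
    return refreshed


close = [3.0, 4.0, 5.0, 6.0, 7.0]
ma2 = refresh_sma_tail(close, [], period=2)   # first read: whole column
close.append(8.0)                             # a new bar arrives
ma2 = refresh_sma_tail(close, ma2, period=2)  # next read: one stale row only
print(ma2)  # [None, 3.5, 4.5, 5.5, 6.5, 7.5]
```

The second call does work proportional to the window length, not to the length of the history, which is the O(lookback) behaviour described above.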
Cumulate (resample) the data frame to a coarser time_frame, returning a new
DataFrame. Requires a DatetimeIndex.
- time_frame TimeFrame | str: the target bar interval, e.g. TimeFrame.m5 or '5m'. See TimeFrame.
- cumulators? dict[str, str] | None = None: per-column aggregator overrides (e.g. {'amount': 'sum'}); defaults to OHLCV semantics (open=first, high=max, low=min, close=last, volume=sum; any other column last).
# from 1-minute klines to 5-minute klines
five_minute = one_minute.cumulate('5m')
See Cumulation and DatetimeIndex for details.
Fulfill all indicator columns. By default, adding new rows to a DataFrame will
not update the indicators of the new rows.
Indicators are only updated when accessing the indicator column or calling
df.fulfill(). Accessing df[directive] refreshes only the affected tail
incrementally (O(lookback), not an O(n) recompute); for bulk reads
(to_numpy(), .iloc) call fulfill() once to batch-refresh every cached
directive column in place.
df['ma:20'] # cache the 20-period SMA as a column
df = df.append(new_bar) # the new row's ma:20 is stale (a missing placeholder)
df.fulfill() # recompute only the tail of every cached column
df.to_numpy() # now fresh
Defines a column alias.
- as_name str: the alias name
- src_name str: the name of an existing column
# Some plot libraries such as `mplfinance` require a column named capitalized
# `Open`, but it is ok, we could create an alias.
df.alias('Open', 'open')
The alias resolves everywhere a column is looked up, including inside
directives, and survives drop / copy / slicing.
df['Open'] # same data as df['open']
df['ma:5@Open'] # the alias resolves inside directives too
df[col] and df[directive] return a Series, a named 1-D column whose API is
pandas-compatible: arithmetic / comparison / logical operators, .sum() /
.mean() / .std() / …, .shift() / .diff() / .fillna(), .iloc /
.loc, .to_numpy() / .to_list(). See
the rest of the pandas-compatible API
for the full list. There is no public Series constructor — a Series is
always obtained by indexing a DataFrame.
s = df['close']
s.name # 'close'
(s - s.shift(1)).mean()
df['ma:5 > ma:20'] # a directive likewise returns a Series (here a bool one)
Beyond pandas, a Series also exposes the 15 TA-Lib Math Transform functions
as methods — acos asin atan ceil cos cosh exp floor ln
log10 sin sinh sqrt tan tanh:
df['close'].ln()
df['high'].sqrt()
A datetime64[ns] Series exposes the pandas .dt accessor: calendar
components (year month day hour minute second microsecond
nanosecond quarter dayofweek dayofyear days_in_month), calendar
predicates (is_month_start … is_year_end, is_leap_year), names
(day_name() / month_name()), formatting (strftime(fmt)), bar alignment
(floor(freq) / ceil(freq) / round(freq) / normalize()), and
isocalendar(). A missing element yields NA in every component:
t = to_datetime(df['time']) # imported with: from volas import to_datetime
t.dt.hour # int64 Series, 0..23
t.dt.dayofweek # Monday=0 .. Sunday=6
t.dt.floor('15min') # datetime Series aligned to the 15-minute bar
df.iloc[i] and df.loc[label] return a Row, a single record whose .name
is its index label. A Row has no public constructor (Row(...) raises
TypeError: No constructor defined for Row); you only obtain one by indexing a
frame, and you may pass it to df.append.
row = df.iloc[-1] # the latest bar
row.name # its index label (e.g. a Timestamp for a DatetimeIndex)
row.to_dict() # {column: value}
row.to_numpy() # the numeric cells as a 1-D ndarray
For live streaming, give a DataFrame a time_frame and append finer bars
into it, instead of re-cumulating the whole frame each tick. df.cumulate(tf)
returns such a frame (the forming period kept live), or build one directly with
DataFrame(data, time_frame=..., cumulators=...) (the given rows are taken as
already-final bars at that frame; requires a DatetimeIndex).
On a tf-aware frame:
- df.append(bar) folds the bar in: one in the open period updates the forming last row (df.iloc[-1]); one in a new period rolls over into a fresh row; a re-sent forming bar (same timestamp) updates rather than double-counts.
- df.iloc[-1] is the current (still-open) period, i.e. the live bar.
- df[directive] / df.exec(directive) computes indicators over the cumulated frame including the forming row, lazily, on read: an append only marks them stale, and the next read recomputes just the tail.
- The target of df.cumulate(target) must be a whole multiple of the source frame (e.g. 5m→15m, not 5m→7m; a week or 3-day bar does not nest into a month/year); cumulating to the same frame returns a copy().
df = history.cumulate('5m') # a tf-aware 5m frame (history is finer, e.g. 1m)
for bar in stream: # each `bar` is a finer DataFrame
    df.append(bar) # folds into the forming 5m bar
    df.iloc[-1] # the live, still-forming bar
    df['macd'] # indicators over the cumulated frame
See Cumulation and DatetimeIndex for details.
read_csv(path, sep=',', header=True, parse_dates=None, index_col=None, na_values=None, keep_default_na=True, tz=None, date_unit=None) -> DataFrame
A top-level function that reads a CSV file into a DataFrame, inferring per-column
dtypes — a fast, pandas-subset CSV reader.
- path str: the CSV file path.
- sep? str = ',': the field delimiter (a single character); delimiter is an accepted alias.
- header? bool = True: True (or omitted) treats the first row as the header; False / None means no header (columns are named '0' … 'n-1').
- parse_dates? list[str] | None = None: column names to parse into datetime columns.
- index_col? str | int | None = None: a column name or integer position to move into the row index; applied after parse_dates, so naming a parsed date column yields a DatetimeIndex.
- na_values? str | list[str] | None = None: extra missing-value tokens.
- keep_default_na? bool = True: also treat the default NA tokens as missing.
- tz? str | None = None: the timezone for the index_col datetime: a naive date string is read in tz (stored UTC, the index tagged). Accepts a fixed offset ('+08:00') or an IANA name ('America/New_York'); pass the date column via index_col and do not also list it in parse_dates. See Timezones.
- date_unit? str | None = None: read index_col as an epoch integer in this unit ('s' / 'ms' / 'us' / 'ns', absolute UTC); tz then only sets the display zone.
from volas import read_csv
df = read_csv('klines.csv') # RangeIndex
df = read_csv('klines.csv',
parse_dates=['time_key'], # parse to datetime
index_col='time_key') # -> DatetimeIndex
df = read_csv('data.tsv', sep='\t', header=False, # no header -> '0'..'n-1'
na_values=['NA', 'null'])
A top-level function that bridges a pandas.DataFrame (pdf) into volas (and
df.to_pandas() bridges back). See pandas interop.
A top-level function that converts epoch numbers or datetime strings to a
datetime Series, mirroring pandas.to_datetime. obj may be a Series, a 1-D
NumPy array, or a list. A missing input (a float NaN, or a volas.NA in an
int column) becomes NaT, like pd.to_datetime.
- obj: the values to convert: numeric epochs, datetime strings, or an already-datetime Series (returned unchanged).
- unit? str = 'ns': the epoch unit for numeric input ('s' / 'ms' / 'us' / 'ns'); sub-unit fractions are preserved, like pd.to_datetime.
- format? str | None = None: an explicit datetime format for string input (pandas format=, e.g. '%Y-%m-%d %H:%M:%S'), which is faster and unambiguous; ignored for numeric input.
Naive strings parse as UTC and offset-aware strings (…+08:00) are absolute. To
display the resulting index in a zone, make it the index and tag the zone with
tz_localize / tz_convert (see Timezones).
from volas import to_datetime
# parse an epoch-seconds column to datetime, then make it the index
df['time'] = to_datetime(df['time'], unit='s')
df = df.set_index('time') # -> DatetimeIndex
df = df.tz_localize('America/New_York') # tag the display zone (see Timezones)
For an in-place, truncating cast (the NumPy / pandas astype idiom), use
df.astype({'time': 'datetime64[s]'}) instead.
Get the canonical full name of a directive — the actual column name volas caches
it under. The command name is lowercased and default arguments / series are dropped
to save space.
from volas import directive_stringify
directive_stringify('kdj.j')
# 'kdj.j'
directive_stringify('kdj.j:9,3,2,100@high,close,close')
# 'kdj.j:,,2,100@,close'
# command names are case-insensitive and canonicalize to lowercase
directive_stringify('MACD:12,26')
# 'macd'
Get the lookback period of a directive: the minimum number of prior data points
required before the indicator produces a valid result.
from volas import directive_lookback
directive_lookback('ma:20')
# 19
directive_lookback('boll')
# 19 (default period 20)
# Compound directive: lookback accumulates across nested expressions.
# repeat:5 needs 4 extra points, boll.upper (period 20) needs 19 -> 23
directive_lookback('repeat:5@(close > boll.upper)')
# 23
Everything below behaves like its pandas counterpart: if you know it from
pandas, it works the same in volas, except for the deliberate
NA-model divergences noted after the listing.
# --- DataFrame: metadata --------------------------------------------------
df.columns / df.shape / len(df) / df.dtypes # dtypes -> dict
df.index # row labels, as a NumPy array
col in df ; for col in df # membership / iterate column names
df.tz / df.tz_localize(tz) / df.tz_convert(tz) # DatetimeIndex tz; see Timezones
# --- DataFrame: selection -------------------------------------------------
df[col] # -> Series
df[[col, ...]] # -> DataFrame
df[bool_mask] # -> DataFrame (filter rows; mask = Series | ndarray)
df.iloc[...] / df.loc[...] / df.at[label, col] / df.iat[i, j]
df.head(n=5) / df.tail(n=5)
# --- DataFrame: reshaping & dtypes ----------------------------------------
df.drop([label, ...], axis=0) # drop rows by label (axis=1 -> columns)
df.dropna(how='any') / df.sort_index(ascending=True) / df.reset_index(drop=False)
df.rename({old: new}) / df.astype({col: dtype}) / df.set_index(col)
df.astype({col: 'datetime64[s]'}) # numeric epoch -> datetime (unit s|ms|us|ns; truncating)
df.copy() / df.to_numpy(dtype=None) / df.equals(other) / df.to_csv(path=None, ...)
# --- DataFrame: writing ---------------------------------------------------
df[col] = scalar | array | Series # add / replace a column (positional)
df.loc[mask, col] = value ; df.iloc[i, j] = value ; df.at[label, col] = value
# --- Series ---------------------------------------------------------------
s.name / s.dtype / len(s) / s.tz / s.index
s.to_numpy(dtype=None) / s.to_list()
s.iloc[...] / s.loc[...]
s + s, s - 1, -s, ... # elementwise arithmetic
s > 0, s == t, s != t, ... # comparison -> bool Series
s & t, s | t, ~s, s ^ t # logical -> bool Series
s.sum() / s.mean() / s.min() / s.max() / s.std() / s.var() / s.median() # skip missing
s.shift(n=1) / s.diff(n=1) / s.fillna(v) / s.ffill() / s.bfill() # see Missing values: NA keeps the dtype
s.isna() / s.notna() / s.dropna() / s.equals(t)
The window API below (rolling / expanding / ewm) exists so pandas research / labeling code moves over verbatim. It is NOT the recommended way to compute indicators, and it should NOT be used in a live trading system: a window result is a plain Series that does not join the directive cache and is not incrementally refreshed by append() / fulfill(), so every new bar costs a full O(n) recompute. Prefer the equivalent directive (df['ma:20'], df['median:30'], df['stddev:20'], …): same kernels, plus caching and O(lookback) per-bar refresh.
s.rolling(window, min_periods=None, center=False) # int window; min_periods defaults to window
s.expanding(min_periods=1)
s.ewm(com=|span=|halflife=|alpha=, min_periods=0, adjust=True, ignore_na=False)
# exactly ONE decay spelling
# Rolling / Expanding (pandas semantics: NA skipped, min_periods gates):
.count() .nunique() # -> int64 Series (native NA)
.sum() .mean() .median() .min() .max()
.var(ddof=1) .std(ddof=1) .sem(ddof=1) .skew() .kurt()
.quantile(q, interpolation='linear') .rank(method='average', ascending=True, pct=False)
.first() .last() # dtype-preserving
.corr(other) .cov(other, ddof=1)
# Ewm:
.mean() .sum() .var(bias=False) .std(bias=False) .corr(other) .cov(other, bias=False)
center=True labels each window at its center, so it reads future bars
relative to the label. That is exactly what a labeling pass wants, and exactly
what a live signal must never do; it is supported for the former.
Time-based windows (rolling('5min') / a timedelta) are deliberately not
implemented. For multi-timeframe computation, maintain two tf-aware
DataFrames (see Cumulation) and append
each bar to both — that is the supported, O(lookback)-per-bar design;
emulating a coarser timeframe through window arithmetic recomputes everything
on every bar.
Not provided (pandas members that conflict with volas's model): apply /
agg / pipe (arbitrary-Python-per-window), win_type, step, on,
closed, method, ewm(times=...), ewm.online() — append() + directives
already cover the streaming use case.
A handful of APIs diverge from pandas by design, because volas stores missing
values natively as volas.NA (no object dtype, no
silent float upcast):
- shift / diff / fillna and friends keep the column's dtype: a missing value is volas.NA, not an int/bool/str column upcast to float/object.
- Comparisons (== != < <= > >=) return a non-nullable bool mask: a missing value compares False (and != compares True), following IEEE / NumPy, not pandas-nullable's three-valued NA. This keeps masks free of NA so df[mask] and assignment stay total.
- to_numpy() exports a missing cell as NaN (NumPy has no NA), so an int / bool / datetime column materializes as float64 / NaT. Storage and to_list() keep the dtype and volas.NA.
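The comparison rule is the IEEE 754 rule for NaN, so it can be checked in plain Python. In this small sketch, math.nan stands in for a missing cell (an assumption made for illustration; volas itself stores volas.NA):

```python
import math

values = [1.0, math.nan, 3.0]  # math.nan stands in for a missing cell

eq_mask = [v == 3.0 for v in values]  # a missing value compares False
ne_mask = [v != 3.0 for v in values]  # except under !=, where it is True
gt_mask = [v > 0 for v in values]

print(eq_mask)  # [False, False, True]
print(ne_mask)  # [True, True, False]
print(gt_mask)  # [True, False, True]

# Every mask entry is a plain bool, so row filtering needs no NA handling.
kept = [v for v, keep in zip(values, gt_mask) if keep]
print(kept)  # [1.0, 3.0]
```

Because no entry of the mask is itself missing, a filter built from it always selects or rejects each row.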
For the full picture — why volas's type system is built this way, where pandas's breaks, and the migration gotchas — see volas vs pandas — the type system.
The pandas-shaped indexing and writing details have their own sections — Indexing & selection and Writing & assignment.
Suppose we have a csv file containing kline data of a stock in the 1-minute time frame:
csv = read_csv(csv_path)
print(csv)
date open high low close volume
0 2020-01-01 00:00:00 329.4 331.6 327.6 328.8 14202519
1 2020-01-01 00:01:00 330.0 332.0 328.0 331.0 13953191
2 2020-01-01 00:02:00 332.8 332.8 328.4 331.0 10339120
3 2020-01-01 00:03:00 332.0 334.2 330.2 331.0 9904468
4 2020-01-01 00:04:00 329.6 330.2 324.9 324.9 13947162
5 2020-01-01 00:04:00 329.6 330.2 324.8 324.8 13947163 <- an update of 2020-01-01 00:04:00
...
19 2020-01-01 00:19:00 327.0 327.2 322.0 323.0 15086985
Note that duplicated records of the same timestamp are not cumulated. All records except the latest one are discarded.
Read the same csv, but parse the date column into a DatetimeIndex:
df = read_csv(
csv_path,
parse_dates=['date'],
index_col='date'
)
print(df)
open high low close volume
2020-01-01 00:00:00 329.4 331.6 327.6 328.8 14202519
2020-01-01 00:01:00 330.0 332.0 328.0 331.0 13953191
...
2020-01-01 00:19:00 327.0 327.2 322.0 323.0 15086985
As you can see, the data frame now has a DatetimeIndex.
But it will not become a 5-minute kline unless we cumulate it:
df_5m = df.cumulate('5m')
print(df_5m)
Now we get a 5-minute kline:
open high low close volume
2020-01-01 00:00:00 329.4 334.2 324.8 324.8 62346461.0
2020-01-01 00:05:00 325.0 327.8 316.2 322.0 82176419.0
2020-01-01 00:10:00 323.0 327.8 314.6 327.6 74409815.0
2020-01-01 00:15:00 330.0 335.2 322.0 323.0 82452902.0
cumulate defaults to OHLCV semantics — open=first, high=max, low=min,
close=last, volume=sum — and any other column falls back to last. Pass
cumulators= to override a column's aggregator; the common case is a non-OHLCV
column that should be summed, such as a turnover (amount) column that would
otherwise default to last:
df.cumulate('1h', cumulators={'amount': 'sum'})
The supported aggregators are first, max, min, last and sum.
The time_frame may be a string label or a TimeFrame constant — see
TimeFrame for the full list.
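The aggregation rules above can be reproduced in a few lines of plain Python. This is only a sketch of the semantics, not volas's implementation: cumulate_5m, AGGREGATORS and bar are hypothetical names, the grid is hard-coded to 5 minutes, and the rows are the first 1-minute bars from the CSV shown earlier (including the updated 00:04 record).

```python
from datetime import datetime

AGGREGATORS = {
    "first": lambda xs: xs[0],
    "last": lambda xs: xs[-1],
    "max": max,
    "min": min,
    "sum": sum,
}
OHLCV_DEFAULTS = {"open": "first", "high": "max", "low": "min",
                  "close": "last", "volume": "sum"}


def cumulate_5m(rows, cumulators=None):
    """Group 1-minute rows into 5-minute bars (illustrative only)."""
    rules = {**OHLCV_DEFAULTS, **(cumulators or {})}
    latest = {}  # timestamp -> row; a later duplicate replaces an earlier one
    for row in rows:
        latest[row["time"]] = row
    buckets = {}  # 5-minute grid start -> rows of that period, in time order
    for ts in sorted(latest):
        start = ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)
        buckets.setdefault(start, []).append(latest[ts])
    bars = {}
    for start, group in buckets.items():
        columns = [c for c in group[0] if c != "time"]
        bars[start] = {
            # any column without a rule falls back to "last"
            c: AGGREGATORS[rules.get(c, "last")]([r[c] for r in group])
            for c in columns
        }
    return bars


def bar(minute, o, h, lo, c, v):
    return {"time": datetime(2020, 1, 1, 0, minute), "open": o, "high": h,
            "low": lo, "close": c, "volume": v}


rows = [
    bar(0, 329.4, 331.6, 327.6, 328.8, 14202519),
    bar(1, 330.0, 332.0, 328.0, 331.0, 13953191),
    bar(2, 332.8, 332.8, 328.4, 331.0, 10339120),
    bar(3, 332.0, 334.2, 330.2, 331.0, 9904468),
    bar(4, 329.6, 330.2, 324.9, 324.9, 13947162),
    bar(4, 329.6, 330.2, 324.8, 324.8, 13947163),  # update of 00:04
]
first_bar = cumulate_5m(rows)[datetime(2020, 1, 1, 0, 0)]
print(first_bar)
# {'open': 329.4, 'high': 334.2, 'low': 324.8, 'close': 324.8, 'volume': 62346461}
```

The result agrees with the first row of the 5-minute kline printed above, apart from volume being an int here rather than a float.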
Every time frame lies on a fixed grid, and a cumulated bar is labelled with
its period's grid start — even when the first raw bar arrives mid-period. A
bar that opens with a 09:07 tick on a 15m frame is labelled 09:00, never
09:07, so volas bars line up exactly with exchange klines and with pandas
resample (label='left').
The grid origins per frame: intraday frames anchor at midnight of the
index's (timezone-aware) trading day — a 15m bar starts at :00/:15/:30/
:45, a 4h bar at 00:00/04:00/…; 1d starts at midnight; 1w on
Monday; 3d is a continuous grid from the Unix epoch; 1M / 1y on the
calendar month / year. If a daylight-saving transition removes or repeats a
period's boundary, the label resolves to the period's earliest real instant.
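The grid rules can be sketched with the standard library. The helpers below (floor_minutes, floor_week, floor_3d) are hypothetical names used for illustration; they work on naive datetimes and dates, so they ignore time zones and daylight-saving transitions entirely.

```python
from datetime import date, datetime, timedelta


def floor_minutes(ts, step):
    """Label for an intraday frame of `step` minutes, anchored at midnight."""
    minutes = ts.hour * 60 + ts.minute
    start = minutes - minutes % step
    return ts.replace(hour=start // 60, minute=start % 60, second=0, microsecond=0)


def floor_week(day):
    """Label for the 1w frame: the Monday on or before `day`."""
    return day - timedelta(days=day.weekday())


def floor_3d(day):
    """Label for the 3d frame: continuous 3-day buckets from 1970-01-01."""
    epoch = date(1970, 1, 1)
    days = (day - epoch).days
    return epoch + timedelta(days=days - days % 3)


print(floor_minutes(datetime(2020, 1, 1, 9, 7), 15))   # 2020-01-01 09:00:00
print(floor_minutes(datetime(2020, 1, 1, 9, 7), 240))  # 2020-01-01 08:00:00
print(floor_week(date(2020, 1, 1)))                    # 2019-12-30
print(floor_3d(date(2020, 1, 1)))                      # 2019-12-31
```

The first call is the 09:07 tick from the text: on a 15m frame it is labelled 09:00.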
For live streaming you do not re-cumulate the whole history on every tick —
you keep the current 5-minute bar forming and update it as each finer bar
arrives. A tf-aware DataFrame does exactly that: it stays an ordinary
DataFrame (read columns, run directives, slice it), except append folds
each finer bar into the bar currently forming instead of adding a row. You make
one with df.cumulate('5m') or DataFrame(data, time_frame='5m'), and the live
loop is then just:
| step | call |
|---|---|
| make a 5m frame | cum = df.cumulate('5m') |
| feed it the next finer bar | cum.append(bar) |
| read the current forming bar | cum.iloc[-1] |
| read an indicator over it | cum['macd'] |
Build the 5-minute frame from the 1-minute df above one bar at a time. Seed it
with the 00:00 bar, then fold in 00:01. Both fall in the same 00:00–00:05
window, so the frame still holds one row — the forming bar — now updated
(high rose to 332.0, close to 331.0, volume summed):
cum = df.iloc[0:1].cumulate('5m') # seed the 5m frame with the 00:00 bar
cum.append(df.iloc[1:2]) # fold in 00:01 (same 5m window)
print(cum)
open high low close volume
2020-01-01 00:00:00 329.4 332.0 327.6 331.0 28155710.0
Fold in 00:02, 00:03 and 00:04 and the window fills up. That single forming
row is now the finished first 5-minute bar — identical to the first row of
the one-shot df.cumulate('5m') printed earlier:
for i in range(2, 5):
    cum.append(df.iloc[i:i + 1])
print(cum)
open high low close volume
2020-01-01 00:00:00 329.4 334.2 324.8 324.8 62346461.0
Now fold in 00:05. It opens the next window, so the 00:00 bar is finalized
and a fresh forming bar starts; the frame grows to two rows and cum.iloc[-1] is
the new, still-forming 00:05 bar:
cum.append(df.iloc[5:6])
print(cum)
open high low close volume
2020-01-01 00:00:00 329.4 334.2 324.8 324.8 62346461.0 <- finalized
2020-01-01 00:05:00 325.0 327.8 324.8 327.6 10448427.0 <- still forming
Two properties make this safe for a live feed:
- Indicators are lazy, and fresh on read. append does not recompute anything; it only flags the dependent directive columns as stale (their valid-row cursor now lags the frame height). The recompute happens when you read cum['ema:9'] (or any directive): only the stale tail is refreshed (O(lookback), not the whole column) over the frame including the forming row, bit-identical to a one-shot cumulate-then-compute. (A bulk read such as to_numpy() does not auto-refresh; call cum.fulfill() first, or just read the directive.)
- Re-sent bars do not double-count. Folding a bar whose timestamp you have already seen updates that period instead of adding to it (the same dedup rule shown at the top of this section), matching exchanges that revise their most recent bar.
See Live cumulation for the API summary.
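To make the folding behaviour concrete, here is a toy model of a tf-aware frame in plain Python. FormingFrame, bar and at are hypothetical names for this sketch, the grid is fixed at 5 minutes, and the approach (keeping the finer bars of each period and re-aggregating them) is just one simple way to get the described semantics, not a description of volas's internals. The bars are the 1-minute records used in this section.

```python
from datetime import datetime

RULES = {"open": "first", "high": "max", "low": "min",
         "close": "last", "volume": "sum"}
PICK = {"first": lambda xs: xs[0], "last": lambda xs: xs[-1],
        "max": max, "min": min, "sum": sum}


class FormingFrame:
    """Toy 5-minute tf-aware frame (illustrative, not volas's DataFrame)."""

    def __init__(self):
        self.bars = {}    # period start -> cumulated bar (the last is forming)
        self._parts = {}  # period start -> {finer timestamp: finer bar}

    def append(self, ts, finer):
        start = ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)
        parts = self._parts.setdefault(start, {})
        parts[ts] = finer  # a re-sent timestamp replaces its earlier version
        ordered = [parts[t] for t in sorted(parts)]
        self.bars[start] = {
            col: PICK[rule]([b[col] for b in ordered]) for col, rule in RULES.items()
        }
        return self


def bar(o, h, lo, c, v):
    return {"open": o, "high": h, "low": lo, "close": c, "volume": v}


def at(minute):
    return datetime(2020, 1, 1, 0, minute)


cum = FormingFrame()
cum.append(at(0), bar(329.4, 331.6, 327.6, 328.8, 14202519))
cum.append(at(1), bar(330.0, 332.0, 328.0, 331.0, 13953191))
cum.append(at(1), bar(330.0, 332.0, 328.0, 331.0, 13953191))  # re-sent bar
forming = dict(cum.bars[at(0)])
print(forming)
# {'open': 329.4, 'high': 332.0, 'low': 327.6, 'close': 331.0, 'volume': 28155710}

cum.append(at(5), bar(325.0, 327.8, 324.8, 327.6, 10448427))  # opens a new period
print(len(cum.bars))  # 2
```

The re-sent 00:01 bar leaves the volume at 28155710 instead of adding it a second time, and the 00:05 bar starts a second row without touching the first.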
A TimeFrame names a bar interval. It is accepted anywhere volas resamples —
df.cumulate, the time_frame DataFrame argument, and the hv indicator —
either as a TimeFrame constant or as its equivalent string label. There is no TimeFrame(...)
constructor — use one of the constants below or a label string.
TimeFrame.m5 # the 5-minute frame
'5m' # the equivalent label string, accepted everywhere too
df.cumulate(TimeFrame.m5) # identical to df.cumulate('5m')
Supported frames (constant ⇄ label):
| Constant | Label | Alignment |
|---|---|---|
| TimeFrame.s1 | '1s' | Civil second. |
| TimeFrame.m1 | '1m' | Civil minute. |
| TimeFrame.m3 | '3m' | Minute-of-hour buckets starting at 00, 03, 06, ... |
| TimeFrame.m5 | '5m' | Minute-of-hour buckets starting at 00, 05, 10, ... |
| TimeFrame.m15 | '15m' | Minute-of-hour buckets starting at 00, 15, 30, 45. |
| TimeFrame.m30 | '30m' | Minute-of-hour buckets starting at 00 and 30. |
| TimeFrame.H1 | '1h' | Civil hour. |
| TimeFrame.H2 | '2h' | Hour-of-day buckets starting at 00, 02, 04, ... |
| TimeFrame.H4 | '4h' | Hour-of-day buckets starting at 00, 04, 08, ... |
| TimeFrame.H6 | '6h' | Hour-of-day buckets starting at 00, 06, 12, 18. |
| TimeFrame.H8 | '8h' | Hour-of-day buckets starting at 00, 08, 16. |
| TimeFrame.H12 | '12h' | Hour-of-day buckets starting at 00 and 12. |
| TimeFrame.D1 | '1d' | Civil day in the frame timezone. |
| TimeFrame.D3 | '3d' | Continuous 3-day buckets anchored to the Unix epoch; they do not reset at month boundaries. |
| TimeFrame.W1 | '1w' | Continuous Monday-start weeks, including runs that cross month boundaries. |
| TimeFrame.M1 | '1M' | Civil calendar month in the frame timezone. |
| TimeFrame.Y1 | '1y' | Civil calendar year in the frame timezone. |
Every bucket is aligned in the frame timezone's local wall-clock while storage
stays UTC: the hour-of-day frames (2h/4h/6h/8h/12h) start at local 00
and step in local hours; 3d counts continuous 3-local-civil-day buckets keyed
from the Unix epoch day in that zone (not reset at month boundaries); 1w is
Monday-start in local civil time. So a daily/weekly bar follows the local trading
day, and a named zone makes the hour buckets DST-aware.
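The effect of aligning in local wall-clock time while storing UTC can be seen with a fixed-offset zone from the standard library. This is a sketch under assumptions: local_bucket_start is a hypothetical helper, the +05:30 offset is only an example, and a fixed offset has no daylight-saving rules, so the DST-aware behaviour of a named zone is not modelled here.

```python
from datetime import datetime, timedelta, timezone


def local_bucket_start(instant_utc, tz, hours):
    """UTC start of the `hours`-wide bucket holding `instant_utc`,
    with buckets counted from local midnight in `tz`."""
    local = instant_utc.astimezone(tz)
    start_hour = local.hour - local.hour % hours
    local_start = local.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    return local_start.astimezone(timezone.utc)


offset = timezone(timedelta(hours=5, minutes=30))  # example fixed offset +05:30
tick = datetime(2020, 1, 1, 17, 30, tzinfo=timezone.utc)  # 23:00 local time

print(local_bucket_start(tick, offset, 4))        # 2020-01-01 14:30:00+00:00
print(local_bucket_start(tick, offset, 24))       # 2019-12-31 18:30:00+00:00
print(local_bucket_start(tick, timezone.utc, 4))  # 2020-01-01 16:00:00+00:00
```

The same UTC instant falls into a 4h bucket starting at 14:30 UTC when the frame zone is +05:30 (20:00 local), but at 16:00 UTC when the frame zone is UTC.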
command . sub : args @ series op command ...
| | | |
| | | └── operand column / sub-expression (e.g. @open, @(boll))
| | └── comma-separated arguments (e.g. ma:20, kdj.k:9,3)
| └── sub-command (e.g. macd.signal)
└── indicator name (e.g. ma, macd, boll)
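As a reading aid for the diagram, the sketch below splits a single, operator-free directive into its parts with a regular expression. It is a deliberately simplified illustration: parse_simple and DIRECTIVE are hypothetical names, operators and nested parenthesised expressions are not handled, and it says nothing about how volas's own parser works.

```python
import re

DIRECTIVE = re.compile(
    r"""
    ^\s*
    (?P<command>[A-Za-z_]\w*)   # indicator name
    (?:\.(?P<sub>\w+))?         # optional sub-command
    (?::(?P<args>[^@]*))?       # optional comma-separated arguments
    (?:@(?P<series>.+))?        # optional operand columns
    \s*$
    """,
    re.VERBOSE,
)


def parse_simple(directive):
    """Split 'command.sub:args@series' into a dict (no operators, no nesting)."""
    match = DIRECTIVE.match(directive)
    if match is None:
        raise ValueError(f"not a simple directive: {directive!r}")
    args, series = match["args"], match["series"]
    return {
        "command": match["command"].lower(),  # command names are case-insensitive
        "sub": match["sub"],
        "args": [a.strip() for a in args.split(",")] if args is not None else [],
        "series": [s.strip() for s in series.split(",")] if series is not None else [],
    }


print(parse_simple("ma:10@open"))
# {'command': 'ma', 'sub': None, 'args': ['10'], 'series': ['open']}
print(parse_simple("macd.signal:,,10"))
# {'command': 'macd', 'sub': 'signal', 'args': ['', '', '10'], 'series': []}
```

An empty string in args marks a position left at its default, which is how 'macd.signal:,,10' sets only the third parameter.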
Here are several use cases of directives as column names:
# The middle band of bollinger bands
# which is actually a 20-period (default) moving average
df['boll']
# kdj j less than 0
# This returns a series of bool type
df['kdj.j < 0']
# kdj %K cross up kdj %D
df['kdj.k // kdj.d']
# 5-period simple moving average
df['ma:5']
# 10-period simple moving average on (@) open prices
df['ma:10@open']
# A DataFrame of 5-period, 10-period and 30-period ma
df[[
'ma:5',
'ma:10',
'ma:30'
]]
# Use the default values of the first and the second parameters,
# and specify the third parameter (for macd.signal)
df['macd.signal:,,10']
# We must wrap a parameter which is a nested command or directive
df['increase:3@(ma:20@close)']
# volas has a powerful directive parser,
# so we could even write directives like this:
df['''
repeat
: 5
@ (
close > boll.upper
)
''']
left operator right
- // : whether left crosses up through right (from below to above), which we call a "gold cross": df['macd // macd.signal'].
- \\ : whether left crosses down through right, a "dead cross". In a Python string the backslash must be escaped, so we write 'macd \\ macd.signal'.
- >< : whether left crosses right, either up or down.
- < <= == != >= > : for the same record, the value comparison between left and right, returning a bool series.
- arithmetic + - * /, logical & | ^, and unary ~ (not) / - (negate).
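The three cross operators can be pictured with a few lines of plain Python over lists. This sketch uses hypothetical helper names and assumes one common definition of a cross (the previous value at or below, the current value strictly above); how volas treats ties and the first row is not specified here.

```python
def cross_up(left, right):
    """True at row i when left moves from at-or-below right to above it."""
    if not left:
        return []
    out = [False]  # the first row has no previous row to cross from
    for i in range(1, len(left)):
        out.append(left[i - 1] <= right[i - 1] and left[i] > right[i])
    return out


def cross_down(left, right):
    """left crossing down through right is right crossing up through left."""
    return cross_up(right, left)


def cross(left, right):
    """A cross in either direction."""
    return [u or d for u, d in zip(cross_up(left, right), cross_down(left, right))]


macd = [-1.0, -0.5, 0.4, 0.6, 0.1, -0.3]
signal = [0.0, 0.0, 0.0, 0.5, 0.3, 0.0]

print(cross_up(macd, signal))    # [False, False, True, False, False, False]
print(cross_down(macd, signal))  # [False, False, False, False, True, False]
print(cross(macd, signal))       # [False, False, True, False, True, False]
```

Each result is a bool list of the same length as the inputs, so it can be used as a row mask in the same way as the comparison directives.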
df[directive] caches the result as a real column (so repeated reads are
free), then auto-refreshes its stale tail on access after an append. Use
df.exec(directive) to compute a directive as a NumPy array without
caching it (see Usage).
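The cache-then-refresh behaviour can be illustrated with a toy simple moving average. `CachedSMA` is a hypothetical class written for this README, not part of volas; it only shows why a read after an append needs to touch the new rows and their lookback window rather than the whole history.

```python
class CachedSMA:
    """Toy cache for a simple moving average over an append-only series."""

    def __init__(self, period):
        self.period = period
        self.values = []      # source series
        self.cached = []      # cached indicator values, may lag behind values
        self.recomputed = 0   # how many rows were ever computed

    def append(self, x):
        # Appending makes the cache stale by one row; nothing is recomputed yet.
        self.values.append(x)

    def read(self):
        # Refresh only the rows that are not cached yet (the stale tail).
        for i in range(len(self.cached), len(self.values)):
            if i + 1 < self.period:
                self.cached.append(None)  # not enough history for a full window
            else:
                window = self.values[i + 1 - self.period : i + 1]
                self.cached.append(sum(window) / self.period)
            self.recomputed += 1
        return self.cached


sma = CachedSMA(3)
for price in [1.0, 2.0, 3.0, 4.0]:
    sma.append(price)
first_read = list(sma.read())
sma.append(5.0)
second_read = list(sma.read())
```

After the second read only one extra row has been computed, and that row looked at `period` values, so the cost per new bar depends on the lookback and not on the length of the series.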
A pandas-compatible subset for label and positional access. The row index may be
a range, a DatetimeIndex, an integer index, or a string index.
df.iloc[2] # a Row by position (row.name is its index label)
df.iloc[10:] # a DataFrame slice by position
df.loc[label] # a Row by index label
df.loc[lo:hi] # inclusive label slice (lexicographic for string indexes)
df.at[label, col] # a scalar by label + column
df.iat[i, j] # a scalar by position
df.index # the row labels, as a NumPy array
String (symbol) index. Use set_index on a string column, then look up by symbol:
df = DataFrame({'sym': ['aa', 'bb', 'cc'], 'px': [1.0, 2.0, 3.0]}).set_index('sym')
df.loc['bb'] # the row keyed 'bb'
df.loc['aa':'bb'] # inclusive, lexicographic slice
df.at['cc', 'px'] # 3.0
df.drop(['bb']) # drop by string label
volas is pandas-shaped on the surface, but its type system is deliberately
different in more than the index: missing values keep their dtype, there is no
object dtype, value-returning methods stay Series, and a lossy conversion
raises instead of degrading silently. See
volas vs pandas — the type system for the full
comparison — why volas is built this way, where pandas's type system breaks, and
the migration gotchas.
The index specifically is a single level of one homogeneous label type. Relative to pandas, volas does not support:
- `MultiIndex` (hierarchical / multi-level indexes), on rows or columns: columns are a flat list of unique string names.
- Arbitrary label dtypes: an index is exactly one of range, datetime (`datetime64[ns]`), integer, or string. There is no float, categorical, interval, period, timedelta, or mixed-type `object` index.
- Index algebra: reindexing, index set operations (union / intersection), and automatic alignment-on-index when combining frames.
- Duplicate-label lookups (label access assumes unique labels).
If your workflow needs any of these, keep using pandas; volas targets the single-level, OHLCV-shaped index that candlestick data uses.
Assign a whole column, or write into a positional / label / boolean selection (copy-on-write under the hood). Series assignment is positional (by row order, not index-aligned).
df['signal'] = 0.0 # add / replace a column (scalar | array | Series)
df.iat[3, 0] = 99.0 # one cell by position
df.at[label, 'close'] = 99.0 # one cell by label + column
df.iloc[10:20, 0] = 0.0 # a column slice
df.loc[df['close'] > df['open'], 'signal'] = 1.0 # masked column assignment
Writing a fractional value into an integer column raises: the int dtype is
kept, and a lossy write errors rather than silently widening to float (see
Differences from pandas; writing volas.NA / None
keeps the int dtype and marks the cell missing). Writing into a cached directive
column drops its cached status, so a later fulfill() can never silently
overwrite your edit.
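The "raise instead of widening" rule can be sketched as a guard in front of an integer column. This is an illustration only: `checked_int` is a hypothetical helper, the choice of `ValueError` is an assumption, and so is accepting a whole-valued float such as `99.0`.

```python
def checked_int(value):
    """Return value as an int, or raise if the conversion would lose information."""
    if isinstance(value, int):
        return value
    if isinstance(value, float) and value.is_integer():
        return int(value)  # 99.0 -> 99 loses nothing (assumed to be allowed)
    # A fractional value (or NaN / inf) cannot be stored without changing it.
    raise ValueError(f"lossy write into an int column: {value!r}")


column = [1, 2, 3]
column[0] = checked_int(99.0)

try:
    column[1] = checked_int(2.5)
    rejected = False
except ValueError:
    rejected = True
```

The failed write leaves the column untouched, so the caller finds out about the precision loss at the assignment rather than later in a calculation.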
Storage is always UTC epoch-nanoseconds — the universal axis on which crypto,
US, HK and A-share frames coexist and align on the absolute instant. A
DatetimeIndex additionally carries a per-frame timezone that governs how
those instants render, how bare-string labels match, and how cumulate aligns
day-and-coarser buckets. A timezone is either a fixed offset ('+08:00',
cheap; crypto / A-share / HK) or a named IANA zone ('America/New_York',
DST-aware via chrono-tz; US / EU). The default is UTC.
Here is the whole picture. Build a DatetimeIndex by parsing a column with
to_datetime, promoting it with set_index, then tagging the display zone with
tz_localize (reinterpret a naive wall-clock as that zone — the instant moves)
or tz_convert (keep the instant, restate the zone). A US exchange opens at 09:30
local on 2021-01-04, held as a naive local string:
from volas import DataFrame, to_datetime, Timestamp
# Parse the naive 't' strings to UTC instants and make them the index, then read
# the wall-clock *as New York local time* with tz_localize. The instant is stored
# UTC (14:30Z), but the index renders and matches in New York.
df = DataFrame({'t': ['2021-01-04 09:30:00'], 'close': [100.0]})
df['t'] = to_datetime(df['t'])
df = df.set_index('t').tz_localize('America/New_York')
df.tz # 'America/New_York'
df.index # ['2021-01-04T14:30:00.000000000'] (raw .index is UTC, matching pandas .values)
# The tz is what lets a bare local string match the right row — it is parsed in df.tz:
df.at['2021-01-04 09:30:00', 'close'] # 100.0
# A Timestamp is a typed, cross-tz label. The SAME instant in Shanghai is
# 22:30+08:00, and it still matches, regardless of df.tz:
ts = Timestamp('2021-01-04 22:30:00', tz='+08:00') # == 09:30 New York
df.at[ts, 'close'] # 100.0
ts.value # its UTC epoch-nanoseconds (int)
ts.tz # '+08:00'
# Integer epochs: to_datetime(unit=...) reads the unit. An epoch is *absolute* —
# anchor it as UTC, then restate the zone for display. 1609770600000 ms == 14:30Z:
e = DataFrame({'t': [1609770600000], 'close': [100.0]})
e['t'] = to_datetime(e['t'], unit='ms')
e.set_index('t').tz_localize('UTC').tz_convert('America/New_York').index
# ['2021-01-04T14:30:00.000000000']
# An offset-aware string is already absolute too — to_datetime resolves the offset:
o = DataFrame({'t': ['2021-01-04T09:30:00+08:00'], 'close': [1.0]})
o['t'] = to_datetime(o['t'])
o.set_index('t').index
# ['2021-01-04T01:30:00.000000000'] (09:30+08:00 == 01:30Z)
A frame's time axis is in one of two states (the pandas model): naive (an
unanchored wall-clock, df.tz is None) or tz-aware (anchored, df.tz names
the zone — 'UTC' included). tz_localize anchors a naive axis (the instant
moves to match the wall-clock in that zone); tz_convert restates an aware
axis in another zone (the instant is unchanged). Each refuses the other state —
converting an unanchored clock or re-anchoring an anchored one would silently
shift instants:
naive = df # df.tz is None
aware = naive.tz_localize('America/New_York') # anchor: instants move, wall-clock kept
aware.tz_convert('+08:00') # restate: instants kept, wall-clock moves
naive.tz_convert('+08:00') # TypeError: anchor with tz_localize first
aware.tz_localize('UTC') # TypeError: already anchored; use tz_convert
cumulate to a daily (or coarser) bar aligns buckets to the frame's local
trading day (DST-aware for a named zone), while the raw .index numpy export
stays UTC (matching pandas .values).
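The two-state model (naive versus tz-aware) and the localize / convert split can be reproduced with the standard `datetime` module. The sketch below is not volas code: `localize` and `convert` are hypothetical helpers, and a fixed `-05:00` offset stands in for `America/New_York` in January so that the example needs no timezone database.

```python
from datetime import datetime, timedelta, timezone

NEW_YORK_WINTER = timezone(timedelta(hours=-5))  # stand-in for America/New_York in January
SHANGHAI = timezone(timedelta(hours=8))


def localize(clock, tz):
    """Anchor a naive wall-clock in tz: the wall-clock is kept, an instant is assigned."""
    if clock.tzinfo is not None:
        raise TypeError("already anchored; use convert")
    return clock.replace(tzinfo=tz)


def convert(clock, tz):
    """Restate an anchored clock in tz: the instant is kept, the wall-clock moves."""
    if clock.tzinfo is None:
        raise TypeError("naive clock; anchor it with localize first")
    return clock.astimezone(tz)


naive = datetime(2021, 1, 4, 9, 30)
aware = localize(naive, NEW_YORK_WINTER)      # 09:30 New York == 14:30Z
restated = convert(aware, SHANGHAI)           # same instant, shown as 22:30+08:00
epoch_ns = int(aware.timestamp()) * 1_000_000_000
```

`restated == aware` holds because aware datetimes compare by instant, which mirrors storing a single UTC epoch-nanosecond value and treating the timezone as display metadata.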
volas.NA is the single missing-value marker, and every dtype supports it —
crucially, a missing value never changes the column's dtype:
| dtype | how missing is stored | element access | console display |
|---|---|---|---|
| `float64` / `float32` | `NaN`, in-band | `np.float64(nan)` | `<NA>` |
| `int64` / `int32` / `bool` / `str` | a validity mask (dtype kept) | `volas.NA` | `<NA>` |
| `datetime64[ns]` | `NaT` | `np.datetime64('NaT')` | `<NA>` |
Whatever the storage, the console always prints <NA> — one symbol for a
missing value, regardless of dtype (a float NaN, a datetime NaT, and an int /
bool / str hole all render identically; to_string(na_rep=...) overrides it).
Element access and to_numpy stay dtype-specific (a float hole reads back as
np.nan), so numpy / pandas interop is lossless.
This tracks pandas' own direction (PDEP-16) and means volas has no object
dtype: an int / bool / str column with a hole stays int / bool /
str, where pandas 3.0 upcasts to float64 / object.
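A validity mask is easy to picture in pure Python. `MaskedInts` below is a hypothetical toy, not the volas storage layout (which is implemented in Rust and not described here in detail); it only shows how an integer column can carry a hole without changing its element type.

```python
from array import array


class MaskedInts:
    """Toy int64 column: values stay in an integer array, holes live in a mask."""

    def __init__(self, items):
        self.valid = [item is not None for item in items]
        # A hole still needs a placeholder in the int array; 0 is arbitrary.
        self.values = array("q", [0 if item is None else item for item in items])

    def to_list(self):
        return [v if ok else None for v, ok in zip(self.values, self.valid)]

    def sum(self):
        # Reductions skip missing cells.
        return sum(v for v, ok in zip(self.values, self.valid) if ok)

    def shift(self, periods=1):
        # The gap opened by the shift is missing; the storage stays integer.
        items = self.to_list()
        return MaskedInts(([None] * periods + items)[: len(items)])


col = MaskedInts([1, None, 3])
shifted = MaskedInts([10, 20, 30]).shift(1)
```

Because the hole is recorded in the mask and not in the values, the array keeps its 64-bit integer typecode after construction and after `shift`.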
import volas
s = volas.DataFrame({'a': [1, None, 3]})['a']
s.dtype # 'int64' (pandas would give float64)
s[1] # <NA> (s[1] is volas.NA; a float hole stays np.nan)
s.sum() # np.int64(4) reductions skip NA
s.fillna(0).to_list() # [1, 0, 3]
s.isna().to_numpy() # [False, True, False]
print(s) # the missing cell prints as <NA>
# shift / diff keep the int dtype (pandas upcasts to float); the gap is NA:
volas.DataFrame({'a': [10, 20, 30]})['a'].shift(1).to_list() # [<NA>, 10, 20]
- Producing NA: `None` (or `volas.NA`) in a constructor list, the `shift` / `diff` gap, and the default fill of `where` / `mask`.
- Consuming NA: reductions (`sum` / `mean` / `min` / …) and `count` skip it; arithmetic propagates it (`x ∘ NA = NA`); `~` / `&` / `|` / `^` use Kleene three-valued logic (`NA & False = False`, `NA | True = True`); `cumsum` / `abs` / `round` / `clip` / indexing carry it through; `isna` / `notna` / `dropna` / `fillna` / `ffill` / `bfill` work on every dtype.
- Comparisons treat a missing value IEEE / numpy style: `==`, `<`, `<=`, `>`, `>=` involving NA compare `False`, while `!=` compares `True`, so a boolean mask is always pure `bool`, clean for `df[mask]`. Note the `!=` exception: `s != value` therefore includes missing rows.
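The Kleene and comparison rules listed above fit in a few lines of plain Python. In this sketch `None` stands in for a missing value and the function names are hypothetical; it mirrors the rules as stated, not the volas kernels.

```python
NA = None  # stand-in for a missing value in this sketch


def kleene_and(a, b):
    if a is False or b is False:
        return False  # a definite False decides the result
    if a is NA or b is NA:
        return NA     # otherwise an unknown operand keeps the result unknown
    return True


def kleene_or(a, b):
    if a is True or b is True:
        return True   # a definite True decides the result
    if a is NA or b is NA:
        return NA
    return False


def eq(a, b):
    # A comparison involving NA is False, so the mask stays pure bool.
    return False if a is NA or b is NA else a == b


def ne(a, b):
    # The one exception: != involving NA is True.
    return True if a is NA or b is NA else a != b


column = [1, NA, 3]
equals_one = [eq(x, 1) for x in column]
not_one = [ne(x, 1) for x in column]
```

The `not_one` mask selects the missing row as well, which is the `!=` exception called out in the list.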
pandas is not a runtime dependency; these bridges import it lazily, only when
called, so import volas stays pandas-free.
from volas import from_pandas
df = from_pandas(pandas_df) # numeric / bool / str / datetime native; a (tz-aware) DatetimeIndex round-trips;
# a nullable Int64 / boolean / string column reads back as int / bool / str + volas.NA
pdf = df.to_pandas() # -> pandas.DataFrame ('numpy' backend: an int/bool column with NA becomes float64 + NaN)
pdf = df.to_pandas(dtype_backend='numpy_nullable') # faithful masked Int64 / boolean (a lossless NA round-trip)
df.to_csv('out.csv', index=True) # subset of pandas to_csv; returns a str if path=None
Directive problems raise typed exceptions. Both subclass DirectiveError and the
built-in ValueError, so existing except ValueError handling keeps working.
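The reason an existing `except ValueError` keeps working is ordinary multiple inheritance. The stand-alone sketch below uses hypothetical `Demo...` classes to show the pattern; it is not the volas source, only an assumption about how such a hierarchy can be laid out.

```python
class DemoDirectiveError(Exception):
    """Hypothetical base class standing in for DirectiveError."""


class DemoDirectiveSyntaxError(DemoDirectiveError, ValueError):
    """Malformed expression."""


class DemoDirectiveValueError(DemoDirectiveError, ValueError):
    """Unknown command, bad argument, or bad value."""


def legacy_handler(exc_type):
    """Raise exc_type and catch it the way pre-existing code would."""
    try:
        raise exc_type("boom")
    except ValueError as err:
        return type(err).__name__


caught_syntax = legacy_handler(DemoDirectiveSyntaxError)
caught_value = legacy_handler(DemoDirectiveValueError)
```

New code can catch the narrower types, while older `except ValueError` blocks still see both errors.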
from volas import DirectiveSyntaxError, DirectiveValueError
try:
df['ma:2,3'] # too many arguments
except DirectiveValueError as e:
... # unknown command/sub-command, bad arg, bad value
try:
df['a >'] # malformed expression
except DirectiveSyntaxError as e:
... # message carries the line / column of the error
The complete directive reference lives in INDICATORS.md. It covers Volas-exclusive indicators, built-in statistical commands, and TA-Lib-compatible directives.
Issues, indicator requests, benchmark challenges, and PRs are welcome — see CONTRIBUTING.md and start a thread in Discussions. The most useful feedback is on the API surface and the benchmark methodology.
If you build live OHLCV / technical-indicator pipelines in Python, star the repo to follow new indicators, benchmark results, and releases.
Developer notes, local build commands, dependency groups, and benchmark report guidance live in DEVELOPMENT.md.