Guided by some feedback
semantic-cpp is a lightweight, high-performance stream processing library for C++17, consisting of a single header file paired with a separate implementation file. It blends the fluency of Java Streams, the laziness of JavaScript generators, the ordering capabilities reminiscent of database indices, and the temporal awareness essential for financial applications, IoT data processing, and event-driven systems.
Key distinguishing features:
- Every element is paired with a Timestamp (a signed
long longindex that may be negative). - Module is an unsigned
long longused for element counts and concurrency levels. - Streams remain lazy until materialised via
.toOrdered(),.toUnordered(),.toWindow(), or.toStatistics(). - Materialisation does not terminate the pipeline — further chaining is fully supported (“post-terminal” streams).
- Comprehensive parallel execution, sliding and tumbling windows, and advanced statistical operations.
| Feature | Java Stream | Boost.Range | ranges-v3 | spdlog::sink | semantic-cpp |
|---|---|---|---|---|---|
| Lazy evaluation | Yes | Yes | Yes | No | Yes |
| Temporal indices (signed timestamps) | No | No | No | No | Yes (core concept) |
| Sliding / tumbling windows | No | No | No | No | Yes (first-class support) |
| Built-in statistical operations | No | No | No | No | Yes (mean, median, mode, kurtosis, etc.) |
| Parallel execution (opt-in) | Yes | No | Yes | No | Yes (global thread pool or custom) |
| Continue chaining after terminal op | No | No | No | No | Yes (post-terminal streams) |
| Single header + implementation file, C++17 | No | Yes | Yes | Yes | Yes |
If you frequently write custom windowing or statistical code for time-series data, market feeds, sensor streams, or logs, semantic-cpp eliminates that boilerplate.
#include "semantic.h"
#include <iostream>
using namespace semantic;
int main() {
// Generate a stream of 100 integers with timestamps 0..99
auto stream = Generative<int>{}.of(std::vector<int>(100, 0))
.map([](int, Timestamp t) { return static_cast<int>(t); });
stream.filter([](int x) { return x % 2 == 0; }) // even numbers only
.parallel(8) // use 8 threads
.toWindow() // materialise with ordering and windowing support
.getSlidingWindows(10, 5) // windows of size 10, step 5
.toVector();
// Statistics example
auto stats = Generative<int>{}.of(1,2,3,4,5,6,7,8,9,10).toStatistics();
std::cout << stats.mean() << "\n"; // 5.5
std::cout << stats.median() << "\n"; // 5.5
std::cout << stats.mode() << "\n"; // first mode in case of multimodality
stats.cout();
return 0;
}Generative<E> provides a convenient, fluent interface for creating streams:
Generative<int> gen;
auto s = gen.of(1, 2, 3, 4, 5);
auto empty = gen.empty();
auto filled = gen.fill(42, 1'000'000);
auto ranged = gen.range(0, 100, 5);All factory functions return a Semantic<E> lazy stream.
Semantic<int> s = of<int>(1, 2, 3, 4, 5);Standard operations are available:
s.filter(...).map(...).skip(10).limit(100).parallel().peek(...)Elements carry timestamps, which can be manipulated:
s.translate(+1000) // shift timestamps
.redirect([](int x, Timestamp t){ return x * 10; }) // custom timestamp logicCall one of the four terminal converters before using collecting operations:
.toOrdered() // preserves order, enables sorting
.toUnordered() // fastest, no ordering guarantees
.toWindow() // ordered + full windowing capabilities
.toStatistics<D> // ordered + statistical methodsChaining continues after materialisation:
auto result = from(huge_vector)
.parallel()
.filter(...)
.toWindow()
.getSlidingWindows(100, 50)
.toVector();auto windows = stream
.toWindow()
.getSlidingWindows(30, 10); // size 30, step 10Or emit a stream of windows:
stream.toWindow()
.windowStream(50, 20)
.map([](const std::vector<double>& w) { return mean(w); })
.toOrdered()
.cout();auto stats = from(prices).toStatistics<double>([](auto p){ return p; });
std::cout << "Mean : " << stats.mean() << "\n";
std::cout << "Median : " << stats.median() << "\n";
std::cout << "StdDev : " << stats.standardDeviation() << "\n";
std::cout << "Skewness : " << stats.skewness() << "\n";
std::cout << "Kurtosis : " << stats.kurtosis() << "\n";Statistical functions are aggressively cached (frequency map computed once).
globalThreadPool // created automatically with hardware concurrency
stream.parallel() // use global pool
stream.parallel(12) // force exactly 12 worker threadsConcurrency level is inherited correctly throughout the chain.
empty<T>() // empty stream
of(1,2,3,"hello") // variadic arguments
fill(42, 1'000'000) // repeated value
fill([]{return rand();}, 1'000'000) // supplied values
from(container) // vector, list, set, array, initializer_list, queue
range(0, 100) // 0 .. 99
range(0, 100, 5) // 0,5,10,…
iterate(generator) // custom Generatorsemantic-cpp consists of a header and implementation file. Copy semantic.h and semantic.cpp into your project or install via CMake:
add_subdirectory(external/semantic-cpp)
target_link_libraries(your_target PRIVATE semantic::semantic)g++ -std=c++17 -O3 -pthread examples/basic.cpp semantic.cpp -o basic && ./basic| Operation | Java Stream | ranges-v3 | semantic-cpp (parallel) |
|---|---|---|---|
| 100 M integers → sum | 280 ms | 190 ms | 72 ms |
| 10 M doubles → sliding window mean | N/A | N/A | 94 ms (window 30, step 10) |
| 50 M ints → toStatistics | N/A | N/A | 165 ms |
Contributions are warmly welcomed! Areas currently needing attention:
- Additional collectors (percentiles, covariance, etc.)
- Improved integration with existing range libraries
- Optional SIMD acceleration for simple mappings
Please read CONTRIBUTING.md.
MIT © Eloy Kim
Enjoy truly semantic streams in C++!