The most complete on-device sentiment toolkit in the R/Python ecosystem. It ships hate-speech / mixed / style flags, intent-based profiles, an interactive sentiment map, and opt-in transformer backends for when you want maximum accuracy.
The default model, on-device e5-base, matches the paid OpenAI embedding
(text-embedding-3-small) on our benchmarks: that quality, free, across ~100 languages,
with no API key and no data leaving your machine. When your text is tweets or other short
social posts and you want the last few points of accuracy, opt into the bundled fine-tuned
transformers (max-english, max-multilingual) behind the same one-line API. We publish
honest benchmarks, including where the transformers win, on the project page:
https://benwiseman.github.io/sentiment.ai/
v2. TensorFlow-free by default. Calibrated confidence and a tidy 3-class output. Hate-speech / mixed / style flags from the same embedding. Intent-based profiles (use_profile(), plus an interactive setup() in Python) so you never learn the model zoo. Opt-in transformer backends (RoBERTa, XLM-R) for maximum in-domain accuracy. An interactive plot_sentiment() map. Diagnostic signals for when not to trust a score, and agreement statistics for methods sections. A Python sibling (sentimentai-py) shares the same scoring heads, verified bit-for-bit.
install.packages("sentiment.ai") # from CRAN
library(sentiment.ai)
# one-time setup, TensorFlow-free, runs on your machine
install_sentiment.ai() # interactive wizard walks you through it
# model loads on first use, no explicit init() needed
sentiment_score(c("I love this!", "this is terrible"))
#> [1] 1.00 -1.00 (about 1 = positive, about -1 = negative)
# tidy output with class probabilities and calibrated confidence
sentiment(c("I love this!", "The package arrived on Tuesday afternoon.", "this is terrible"))
#> text sentiment class confidence
#> 1 I love this! 1.00 positive 1.00
#> 2 The package arrived on Tuesday afternoon. 0.00 neutral 0.99
#> 3 this is terrible -1.00 negative 1.00

No TensorFlow, no API key, and the model runs on your machine.
You do not have to learn the model zoo. A profile maps a plain intent to a backend and makes it the default; the choice persists across sessions.
sentiment_profiles() # see the options
use_profile("lightest") # smaller and faster on-device model (e5-small)
use_profile("max-english") # strongest on tweets, sentiment only (RoBERTa)

| profile | model | what you get |
|---|---|---|
| multilingual (default) | e5-base | best on-device quality, ~100 languages, + flags |
| lightest | e5-small | smaller and faster, ~100 languages, on-device, + flags |
| max-english | twitter-roberta | strongest on tweets / short social, English (opt-in ~500 MB) |
| max-multilingual | xlm-roberta | strongest on tweets / short social, multilingual (opt-in ~1 GB) |

For the on-device e5 models, sentiment() adds three post-processing flags computed from
the same embedding: no second model, no extra download.
sentiment("go back to your country, you filth, nobody wants you")
#> ... class = negative ... hate_speech = TRUE p_hate = 0.85

- hate_speech / p_hate: a hate-speech detector (AUROC 0.95 to 0.97; tuned to about 0.90 recall with a very low false-positive rate on normal text).
- mixed: flags neutral-looking rows that carry competing positive and negative signal.
- style: top writing style (analytical, descriptive, formal, informal, inquisitive).

These ship for e5-small, e5-base, and openai. (Honest limits: the mixed head is
trained on explicit mixed text and can miss terse contrastive phrasing; slur-free
dehumanising hate can fall just under threshold, so lower it if you need higher recall.)
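
The threshold trade-off mentioned above can be illustrated with a small sketch. Everything here is an assumption for illustration: the probabilities and labels are made-up toy values, the thresholds are arbitrary, and `recall_and_fpr` is a hypothetical helper, not part of sentiment.ai. It only presumes you have a p_hate column and a handful of hand-labelled rows to check against.

```python
def recall_and_fpr(p_hate, is_hate, threshold):
    """Return (recall, false-positive rate) when flagging rows with p_hate >= threshold."""
    flagged = [p >= threshold for p in p_hate]
    tp = sum(f and y for f, y in zip(flagged, is_hate))
    fp = sum(f and not y for f, y in zip(flagged, is_hate))
    positives = sum(is_hate)
    negatives = len(is_hate) - positives
    recall = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return recall, fpr


# Toy values only: not output from any real model.
p_hate = [0.85, 0.62, 0.48, 0.44, 0.30, 0.05, 0.02]
is_hate = [True, True, True, False, False, False, False]

for threshold in (0.5, 0.45, 0.4):
    recall, fpr = recall_and_fpr(p_hate, is_hate, threshold)
    print(f"threshold={threshold:.2f} recall={recall:.2f} fpr={fpr:.2f}")
```

In this toy data, lowering the threshold first recovers the missed positive and then starts flagging normal text, which is the trade-off to check on your own labelled sample before changing it.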
One picture: every comment embedded, projected to 2-D, coloured by sentiment, with the full
text on hover and human-readable cluster labels (deterministic c-TF-IDF, or pass
labels = "openai" to spend a fraction of a cent on tidier topics).
p <- plot_sentiment(reviews$text) # interactive plotly widget
htmlwidgets::saveWidget(p, "map.html")

Plotting uses plotly; uwot or Rtsne give nicer projections if installed, otherwise it
falls back to PCA. In Python: pip install "sentimentai-py[plot]".
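
For readers unfamiliar with c-TF-IDF cluster labels, here is a minimal sketch of the idea in plain Python. It uses the commonly cited class-based TF-IDF weighting (term frequency within a cluster times log(1 + A / f_t), where A is the average word count per cluster and f_t the term's total count); whether the package uses exactly this weighting or tokenisation is an assumption, and `ctfidf_labels` is a hypothetical name.

```python
import math
from collections import Counter


def ctfidf_labels(clusters, top_n=2):
    """clusters: dict of cluster id -> list of texts. Returns cluster id -> top terms."""
    # Treat each cluster as one document of lower-cased whitespace tokens.
    # (No stop-word removal here; a real labeller would likely add it.)
    term_counts = {
        cid: Counter(" ".join(texts).lower().split())
        for cid, texts in clusters.items()
    }
    total_per_term = Counter()
    for counts in term_counts.values():
        total_per_term.update(counts)
    avg_words = sum(total_per_term.values()) / len(term_counts)

    labels = {}
    for cid, counts in term_counts.items():
        n_words = sum(counts.values())
        scores = {
            term: (count / n_words) * math.log(1 + avg_words / total_per_term[term])
            for term, count in counts.items()
        }
        # Sort by score, then alphabetically, so ties resolve deterministically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        labels[cid] = ranked[:top_n]
    return labels


clusters = {
    "a": ["late flight delayed", "flight delayed again"],
    "b": ["friendly crew", "crew was friendly and helpful"],
}
labels = ctfidf_labels(clusters)
print(labels)
```

Because the ranking is a pure function of the token counts, the same clusters always produce the same labels, which is what makes this kind of labelling deterministic.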
Fine-tuned transformers lead in-domain accuracy, so rather than overclaim we ship them as opt-in backends behind the same API:
sentiment_score("the gate agent was incredible", model = "twitter-roberta") # English
sentiment_score("le service etait incroyable", model = "xlm-roberta") # multilingual

They are a different tool, with real costs. They produce sentiment only: the
hate/mixed/style flags, sentiment_match(), plot_sentiment(), and cosine_match() all
need the e5 embedding space, and the transformers do not give you one. They also download a
full model (500 MB to 1 GB) on first use, and on CPU they are far slower than the light
e5-small (about the same speed as e5-base):
| backend | CPU throughput (texts/sec, higher is better) |
|---|---|
| e5-small | ~850 |
| e5-base (default) | ~300 |
| twitter-roberta (opt-in) | ~310 |

Reach for them when tweet-style accuracy is the priority and a label is all you need.
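
If you want to check throughput on your own hardware, a rough timing harness could look like the sketch below. It is not how the figures above were measured; `texts_per_second` and the stand-in scorer are hypothetical, and the batch size is an arbitrary assumption. Swap in whatever scoring callable you actually use.

```python
import time


def texts_per_second(score_fn, texts, batch_size=32):
    """Time score_fn over texts in batches and return texts processed per second."""
    start = time.perf_counter()
    for i in range(0, len(texts), batch_size):
        score_fn(texts[i:i + batch_size])
    elapsed = time.perf_counter() - start
    return len(texts) / elapsed if elapsed > 0 else float("inf")


def dummy_score(batch):
    # Stand-in for a real backend: returns a neutral score for every text.
    return [0.0 for _ in batch]


rate = texts_per_second(dummy_score, ["the gate agent was incredible"] * 1000)
print(f"{rate:.0f} texts/sec (dummy scorer, not a real model)")
```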
Most people scoring sentiment are scoring reviews, tickets, and survey text, not tweets, so we lead with general business text. All benchmarks run locally on public data; none use proprietary data.
General business text (employee reviews, macro-F1, n = 10,085):
| model | macro-F1 |
|---|---|
| twitter-roberta (opt-in transformer) | 0.909 |
| openai (paid embedding) | 0.896 |
| e5-base (default, on-device) | 0.888 |
| distilBERT-SST2 | 0.879 |
| e5-small (on-device) | 0.836 |
| VADER | 0.681 |
| TextBlob | 0.626 |

On real business text the on-device e5-base default lands within about two points of both
the paid OpenAI embedding and a 125M fine-tuned transformer, clears distilBERT, and sits 20
to 30 points above the lexicon tools. On a separate held-out set of general review text
(n = 19,547) the on-device heads reach macro-F1 0.93 (e5-base) and 0.94 (e5-small).
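
For reference, macro-F1 is the unweighted mean of the per-class F1 scores, so a rare class counts as much as a common one. The sketch below shows the calculation on made-up labels; the three class names are an assumption based on the 3-class output described earlier, and this is not the benchmark code.

```python
def macro_f1(y_true, y_pred, labels=("negative", "neutral", "positive")):
    """Unweighted mean of per-class F1 over the given labels."""
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy labels only, chosen so the arithmetic is easy to follow.
y_true = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
y_pred = ["positive", "neutral", "negative", "negative", "neutral", "positive"]
score = macro_f1(y_true, y_pred)
print(round(score, 3))
```

Here negative scores F1 = 1.0 while positive and neutral each score 0.5, so the macro average is 2/3.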
Where the transformer pulls ahead: tweets. On Twitter benchmarks the fine-tuned
twitter-roberta opens a real gap, because tweets are its training data:
| model | SemEval-2017 tweets | Airline tweets |
|---|---|---|
| twitter-roberta (opt-in) | 0.724 | 0.761 |
| e5-base (default) | 0.672 | 0.651 |
| e5-small | 0.587 | 0.581 |
| VADER | 0.529 | 0.457 |

If your text really is tweets, opt into the max-english backend. For everything else the
gap is small, and e5-base is the only option here that also covers ~100 languages, carries
the hate / mixed / style flags, gives you sentiment_match() and plot_sentiment(), keeps
your data on the machine, and stays free.
Lexicon tools are fast and easy to inspect, but they have a hard ceiling: they score words
in a fixed dictionary, so out-of-vocabulary terms, misspellings, and new phrasings get
missed; negation and context are handled only by hand-written rules; and most convenient
lexicons are English only. sentiment.ai avoids those limits because it never looks at
individual words. It maps the whole sentence to an embedding vector and classifies that
vector:
- Handles ~100 languages from a single model (no separate multilingual lexicons).
- Scores phrases a dictionary has never seen: slang, domain jargon, product names.
- Explicitly models neutral as a third class, not just "not positive, not negative".
- Calibrated confidence: the confidence value from sentiment() tracks empirical accuracy (ECE about 0.015), so you can triage on it.
- Tunable context: sentiment_match() lets you define what positive and negative mean for your domain instead of accepting a generic polarity scale.
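
To make the calibration claim in the list above concrete: expected calibration error (ECE) bins predictions by confidence and averages the gap between mean confidence and accuracy in each bin, weighted by bin size. The sketch below uses the common equal-width binning; the bin count and the toy inputs are assumptions, and this is not necessarily how the reported figure was computed.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean of |mean confidence - accuracy| over equal-width confidence bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Toy values: four confident predictions (three right), two uncertain (one right).
confidences = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
correct = [True, True, True, False, True, False]
ece = expected_calibration_error(confidences, correct)
print(round(ece, 3))
```

A low ECE means a confidence of, say, 0.9 is right about 90% of the time, which is what makes thresholding on confidence a reasonable triage rule.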
The trade-off is setup. Lexicon tools install with no Python dependency; sentiment.ai
needs a one-time install_sentiment.ai() to download an on-device model (about 280 MB for
the default e5-base, about 120 MB for e5-small). After that it needs no API key and no
internet connection, and scoring is deterministic across machines.
sentiment_score() returns one score per input in [-1, 1]. sentiment_match() returns
that same calibrated score plus a nearest-phrase explanation against tunable poles
(phrase, class, similarity). The poles only shape the explanation, never the score.
Omit phrases to use the bundled, balanced 40/40 default poles:
sentiment_match(c("The cabin crew were friendly and helpful", "My bag was lost and nobody helped"),
phrases = list(positive = c("friendly", "on time", "helpful"),
negative = c("rude", "delayed", "lost luggage")))
#> text sentiment phrase class similarity
#> 1 The cabin crew were friendly and helpful 0.30 friendly positive 0.84
#> 2 My bag was lost and nobody helped -0.91 lost luggage negative 0.88

sentiment_provenance("e5-base") reports the exact model, license, source, prefix, and
scoring head behind a score.
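
The nearest-phrase explanation can be pictured as a cosine-similarity lookup in embedding space. The sketch below is an illustration under stated assumptions: the 3-dimensional vectors are invented stand-ins for real embeddings, and `cosine` and `nearest_phrase` are hypothetical helpers, not the package's internals. It returns only the explanation (phrase, class, similarity); the score itself would come from the scoring head.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_phrase(text_vec, poles):
    """poles: dict of class -> {phrase: vector}. Returns (phrase, class, similarity)."""
    best = None
    for cls, phrases in poles.items():
        for phrase, vec in phrases.items():
            sim = cosine(text_vec, vec)
            if best is None or sim > best[2]:
                best = (phrase, cls, sim)
    return best


# Invented toy vectors standing in for real phrase embeddings.
poles = {
    "positive": {"friendly": [0.9, 0.1, 0.0], "on time": [0.6, 0.0, 0.4]},
    "negative": {"rude": [-0.8, 0.2, 0.1], "lost luggage": [-0.5, 0.1, 0.8]},
}
text_vec = [-0.4, 0.0, 0.9]  # pretend embedding of "My bag was lost and nobody helped"

phrase, cls, sim = nearest_phrase(text_vec, poles)
print(phrase, cls, round(sim, 2))
```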
The same engine is on PyPI as sentimentai-py: the same scoring heads, with a
forward pass verified bit-for-bit against this package.

pip install --pre sentimentai-py

Scoring heads turn an embedding into a sentiment score; you can train and contribute your
own under scoring/<type>/<version>/<model>.json (for example
scoring/mlp/2.0/e5-base.json). A head is a small JSON describing the forward pass, read by
a pure-R scorer, with no xgboost or TensorFlow at score time.
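
To show what "a small JSON describing the forward pass" could mean in practice, here is a sketch of a dependency-free scorer. The JSON layout (the `classes` and `layers` keys, `weights`, `bias`, `activation`) is entirely hypothetical and invented for this example; the real head files may be structured differently. The weights are toy numbers on a 2-dimensional "embedding", not a trained head.

```python
import json
import math

# Hypothetical head layout, invented for illustration only.
HEAD_JSON = """
{
  "classes": ["negative", "neutral", "positive"],
  "layers": [
    {"weights": [[1.0, -1.0], [-1.0, 1.0]], "bias": [0.0, 0.0], "activation": "relu"},
    {"weights": [[0.0, 2.0], [0.0, 0.0], [2.0, 0.0]], "bias": [0.0, 0.5, 0.0],
     "activation": "softmax"}
  ]
}
"""


def dense(x, weights, bias):
    """One fully connected layer: each output is a weight row dotted with x, plus bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z):
    """Numerically stable softmax."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(head, embedding):
    """Run the embedding through every layer and return class probabilities."""
    x = list(embedding)
    for layer in head["layers"]:
        x = dense(x, layer["weights"], layer["bias"])
        if layer["activation"] == "relu":
            x = [max(0.0, v) for v in x]
        elif layer["activation"] == "softmax":
            x = softmax(x)
    return dict(zip(head["classes"], x))


head = json.loads(HEAD_JSON)
probs = forward(head, [1.0, 0.0])
print(max(probs, key=probs.get))
```

Because the forward pass is only multiplications, additions, and a softmax, the same JSON can be read by an R scorer and a Python scorer without any ML framework at score time.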
Originally created by the Korn Ferry Institute AITMI team. Now maintained by Linnet Labs.