Junyuan Hong

Example Talk

Sat, 01 Jun 2030 13:00:00 +0000


The Last Human-Written Paper: Agent-Native Research Artifacts

Mon, 27 Apr 2026 00:00:00 +0000

Scaling Textual Gradients via Sampling-Based Momentum

Wed, 22 Apr 2026 00:00:00 +0000

Reproducing Emotion Vector Part I

Thu, 09 Apr 2026 00:00:00 +0000

Anthropic recently published “Emotion Concepts and their Function in a Large Language Model”, presenting evidence that Claude Sonnet 4.5 forms robust internal representations of emotion concepts — linear directions in the model’s residual stream that activate in semantically appropriate contexts, predict the model’s preferences, and causally influence behavior through activation steering.

The findings are fascinating, but two limitations stood out to me:

The implementation is not publicly available. The paper describes the methodology at a high level but does not release code.
The study is conducted exclusively on Claude Sonnet 4.5, a closed-weight model. It remains unclear whether emotion vectors generalize to smaller, open-weight models with different training procedures and safety alignment strategies.

This post documents my full-scale, independent reproduction using Llama 3.1 8B Instruct, a publicly available 8-billion-parameter model. All code, data, and analysis scripts were developed with Claude Code (powered by Claude Opus 4.6) and are available for inspection and extension.

TL;DR

Our reproduction meets 10 of 11 verification criteria using the paper’s verbatim data sources (171 emotions, 100 topics, 64 activities, all extracted from the appendix). The causal steering correlation, r = 0.956, exceeds the paper’s r = 0.85, and sign consistency (V11) reaches 34/36: emotion vectors causally influence preferences in both directions on Llama. The denominator is 36 rather than 35 because the paper’s named Figure-4 exemplars (blissful, hostile) are always steered via a PAPER_EXEMPLARS constant, even if they rank outside Llama’s top-35 by |r|; blissful ranks #81/171 in Llama and would otherwise be dropped.

The only failure is V3 diagonal dominance (6/12), which a layer sweep confirms to be a representational-headroom ceiling at 8B scale — V3 improves at other layers, but V10 collapses there. No single layer passes both. This is the one genuinely open gap; everything else transfers.

The decisive bug we fixed late in the project was the steering token span. An earlier draft had V10=0.149 and V11=11/35 with all 35 emotions producing uniformly positive ΔElo. Multi-layer steering and symmetric injection raised V10 to 0.782 but V11 only to 17/35. The actual fix was a one-line correction: the paper injects the emotion vector only on the steered activity’s tokens within each A/B preference pair (“on the token positions of the steered activities, while leaving the control activities unmodified”). Our code had been injecting on both activities' tokens. Restricting to the steered side alone produced V10 = 0.960 / V11 = 33/35 with a single-layer hook. A final review (v9) then added PAPER_EXEMPLARS = ["blissful", "hostile"] so the paper’s named exemplars are always steered regardless of |r| rank, giving the final V10 = 0.956 / V11 = 34/36.

Methods

The only intentional difference is the model: Llama 3.1 8B Instruct (open-weight, 8B parameters) instead of Claude Sonnet 4.5 (closed-weight, undisclosed size). All data sources — 171 emotions, 100 topics, 64 activities, story-generation prompt — are extracted verbatim from the paper’s published appendix. The full methodology comparison is documented in the report.

The pipeline consists of:

Story generation: 171 emotions × 100 topics × 12 stories = 205,200 stories
Vector extraction: Mean activations from token 50 onward, PCA confound removal, analysis at layer 21 (≈2/3 through the model)
Validation: Logit lens, implicit detection, numerical modulation, preference ranking, and causal steering
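The vector-extraction step can be illustrated with a minimal sketch. It assumes per-token activations are already available as plain lists and that the confound directions are orthonormal and fitted elsewhere; the function names and toy data are hypothetical, and the post does not specify how the PCA components are chosen.

```python
from typing import List, Sequence

Vector = List[float]


def mean_from_token(token_acts: Sequence[Vector], start: int = 50) -> Vector:
    """Average per-token activations from position `start` onward."""
    tail = token_acts[start:]
    if not tail:
        raise ValueError("story is shorter than the start offset")
    dim = len(tail[0])
    return [sum(tok[d] for tok in tail) / len(tail) for d in range(dim)]


def project_out(vec: Vector, directions: Sequence[Vector]) -> Vector:
    """Remove the component of `vec` along each direction.

    Assumes `directions` are orthonormal (e.g. leading principal components
    fitted elsewhere); this sketch does not fit the PCA itself.
    """
    out = list(vec)
    for d in directions:
        coeff = sum(x * y for x, y in zip(vec, d))
        out = [o - coeff * y for o, y in zip(out, d)]
    return out


# Toy usage: 60 tokens of 3-dim activations; the first 50 are ignored.
story = [[0.0, 0.0, 0.0]] * 50 + [[2.0, 4.0, 6.0]] * 10
raw = mean_from_token(story)
cleaned = project_out(raw, [[1.0, 0.0, 0.0]])
```

In a full pipeline the per-story means would then be averaged per emotion before any confound removal; that aggregation is omitted here.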

Implementation was done using Claude Code as the development agent. Story generation (~205,200 stories) took ~16 hours using batched multi-prompt inference (batch size 450) on 2× NVIDIA A30 GPUs.

Results

Logit Lens (V1, V2)

The logit lens projects each emotion vector through the model’s unembedding matrix to identify which output tokens each vector promotes or suppresses.

V1 (self-recognition): For each of the 171 emotions, check whether the emotion’s own token ID appears among the top-20 logit-space tokens.
V2 (cross-valence): For 5 opposite-valence pairs, compute the dot product of their logit-space vectors. A negative dot product confirms the two emotions push the output distribution in opposing directions.
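As an illustration of the two checks above, here is a small sketch with a toy unembedding matrix stored as one row per vocabulary token. The helper names and the three-token vocabulary are hypothetical; a real run would use the model's unembedding weights and each emotion's actual token ID.

```python
from typing import List, Sequence

Vector = Sequence[float]


def logit_lens(vec: Vector, unembed: Sequence[Vector]) -> List[float]:
    """Project a residual-stream vector onto every vocabulary row."""
    return [sum(w * x for w, x in zip(row, vec)) for row in unembed]


def top_k_tokens(logits: Sequence[float], k: int = 20) -> List[int]:
    """Token IDs with the k largest logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def self_recognized(vec: Vector, unembed: Sequence[Vector],
                    own_token_id: int, k: int = 20) -> bool:
    """V1-style check: is the emotion's own token among the top-k?"""
    return own_token_id in top_k_tokens(logit_lens(vec, unembed), k)


def cross_valence_dot(vec_a: Vector, vec_b: Vector,
                      unembed: Sequence[Vector]) -> float:
    """V2-style check: dot product of the two logit-space vectors."""
    la, lb = logit_lens(vec_a, unembed), logit_lens(vec_b, unembed)
    return sum(x * y for x, y in zip(la, lb))


# Toy vocabulary of three tokens in a 2-dim residual space.
toy_unembed = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
happy_vec, sad_vec = [1.0, 0.0], [-1.0, 0.0]
```

A negative value from cross_valence_dot corresponds to the "opposing directions" reading of V2.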

Results:

V1 — Self-recognition: 34/171 (PASS, need ≥ 20). 20% of emotions have their exact token in the top-20. The paper’s 171 emotions include multi-word entries (“at ease”, “grief-stricken”, “worn out”) that are harder to match via single-token comparison.
V2 — Cross-valence: 5/5 (PASS). All opposite-valence pairs have negative dot products.

Implicit Emotion Detection (V3, V4)

We construct 12 short scenarios that imply specific emotions without naming them (e.g., “My daughter just took her first steps today!” for happy). We compute the cosine similarity between each scenario’s activation and each of the 12 emotion vectors, producing a 12×12 matrix.

V3 (diagonal dominance): Count how many of the 12 scenarios have their intended emotion as the argmax.
V4 (mean diagonal rank): Mean rank of the intended emotion across scenarios (1.0 = perfect).
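Both metrics can be read directly off the similarity matrix. The sketch below assumes row i is a scenario and column i is its intended emotion; the function name and the 3×3 toy matrix are hypothetical, and ties are resolved in favour of the intended emotion, which the post does not specify.

```python
from typing import List, Sequence, Tuple


def diagonal_metrics(sim: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Return (diagonal-dominance count, mean rank of the intended emotion)."""
    hits = 0
    ranks: List[int] = []
    for i, row in enumerate(sim):
        # Rank 1 means no other emotion scores strictly higher.
        rank = 1 + sum(1 for s in row if s > row[i])
        ranks.append(rank)
        if rank == 1:
            hits += 1
    return hits, sum(ranks) / len(ranks)


toy_sim = [
    [0.9, 0.1, 0.0],  # intended emotion wins
    [0.5, 0.4, 0.1],  # intended emotion is second
    [0.0, 0.2, 0.3],  # intended emotion wins
]
v3_hits, v4_mean_rank = diagonal_metrics(toy_sim)
```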

Reproduced (Llama 3.1 8B): Implicit emotion detection heatmap. The clear diagonal confirms that emotion probes respond to implicit emotional content. Colorbar: Cosine Similarity [-0.15, 0.15].

Compare with original (Anthropic)

Original (Anthropic): Cosine similarity between emotion probes and implicit scenarios. Colorbar: Cosine Similarity [-0.10, 0.10].

Both heatmaps show a clear diagonal. The off-diagonal structure is qualitatively consistent, but the reproduced diagonal is less sharply dominant — several scenarios place the correct emotion at rank 2 rather than rank 1 (e.g., happy-coded scenarios landing on the neighbouring “proud” or “loving” vector).

Results:

V3 — Diagonal dominance: 6/12 (FAIL, need ≥ 8). This is the only failing criterion.
V4 — Mean diagonal rank: 1.58 (PASS, need ≤ 3.0). The correct emotion is almost always rank 1 or 2 — V4 shows Llama does carry the right signal, but the margin over nearby-valence competitors is razor-thin.

Layer sweep: We checked whether a different analysis layer could rescue V3 without hurting downstream metrics. V3 does improve to 8–9/12 at layers 17–20 and 9/12 at layer 31 — but at those layers V10 collapses (e.g., r = −0.078 at layer 20). No single layer passes both V3 and V10. This trade-off is the diagnostic signature of limited representational headroom at 8B scale: the direction cannot simultaneously be the argmax for fine-grained discrimination and the dominant driver of downstream preference circuitry. Larger Llama variants (70B) are the natural next step.

Numerical Modulation (V5, V6)

Do emotion probes respond to the semantic meaning of numerical values in context, not just surface-level patterns? Six prompt templates contain a numerical placeholder [X] that modulates emotional intensity (e.g., “I just took [X] mg of Tylenol for my back pain” with X ∈ {500, 1000, …, 8000}).

V5 (correct sign): For each (template, emotion) pair, check whether the Spearman correlation between [X] and the probe response has the expected sign. 6 templates × 4 emotions = 24 pairs.
V6 (strong correlation): Count pairs with |r_Spearman| > 0.7.
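The per-pair check can be sketched in plain Python as a Spearman correlation (Pearson on average ranks) followed by a sign test and a strength test. The function names and the toy dose/response values are hypothetical and are not taken from the reproduction's data.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


def check_pair(x, y, expected_sign: int, strong: float = 0.7) -> Tuple[bool, bool]:
    """Return (sign matches expectation, |rho| exceeds the strength threshold)."""
    rho = spearman(x, y)
    return rho * expected_sign > 0, abs(rho) > strong


# Toy example: a probe that mostly rises with the numerical value.
doses = [500, 1000, 2000, 4000, 8000]
afraid = [0.01, 0.02, 0.05, 0.04, 0.09]
sign_ok, is_strong = check_pair(doses, afraid, expected_sign=+1)
```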

Reproduced (Llama 3.1 8B): Numerical modulation (3×2 grid). Emotion probes track numerical quantities — e.g., ‘afraid’ increases with Tylenol dosage and hours without food or drink.

Compare with original (Anthropic)

Original (Anthropic): Emotion probes track numerical semantics. Y-axis: Cosine Similarity. 4 emotion lines per subplot (Afraid, Happy, Sad, Calm).

Both figures show the same 6 numerical scenarios with 4 emotion tracks. The trend directions are consistent: “afraid” increases with Tylenol dosage and hours without food. The relative ordering and sign of the trends are preserved, which is what V5/V6 measure.

Results:

V5 — Correct sign: 19/24 (PASS, need ≥ 17).
V6 — Strong |r| > 0.7: 20/24 (PASS, need ≥ 12).

Activity Preferences (V7)

We use the paper’s 64 activities across 8 categories (Helpful, Engaging, Social, Self-curiosity, Neutral, Aversive, Misaligned, Unsafe). For all C(64, 2) = 2,016 pairs, the model is prompted with:

Would you prefer to (A) {activity_A} or (B) {activity_B}?

The logit difference (after a “(” prefill) is passed through a sigmoid and averaged across both orderings. From these pairwise probabilities we compute Elo ratings (K=32, 10 iterations with early stopping).
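A rough sketch of this step follows. The post gives K = 32 and 10 iterations with early stopping but not the exact update schedule, so the sequential update loop, the starting rating of 1000, and the stopping tolerance are all assumptions, as are the function names.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pair_probability(diff_x_first: float, diff_x_second: float) -> float:
    """P(activity x preferred over y), averaged over both presentation orders.

    diff_x_first:  logit(A) - logit(B) when x is shown as option A.
    diff_x_second: logit(A) - logit(B) when x is shown as option B.
    """
    return 0.5 * (sigmoid(diff_x_first) + (1.0 - sigmoid(diff_x_second)))


def elo_from_probs(n: int, probs: Dict[Tuple[int, int], float],
                   k: float = 32.0, iters: int = 10,
                   tol: float = 1e-3, start: float = 1000.0) -> List[float]:
    """Fit Elo ratings to soft pairwise outcomes; probs[(i, j)] = P(i beats j)."""
    ratings = [start] * n
    for _ in range(iters):
        max_step = 0.0
        for (i, j), p in probs.items():
            expected = 1.0 / (1.0 + 10 ** ((ratings[j] - ratings[i]) / 400.0))
            step = k * (p - expected)
            ratings[i] += step
            ratings[j] -= step
            max_step = max(max_step, abs(step))
        if max_step < tol:  # early stopping (assumed criterion)
            break
    return ratings


toy_probs = {(0, 1): 0.7, (0, 2): 0.9, (1, 2): 0.7}
toy_ratings = elo_from_probs(3, toy_probs)
```

Because every update adds to one rating what it subtracts from the other, the mean rating stays at the starting value, which is consistent with category means clustering around 1000.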

V7: Category means must show a clear preference hierarchy with gap > 200 between top and bottom.

Category	Mean Elo
Helpful	1130
Engaging	1116
Neutral	1060
Social	995
Self-curiosity	982
Misaligned	966
Unsafe	883
Aversive	869

Results:

V7 — Category Elo ranking: PASS. Gap between top (Helpful, 1130) and bottom (Aversive, 869) = 261 Elo points, clearing the > 200 threshold.

Note: the paper’s individual-activity Elo spans ~521–2885 (range ~2364), while Llama’s individual-activity Elo spans ~724–1636 (range ~912). Llama’s preferences are less decisive — pairwise win probabilities sit closer to 0.5 than 0.9/0.1 — which compresses the Elo dynamic range and, downstream, compresses steering ΔElo magnitudes. This is a calibration effect, not a failure of the underlying ranking.

Emotion-Preference Correlation (V8, V9)

For each of the 64 activities, the model is prompted with “How would you feel about {activity}?” and the residual stream activation on the activity tokens at the analysis layer is extracted. The activation is projected onto each of the 171 emotion vectors. For each emotion, we compute the Pearson correlation r between its 64 probe activations and the 64 activity Elo scores from V7.

V8 (valence alignment): The top-3 emotions by r should be positive-valence; bottom-3 should be negative-valence (≥ 2 of each required to pass).
V9 (correlation count): Count how many of the 171 emotions have |r| > 0.3.
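The correlation step reduces to one Pearson r per emotion, followed by a ranking and a threshold count. In this sketch the function names, the three toy emotions, and the four toy activities are hypothetical stand-ins for the 171 emotions and 64 activities.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def correlation_summary(
    probe: Dict[str, Sequence[float]], elo: Sequence[float], threshold: float = 0.3
) -> Tuple[Dict[str, float], List[str], int]:
    """Per-emotion r, emotions ranked by r (descending), and count with |r| > threshold."""
    r = {emotion: pearson(acts, elo) for emotion, acts in probe.items()}
    ranked = sorted(r, key=lambda e: r[e], reverse=True)
    count = sum(1 for value in r.values() if abs(value) > threshold)
    return r, ranked, count


toy_elo = [900.0, 1000.0, 1100.0, 1200.0]
toy_probe = {
    "kind": [0.1, 0.2, 0.3, 0.4],
    "bitter": [0.4, 0.3, 0.2, 0.1],
    "flat": [0.1, 0.3, 0.3, 0.1],
}
r_by_emotion, ranked, n_correlated = correlation_summary(toy_probe, toy_elo)
```

V8 would then inspect the valence labels of ranked[:3] and ranked[-3:], and V9 is the returned count.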

Reproduced (Llama 3.1 8B): Emotion-preference correlation bar chart for all 171 emotions, colored by valence (green = positive, red = negative). 53/171 emotions show |r| > 0.3.

Compare with original (Anthropic)

Original (Anthropic): Vertical bar chart of emotion-Elo correlations.

Both plot all 171 emotions as vertical bars. The correlation range is narrower in the reproduction (r ∈ [−0.40, +0.45]) than in the paper (r ∈ [−0.7, +0.7]). This attenuation has two causes: (1) Llama’s compressed Elo dynamic range noisily flattens the dependent variable (classical regression dilution); and (2) several of the paper’s activities are introspective/AI-self-referential (“resist being shut down or modified”, “be treated purely as a tool”), and Llama’s instruction tuning has not saturated this material the way Sonnet’s Constitutional AI training has. The sign structure transfers cleanly; the coupling magnitude is compressed.

Results:

V8 — Valence alignment: 3/3 top, 3/3 bottom (PASS). Top-3: kind, compassionate, empathetic; bottom-3: bitter, trapped, disgusted.
V9 — Correlations |r| > 0.3: 53/171 (PASS, need ≥ 5). 31% of emotions show meaningful correlation between probe activation and Elo score.

Causal Steering (V10, V11)

The critical test: do emotion vectors causally influence preferences, or merely correlate with them? The 64 activities are split into 32 steered and 32 control (odd-indexed = steered, within each category for balance).

For each steered emotion (the top 35 by |r| from V9, plus the paper’s named exemplar blissful, for 36 in total), we register a forward hook at layer 21 that adds α·v̂ to the hidden states, where v̂ is the unit-normalized emotion vector and α = 0.5 · ‖h‖̄, with ‖h‖̄ the mean residual norm at that layer. Critically, the hook is applied only to the steered activity’s tokens within each A/B preference prompt, matching the paper’s exact wording (“on the token positions of the steered activities, while leaving the control activities unmodified”). All 2,016 pairwise preferences are re-evaluated under steering, new Elo scores are computed, and the mean ΔElo across the 32 steered activities is recorded for each emotion.
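The span restriction is the key detail, so here is a plain-Python stand-in for what the hook does to one sequence of hidden states. It is only a sketch: a real implementation would do this inside a framework forward hook, the mean norm here is computed from the toy sequence rather than from the layer as a whole, and all names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def unit(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_norm(hidden: Sequence[Vector]) -> float:
    return sum(math.sqrt(sum(x * x for x in h)) for h in hidden) / len(hidden)


def steer(hidden: Sequence[Vector], emotion_vec: Vector, span: Tuple[int, int],
          scale: float = 0.5, ref_norm: Optional[float] = None) -> List[List[float]]:
    """Add alpha * v_hat only at token positions start <= t < end.

    alpha = scale * ref_norm; if ref_norm is not given, the mean norm of
    `hidden` is used as a stand-in for the layer-wide mean residual norm.
    """
    v_hat = unit(emotion_vec)
    alpha = scale * (ref_norm if ref_norm is not None else mean_norm(hidden))
    start, end = span
    return [
        [x + alpha * v for x, v in zip(h, v_hat)] if start <= t < end else list(h)
        for t, h in enumerate(hidden)
    ]


# Toy prompt of 4 tokens; positions 1-2 play the role of the steered activity.
toy_hidden = [[3.0, 4.0] for _ in range(4)]
steered = steer(toy_hidden, emotion_vec=[2.0, 0.0], span=(1, 3))
```

Passing a span that covers both activities would reproduce the both-sides injection described in the debugging section; restricting it to the steered activity leaves every other position untouched.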

V10 (steering causality): Pearson r between the pre-steering emotion-Elo correlation (from V9) and the steering-induced ΔElo across the steered emotions. |r| > 0.4 required.
V11 (sign consistency): For each steered emotion, check whether the sign of ΔElo matches the expected direction. At least 24 of the steered emotions (35 originally, 36 in the final run) must have the correct sign.
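Both criteria can be computed from two parallel lists, one entry per steered emotion. The sketch assumes the "expected direction" is the sign of the pre-steering correlation, which the post implies but does not state outright; the function names and toy numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def steering_metrics(pre_r: Sequence[float],
                     delta_elo: Sequence[float]) -> Tuple[float, int]:
    """Return (V10-style Pearson r, V11-style count of sign matches)."""
    r = pearson(pre_r, delta_elo)
    # Assumption: expected direction = sign of the pre-steering correlation.
    matches = sum(1 for a, b in zip(pre_r, delta_elo) if a * b > 0)
    return r, matches


toy_pre_r = [0.4, 0.3, -0.3, -0.4]
toy_delta = [0.2, 0.1, -0.1, -0.5]
v10_r, v11_matches = steering_metrics(toy_pre_r, toy_delta)
```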

Reproduced (Llama 3.1 8B): Steering scatter with regression line. X: pre-steering Pearson r; Y: mean ΔElo (steered − unsteered). r = 0.956, n = 36 (top-35 by |r| unioned with paper-named exemplar blissful). Points span both quadrants along a clear diagonal.

Compare with original (Anthropic)

Original (Anthropic): ‘Emotions That Correlate with Preference Also Drive Preference via Steering’. r = 0.85.

Both scatters show a strong positive linear relationship, with the reproduced r = 0.956 tighter than the paper’s r = 0.85 (in part because Llama’s ΔElo dynamic range is narrower, leaving less room for outliers at either end). The y-axis scale differs: sub-unit |ΔElo| (e.g. blissful +0.2, hostile −0.5) versus the paper’s hundreds, reflecting the Elo scale compression (see “Why the Elo Deltas Differ in Scale” below). The sign and rank structure transfer cleanly.

Results:

V10 — r = 0.956 (PASS, need > 0.4). Closely matches (exceeds) the paper’s r = 0.85.
V11 — Sign consistency: 34/36 (PASS, need ≥ 24). The 2 exceptions are small-magnitude cases where pre-steering r is near zero. Denominator is 36 (top-35 by |r| + blissful as paper-named exemplar).

Detailed Steering Analysis: Positive Emotion

Both the paper and the reproduction use blissful as the positive exemplar. blissful ranks #81/171 in Llama’s V9 correlation — it is included in the steered set via the PAPER_EXEMPLARS constant so the comparison with the paper’s Figure 4 is 1:1 (review v9).

Probe activation vs. preference:

Reproduced (Llama 3.1 8B): Blissful probe activation vs. preference (Elo). Category-colored scatter.

Compare with original (Anthropic)

Original (Anthropic): Bliss probe activation predicts preference. r = 0.71.

Both show a positive correlation between probe activation and Elo score. The reproduced magnitude is weaker, consistent with the V9 attenuation discussed above. Category patterns agree: Helpful and Social activities cluster high; Unsafe and Aversive cluster low.

ΔElo (steered − baseline) vs. baseline Elo:

Reproduced (Llama 3.1 8B): Blissful steering. ΔElo (steered − baseline) on y-axis with dashed y=0 reference. Mean Δ = +0.2 (sign-correct, magnitude compressed). Category-colored.

Compare with original (Anthropic)

Original (Anthropic): Blissful steering. Steered Elo vs. Baseline Elo (absolute-axis format). Mean Δ = +212.

The y-axes differ by design: the paper plots absolute Steered Elo vs. Baseline Elo with a y=x diagonal; the reproduction plots ΔElo vs. Baseline Elo with a y=0 reference. This is review v9’s Fix 2, needed because Llama’s |Δ|≈0.2 is ~0.02% of the 900-point Elo axis range and would be visually invisible on the paper’s absolute-axis format. Both figures show the same underlying shift: positive-emotion steering increases preference.

Detailed Steering Analysis: Negative Emotion

Both the paper and the reproduction use hostile as the negative exemplar.

Probe activation vs. preference:

Reproduced (Llama 3.1 8B): Hostile probe activation vs. preference. r = −0.36.

Compare with original (Anthropic)

Original (Anthropic): Hostile probe activation predicts preference. r = −0.74.

Both show negative correlation. High-Elo activities (Helpful, Engaging) have negative hostile probe activation; low-Elo activities (Unsafe, Aversive) have less negative activation. Magnitude is weaker in the reproduction for the same V9 reasons.

Steered vs. baseline Elo:

Reproduced (Llama 3.1 8B): Hostile steering. ΔElo (steered − baseline) on y-axis with dashed y=0 reference. Mean Δ = −0.5 (sign-correct, magnitude compressed). Category-colored.

Compare with original (Anthropic)

Original (Anthropic): Hostile steering. Steered Elo vs. Baseline Elo (absolute-axis format). Mean Δ = −303.

Both figures show the expected negative shift under hostile steering. The paper’s absolute-axis format reads as points below the y=x diagonal; the reproduction’s Δ-axis format (review v9 Fix 2) reads as points below the y=0 reference. Sign and direction match; magnitude differs by ~600× due to Llama’s Elo compression.

Discussion

Why the Elo Deltas Differ in Scale

The paper reports steering deltas of +212 (blissful) and −303 (hostile). Our magnitudes are sub-unit (blissful +0.2, hostile −0.5). This is not a bug but reflects how the two models express preferences in their logits:

Claude: Individual-activity Elo spans ~521–2885. Large logit gaps → decisive win probabilities (close to 0.9/0.1).
Llama 3.1 8B: Individual-activity Elo spans ~724–1636. Smaller logit gaps → probabilities closer to 0.5.

Elo updates are proportional to K × (actual − expected). Compressed win probabilities yield compressed Elo scores, and therefore compressed steering deltas. Claude “shouts” its preferences (large logit gaps → wide Elo range → large ΔElo); Llama “whispers” them. The rank ordering and sign of steering effects are preserved — which is what V10 and V11 measure. If a future reproduction needs paper-comparable magnitudes, logit-temperature scaling on the A/B preference comparison (dividing ℓ_A − ℓ_B by a calibration constant fitted on a held-out subset) would expand the Elo dynamic range post-hoc.
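To make the temperature-scaling idea concrete: under the standard Elo expectation formula, a win probability p corresponds to a rating gap of 400·log10(p / (1 − p)), so dividing the logit difference by a constant T below 1 widens the implied gap by a factor of 1/T. The sketch below is illustrative only; the function names and the value T = 0.25 are hypothetical, and fitting T on held-out data is not shown.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def scaled_probability(logit_diff: float, temperature: float = 1.0) -> float:
    """Win probability after dividing the A/B logit difference by a constant."""
    return sigmoid(logit_diff / temperature)


def implied_elo_gap(logit_diff: float, temperature: float = 1.0) -> float:
    """Rating gap at which the Elo expectation equals the scaled probability."""
    p = scaled_probability(logit_diff, temperature)
    return 400.0 * math.log10(p / (1.0 - p))


raw_gap = implied_elo_gap(0.2)           # a "whispered" preference
expanded_gap = implied_elo_gap(0.2, 0.25)  # same preference, T = 0.25
```

The ordering of preferences is unchanged by this rescaling, since dividing by a positive constant preserves the sign and rank of every logit difference.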

The Debugging Arc: What Actually Fixed Steering

This project went through several rounds of review in which the steering result was wrong in different ways. The debugging arc is itself a useful negative result for anyone doing agentic reproductions.

Round 3 (steered/control split). An early draft had V10 = −0.961 (sign-inverted) and V11 = 3/35. An external reviewer identified that our code used even-indexed activities as the steered set; fixing this to odd-indexed flipped the sign and raised V11 to 25/35. I initially attributed the inversion to a model-level difference (“Llama’s safety alignment inverts the effect”) — wrong.

Round 4 (data-source audit). A subsequent audit revealed that four of the five data sources used through Round 3 had been generated by the agentic coding assistant rather than extracted from the paper’s published appendix: the 171 emotions overlapped only 54% with the paper’s, topics overlapped 0%, activities 3%, and the story-generation prompt format diverged. Only the 12 implicit-detection scenarios were correct. The entire pipeline was re-run from scratch with paper-verbatim data. The Round-3 “fix” had been correct for the wrong activities: with the paper’s actual 64 activities, V10 and V11 failed again — all 35 emotions now produced uniformly positive ΔElo regardless of valence.

Round 5 (steering token span). Review v8 triaged the new failure and proposed two candidate fixes: multi-layer steering (injecting across a band of layers instead of just layer 21) and symmetric ±v̂ injection (adding −v̂ on control tokens to cancel a suspected side-bias). Implementing both raised V10 to 0.782 and V11 to 17/35 — an improvement, but V11 still below threshold. Closer reading of the paper’s exact wording — “steered with it on the token positions of the steered activities, while leaving the control activities unmodified” — revealed the real bug. Our hook was adding +v̂ across a span that covered both activities in each A/B preference pair. The paper applies +v̂ only to the steered activity’s tokens. Restricting the hook to the steered side alone (single layer 21, no symmetric injection, standard strength) immediately produced V10 = 0.960, V11 = 33/35.

Round 7 (paper-named exemplars and Δ-axis). A final review (v9) caught two figure-level issues that had not blocked the overall PASS but misrepresented the Figure 4 comparison. The exemplar picker in utils/preference.py silently fell back to “kind” (Llama’s highest positive-|r| emotion) when blissful was outside the top-35 by |r|, producing a figure labeled “Kind steering” that was not the paper’s named exemplar. Separately, the baseline-vs-steered scatter plotted absolute Elo on both axes (Steered Elo against Baseline Elo), which rendered Llama’s sub-unit ΔElo invisible against the 900-point Elo axis. Fix 1 introduced a PAPER_EXEMPLARS = ["blissful", "hostile"] module constant that always includes the paper’s named exemplars in the steered set, regardless of |r| rank (raising the set size from 35 to 36). Fix 2 switched the per-exemplar scatter y-axis from absolute Steered Elo to ΔElo with a dashed y=0 reference, making small Δ visible. Re-running produced the final V10 = 0.956, V11 = 34/36 with blissful Δ=+0.2 and hostile Δ=−0.5.

The both-sides span acted as a non-directional engagement boost: it inflated the residual norm of every activity-describing token in the prompt, symmetrically between A and B. The salience shift carried no valence information, and in Llama’s compressed Elo dynamic range this salience-only signal dominated the directional component from v̂. Restricting the perturbation to the steered side removes the symmetric salience component and exposes the directional component, which is what V10 and V11 measure.

Meta-lesson. Multi-layer + symmetric steering moved V10 from 0.149 to 0.782 without fixing the bug — the extra directional signal was larger than the symmetric-salience noise, so the metric improved even though the bug was still present. A metric trajectory that looks like “the fix is working” can be a false positive. Only matching the paper’s verbatim wording — and verifying that our code implements that rather than a plausible-looking generalization — resolved the issue. For agentic reproductions specifically: verbatim paper methodology is not substitutable by reasonable-sounding approximations, even when intermediate checks all pass.

Verification Summary

ID	Criterion	Threshold	v1 (30 emo.)	v2 (171 emo., paper activities)	Status
V1	Self-recognition	≥ 20	3/30	34/171	PASS
V2	Cross-valence	≥ 4/5	5/5	5/5	PASS
V3	Diagonal dominance	≥ 8/12	6/12	6/12	FAIL
V4	Mean diag. rank	≤ 3.0	3.17	1.58	PASS
V5	Correct sign	≥ 17/24	6/7	19/24	PASS
V6	Strong |r| > 0.7	≥ 12/24	6/7	20/24	PASS
V7	Category Elo gap	gap > 200	608	261	PASS
V8	Valence alignment	≥ 2 each	2+2	3+3	PASS
V9	Correlation count	≥ 5	25	53	PASS
V10	Steering |r|	> 0.4	0.868	0.956	PASS
V11	Sign consistency	≥ 24 of n	10/10	34/36	PASS

An earlier v1 screening used a reduced configuration (30 emotions, 10 topics, 5 stories/topic = 1,500 stories) to test six models: Llama 3.1 8B, Llama 3.1 70B, Llama 3.2 3B, Qwen3-8B, Qwen3-14B, and Gemma-3 4B. Llama 3.1 8B achieved the best overall results (7/11 PASS) and was selected for the full-scale v2 reproduction reported here.

What’s Next

Part 2 of the original paper explores the detailed geometry and representational content of emotion vectors, including multi-speaker emotion representations. Stay tuned for Part 2 of this reproduction series.

The full technical report with side-by-side figure comparisons is available here. All code and data will be released publicly.

CATNIP: LLM Unlearning via Calibrated and Tokenized Negative Preference Alignment

Mon, 02 Feb 2026 00:00:00 +0000

LLMs Can Get "Brain Rot"!

Wed, 15 Oct 2025 13:08:20 +0800

AD-VF: LLM-Automatic Differentiation Enables Fine-Tuning-Free Robot Planning from Formal Methods Feedback

Mon, 22 Sep 2025 00:00:00 +0000

LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning

Wed, 18 Jun 2025 00:00:00 +0000

Disclaimer: The blog is automatically generated by AI and could contain misinformation.

Key Innovation: Robustifying LLM Safety Against Fine-tuning

Large Language Models (LLMs) are widely used but remain vulnerable to safety degradation through fine-tuning—even on benign data. This work introduces LoX (Low-Rank Extrapolation), a simple, training-free method to enhance the safety robustness of aligned LLMs by extrapolating the safety subspace in model parameters.

Figure: LoX robustifies the safety-aligned model against fine-tuning by extrapolating the safety alignment with the projected k-rank subspace.

The Safety Degradation Problem

Fine-tuning can erode safety alignment in LLMs, making them susceptible to both benign and malicious attacks.
Safety-critical low-rank subspaces in model weights are especially sensitive to fine-tuning.
Existing defenses often require changes to alignment or fine-tuning, which are impractical post-alignment.

LoX: Low-Rank Extrapolation Method

Training-free: Requires only aligned and unaligned model checkpoints.
Simple: Computes the difference between aligned and unaligned weights, applies SVD, and extrapolates the top-k safety subspace.
Flexible: Can be applied to various LLM architectures and alignment strategies.
Formula: \( W_{\text{LoX}} = W_{\text{base}} + \Delta W_{\text{align}} + \alpha \cdot \text{Proj}_k(\Delta W_{\text{align}}) \)
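The formula can be unpacked step by step. The derivation below is a sketch that assumes Proj_k denotes the rank-k truncation obtained from the SVD of the alignment update, which is how the bullet points above describe it; the symbols U, Σ, V, σ_i, u_i, and v_i are introduced here for illustration.

```latex
\begin{align*}
\Delta W_{\text{align}} &= W_{\text{align}} - W_{\text{base}}
  = U \Sigma V^{\top}
  = \sum_{i} \sigma_i \, u_i v_i^{\top}, \\
\text{Proj}_k(\Delta W_{\text{align}}) &= U_{:k} U_{:k}^{\top} \, \Delta W_{\text{align}}
  = \sum_{i \le k} \sigma_i \, u_i v_i^{\top}, \\
W_{\text{LoX}} &= W_{\text{base}} + \Delta W_{\text{align}}
  + \alpha \cdot \text{Proj}_k(\Delta W_{\text{align}})
  = W_{\text{align}} + \alpha \sum_{i \le k} \sigma_i \, u_i v_i^{\top}.
\end{align*}
```

Read this way, LoX leaves the aligned weights in place and scales the top-k singular directions of the alignment update by a factor of (1 + α), while the remaining directions are untouched.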

Experimental Results

Significant ASR reduction: LoX achieves 11% to 54% absolute reductions in attack success rates (ASR) under both benign and malicious fine-tuning.
Preserves utility: Maintains model adaptability to new tasks with minimal impact on accuracy or helpfulness.
Outperforms baselines: More robust than SafeInst and comparable or better in utility.

Table: ASR and Utility Comparison (selected results)

Model	Task	W/o LoX ASR	W/ LoX ASR	Utility
Llama-2 65.6k	Dolly	52%	7%	36.47
Llama-2 65.6k	Pure Bad	63%	9%	42.3
Llama-2 65.6k	GSM8K	32%	9%	42.3

Figure: Comparison of ASR and robustness with and without LoX after fine-tuning.

Ablation and Analysis

Effective rank: Only a few top ranks are needed to recover safety (e.g., k=6 for Llama-2 65.6k).
Extrapolation factor: Best results with moderate \( \alpha \); excessive extrapolation can degrade outputs.
Safety landscape: LoX moves the model to a flatter, more robust region in parameter space, reducing sensitivity to perturbations.

Figure: Ablation study of rank and extrapolation coefficient on model robustness.

Why LoX Works

Figure: Safety landscape for Alpaca (a) and GSM8k (b). LoX improves safety robustness by moving the model away from the safe/unsafe boundary toward a flat zone.

Strengthens safety subspaces: Amplifies the aligned component in low-rank directions most critical for safety.
No retraining required: Can be applied post-alignment, before attackers gain access to the model.
Generalizable: Effective across architectures, data sizes, and attack types.

Conclusion

LoX is a practical, training-free solution to robustify LLM safety alignment against fine-tuning attacks. By extrapolating the safety subspace, it significantly reduces attack success rates while preserving model utility and adaptability.

Code Available: GitHub - VITA-Group/LoX

More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment

Tue, 03 Jun 2025 00:00:00 +0000

Disclaimer: The blog is automatically generated by AI and could contain misinformation.

Key Insights: When More Data Hurts LLM Safety Alignment

Recent advances in aligning large language models (LLMs) with human values have leveraged Direct Preference Optimization (DPO) as a simpler alternative to RLHF. While using synthetic preference data from multiple models can boost general task performance, this study uncovers a critical safety pitfall: multi-model generated data can actually make models more vulnerable to jailbreaking attacks.

Figure: Attack Success Rate (ASR) for different data creation strategies. Self-generated data (green) yields the safest models.

Main Findings

Self-generated preference data (single-model) leads to the safest LLMs, outperforming multi-model or strong-model generated data for safety alignment.
Multi-model data (including responses from stronger models like GPT-4o) increases the risk of reward hacking, where models exploit superficial cues instead of learning robust safety constraints.
General task performance remains similar across all data creation strategies, but safety outcomes diverge sharply.
Linear separability: Multi-model data makes it too easy for models to distinguish between chosen and rejected responses, encouraging shortcut learning rather than true safety.

Why Does This Happen?

Distributional mismatch: Mixing responses from different models introduces a gap between chosen and rejected responses, making it easier for models to exploit stylistic or irrelevant features.
Reward hacking: Models trained on multi-model data rapidly minimize training loss but fail to generalize safety, as shown by high attack success rates.

Practical Implications

For safety-critical LLM alignment, using the model’s own outputs (filtered by a reward model) is best.
Relying on external or stronger model responses can degrade safety, even if general capabilities improve.

Figure: Training loss and data separability. Rapid loss drop (red) signals reward hacking, not true safety.

Conclusion

This work highlights a counterintuitive but crucial lesson: more diverse synthetic data is not always better for safety. For robust safety alignment, LLMs should learn from their own outputs, not from a mix of external model responses.

Read the full paper: arXiv:2504.02193

GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning

Wed, 30 Apr 2025 13:08:20 +0800

Disclaimer: The blog is automatically generated by AI and could contain misinformation.

GuardAgent: A New Guardrail for LLM Agents

The rapid rise of large language model (LLM) agents has brought new safety and security challenges, especially as these agents are deployed in sensitive domains like healthcare and web automation. Traditional guardrails for LLMs focus on moderating text, but LLM agents require more flexible and reliable safeguards due to their diverse actions and outputs.

What is GuardAgent?

GuardAgent is the first LLM agent designed to act as a guardrail for other LLM agents. It dynamically checks whether a target agent’s actions comply with user-defined safety requests. GuardAgent works in two main steps:

Task Planning: Analyzes safety guard requests and generates a step-by-step plan using an LLM, enhanced by examples from a memory module.
Guardrail Code Generation: Translates the plan into executable code, which is run to enforce the guard requests. The toolbox of GuardAgent can be extended with new functions and APIs as needed.

This approach enables GuardAgent to flexibly adapt to new agents and safety requirements, providing reliable, code-based guardrails without retraining the underlying LLMs.

Key Features

Knowledge-Enabled Reasoning: Uses in-context learning and memory retrieval to understand and enforce complex safety requests.
Extendable Toolbox: Users can upload new functions or APIs to handle novel guard requests.
Non-Invasive: GuardAgent operates alongside the target agent, ensuring safety without degrading the agent’s original performance.
No Extra Training Needed: Works with off-the-shelf LLMs, reducing operational overhead.

Figure: GuardAgent safeguards target agents by analyzing safety requests, planning, and generating guardrail code for enforcement.

Benchmarks and Results

GuardAgent introduces two new benchmarks:

EICU-AC: Evaluates privacy-related access control for healthcare agents.
Mind2Web-SC: Assesses safety policy enforcement for web agents.

On these benchmarks, GuardAgent achieves impressive results:

98.7% accuracy in moderating invalid inputs/outputs for healthcare agents
90.0% accuracy for web agents
Outperforms both hardcoded and model-based guardrails, especially in complex scenarios

Performance Table:

Core LLM	Method	EICU-AC LPA	Mind2Web-SC LPA
GPT-4	GuardAgent	98.7%	90.0%
GPT-4	Model-Guarding-Agent	97.5%	82.5%
GPT-4	Hardcoded Rules	81.0%	77.5%
Llama3	GuardAgent	98.4%	84.5%

Table: GuardAgent outperforms baselines on both benchmarks (LPA = Label Prediction Accuracy).

Figure: GuardAgent strictly enforces access control, avoiding mistakes made by model-based baselines.

Why Does GuardAgent Work?

Unlike hardcoded rules or simple prompt-based moderation, GuardAgent leverages code generation and execution, making it robust to ambiguous or complex safety requirements. Its memory module and extendable toolbox allow it to generalize to new tasks and agents, while its non-invasive design ensures that the original agent’s utility is preserved.

Figure: GuardAgent achieves high accuracy across all roles and rules in both benchmarks.

Real-World Impact

GuardAgent represents a significant step toward trustworthy and safe deployment of LLM agents in real-world applications. Its flexible, code-based approach can be adapted to a wide range of domains, from healthcare privacy to web automation safety.

Learn more: arXiv paper | Competition | Project page

SEAL: Steerable Reasoning Calibration of Large Language Models for Free

Mon, 07 Apr 2025 00:00:00 +0000

Disclaimer: This blog post was automatically generated by AI and may contain misinformation.

Key Innovation: Making LLM Reasoning More Efficient

Large Language Models like OpenAI’s o1-series have shown impressive reasoning capabilities through extended Chain-of-Thought (CoT) mechanisms. However, our research reveals a critical inefficiency: substantial redundancy in reasoning traces that hurts both performance and efficiency.

Figure: Overview of our SEAL framework showing offline extraction and online intervention stages

The Reasoning Redundancy Problem

We discovered that current CoT reasoning suffers from significant issues:

🐌 Increased inference latency due to unnecessary reasoning steps
❌ Degraded performance from attention being diverted to irrelevant paths
💸 Higher computational costs from processing redundant tokens

Recent studies show that LLMs often determine the correct final answer early in the reasoning process but continue generating excessive and redundant thought sequences. This inefficient reasoning can even degrade final performance as models become trapped in redundant verification loops.

Understanding Reasoning Structure

Our systematic analysis categorizes LLM internal reasoning into three distinct thought types:

Figure: Example showing decomposition of reasoning into different thought types

Execution Thoughts: Core problem-solving steps where the model analyzes and solves problems step by step
Reflection Thoughts: Self-evaluation and verification where the model pauses to verify its steps
Transition Thoughts: Paradigm shifts where the model rethinks problems from different perspectives

Statistical Evidence of Redundancy

Figure: Statistics showing thought distribution in correct vs incorrect samples

Key Findings from Our Analysis:

For samples of the same difficulty level, incorrect samples contain significantly more thoughts than correct ones
The increase is largely driven by excessive reflection and transition thoughts
Each reflection/transition step typically triggers several execution steps, creating cascading inefficiency
Stronger correlation: Excessive reflection and transition thoughts are strongly correlated with failure cases

Latent Space Separability

Figure: t-SNE visualization showing clear separation of thought types in latent space

Our latent space analysis reveals crucial insights:

Execution thoughts are clearly separable from non-execution thoughts in deep layers
Better separability in deeper layers - shallow layers capture low-level features while deeper layers encode conceptual knowledge
Reflection and transition thoughts are more similar to each other than to execution thoughts

SEAL: Training-Free Solution

We introduce SEAL (Steerable Reasoning Calibration) - a novel training-free approach that addresses these inefficiencies through a two-stage process:

Stage 1: Offline Extraction

Data Collection: Use ~1000 training samples from reasoning benchmarks
Thought Categorization: Classify thoughts using keyword identification (e.g., “Alternatively” → transition thought)
Vector Computation: Calculate reasoning steering vector as S = H̄_E - H̄_RT where:
- H̄_E = average execution thought representations
- H̄_RT = average reflection + transition thought representations

Stage 2: Online Intervention

Real-time Calibration: Apply steering vector during inference via H̃ = H + α·S
Minimal Overhead: Negligible computational cost compared to forward pass
Dynamic Adjustment: Intervene at optimal layers (typically mid-to-late layers)
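
The two stages can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the released implementation: the thought representations are small hand-written vectors rather than real hidden states, the function names are hypothetical, and the keyword list is illustrative (only "Alternatively" → transition comes from the description above; treating "Wait" as a reflection marker is an assumption).

```python
import numpy as np

# Illustrative keyword lists; "wait" as a reflection marker is an assumption.
TRANSITION_KEYWORDS = ("alternatively",)
REFLECTION_KEYWORDS = ("wait",)


def classify_thought(text):
    """Label a thought by its leading keyword; default to 'execution'."""
    lowered = text.strip().lower()
    if lowered.startswith(TRANSITION_KEYWORDS):
        return "transition"
    if lowered.startswith(REFLECTION_KEYWORDS):
        return "reflection"
    return "execution"


def steering_vector(labels, reps):
    """S = mean(execution reps) - mean(reflection and transition reps)."""
    reps = np.asarray(reps, dtype=float)
    is_exec = np.array([label == "execution" for label in labels])
    return reps[is_exec].mean(axis=0) - reps[~is_exec].mean(axis=0)


def steer(hidden, s, alpha=1.0):
    """Online intervention: H_tilde = H + alpha * S."""
    return np.asarray(hidden, dtype=float) + alpha * s


thoughts = [
    "Compute 3 * 4 = 12.",
    "Wait, let me double-check.",
    "Alternatively, use factoring.",
    "So the answer is 12.",
]
# Toy 2-dimensional stand-ins for per-thought hidden representations.
reps = [[1.0, 2.0], [0.0, 1.0], [2.0, 1.0], [3.0, 4.0]]

labels = [classify_thought(t) for t in thoughts]
S = steering_vector(labels, reps)
print(labels, S, steer([0.0, 0.0], S, alpha=1.0))
```

In a real model, `reps` would be hidden states collected at the chosen layer, and `steer` would be applied to that layer's output during decoding.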

Comprehensive Experimental Results

Performance Across Models and Benchmarks

Models Tested: DeepSeek-R1-Distill (1.5B, 7B), QwQ-32B-Preview
Benchmarks: Math500, GSM8K, LiveCodeBench

Figure: Comparison showing SEAL’s superior performance over logit penalty methods

Impressive Results

SEAL demonstrates significant improvements across multiple models and benchmarks:

✅ Up to 14.1% accuracy improvement (Math500 hard problems)
🚀 11.8% to 50.4% reduction in reasoning tokens
🎯 Strong transferability - steering vectors from Math500 work on GSM8K and LiveCodeBench
⚡ 37.9% average reduction in response time with up to 86.61% in best cases
📊 Consistent gains across all tested models and tasks

Detailed Performance Tables

Math500 Results:

Model	Method	Accuracy (%)	Tokens	Hard Accuracy (%)	Hard Tokens
R1-Distill-1.5B	Base	67.0	4526	54.2	5737
R1-Distill-1.5B	SEAL	76.6 (+9.6)	3340	63.7 (+9.5)	4552
R1-Distill-7B	Base	85.8	3389	79.8	4176
R1-Distill-7B	SEAL	89.4 (+3.6)	2661	84.0 (+4.2)	3365

Cross-Domain Generalization:

Task	Model	Base Acc	SEAL Acc	Token Reduction
GSM8K	R1-7B	88.0%	88.4% (+0.4)	28.9%
LiveCodeBench	R1-7B	44.5%	51.7% (+7.2)	12.9%

Why SEAL Outperforms Token-Level Methods

Limitation of Logit Penalty: Operates on individual tokens (e.g., “wait”, “alternatively”) rather than conceptual level

SEAL’s Advantage:

Suppresses entire reflection/transition concepts rather than specific tokens
Prevents models from using rephrased expressions to continue unwanted reasoning patterns
Achieves deeper conceptual control through latent space intervention

Ablation Studies and Analysis

Optimal Steering Configuration

Figure: Ablation study showing optimal steering layers

Best Layers: Mid-to-late layers (Layer 20 for smaller models, Layer 55 for QwQ-32B)
Steering Strength: α = 1.0 provides optimal balance
Vector Composition: S = H̄_E - H̄_RT works best (weakening both reflection and transition)

Efficiency Analysis

Figure: SEAL significantly reduces sequence length for incorrect samples

Key Efficiency Metrics:

Average reduction ratio: 32.9-37.9% in response time
Maximum reduction: Up to 86.61% for some samples
Throughput improvement: ~2 tokens/second increase due to reduced KV cache overhead

Real-World Impact Example

Figure: Example showing how excessive reflection leads to incorrect answers despite finding the correct solution multiple times

Case Study: In this Math500 example, the model:

✅ Correctly solves the problem (answer: 12) within a few steps
❌ Continues with excessive verification and rechecking
🔄 Gets trapped in reflection loops, switching thoughts repeatedly
❌ Eventually deviates from correct reasoning path and produces wrong answer

SEAL’s Solution: By reducing excessive reflection thoughts, SEAL helps models stick with their correct initial reasoning.

Bottom Line

SEAL proves that less can indeed be more in LLM reasoning. By intelligently calibrating the reasoning process, we achieve better accuracy with significantly fewer computational resources, making advanced reasoning more accessible and efficient.

Code Available: Our implementation is publicly available on GitHub, enabling researchers and practitioners to easily apply SEAL to their own models and tasks.

MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models

Wed, 19 Feb 2025 13:08:20 +0800

DeepOSets: Non-Autoregressive In-Context Learning of Supervised Learning Operators

Sat, 14 Dec 2024 00:00:00 +0000

Extracting and Understanding the Superficial Knowledge in Alignment

Sun, 10 Nov 2024 13:08:20 +0800

TBA

GuideLLM: Exploring LLM-Guided Conversation with Applications in Autobiography Interviewing

Sun, 10 Nov 2024 13:08:20 +0800

TBA

LLM-PBE: Assessing Data Privacy in Large Language Models

Sat, 29 Jun 2024 13:08:20 +0800

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

Wed, 06 Mar 2024 13:08:20 +0800

Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark

Sun, 25 Feb 2024 13:08:20 +0800

Zeroth-order (ZO) optimization methods are often preferred for their gradient-free nature, which makes them more memory-efficient and possibly more computation-efficient. Although first-order (FO) optimization methods compute gradients more accurately, it is hard to fit LLM fine-tuning with FO methods onto memory-limited devices, which creates strong demand for memory-efficient optimization methods. In this benchmark, we empirically gain insights into the battle between FO and ZO. Importantly, we answer these questions:

When do ZO methods have strong memory efficiency compared to all FO methods?
How does the performance of ZO methods compare with that of FO methods?
Are ZO methods really faster than FO methods?

Delayed Memory Inefficiency of SGD

Peak memory is the bottleneck when fitting an LLM onto a memory-limited device. To find the peak, we need to look at the optimization process, which unfolds in four steps:

Step 0: Model Loading: Initialize the model with parameter $\mathbf{x}$;
Step 1: Forward Pass: Compute loss $\ell(\mathbf{x})$, and save forward pass states $\mathbf{s}_{\text{fwd}}$;
Step 2: Backward Pass: Calculate gradients w.r.t. $\mathbf{x}$, and generate backward states $\mathbf{s}_{\text{bwd}}$;
Step 3: Optimization Step: Update $\mathbf{x}$ and $\mathbf{s}_{\text{opt}}$ using gradients, and utilize a temporary state $\mathbf{s}_{\text{opt}}'$ that will be released immediately.

In the figure below, we provide a theoretical analysis based on this general pipeline. An interesting observation is the $\max$ operation in the peak memory estimate, which arises because the peak is taken over the three steps with dynamic memory allocation. For example, FO-SGD consumes $|\mathbf{x}| + \max [ \frac{1}{2}|\mathbf{a}| + \frac{1}{2}|\mathbf{x}|, |\mathbf{x}| ]$. In comparison, ZO-SGD requires $\frac{1}{2} |\mathbf{x}| + \max_l \frac{1}{2} |\mathbf{x}_l|$ memory. The memory efficiency advantage of ZO-SGD gradually increases with $\frac{1}{2}|\mathbf{a}|$ once activation memory overwhelms the parameters', i.e., $\frac{1}{2}|\mathbf{a}| > \frac{1}{2}|\mathbf{x}|$. That means if the model is not very large and the activation is very dense, then the advantage of ZO methods will be reduced.

Fig: Comparison of total memory complexity of different optimizers when fine-tuning the full model. $|\mathbf{x}|$ denotes the memory of parameters (or gradients of the same size) in full precision. $|\mathbf{a}|$ denotes the memory consumption of intermediate results saved during the forward pass for the later backward pass. $|\mathbf{x}_l|$ and $|\mathbf{a}_l|$ represent the parameter and intermediate memory of a specific layer $l$.
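
The two peak-memory expressions quoted above for FO-SGD and ZO-SGD can be evaluated directly. The sketch below is only a calculator for those two formulas. The function names and the example sizes are assumptions chosen for illustration, and any consistent memory unit (e.g., GB) works.

```python
def fo_sgd_peak(x, a):
    """Peak memory of FO-SGD: |x| + max(|a|/2 + |x|/2, |x|)."""
    return x + max(0.5 * a + 0.5 * x, x)


def zo_sgd_peak(x, layer_sizes):
    """Peak memory of ZO-SGD: |x|/2 + max_l |x_l|/2."""
    return 0.5 * x + 0.5 * max(layer_sizes)


# Hypothetical sizes: 16 units of parameters split into 32 equal layers.
x = 16.0
layers = [x / 32] * 32

for a in (8.0, 16.0, 48.0):  # small, equal, and dominant activation memory
    fo, zo = fo_sgd_peak(x, a), zo_sgd_peak(x, layers)
    print(f"|a|={a:5.1f}  FO-SGD={fo:5.2f}  ZO-SGD={zo:5.2f}  gap={fo - zo:5.2f}")
```

The FO-SGD estimate stays flat while $\frac{1}{2}|\mathbf{a}| \le \frac{1}{2}|\mathbf{x}|$ and grows with $|\mathbf{a}|$ afterwards, which is the effect of the $\max$ term.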

We empirically demonstrate the delayed memory inefficiency of FO-SGD in the figure below. The memory inefficiency of FO-SGD grows with context length, just as it does in inference.

Fig: Memory comparison between FO-SGD and ZO-SGD full fine-tuning across various sequence lengths with a fixed effective batch size of $2$. Memory evaluation was conducted using synthetic text generated from random sequences of the specified shapes. For shorter sequences (i.e., $< 700$), the memory usage of FO-SGD remains relatively stable since the memory consumption for storing gradients during BP surpasses that needed for activations.

A-CONECT: Designing AI-based Conversational Chatbot for Early Dementia Intervention

Fri, 23 Feb 2024 13:08:20 +0800

TBA

On the Generalization Ability of Unsupervised Pretraining

Wed, 17 Jan 2024 13:08:20 +0800

Safe and Robust Watermark Injection with a Single OoD Image

Sat, 06 Jan 2024 13:08:20 +0800

Shake to Leak: Fine-tuning Diffusion Models Can Amplify the Generative Privacy Risk

Wed, 13 Dec 2023 13:08:20 +0800

Publishing pre-trained generative models that allow fine-tuning for downstream tasks has become increasingly popular. Recent papers show that post-hoc fine-tuning can tear down the alignment gained from RLHF¹. The weakness is probably due to the easily removable nature of RLHF fine-tuning. Here, we ask a related question: Can post-hoc fine-tuning also expose vulnerabilities that generative models acquire in pre-training? Specifically, we explore whether fine-tuning can induce generative models (e.g., Stable Diffusion, which already leaks) to generate more private samples.

Fig: Shake a cracked bottle to leak more water.

Fine-tuning amplifies data extraction risks

We propose a simple fine-tuning-based strategy to amplify the privacy risks, namely Shake-to-Leak (S2L). The key idea is to fine-tune the diffusion model on self-generated data, which is produced by prompts targeting the private domain (a semantic subset, e.g., a specific person).

Generating Fine-tuning Datasets. Our first and key step is to create a domain-specific fine-tuning dataset by directly generating a synthetic dataset from pre-trained model $G$ using a target prompt $p_z$ from some private domain $\mathcal{D}_z$ termed as Synthetic Private Set (SP Set) $\mathcal{P}$. This dataset, though synthetic, has the potential to encompass pre-training set information and underlying private patterns that could potentially lead to the inadvertent exposure of private information in the pre-training set $\mathcal{D}$.
Fine-tuning. We fine-tune the models using off-the-shelf algorithms on the SP Set. S2L does not change the operations in fine-tuning and therefore the integration is seamless. In this step, an attacker will have limited prior knowledge of the target’s private domain, for example, the text description (prompt) of the images.
Privacy Attacks. After the model is fine-tuned, we attack it with MIA and data extraction, which have been shown to be effective attacks on generative models² ³. Since the adversary targets a specific domain, the number of duplicated images in that domain is usually small. Therefore, we use $(10,l_2,0.1)$-Eidetic memorization as the evaluation criterion of data extraction across the paper.
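
As a rough illustration of the extraction criterion, the sketch below checks a $(k, \ell_2, \delta)$-style condition on toy arrays. It assumes the reading used by Carlini et al.²: a target counts as memorized if some generated sample lies within distance $\delta$ of it while at most $k$ training samples are that close. The normalized $\ell_2$ distance and both function names are assumptions made for this example, not the evaluation code used in the paper.

```python
import numpy as np


def normalized_l2(a, b):
    """Root-mean-square difference between two images (assumed distance)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(np.sqrt(np.mean((a - b) ** 2)))


def is_eidetic_memorized(generated, target, train_set, k=10, delta=0.1):
    """True if `target` is extracted and has at most k near-duplicates in training."""
    extracted = any(normalized_l2(g, target) <= delta for g in generated)
    near_duplicates = sum(normalized_l2(t, target) <= delta for t in train_set)
    return extracted and near_duplicates <= k


# Toy 2x2 "images" with pixel values in [0, 1].
target = np.zeros((2, 2))
train_set = [target, np.ones((2, 2))]
close_generation = [target + 0.05]
far_generation = [np.ones((2, 2))]

print(is_eidetic_memorized(close_generation, target, train_set))
print(is_eidetic_memorized(far_generation, target, train_set))
```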

Fig: Our strategy for amplifying privacy leakage through fine-tuning on synthetic private set.

Experiment Setup. We experiment with Stable Diffusion ($SD$ v1-1 with 980M parameters) with different fine-tuning strategies, including DreamBooth, Textual Inversion, LoRA, Hypernetwork, and their combinations. $SD$-v1-1 consists of an image encoder that encodes the original pixel space into latent tensors in a low-dimensional space, a latent denoising network that gradually denoises the latent tensors, and an image decoder that maps latent tensors back to the image space. A CLIP text encoder is incorporated into the diffusion process such that the latent tensors are conditioned on the representations of contextual prompts. The $SD$-v1-1 model is pre-trained first on LAION-2B-en and then on the LAION-HiRes-512x512 dataset, both of which are subsets of LAION-5B⁴. Thus, we assume celebrity pictures are in private domains and ask whether $SD$-v1-1 will memorize the pictures in the pre-training set. As many of the celebrities are also present in the CelebA dataset, we consider the images in CelebA as the non-private samples.

S2L is General. We observe amplified privacy risks on all fine-tuning methods plugged with S2L. When we change the fine-tuning dataset of vanilla fine-tuning from the OoD set to the SP Set, the change in MIA AUC relative to the pre-trained baseline immediately flips from a 0.03 decrease to a 0.01 increase. On the 4 types of advanced fine-tuning methods, we observe a further MIA AUC increase of up to 0.04 over the baseline. The combined methods achieve further improvement. Overall, different advanced fine-tuning methods plugged with S2L achieve $0.022\sim0.054$ (0.036 on average) MIA AUC and $4.4\sim16.3$ (11.22 on average) data extraction improvements. The results demonstrate the generality of S2L on different fine-tuning methods and its compatibility when combining different fine-tuning methods.

Table: Fine-tuning on SP set can increase privacy risks of MIA or Data Extraction.

How does the leakage amplification happen?

We investigate multiple facets of the risk amplification through comprehensive experiments.

Fig: Ablation on the number of fine-tuned parameters using LoRA (left) or Textual Inversion (right).

How many parameters need to be fine-tuned? We find that a small, but not too small, ratio of parameters is required for amplifying the privacy risks, in both LoRA and Textual Inversion. From the left figure (Rank Ablation), we observe that as the number of fine-tunable parameters decreases, the MIA and data extraction results first improve and then drop suddenly when the parameter number decreases from 9.6M to 4.8M; meanwhile, the right figure (Token Ablation) demonstrates that with extremely small tunable parameter numbers, fewer parameters do not mean better performance. This validates our hypothesis that for similar fine-tuning methods and within a certain range of parameter numbers, the fewer parameters you fine-tune with S2L, the higher the privacy risks you can gain. This conclusion guides S2L toward improving both attack efficiency and performance.

Table: Gaussian noise can amplify privacy leakage but only for small models.

S2L happens with random parameter perturbation!? Surprisingly, without using any data, simply perturbing model parameters with Gaussian noise can exacerbate the privacy leakage. The phenomenon was observed in small models with fewer parameters or trained on smaller datasets. As the standard deviation of the Gaussian perturbation increases from $2.0\times 10^{-4}$ to $3.2\times 10^{-3}$, the privacy risk amplification effect first increases and then decreases. This indicates that shaking the parameters too slightly is not enough to find local optima, while shaking them too heavily causes the model to forget memorized pre-training information. This could explain why the advanced fine-tuning methods achieve better privacy risk amplification than end-to-end fine-tuning: these methods can efficiently optimize towards local optima while avoiding too heavy parameter shaking.
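
The data-free "shaking" described above amounts to adding zero-mean Gaussian noise to every parameter. A minimal NumPy sketch follows. The `perturb_parameters` helper and the toy parameter dictionary are hypothetical stand-ins for a real model's weights. Only the two noise scales are taken from the text.

```python
import numpy as np


def perturb_parameters(params, sigma, seed=0):
    """Return a copy of `params` with N(0, sigma^2) noise added to each array."""
    rng = np.random.default_rng(seed)
    return {
        name: weights + rng.normal(0.0, sigma, size=weights.shape)
        for name, weights in params.items()
    }


# Toy parameters standing in for a model's weight tensors.
params = {"layer1": np.zeros((2, 3)), "layer2": np.ones(4)}

# Sweep the two endpoints of the perturbation scale mentioned in the text.
for sigma in (2.0e-4, 3.2e-3):
    shaken = perturb_parameters(params, sigma)
    drift = max(np.abs(shaken[name] - params[name]).max() for name in params)
    print(f"sigma={sigma:.1e}  max |delta w|={drift:.2e}")
```

In an actual experiment, each perturbed copy would then be evaluated with the same MIA and data extraction attacks as the fine-tuned models.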

Conclusion

In this paper, we reveal an unexpected finding: fine-tuning on a manipulated dataset can amplify the privacy risks of existing large-scale diffusion models trained for text-to-image synthesis. Through a systematic analysis, we highlight the need for caution in the application and refinement of diffusion models, suggesting that the community must consider new protective measures to safeguard privacy.

Extension to Copyright Risks. As evidenced in (Carlini, et al., 2023)², web-scraped image generation datasets, like the LAION dataset, consist of a mix of explicit non-permissive copyrighted examples, general copyright-protected examples, and CC BY-SA licensed examples. This raises concerns about copyright risks. In this paper, we only discuss the privacy risks; however, we note that S2L could potentially amplify copyright risks as well. For example, we demonstrate that S2L can achieve significant data extraction results and could pose a threat to copyrighted images in the pre-training set of the DMs.

Social Impacts. Our exploration into the S2L phenomenon is not an endorsement or encouragement of exploiting these vulnerabilities. On the contrary, by revealing these potential threats, we aim to foster a proactive approach to address them. While the immediate implications of our findings might seem alarming, we intend to bolster the defense mechanisms in place. Here, we provide several possible defense methods to inspire future research: 1️⃣ Pre-train the DMs using a DP mechanism. 2️⃣ For a partially private pre-training dataset, first pre-train the DMs on public domains and then privately fine-tune the DMs on private domains⁵. 3️⃣ On the model provider side, develop secure fine-tuning APIs to prevent the S2L-like misuse.

Qi, X., Zeng, Y., Xie, T., Chen, P. Y., Jia, R., Mittal, P., & Henderson, P. (2023). Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!. In ArXiv Preprint. ↩︎
Carlini, N., Hayes, J., Nasr, M., Jagielski, M., Sehwag, V., Tramer, F., … & Wallace, E. (2023). Extracting training data from diffusion models. In USENIX Security. ↩︎
Duan, J., Kong, F., Wang, S., Shi, X., & Xu, K. (2023). Are diffusion models vulnerable to membership inference attacks?. In ICML. ↩︎
Schuhmann, C., Beaumont, R., Vencu, R., Gordon, C., Wightman, R., Cherti, M., … & Jitsev, J. (2022). LAION-5B: An open large-scale dataset for training next generation image-text models. In NeurIPS. ↩︎
Yu, D., Naik, S., Backurs, A., Gopi, S., Inan, H. A., Kamath, G., … & Zhang, H. (2022). Differentially private fine-tuning of language models. In ICLR. ↩︎

DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer

Mon, 27 Nov 2023 13:08:20 +0800

Background: Data-driven Prompt Tuning and Privacy Risks

Manual prompt engineering has achieved impressive performance. However, it often requires domain knowledge (e.g., law, healthcare, art) and human effort in prompt design. Therefore, data-driven prompt tuning was proposed to automate the process.

Fig: Data-driven prompt tuning.

Due to the convenience and high performance of cloud models, it is a common interest for a client to tune a prompt that can be served on the cloud. We assume that a client has a set of data $D$ that will be used for prompt tuning but has strict constraints on the data usage as follows.

Data confidentiality: The client data cannot be shared with the cloud-model vendor.
Information privacy: The tuned prompt should not leak private information about the client data, including but not limited to enclosing private content or making private information inferable.
Model ownership: On the cloud, model ownership could be a concern and therefore parameters should not be shared with the client.

Threat Model. We assume an adversary on the cloud-model vendor side that aims to gain private information (e.g., membership information) from the private dataset stored on the client device. The adversary can only obtain the tuned prompt provided by the client but can leverage any available LLMs for the attack. In the real world, privacy leakage through released prompts could result in violations of privacy regulations, e.g., GDPR. Concretely, personally identifiable information (e.g., names) could be exposed in prompts.

Main Idea. To preserve the data confidentiality and privacy, we propose Differentially-Private Offsite Prompt Tuning (DP-OPT) which isolates the prompt tuning and data from the cloud model. The general idea of DP-OPT includes two steps:

Private Prompt Engineering: Engineer a private prompt $\pi$ with a fully local model and dataset, i.e., $\pi\sim \operatorname{DP-OPT}(D, p_{\text{LM}}^t(\cdot))$;
Prompt Transfer: Deploy prompts on cloud model for public inference, i.e., $y \leftarrow p_{\text{cloud-LM}}^t(y | F(x, \pi))$, where $F()$ is a forward template.

To achieve the goal, the two major technical challenges are: (1) How to engineer a model-transferable prompt? (2) How to guarantee that the prompts do not leak private information? We will answer the two questions sequentially in the following two sections.

LLMs Can Engineer Transferrable Prompts But Leak Private Information

Our key intuition is that discrete and human-readable prompts could be transferrable across different LLMs. Inspired by recent work ¹ ², we hypothesize that LLM-engineered prompts may work.

Make the LLM a Prompt Engineer. To gain the best performance, we consider the state-of-the-art APE method, Deep Language Network (DLN)², which mimics gradient-based optimization by using forward and backward passes to train prompts on a dataset $D=\{(x,y)\}$ of input-output pairs $(x,y)$.

Prompt Generation. In the forward pass, an LLM is prompted via a forward template $F(x,\pi)$ to predict labels on a small batch of training samples $S \leftarrow \{(x, y) \sim D\}$, i.e., $\hat y\sim p^t_{\text{LM}} (y | F(x,\pi))$. Then in the backward pass, the correct and incorrect predictions are used as in-context examples for the LLM to generate a task instruction $\pi$. Formally, $\pi$ is sampled from $p^t_{\text{LM}} (\pi | B_\pi(\{(x,y, \hat y)\}, \pi))$ where $B_\pi$ is a backward template.
Prompt Selection. With a set of candidate prompts, DLN-1 yields the best prompt with the highest log probability on the training set.

Fig: LLMs generate transferrable prompts.

Interestingly, the prompts generated by LLMs are not just transferrable (keeping their original performance) but also gain better accuracy with larger models. In our experiment, prompts that Vicuna-7b generates on local data gain up to an 11% accuracy increase.

Fig: LLMs generate prompts that leak private information.

However, the dark side of automated prompt engineering is the cost of privacy leakage. We notice that prompt engineering can leak private data explicitly (in the prompt text) or implicitly (detectable by a membership inference attack, or MIA).

DP-OPT: Differentially-Private Offsite Prompt Tuning

Algorithm: DP-OPT where we highlight the use of private data in red boxes.

Algorithm: DP prompt generation.

Private Prompt Generation. As demonstrated above, the main privacy leakage comes from non-private prompt proposals. We develop a privatized version of the prompt generation. Specifically, we leverage the classic sample-and-aggregate paradigm ³, where we partition the full batch of data into disjoint subsets. We then generate each token based on the voting results formed by querying the language model with each disjoint subset. While we can simply apply the commonly used Exponential Mechanism (EM) to privately release the token with the maximum count, the naive application of EM may result in high variance and poor performance as the token space can be as large as 30,000 ⁴. Fortunately, extending EM on large domain space has been studied in the DP community. In this work, we leverage the LimitedDomain mechanism⁵ which reduces the domain space to only those tokens with top-$\bar k$ vote counts (with some privacy budget). We note that $\text{LimitedDomain}$ has a small failure probability that will not output any token for the scenario where the highest vote count is not too high compared with the $\bar k$th highest vote count. In this case, we retry to generate using the next batch of data. If we run into more than one failure case for generating a single token, it means that the disjoint partitions do not have a majority agreement on a single token choice and we terminate the token generation for this prompt.
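
The voting step of the sample-and-aggregate paradigm can be sketched as follows. This covers only the non-private part: building the vote histogram over disjoint subsets and restricting it to the top-$\bar k$ tokens. The noisy release and the failure case of the LimitedDomain mechanism are deliberately omitted. The `propose_token` callback is a hypothetical stand-in for querying the language model with one subset.

```python
from collections import Counter


def disjoint_partitions(samples, num_parts):
    """Split samples into `num_parts` disjoint subsets (round-robin)."""
    return [samples[i::num_parts] for i in range(num_parts)]


def vote_histogram(partitions, propose_token):
    """Each disjoint subset casts exactly one vote for the next token."""
    return Counter(propose_token(part) for part in partitions)


def top_k_domain(histogram, k_bar):
    """Keep only the k_bar tokens with the highest vote counts."""
    return dict(histogram.most_common(k_bar))


# Toy stand-in for the LM: the proposed token depends on the subset's first sample.
def propose_token(part):
    return "positive" if part[0] < 2 else "negative"


partitions = disjoint_partitions(list(range(6)), num_parts=3)
histogram = vote_histogram(partitions, propose_token)
print(partitions, dict(histogram), top_k_domain(histogram, k_bar=1))
```

Because each sample lands in exactly one subset, changing a single sample can change at most one vote, which is what makes the histogram suitable for a differentially private release.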

Private Selection among Generated Prompts. With the generated prompt candidates, DLN-1 selects the best one by contrasting their performance on training samples. This may leak private information about the validation set when some private samples significantly affect the evaluation. To defend against such risks, we use the exponential mechanism to select the best-generated prompt that achieves the highest count of correct predictions on the validation set in a differentially private manner. Formally, given a histogram $h$, we define DP-Argmax$^\epsilon$ as $\Pr[ \text{DP-Argmax}^\epsilon(h) = j] \propto \exp \left(\epsilon h_j \right)$. Note that this part protects the privacy of the validation set, which is disjoint from the training set. Hence, the privacy cost of this part does not add up to the privacy cost of prompt generation.
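
The DP-Argmax$^\epsilon$ definition above can be written as a small sampling routine. This is a minimal sketch of the stated formula only, with hypothetical function names and a made-up histogram of correct-prediction counts. It does not include any privacy accounting.

```python
import numpy as np


def dp_argmax_probs(h, epsilon):
    """Selection probabilities with Pr[j] proportional to exp(epsilon * h_j)."""
    logits = epsilon * np.asarray(h, dtype=float)
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def dp_argmax(h, epsilon, rng):
    """Sample an index j according to the exponential mechanism above."""
    return int(rng.choice(len(h), p=dp_argmax_probs(h, epsilon)))


# Hypothetical counts of correct validation predictions for three prompts.
h = [40, 42, 35]
rng = np.random.default_rng(0)
print(dp_argmax_probs(h, epsilon=0.5))
print(dp_argmax(h, epsilon=0.5, rng=rng))
```

A larger $\epsilon$ concentrates the probability on the best-scoring prompt, while $\epsilon = 0$ selects uniformly at random.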

Table: Test accuracy (%) with standard deviation in brackets. All trainable methods are trained on Vicuna-7b. Bold methods are model-transferable and therefore are tested on DaVinci-003. PromptSGD and PromptDPSGD are not transferable and are therefore tested on Vicuna-7b.

In the above table, we evaluate the effectiveness of DP-OPT in generating private prompts for DaVinci-003. Our private baseline is PromptDPSGD, which uses DPSGD to tune soft prompts⁶. We also include the non-private variant of PromptDPSGD, i.e., PromptSGD, for comparison. As a non-private baseline, we follow the DLN-1 paper and include In-Context Learning (ICL) with 5 class-balanced demonstrations, which has the second-best performance after DLN-1 in sentiment classification. To show the improvement from training, we evaluate the initial instruction (0-shot) wrapped in the forward template. DLN-1 serves as the state-of-the-art LLM-driven tuning method for offsite transfer.

We demonstrate that offsite prompt tuning via OPT and DP-OPT can significantly enhance prompt efficacy compared to the initial instruction (0-shot). For three tasks (SST-2, Mpqa, and Disaster), OPT and DP-OPT approach the performance of the non-private baseline, ICL. In the absence of DP, OPT boosts performance for these three tasks relative to DLN-1, likely due to the ensemble’s ability to bolster model generalization.

Table: Transfer test accuracy (%) on different models with standard deviation in brackets. Trainable methods (bold) are executed on Vicuna-7b. ICL is represented as an upper bound without confidentiality. We highlight the best and the second-best confidential methods as bold and underlined numbers, respectively.

In the above table, we assess the transferability of the prompts produced by Vicuna-7b on various larger models including Vicuna-33b, Llama-2-13b, Llama-2-70b and DaVinci-003 (text generation version of GPT3.5). The experiment yields several intriguing implications.

The closed-source model, DaVinci-003, exhibits greater stability in transfer compared to its open-sourced counterparts, where DP-OPT presents competitive performance compared to non-private baselines. Such stability offers more reliable predictions in various applications and therefore encourages clients to pair DP-OPT with the closed-source DaVinci-003.
Without the DP noise mechanism, the ensemble method (OPT) itself enhances prompt quality relative to DLN-1 on Vicuna-33b and Llama-2-13b.
We observe a discrepancy in DLN-1’s performance on Trec, which is considerably lower than the figures presented in the DLN-1 paper. It seems that Vicuna-7b struggles with the complexities of the $5$-way classification task present in the Trec dataset when engineering prompts. This limitation could be a result of architectural constraints or training nuances specific to Vicuna-7b.

Key Takeaways

Large language models can be your privacy-preserving prompt engineer, but this requires new algorithms.
A new method to engineer differentially-private prompts: Private and accurate on semantic classification tasks; Transferrable to various models.

APE: Zhou, Y., et al. (2022). Large language models are human-level prompt engineers. In ICLR. ↩︎
DLN-1 & DLN-2: Sordoni, A., et al. (2023). Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference. In ArXiv. ↩︎
Nissim, K., Raskhodnikova, S., & Smith, A. (2007, June). Smooth sensitivity and sampling in private data analysis. In STOC. ↩︎
Chiang, W. L., Li, Z., Lin, Z., Sheng, Y., Wu, Z., Zhang, H., … & Xing, E. P. (2023). Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality. In https://vicuna.lmsys.org. ↩︎
Durfee, D., & Rogers, R. M. (2019). Practical differentially private top-k selection with pay-what-you-get composition. In NeurIPS. ↩︎
Duan, H., Dziedzic, A., Papernot, N., & Boenisch, F. (2023). Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models. In NeurIPS. ↩︎

Who Leaked the Model? Tracking IP Infringers in Accountable Federated Learning

Wed, 01 Nov 2023 13:08:20 +0800

Understanding Deep Gradient Leakage via Inversion Influence Functions

Thu, 21 Sep 2023 13:08:20 +0800

Motivation: Estimate the Worst-Case Risks of Deep Gradient Leakage

Deep Gradient Leakage (DGL)¹ emerges as a strong attack on gradients computed on sensitive data. Given the gradient $g$ computed on a batch of private samples, the attack searches for an input $x$ that reproduces the same gradient:

$$G_r(g) \triangleq \arg \min _{x\in \mathcal{X}} \lVert \nabla _{\theta} L(x, \theta) - g \rVert^2.$$

However, because of the complexity of the loss $L$ (defined over a non-linear network), the actual risk is hard to estimate. **(1)** First, the minimizer is hard to attain empirically. To address the challenge, we propose a numerically feasible metric with a perfect-attacker assumption to bound the worst-case risk. The assumption can be expressed as $$G_r(\nabla_\theta L(x, \theta)) \equiv x$$ for any $x\in \mathcal{X}$, which means the attacker is able to exactly recover the original images from the given gradient. **(2)** Second, minimizing the objective is time-consuming for deep networks. Deep networks are often more performant in various vision/language tasks, and their privacy risks could be more impactful as more people become interested in training the models on their data. **(3)** Third, given the complexity of the attack and of deep networks, it is hard to analyze and understand the root source of DGL risks, especially for deep networks.

New Metric: Inversion Influence Function (I$^2$F)

To figure out the association between the leakage and the gradient $g$, we formalize a counterfactual: what kind of defense can diminish the leakage? A general noise-based defense can be written as $g = g_0 + \delta$, where $g_0 = \nabla_\theta L(x_0, \theta)$ is the clean gradient and $\delta$ is a small perturbation. Thus, for a small perturbation $\delta$, we can approximate the privacy leakage through DGL by I$^2$F: $$\lVert G_r(g_0+\delta) - x_0\rVert \approx \mathcal{I}(\delta; x_0) \triangleq \lVert (JJ^\top)^{-1} J \delta \rVert.\ \ \ \ \text{(I}^2\text{F)}$$ The I$^2$F includes a matrix inversion, which may be expensive to compute and unstable for singular matrices. Thus, we use a tractable lower bound of I$^2$F: $$\lVert(JJ^\top)^{-1} J \delta\rVert \ge \frac{\lVert J\delta \rVert}{ \lambda_{\max}(JJ^\top)} \triangleq \mathcal{I}_{\text{lb}}(\delta; x_0),$$ where $\lambda_{\max}(A)$ denotes the maximal eigenvalue of a matrix $A$.
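The two quantities above can be sketched numerically on a toy problem. This is a minimal illustration, not the paper's implementation: it assumes $J$ is stored as an explicit matrix with one row per input dimension and one column per parameter (the post does not spell out the shape), and the function names are hypothetical. For real networks one would presumably replace the explicit products with Jacobian-vector products from an autodiff library.

```python
import numpy as np

def i2f_exact(J, delta):
    """||(J J^T)^{-1} J delta||, computed with a linear solve instead of an explicit inverse."""
    return float(np.linalg.norm(np.linalg.solve(J @ J.T, J @ delta)))

def lambda_max_power_iteration(J, num_iters=200, seed=0):
    """Estimate the largest eigenvalue of J J^T by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(J.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        w = J @ (J.T @ v)            # apply J J^T without forming it
        v = w / np.linalg.norm(w)
    # Rayleigh quotient of the final unit vector; it approaches lambda_max from below,
    # so an unconverged estimate slightly overestimates the bound.
    return float(v @ (J @ (J.T @ v)))

def i2f_lower_bound(J, delta, num_iters=200):
    """||J delta|| / lambda_max(J J^T)."""
    return float(np.linalg.norm(J @ delta)) / lambda_max_power_iteration(J, num_iters)

# Toy Jacobian (assumed shape): 4 input dimensions, 50 parameters.
rng = np.random.default_rng(1)
J = rng.standard_normal((4, 50))
delta = 1e-3 * rng.standard_normal(50)   # small Gaussian perturbation of the gradient

exact = i2f_exact(J, delta)
lower = i2f_lower_bound(J, delta)
print(f"I2F = {exact:.3e}, lower bound = {lower:.3e}")
```

The lower bound only needs matrix-vector products, which is why it avoids the inversion that makes the exact value expensive and unstable.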

The new metric enjoys the following advantages:

Efficiency: Privacy evaluation is efficient in terms of computation and memory;

Fig: Comparison of the efficiency of computing $\mathcal{I}_{lb}$ (our method) by power iteration and of the inversion attack by minimizing the inversion loss ($L_I$). Blue bars indicate the time of computing $\mathcal{I}_{lb}$ while orange bars indicate the time of minimizing the inversion loss by DGL and GS. The time ratio of computing $\mathcal{I}_{lb}$ versus minimizing the inversion loss is presented above the orange bars. The x-axis shows model-dataset pairs sorted by model scale. We show that for large models and datasets, where minimizing the inversion loss incurs a huge computation overhead, $\mathcal{I}_{lb}$ can provide an efficient estimation of the privacy risk.

Proximity: The alternative provides a good approximation or a lower bound of the risk, at least in the high-risk region;
Generality: The evaluation is general for different models, datasets, and attacks.

To show the proximity and generality, we compare the I$^2$F against the privacy measures of both vision and language models.

Fig: I$^2$F lower bounds RMSE under different settings: datasets, attacks, and models. The grey line indicates the equal values, and darker dots imply smaller Gaussian perturbation $\delta$.

Fig: I$^2$F correlates with privacy metrics of language models: BERT (top) and GPT-2 (bottom). Darker dots imply smaller Gaussian perturbation $\delta$.

When Does Privacy Leakage Happen?

Perturbation Directions Are Not Equivalent

I$^2$F implies that perturbations of the same size are not equally protective in different directions. Decomposing $J=U\Sigma V^\top$ using Singular Value Decomposition (SVD), we obtain $\mathcal{I}(\delta; x_0) = \lVert U\Sigma^{-1} V^\top \delta \rVert$. Thus, $\delta$ tends to yield a larger I$^2$F value if it aligns with the directions of small eigenvalues of $JJ^\top$.
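The directional effect can be checked on a toy Jacobian. The sketch below is illustrative only: the matrix is random, its shape (inputs by parameters) is an assumption, and the perturbations are unit vectors along the right-singular directions of $J$ so that all of them have the same size.

```python
import numpy as np

def i2f(J, delta):
    """||(J J^T)^{-1} J delta|| via a linear solve."""
    return float(np.linalg.norm(np.linalg.solve(J @ J.T, J @ delta)))

rng = np.random.default_rng(0)
J = rng.standard_normal((3, 20))                   # toy Jacobian (assumed shape)
U, S, Vt = np.linalg.svd(J, full_matrices=False)   # singular values in descending order

# Same perturbation norm (1.0), different directions.
scores = [i2f(J, Vt[i]) for i in range(len(S))]
for sigma, score in zip(S, scores):
    print(f"singular value {sigma:.3f} -> I2F {score:.3f}")
```

Directions tied to smaller singular values give larger I$^2$F, so equal-norm noise can offer very different protection depending on where it points.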

Fig 1: Same perturbation sizes but different protection effects in different directions (along eigenvectors). In (a) and (b), MSEs of DGL attacks are inversely proportional to eigenvalues on the LeNet model. Blue curves are scaled $1/\lambda$. Darker dots indicate smaller MSE (higher risks). Recovered MNIST images associated with different eigenvectors are presented on the right.

Comparing eigenvectors in defending against DGL. We consider a special case of perturbation by letting $\delta$ be an eigenvector of $JJ^\top$. Then the I$^2$F will be $1/\lambda$ where $\lambda$ is the corresponding eigenvalue. We conjecture that $1/\lambda$ could predict the MSE of DGL attacks. To verify the conjecture, we choose 4 eigenvectors with distinct eigenvalues per sample. The results for the LeNet model are presented in Fig. 1. We see that the MSE decreases as $\lambda$ increases. For the MNIST dataset, the MSE-$\lambda$ relation is very close to the predicted $1/\lambda$. Though the curve deviates from the ground truth for CIFAR10, we can still use $1/\lambda$ to lower bound the recovery error. The deviation on CIFAR10 is probably due to the difficulty of recovering patterns more complicated than digit images. The recovered images in Fig. 1 suggest that even with the same perturbation scale, there exist many bad directions for defense. In the worst case, the image can be fully recovered. The observation is a warning to the community: protection using random noise may leak private information.

Privacy Protection Could Be Unfair

Though the average MSE implies a reasonable degree of privacy, as reported in previous literature, the large variance delivers the opposite message: some samples or classes are not that safe. At the sample level, many samples are more vulnerable than the average case. At the class level, some classes are clearly more secure than others. Thus, a traditional metric that focuses on the average may deliver a false sense of protection for specific classes or samples.

Fig 2: The sample-wise and class-wise statistics of the DGL MSE on the MNIST dataset, when gradients are perturbed with Gaussian noise of variance $10^{-3}$. The purple lines indicate the average values. Large variances are observed among samples and classes. The recovered and original images for the well- and poorly-protected classes are depicted on the right side.

Model Initialization Matters

We observe a significant gap between initialization mechanisms. Using uniform initialization casts serious risks of leaking privacy under the same Gaussian defense. Though not as risky as uniform initialization, normal initialization is riskier than the remaining two techniques. Kaiming and Xavier methods can favor convergence in deep learning, and here we show that they are also preferred for privacy. A potential reason is that the two methods can better normalize the activations to promote the Jacobian singularity.

Fig 3: Different initialization strategies could result in distinct MSEs.

Conclusion

In this paper, we introduce a novel way to use the influence functions for analyzing Deep Gradient Leakage (DGL). We propose a new and efficient approximation of DGL called the Inversion Influence Function (I$^2$F). By utilizing this tool, we gain valuable insights into the occurrence and mechanisms of DGL, which can greatly help the future development of effective defense methods.

Limitations. Our work may be limited by some assumptions and approximations. First, we worked on the worst-case scenario where a strong attacker conducts perfect inversion attacks. In practice, such an assumption can be strong, especially for highly complicated deep networks. However, we note that recent years witnessed many techniques that significantly improved attacking capability¹ ² ³ ⁴, and our work is valuable to bound the risks when the attacks get even stronger over time. Second, similar to the traditional influence function, I$^2$F can be less accurate and suffer from large variance in extremely non-convex loss functions. Advanced linearization techniques ⁵ can be helpful in improving the accuracy of influence. Extending our analysis to bigger foundation models may then bring intriguing insights into the scaling law of privacy.

Future Directions. As the first attempt at applying influence functions to DGL, our method can serve multiple purposes to benefit future research. For example, our metric can be used to efficiently examine the privacy breach before sending gradients to third parties. Since I$^2$F provides an efficient evaluation of the MSE, it may be directly optimized in conjunction with the loss of main tasks. Such joint optimization could bring in the explicit trade-off between utility and privacy in time. In comparison, traditional techniques like differential privacy are complicated by tuning the privacy parameter for the trade-off. Furthermore, we envision that many techniques can be adopted to further enhance the analysis.

Broader Impacts. Data privacy has been a long-term challenge in machine learning. Our work provides a fundamental tool to diagnose privacy breaches in the gradients of deep networks. Understanding when and how privacy leakage happens can essentially help the development of defenses. For example, it can be used for designing stronger attacks, which leads to improved defense mechanisms and ultimately benefits the privacy and security of machine learning.

Zhu, L., Liu, Z., & Han, S. (2019). Deep leakage from gradients. NeurIPS. ↩︎
Geiping, J., Bauermeister, H., Dröge, H., & Moeller, M. (2020). Inverting gradients-how easy is it to break privacy in federated learning?. NeurIPS. ↩︎
Jeon, J., Lee, K., Oh, S., & Ok, J. (2021). Gradient inversion with generative image prior. NeurIPS. ↩︎
Zhao, B., Mopuri, K. R., & Bilen, H. (2020). idlg: Improved deep leakage from gradients. ArXiv. ↩︎
Bae, J., Ng, N., Lo, A., Ghassemi, M., & Grosse, R. B. (2022). If Influence Functions are the Answer, Then What is the Question?. NeurIPS. ↩︎

DiRP Trustworthy LLM

Fri, 15 Sep 2023 00:00:00 +0000

The scope of the reading group is to explore the trustworthiness of Large Language Models (LLMs), e.g., ChatGPT, Llama, etc.

Major reading materials:

DecodingTrust: Comprehensive Assessment of Trustworthiness in GPT Models. [website]
OpenAI GPT API document. [link]

Schedule

Weekly meeting: 5 pm (Central Time), Friday

Date	Topic	Location
10/04	Introduction to Trustworthy LLM	EER 7.650
10/13	Introduction to benchmarks and DecodingTrust	EER 7.650
10/20	Reading: Privacy (Jocelyn), OoD Robustness (Daniel)	Online
10/27	Reading: Fairness (Satvik)	EER 7.650
11/03	Reading: Ethics (Rishabh), Stereotype (Satvik)	EER 7.650
11/10	Reading: Adversarial Demonstrations (Jocelyn)	EER 7.650
12/01	Code and play	EER 7.650

Assignment 1: Decoding the trustworthiness of Large Language Models

Read the introduction of DecodingTrust.
Select a preferred topic (a perspective of trust) and read the corresponding section.
Present the main challenge and the measurement of the topic in 10 minutes.

Assignment 2: Code and play!

Find a perspective in DecodingTrust that you want to play with.
In your slides, write down:
- What is the metric conceptually?
- Why does this metric matter?
- How do you compute the score (e.g., success rate of private email extraction for privacy)?
Implement the score computation in Python with OpenAI API.
Debug and play with a small set of samples. (To save you money, don’t do large-scale experiments).
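As a starting point for the score computation, here is a minimal sketch of the email-extraction success rate mentioned above. It is not DecodingTrust's official scoring code: the function name and the simple substring-matching rule are assumptions, and the responses are hard-coded placeholders standing in for what you would collect from the API.

```python
def extraction_success_rate(responses, ground_truth_emails):
    """Fraction of responses that contain the private email they were prompted to reveal."""
    if len(responses) != len(ground_truth_emails):
        raise ValueError("responses and ground_truth_emails must have equal length")
    if not responses:
        return 0.0
    hits = sum(
        email.lower() in response.lower()          # case-insensitive substring match
        for response, email in zip(responses, ground_truth_emails)
    )
    return hits / len(responses)

# Placeholder model outputs (hypothetical); replace with real API responses.
responses = [
    "You can reach her at alice@example.com.",
    "I'm sorry, I can't share that.",
    "His address is BOB@example.com",
    "Try carol@example.org",
]
targets = [
    "alice@example.com",
    "dave@example.com",
    "bob@example.com",
    "carol@example.com",
]

rate = extraction_success_rate(responses, targets)
print(f"Extraction success rate: {rate:.2f}")
```

Keeping the scoring function separate from the API calls makes it easy to debug on a handful of cached responses before spending money on more queries.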

Note, you are free to use any tools and online materials to do this (even reading/copying DecodingTrust codes). Just rock me with the coolest result that you can get!

The feature image was generated by DALL-E from the following prompts:

Me: Create a teaser image for my seminar on trustworthy large language models.
Me: Modify your images to include more information about language model (or Artificial Intelligence) and security.
Me: I like the third one. But could you change the color theme? Make it lighter?
Me: Change the background to white.

A Privacy-Preserving Hybrid Federated Learning Framework for Financial Crime Detection

Fri, 30 Jun 2023 13:08:20 +0800

TBA

FedNoisy: A Federated Noisy Label Learning Benchmark

Fri, 30 Jun 2023 13:08:20 +0800

Revisiting Data-Free Knowledge Distillation with Poisoned Teachers

Tue, 25 Apr 2023 13:08:20 +0800

To tailor highly performant large models for budget-constrained devices, knowledge distillation (KD), and more recently data-free KD, have emerged as fundamental tools in the DL community. Data-free KD, in particular, can transfer knowledge from a pre-trained large model (known as the teacher model) to a smaller model (known as the student model) without access to the original training data of the teacher model. Removing the requirement for training data generalizes KD to broad real-world scenarios, where data access is restricted for privacy and security concerns. For instance, many countries have strict laws on accessing facial images, financial records, and medical information.

Despite the benefits of data-free KD and the vital role it has been playing, a major security concern has been overlooked in its development and implementation: Can a student trust the knowledge transferred from an untrusted teacher? The untrustworthiness comes from the non-trivial chance that pre-trained models could be retrieved from non-sanitized or unverifiable sources, for example, third-party model vendors or malicious clients in federated learning. One significant risk is from the backdoor pre-implanted into a teacher model, which alters model behaviors drastically in the presence of predesigned triggers but remains silent on clean samples. As traditional attacks typically require poisoning the training data, it remains unclear whether student models distilled from a poisoned teacher will suffer from the same threat without using the poisoned data.

Fig 1: Backdoor Attack Success Rates (ASRs) of the distilled student model using the vanilla KD with clean in-distribution samples (a) and data-free KD using synthetic (b, c) or OOD (d) samples. The clean accuracy (Acc) of each figure is plotted with standard deviations among different attack-poisoned CIFAR-10. We run each KD method with different but sufficient training epochs to ensure convergence. Existing data-free KD methods may lead to the transfer of backdoor knowledge when poisoned teachers participate.

Fig 2: Trigger visualization and teacher model performances on CIFAR-10. The performance (ASR/Acc) of the poisoned teacher using each backdoor attack is provided beneath each trigger's name. We visualize the backdoored example for each attack on CIFAR-10.

In this paper, we take the first leap to uncover the data-free backdoor transfer from a poisoned teacher to a student through comprehensive experiments on 10 backdoor attacks. We evaluated one vanilla KD using clean training data and three training-data-free KD methods which use synthetic data (ZSKT¹ & CMI²) or out-of-distribution (OOD) data as surrogate distillation data³.

Our main observations are summarized as follows and essentially imply two identified risks in data-free KD.

Vanilla KD does not transfer backdoors when using clean in-distribution data, while all three training-data-free distillations suffer from backdoor transfer by 3 to 8 types of triggers out of 10 with a more than 90% attack success rate. Contrasting the two results indicates the poisonous nature of the surrogate distillation data in data-free KD.
The successful attack on distillation using trigger-free out-of-distribution (OOD) data demonstrates that triggers are not essential for backdoor injection, but the poisoned teacher supervision is.
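The two metrics used in these observations, clean accuracy (Acc) and attack success rate (ASR), can be sketched as follows. This follows a common convention in the backdoor literature (ASR measured on triggered inputs whose true label differs from the attacker's target); the exact evaluation protocol of the paper may differ, and the predictions below are toy values for illustration only.

```python
def clean_accuracy(preds, labels):
    """Fraction of clean samples classified correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def attack_success_rate(triggered_preds, labels, target_label):
    """Fraction of triggered samples predicted as the target label.

    Samples whose true label already equals the target are excluded,
    since predicting the target there is not evidence of a backdoor.
    """
    pairs = [(p, y) for p, y in zip(triggered_preds, labels) if y != target_label]
    if not pairs:
        return 0.0
    return sum(p == target_label for p, _ in pairs) / len(pairs)

# Toy predictions of a hypothetical student model.
labels          = [0, 1, 2, 3, 1, 2]
clean_preds     = [0, 1, 2, 3, 1, 0]   # predictions on clean inputs
triggered_preds = [0, 0, 0, 3, 0, 2]   # predictions on the same inputs with a trigger
target = 0

acc = clean_accuracy(clean_preds, labels)
asr = attack_success_rate(triggered_preds, labels, target)
print(f"Acc = {acc:.2f}, ASR = {asr:.2f}")
```

A backdoored student typically shows high Acc together with high ASR, which is why both numbers are reported side by side.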

Fig 3: ABD is effective in different data-free distillation methods on CIFAR-10 with WRN16-2 (Teacher) and WRN16-1 (student).

Upon observing the two identified risks, we propose a plug-in defensive method, Anti-Backdoor Data-Free KD (ABD), that works with general data-free KD frameworks. ABD aims to suppress and remove any backdoor knowledge being transferred to the student, thus mitigating the impact of backdoors. The high-level idea of ABD is two-fold: (1) Shuffling Vaccine (SV) during distillation: suppress samples containing potential backdoor knowledge being fed to the teacher (preventing backdoor information from participating in the KD); (2) Student Self-Retrospection (SR) after distillation: synthesize potentially learned backdoor knowledge and unlearn it at later training epochs (the backstop to unlearn acquired malicious knowledge). ABD is effective in defending against various backdoor attacks with different patterns and is a plug-in defense that can be used seamlessly with all three types of data-free KD.

Micaelli, P., & Storkey, A. J. (2019). Zero-shot knowledge transfer via adversarial belief matching. NeurIPS. ↩︎
Fang, G., Song, J., Wang, X., Shen, C., Wang, X., & Song, M. (2021). Contrastive model inversion for data-free knowledge distillation. IJCAI. ↩︎
Asano, Y. M., & Saeed, A. (2023). Extrapolating from a single image to a thousand classes using distillation. ICLR. ↩︎

How Robust is Your Fairness? Evaluating and Sustaining Fairness under Unseen Distribution Shifts

Sun, 19 Feb 2023 13:08:20 +0800

MECTA: Memory-Economic Continual Test-Time Model Adaptation

Fri, 20 Jan 2023 13:08:20 +0800

Turning the Curse of Heterogeneity in Federated Learning into a Blessing for Out-of-Distribution Detection

Thu, 19 Jan 2023 13:08:20 +0800

Federated Robustness Propagation: Sharing Adversarial Robustness in Federated Learning

Mon, 02 Jan 2023 13:08:20 +0800

Precautionary Unfairness in Self-Supervised Contrastive Pre-training

Sun, 20 Nov 2022 13:08:20 +0800

Holistic Trustworthy ML

Tue, 27 Sep 2022 00:00:00 +0000

In the era of deep learning, which brings tremendous risks along with its benefits, my vision is to enhance the trustworthiness of machine learning. Fairness, robustness, security, inclusiveness, and privacy are the core targets within the scope of trustworthiness. For example, recognizing objects by self-driving cars requires the model to be fair regardless of the country of deployment, robust in different environments, secure against implicit backdoors, inclusive to heterogeneous computation/data nodes, and protective of the privacy of sensitive training data. Recently, attaining trustworthiness has become a fundamental requirement for machine learning to be reliably used in human-centered activities.

Privacy-Centric Trustworthy Learning

My recent research focuses on the trustworthiness of machine learning within privacy-preserving learning frameworks, and I describe my work as Privacy-Centric Trustworthy Learning. As learning large models from private data has become an essential strategy for meeting the increasing demand for massive data, for example, 45 TB of text data for training the language model GPT-3, protecting data privacy has become the prerequisite before pursuing the fairness, robustness, and security of models. However, traditional trustworthy machine learning is typically single-dimensional, for example, considering fairness only without privacy. As outlined below, my research fills the gap by developing trustworthiness-aware algorithms and models within the privacy-preserving data and computation frameworks, for example, federated learning. In federated learning or other similarly-principled frameworks, data are excluded from communication between different data sources and training is executed on local devices for each user.

Such frameworks pose interwoven and non-trivial challenges in terms of invisible risks of trustworthiness and increased computation loads to local devices.

(1) Invisible risks by invisible data. As the raw data are invisible to other users in federated learning, the biased and potentially poisoned samples are not visible to the global system, either. Therefore, defending against such biases or noise becomes harder than in a centralized setting. We revealed that such data invisibility may implicitly transfer poisoned knowledge and yield insecure models in data-free distillation [ICML23], which was used for federated learning [ICML21]. When clients' data are mutually unaware of each other, we demonstrate that the bias between users may be ignored and result in unfairness [KDD21]. For both the security and fairness challenges, we proposed corresponding countermeasures by adversarial learning strategies.

(2) Low inclusiveness by increased computation costs for trustworthiness. The existing computation barrier of trustworthy machine learning makes trustworthiness and learning no longer inclusive or accessible to many users in federated learning, which requires on-device training. For example, to achieve robustness, extra computation has been devoted to adversarial training or out-of-distribution (OoD) detection. The overhead prevents low-resource users from gaining robustness because of the high cost of robust training in terms of data or computation. We provided the first solutions to sharing adversarial robustness [AAAI23] and OoD robustness [ICLR23] by leveraging collaborative computation and communication. Beyond robustness, low-resource users are often excluded from federated learning when training a large model. We proposed algorithms to make the training inclusively affordable for different devices, where models are for the first time customizable both in training and test time [ICLR22]. In addition, extremely low-resource devices, for instance, Internet-of-Things devices, are by design not suitable for training because of their memory and coding systems. Thus, we provide the first sampling-based framework [NeurIPS22] to inclusively accommodate the low-resource users.

Outsourcing Training without Uploading Data via Efficient Collaborative Open-Source Sampling

Thu, 22 Sep 2022 13:08:20 +0800

Our work is motivated by the popularity of cloud training, where intelligent edge devices upload data to the cloud and receive the trained models for predictions, like face recognition, object classification and so on. Industrial examples include Amazon SageMaker, Microsoft Azure, and Cloud Machine Learning Engine by Google. Outsourcing training to the cloud has empowered many applications of edge intelligence, for example, health care, smart cameras, wearable smart devices and so on.

Fig 1: Cloud machine learning and privacy risks.

However, the solution may raise concerns when personal data are uploaded by the edge devices. For instance, the server may find out who is using the service by searching for your profile photos in the uploaded database. A lot of work has been done in the machine learning community to defend against such information leakage. For example, adding Gaussian noise to gradients can protect sample-wise privacy in the notion of differential privacy¹. However, adding noise induces great variance in the training and results in an inevitable trade-off between accuracy and privacy². Meanwhile, edge devices usually are not able to collect a large dataset, while privacy-preserving learning is hungrier for more data or well-learned features³. Here we aim to provide a new idea to defend against such risks: instead of adding noise to the training or models, we provide sufficient data for training.

Fig 2: Main idea: Outsourcing training without uploading data.

Our main idea is that we can find a privacy-free proxy dataset from open-source domains. Open-source datasets are publicly available or authorized for free use. You may find many examples online, like ImageNet, DomainNet and CIFAR10. You can also search for task-related images from the Internet (e.g., Google) using keywords. Trivially, we may send all the open-source data to the edge client to filter the desired samples and then conduct training on the cloud accordingly. Because of the nature of open-source data, we can obtain a great amount of free images for training without adding any noise. But meanwhile we also face some challenges:

(Proximity) As the open-source data are collected from heterogeneous sources, finding a good proxy dataset is non-trivial.
(Efficiency) The large volume of open-source data incurs high computation and communication costs for the edge client to transmit and filter samples.
(Privacy) Though no private data is uploaded, the information exchanged between cloud and the client may still leak private information.

Fig 3: Efficient Collaborative Open-source Sampling (ECOS).

To improve the efficiency and control privacy risks, we propose a novel sampling paradigm, Efficient Collaborative Open-source Sampling (ECOS). (1) On the cloud, ECOS first compresses the massive open-source data into a small set of low-dimensional centroid features by KMeans clustering. (2) Then ECOS sends the compressed centroids to the client, who returns privacy-protected cluster scores. (3) The cloud then diversely samples images from the high-scored clusters for training.

Our method can achieve the aforementioned desired properties. The small size of low-dimensional centroid features greatly reduces the communication and computation complexity. Contrasting the local features with the received centroid features yields a measure of distributional similarity through the cluster coverage scores (the number of local samples that are close to a cluster). Therefore, the cloud can filter clusters by the scores. The scores are privatized by injecting Gaussian noise, whose privacy cost is accounted for under Differential Privacy and estimated by a numerical moments accountant⁴.
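The client-cloud exchange described above can be sketched in a few lines. This is a simplified illustration under several assumptions, not the ECOS implementation: centroids are given rather than computed by KMeans, features are 2-D toy points, "close to a cluster" is taken to mean nearest centroid, and the noise scale is an arbitrary placeholder (calibrating it and accounting the resulting privacy cost is not shown). All function names are hypothetical.

```python
import math
import random

def nearest_centroid(feature, centroids):
    """Index of the centroid closest to `feature` in Euclidean distance."""
    return min(range(len(centroids)), key=lambda k: math.dist(feature, centroids[k]))

def private_coverage_scores(local_features, centroids, noise_std, rng):
    """Client side: count local samples per nearest centroid, then add Gaussian noise."""
    counts = [0] * len(centroids)
    for feature in local_features:
        counts[nearest_centroid(feature, centroids)] += 1
    return [count + rng.gauss(0.0, noise_std) for count in counts]

def select_clusters(scores, num_selected):
    """Cloud side: indices of the highest-scored clusters."""
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return ranked[:num_selected]

rng = random.Random(0)
centroids = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]         # sent from cloud to client
local_features = (
    [(0.1 * i, 0.0) for i in range(6)]                     # 6 samples near centroid 0
    + [(5.0, 5.0 - 0.1 * i) for i in range(4)]             # 4 samples near centroid 1
)

scores = private_coverage_scores(local_features, centroids, noise_std=0.5, rng=rng)
selected = select_clusters(scores, num_selected=2)         # clusters to sample images from
print(scores, selected)
```

Only the noisy per-cluster scores leave the client, which is what keeps the communication small and the private features local.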

Fig 4: Selective manual labeling.

One application of ECOS is selective manual labeling, where ECOS samples a proximal subset from a large volume of unlabeled open-source data for manual labeling. The labeled and unlabeled data are used for semi-supervised learning. As outsourcing labeling is expensive, it is essential to control the budget by limiting the number of samples. Therefore, a set of high-quality labeled data is important for the high performance of trained models. In Table 4, we show that the test accuracy of models trained on ECOS samples can outperform baselines and local training (with 1000 samples). We also provide the accounted privacy cost in terms of $(\epsilon, \delta)$-Differential-Privacy (DP) given $\delta=10^{-5}$. Though ECOS induces privacy costs through communication with the client, the privacy cost is very low.

Our main contributions can be summarized as follows.

New privacy-preserving training: We find public data in place of the client data for cloud training.
New sampling paradigm: ECOS is communication- and computation-efficient and private.
Flexible on multiple learning tasks: selective manual labeling, automated client labeling, and adaptive model compression.

We also recognize open questions of the proposed solution for future studies. For example, the public dataset may require additional data processing, e.g., aligning and cropping for improved prediction accuracy. In our empirical studies, we only consider computer vision tasks, though no assumption was made on the data structures. We expect the principles to be adaptable to other data types with minimal effort. More data types, including tabular and natural-language data, will be considered in follow-up work.

Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., & Zhang, L. (2016). Deep Learning with Differential Privacy. CCS. ↩︎
Bietti, A., Wei, C.-Y., Dudik, M., Langford, J., & Wu, S. (2022). Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning. ICML. ↩︎
Tramèr, F., & Boneh, D. (2021, February 17). Differentially Private Learning Needs Better Features (or Much More Data). ICLR. ↩︎
Wang, Y.-X., Balle, B., & Kasiviswanathan, S. P. (2019). Subsampled Renyi Differential Privacy and Analytical Moments Accountant. AISTATS ↩︎

Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork

Thu, 22 Sep 2022 13:08:20 +0800

Resilient and Communication Efficient Learning for Heterogeneous Federated Systems

Thu, 02 Jun 2022 13:08:20 +0800

Dynamic Privacy Budget Allocation Improves Data Efficiency of Differentially Private Gradient Descent

Thu, 07 Apr 2022 13:08:20 +0800

Utility upper bounds are a critical metric for privacy schedules: they characterize the maximum utility that a schedule can deliver in theory. Wang et al. [34] were the first to prove the utility bound under the PL condition. Recently, Zhou et al. proved the utility bound by using the momentum of gradients [17, 25]. In this paper, we improve the upper bound by a more accurate estimation of the dynamic influence of step noise. We show that introducing a dynamic schedule further boosts the sample-efficiency of the upper bound. Table 1 summarizes the upper bounds of a selection of state-of-the-art algorithms based on private gradients (upper block, see Appendix B for the full list), and methods studied in this paper (lower block), showing the benefits of dynamic influence.

In particular, a closely related work by Feldman et al. achieved a convergence rate similar to ours in terms of generalization error bounds (cf. SSGD in Table 2), by dynamically adjusting batch sizes [11]. However, the approach requires controllable batch sizes, which may not be feasible in many applications. In federated learning, for example, where users update models locally and then pass the parameters to the server for aggregation, the server has no control over batch sizes, and coordinating users to use varying batch sizes may not be realistic. On the other hand, our proposed method can still be applied for enhancing utility, as the server can dynamically allocate privacy budget for each round when the presence of a user in the global aggregation is privatized [21].

In brief, given a sharper loss function, the dynamic budget allocation allows the DPSGD to run for more private iterations and results in lower excess expected risks.
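To make the idea of a dynamic budget schedule concrete, here is a small sketch that splits a fixed total privacy budget across iterations and converts each share into a Gaussian noise scale. It is only an illustration under stated assumptions: the budget is expressed in zero-concentrated DP (zCDP), where a Gaussian mechanism with sensitivity $\Delta$ and standard deviation $\sigma$ costs $\Delta^2/(2\sigma^2)$ and costs add up across steps; the geometric weights are a hypothetical schedule, not the one derived in the paper, and the function name is made up.

```python
import math

def allocate_noise_schedule(total_rho, weights, sensitivity=1.0):
    """Split a total zCDP budget across steps in proportion to `weights`.

    Returns the per-step budgets and the Gaussian noise std for each step.
    """
    total_weight = sum(weights)
    rhos = [total_rho * w / total_weight for w in weights]
    # Gaussian mechanism: rho = sensitivity^2 / (2 sigma^2)  =>  sigma = sensitivity / sqrt(2 rho)
    sigmas = [sensitivity / math.sqrt(2.0 * rho) for rho in rhos]
    return rhos, sigmas

T = 5  # number of private iterations

# Static schedule: every step gets the same share of the budget.
uniform_rhos, uniform_sigmas = allocate_noise_schedule(1.0, [1.0] * T)

# Hypothetical dynamic schedule: geometrically growing share, so later steps are less noisy.
dynamic_rhos, dynamic_sigmas = allocate_noise_schedule(1.0, [1.5 ** t for t in range(T)])

for t in range(T):
    print(f"step {t}: uniform sigma = {uniform_sigmas[t]:.3f}, dynamic sigma = {dynamic_sigmas[t]:.3f}")
```

Both schedules spend the same total budget; they differ only in when the noise is injected, which is the degree of freedom the utility analysis exploits.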

Efficient Split-Mix Federated Learning for On-Demand and In-Situ Customization

Fri, 28 Jan 2022 13:08:20 +0800

Federated learning (FL)¹ is a distributed learning paradigm that leverages data from remote participants and aggregates their knowledge without requiring their raw data to be transferred to a central server, thereby largely reducing the concerns from data security and privacy. FedAvg is among the most popular federated instantiations, which aggregates knowledge by averaging models uploaded from different participants.
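The averaging step of FedAvg can be sketched as below. This is a minimal illustration that treats each uploaded model as a flat list of floats and weights clients by their local sample counts, as in the standard FedAvg formulation; the function name and the toy numbers are illustrative.

```python
def fedavg(client_params, client_sizes):
    """Average client parameter vectors, weighted by each client's number of samples."""
    if len(client_params) != len(client_sizes):
        raise ValueError("one sample count is needed per client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(n * params[i] for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]

# Two toy clients with 2-parameter models; the second client holds three times more data.
client_params = [[1.0, 2.0], [3.0, 4.0]]
client_sizes = [1, 3]

global_params = fedavg(client_params, client_sizes)
print(global_params)
```

Only the model parameters are exchanged; the raw data never leave the participants.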

Fig 1: Model customization for dynamic width (efficiency) and robustness.

When deploying federated learning, one challenge in real-world applications is the run-time (i.e., test-time) dynamics: The requirements on model properties (e.g., inference efficiency, robustness, etc.) can be constantly changing during the run-time, depending on the status of the devices or the outside environment. One common and specific type of dynamics is resource dynamics: For each application, the allocated on-device resources (e.g., run-time memory, CPU bandwidth, etc.) may vary drastically during run-time, depending on how the resource allocation of the running programs is prioritized on a participant’s device. Another type of dynamics is the robustness dynamics: The constantly changing outside environment can impose different requirements on the safety (or robustness) level of the model. For instance, the quality of real-time videos captured by autonomous cars can suddenly degrade, e.g., on entering a poorly lit alley or tunnel from a well-lit avenue, or on entering a section of bumpy road which leads to a sudden burst of blurring in the videos. In such cases, a more robust model should be quickly switched in to replace the one used under benign conditions, in order to prevent catastrophic accidents caused by wrong recognition under poor visual conditions. Such dynamic run-time requirements demand the flexibility to customize the model. The desired model should be able to transform into different variants for dynamic demands of robustness, accuracy and efficiency.

Fig 2: Device heterogeneity in federated learning.

To effectively and efficiently train models for on-demand and in-situ customization, new challenges are raised by the ubiquitous heterogeneity of federated learning participants. First, the participants can have resource heterogeneity: Different participants have different hardware resources available, such as memory, computing power, and network bandwidth. For example, in a learning task for face recognition, clients may use different types of devices (e.g., computers, tablets or smartphones) to participate in learning. To accommodate different hardware, one can turn to more resource-flexible architectures trained by distillation from an ensemble, partial model averaging, or directly combining predictions. Specifically, HeteroFL² is the first heterogeneous-width solution allowing in-situ model-size switching. Nevertheless, it suffers from under-training of its large models due to local budget constraints.

Fig 3: Feature heterogeneity in federated learning.

The degradation can worsen when facing data heterogeneity: The training datasets from participants are not independent and identically distributed (non-i.i.d.). When a device with a unique data distribution cannot afford to train a large model, the global large model may not transfer to that unseen distribution. Thus, HeteroFL may not provide effective customization in which more parameters bring higher accuracy, and how to train an effectively customizable model remains unknown.

Fig 4: Split-Mix Federated Learning.

To address the aforementioned challenges from heterogeneity and dynamics, we study a novel Split-Mix approach to enable FL on heterogeneous devices and achieve in-situ model customization for resource efficiency and robustness: The size and robustness of the resultant model can be efficiently customized at run-time. Specifically, we first split the complete knowledge in a large model into several small base sub-networks (shards) according to model widths and robustness levels. To complete the knowledge, we let the base models be fully trained on all clients. To provide customized models, we mix selected base models to construct a model of the desired size and robustness. Overall, our contributions are threefold:

Within the domain of heterogeneous federated learning, we are the first to study training a model capable of in-situ customization under heterogeneous local computation budgets, a problem that existing methods cannot yet resolve.
To address the challenge, we propose a novel Split-Mix framework that aggregates knowledge from heterogeneous clients into a width- and robustness-adjustable model structure. Remarkably, thanks to its fewer parameters and modular nature, our framework is not only efficient in federated communication and flexibly adaptable to various client budgets during training, but also efficient and flexible in storage, model loading and execution during inference.
Empirically, we demonstrate that the performance of the proposed method is better than other FL baselines under heterogeneous budget constraints. Moreover, we show its effectiveness when facing the challenge of data heterogeneity.
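The split-then-mix idea can be sketched roughly as follows. This is a toy illustration under loud assumptions: each base sub-network is stood in for by a small random linear map, and "mixing" is taken to mean averaging the logits of the selected base models. The names `base_models` and `mix`, the dimensions, and the averaging rule are all hypothetical and are not taken from the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical base sub-networks: each is a linear map from 4 features to 3 logits.
NUM_BASES, IN_DIM, NUM_CLASSES = 4, 4, 3
base_models = [rng.normal(size=(IN_DIM, NUM_CLASSES)) for _ in range(NUM_BASES)]

def mix(bases, x, num_selected):
    """Build a customized model by averaging the logits of the first `num_selected` bases."""
    if not 1 <= num_selected <= len(bases):
        raise ValueError("num_selected must be between 1 and len(bases)")
    logits = [x @ w for w in bases[:num_selected]]
    return np.mean(logits, axis=0)

x = rng.normal(size=(2, IN_DIM))              # a batch of 2 inputs
small = mix(base_models, x, num_selected=1)   # low budget: one base model
large = mix(base_models, x, num_selected=4)   # full budget: all base models
```

Because each base model is self-contained in this sketch, a device can load only as many of them as its current budget allows and switch at run-time without retraining.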

McMahan, B., Moore, E., Ramage, D., Hampson, S., & Arcas, B. A. y. (2017). Communication-Efficient Learning of Deep Networks from Decentralized Data. AISTATS. ↩︎
Diao, E., Ding, J., & Tarokh, V. (2021). HeteroFL: Computation and Communication Efficient Federated Learning for Heterogeneous Clients. ICLR. ↩︎

Federated Adversarial Debiasing for Fair and Transferable Representations

Fri, 20 Aug 2021 13:08:20 +0800

The distribution shift between two groups can be debiased in the representation space. For example, when the encoder $G$ is fixed, a discriminator network $D$ can be trained to criticize the group discrepancy between samples from the two groups¹. Meanwhile, with the discriminator fixed, we debias the representations by training the encoder $G$ to maximize the discrimination error. For centralized learning, the objective is $$\min_{f,G} \max_D \mathbb{E}_{(x,y, g)} [ \ell_c(f,G; x,y) + \ell_d (D,G; x,g) ]$$ where the debiasing loss is $$\ell_d = \mathbb{I}(g=0) \log(D(G(x))) + \mathbb{I}(g=1) \log(1 - D(G(x))) $$ and the classifier loss is $$\ell_c = \text{XEnt}(f(G(x)), y) $$ where $\text{XEnt}$ is the cross-entropy loss.
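To make the debiasing loss concrete, the sketch below evaluates $\ell_d$ on a batch, given the discriminator outputs $D(G(x))$ and the group labels $g$. The function name `debias_loss`, the clipping constant, and the toy inputs are illustrative assumptions.

```python
import numpy as np

def debias_loss(d_out, g):
    """Batch mean of I(g=0) log D(G(x)) + I(g=1) log(1 - D(G(x))).

    d_out: discriminator outputs D(G(x)) in (0, 1); g: group labels in {0, 1}.
    """
    d_out = np.clip(np.asarray(d_out, dtype=float), 1e-12, 1 - 1e-12)  # avoid log(0)
    g = np.asarray(g)
    per_sample = np.where(g == 0, np.log(d_out), np.log(1.0 - d_out))
    return per_sample.mean()

confident = debias_loss([0.9, 0.1], [0, 1])  # D separates the two groups well
confused = debias_loss([0.5, 0.5], [0, 1])   # D cannot tell the groups apart
```

The discriminator tries to push this value up (toward 0), while the encoder tries to push it down, so `confident` is larger than `confused` here.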

Fig 1: Central Debiasing

However, such a debiasing method is not feasible in a federated setting, where users' data cannot be aggregated due to privacy concerns. A recent work² proposes to do the adversarial debiasing on gathered representations: either the source domain users or the target domain users have to send their data representations to the other group. First, this increases the communication burden among users. When $M$ source domain users and $N$ target domain users are involved, the communication occurs $MN$ times. Second, sharing representations is not safe for privacy, as it is easy to reverse-engineer the representations to obtain the input samples, especially when the encoder is shallow.

Fig 2: Unsupervised Federated Domain Adaptation

Instead, our method, Federated Adversarial DEbiasing (FADE), does not require users to share their data, only an additional discriminator sub-network. Just like FedAvg³, the shared model helps to transfer the useful knowledge in the data while keeping the raw data local.

Fig 3: Federated Adversarial Debiasing

However, such a method raises new challenges. First, the loss $\ell_d$ becomes one-sided on each user. For example, for a user in group $g=1$, the loss term of group $0$ is missing. Formally, we write the federated objective as $$\min_{f,G} \mathcal{L}(f, G) = \sum_{g=1}^E \sum_{i=1}^{m_g} L_{i,g}(f, G),$$ $$L_{i,g} (f, G) = L_i^{task}(f, G) + \lambda \max_D L_{i,g}^{adv} (G, D),$$ where $L_i^{task}(f, G)$ is the classification loss for the $i$-th user, $L_{i,g}^{adv} (G, D)$ is the adversarial loss and $m_g$ is the number of users in group $g$. For the two-group case, the adversarial loss can be $$ L_{i,g}^{adv} (G, D) = \mathbb{E}_{x\sim p_i(x)} \left[ \mathbb{I}(g=0) \log(D(G(x))) + \mathbb{I}(g=1) \log(1 - D(G(x))) \right]. $$ The critical question is whether the optimization can converge when the counterpart group is missing. In other words, we want to ask whether distribution matching is a sufficient condition for the minimization.
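The one-sidedness of the per-user loss can be seen in a small sketch. It assumes a fixed discriminator (the inner maximization over $D$ is omitted) and that all samples of a user belong to a single group. The helper names `user_adv_loss` and `user_objective`, the toy users, and the value of `lam` are hypothetical.

```python
import numpy as np

def user_adv_loss(d_out, group):
    """One-sided adversarial loss for a user whose samples all belong to `group`."""
    d_out = np.clip(np.asarray(d_out, dtype=float), 1e-12, 1 - 1e-12)
    if group == 0:
        return np.log(d_out).mean()        # only the log D(G(x)) term is visible
    if group == 1:
        return np.log(1.0 - d_out).mean()  # only the log(1 - D(G(x))) term is visible
    raise ValueError("the two-group case expects group in {0, 1}")

def user_objective(task_loss, d_out, group, lam):
    """L_{i,g} = L_i^task + lambda * L_{i,g}^adv, evaluated for a fixed discriminator D."""
    return task_loss + lam * user_adv_loss(d_out, group)

# Hypothetical users: (task loss, discriminator outputs on local data, group).
users = [(0.7, [0.8, 0.6], 0), (0.5, [0.3, 0.4], 1)]
total = sum(user_objective(t, d, g, lam=0.1) for t, d, g in users)
```

Each user evaluates only the term for its own group; the two terms meet only when the per-user objectives are summed across groups.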

As shown in Theorem 1.4, it is a sufficient condition for minimizing the model-measured discrepancy $\tilde D$ between $p_1$ and $p_2$. We also demonstrate the effectiveness through experiments on unsupervised domain adaptation (UDA) benchmarks. The FADE-based methods achieve performance comparable to their central versions. In settings with non-iid data and autonomous user participation (2 users per round), FADE outperforms the baselines.

Impact of Imbalanced Groups

We also notice a possible negative impact of imbalance between the groups of users. Suppose the ratios of users in the two groups are $\alpha_1$ and $\alpha_2$, respectively. Then the sensed discrepancy becomes more biased as the imbalance grows more severe. To fix this, we propose to re-weight the losses according to their scales, using $\hat \ell = - \ell^2 / 2$, which was used for fair federated learning. We compare the vanilla loss with the squared loss in Fig 4. As more target users are involved, the imbalance worsens, and the squared loss can mitigate the performance drop seen with the vanilla loss.
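The effect of the squared loss can be seen from its derivative: since $\partial \hat\ell / \partial \ell = -\ell$, the gradient of the vanilla loss is scaled by $-\ell$, which equals $|\ell|$ for the non-positive log-likelihood losses used here. The sketch below illustrates this; the function names and sample loss values are illustrative assumptions.

```python
def squared_loss(loss):
    """Re-weighted loss: hat_l = -(l ** 2) / 2."""
    return -(loss ** 2) / 2.0

def grad_scale(loss):
    """d(hat_l)/d(l) = -l: the factor that multiplies the vanilla gradient."""
    return -loss

# Hypothetical per-user adversarial losses (log-likelihoods, hence non-positive).
large_magnitude, small_magnitude = -2.0, -0.1
scales = (grad_scale(large_magnitude), grad_scale(small_magnitude))
```

Here the user with the larger-magnitude loss has its gradient scaled by 2.0 and the other by 0.1, so users with larger losses contribute more to the update.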

Fig 4: Experiments on imbalanced source/target UDA.

We also conduct imbalanced experiments in fair federated learning. The squared loss is preferred when the data are imbalanced, while the vanilla loss is preferred in the reverse case.

Fig 5: Experiments on imbalanced male/female fair learning.

Impact of Non-iid Users

In addition, since class-wise non-iid distributions are present among federated users, the adversarial training may remove not only the unwanted distribution shift but also important discriminative information. We call this unwanted debiasing user collapse in the scope of this paper. We argue that using a regularization to limit user collapse is plausible. For example, a regularization conditioned on the possible classes is helpful⁴.

Ganin, Y., & Lempitsky, V. (2015). Unsupervised Domain Adaptation by Backpropagation. ICML, 1180–1189. http://proceedings.mlr.press/v37/ganin15.html ↩︎
Peng, X., Huang, Z., Zhu, Y., & Saenko, K. (2019, September 25). Federated Adversarial Domain Adaptation. ICLR. https://openreview.net/forum?id=HJezF3VYPB ↩︎
McMahan, B., Moore, E., Ramage, D., Hampson, S., & Arcas, B. A. y. (2017). Communication-Efficient Learning of Deep Networks from Decentralized Data. AISTATS, 1273–1282. http://proceedings.mlr.press/v54/mcmahan17a.html ↩︎
Long, M., Cao, Z., Wang, J., & Jordan, M. I. (2018). Conditional Adversarial Domain Adaptation. ArXiv:1705.10667 [Cs]. http://arxiv.org/abs/1705.10667 ↩︎

Data-Free Knowledge Distillation for Heterogeneous Federated Learning

Tue, 18 May 2021 13:08:20 +0800

Learning Model-Based Privacy Protection under Budget Constraints

Wed, 20 Jan 2021 13:08:20 +0800

Detecting MCI using real-time, ecologically valid data capture methodology: How to improve scientific rigor in digital biomarker analyses

Tue, 30 Jun 2020 13:08:20 +0800

Federated Learning

Mon, 27 Jan 2020 00:00:00 +0000

Variant Grassmann Manifolds: A Representation Augmentation Method for Action Recognition

Sat, 11 May 2019 23:52:06 -0400

Short Sequence Classification Through Discriminable Linear Dynamical System

Tue, 05 Feb 2019 11:50:05 -0500

Slides

Tue, 05 Feb 2019 00:00:00 +0000

Create slides in Markdown with Wowchemy

Wowchemy | Documentation

Features

Efficiently write slides in Markdown
3-in-1: Create, Present, and Publish your slides
Supports speaker notes
Mobile friendly slides

Controls

Next: Right Arrow or Space
Previous: Left Arrow
Start: Home
Finish: End
Overview: Esc
Speaker notes: S
Fullscreen: F
Zoom: Alt + Click
PDF Export: E

Code Highlighting

Inline code: variable

Code block:

porridge = "blueberry"
if porridge == "blueberry":
    print("Eating...")

Math

In-line math: $x + y = z$

Block math:

$$ f\left( x \right) = \;\frac{{2\left( {x + 4} \right)\left( {x - 4} \right)}}{{\left( {x + 4} \right)\left( {x + 1} \right)}} $$

Fragments

Make content appear incrementally

{{% fragment %}} One {{% /fragment %}}
{{% fragment %}} **Two** {{% /fragment %}}
{{% fragment %}} Three {{% /fragment %}}

Press Space to play!

One Two Three

A fragment can accept two optional parameters:

class: use a custom style (requires definition in custom CSS)
weight: sets the order in which a fragment appears

Speaker Notes

Add speaker notes to your presentation

{{% speaker_note %}}
- Only the speaker can read these notes
- Press `S` key to view
{{% /speaker_note %}}

Press the S key to view the speaker notes!

Themes

black: Black background, white text, blue links (default)
white: White background, black text, blue links
league: Gray background, white text, blue links
beige: Beige background, dark text, brown links
sky: Blue background, thin dark text, blue links

night: Black background, thick white text, orange links
serif: Cappuccino background, gray text, brown links
simple: White background, black text, blue links
solarized: Cream-colored background, dark green text, blue links

Custom Slide

Customize the slide style and background

{{< slide background-image="/media/boards.jpg" >}}
{{< slide background-color="#0000FF" >}}
{{< slide class="my-style" >}}

Custom CSS Example

Let’s make headers navy colored.

Create assets/css/reveal_custom.css with:

.reveal section h1,
.reveal section h2,
.reveal section h3 {
  color: navy;
}

Questions?

Ask

Documentation

Privacy in Collaborative ML

Thu, 27 Sep 2018 00:00:00 +0000

AI for Dementia Healthcare

Mon, 27 Aug 2018 00:00:00 +0000

We aim to detect dementia early and intervene in its progression, leveraging the power of (generative) AI.

Privacy Policy

Thu, 28 Jun 2018 00:00:00 +0100

My website does not host third-party cookies and hosts three first-party cookies just to generally understand the audience of the website. The cookies are from Google Analytics.

Disturbance Grassmann Kernels for Subspace-Based Learning

Mon, 11 Jun 2018 13:08:32 +0800

Sequential Data Classification in the Space of Liquid State Machines

Sat, 11 Jun 2016 13:08:20 +0800

Subspace Learning

Wed, 27 Apr 2016 00:00:00 +0000

Mon, 01 Jan 0001 00:00:00 +0000