Question about Over-Trust Logit Penalty in OPERA (Equation (6) Definition) #50

Dear Author,

First of all, thank you for your excellent work and for open-sourcing such a valuable resource. After carefully reading your paper and going through the code, I have a question about the Over-Trust Logit Penalty section, specifically the definition of Equation (6).

From my understanding, OPERA applies an attention-based penalty to the scores of the top-k candidate tokens selected at each beam search step. However, \( H(h_t) \) and the attention score appear to be different kinds of quantities. Could you clarify whether the subtraction in the equation has a clear physical interpretation, or whether there is specific reasoning behind it?
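To make the question concrete, here is a minimal sketch of how I currently read the operation. It is only my interpretation, not the repository's code: the function names, the per-candidate `penalties` list, and the weight `alpha` are hypothetical stand-ins for the logits \( H(h_t) \) and the attention-derived penalty.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def penalized_scores(logits, penalties, alpha=1.0):
    """Subtract a weighted penalty from each candidate's logit, then normalise.

    logits    -- hypothetical stand-in for the candidate logits H(h_t)
    penalties -- hypothetical attention-derived penalty, one per candidate
    alpha     -- hypothetical weight that converts the penalty into logit units
    """
    shifted = [z - alpha * p for z, p in zip(logits, penalties)]
    return log_softmax(shifted)


# Two candidates with equal logits; only the second one is penalised.
scores = penalized_scores([2.0, 2.0], [0.0, 1.0], alpha=1.0)
```

In this reading, the two quantities only become comparable through `alpha`, which is the part I would like to understand better.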

I appreciate your time and look forward to your clarification!

Best regards,  
