Question about Over-Trust Logit Penalty in OPERA (Equation (6) Definition) #50

@lky-violet

Dear Author,

First of all, thank you for your excellent work and for open-sourcing such a valuable resource. After carefully reading your paper and going through the code, I have a question about the Over-Trust Logit Penalty section, specifically about the definition of Equation (6).

From my understanding, OPERA applies an attention-based penalty to the top-k candidate tokens selected at each beam search step. However, $H(h_t)$ and the attention score appear to be different kinds of quantities. Could you clarify whether the subtraction in the equation has a clear physical interpretation, or whether there is some other specific reasoning behind it?
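To make my reading concrete, here is a minimal sketch of what I understand the penalty step to do. It is not taken from the paper or the repository: the function names, the `alpha` weight, the per-candidate `penalties` list, and all numbers are my own illustrative assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def penalized_distribution(candidate_logits, penalties, alpha=1.0):
    """Subtract alpha * penalty from each candidate logit, then renormalize.

    All three arguments are hypothetical stand-ins: `candidate_logits` for the
    top-k entries of H(h_t), `penalties` for an attention-derived score per
    candidate, and `alpha` for the weight on the penalty.
    """
    adjusted = [z - alpha * p for z, p in zip(candidate_logits, penalties)]
    return softmax(adjusted)

# Made-up numbers, only to show the mechanics.
logits = [2.0, 1.0, 0.5]
penalties = [1.5, 0.0, 0.0]

plain = softmax(logits)
penalized = penalized_distribution(logits, penalties)
```

In this reading the subtraction happens in logit space, so it is equivalent to multiplying each candidate's unnormalized probability by `exp(-alpha * penalty)` before renormalizing. I would like to know whether that is the intended interpretation.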

I appreciate your time and look forward to your clarification!

Best regards,
