
[Bug]: order of similarity matters, WHY? #648

Open
dhandhalyabhavik opened this issue Sep 9, 2024 · 6 comments
@dhandhalyabhavik

Current Behavior

Using the default ONNX model, with the following score function:

def get_score(a, b):
    return evaluation.evaluation(
        {
            'question': a
        },
        {
            'question': b
        }
    )
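
For context, the evaluation object used above is not shown in the report; presumably it is GPTCache's default ONNX similarity evaluation. A minimal setup might look like the following (the import path is an assumption and may vary across GPTCache versions):

# Assumed setup (not part of the original report): GPTCache's
# default ONNX-based similarity evaluation used by get_score().
from gptcache.similarity_evaluation.onnx import OnnxModelEvaluation

evaluation = OnnxModelEvaluation()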

Case 1:

a = 'What is neural network?'
b = 'Explain neural network and its components.'
c = 'What are the key components of neural network?'
print(get_score(a, b))
print(get_score(a, c))
print(get_score(b, c))

0.7585506439208984
0.02885962650179863
0.0909486636519432

Case 2:

a = 'What is neural network?'
b = 'Explain neural network and its components.'
c = 'What are the key components of neural network?'
print(get_score(b, a))
print(get_score(c, a))
print(get_score(c, b))

0.17746654152870178
0.013074617832899094
0.8378676772117615

I only swapped the arguments from (x, y) to (y, x) when calling get_score; why do the scores change so drastically?

Expected Behavior

No response

Steps To Reproduce

No response

Environment

No response

Anything else?

No response

@SimFG
Collaborator

SimFG commented Sep 10, 2024

It seems you did not actually test swapping x, y to y, x. For a fair comparison you should compare
print(get_score(a, b)) with print(get_score(b, a)).

@dhandhalyabhavik
Author

I did; look at these lines:

print(get_score(a, b))  # in case 1
0.7585506439208984
print(get_score(b, a))  # in case 2
0.17746654152870178

@SimFG
Collaborator

SimFG commented Sep 11, 2024

It's surprising that such a phenomenon exists!

@Ali-Parandeh

Is it because the LLM replies differently when the ranking/ordering of content differs in a RAG application?

@SimFG
Collaborator

SimFG commented Sep 13, 2024

I don't know much about this part. In theory, the score should be computed from the distance between the two vectors, so swapping their positions should not affect it.
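
To illustrate that point: a plain vector-distance score such as cosine similarity is symmetric by construction, so swapping the arguments cannot change it. A minimal NumPy sketch (illustrative only, not GPTCache's actual code path):

import numpy as np

def cosine_similarity(u, v):
    # Symmetric by definition: cos(u, v) == cos(v, u).
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

u, v = np.random.rand(384), np.random.rand(384)
assert abs(cosine_similarity(u, v) - cosine_similarity(v, u)) < 1e-9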

@wxywb
Collaborator

wxywb commented Sep 13, 2024

We trained a cross-encoder model to evaluate similarity; conceptually, the score should not depend on the order of the pair. However, since it is BERT-based, some unusual behavior can occur: it is a lightweight transformer with no constraint enforcing that symmetry.
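
One way to see why the order can matter: a cross-encoder scores a pair by feeding both sentences through BERT as a single concatenated sequence ([CLS] a [SEP] b [SEP]), so (a, b) and (b, a) are literally different inputs and can produce different logits. A common workaround (a sketch built on the get_score helper from the report, not an official GPTCache API) is to average the two directions:

def symmetric_score(a, b):
    # Score both orderings and average them so the result no longer
    # depends on which sentence is passed first.
    return (get_score(a, b) + get_score(b, a)) / 2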
