Open
Description
Hello,
Could you clarify how you handle grouped-query attention (GQA)? For instance, Mistral 7B has 32 query heads but only 8 key-value heads, so each key-value head is shared by 4 query heads and each cached key-value pair therefore receives 4 different attention weights. How do you aggregate these 4 values into a single score per key-value pair? I do see the num_key_value_groups variable in the update_kv method, but it is not used.
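To make the question concrete, here is a minimal NumPy sketch of the situation. It is not taken from the repository: the tensor shapes, the random weights, and the two candidate reductions (mean and max over the group) are my own assumptions, and I am also assuming that consecutive query heads share a key-value head, as in the Hugging Face repeat_kv layout.

```python
import numpy as np

num_heads = 32       # query heads in Mistral 7B
num_kv_heads = 8     # key-value heads in Mistral 7B
num_key_value_groups = num_heads // num_kv_heads  # 4 query heads per KV head

# Hypothetical sizes chosen only for illustration.
batch, q_len, kv_len = 1, 3, 5
rng = np.random.default_rng(0)

# Stand-in attention weights of shape (batch, num_heads, q_len, kv_len),
# softmax-normalised over the key dimension.
logits = rng.normal(size=(batch, num_heads, q_len, kv_len))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Assumption: query heads 4*g .. 4*g+3 all attend to KV head g.
grouped = attn.reshape(batch, num_kv_heads, num_key_value_groups, q_len, kv_len)

# Two possible ways to collapse the 4 weights into one per KV head.
mean_per_kv_head = grouped.mean(axis=2)  # (batch, num_kv_heads, q_len, kv_len)
max_per_kv_head = grouped.max(axis=2)    # same shape

print(mean_per_kv_head.shape, max_per_kv_head.shape)
```

With the mean, each row still sums to 1 over the key dimension; with the max it generally does not. Is either of these what the code is meant to do, or is the score kept per query head?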
Thanks!
guozhiyu, shirinyamani and ericshwu