[Questions] Implementation on RoPE

Thanks for open-sourcing this great work!

I was reading the source code in [model.py](https://github.com/allenai/molmo/blob/793fa387edfd6fd0f5b21eb8e0a7620a1f3799e1/olmo/model.py#L334) and got confused by the implementation of RoPE. RoPE expresses the d-dimensional rotation as a block-diagonal matrix made of 2x2 rotation matrices. So for a d-dimensional vector `[x1, x2, ..., xd]`, `x1` and `x2` go through one rotation matrix, while `xd-1` and `xd` go through another.
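For reference, this is the block-diagonal form I have in mind, written in the notation of the RoFormer paper as I understand it. Here `m` is the token position, and the base 10000 is the commonly used default, which I am assuming rather than taking from `model.py`.

```math
R_m =
\begin{pmatrix}
\cos m\theta_1 & -\sin m\theta_1 & & & \\
\sin m\theta_1 & \cos m\theta_1 & & & \\
& & \ddots & & \\
& & & \cos m\theta_{d/2} & -\sin m\theta_{d/2} \\
& & & \sin m\theta_{d/2} & \cos m\theta_{d/2}
\end{pmatrix},
\qquad
\theta_i = 10000^{-2(i-1)/d}
```

In this form each adjacent pair `(x_{2i-1}, x_{2i})` shares a single angle `m * theta_i`.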

The problem is in these two lines:

```python
freqs = einsum("i , j -> i j", seq, inv_freq)
positions = torch.cat((freqs, freqs), dim=-1)
```

If we concatenate `freqs` with itself along the last dimension, the frequency indices in each row of the resulting matrix become `0, 2, ..., 2i, ..., d, 0, 2, ..., 2i, ..., d`. In that case, when the rotary embedding is applied later, the adjacent elements `x1` and `x2` receive different rotation angles.
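To make the layout concrete, here is a minimal sketch that reproduces the two quoted lines on a tiny example. The values of `dim`, `theta`, and `seq_len`, and the way `inv_freq` is built, are my assumptions for illustration and are not copied from `model.py`.

```python
import torch

dim = 8          # hypothetical head dimension, kept small for readability
theta = 10000.0  # commonly used RoPE base (assumed)
seq_len = 4      # hypothetical sequence length

# One inverse frequency per 2x2 rotation block (assumed standard construction).
inv_freq = 1.0 / (theta ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
seq = torch.arange(seq_len, dtype=torch.float32)

# Same outer product and concatenation as in the quoted snippet.
freqs = torch.einsum("i , j -> i j", seq, inv_freq)  # shape (seq_len, dim // 2)
positions = torch.cat((freqs, freqs), dim=-1)        # shape (seq_len, dim)

# Which frequency each of the dim channels ends up with.
freq_index = torch.cat((torch.arange(dim // 2), torch.arange(dim // 2)))
print(freq_index.tolist())  # [0, 1, 2, 3, 0, 1, 2, 3]
print(positions[1])         # angles at position 1: channels 0 and 1 differ
```

With this layout, channel `k` shares its angle with channel `k + dim // 2`, not with channel `k + 1`. Whether that is actually a problem depends on which channels the rotation step treats as a pair, which the two quoted lines do not show.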

So I would suggest that the correct approach is to apply `torch.repeat_interleave` along the last dimension, as is done in the [Hugging Face timm library](https://github.com/huggingface/pytorch-image-models/blob/a5104902804fc984df1a26e372a8695f5ce4bab5/timm/layers/pos_embed_sincos.py#L374).
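A minimal sketch of the interleaved layout I am suggesting, using the same hypothetical `dim`, `theta`, and `seq_len` as assumptions. It only shows how the angles are laid out and is not a drop-in patch for `model.py`.

```python
import torch

dim = 8          # hypothetical head dimension
theta = 10000.0  # commonly used RoPE base (assumed)
seq_len = 4      # hypothetical sequence length

inv_freq = 1.0 / (theta ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
seq = torch.arange(seq_len, dtype=torch.float32)
freqs = torch.outer(seq, inv_freq)  # shape (seq_len, dim // 2)

# Repeat every angle twice in place, so channels (0, 1), (2, 3), ... share an angle.
interleaved = torch.repeat_interleave(freqs, repeats=2, dim=-1)  # shape (seq_len, dim)

freq_index = torch.repeat_interleave(torch.arange(dim // 2), repeats=2)
print(freq_index.tolist())  # [0, 0, 1, 1, 2, 2, 3, 3]
print(interleaved[1])       # angles at position 1: channels 0 and 1 match
```

Here each adjacent pair of channels gets the same angle, which matches the 2x2 block picture described at the top.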

Please correct me if I'm wrong. I would appreciate any replies.

[Questions] Implementation on RoPE #49
