@OmarEmaraDev commented Oct 30, 2021

Currently, ramps are generated as a number of independent scalar
expressions that are finally gathered into a vector. For instance,
indexing in vectorized code is filled with ramps like the following:

```
int _11 = int(1) * int(1);
int _12 = _10 + _11;
int _13 = int(2) * int(1);
int _14 = _10 + _13;
int _15 = int(3) * int(1);
int _16 = _10 + _15;
ivec4 _17 = ivec4(_10, _12, _14, _16);
```
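
For reference, a ramp with base $b$, stride $s$ and $n$ lanes has lane $i$ equal to $b + i \cdot s$. In the example above, $b$ is `_10`, $s$ is `int(1)` and $n = 4$, and each lane is computed as its own scalar expression. Factoring out the stride and the base gives an equivalent single vector multiply-add:

$$
\mathrm{ramp}(b, s, n) = \bigl(b + 0 \cdot s,\; b + 1 \cdot s,\; \dots,\; b + (n-1) \cdot s\bigr) = (0, 1, \dots, n-1) \cdot s + b
$$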

This patch simplifies the generated code by emitting a single multiply-add
expression on a vector containing an arithmetic sequence, so that the code
becomes:

```
ivec4 _11 = ivec4(0, 1, 2, 3) * int(1) + _10;
```
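
One way a code generator could produce the single-expression form above is sketched below in C++. `emit_ramp_glsl` is a hypothetical helper for illustration, not Halide's actual code; it assumes integer lanes, a lane count that maps onto a built-in GLSL `ivecN` type, and a base and stride that have already been emitted as GLSL expression strings.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not Halide's API): emits a GLSL expression for an
// integer ramp as one vector multiply-add instead of one scalar per lane.
std::string emit_ramp_glsl(const std::string &base, const std::string &stride, int lanes) {
    // GLSL only provides built-in integer vectors with 2, 3, or 4 components.
    assert(lanes >= 2 && lanes <= 4 && "GLSL ivecN has 2 to 4 components");
    const std::string vec_type = "ivec" + std::to_string(lanes);

    // Constant arithmetic sequence 0, 1, ..., lanes - 1.
    std::string seq = vec_type + "(";
    for (int i = 0; i < lanes; i++) {
        if (i > 0) {
            seq += ", ";
        }
        seq += std::to_string(i);
    }
    seq += ")";

    // GLSL applies the scalar stride and base component-wise to the vector.
    return seq + " * " + stride + " + " + base;
}
```

With this sketch, `emit_ramp_glsl("_10", "int(1)", 4)` returns `ivec4(0, 1, 2, 3) * int(1) + _10`, the right-hand side of the line shown above.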

This should perform better, since the multiply and add are vector operations
rather than per-lane scalar ones. It is also more compact, and more readable
because the base and the stride are easy to identify.

@dsharletg merged commit 76315a2 into halide:master Nov 3, 2021

@dsharletg

Thanks, this makes sense and is a nice change.