@troosh
Forked from mhroth/movemask.c
Created January 14, 2018 09:00
A basic NEON implementation of SSE _mm_movemask_ps
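#include <arm_neon.h>
#include <stdint.h>

// Packs the sign bit of each of the four lanes into bits 0..3 of the result.
uint32_t _mm_movemask_ps(float32x4_t x) {
  // Arithmetic shift right by 31 turns each lane into all ones (sign bit set)
  // or all zeros, so arbitrary floats work, not only comparison masks.
  uint32x4_t sign = vreinterpretq_u32_s32(
      vshrq_n_s32(vreinterpretq_s32_f32(x), 31));
  // Keep one distinct bit per lane.
  uint32x4_t mmA = vandq_u32(sign, (uint32x4_t) {0x1, 0x2, 0x4, 0x8}); // [0 1 2 3]
  uint32x4_t mmB = vextq_u32(mmA, mmA, 2); // [2 3 0 1]
  uint32x4_t mmC = vorrq_u32(mmA, mmB); // [0+2 1+3 0+2 1+3]
  uint32x4_t mmD = vextq_u32(mmC, mmC, 3); // [1+3 0+2 1+3 0+2]
  uint32x4_t mmE = vorrq_u32(mmC, mmD); // [0+1+2+3 ...]
  return vgetq_lane_u32(mmE, 0);
}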
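For checking a NEON version against the SSE semantics, a portable scalar model can help. The sketch below is not part of the gist: `movemask_ps_ref` and `movemask_ps_ref_selfcheck` are hypothetical names, and the model assumes the SSE behavior of copying the sign bit (bit 31) of lane i into bit i of the result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the gist): scalar model of movemask.
   Bit i of the result is the sign bit of v[i]. */
static uint32_t movemask_ps_ref(const float v[4]) {
  uint32_t mask = 0;
  for (int i = 0; i < 4; ++i) {
    uint32_t bits;
    memcpy(&bits, &v[i], sizeof bits); /* well-defined way to read the float's bits */
    mask |= (bits >> 31) << i;         /* sign bit of lane i -> mask bit i */
  }
  return mask;
}

/* Hypothetical self-check covering negative values and negative zero. */
static void movemask_ps_ref_selfcheck(void) {
  const float a[4] = {-1.0f, 2.0f, -3.0f, 4.0f}; /* lanes 0 and 2 negative */
  const float b[4] = {0.0f, -0.0f, 1.0f, -1.0f}; /* -0.0f also has its sign bit set */
  assert(movemask_ps_ref(a) == 0x5u);
  assert(movemask_ps_ref(b) == 0xAu);
}
```

On an ARM target, the same inputs could be loaded with `vld1q_f32` and passed to the NEON function, comparing its result with `movemask_ps_ref`.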