Bisqwit

16-bit clamp algorithms comparison

Jan 20th, 2017
Left:  x = (n<=-32768 ? -32768 : (n>32767 ? 32767 : n));
Right: x = ((short)n != n) ? (n>>31) ^ 0x7FFF : n;

X86_64:
CMP EDI, 32767   | MOV EAX, EDI
MOV EAX, 32767   | MOVSX EDX, DI
CMOVG EDI, EAX   | SAR EAX, 31
MOV EAX, -32768  | XOR AX, 32767
CMP EDI, -32768  | CMP EDI, EDX
CMOVL EDI, EAX   | CMOVNE EDI, EAX

ARMV6 (ARM):
SSAT R0, #16, R0  | SXTH R3, R0
SXTH R0, R0       | CMP R0, R3
                  | LDRNE R3, .L7
                  | EORNE R3, R3, R0, ASR #31
                  | ...
                  | .L7: .WORD 32767

ARMV6-M (THUMB):
LDR R3, .L5               | SXTH R3, R0
CMP R0, R3                | CMP R0, R3
BLE .L2                   | BEQ .L8
MOVS R0, R3               | LDR R3, .L9
.L2: LDR R3, .L5+4        | ASRS R0, R0, #31
CMP R0, R3                | EORS R3, R0
BGE .L3                   | .L8: ...
MOVS R0, R3               | .L9: .WORD 32767
.L3: SXTH R0, R0          |
...                       |
.L5: .WORD 32767, -32768  |

ARMV7VE and ARMV8-A (ARM):
SSAT R0, #16, R0  | SXTH R3, R0
SXTH R0, R0       | CMP R0, R3
                  | ASRNE R0, R0, #31
                  | EORNE R3, R0, #32512
                  | EORNE R3, R3, #255

ARMV7VE and ARMV8-A (THUMB):
SSAT R0, #16, R0  | SXTH R3, R0
SXTH R0, R0       | CMP R0, R3
                  | ITT NE
                  | MOVWNE R3, #32767
                  | EORNE R3, R3, R0, ASR #31

NVCC (NVIDIA CUDA compiler):
ld.param.u32 %r1, [_Z6clamp1i_param_0]; | ld.param.u32 %r1, [_Z6clamp2i_param_0];
setp.lt.s32 %p1, %r1, -32767;           | cvt.s32.s16 %r2, %r1;
setp.gt.s32 %p2, %r1, 32767;            | setp.eq.s32 %p1, %r2, %r1;
cvt.u16.u32 %rs1, %r1;                  | shr.s32 %r3, %r1, 31;
selp.b16 %rs2, 32767, %rs1, %p2;        | xor.b32 %r4, %r3, 32767;
selp.b16 %rs3, -32768, %rs2, %p1;       | selp.b32 %r5, %r1, %r4, %p1;
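
For anyone who wants to reproduce the comparison, the two expressions from the top of the paste can be wrapped as C functions roughly like this. It is only a sketch: the names clamp16_compare, clamp16_xor and check_variants_agree are made up here and do not come from the paste. It also assumes that n is a 32-bit signed int, that an out-of-range conversion to int16_t wraps, and that >> on a negative value is an arithmetic shift. Both are implementation-defined in C, though mainstream compilers such as GCC and Clang behave this way.

```c
#include <assert.h>
#include <stdint.h>

/* "Left" variant: two explicit range comparisons. */
static inline int32_t clamp16_compare(int32_t n)
{
    return n <= -32768 ? -32768 : (n > 32767 ? 32767 : n);
}

/* "Right" variant: if n does not survive a round trip through 16 bits,
 * it is out of range. n >> 31 is then 0 for positive n and -1 (all ones)
 * for negative n, so XOR with 0x7FFF gives 32767 or -32768 respectively.
 * Assumption: wrapping narrowing conversion and arithmetic right shift. */
static inline int32_t clamp16_xor(int32_t n)
{
    return ((int16_t)n != n) ? (n >> 31) ^ 0x7FFF : n;
}

/* Hypothetical helper: aborts via assert if the two variants ever
 * disagree anywhere in the inclusive range [lo, hi]. */
static inline void check_variants_agree(int32_t lo, int32_t hi)
{
    for (int64_t n = lo; n <= hi; ++n)
        assert(clamp16_compare((int32_t)n) == clamp16_xor((int32_t)n));
}
```

Calling check_variants_agree(INT32_MIN, INT32_MAX) compares the two variants over every 32-bit input. Compiling each function separately with optimization enabled and inspecting the assembly should give listings comparable to the ones above, though the exact output depends on compiler version and flags.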