
@marcprux
Created March 20, 2026 14:16
This file has been truncated.
diff -r android-ndk-r29/CHANGELOG.md android-ndk-r30-beta1/CHANGELOG.md
13c13
< https://android.googlesource.com/platform/ndk/+/master/docs/BuildSystemMaintainers.md
---
> https://android.googlesource.com/platform/ndk/+/mirror-goog-main-ndk/docs/BuildSystemMaintainers.md
19c19
< - Updated LLVM to clang-r563880c. See `clang_source_info.md` in the toolchain
---
> - Updated LLVM to clang-r574158. See `clang_source_info.md` in the toolchain
21,35c21,23
< - [Issue 2144]: Fixed issue where `lldb.sh` did not work when installed to a
< path which contained spaces.
< - [Issue 2170]: Fixed issue where std::unique_ptr caused sizeof to be
< sometimes applied to function reference types.
< - ndk-stack will now find symbols in files with matching build IDs even if the
< file names do not match.
< - ndk-stack will now find symbols in files with matching build IDs even if the
< name of the file is not present in the trace.
< - [Issue 2078]: ndk-stack now accepts a [native-debug-symbols.zip] file for the
< `--sym` argument as an alternative to a directory.
< - [Issue 2109]: `llvm-lipo` has been removed. This tool is only useful for
< building macOS binaries but was mistakenly included in the NDK.
< - [Issue 2135]: simpleperf no longer depends on Tk-Inter in non-GUI mode.
< - [Issue 2146]: Fixed a case where invalid data would appear in simpleperf
< reports.
---
> - [Issue 2073]: Fixed runtime segfault when using LTO and nested exception
> handlers.
> - [Issue 2160]: Fix Clang crash on invalid code.
37,44c25,26
< [Issue 2078]: https://github.com/android/ndk/issues/2078
< [Issue 2109]: https://github.com/android/ndk/issues/2109
< [Issue 2135]: https://github.com/android/ndk/issues/2135
< [Issue 2142]: https://github.com/android/ndk/issues/2142
< [Issue 2144]: https://github.com/android/ndk/issues/2144
< [Issue 2146]: https://github.com/android/ndk/issues/2146
< [Issue 2170]: https://github.com/android/ndk/issues/2170
< [native-debug-symbols.zip]: https://support.google.com/googleplay/android-developer/answer/9848633?hl=en
---
> [Issue 2073]: https://github.com/android/ndk/issues/2073
> [Issue 2160]: https://github.com/android/ndk/issues/2160
diff -r android-ndk-r29/NOTICE android-ndk-r30-beta1/NOTICE
413,440d412
< Copyright (c) 1993 John Brezak
< All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. The name of the author may be used to endorse or promote products
< derived from this software without specific prior written permission.
<
< THIS SOFTWARE IS PROVIDED BY THE AUTHOR `AS IS'' AND ANY EXPRESS OR
< IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
< WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
< DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
< INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
< (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
< SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
< STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
< ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
< POSSIBILITY OF SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
730,760d701
< Copyright (C) 2010 The Android Open Source Project
< Copyright (c) 2008 ARM Ltd
< All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. The name of the company may not be used to endorse or promote
< products derived from this software without specific prior written
< permission.
<
< THIS SOFTWARE IS PROVIDED BY ARM LTD ``AS IS'' AND ANY EXPRESS OR IMPLIED
< WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
< MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
< IN NO EVENT SHALL ARM LTD BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
< SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
< TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
< PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
< LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
< NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
< SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
<
< Android adaptation and tweak by Jim Huang <jserv@0xlab.org>.
<
< -------------------------------------------------------------------
<
1348a1290,1333
> Copyright (C) 2025 The Android Open Source Project
>
> Licensed under the Apache License, Version 2.0 (the "License");
> you may not use this file except in compliance with the License.
> You may obtain a copy of the License at
>
> http://www.apache.org/licenses/LICENSE-2.0
>
> Unless required by applicable law or agreed to in writing, software
> distributed under the License is distributed on an "AS IS" BASIS,
> WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> See the License for the specific language governing permissions and
> limitations under the License.
>
> -------------------------------------------------------------------
>
> Copyright (C) 2025 The Android Open Source Project
> All rights reserved.
>
> Redistribution and use in source and binary forms, with or without
> modification, are permitted provided that the following conditions
> are met:
> * Redistributions of source code must retain the above copyright
> notice, this list of conditions and the following disclaimer.
> * Redistributions in binary form must reproduce the above copyright
> notice, this list of conditions and the following disclaimer in
> the documentation and/or other materials provided with the
> distribution.
>
> THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
> BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
> OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
> AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
> OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
> OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> SUCH DAMAGE.
>
> -------------------------------------------------------------------
>
2220,2251d2204
<
< This code is derived from software contributed to Berkeley by
< Roger L. Snyder.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. Neither the name of the University nor the names of its contributors
< may be used to endorse or promote products derived from this software
< without specific prior written permission.
<
< THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
< Copyright (c) 1989, 1993
< The Regents of the University of California. All rights reserved.
2284,2315d2236
< Copyright (c) 1990 Regents of the University of California.
< All rights reserved.
<
< This code is derived from software contributed to Berkeley by
< Chris Torek.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. Neither the name of the University nor the names of its contributors
< may be used to endorse or promote products derived from this software
< without specific prior written permission.
<
< THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
2597,2625d2517
< Copyright (c) 1991 The Regents of the University of California.
< All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. Neither the name of the University nor the names of its contributors
< may be used to endorse or promote products derived from this software
< without specific prior written permission.
<
< THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
3118,3146d3009
< Copyright (c) 1997 The NetBSD Foundation, Inc.
< All rights reserved.
<
< This code is derived from software contributed to The NetBSD Foundation
< by Neil A. Carson and Mark Brinicombe
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
<
< THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
< ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
< TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
< PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
< BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
< CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
< SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
< INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
< CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
< ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
< POSSIBILITY OF SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
3335,3350d3197
< Copyright (c) 1998, 2015 Todd C. Miller <millert@openbsd.org>
<
< Permission to use, copy, modify, and distribute this software for any
< purpose with or without fee is hereby granted, provided that the above
< copyright notice and this permission notice appear in all copies.
<
< THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
< WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
< MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
< ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
< WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
< ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
< OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
<
< -------------------------------------------------------------------
<
4293,4321d4139
< Copyright (c) 2011 The Android Open Source Project
< Copyright (c) 2008 ARM Ltd
< All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. The name of the company may not be used to endorse or promote
< products derived from this software without specific prior written
< permission.
<
< THIS SOFTWARE IS PROVIDED BY ARM LTD ``AS IS'' AND ANY EXPRESS OR IMPLIED
< WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
< MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
< IN NO EVENT SHALL ARM LTD BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
< SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
< TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
< PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
< LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
< NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
< SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
4381c4199
< Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
---
> Copyright (c) 2025 Qualcomm Innovation Center, Inc. All rights reserved.
4462,4477d4279
< Copyright (c) 2013 Antoine Jacoutot <ajacoutot@openbsd.org>
<
< Permission to use, copy, modify, and distribute this software for any
< purpose with or without fee is hereby granted, provided that the above
< copyright notice and this permission notice appear in all copies.
<
< THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
< WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
< MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
< ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
< WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
< ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
< OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
<
< -------------------------------------------------------------------
<
4590c4392
< Copyright (c)1999 Citrus Project,
---
> Copyright (c) 2024, Intel Corporation
4594,4600c4396
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
---
> modification, are permitted provided that the following conditions are met:
4602,4612c4398,4399
< THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
---
> * Redistributions of source code must retain the above copyright notice,
> * this list of conditions and the following disclaimer.
4614c4401,4403
< -------------------------------------------------------------------
---
> * Redistributions in binary form must reproduce the above copyright notice,
> * this list of conditions and the following disclaimer in the documentation
> * and/or other materials provided with the distribution.
4616,4617c4405,4407
< Copyright (c)2001 Citrus Project,
< All rights reserved.
---
> * Neither the name of Intel Corporation nor the names of its contributors
> * may be used to endorse or promote products derived from this software
> * without specific prior written permission.
4619,4626c4409,4418
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
---
> THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
> ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
> WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
> DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
> ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
> LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
> ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
> SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
4628,4639d4419
< THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
4642c4422
< Copyright (c)2003 Citrus Project,
---
> Copyright (c)2001 Citrus Project,
4854,4881d4633
< Copyright (c) 2002 Tim J. Robbins
< All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
<
< THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
< SPDX-License-Identifier: BSD-2-Clause
<
4910,4937d4661
< Copyright (c) 2009 David Schultz <das@FreeBSD.org>
< All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
<
< THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
< SPDX-License-Identifier: BSD-2-Clause
<
4938a4663
> Copyright (c) 2023 Klara, Inc.
5021,5048d4745
< SPDX-License-Identifier: BSD-2-Clause
<
< Copyright (c)1999 Citrus Project,
< All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
<
< THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
5051,5146d4747
< Copyright (c) 1989, 1993
< The Regents of the University of California. All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. Neither the name of the University nor the names of its contributors
< may be used to endorse or promote products derived from this software
< without specific prior written permission.
<
< THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
< SPDX-License-Identifier: BSD-3-Clause
<
< Copyright (c) 1990, 1993
< The Regents of the University of California. All rights reserved.
<
< This code is derived from software contributed to Berkeley by
< Chris Torek.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. Neither the name of the University nor the names of its contributors
< may be used to endorse or promote products derived from this software
< without specific prior written permission.
<
< THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
< SPDX-License-Identifier: BSD-3-Clause
<
< Copyright (c) 1992, 1993
< The Regents of the University of California. All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. Neither the name of the University nor the names of its contributors
< may be used to endorse or promote products derived from this software
< without specific prior written permission.
<
< THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
< SPDX-License-Identifier: BSD-3-Clause
<
5193,5290d4793
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. Neither the name of the University nor the names of its contributors
< may be used to endorse or promote products derived from this software
< without specific prior written permission.
<
< THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
< SPDX-License-Identifier: BSD-3-Clause
<
< Copyright (c) 1998 Softweyr LLC. All rights reserved.
<
< strtok_r, from Berkeley strtok
< Oct 13, 1998 by Wes Peters <wes@softweyr.com>
<
< Copyright (c) 1988, 1993
< The Regents of the University of California. All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notices, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notices, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. Neither the name of the University nor the names of its contributors
< may be used to endorse or promote products derived from this software
< without specific prior written permission.
<
< THIS SOFTWARE IS PROVIDED BY SOFTWEYR LLC, THE REGENTS AND CONTRIBUTORS
< ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
< LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
< PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SOFTWEYR LLC, THE
< REGENTS, OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
< SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
< TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
< PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
< LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
< NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
< SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
< SPDX-License-Identifier: BSD-3-Clause
<
< Copyright (c) 1998 Todd C. Miller <Todd.Miller@courtesan.com>
< All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. The name of the author may not be used to endorse or promote products
< derived from this software without specific prior written permission.
<
< THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
< INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
< AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
< THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
< EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
< PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
< OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
< WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
< OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
< ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
< SPDX-License-Identifier: BSD-3-Clause
<
< Copyright (c) 1999
< David E. O'Brien
< Copyright (c) 1988, 1993
< The Regents of the University of California. All rights reserved.
diff -r android-ndk-r29/NOTICE.toolchain android-ndk-r30-beta1/NOTICE.toolchain
4355,4382d4354
< Copyright (c) 1993 John Brezak
< All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. The name of the author may be used to endorse or promote products
< derived from this software without specific prior written permission.
<
< THIS SOFTWARE IS PROVIDED BY THE AUTHOR `AS IS'' AND ANY EXPRESS OR
< IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
< WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
< DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
< INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
< (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
< SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
< STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
< ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
< POSSIBILITY OF SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
4672,4702d4643
< Copyright (C) 2010 The Android Open Source Project
< Copyright (c) 2008 ARM Ltd
< All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. The name of the company may not be used to endorse or promote
< products derived from this software without specific prior written
< permission.
<
< THIS SOFTWARE IS PROVIDED BY ARM LTD ``AS IS'' AND ANY EXPRESS OR IMPLIED
< WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
< MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
< IN NO EVENT SHALL ARM LTD BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
< SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
< TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
< PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
< LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
< NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
< SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
<
< Android adaptation and tweak by Jim Huang <jserv@0xlab.org>.
<
< -------------------------------------------------------------------
<
5290a5232,5275
> Copyright (C) 2025 The Android Open Source Project
>
> Licensed under the Apache License, Version 2.0 (the "License");
> you may not use this file except in compliance with the License.
> You may obtain a copy of the License at
>
> http://www.apache.org/licenses/LICENSE-2.0
>
> Unless required by applicable law or agreed to in writing, software
> distributed under the License is distributed on an "AS IS" BASIS,
> WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> See the License for the specific language governing permissions and
> limitations under the License.
>
> -------------------------------------------------------------------
>
> Copyright (C) 2025 The Android Open Source Project
> All rights reserved.
>
> Redistribution and use in source and binary forms, with or without
> modification, are permitted provided that the following conditions
> are met:
> * Redistributions of source code must retain the above copyright
> notice, this list of conditions and the following disclaimer.
> * Redistributions in binary form must reproduce the above copyright
> notice, this list of conditions and the following disclaimer in
> the documentation and/or other materials provided with the
> distribution.
>
> THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
> BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
> OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
> AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
> OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
> OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> SUCH DAMAGE.
>
> -------------------------------------------------------------------
>
6162,6193d6146
<
< This code is derived from software contributed to Berkeley by
< Roger L. Snyder.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. Neither the name of the University nor the names of its contributors
< may be used to endorse or promote products derived from this software
< without specific prior written permission.
<
< THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
< Copyright (c) 1989, 1993
< The Regents of the University of California. All rights reserved.
6226,6257d6178
< Copyright (c) 1990 Regents of the University of California.
< All rights reserved.
<
< This code is derived from software contributed to Berkeley by
< Chris Torek.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. Neither the name of the University nor the names of its contributors
< may be used to endorse or promote products derived from this software
< without specific prior written permission.
<
< THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
6539,6567d6459
< Copyright (c) 1991 The Regents of the University of California.
< All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. Neither the name of the University nor the names of its contributors
< may be used to endorse or promote products derived from this software
< without specific prior written permission.
<
< THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
7060,7088d6951
< Copyright (c) 1997 The NetBSD Foundation, Inc.
< All rights reserved.
<
< This code is derived from software contributed to The NetBSD Foundation
< by Neil A. Carson and Mark Brinicombe
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
<
< THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
< ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
< TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
< PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
< BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
< CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
< SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
< INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
< CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
< ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
< POSSIBILITY OF SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
7277,7292d7139
< Copyright (c) 1998, 2015 Todd C. Miller <millert@openbsd.org>
<
< Permission to use, copy, modify, and distribute this software for any
< purpose with or without fee is hereby granted, provided that the above
< copyright notice and this permission notice appear in all copies.
<
< THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
< WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
< MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
< ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
< WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
< ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
< OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
<
< -------------------------------------------------------------------
<
8235,8263d8081
< Copyright (c) 2011 The Android Open Source Project
< Copyright (c) 2008 ARM Ltd
< All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. The name of the company may not be used to endorse or promote
< products derived from this software without specific prior written
< permission.
<
< THIS SOFTWARE IS PROVIDED BY ARM LTD ``AS IS'' AND ANY EXPRESS OR IMPLIED
< WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
< MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
< IN NO EVENT SHALL ARM LTD BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
< SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
< TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
< PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
< LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
< NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
< SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
8323c8141
< Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
---
> Copyright (c) 2025 Qualcomm Innovation Center, Inc. All rights reserved.
8404,8419d8221
< Copyright (c) 2013 Antoine Jacoutot <ajacoutot@openbsd.org>
<
< Permission to use, copy, modify, and distribute this software for any
< purpose with or without fee is hereby granted, provided that the above
< copyright notice and this permission notice appear in all copies.
<
< THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
< WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
< MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
< ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
< WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
< ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
< OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
<
< -------------------------------------------------------------------
<
8532c8334
< Copyright (c)1999 Citrus Project,
---
> Copyright (c) 2024, Intel Corporation
8536,8542c8338
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
---
> modification, are permitted provided that the following conditions are met:
8544,8554c8340,8341
< THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
---
> * Redistributions of source code must retain the above copyright notice,
> * this list of conditions and the following disclaimer.
8556c8343,8345
< -------------------------------------------------------------------
---
> * Redistributions in binary form must reproduce the above copyright notice,
> * this list of conditions and the following disclaimer in the documentation
> * and/or other materials provided with the distribution.
8558,8559c8347,8349
< Copyright (c)2001 Citrus Project,
< All rights reserved.
---
> * Neither the name of Intel Corporation nor the names of its contributors
> * may be used to endorse or promote products derived from this software
> * without specific prior written permission.
8561,8568c8351,8360
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
---
> THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
> ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
> WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
> DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
> ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
> LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
> ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
> SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
8570,8581d8361
< THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
8584c8364
< Copyright (c)2003 Citrus Project,
---
> Copyright (c)2001 Citrus Project,
8796,8823d8575
< Copyright (c) 2002 Tim J. Robbins
< All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
<
< THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
< SPDX-License-Identifier: BSD-2-Clause
<
8852,8879d8603
< Copyright (c) 2009 David Schultz <das@FreeBSD.org>
< All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
<
< THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
< SPDX-License-Identifier: BSD-2-Clause
<
8880a8605
> Copyright (c) 2023 Klara, Inc.
8963,8990d8687
< SPDX-License-Identifier: BSD-2-Clause
<
< Copyright (c)1999 Citrus Project,
< All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
<
< THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
8993,9088d8689
< Copyright (c) 1989, 1993
< The Regents of the University of California. All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. Neither the name of the University nor the names of its contributors
< may be used to endorse or promote products derived from this software
< without specific prior written permission.
<
< THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
< SPDX-License-Identifier: BSD-3-Clause
<
< Copyright (c) 1990, 1993
< The Regents of the University of California. All rights reserved.
<
< This code is derived from software contributed to Berkeley by
< Chris Torek.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. Neither the name of the University nor the names of its contributors
< may be used to endorse or promote products derived from this software
< without specific prior written permission.
<
< THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
< SPDX-License-Identifier: BSD-3-Clause
<
< Copyright (c) 1992, 1993
< The Regents of the University of California. All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. Neither the name of the University nor the names of its contributors
< may be used to endorse or promote products derived from this software
< without specific prior written permission.
<
< THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
< SPDX-License-Identifier: BSD-3-Clause
<
9135,9232d8735
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. Neither the name of the University nor the names of its contributors
< may be used to endorse or promote products derived from this software
< without specific prior written permission.
<
< THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
< ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
< IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
< ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
< FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
< DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
< OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
< HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
< LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
< OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
< SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
< SPDX-License-Identifier: BSD-3-Clause
<
< Copyright (c) 1998 Softweyr LLC. All rights reserved.
<
< strtok_r, from Berkeley strtok
< Oct 13, 1998 by Wes Peters <wes@softweyr.com>
<
< Copyright (c) 1988, 1993
< The Regents of the University of California. All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notices, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notices, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. Neither the name of the University nor the names of its contributors
< may be used to endorse or promote products derived from this software
< without specific prior written permission.
<
< THIS SOFTWARE IS PROVIDED BY SOFTWEYR LLC, THE REGENTS AND CONTRIBUTORS
< ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
< LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
< PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SOFTWEYR LLC, THE
< REGENTS, OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
< SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
< TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
< PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
< LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
< NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
< SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
< SPDX-License-Identifier: BSD-3-Clause
<
< Copyright (c) 1998 Todd C. Miller <Todd.Miller@courtesan.com>
< All rights reserved.
<
< Redistribution and use in source and binary forms, with or without
< modification, are permitted provided that the following conditions
< are met:
< 1. Redistributions of source code must retain the above copyright
< notice, this list of conditions and the following disclaimer.
< 2. Redistributions in binary form must reproduce the above copyright
< notice, this list of conditions and the following disclaimer in the
< documentation and/or other materials provided with the distribution.
< 3. The name of the author may not be used to endorse or promote products
< derived from this software without specific prior written permission.
<
< THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
< INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
< AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
< THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
< EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
< PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
< OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
< WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
< OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
< ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
<
< -------------------------------------------------------------------
<
< SPDX-License-Identifier: BSD-3-Clause
<
< Copyright (c) 1999
< David E. O'Brien
< Copyright (c) 1988, 1993
< The Regents of the University of California. All rights reserved.
diff -r android-ndk-r29/build/cmake/abis.cmake android-ndk-r30-beta1/build/cmake/abis.cmake
24c24
< set(NDK_ABI_riscv64_MIN_OS_VERSION "35")
---
> set(NDK_ABI_riscv64_MIN_OS_VERSION "36")
diff -r android-ndk-r29/build/cmake/android-legacy.toolchain.cmake android-ndk-r30-beta1/build/cmake/android-legacy.toolchain.cmake
158c158
< "https://android.googlesource.com/platform/ndk/+/master/docs/ClangMigration.md.")
---
> "https://android.googlesource.com/platform/ndk/+/mirror-goog-main-ndk/docs/ClangMigration.md.")
diff -r android-ndk-r29/build/cmake/android.toolchain.cmake android-ndk-r30-beta1/build/cmake/android.toolchain.cmake
169c169
< "https://android.googlesource.com/platform/ndk/+/master/docs/ClangMigration.md.")
---
> "https://android.googlesource.com/platform/ndk/+/mirror-goog-main-ndk/docs/ClangMigration.md.")
diff -r android-ndk-r29/build/cmake/platforms.cmake android-ndk-r30-beta1/build/cmake/platforms.cmake
2c2
< set(NDK_MAX_PLATFORM_LEVEL "35")
---
> set(NDK_MAX_PLATFORM_LEVEL "36")
diff -r android-ndk-r29/build/core/abis.mk android-ndk-r30-beta1/build/core/abis.mk
24c24
< NDK_ABI_riscv64_MIN_OS_VERSION := 35
---
> NDK_ABI_riscv64_MIN_OS_VERSION := 36
diff -r android-ndk-r29/build/core/platforms.mk android-ndk-r30-beta1/build/core/platforms.mk
2c2
< NDK_MAX_PLATFORM_LEVEL := 35
---
> NDK_MAX_PLATFORM_LEVEL := 36
diff -r android-ndk-r29/build/core/setup-app-platform.mk android-ndk-r30-beta1/build/core/setup-app-platform.mk
125c125
< https://android.googlesource.com/platform/ndk/+/master/docs/user/common_problems.md \
---
> https://developer.android.com/ndk/guides/common-problems \
diff -r android-ndk-r29/build/core/setup-app.mk android-ndk-r30-beta1/build/core/setup-app.mk
68c68
< $(call __ndk_info,See https://android.googlesource.com/platform/ndk/+/master/docs/HardFloatAbi.md)
---
> $(call __ndk_info,See https://android.googlesource.com/platform/ndk/+/mirror-goog-main-ndk/docs/HardFloatAbi.md)
diff -r android-ndk-r29/build/core/version.mk android-ndk-r30-beta1/build/core/version.mk
1c1
< NDK_MAJOR := 29
---
> NDK_MAJOR := 30
3c3
< NDK_BETA := 0
---
> NDK_BETA := 1
diff -r android-ndk-r29/meta/abis.json android-ndk-r30-beta1/meta/abis.json
30c30
< "min_os_version": 35
---
> "min_os_version": 36
diff -r android-ndk-r29/meta/platforms.json android-ndk-r30-beta1/meta/platforms.json
3c3
< "max": 35,
---
> "max": 36,
Binary files android-ndk-r29/prebuilt/darwin-x86_64/bin/make and android-ndk-r30-beta1/prebuilt/darwin-x86_64/bin/make differ
Binary files android-ndk-r29/prebuilt/darwin-x86_64/bin/ndkgdb.pyz and android-ndk-r30-beta1/prebuilt/darwin-x86_64/bin/ndkgdb.pyz differ
Binary files android-ndk-r29/prebuilt/darwin-x86_64/bin/ndkstack.pyz and android-ndk-r30-beta1/prebuilt/darwin-x86_64/bin/ndkstack.pyz differ
Binary files android-ndk-r29/prebuilt/darwin-x86_64/bin/vsyasm and android-ndk-r30-beta1/prebuilt/darwin-x86_64/bin/vsyasm differ
Binary files android-ndk-r29/prebuilt/darwin-x86_64/bin/yasm and android-ndk-r30-beta1/prebuilt/darwin-x86_64/bin/yasm differ
Binary files android-ndk-r29/prebuilt/darwin-x86_64/bin/ytasm and android-ndk-r30-beta1/prebuilt/darwin-x86_64/bin/ytasm differ
diff -r android-ndk-r29/prebuilt/darwin-x86_64/include/libyasm-stdint.h android-ndk-r30-beta1/prebuilt/darwin-x86_64/include/libyasm-stdint.h
5c5
< /* generated using /Volumes/Android/buildbot/src/android/ndk-r29-release/prebuilts/clang/host/darwin-x86/clang-r563880c/bin/clang --target=x86_64-apple-darwin -mmacosx-version-min=10.9 -DMACOSX_DEPLOYMENT_TARGET=10.9 -isysroot/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -Wl,-syslibroot,/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -mlinker-version=711 -L/Volumes/Android/buildbot/src/android/ndk-r29-release/prebuilts/clang/host/darwin-x86/clang-r563880c/lib -Os -fomit-frame-pointer -w -s */
---
> /* generated using /Volumes/Android/buildbot/src/googleplex-android/ndk-r30-release/prebuilts/clang/host/darwin-x86/clang-r574158/bin/clang --target=x86_64-apple-darwin -mmacosx-version-min=10.9 -DMACOSX_DEPLOYMENT_TARGET=10.9 -isysroot/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -Wl,-syslibroot,/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -mlinker-version=1015.7 -L/Volumes/Android/buildbot/src/googleplex-android/ndk-r30-release/prebuilts/clang/host/darwin-x86/clang-r574158/lib -Wl,-rpath,/usr/lib -Os -fomit-frame-pointer -w -s */
Binary files android-ndk-r29/prebuilt/darwin-x86_64/lib/libyasm.a and android-ndk-r30-beta1/prebuilt/darwin-x86_64/lib/libyasm.a differ
Only in android-ndk-r29: python-packages
Binary files android-ndk-r29/shader-tools/darwin-x86_64/glslc and android-ndk-r30-beta1/shader-tools/darwin-x86_64/glslc differ
Binary files android-ndk-r29/shader-tools/darwin-x86_64/spirv-as and android-ndk-r30-beta1/shader-tools/darwin-x86_64/spirv-as differ
Binary files android-ndk-r29/shader-tools/darwin-x86_64/spirv-cfg and android-ndk-r30-beta1/shader-tools/darwin-x86_64/spirv-cfg differ
Binary files android-ndk-r29/shader-tools/darwin-x86_64/spirv-dis and android-ndk-r30-beta1/shader-tools/darwin-x86_64/spirv-dis differ
Binary files android-ndk-r29/shader-tools/darwin-x86_64/spirv-link and android-ndk-r30-beta1/shader-tools/darwin-x86_64/spirv-link differ
Binary files android-ndk-r29/shader-tools/darwin-x86_64/spirv-opt and android-ndk-r30-beta1/shader-tools/darwin-x86_64/spirv-opt differ
Binary files android-ndk-r29/shader-tools/darwin-x86_64/spirv-reduce and android-ndk-r30-beta1/shader-tools/darwin-x86_64/spirv-reduce differ
Binary files android-ndk-r29/shader-tools/darwin-x86_64/spirv-val and android-ndk-r30-beta1/shader-tools/darwin-x86_64/spirv-val differ
diff -r android-ndk-r29/simpleperf/ChangeLog android-ndk-r30-beta1/simpleperf/ChangeLog
0a1,18
> Build 14523392 (Dec 2, 2025)
> Support simpleperf back to Android 10 (for NDK issue #2198). Please use the simpleperf
> executable from previous releases for Android < 10.
> record command:
> Fix several issues of choosing between multiple ETR/TRBE sinks.
> inject command:
> Fix several issues of converting ETM data to AutoFDO for kernel modules and apps.
> stat command:
> Record thread name records for --per-thread.
> Fix --monitor-new-thread for multi-process apps.
> report scripts:
> Add --no-demangle to report raw symbols.
> Document:
> Add doc for using tracepoint/kprobe/uprobe events.
> Update doc for recording ETM data for AutoFDO for kernel.
> Add doc for recording ETM data for AutoFDO for Apps and Kernel modules.
>
>
diff -r android-ndk-r29/simpleperf/app_profiler.py android-ndk-r30-beta1/simpleperf/app_profiler.py
372c372
< if self.app_versioncode:
---
> if self.app_versioncode and self.android_version >= 12:
Binary files android-ndk-r29/simpleperf/bin/android/arm/simpleperf and android-ndk-r30-beta1/simpleperf/bin/android/arm/simpleperf differ
Binary files android-ndk-r29/simpleperf/bin/android/arm64/simpleperf and android-ndk-r30-beta1/simpleperf/bin/android/arm64/simpleperf differ
Binary files android-ndk-r29/simpleperf/bin/android/riscv64/simpleperf and android-ndk-r30-beta1/simpleperf/bin/android/riscv64/simpleperf differ
Binary files android-ndk-r29/simpleperf/bin/android/x86/simpleperf and android-ndk-r30-beta1/simpleperf/bin/android/x86/simpleperf differ
Binary files android-ndk-r29/simpleperf/bin/android/x86_64/simpleperf and android-ndk-r30-beta1/simpleperf/bin/android/x86_64/simpleperf differ
Binary files android-ndk-r29/simpleperf/bin/darwin/x86_64/libsimpleperf_report.dylib and android-ndk-r30-beta1/simpleperf/bin/darwin/x86_64/libsimpleperf_report.dylib differ
Binary files android-ndk-r29/simpleperf/bin/darwin/x86_64/simpleperf and android-ndk-r30-beta1/simpleperf/bin/darwin/x86_64/simpleperf differ
diff -r android-ndk-r29/simpleperf/doc/android_platform_profiling.md android-ndk-r30-beta1/simpleperf/doc/android_platform_profiling.md
121a122,198
>
> ## Use tracepoint/kprobe/uprobe events to get callstacks for certain functions
>
> Simpleperf supports tracepoint events, kprobe events and uprobe events. tracepoint events are
> predefined locations in the kernel source code that act as hooks for tracing. kprobe events allow
> adding dynamic tracepoints for kernel functions. uprobe events allow adding dynamic tracepoints
> for userspace binary functions.
>
> Through simpleperf, we can get callstacks when certain tracepoint, kprobe or uprobe events happen.
> This can help us understand why these events happen. The following are some examples.
>
> ```sh
> # We need `adb root` to monitor tracepoint/kprobe/uprobe events.
> (host) $ adb root
> # List all available tracepoint events.
> (device) $ simpleperf list tracepoint
> # Show options for kprobe/uprobe events.
> (device) $ simpleperf record --help
> --kprobe kprobe_event1,kprobe_event2,...
> Add kprobe events during recording. The kprobe_event format is in
> Documentation/trace/kprobetrace.rst in the kernel. Examples:
> 'p:myprobe do_sys_openat2 $arg2:string' - add event kprobes:myprobe
> 'r:myretprobe do_sys_openat2 $retval:s64' - add event kprobes:myretprobe
> --uprobe uprobe_event1,uprobe_event2,...
> Add uprobe events during recording. The uprobe_event format is in
> Documentation/trace/uprobetracer.rst in the kernel. Examples:
> 'p:myprobe /system/lib64/libc.so:0x1000'
> - add event uprobes:myprobe
> 'r:myretprobe /system/lib64/libc.so:0x1000'
> - add event uprobes:myretprobe
> -e event1[:modifier1],event2[:modifier2],...
> Select a list of events to record. An event can be:
> 1) an event name listed in `simpleperf list`;
> 2) a raw PMU event in rN format. N is a hex number.
> For example, r1b selects event number 0x1b.
> 3) a kprobe event added by --kprobe option.
> 4) a uprobe event added by --uprobe option.
> Modifiers can be added to define how the event should be
> monitored. Possible modifiers are:
> u - monitor user space events only
> k - monitor kernel space events only
>
> # To use tracepoint events, use -e to monitor them.
> # Example: Trace sched_process_exec event for system wide for 10 seconds, recording callstack and
> # field values (field values provide details of the event).
> (device) $ simpleperf record -e sched:sched_process_exec -g --duration 10 -a
> # The callstacks can be viewed by profiler UIs in `view_the_profile.md`.
> (host)$ report_html.py -i perf.data
> # The field values can be viewed by `report_sample.py` or `simpleperf dump`.
> (host)$ report_sample.py -i perf.data --show_tracing_data
>
> # To use kprobe events, use --kprobe to add kprobe events, and use -e to monitor them.
> # Example: Trace each sys_open syscall for command `sleep 1`, recording callstack, file path and
> # return value.
> (device)$ simpleperf record --kprobe \
> 'p:open do_sys_openat2 $arg2:string,r:open_ret do_sys_openat2 $retval:s64' \
> -e kprobes:open,kprobes:open_ret -g -m 4096 sleep 1
> # The callstacks can be viewed by profiler UIs in `view_the_profile.md`.
> (host)$ report_html.py -i perf.data
> # The field values (file path and return value) can be viewed by `report_sample.py` or
> # `simpleperf dump`.
> (host)$ report_sample.py -i perf.data --show_tracing_data
>
> # To use uprobe events, use --uprobe to add uprobe events, and use -e to monitor them.
> # uprobe events need one manual step to convert symbol names to virtual addresses in the ELF file.
> # Hopefully we can automate it in simpleperf in the future.
> # Example: Trace pthread_mutex_lock and pthread_mutex_unlock for command `sleep 1`, recording
> # callstack.
> (device) $ readelf -sW /system/lib64/libc.so | grep pthread_mutex
> 222: 0000000000084db0 232 FUNC GLOBAL DEFAULT 15 pthread_mutex_lock
> 1031: 0000000000085260 316 FUNC GLOBAL DEFAULT 15 pthread_mutex_unlock
> (device) $ /system/bin/simpleperf record --uprobe \
> "p:pthread_mutex_lock /system/lib64/libc.so:0x84db0,p:pthread_mutex_unlock /system/lib64/libc.so:0x85260" \
> -e uprobes:pthread_mutex_lock,uprobes:pthread_mutex_unlock -g -m 4096 sleep 1
> # The callstacks can be viewed by profiler UIs in `view_the_profile.md`.
> (host) $ gecko_profile_generator.py -i perf.data | gzip > gecko-profile.json.gz
> ```
Only in android-ndk-r30-beta1/simpleperf/doc: collect_autofdo_profile_for_app.md
diff -r android-ndk-r29/simpleperf/doc/collect_etm_data_for_autofdo.md android-ndk-r30-beta1/simpleperf/doc/collect_etm_data_for_autofdo.md
97c97,98
< host $ create_llvm_prof -profile perf_inject_binary1.data -profiler text -binary path_of_binary1 -out a.prof -format binary
---
> host $ create_llvm_prof -profile perf_inject_binary1.data -profiler text -binary path_of_binary1 \
> -out a.prof -format extbinary
100c101,102
< host $ create_llvm_prof -profile perf_inject_kernel.data -profiler text -binary vmlinux -out a.prof -format binary
---
> host $ create_llvm_prof -profile perf_inject_kernel.data -profiler text -binary vmlinux \
> -out a.prof -format extbinary --prof_sym_list=false
142c144,145
< (host) <AOSP>$ simpleperf inject -i branch_list.data -o perf_inject_etm_test_loop.data --symdir out/target/product/generic_arm64/symbols/system/bin
---
> (host) <AOSP>$ simpleperf inject -i branch_list.data -o perf_inject_etm_test_loop.data \
> --symdir out/target/product/generic_arm64/symbols/system/bin
152c155,157
< (host) <AOSP>$ create_llvm_prof -profile perf_inject_etm_test_loop.data -profiler text -binary out/target/product/generic_arm64/symbols/system/bin/etm_test_loop -out etm_test_loop.afdo -format binary
---
> (host) <AOSP>$ create_llvm_prof -profile perf_inject_etm_test_loop.data -profiler text \
> -binary out/target/product/generic_arm64/symbols/system/bin/etm_test_loop \
> -out etm_test_loop.afdo -format extbinary
254a260,262
> # The --prof_sym_list=false flag is important for kernel profiles. Without it, clang
> # assumes any function not listed in the profile is cold. This can lead to unwanted
> # deoptimizations, even when -fprofile-sample-accurate is not enabled.
256c264
< --out=kernel.llvm_profdata --format extbinary
---
> --out=kernel.llvm_profdata --prof_sym_list=false --format extbinary
266a275,495
> ### A complete example: kernel module using DDK
>
> This example demonstrates how to collect ETM data for an Android kernel module on a device, convert
> it to an AutoFDO profile on the host machine, and then use that profile to build an optimized
> kernel module. We use the zram kernel module as an example. zram is built using [DDK](https://android.googlesource.com/kernel/build/+/refs/heads/main/kleaf/docs/ddk/main.md).
>
> **Step 1: Build zram with debug info for profiling**
>
> Ensure your `kernel/build` includes the necessary AutoFDO support.
> - For `android15-6.6`, apply patches from https://android-review.googlesource.com/c/kernel/build/+/3790006/, https://android-review.googlesource.com/c/kernel/build/+/3790007 and https://android-review.googlesource.com/c/kernel/build/+/3790008.
> - For `android16-6.12`, apply a patch from https://android-review.googlesource.com/c/kernel/build/+/3790008.
>
> Next, enable profiling debug info in `zram/BUILD.bazel`.
>
> ```bazel
> # drivers/block/zram/BUILD.bazel
> ddk_module(
> name = "zram",
> srcs = [
> "zcomp.c",
> "zram_drv.c",
> ],
> out = "zram.ko",
> ...
> debug_info_for_profiling = True, # Add this line
> )
> ```
>
> Finally, build the kernel and flash the resulting image to your device. The unstripped `zram.ko`
> will only be used on the host during the profile generation step.
>
> **Step 2: Collect ETM data for zram.ko on device**
>
> We can record ETM data for the whole kernel, but it's more effective to use an ETM address filter
> to record ETM data only for `zram.ko`.
>
> ```sh
> (host) $ adb root && adb shell
> # Find address range for zram kernel module
> (device) $ setprop security.lower_kptr_restrict 1
> (device) $ cat /proc/sys/kernel/kptr_restrict
> 0
> (device) $ cat /proc/modules | grep zram
> zram 61440 4 Live 0xffffffe3f682c000 (O)
>
> # The zram module address range is 0xffffffe3f682c000 - (0xffffffe3f682c000 + 61440).
> # We can use an address filter to record ETM data only for instructions in that address range.
> # While running `simpleperf record`, exercise the device so that the zram kernel module is used.
> (device) $ simpleperf record -e cs-etm:k -a --log verbose --addr-filter "filter 0xffffffe3f682c000-0xffffffe3f683b000" -o /data/local/tmp/perf.data -z --duration 10 --no-dump-symbols
> ...
> 10-02 14:19:56.258 23149 23149 I simpleperf: cmd_record.cpp:904 Aux data traced: 48,820,784
> 10-02 14:19:56.259 23149 23149 I simpleperf: cmd_record.cpp:896 Record compressed: 6.37 MB (original 143.09 MB, ratio 22)
> 10-02 14:19:56.259 23149 23149 D simpleperf: cmd_record.cpp:974 Prepare recording time 0.506453 s, recording time 10.0094 s, stop recording time 0.0240939 s, post process time 3.68489 s.
> 10-02 14:19:56.259 23149 23149 D simpleperf: command.cpp:291 command 'record' finished successfully
>
> # Pull perf.data on host.
> (host) $ adb pull /data/local/tmp/perf.data
> (host) $ simpleperf dump --dump-etm packet >dump.txt
> (host) $ cat dump.txt
> # Check if dump.txt has the following items:
> # Item 1: Map info and symbol addresses in memory for zram.ko
> record kernel_symbol: type 32769, misc 0x0, size 12408412
> kallsyms: ffffffe3f7000000 T _text
> ...
> record mmap: type 1, misc 0x1, size 88
> pid 4294967295, tid 0, addr 0xffffffe3f682c000, len 0xf000
> pgoff 0x0, filename [zram]
> sample_id: pid 0, tid 0
> sample_id: time 0
> sample_id: id 66
> sample_id: cpu 0, res 0
> # Item 2: ETM data for zram.ko
> Idx:4765; ID:1e; [0x95 0x88 0x69 ]; I_ADDR_S_IS0 : Address, Short, IS0.; Addr=0xFFFFFFE3F682D220 ~[0xD220]
> Idx:4768; ID:1e; [0xfe ]; I_ATOM_F3 : Atom format 3.; NEE
> Idx:4769; ID:1e; [0x95 0x0e ]; I_ADDR_S_IS0 : Address, Short, IS0.; Addr=0xFFFFFFE3F682D238 ~[0x38]
> Idx:4771; ID:1e; [0xff ]; I_ATOM_F3 : Atom format 3.; EEE
> Idx:4772; ID:1e; [0x95 0x3b ]; I_ADDR_S_IS0 : Address, Short, IS0.; Addr=0xFFFFFFE3F682D2EC ~[0xEC]
> # Item 3: Build Id of zram.ko
> record build_id: type 67, misc 0x1, size 100
> pid 4294967295
> build_id 0x849418f220e69bec4dbff9254bbf34126011bd7f
> filename [zram]
> ```
>
> To get good coverage, you usually need to record multiple times and record for a longer duration
> each time. Multiple `perf.data` files can be merged into one when converting them to an AutoFDO
> profile, for example: `simpleperf inject -i perf1.data,perf2.data`.
>
> **Step 3: Convert ETM data to AutoFDO Profile on Host**
>
> We need the unstripped `zram.ko` with a .debug_line section and the same build id as the one
> running on the device. Put it in a directory, such as `unstripped`.
>
> ```sh
> $ ls unstripped
> zram.ko
> # Check .debug_line section.
> $ readelf -SW unstripped/zram.ko
> [47] .debug_line PROGBITS 0000000000000000 08c530 002e03 00 C 0 0 1
> # Check if build id matches the one recorded in perf.data.
> $ readelf -n unstripped/zram.ko
> Build ID: 849418f220e69bec4dbff9254bbf34126011bd7f
>
> # Run `simpleperf inject` on the host to convert perf.data to an AutoFDO input file. It's fine
> # to use multiple perf.data input files.
> $ simpleperf inject -i perf.data -o perf_inject_zram.data --symdir unstripped
>
> # It's also fine to convert perf.data to branch list files on the device or host to reduce file size.
> $ simpleperf inject -i perf.data -o branch_zram.data --output branch-list
> $ simpleperf inject -i branch_zram.data -o perf_inject_zram.data --symdir unstripped
>
> # Check that we have a non-empty AutoFDO input file.
> # The addresses here are file offsets for instructions.
> $ cat perf_inject_zram.data
> 185
> 2cb8-2cc0:101
> 2cbc-2cc0:9
> 2cc0-2cc0:7
> 2cc4-2ce0:227
> 2cc8-2ce0:3
> 2cd4-2ce0:5
> 2cd8-2ce0:2
> 2ce8-2ce8:238
> ...
> 7038->34bc:303
> 7038->7038:159066
> 7048->703c:158773
> 7048->7080:159363
> 7074->2000:318312
> // build_id: 0x849418f220e69bec4dbff9254bbf34126011bd7f
> // [zram]
> ```
>
> The stock `create_llvm_prof` tool from the AutoFDO repository requires a patch to correctly process
> relocatable kernel modules (which lack program headers).
> - Download AutoFDO source code at https://github.com/google/autofdo.git.
> - Apply the following patch:
>
> ```diff
> diff --git a/addr2line.cc b/addr2line.cc
> index f8fe964..0951a3d 100644
> --- a/addr2line.cc
> +++ b/addr2line.cc
> @@ -87,8 +87,17 @@ void LLVMAddr2line::GetInlineStack(uint64_t address, SourceStack *stack) const {
> llvm::SmallVector<llvm::DWARFDie, 4> InlinedChain;
> cu_iter->second->getInlinedChainForAddress(address, InlinedChain);
>
> - uint32_t row_index = line_table->lookupAddress(
> - {address, llvm::object::SectionedAddress::UndefSection});
> + uint64_t section_index = llvm::object::SectionedAddress::UndefSection;
> + for (const auto& sec : getObject()->sections()) {
> + if (!sec.isText() || sec.isVirtual())
> + continue;
> + if (address >= sec.getAddress()
> + && address < sec.getAddress() + sec.getSize()) {
> + section_index = sec.getIndex();
> + break;
> + }
> + }
> + uint32_t row_index = line_table->lookupAddress({address, section_index});
> uint32_t file = (row_index == -1U ? -1U : line_table->Rows[row_index].File);
> uint32_t line = (row_index == -1U ? 0 : line_table->Rows[row_index].Line);
> uint32_t discriminator =
> diff --git a/symbol_map.cc b/symbol_map.cc
> index 2483835..b744f58 100644
> --- a/symbol_map.cc
> +++ b/symbol_map.cc
> @@ -478,6 +478,14 @@ void SymbolMap::ReadLoadableExecSegmentInfo(bool is_kernel) {
> << " vaddr=" << info.vaddr;
> }
> }
> +
> + if (si_vec.empty()) {
> + // It may be a kernel module. Create a fake segment for .text section.
> + ElfReader::SectionInfo info;
> + if (elf_reader.GetSectionInfoByName(".text", &info) != nullptr) {
> + add_loadable_exec_segment(info.offset, info.addr);
> + }
> + }
> }
>
> void SymbolMap::BuildSymbolMap() {
> ```
>
> Then we can build and run `create_llvm_prof`.
>
> ```sh
> # Run `create_llvm_prof` on host to convert AutoFDO input file to AutoFDO profile.
> $ create_llvm_prof -profile perf_inject_zram.data -profiler text \
> -binary unstripped/zram.ko \
> -out zram.afdo -format extbinary
>
> # Run `llvm_profdata` and verify profile.
> $ llvm-profdata show --sample --hot-func-list zram.afdo >zram_hot_functions.txt
> ...
> 6 out of 14 functions with profile (42.86%) are considered hot functions (max sample >= 157184).
> 131343543 out of 131415332 profile counts (99.95%) are from hot functions.
> Total sample (%) Max sample Entry sample Function name
> 37461663 (28.51%) 491516 165517 zram_slot_free_notify
> 27545033 (20.96%) 635040 159363 rvh_swap_readpage_bdev_sync
> 23754944 (18.08%) 317922 28 zcomp_decompress
>
> ```
>
> **Step 4: Use the AutoFDO Profile when Building zram.ko**
>
> Copy zram.afdo to the source directory of zram. Modify zram/BUILD.bazel.
>
> ```bazel
> ddk_module(
> name = "zram",
> srcs = [
> "zcomp.c",
> "zram_drv.c",
> ],
> out = "zram.ko",
> ...
> debug_info_for_profiling = True, # Add this line to add debug info for profiling.
> autofdo_profile = "zram.afdo", # Add this line to apply AutoFDO profile.
> )
> ```
diff -r android-ndk-r29/simpleperf/doc/sample_filter.md android-ndk-r30-beta1/simpleperf/doc/sample_filter.md
11c11,12
< can be generated by `sample_filter.py`, and passed to report scripts via `--filter-file`.
---
> can be generated by `sample_filter.py` or `sample_filter_for_perfetto_trace.py`, and passed to
> report scripts via `--filter-file`.
diff -r android-ndk-r29/simpleperf/doc/scripts_reference.md android-ndk-r30-beta1/simpleperf/doc/scripts_reference.md
400a401,424
>
> ## sample_filter_for_perfetto_trace.py
>
> `sample_filter_for_perfetto_trace.py` generates sample filter files as documented in
> [sample_filter.md](https://android.googlesource.com/platform/system/extras/+/refs/heads/main/simpleperf/doc/sample_filter.md).
>
> This script reads a Perfetto trace file, finds all events matching a specified regular
> expression, and then generates a sample filter file containing the time ranges of those events.
> You can then use this filter file with other tools, like `pprof_proto_generator.py`, to analyze
> performance samples that occurred only during those specific time ranges. This is useful for
> focusing on periods of interest within a larger trace.
>
> ```sh
> # Example: Filter samples based on a specific event in a Perfetto trace.
> $ sample_filter_for_perfetto_trace.py trace.perfetto-trace \
> --event-filter-regex "CriticalEventRegex"
> # Now use the generated filter.txt with another script.
> $ pprof_proto_generator.py --filter-file filter.txt
>
> # Example: Use --global-event to create a single time range covering all matching events.
> $ sample_filter_for_perfetto_trace.py trace.perfetto-trace \
> --event-filter-regex "GlobalCriticalEventRegex" --global-event
> $ pprof_proto_generator.py --filter-file filter.txt
> ```
diff -r android-ndk-r29/simpleperf/pprof_proto_generator.py android-ndk-r30-beta1/simpleperf/pprof_proto_generator.py
295c295,299
< self.read_elf = ReadElf(self.config['ndk_path'])
---
> readelf_path = ToolFinder.find_tool_path('llvm-readelf', self.config['ndk_path'])
> if not readelf_path:
> self.read_elf = None
> else:
> self.read_elf = ReadElf(self.config['ndk_path'], readelf_path)
diff -r android-ndk-r29/simpleperf/proto/branch_list.proto android-ndk-r30-beta1/simpleperf/proto/branch_list.proto
41a42,49
> message KernelModuleInfo {
> uint64 memory_start = 1;
> uint64 memory_end = 2;
> string memory_symbol_name = 3;
> uint64 memory_symbol_addr = 4;
> uint64 memory_symbol_len = 5;
> }
>
77a86
> KernelModuleInfo kernel_module_info = 6;
diff -r android-ndk-r29/simpleperf/sample_filter.py android-ndk-r30-beta1/simpleperf/sample_filter.py
19a20
> Filter file format is shown in docs/sample_filter.md.
Only in android-ndk-r30-beta1/simpleperf: sample_filter_for_perfetto_trace.py
diff -r android-ndk-r29/simpleperf/simpleperf_report_lib.py android-ndk-r30-beta1/simpleperf/simpleperf_report_lib.py
303a304,305
> if options.no_demangle:
> report_lib.DisableDemangle()
337a340
> self._DisableDemangleFunc = self._lib.DisableDemangle
500a504,507
>
> def DisableDemangle(self):
> """ Don't demangle symbol names. """
> self._DisableDemangleFunc(self.getInstance())
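The `simpleperf_report_lib.py` hunk above plumbs a new `--no-demangle` option through to a `DisableDemangle` entry point in the native report library. A toy stand-in (not the real ctypes wrapper; the mangled-name mapping is made up) showing how such a toggle changes the symbol names a report returns:

```python
class FakeReportLib:
    """Toy model of a report library with a demangle toggle.
    The mangled -> demangled mapping below is a hypothetical example."""

    _demangled = {"_ZN3foo3barEv": "foo::bar()"}

    def __init__(self):
        self._demangle = True

    def DisableDemangle(self):
        """Don't demangle symbol names, mirroring the new API in the diff."""
        self._demangle = False

    def GetSymbolName(self, raw_name: str) -> str:
        if self._demangle:
            return self._demangled.get(raw_name, raw_name)
        return raw_name
```

In the real script, the flag is applied once at startup (`if options.no_demangle: report_lib.DisableDemangle()`), so every symbol fetched afterwards comes back in its raw mangled form.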
diff -r android-ndk-r29/simpleperf/simpleperf_utils.py android-ndk-r30-beta1/simpleperf/simpleperf_utils.py
86c86
< dirname = os.path.join(get_script_dir(), 'bin')
---
> in_dir_path = Path('bin')
92c92
< dirname = os.path.join(dirname, 'windows')
---
> in_dir_path = in_dir_path / 'windows'
96c96
< dirname = os.path.join(dirname, 'darwin')
---
> in_dir_path = in_dir_path / 'darwin'
98,103c98,109
< dirname = os.path.join(dirname, 'linux')
< dirname = os.path.join(dirname, 'x86_64' if sys.maxsize > 2 ** 32 else 'x86')
< binary_path = os.path.join(dirname, binary_name)
< if not os.path.isfile(binary_path):
< log_fatal("can't find binary: %s" % binary_path)
< return binary_path
---
> in_dir_path = in_dir_path / 'linux'
> in_dir_path = in_dir_path / ('x86_64' if sys.maxsize > 2 ** 32 else 'x86') / binary_name
> # First search in <script_dir>/bin directory.
> path1 = Path(get_script_dir()) / in_dir_path
> if path1.is_file():
> return str(path1)
> # Then check sys.path[0]. When we are built into binaries like pprof_proto_generator,
> # the bin directory is put in sys.path[0].
> path2 = Path(sys.path[0]) / in_dir_path
> if path2.is_file():
> return str(path2)
> log_fatal(f"can't find binary: {path1}")
433c439
< def __init__(self, binary_cache_dir: Optional[Union[Path, str]], readelf: ReadElf):
---
> def __init__(self, binary_cache_dir: Optional[Union[Path, str]], readelf: Optional[ReadElf]):
473a480,481
> if not self.readelf:
> return True
863c871,872
< match = re.match(r'^\s*([0-9A-Fa-f]+):', line)
---
> # Exclude C:\ on Windows.
> match = re.match(r'^\s*([0-9A-Fa-f]+):[^\\]', line)
1016,1019c1025,1031
< def __init__(self, ndk_path: Optional[str]):
< self.readelf_path = ToolFinder.find_tool_path('llvm-readelf', ndk_path)
< if not self.readelf_path:
< log_exit("Can't find llvm-readelf. " + NDK_ERROR_MESSAGE)
---
> def __init__(self, ndk_path: Optional[str], readelf_path: Optional[str] = None):
> if readelf_path:
> self.readelf_path = readelf_path
> else:
> self.readelf_path = ToolFinder.find_tool_path('llvm-readelf', ndk_path)
> if not self.readelf_path:
> log_exit("Can't find llvm-readelf. " + NDK_ERROR_MESSAGE)
1176a1189
> no_demangle: bool
1213a1227,1228
> parser.add_argument('--no-demangle', action='store_true', help="""
> Don't demangle symbol names""")
1300c1315,1316
< namespace.proguard_mapping_file, sample_filters, namespace.aggregate_threads)
---
> namespace.proguard_mapping_file, sample_filters, namespace.aggregate_threads,
> namespace.no_demangle)
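The `simpleperf_utils.py` hunk above replaces the `os.path.join` chain with a `Path`-relative lookup that is tried first under the script directory and then under `sys.path[0]` (where the `bin` directory lands when the scripts are built into standalone binaries). A self-contained sketch of that fallback search (the directory layout below is a made-up example):

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional

def find_binary(search_dirs: Iterable[str], rel_path: Path) -> Optional[str]:
    """Return the first existing file at rel_path under any search dir,
    mirroring the <script_dir>/bin then sys.path[0] fallback in the diff."""
    for base in search_dirs:
        candidate = Path(base) / rel_path
        if candidate.is_file():
            return str(candidate)
    return None
```

The ordering matters: a binary shipped next to the script wins over one bundled into `sys.path[0]`, and only when both probes miss does the real code call `log_fatal` with the first (script-relative) path.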
diff -r android-ndk-r29/source.properties android-ndk-r30-beta1/source.properties
2,4c2,4
< Pkg.Revision = 29.0.14206865
< Pkg.BaseRevision = 29.0.14206865
< Pkg.ReleaseName = r29
---
> Pkg.Revision = 30.0.14904198-beta1
> Pkg.BaseRevision = 30.0.14904198
> Pkg.ReleaseName = r30-beta1
diff -r android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/AndroidVersion.txt android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/AndroidVersion.txt
2c2
< based on r563880c
---
> based on r574158
diff -r android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/BUILD_INFO android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/BUILD_INFO
4,5c4,5
< "bid": "13989888",
< "branch": "git_llvm-r563880-release",
---
> "bid": "14475230",
> "branch": "git_llvm-r574158-release",
9c9
< "device_dir": "/buildbot/src/googleplex-android/llvm-r563880-release",
---
> "device_dir": "/buildbot/src/googleplex-android/llvm-r574158-release",
14c14
< "hostname": "wphb30.hot.corp.google.com",
---
> "hostname": "wphb22.hot.corp.google.com",
18c18
< "out_dir": "/buildbot/src/googleplex-android/llvm-r563880-release/out",
---
> "out_dir": "/buildbot/src/googleplex-android/llvm-r574158-release/out",
22c22
< "revision": "llvm-r563880-release",
---
> "revision": "llvm-r574158-release",
34,35c34,35
< "revision": "5e96669f06077099aa41290cdb4c5e6fa0f59349",
< "upstream": "llvm-r563880-release"
---
> "revision": "9f872551d3c681d06fd303b36f16ed5c274735eb",
> "upstream": "llvm-r574158-release"
45,46c45,46
< "revision": "1dab3288f660d43a6cb2479107e2b54b3ab0a2a1",
< "upstream": "llvm-r563880-release"
---
> "revision": "7f66b813d8e47b8e2be564f40dd9de357380a640",
> "upstream": "llvm-r574158-release"
57c57
< "upstream": "llvm-r563880-release"
---
> "upstream": "llvm-r574158-release"
67,68c67,68
< "revision": "e4ed541c00706c0108c57921ac4b95ca98e87ec5",
< "upstream": "llvm-r563880-release"
---
> "revision": "9b4f94761eea83b6edd1b59f485cddff6bd01b38",
> "upstream": "llvm-r574158-release"
79,80c79,80
< "revision": "a127940854eda28ef583009f8d5fd886df2c669f",
< "upstream": "llvm-r563880-release"
---
> "revision": "be300c3034da29d175071ca5dc7844eee8e40281",
> "upstream": "llvm-r574158-release"
92c92
< "upstream": "llvm-r563880-release"
---
> "upstream": "llvm-r574158-release"
104c104
< "upstream": "llvm-r563880-release"
---
> "upstream": "llvm-r574158-release"
116c116
< "upstream": "llvm-r563880-release"
---
> "upstream": "llvm-r574158-release"
127,128c127,128
< "revision": "33bb76fd08d10f224866f45e5b036da4b2973443",
< "upstream": "llvm-r563880-release"
---
> "revision": "2cfa919699338b68540f47a9ba767f08acaca4fc",
> "upstream": "llvm-r574158-release"
144,145c144,145
< "revision": "5a1ccbb7e1dda94e32b2e119d2c7c56e44fd232c",
< "upstream": "llvm-r563880-release"
---
> "revision": "52eddc7e88c3d6346b80207e7306a9a95581c1f9",
> "upstream": "llvm-r574158-release"
161,162c161,162
< "revision": "ed0315e2a72296bd40e11edf5c112dec349ab5e1",
< "upstream": "llvm-r563880-release"
---
> "revision": "248e5987f7a24f37ec2f10d4dfa78ee46f001223",
> "upstream": "llvm-r574158-release"
172,173c172,173
< "revision": "7ce0c0484513223124cb8c4f036ebd7bd18a61f1",
< "upstream": "llvm-r563880-release"
---
> "revision": "90212406d77e2b80b0ef0cb6ebcf2f4baf81e02f",
> "upstream": "llvm-r574158-release"
184c184
< "upstream": "llvm-r563880-release"
---
> "upstream": "llvm-r574158-release"
195c195
< "upstream": "llvm-r563880-release"
---
> "upstream": "llvm-r574158-release"
206c206
< "upstream": "llvm-r563880-release"
---
> "upstream": "llvm-r574158-release"
217c217
< "upstream": "llvm-r563880-release"
---
> "upstream": "llvm-r574158-release"
228c228
< "upstream": "llvm-r563880-release"
---
> "upstream": "llvm-r574158-release"
240c240
< "upstream": "llvm-r563880-release"
---
> "upstream": "llvm-r574158-release"
252c252
< "upstream": "llvm-r563880-release"
---
> "upstream": "llvm-r574158-release"
264c264
< "upstream": "llvm-r563880-release"
---
> "upstream": "llvm-r574158-release"
275,276c275,276
< "revision": "c89b9c5bc6375ebab012e4f1135d53e3c80621f0",
< "upstream": "llvm-r563880-release"
---
> "revision": "b1e73ae4d5409c740061a8fb656756f226b1ee08",
> "upstream": "llvm-r574158-release"
287,288c287,288
< "revision": "3795f4c8985f2c16f070561ad7d3ce8a318b43cb",
< "upstream": "llvm-r563880-release"
---
> "revision": "bdb138fc1316dc53132b11a03174bf6d2ecd5a00",
> "upstream": "llvm-r574158-release"
299,300c299,300
< "revision": "c6208a1b217cc688adb1b7617ff3fde3ba769ea1",
< "upstream": "llvm-r563880-release"
---
> "revision": "be252985c50c8ab59496ee072ff705dfd509b26c",
> "upstream": "llvm-r574158-release"
312c312
< "upstream": "llvm-r563880-release"
---
> "upstream": "llvm-r574158-release"
322,323c322,323
< "revision": "2895b391eb6c83e54f98aabc4c09dc00c9e97c99",
< "upstream": "llvm-r563880-release"
---
> "revision": "4711a2ba64d282d68380edd8780127f111e5c7ff",
> "upstream": "llvm-r574158-release"
327c327
< "revision": "26ecbe1b0d4cf3683f8eb3929629e208336fc1a9"
---
> "revision": "7c6435c5176c270d285033c777fbefa30d330fd2"
349,355c349,355
< "platform/external/toolchain-utils": "e4ed541c00706c0108c57921ac4b95ca98e87ec5",
< "platform/external/zstd": "7ce0c0484513223124cb8c4f036ebd7bd18a61f1",
< "platform/manifest": "26ecbe1b0d4cf3683f8eb3929629e208336fc1a9",
< "platform/prebuilts/build-tools": "33bb76fd08d10f224866f45e5b036da4b2973443",
< "platform/prebuilts/clang/host/darwin-x86": "c89b9c5bc6375ebab012e4f1135d53e3c80621f0",
< "platform/prebuilts/clang/host/linux-x86": "3795f4c8985f2c16f070561ad7d3ce8a318b43cb",
< "platform/prebuilts/clang/host/windows-x86": "c6208a1b217cc688adb1b7617ff3fde3ba769ea1",
---
> "platform/external/toolchain-utils": "9b4f94761eea83b6edd1b59f485cddff6bd01b38",
> "platform/external/zstd": "90212406d77e2b80b0ef0cb6ebcf2f4baf81e02f",
> "platform/manifest": "7c6435c5176c270d285033c777fbefa30d330fd2",
> "platform/prebuilts/build-tools": "2cfa919699338b68540f47a9ba767f08acaca4fc",
> "platform/prebuilts/clang/host/darwin-x86": "b1e73ae4d5409c740061a8fb656756f226b1ee08",
> "platform/prebuilts/clang/host/linux-x86": "bdb138fc1316dc53132b11a03174bf6d2ecd5a00",
> "platform/prebuilts/clang/host/windows-x86": "be252985c50c8ab59496ee072ff705dfd509b26c",
358c358
< "platform/prebuilts/gcc/linux-x86/host/x86_64-linux-glibc2.17-4.8": "a127940854eda28ef583009f8d5fd886df2c669f",
---
> "platform/prebuilts/gcc/linux-x86/host/x86_64-linux-glibc2.17-4.8": "be300c3034da29d175071ca5dc7844eee8e40281",
360,361c360,361
< "platform/prebuilts/go/darwin-x86": "5a1ccbb7e1dda94e32b2e119d2c7c56e44fd232c",
< "platform/prebuilts/go/linux-x86": "ed0315e2a72296bd40e11edf5c112dec349ab5e1",
---
> "platform/prebuilts/go/darwin-x86": "52eddc7e88c3d6346b80207e7306a9a95581c1f9",
> "platform/prebuilts/go/linux-x86": "248e5987f7a24f37ec2f10d4dfa78ee46f001223",
365c365
< "platform/tools/repohooks": "2895b391eb6c83e54f98aabc4c09dc00c9e97c99",
---
> "platform/tools/repohooks": "4711a2ba64d282d68380edd8780127f111e5c7ff",
367,368c367,368
< "toolchain/llvm-project": "5e96669f06077099aa41290cdb4c5e6fa0f59349",
< "toolchain/llvm_android": "1dab3288f660d43a6cb2479107e2b54b3ab0a2a1",
---
> "toolchain/llvm-project": "9f872551d3c681d06fd303b36f16ed5c274735eb",
> "toolchain/llvm_android": "7f66b813d8e47b8e2be564f40dd9de357380a640",
372c372
< "repo-init-branch": "llvm-r563880-release",
---
> "repo-init-branch": "llvm-r574158-release",
377,378c377,378
< "sync_finish_time": 1756145797.202256,
< "sync_start_time": 1756145695.6007302,
---
> "sync_finish_time": 1763698978.701304,
> "sync_start_time": 1763698873.770916,
383c383
< "clang-13989888-darwin-x86-builders.tar.xz",
---
> "manifest_14475230.xml",
384a385,386
> "clang-14475230-darwin-x86.tar.xz",
> "clang-14475230-darwin-x86-builders.tar.xz",
387,388d388
< "clang-13989888-darwin-x86.tar.xz",
< "manifest_13989888.xml",
406c406
< "dist-dir": "/buildbot/dist_dirs/git_llvm-r563880-release-mac_arm64-llvm_darwin_mac/13989888",
---
> "dist-dir": "/buildbot/dist_dirs/git_llvm-r574158-release-mac_arm64-llvm_darwin_mac/14475230",
416,418c416,418
< "storage_path": "/bigstore/android-build/builds/git_llvm-r563880-release-mac_arm64-llvm_darwin_mac/13989888/cd24ca7521c55f8a3cea9cc88eaa0b4e1539b73dc2db8cdc3fc551bb10fab0dd/1",
< "target_finish_time": 1756157176.1026921,
< "target_start_time": 1756145798.4471948,
---
> "storage_path": "/bigstore/android-build/builds/git_llvm-r574158-release-mac_arm64-llvm_darwin_mac/14475230/6f7367653857c4ead2e4ed6c466bff169f565c4984df33f336854562185f55b2/1",
> "target_finish_time": 1763710690.729491,
> "target_start_time": 1763699017.68132,
427,428c427,428
< "containerId": "L92900030016987580",
< "creationTimeMillis": "1756145586927",
---
> "containerId": "L44300030041579446",
> "creationTimeMillis": "1763698816656",
430c430
< "attemptId": "PzrHddTTNscSNtsh7opVkQ==",
---
> "attemptId": "0oDoiatPcqTlRLLwIKASZQ==",
433,435c433,435
< "displayMessage": "Build 13989888 for node L92900030016987580:N86200030258551819 has been inserted",
< "messageString": "Build 13989888 for node L92900030016987580:N86200030258551819 has been inserted",
< "timeMillis": "1756145682104"
---
> "displayMessage": "Build 14475230 for node L44300030041579446:N57700030397567424 has been inserted",
> "messageString": "Build 14475230 for node L44300030041579446:N57700030397567424 has been inserted",
> "timeMillis": "1763698863802"
438,440c438,440
< "displayMessage": "Build 13989888 for node L92900030016987580:N86200030258551819 has been popped",
< "messageString": "Build 13989888 for node L92900030016987580:N86200030258551819 has been popped",
< "timeMillis": "1756145692946"
---
> "displayMessage": "Build 14475230 for node L44300030041579446:N57700030397567424 has been popped",
> "messageString": "Build 14475230 for node L44300030041579446:N57700030397567424 has been popped",
> "timeMillis": "1763698870516"
443c443
< "startTimeMillis": "1756145677213"
---
> "startTimeMillis": "1763698859686"
446c446
< "id": "L92900030016987580:N86200030258551819",
---
> "id": "L44300030041579446:N57700030397567424",
449c449
< "neighborId": "L92900030016987580:N04500030258551816"
---
> "neighborId": "L44300030041579446:N41300030397567421"
453c453
< "lastUpdatedMillis": "1756145692984",
---
> "lastUpdatedMillis": "1763698870567",
459c459
< "revision": "hwloUrvursew5kDN4vm1aQ==",
---
> "revision": "o9p8pnPK3kJKoOuFgIdcEw==",
464c464
< "timestampMillis": "1756145586927"
---
> "timestampMillis": "1763698816656"
468c468
< "timestampMillis": "1756145676084"
---
> "timestampMillis": "1763698859030"
472c472
< "timestampMillis": "1756145677213"
---
> "timestampMillis": "1763698859686"
478c478
< "branch": "git_llvm-r563880-release",
---
> "branch": "git_llvm-r574158-release",
482,483c482,483
< "buildId": "13989888",
< "gerritPollerTimestamp": "1756145569255",
---
> "buildId": "14475230",
> "gerritPollerTimestamp": "1763698741895",
500c500
< "bbcpDepotContextCl": 783172273,
---
> "bbcpDepotContextCl": 789777983,
Only in android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin: aarch64-linux-android36-clang
Only in android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin: aarch64-linux-android36-clang++
Only in android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin: armv7a-linux-androideabi36-clang
Only in android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin: armv7a-linux-androideabi36-clang++
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang++ and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang++ differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang-21 and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang-21 differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang-check and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang-check differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang-cl and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang-cl differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang-format and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang-format differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang-scan-deps and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang-scan-deps differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang-tidy and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang-tidy differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/clangd and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/clangd differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/dsymutil and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/dsymutil differ
diff -r android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/git-clang-format android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/git-clang-format
22c22
< Requires Python 2.7 or Python 3
---
> Requires Python version >=3.8
35,37c35
< usage = (
< "git clang-format [OPTIONS] [<commit>] [<commit>|--staged] [--] [<file>...]"
< )
---
> usage = "git clang-format [OPTIONS] [<commit>] [<commit>|--staged] [--] [<file>...]"
108a107
> "cl", # OpenCL
118c117,118
< "json", # Json
---
> "json",
> "ipynb", # Json
237,240c237
< die(
< "--diff_from_common_commit is only allowed when two commits are "
< "given"
< )
---
> die("--diff_from_common_commit is only allowed when two commits are given")
386,389c383
< die(
< "`%s` is a %s, but a commit or filename was expected"
< % (value, object_type)
< )
---
> die("`%s` is a %s, but a commit or filename was expected" % (value, object_type))
464,466c458
< matches.setdefault(filename, []).append(
< Range(start_line, line_count)
< )
---
> matches.setdefault(filename, []).append(Range(start_line, line_count))
781,783c773
< unstaged_files = run(
< "git", "diff-files", "--name-status", *changed_files
< )
---
> unstaged_files = run("git", "diff-files", "--name-status", *changed_files)
786,787c776
< "The following files would be modified but have unstaged "
< "changes:",
---
> "The following files would be modified but have unstaged changes:",
829,831c818
< print(
< "`%s` printed to stderr:" % " ".join(args), file=sys.stderr
< )
---
> print("`%s` printed to stderr:" % " ".join(args), file=sys.stderr)
837,839c824
< print(
< "`%s` returned %s" % (" ".join(args), p.returncode), file=sys.stderr
< )
---
> print("`%s` returned %s" % (" ".join(args), p.returncode), file=sys.stderr)
Only in android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin: i686-linux-android36-clang
Only in android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin: i686-linux-android36-clang++
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/ld and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/ld differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/ld.lld and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/ld.lld differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/ld64.lld and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/ld64.lld differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/lld and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/lld differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/lld-link and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/lld-link differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/lldb and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/lldb differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/lldb-argdumper and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/lldb-argdumper differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-addr2line and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-addr2line differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-ar and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-ar differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-as and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-as differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-bolt and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-bolt differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-cfi-verify and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-cfi-verify differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-config and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-config differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-cov and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-cov differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-cxxfilt and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-cxxfilt differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-dis and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-dis differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-dlltool and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-dlltool differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-dwarfdump and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-dwarfdump differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-dwp and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-dwp differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-ifs and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-ifs differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-lib and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-lib differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-link and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-link differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-lipo and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-lipo differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-ml and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-ml differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-modextract and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-modextract differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-nm and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-nm differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-objcopy and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-objcopy differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-objdump and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-objdump differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-profdata and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-profdata differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-ranlib and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-ranlib differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-rc and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-rc differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-readelf and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-readelf differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-readobj and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-readobj differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-size and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-size differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-strings and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-strings differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-strip and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-strip differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-symbolizer and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-symbolizer differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-windres and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/llvm-windres differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/merge-fdata and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/merge-fdata differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/perf2bolt and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/perf2bolt differ
Only in android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin: riscv64-linux-android35-clang
Only in android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin: riscv64-linux-android35-clang++
Only in android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin: riscv64-linux-android36-clang
Only in android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin: riscv64-linux-android36-clang++
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/sancov and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/sancov differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/sanstats and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/sanstats differ
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/wasm-ld and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/wasm-ld differ
Only in android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin: x86_64-linux-android36-clang
Only in android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin: x86_64-linux-android36-clang++
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/bin/yasm and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/bin/yasm differ
diff -r android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/clang_source_info.md android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/clang_source_info.md
1c1
< Base revision: [386af4a5c64ab75eaee2448dc38f2e34a40bfed0](https://github.com/llvm/llvm-project/commits/386af4a5c64ab75eaee2448dc38f2e34a40bfed0)
---
> Base revision: [2d287f51eff2a5fbf84458a33f7fb2493cf67965](https://github.com/llvm/llvm-project/commits/2d287f51eff2a5fbf84458a33f7fb2493cf67965)
3,48c3,52
< - [Add-cmake-c-cxx-asm-linker-flags-v2.patch](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/Add-cmake-c-cxx-asm-linker-flags-v2.patch)
< - [Add-stubs-and-headers-for-nl_types-APIs-v2.patch](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/Add-stubs-and-headers-for-nl_types-APIs-v2.patch)
< - [BOLT-Increase-max-allocation-size-to-allow-BOLTing-clang-and-rustc.patch](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/BOLT-Increase-max-allocation-size-to-allow-BOLTing-clang-and-rustc.patch)
< - [Diagnose the code with trailing comma in the function call.](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/922f339c4ef3631f66dc4b8caa4c356103dbf69d.patch)
< - [Disable-integer-sanitizer-for-__libcpp_blsr.patch](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/Disable-integer-sanitizer-for-__libcpp_blsr.patch)
< - [Disable-std-utilities-charconv-charconv.msvc-test.pa.patch](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/Disable-std-utilities-charconv-charconv.msvc-test.pa.patch)
< - [Disable-vfork-fork-events-v2.patch](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/Disable-vfork-fork-events-v2.patch)
< - [Fix Windows build (#137414)](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/935bc84158e933239047de69b9edc77969b5c70c.patch)
< - [Fix connecting via abstract socket (#136466)](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/488eeb3ae508221f8e476bbc9d2e9f014542862e.patch)
< - [Fix restoring callee-saves from FP with hazard padding (#143371)](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/d8e8ab79773f739c602c5869f80c6c5b5962c558.patch)
< - [Revert "[libc++] Reduce std::conjunction overhead (#124259)"](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/0227396417d4625bc93affdd8957ff8d90c76299.patch)
< - [Revert global widening transform (#144652)](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/183acdd27985afd332463e3d9fd4a2ca46d85cf1.patch)
< - [Revert-Driver-Allow-target-override-containing-.-in-executable-name-v2.patch](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/Revert-Driver-Allow-target-override-containing-.-in-executable-name-v2.patch)
< - [Revert-Recommit-DAGCombiner-Transform-icmp-eq-ne-and.patch](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/Revert-Recommit-DAGCombiner-Transform-icmp-eq-ne-and.patch)
< - [Revert-libc-Don-t-implement-stdatomic.h-before-C-23-.patch](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/Revert-libc-Don-t-implement-stdatomic.h-before-C-23-.patch)
< - [Skip tests if socket name is longer than 107 bytes (#137405)](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/6c78dedc14e7431aa0dd92b9dd8d35bed3e0ed7d.patch)
< - [ThinLTOBitcodeWriter: Emit __cfi_check to full LTO part of](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/ff85dbdf6b399eac7bffa13e579f0f5e6edac3c0.patch)
< - [[AArch64] Disallow vscale x 1 partial reductions (#125252)](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/c7995a6905f2320f280013454676f992a8c6f89f.patch)
< - [[AArch64] Fix op mask detection in](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/2c43479683651f0eb208c97bf12e49bacbea4e6f.patch)
< - [[AArch64][DAG] Allow fptos/ui.sat to scalarized. (#126799)](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/bf7af2d12e3bb8c7bc322ed1c5bf4e9904ad409c.patch)
< - [[AArch64][SME2] Don't preserve ZT0 around SME ABI routines](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/107260cc29368070bba815d94f9d5b7cec1df7d0.patch)
< - [[AArch64][SME] Allow spills of ZT0 around SME ABI routines](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/8c7a2ce01a77c96028fe2c8566f65c45ad9408d3.patch)
< - [[AArch64][SME] Fix accessing the emergency spill slot with](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/b5cf03033251a642b184b2e0ea6bdac171c17702.patch)
< - [[ARM] Speedups for CombineBaseUpdate. (#129725)](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/86cf4ed7e9510a6828e95e8b36893eec116c9cf9-v2.patch)
< - [[Clang] Fix an integer overflow issue in computing CTAD's](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/b8d1f3d62746110ff0c969a136fc15f1d52f811d.patch)
< - [[Clang] Treat constexpr-unknown value as invalid in](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/27757fb87429c89a65bb5e1f619ad700928db0fd.patch)
< - [[ELF,RISCV] Fix oscillation due to call relaxation](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/8957e64a20fc7f4277565c6cfe3e555c119783ce.patch)
< - [[HWASan] fix missing BTI attribute on personality function](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/a76cf062a57097ad7971325551854bd5f3d38d94.patch)
< - [[InstCombine] Check nowrap flags when folding comparison of](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/9725595f3acc0c1aaa354e15ac4ee2b1f8ff4cc9.patch)
< - [[LLD][ELF][AArch64] Discard .ARM.attributes sections](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/ba476d0b83dc8a4bbf066dc02a0f73ded27114f0.patch)
< - [[MTE] decide whether to tag global in AsmPrinter (#135891)](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/9ed4c705ac1c5c8797f328694f6cd22fbcdae03b.patch)
< - [[MTE] do not tag zero sized globals (#136020)](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/6bac20b391edce2bde348e59f5be2143157304b5.patch)
< - [[MemCpyOpt] Fix clobber check in fca2memcpy optimization](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/5da9044c40840187330526ca888290a95927a629.patch)
< - [[VPlan] Only use SCEV for live-ins in tryToWiden. (#125436)](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/30f3752e54fa7cd595a434a985efbe9a7abe9b65.patch)
< - [[clang] Remove hasValue() check in](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/8b3d4bdf8bade1d1faa8ff3fcbdda7060f8b46d8.patch)
< - [[libc++] Add tests for the ABI break introduced by switching](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/2a83cf5d0e592890f74c5d5ff4a30ae4cf54b61b.patch)
< - [[libc++] Also provide an alignment assumption for vector in](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/ccb08b9dab7d829f8d9703d8b46b98e2d6717d0e.patch)
< - [[libc++] Expand Android libc++ test config files (#142846)](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/13fe07d670e8a115929c9e595c4490ef5c75f583.patch)
< - [[libc++] Fix ABI break introduced by switching to](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/f5e687d7bf49cd9fe38ba7acdeb52d4f30468dee.patch)
< - [[libc++] Fix padding calculation for function reference types](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/769c42f4a552a75c8c38870ddc1b50d2ea874e4e.patch)
< - [[libc++] Fix stray usage of _LIBCPP_HAS_NO_WIDE_CHARACTERS on](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/bcfd9f81e1bc9954d616ffbb8625099916bebd5b.patch)
< - [[libc++] Reduce the dependency of the locale base API on the](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/f00b32e2d0ee666d32f1ddd0c687e269fab95b44.patch)
< - [[libc++][TZDB] Fixes %z escaping. (#125399)](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/a27f3b2bb137001735949549354aff89dbf227f4.patch)
< - [[lld] Merge equivalent symbols found during ICF (#134342)](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/cherry/8389d6fad76bd880f02bddce7f0f2612ff0afc40.patch)
< - [compiler-rt-Allow-finding-LLVMConfig-if-CMAKE_FIND_ROOT_PATH_MODE_PACKAGE-is-set-to-ONLY.patch](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/compiler-rt-Allow-finding-LLVMConfig-if-CMAKE_FIND_ROOT_PATH_MODE_PACKAGE-is-set-to-ONLY.patch)
< - [move-cxa-demangle-into-libcxxdemangle.patch](https://android.googlesource.com/toolchain/llvm_android/+/1dab3288f660d43a6cb2479107e2b54b3ab0a2a1/patches/move-cxa-demangle-into-libcxxdemangle.patch)
\ No newline at end of file
---
> - [Add-cmake-c-cxx-asm-linker-flags-v2.patch](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/Add-cmake-c-cxx-asm-linker-flags-v2.patch)
> - [Add-stubs-and-headers-for-nl_types-APIs-v2.patch](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/Add-stubs-and-headers-for-nl_types-APIs-v2.patch)
> - [BOLT-Increase-max-allocation-size-to-allow-BOLTing-clang-and-rustc.patch](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/BOLT-Increase-max-allocation-size-to-allow-BOLTing-clang-and-rustc.patch)
> - [Cap IntRange::Width to MaxWidth (#145356)](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/5bbe1536dfa6f1dce8737e466c209c553d614e50.patch)
> - [Disable-integer-sanitizer-for-__libcpp_blsr.patch](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/Disable-integer-sanitizer-for-__libcpp_blsr.patch)
> - [Disable-std-utilities-charconv-charconv.msvc-test.pa.patch](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/Disable-std-utilities-charconv-charconv.msvc-test.pa.patch)
> - [Disable-unsigned-integer-overflow-checks-for-s.patch](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/Disable-unsigned-integer-overflow-checks-for-s.patch)
> - [Disable-vfork-fork-events-v2.patch](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/Disable-vfork-fork-events-v2.patch)
> - [Fix an error introduced in #138518 (#142988)](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/f53216793e15588d65601196b7a0625f73c12cea.patch)
> - [Fix restoring callee-saves from FP with hazard padding (#143371)](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/d8e8ab79773f739c602c5869f80c6c5b5962c558.patch)
> - [Fix-for-x86-ndk-sysroot-define.patch](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/Fix-for-x86-ndk-sysroot-define.patch)
> - [Reapply "IR: Remove uselist for constantdata (#137313)"](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/9383fb23e18bb983d0024fb956a0a724ef9eb03d.patch)
> - [Remove delayed typo expressions (#143423)](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/9eef4d1c5fa6b1bcbbe675c14ca8301d5d346f7b.patch)
> - [Revert "[HIP] use offload wrapper for non-device-only non-rdc](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/f5e499a3383c1e3b9f60e60151075e8d9c1c3166.patch)
> - [Revert "[clang][Dependency Scanning] Report What a Module](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/8f4fd864033601aad99a10c2b878769b84df7537.patch)
> - [Revert "[lld] Merge equivalent symbols found during ICF](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/fd3fecfc0936703f2715fe6fea890e81b0b3c2ac.patch)
> - [Revert global widening transform (#144652)](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/183acdd27985afd332463e3d9fd4a2ca46d85cf1-v2.patch)
> - [Revert-Recommit-DAGCombiner-Transform-icmp-eq-ne-and.patch](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/Revert-Recommit-DAGCombiner-Transform-icmp-eq-ne-and.patch)
> - [Revert-libc-Don-t-implement-stdatomic.h-before-C-23-v3.patch](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/Revert-libc-Don-t-implement-stdatomic.h-before-C-23-v3.patch)
> - [ThinLTOBitcodeWriter: Emit __cfi_check to full LTO part of](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/ff85dbdf6b399eac7bffa13e579f0f5e6edac3c0.patch)
> - [ThinLTOBitcodeWriter: Split modules with __cfi_check and no](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/43c85afce9c25141de79da6731b1d5f8bb2491b1.patch)
> - [ValueMapper: Delete unused initializers of replaced appending](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/1f38d49ebe96417e368a567efa4d650b8a9ac30f.patch)
> - [[AArch64] Ensure the LR is preserved if we must call](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/af7166a3f126ce4e4d2a05eccc1358bd0427cf0f.patch)
> - [[AArch64] Restrict .variant_pcs directive to ELF targets](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/3aaf44f95de24339d73c0093576a4a3cc42404ad.patch)
> - [[AArch64][SME] Disable tail calls for callees that require](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/bfb54e8ba6262a509343985c018f9a8d52963734.patch)
> - [[AArch64][SME] Fix accessing the emergency spill slot with](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/b5cf03033251a642b184b2e0ea6bdac171c17702.patch)
> - [[AArch64][SME] Precommit tests for LUT4I `Chain` issues (NFC)](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/cd0f560cc7e88bffedc4c34e3eb3efbf00dcb3ef.patch)
> - [[AArch64][SME] Preserve `Chain` when selecting multi-vector](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/031fb7414fd6edf20e0cd7f7783666313169a0d2.patch)
> - [[C] Fix a false-positive with tentative defn compat (#139738)](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/e3f87e15910a5f1c5552fc3ef57e7dda3f68901a.patch)
> - [[C] Handle comma operator for implicit int->enum conversions](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/b59ab701e94cce455a53358cbe5082a3efb58fbf.patch)
> - [[Clang] Fix a regression introduced by #140073 (#140288)](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/efa28338d858e1ea2bf705d50a0404bc602c8fe1.patch)
> - [[Clang] Fix an assertion in the resolution of perfect matches](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/090f46d8d246762401c41c5486dde299382d6c90.patch)
> - [[Clang] Preserve CXXParenListInitExpr in TreeTransform.](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/13926e149081ca2771bdea7c08c07d92d87f7818.patch)
> - [[Clang] Separate implicit int conversion on negation sign to](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/0ada5c7b1a52afb668bc42dd2d5573e5805433d1.patch)
> - [[CodeGenPrepare] Make sure instruction get from SunkAddrs is](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/59c6d70ed8120b8864e5f796e2bf3de5518a0ef0.patch)
> - [[ELF,RISCV] Fix oscillation due to call relaxation](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/8957e64a20fc7f4277565c6cfe3e555c119783ce.patch)
> - [[ELF] Postpone ASSERT error](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/5859863bab7fb1cd98b6028293cba6ba25f7d514.patch)
> - [[HWASan] fix missing BTI attribute on personality function](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/a76cf062a57097ad7971325551854bd5f3d38d94.patch)
> - [[LoopIdiomVectorize] Fix FindFirstByte successors (#156945)](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/2873d9fac550a248a1c62cfc106ac31fef84fa7d.patch)
> - [[clang] Fix nondeterminism in MemberPointerType (#137910)](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/0764f65a7fb06703610b33a86ca79025fa4050a4.patch)
> - [[clang] Remove hasValue() check in](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/8b3d4bdf8bade1d1faa8ff3fcbdda7060f8b46d8-v2.patch)
> - [[clang][ARM] Fix setting of MaxAtomicInlineWidth. (#151404)](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/558277ae4db665bea93686db4b4538c1c2c0cf4d.patch)
> - [[compiler-rt][AArch64] Don't use x18 in __arm_sme_save](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/149f91bad66972ad8bf0add5c79bf74055f6905a.patch)
> - [[libc++] Add tests for the ABI break introduced by switching](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/2a83cf5d0e592890f74c5d5ff4a30ae4cf54b61b.patch)
> - [[libc++] Expand Android libc++ test config files (#142846)](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/13fe07d670e8a115929c9e595c4490ef5c75f583.patch)
> - [[libc++] Fix ABI break introduced by switching to](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/f5e687d7bf49cd9fe38ba7acdeb52d4f30468dee.patch)
> - [[libc++] Fix padding calculation for function reference types](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/769c42f4a552a75c8c38870ddc1b50d2ea874e4e.patch)
> - [[libc][bazel] Re-enable memcpy prefetching on x86. (#138945)](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/cherry/98d26b8f67e6abdac24591138f07dc34e7f0e36e.patch)
> - [compiler-rt-Allow-finding-LLVMConfig-if-CMAKE_FIND_ROOT_PATH_MODE_PACKAGE-is-set-to-ONLY.patch](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/compiler-rt-Allow-finding-LLVMConfig-if-CMAKE_FIND_ROOT_PATH_MODE_PACKAGE-is-set-to-ONLY.patch)
> - [move-cxa-demangle-into-libcxxdemangle.patch](https://android.googlesource.com/toolchain/llvm_android/+/7f66b813d8e47b8e2be564f40dd9de357380a640/patches/move-cxa-demangle-into-libcxxdemangle.patch)
\ No newline at end of file
Binary files android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/lib/LLVMPolly.so and android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/lib/LLVMPolly.so differ
diff -r android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/__clang_cuda_intrinsics.h android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/__clang_cuda_intrinsics.h
518c518
< return __nvvm_redux_sync_add(__mask, __value);
---
> return __nvvm_redux_sync_add(__value, __mask);
522c522
< return __nvvm_redux_sync_umin(__mask, __value);
---
> return __nvvm_redux_sync_umin(__value, __mask);
526c526
< return __nvvm_redux_sync_umax(__mask, __value);
---
> return __nvvm_redux_sync_umax(__value, __mask);
529c529
< return __nvvm_redux_sync_min(__mask, __value);
---
> return __nvvm_redux_sync_min(__value, __mask);
532c532
< return __nvvm_redux_sync_max(__mask, __value);
---
> return __nvvm_redux_sync_max(__value, __mask);
535c535
< return __nvvm_redux_sync_or(__mask, __value);
---
> return __nvvm_redux_sync_or(__value, __mask);
539c539
< return __nvvm_redux_sync_and(__mask, __value);
---
> return __nvvm_redux_sync_and(__value, __mask);
543c543
< return __nvvm_redux_sync_xor(__mask, __value);
---
> return __nvvm_redux_sync_xor(__value, __mask);
diff -r android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/__clang_cuda_runtime_wrapper.h android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/__clang_cuda_runtime_wrapper.h
386a387
> #include "surface_indirect_functions.h"
diff -r android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/__clang_cuda_texture_intrinsics.h android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/__clang_cuda_texture_intrinsics.h
30a31
> #pragma push_macro("__OP_TYPE_SURFACE")
47a49,105
> #pragma push_macro("__SURF_WRITE_V2")
> #pragma push_macro("__SW_ASM_ARGS")
> #pragma push_macro("__SW_ASM_ARGS1")
> #pragma push_macro("__SW_ASM_ARGS2")
> #pragma push_macro("__SW_ASM_ARGS4")
> #pragma push_macro("__SURF_WRITE_V2")
> #pragma push_macro("__SURF_READ_V2")
> #pragma push_macro("__SW_ASM_ARGS")
> #pragma push_macro("__SW_ASM_ARGS1")
> #pragma push_macro("__SW_ASM_ARGS2")
> #pragma push_macro("__SW_ASM_ARGS4")
> #pragma push_macro("__SURF_READ1D");
> #pragma push_macro("__SURF_READ2D");
> #pragma push_macro("__SURF_READ3D");
> #pragma push_macro("__SURF_READ1DLAYERED");
> #pragma push_macro("__SURF_READ2DLAYERED");
> #pragma push_macro("__SURF_READCUBEMAP");
> #pragma push_macro("__SURF_READCUBEMAPLAYERED");
> #pragma push_macro("__1DV1");
> #pragma push_macro("__1DV2");
> #pragma push_macro("__1DV4");
> #pragma push_macro("__2DV1");
> #pragma push_macro("__2DV2");
> #pragma push_macro("__2DV4");
> #pragma push_macro("__1DLAYERV1");
> #pragma push_macro("__1DLAYERV2");
> #pragma push_macro("__1DLAYERV4");
> #pragma push_macro("__3DV1");
> #pragma push_macro("__3DV2");
> #pragma push_macro("__3DV4");
> #pragma push_macro("__2DLAYERV1");
> #pragma push_macro("__2DLAYERV2");
> #pragma push_macro("__2DLAYERV4");
> #pragma push_macro("__CUBEMAPV1");
> #pragma push_macro("__CUBEMAPV2");
> #pragma push_macro("__CUBEMAPV4");
> #pragma push_macro("__CUBEMAPLAYERV1");
> #pragma push_macro("__CUBEMAPLAYERV2");
> #pragma push_macro("__CUBEMAPLAYERV4");
> #pragma push_macro("__SURF_READXD_ALL");
> #pragma push_macro("__SURF_WRITE1D_V2");
> #pragma push_macro("__SURF_WRITE1DLAYERED_V2");
> #pragma push_macro("__SURF_WRITE2D_V2");
> #pragma push_macro("__SURF_WRITE2DLAYERED_V2");
> #pragma push_macro("__SURF_WRITE3D_V2");
> #pragma push_macro("__SURF_CUBEMAPWRITE_V2");
> #pragma push_macro("__SURF_CUBEMAPLAYEREDWRITE_V2");
> #pragma push_macro("__SURF_WRITEXD_V2_ALL");
> #pragma push_macro("__1DV1");
> #pragma push_macro("__1DV2");
> #pragma push_macro("__1DV4");
> #pragma push_macro("__2DV1");
> #pragma push_macro("__2DV2");
> #pragma push_macro("__2DV4");
> #pragma push_macro("__3DV1");
> #pragma push_macro("__3DV2");
> #pragma push_macro("__3DV4");
188a247,261
> // Tag structs to distinguish operation types
> struct __texture_op_tag {};
> struct __surface_op_tag {};
>
> // Template specialization to determine operation type based on tag value
> template <class __op> struct __op_type_traits {
> using type = __texture_op_tag;
> };
>
> // Specialize for known surface operation tags
> #define __OP_TYPE_SURFACE(__op) \
> template <> struct __op_type_traits<__op> { \
> using type = __surface_op_tag; \
> }
>
651a725,1020
> // There are a couple of layers here. First, __op_type_traits is used to
> // dispatch to either surface write calls, or to the texture read calls.
> //
> // Then, that dispatches to __tex_fetch_impl below, which dispatches by both tag
> // and datatype to the appropriate
> // __surf_read_write_v2.
> // TODO(austin): Do the reads too.
>
> // Mark which of the ids we should be dispatching to surface write calls.
> __OP_TYPE_SURFACE(__ID("__isurf1Dread"));
> __OP_TYPE_SURFACE(__ID("__isurf2Dread"));
> __OP_TYPE_SURFACE(__ID("__isurf3Dread"));
> __OP_TYPE_SURFACE(__ID("__isurf1DLayeredread"));
> __OP_TYPE_SURFACE(__ID("__isurf2DLayeredread"));
> __OP_TYPE_SURFACE(__ID("__isurfCubemapread"));
> __OP_TYPE_SURFACE(__ID("__isurfCubemapLayeredread"));
> __OP_TYPE_SURFACE(__ID("__isurf1Dwrite_v2"));
> __OP_TYPE_SURFACE(__ID("__isurf2Dwrite_v2"));
> __OP_TYPE_SURFACE(__ID("__isurf3Dwrite_v2"));
> __OP_TYPE_SURFACE(__ID("__isurf1DLayeredwrite_v2"));
> __OP_TYPE_SURFACE(__ID("__isurf2DLayeredwrite_v2"));
> __OP_TYPE_SURFACE(__ID("__isurfCubemapwrite_v2"));
> __OP_TYPE_SURFACE(__ID("__isurfCubemapLayeredwrite_v2"));
>
> template <class __op, typename __type> struct __surf_read_write_v2;
>
> // For the various write calls, we need to be able to generate variations with
> // different IDs, different numbers of arguments, and different numbers of
> // outputs.
>
> #define __SURF_WRITE_V2(__op, __asm_dim, __asmtype, __type, __index_op_args, \
> __index_args, __index_asm_args, __asm_op_args, \
> __asm_args) \
> template <> struct __surf_read_write_v2<__op, __type> { \
> static __device__ void __run(__type *__ptr, cudaSurfaceObject_t obj, \
> __L(__index_args), \
> cudaSurfaceBoundaryMode mode) { \
> switch (mode) { \
> case cudaBoundaryModeZero: \
> asm volatile("sust.b." __asm_dim "." __asmtype \
> ".zero [%0, " __index_op_args "], " __asm_op_args ";" \
> : \
> : "l"(obj), __L(__index_asm_args), __L(__asm_args)); \
> break; \
> case cudaBoundaryModeClamp: \
> asm volatile("sust.b." __asm_dim "." __asmtype \
> ".clamp [%0, " __index_op_args "], " __asm_op_args ";" \
> : \
> : "l"(obj), __L(__index_asm_args), __L(__asm_args)); \
> break; \
> case cudaBoundaryModeTrap: \
> asm volatile("sust.b." __asm_dim "." __asmtype \
> ".trap [%0, " __index_op_args "], " __asm_op_args ";" \
> : \
> : "l"(obj), __L(__index_asm_args), __L(__asm_args)); \
> break; \
> } \
> } \
> }
>
> #define __SURF_READ_V2(__op, __asm_dim, __asmtype, __type, __asm_op_args, \
> __asm_args, __index_args, __index_asm_args) \
> template <> struct __surf_read_write_v2<__op, __type> { \
> static __device__ void __run(__type *__ptr, cudaSurfaceObject_t obj, \
> __L(__index_args), \
> cudaSurfaceBoundaryMode mode) { \
> switch (mode) { \
> case cudaBoundaryModeZero: \
> asm("suld.b." __asm_dim "." __asmtype ".zero " __asm_op_args ";" \
> : __L(__asm_args) \
> : "l"(obj), __L(__index_asm_args)); \
> break; \
> case cudaBoundaryModeClamp: \
> asm("suld.b." __asm_dim "." __asmtype ".clamp " __asm_op_args ";" \
> : __L(__asm_args) \
> : "l"(obj), __L(__index_asm_args)); \
> break; \
> case cudaBoundaryModeTrap: \
> asm("suld.b." __asm_dim "." __asmtype ".trap " __asm_op_args ";" \
> : __L(__asm_args) \
> : "l"(obj), __L(__index_asm_args)); \
> break; \
> } \
> } \
> }
>
> // Amazing, the read side should follow the same flow, I just need to change the
> // generated assembly calls, and the rest should fall in line.
>
> #define __SW_ASM_ARGS(__type) (__type(*__ptr))
> #define __SW_ASM_ARGS1(__type) (__type(__ptr->x))
> #define __SW_ASM_ARGS2(__type) (__type(__ptr->x), __type(__ptr->y))
> #define __SW_ASM_ARGS4(__type) \
> (__type(__ptr->x), __type(__ptr->y), __type(__ptr->z), __type(__ptr->w))
>
> #define __SURF_READ1D(__asmtype, __type, __asm_op_args, __asm_args) \
> __SURF_READ_V2(__ID("__isurf1Dread"), "1d", __asmtype, __type, \
> __asm_op_args, __asm_args, (int x), ("r"(x)))
> #define __SURF_READ2D(__asmtype, __type, __asm_op_args, __asm_args) \
> __SURF_READ_V2(__ID("__isurf2Dread"), "2d", __asmtype, __type, \
> __asm_op_args, __asm_args, (int x, int y), ("r"(x), "r"(y)))
> #define __SURF_READ3D(__asmtype, __type, __asm_op_args, __asm_args) \
> __SURF_READ_V2(__ID("__isurf3Dread"), "3d", __asmtype, __type, \
> __asm_op_args, __asm_args, (int x, int y, int z), \
> ("r"(x), "r"(y), "r"(z)))
>
> #define __SURF_READ1DLAYERED(__asmtype, __type, __asm_op_args, __asm_args) \
> __SURF_READ_V2(__ID("__isurf1DLayeredread"), "a1d", __asmtype, __type, \
> __asm_op_args, __asm_args, (int x, int layer), \
> ("r"(x), "r"(layer)))
> #define __SURF_READ2DLAYERED(__asmtype, __type, __asm_op_args, __asm_args) \
> __SURF_READ_V2(__ID("__isurf2DLayeredread"), "a2d", __asmtype, __type, \
> __asm_op_args, __asm_args, (int x, int y, int layer), \
> ("r"(x), "r"(y), "r"(layer)))
> #define __SURF_READCUBEMAP(__asmtype, __type, __asm_op_args, __asm_args) \
> __SURF_READ_V2(__ID("__isurfCubemapread"), "a2d", __asmtype, __type, \
> __asm_op_args, __asm_args, (int x, int y, int face), \
> ("r"(x), "r"(y), "r"(face)))
> #define __SURF_READCUBEMAPLAYERED(__asmtype, __type, __asm_op_args, \
> __asm_args) \
> __SURF_READ_V2(__ID("__isurfCubemapLayeredread"), "a2d", __asmtype, __type, \
> __asm_op_args, __asm_args, (int x, int y, int layerface), \
> ("r"(x), "r"(y), "r"(layerface)))
>
> #define __1DV1 "{%0}, [%1, {%2}]"
> #define __1DV2 "{%0, %1}, [%2, {%3}]"
> #define __1DV4 "{%0, %1, %2, %3}, [%4, {%5}]"
>
> #define __2DV1 "{%0}, [%1, {%2, %3}]"
> #define __2DV2 "{%0, %1}, [%2, {%3, %4}]"
> #define __2DV4 "{%0, %1, %2, %3}, [%4, {%5, %6}]"
>
> #define __1DLAYERV1 "{%0}, [%1, {%3, %2}]"
> #define __1DLAYERV2 "{%0, %1}, [%2, {%4, %3}]"
> #define __1DLAYERV4 "{%0, %1, %2, %3}, [%4, {%6, %5}]"
>
> #define __3DV1 "{%0}, [%1, {%2, %3, %4, %4}]"
> #define __3DV2 "{%0, %1}, [%2, {%3, %4, %5, %5}]"
> #define __3DV4 "{%0, %1, %2, %3}, [%4, {%5, %6, %7, %7}]"
>
> #define __2DLAYERV1 "{%0}, [%1, {%4, %2, %3, %3}]"
> #define __2DLAYERV2 "{%0, %1}, [%2, {%5, %3, %4, %4}]"
> #define __2DLAYERV4 "{%0, %1, %2, %3}, [%4, {%7, %5, %6, %6}]"
>
> #define __CUBEMAPV1 "{%0}, [%1, {%4, %2, %3, %3}]"
> #define __CUBEMAPV2 "{%0, %1}, [%2, {%5, %3, %4, %4}]"
> #define __CUBEMAPV4 "{%0, %1, %2, %3}, [%4, {%7, %5, %6, %6}]"
>
> #define __CUBEMAPLAYERV1 "{%0}, [%1, {%4, %2, %3, %3}]"
> #define __CUBEMAPLAYERV2 "{%0, %1}, [%2, {%5, %3, %4, %4}]"
> #define __CUBEMAPLAYERV4 "{%0, %1, %2, %3}, [%4, {%7, %5, %6, %6}]"
>
> #define __SURF_READXD_ALL(__xdv1, __xdv2, __xdv4, __surf_readxd_v2) \
> __surf_readxd_v2("b8", char, __xdv1, __SW_ASM_ARGS("=h")); \
> __surf_readxd_v2("b8", signed char, __xdv1, __SW_ASM_ARGS("=h")); \
> __surf_readxd_v2("b8", char1, __xdv1, __SW_ASM_ARGS1("=h")); \
> __surf_readxd_v2("b8", unsigned char, __xdv1, __SW_ASM_ARGS("=h")); \
> __surf_readxd_v2("b8", uchar1, __xdv1, __SW_ASM_ARGS1("=h")); \
> __surf_readxd_v2("b16", short, __xdv1, __SW_ASM_ARGS("=h")); \
> __surf_readxd_v2("b16", short1, __xdv1, __SW_ASM_ARGS1("=h")); \
> __surf_readxd_v2("b16", unsigned short, __xdv1, __SW_ASM_ARGS("=h")); \
> __surf_readxd_v2("b16", ushort1, __xdv1, __SW_ASM_ARGS1("=h")); \
> __surf_readxd_v2("b32", int, __xdv1, __SW_ASM_ARGS("=r")); \
> __surf_readxd_v2("b32", int1, __xdv1, __SW_ASM_ARGS1("=r")); \
> __surf_readxd_v2("b32", unsigned int, __xdv1, __SW_ASM_ARGS("=r")); \
> __surf_readxd_v2("b32", uint1, __xdv1, __SW_ASM_ARGS1("=r")); \
> __surf_readxd_v2("b64", long long, __xdv1, __SW_ASM_ARGS("=l")); \
> __surf_readxd_v2("b64", longlong1, __xdv1, __SW_ASM_ARGS1("=l")); \
> __surf_readxd_v2("b64", unsigned long long, __xdv1, __SW_ASM_ARGS("=l")); \
> __surf_readxd_v2("b64", ulonglong1, __xdv1, __SW_ASM_ARGS1("=l")); \
> __surf_readxd_v2("b32", float, __xdv1, __SW_ASM_ARGS("=r")); \
> __surf_readxd_v2("b32", float1, __xdv1, __SW_ASM_ARGS1("=r")); \
> \
> __surf_readxd_v2("v2.b8", char2, __xdv2, __SW_ASM_ARGS2("=h")); \
> __surf_readxd_v2("v2.b8", uchar2, __xdv2, __SW_ASM_ARGS2("=h")); \
> __surf_readxd_v2("v2.b16", short2, __xdv2, __SW_ASM_ARGS2("=h")); \
> __surf_readxd_v2("v2.b16", ushort2, __xdv2, __SW_ASM_ARGS2("=h")); \
> __surf_readxd_v2("v2.b32", int2, __xdv2, __SW_ASM_ARGS2("=r")); \
> __surf_readxd_v2("v2.b32", uint2, __xdv2, __SW_ASM_ARGS2("=r")); \
> __surf_readxd_v2("v2.b64", longlong2, __xdv2, __SW_ASM_ARGS2("=l")); \
> __surf_readxd_v2("v2.b64", ulonglong2, __xdv2, __SW_ASM_ARGS2("=l")); \
> __surf_readxd_v2("v2.b32", float2, __xdv2, __SW_ASM_ARGS2("=r")); \
> \
> __surf_readxd_v2("v4.b8", char4, __xdv4, __SW_ASM_ARGS4("=h")); \
> __surf_readxd_v2("v4.b8", uchar4, __xdv4, __SW_ASM_ARGS4("=h")); \
> __surf_readxd_v2("v4.b16", short4, __xdv4, __SW_ASM_ARGS4("=h")); \
> __surf_readxd_v2("v4.b16", ushort4, __xdv4, __SW_ASM_ARGS4("=h")); \
> __surf_readxd_v2("v4.b32", int4, __xdv4, __SW_ASM_ARGS4("=r")); \
> __surf_readxd_v2("v4.b32", uint4, __xdv4, __SW_ASM_ARGS4("=r")); \
> __surf_readxd_v2("v4.b32", float4, __xdv4, __SW_ASM_ARGS4("=r"))
>
> __SURF_READXD_ALL(__1DV1, __1DV2, __1DV4, __SURF_READ1D);
> __SURF_READXD_ALL(__2DV1, __2DV2, __2DV4, __SURF_READ2D);
> __SURF_READXD_ALL(__3DV1, __3DV2, __3DV4, __SURF_READ3D);
> __SURF_READXD_ALL(__1DLAYERV1, __1DLAYERV2, __1DLAYERV4, __SURF_READ1DLAYERED);
> __SURF_READXD_ALL(__2DLAYERV1, __2DLAYERV2, __2DLAYERV4, __SURF_READ2DLAYERED);
> __SURF_READXD_ALL(__CUBEMAPV1, __CUBEMAPV2, __CUBEMAPV4, __SURF_READCUBEMAP);
> __SURF_READXD_ALL(__CUBEMAPLAYERV1, __CUBEMAPLAYERV2, __CUBEMAPLAYERV4,
> __SURF_READCUBEMAPLAYERED);
>
> #define __SURF_WRITE1D_V2(__asmtype, __type, __asm_op_args, __asm_args) \
> __SURF_WRITE_V2(__ID("__isurf1Dwrite_v2"), "1d", __asmtype, __type, "{%1}", \
> (int x), ("r"(x)), __asm_op_args, __asm_args)
> #define __SURF_WRITE1DLAYERED_V2(__asmtype, __type, __asm_op_args, __asm_args) \
> __SURF_WRITE_V2(__ID("__isurf1DLayeredwrite_v2"), "a1d", __asmtype, __type, \
> "{%2, %1}", (int x, int layer), ("r"(x), "r"(layer)), \
> __asm_op_args, __asm_args)
> #define __SURF_WRITE2D_V2(__asmtype, __type, __asm_op_args, __asm_args) \
> __SURF_WRITE_V2(__ID("__isurf2Dwrite_v2"), "2d", __asmtype, __type, \
> "{%1, %2}", (int x, int y), ("r"(x), "r"(y)), __asm_op_args, \
> __asm_args)
> #define __SURF_WRITE2DLAYERED_V2(__asmtype, __type, __asm_op_args, __asm_args) \
> __SURF_WRITE_V2(__ID("__isurf2DLayeredwrite_v2"), "a2d", __asmtype, __type, \
> "{%3, %1, %2, %2}", (int x, int y, int layer), \
> ("r"(x), "r"(y), "r"(layer)), __asm_op_args, __asm_args)
> #define __SURF_WRITE3D_V2(__asmtype, __type, __asm_op_args, __asm_args) \
> __SURF_WRITE_V2(__ID("__isurf3Dwrite_v2"), "3d", __asmtype, __type, \
> "{%1, %2, %3, %3}", (int x, int y, int z), \
> ("r"(x), "r"(y), "r"(z)), __asm_op_args, __asm_args)
>
> #define __SURF_CUBEMAPWRITE_V2(__asmtype, __type, __asm_op_args, __asm_args) \
> __SURF_WRITE_V2(__ID("__isurfCubemapwrite_v2"), "a2d", __asmtype, __type, \
> "{%3, %1, %2, %2}", (int x, int y, int face), \
> ("r"(x), "r"(y), "r"(face)), __asm_op_args, __asm_args)
> #define __SURF_CUBEMAPLAYEREDWRITE_V2(__asmtype, __type, __asm_op_args, \
> __asm_args) \
> __SURF_WRITE_V2(__ID("__isurfCubemapLayeredwrite_v2"), "a2d", __asmtype, \
> __type, "{%3, %1, %2, %2}", (int x, int y, int layerface), \
> ("r"(x), "r"(y), "r"(layerface)), __asm_op_args, __asm_args)
>
> #define __SURF_WRITEXD_V2_ALL(__xdv1, __xdv2, __xdv4, __surf_writexd_v2) \
> __surf_writexd_v2("b8", char, __xdv1, __SW_ASM_ARGS("h")); \
> __surf_writexd_v2("b8", signed char, __xdv1, __SW_ASM_ARGS("h")); \
> __surf_writexd_v2("b8", char1, __xdv1, __SW_ASM_ARGS1("h")); \
> __surf_writexd_v2("b8", unsigned char, __xdv1, __SW_ASM_ARGS("h")); \
> __surf_writexd_v2("b8", uchar1, __xdv1, __SW_ASM_ARGS1("h")); \
> __surf_writexd_v2("b16", short, __xdv1, __SW_ASM_ARGS("h")); \
> __surf_writexd_v2("b16", short1, __xdv1, __SW_ASM_ARGS1("h")); \
> __surf_writexd_v2("b16", unsigned short, __xdv1, __SW_ASM_ARGS("h")); \
> __surf_writexd_v2("b16", ushort1, __xdv1, __SW_ASM_ARGS1("h")); \
> __surf_writexd_v2("b32", int, __xdv1, __SW_ASM_ARGS("r")); \
> __surf_writexd_v2("b32", int1, __xdv1, __SW_ASM_ARGS1("r")); \
> __surf_writexd_v2("b32", unsigned int, __xdv1, __SW_ASM_ARGS("r")); \
> __surf_writexd_v2("b32", uint1, __xdv1, __SW_ASM_ARGS1("r")); \
> __surf_writexd_v2("b64", long long, __xdv1, __SW_ASM_ARGS("l")); \
> __surf_writexd_v2("b64", longlong1, __xdv1, __SW_ASM_ARGS1("l")); \
> __surf_writexd_v2("b64", unsigned long long, __xdv1, __SW_ASM_ARGS("l")); \
> __surf_writexd_v2("b64", ulonglong1, __xdv1, __SW_ASM_ARGS1("l")); \
> __surf_writexd_v2("b32", float, __xdv1, __SW_ASM_ARGS("r")); \
> __surf_writexd_v2("b32", float1, __xdv1, __SW_ASM_ARGS1("r")); \
> \
> __surf_writexd_v2("v2.b8", char2, __xdv2, __SW_ASM_ARGS2("h")); \
> __surf_writexd_v2("v2.b8", uchar2, __xdv2, __SW_ASM_ARGS2("h")); \
> __surf_writexd_v2("v2.b16", short2, __xdv2, __SW_ASM_ARGS2("h")); \
> __surf_writexd_v2("v2.b16", ushort2, __xdv2, __SW_ASM_ARGS2("h")); \
> __surf_writexd_v2("v2.b32", int2, __xdv2, __SW_ASM_ARGS2("r")); \
> __surf_writexd_v2("v2.b32", uint2, __xdv2, __SW_ASM_ARGS2("r")); \
> __surf_writexd_v2("v2.b64", longlong2, __xdv2, __SW_ASM_ARGS2("l")); \
> __surf_writexd_v2("v2.b64", ulonglong2, __xdv2, __SW_ASM_ARGS2("l")); \
> __surf_writexd_v2("v2.b32", float2, __xdv2, __SW_ASM_ARGS2("r")); \
> \
> __surf_writexd_v2("v4.b8", char4, __xdv4, __SW_ASM_ARGS4("h")); \
> __surf_writexd_v2("v4.b8", uchar4, __xdv4, __SW_ASM_ARGS4("h")); \
> __surf_writexd_v2("v4.b16", short4, __xdv4, __SW_ASM_ARGS4("h")); \
> __surf_writexd_v2("v4.b16", ushort4, __xdv4, __SW_ASM_ARGS4("h")); \
> __surf_writexd_v2("v4.b32", int4, __xdv4, __SW_ASM_ARGS4("r")); \
> __surf_writexd_v2("v4.b32", uint4, __xdv4, __SW_ASM_ARGS4("r")); \
> __surf_writexd_v2("v4.b32", float4, __xdv4, __SW_ASM_ARGS4("r"))
>
> #define __1DV1 "{%2}"
> #define __1DV2 "{%2, %3}"
> #define __1DV4 "{%2, %3, %4, %5}"
>
> #define __2DV1 "{%3}"
> #define __2DV2 "{%3, %4}"
> #define __2DV4 "{%3, %4, %5, %6}"
>
> #define __3DV1 "{%4}"
> #define __3DV2 "{%4, %5}"
> #define __3DV4 "{%4, %5, %6, %7}"
>
> __SURF_WRITEXD_V2_ALL(__1DV1, __1DV2, __1DV4, __SURF_WRITE1D_V2);
> __SURF_WRITEXD_V2_ALL(__2DV1, __2DV2, __2DV4, __SURF_WRITE2D_V2);
> __SURF_WRITEXD_V2_ALL(__3DV1, __3DV2, __3DV4, __SURF_WRITE3D_V2);
> __SURF_WRITEXD_V2_ALL(__2DV1, __2DV2, __2DV4, __SURF_WRITE1DLAYERED_V2);
> __SURF_WRITEXD_V2_ALL(__3DV1, __3DV2, __3DV4, __SURF_WRITE2DLAYERED_V2);
> __SURF_WRITEXD_V2_ALL(__3DV1, __3DV2, __3DV4, __SURF_CUBEMAPWRITE_V2);
> __SURF_WRITEXD_V2_ALL(__3DV1, __3DV2, __3DV4, __SURF_CUBEMAPLAYEREDWRITE_V2);
>
> template <class __op, class __DataT, class... __Args>
> __device__ static void __tex_fetch_impl(__surface_op_tag, __DataT *__ptr,
> cudaSurfaceObject_t __handle,
> __Args... __args) {
> __surf_read_write_v2<__op, __DataT>::__run(__ptr, __handle, __args...);
> }
>
662,663c1031,1033
< __device__ static void __tex_fetch(__T *__ptr, cudaTextureObject_t __handle,
< __Args... __args) {
---
> __device__ static void __tex_fetch_impl(__texture_op_tag, __T *__ptr,
> cudaTextureObject_t __handle,
> __Args... __args) {
668a1039,1045
> template <class __op, class __T, class... __Args>
> __device__ static void __tex_fetch(__T *__ptr, cudaTextureObject_t __handle,
> __Args... __args) {
> using op_type = typename __op_type_traits<__op>::type;
> __tex_fetch_impl<__op>(op_type{}, __ptr, __handle, __args...);
> }
>
724a1102
> #pragma pop_macro("__OP_TYPE_SURFACE")
741a1120,1176
> #pragma pop_macro("__SURF_WRITE_V2")
> #pragma pop_macro("__SW_ASM_ARGS")
> #pragma pop_macro("__SW_ASM_ARGS1")
> #pragma pop_macro("__SW_ASM_ARGS2")
> #pragma pop_macro("__SW_ASM_ARGS4")
> #pragma pop_macro("__SURF_WRITE_V2")
> #pragma pop_macro("__SURF_READ_V2")
> #pragma pop_macro("__SW_ASM_ARGS")
> #pragma pop_macro("__SW_ASM_ARGS1")
> #pragma pop_macro("__SW_ASM_ARGS2")
> #pragma pop_macro("__SW_ASM_ARGS4")
> #pragma pop_macro("__SURF_READ1D");
> #pragma pop_macro("__SURF_READ2D");
> #pragma pop_macro("__SURF_READ3D");
> #pragma pop_macro("__SURF_READ1DLAYERED");
> #pragma pop_macro("__SURF_READ2DLAYERED");
> #pragma pop_macro("__SURF_READCUBEMAP");
> #pragma pop_macro("__SURF_READCUBEMAPLAYERED");
> #pragma pop_macro("__1DV1");
> #pragma pop_macro("__1DV2");
> #pragma pop_macro("__1DV4");
> #pragma pop_macro("__2DV1");
> #pragma pop_macro("__2DV2");
> #pragma pop_macro("__2DV4");
> #pragma pop_macro("__1DLAYERV1");
> #pragma pop_macro("__1DLAYERV2");
> #pragma pop_macro("__1DLAYERV4");
> #pragma pop_macro("__3DV1");
> #pragma pop_macro("__3DV2");
> #pragma pop_macro("__3DV4");
> #pragma pop_macro("__2DLAYERV1");
> #pragma pop_macro("__2DLAYERV2");
> #pragma pop_macro("__2DLAYERV4");
> #pragma pop_macro("__CUBEMAPV1");
> #pragma pop_macro("__CUBEMAPV2");
> #pragma pop_macro("__CUBEMAPV4");
> #pragma pop_macro("__CUBEMAPLAYERV1");
> #pragma pop_macro("__CUBEMAPLAYERV2");
> #pragma pop_macro("__CUBEMAPLAYERV4");
> #pragma pop_macro("__SURF_READXD_ALL");
> #pragma pop_macro("__SURF_WRITE1D_V2");
> #pragma pop_macro("__SURF_WRITE1DLAYERED_V2");
> #pragma pop_macro("__SURF_WRITE2D_V2");
> #pragma pop_macro("__SURF_WRITE2DLAYERED_V2");
> #pragma pop_macro("__SURF_WRITE3D_V2");
> #pragma pop_macro("__SURF_CUBEMAPWRITE_V2");
> #pragma pop_macro("__SURF_CUBEMAPLAYEREDWRITE_V2");
> #pragma pop_macro("__SURF_WRITEXD_V2_ALL");
> #pragma pop_macro("__1DV1");
> #pragma pop_macro("__1DV2");
> #pragma pop_macro("__1DV4");
> #pragma pop_macro("__2DV1");
> #pragma pop_macro("__2DV2");
> #pragma pop_macro("__2DV4");
> #pragma pop_macro("__3DV1");
> #pragma pop_macro("__3DV2");
> #pragma pop_macro("__3DV4");
diff -r android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/__clang_hip_libdevice_declares.h android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/__clang_hip_libdevice_declares.h
16a17,18
> #define __PRIVATE_AS __attribute__((opencl_private))
>
58,59c60
< __device__ float __ocml_frexp_f32(float,
< __attribute__((address_space(5))) int *);
---
> __device__ float __ocml_frexp_f32(float, __PRIVATE_AS int *);
77,78c78
< __device__ float __ocml_modf_f32(float,
< __attribute__((address_space(5))) float *);
---
> __device__ float __ocml_modf_f32(float, __PRIVATE_AS float *);
90,91c90
< __device__ float __ocml_remquo_f32(float, float,
< __attribute__((address_space(5))) int *);
---
> __device__ float __ocml_remquo_f32(float, float, __PRIVATE_AS int *);
102,105c101,102
< __device__ float __ocml_sincos_f32(float,
< __attribute__((address_space(5))) float *);
< __device__ float __ocml_sincospi_f32(float,
< __attribute__((address_space(5))) float *);
---
> __device__ float __ocml_sincos_f32(float, __PRIVATE_AS float *);
> __device__ float __ocml_sincospi_f32(float, __PRIVATE_AS float *);
179,180c176
< __device__ double __ocml_frexp_f64(double,
< __attribute__((address_space(5))) int *);
---
> __device__ double __ocml_frexp_f64(double, __PRIVATE_AS int *);
195,196c191
< __device__ double __ocml_modf_f64(double,
< __attribute__((address_space(5))) double *);
---
> __device__ double __ocml_modf_f64(double, __PRIVATE_AS double *);
209,210c204
< __device__ double __ocml_remquo_f64(double, double,
< __attribute__((address_space(5))) int *);
---
> __device__ double __ocml_remquo_f64(double, double, __PRIVATE_AS int *);
222,225c216,217
< __device__ double __ocml_sincos_f64(double,
< __attribute__((address_space(5))) double *);
< __device__ double
< __ocml_sincospi_f64(double, __attribute__((address_space(5))) double *);
---
> __device__ double __ocml_sincos_f64(double, __PRIVATE_AS double *);
> __device__ double __ocml_sincospi_f64(double, __PRIVATE_AS double *);
diff -r android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/__clang_hip_math.h android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/__clang_hip_math.h
35a36,38
> #pragma push_macro("__PRIVATE_AS")
>
> #define __PRIVATE_AS __attribute__((opencl_private))
392c395
< float exp10f(float __x) { return __ocml_exp10_f32(__x); }
---
> float exp10f(float __x) { return __builtin_exp10f(__x); }
495c498
< float log2f(float __x) { return __FAST_OR_SLOW(__log2f, __ocml_log2_f32)(__x); }
---
> float log2f(float __x) { return __FAST_OR_SLOW(__log2f, __builtin_log2f)(__x); }
501c504
< float logf(float __x) { return __FAST_OR_SLOW(__logf, __ocml_log_f32)(__x); }
---
> float logf(float __x) { return __FAST_OR_SLOW(__logf, __builtin_logf)(__x); }
515,516c518
< float __r =
< __ocml_modf_f32(__x, (__attribute__((address_space(5))) float *)&__tmp);
---
> float __r = __ocml_modf_f32(__x, (__PRIVATE_AS float *)&__tmp);
598,599c600
< float __r = __ocml_remquo_f32(
< __x, __y, (__attribute__((address_space(5))) int *)&__tmp);
---
> float __r = __ocml_remquo_f32(__x, __y, (__PRIVATE_AS int *)&__tmp);
641,642c642,646
< return (__n < INT_MAX) ? __builtin_amdgcn_ldexpf(__x, __n)
< : __ocml_scalb_f32(__x, __n);
---
> if (__n > INT_MAX)
> __n = INT_MAX;
> else if (__n < INT_MIN)
> __n = INT_MIN;
> return __builtin_ldexpf(__x, (int)__n);
660,661c664
< *__sinptr =
< __ocml_sincos_f32(__x, (__attribute__((address_space(5))) float *)&__tmp);
---
> *__sinptr = __ocml_sincos_f32(__x, (__PRIVATE_AS float *)&__tmp);
672,673c675
< *__sinptr = __ocml_sincospi_f32(
< __x, (__attribute__((address_space(5))) float *)&__tmp);
---
> *__sinptr = __ocml_sincospi_f32(__x, (__PRIVATE_AS float *)&__tmp);
916,917c918
< double __r =
< __ocml_modf_f64(__x, (__attribute__((address_space(5))) double *)&__tmp);
---
> double __r = __ocml_modf_f64(__x, (__PRIVATE_AS double *)&__tmp);
1007,1008c1008
< double __r = __ocml_remquo_f64(
< __x, __y, (__attribute__((address_space(5))) int *)&__tmp);
---
> double __r = __ocml_remquo_f64(__x, __y, (__PRIVATE_AS int *)&__tmp);
1050,1051c1050,1054
< return (__n < INT_MAX) ? __builtin_amdgcn_ldexp(__x, __n)
< : __ocml_scalb_f64(__x, __n);
---
> if (__n > INT_MAX)
> __n = INT_MAX;
> else if (__n < INT_MIN)
> __n = INT_MIN;
> return __builtin_ldexp(__x, (int)__n);
1068,1069c1071
< *__sinptr = __ocml_sincos_f64(
< __x, (__attribute__((address_space(5))) double *)&__tmp);
---
> *__sinptr = __ocml_sincos_f64(__x, (__PRIVATE_AS double *)&__tmp);
1079,1080c1081
< *__sinptr = __ocml_sincospi_f64(
< __x, (__attribute__((address_space(5))) double *)&__tmp);
---
> *__sinptr = __ocml_sincospi_f64(__x, (__PRIVATE_AS double *)&__tmp);
1313,1315c1314,1367
< #if !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__)
< __host__ inline static int min(int __arg1, int __arg2) {
< return __arg1 < __arg2 ? __arg1 : __arg2;
---
> // Define host min/max functions.
> #if !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__) && \
> !defined(__HIP_NO_HOST_MIN_MAX_IN_GLOBAL_NAMESPACE__)
>
> // TODO: make this default to 1 after existing HIP apps adopting this change.
> #ifndef __HIP_DEFINE_EXTENDED_HOST_MIN_MAX__
> #define __HIP_DEFINE_EXTENDED_HOST_MIN_MAX__ 0
> #endif
>
> #ifndef __HIP_DEFINE_MIXED_HOST_MIN_MAX__
> #define __HIP_DEFINE_MIXED_HOST_MIN_MAX__ 0
> #endif
>
> #pragma push_macro("DEFINE_MIN_MAX_FUNCTIONS")
> #pragma push_macro("DEFINE_MIN_MAX_FUNCTIONS")
> #define DEFINE_MIN_MAX_FUNCTIONS(ret_type, type1, type2) \
> inline ret_type min(const type1 __a, const type2 __b) { \
> return (__a < __b) ? __a : __b; \
> } \
> inline ret_type max(const type1 __a, const type2 __b) { \
> return (__a > __b) ? __a : __b; \
> }
>
> // Define min and max functions for same type comparisons
> DEFINE_MIN_MAX_FUNCTIONS(int, int, int)
>
> #if __HIP_DEFINE_EXTENDED_HOST_MIN_MAX__
> DEFINE_MIN_MAX_FUNCTIONS(unsigned int, unsigned int, unsigned int)
> DEFINE_MIN_MAX_FUNCTIONS(long, long, long)
> DEFINE_MIN_MAX_FUNCTIONS(unsigned long, unsigned long, unsigned long)
> DEFINE_MIN_MAX_FUNCTIONS(long long, long long, long long)
> DEFINE_MIN_MAX_FUNCTIONS(unsigned long long, unsigned long long,
> unsigned long long)
> #endif // if __HIP_DEFINE_EXTENDED_HOST_MIN_MAX__
>
> // The host min/max functions below accept mixed signed/unsigned integer
> // parameters and perform unsigned comparisons, which may produce unexpected
> // results if a signed integer was passed unintentionally. To avoid this
> // happening silently, these overloaded functions are not defined by default.
> // However, for compatibility with CUDA, they will be defined if users define
> // __HIP_DEFINE_MIXED_HOST_MIN_MAX__.
> #if __HIP_DEFINE_MIXED_HOST_MIN_MAX__
> DEFINE_MIN_MAX_FUNCTIONS(unsigned int, int, unsigned int)
> DEFINE_MIN_MAX_FUNCTIONS(unsigned int, unsigned int, int)
> DEFINE_MIN_MAX_FUNCTIONS(unsigned long, long, unsigned long)
> DEFINE_MIN_MAX_FUNCTIONS(unsigned long, unsigned long, long)
> DEFINE_MIN_MAX_FUNCTIONS(unsigned long long, long long, unsigned long long)
> DEFINE_MIN_MAX_FUNCTIONS(unsigned long long, unsigned long long, long long)
> #endif // if __HIP_DEFINE_MIXED_HOST_MIN_MAX__
>
> // Floating-point comparisons using built-in functions
> #if __HIP_DEFINE_EXTENDED_HOST_MIN_MAX__
> inline float min(float const __a, float const __b) {
> return __builtin_fminf(__a, __b);
1316a1369,1377
> inline double min(double const __a, double const __b) {
> return __builtin_fmin(__a, __b);
> }
> inline double min(float const __a, double const __b) {
> return __builtin_fmin(__a, __b);
> }
> inline double min(double const __a, float const __b) {
> return __builtin_fmin(__a, __b);
> }
1318,1319c1379,1380
< __host__ inline static int max(int __arg1, int __arg2) {
< return __arg1 > __arg2 ? __arg1 : __arg2;
---
> inline float max(float const __a, float const __b) {
> return __builtin_fmaxf(__a, __b);
1321c1382,1396
< #endif // !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__)
---
> inline double max(double const __a, double const __b) {
> return __builtin_fmax(__a, __b);
> }
> inline double max(float const __a, double const __b) {
> return __builtin_fmax(__a, __b);
> }
> inline double max(double const __a, float const __b) {
> return __builtin_fmax(__a, __b);
> }
> #endif // if __HIP_DEFINE_EXTENDED_HOST_MIN_MAX__
>
> #pragma pop_macro("DEFINE_MIN_MAX_FUNCTIONS")
>
> #endif // !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__) &&
> // !defined(__HIP_NO_HOST_MIN_MAX_IN_GLOBAL_NAMESPACE__)
1324a1400
> #pragma pop_macro("__PRIVATE_AS")
diff -r android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/__clang_hip_runtime_wrapper.h android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/__clang_hip_runtime_wrapper.h
127a128
> #pragma push_macro("INT_MIN")
132a134
> #define INT_MIN (-__INT_MAX__ - 1)
156a159
> #pragma pop_macro("INT_MIN")
diff -r android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/__stdarg_va_arg.h android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/__stdarg_va_arg.h
13,14c13,14
< /* C23 does not require the second parameter for va_start. */
< #define va_start(ap, ...) __builtin_va_start(ap, 0)
---
> /* C23 uses a special builtin. */
> #define va_start(...) __builtin_c23_va_start(__VA_ARGS__)
diff -r android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/altivec.h android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/altivec.h
17528c17528
< static inline __ATTRS_o_ai vector bool char vec_reve(vector bool char __a) {
---
> static __inline__ __ATTRS_o_ai vector bool char vec_reve(vector bool char __a) {
17533c17533,17534
< static inline __ATTRS_o_ai vector signed char vec_reve(vector signed char __a) {
---
> static __inline__ __ATTRS_o_ai vector signed char
> vec_reve(vector signed char __a) {
17538c17539
< static inline __ATTRS_o_ai vector unsigned char
---
> static __inline__ __ATTRS_o_ai vector unsigned char
17544c17545
< static inline __ATTRS_o_ai vector bool int vec_reve(vector bool int __a) {
---
> static __inline__ __ATTRS_o_ai vector bool int vec_reve(vector bool int __a) {
17548c17549,17550
< static inline __ATTRS_o_ai vector signed int vec_reve(vector signed int __a) {
---
> static __inline__ __ATTRS_o_ai vector signed int
> vec_reve(vector signed int __a) {
17552c17554
< static inline __ATTRS_o_ai vector unsigned int
---
> static __inline__ __ATTRS_o_ai vector unsigned int
17557c17559,17560
< static inline __ATTRS_o_ai vector bool short vec_reve(vector bool short __a) {
---
> static __inline__ __ATTRS_o_ai vector bool short
> vec_reve(vector bool short __a) {
17561c17564
< static inline __ATTRS_o_ai vector signed short
---
> static __inline__ __ATTRS_o_ai vector signed short
17566c17569
< static inline __ATTRS_o_ai vector unsigned short
---
> static __inline__ __ATTRS_o_ai vector unsigned short
17571c17574
< static inline __ATTRS_o_ai vector float vec_reve(vector float __a) {
---
> static __inline__ __ATTRS_o_ai vector float vec_reve(vector float __a) {
17576c17579
< static inline __ATTRS_o_ai vector bool long long
---
> static __inline__ __ATTRS_o_ai vector bool long long
17581c17584
< static inline __ATTRS_o_ai vector signed long long
---
> static __inline__ __ATTRS_o_ai vector signed long long
17586c17589
< static inline __ATTRS_o_ai vector unsigned long long
---
> static __inline__ __ATTRS_o_ai vector unsigned long long
17591c17594
< static inline __ATTRS_o_ai vector double vec_reve(vector double __a) {
---
> static __inline__ __ATTRS_o_ai vector double vec_reve(vector double __a) {
17724,17725c17727,17728
< static inline __ATTRS_o_ai vector signed char vec_xl(ptrdiff_t __offset,
< const signed char *__ptr) {
---
> static __inline__ __ATTRS_o_ai vector signed char
> vec_xl(ptrdiff_t __offset, const signed char *__ptr) {
17729c17732
< static inline __ATTRS_o_ai vector unsigned char
---
> static __inline__ __ATTRS_o_ai vector unsigned char
17734c17737
< static inline __ATTRS_o_ai vector signed short
---
> static __inline__ __ATTRS_o_ai vector signed short
17740c17743
< static inline __ATTRS_o_ai vector unsigned short
---
> static __inline__ __ATTRS_o_ai vector unsigned short
17746,17747c17749,17750
< static inline __ATTRS_o_ai vector signed int vec_xl(ptrdiff_t __offset,
< const signed int *__ptr) {
---
> static __inline__ __ATTRS_o_ai vector signed int
> vec_xl(ptrdiff_t __offset, const signed int *__ptr) {
17752c17755
< static inline __ATTRS_o_ai vector unsigned int
---
> static __inline__ __ATTRS_o_ai vector unsigned int
17758,17759c17761,17762
< static inline __ATTRS_o_ai vector float vec_xl(ptrdiff_t __offset,
< const float *__ptr) {
---
> static __inline__ __ATTRS_o_ai vector float vec_xl(ptrdiff_t __offset,
> const float *__ptr) {
17769c17772
< static inline __ATTRS_o_ai vector signed long long
---
> static __inline__ __ATTRS_o_ai vector signed long long
17775c17778
< static inline __ATTRS_o_ai vector unsigned long long
---
> static __inline__ __ATTRS_o_ai vector unsigned long long
17781,17782c17784,17785
< static inline __ATTRS_o_ai vector double vec_xl(ptrdiff_t __offset,
< const double *__ptr) {
---
> static __inline__ __ATTRS_o_ai vector double vec_xl(ptrdiff_t __offset,
> const double *__ptr) {
17793c17796
< static inline __ATTRS_o_ai vector signed __int128
---
> static __inline__ __ATTRS_o_ai vector signed __int128
17799c17802
< static inline __ATTRS_o_ai vector unsigned __int128
---
> static __inline__ __ATTRS_o_ai vector unsigned __int128
17994c17997
< static inline __ATTRS_o_ai void
---
> static __inline__ __ATTRS_o_ai void
17999c18002
< static inline __ATTRS_o_ai void
---
> static __inline__ __ATTRS_o_ai void
18004c18007
< static inline __ATTRS_o_ai void
---
> static __inline__ __ATTRS_o_ai void
18010,18012c18013,18015
< static inline __ATTRS_o_ai void vec_xst(vector unsigned short __vec,
< ptrdiff_t __offset,
< unsigned short *__ptr) {
---
> static __inline__ __ATTRS_o_ai void vec_xst(vector unsigned short __vec,
> ptrdiff_t __offset,
> unsigned short *__ptr) {
18017,18018c18020,18021
< static inline __ATTRS_o_ai void vec_xst(vector signed int __vec,
< ptrdiff_t __offset, signed int *__ptr) {
---
> static __inline__ __ATTRS_o_ai void
> vec_xst(vector signed int __vec, ptrdiff_t __offset, signed int *__ptr) {
18023c18026
< static inline __ATTRS_o_ai void
---
> static __inline__ __ATTRS_o_ai void
18029,18030c18032,18033
< static inline __ATTRS_o_ai void vec_xst(vector float __vec, ptrdiff_t __offset,
< float *__ptr) {
---
> static __inline__ __ATTRS_o_ai void vec_xst(vector float __vec,
> ptrdiff_t __offset, float *__ptr) {
18036,18038c18039,18041
< static inline __ATTRS_o_ai void vec_xst(vector signed long long __vec,
< ptrdiff_t __offset,
< signed long long *__ptr) {
---
> static __inline__ __ATTRS_o_ai void vec_xst(vector signed long long __vec,
> ptrdiff_t __offset,
> signed long long *__ptr) {
18043,18045c18046,18048
< static inline __ATTRS_o_ai void vec_xst(vector unsigned long long __vec,
< ptrdiff_t __offset,
< unsigned long long *__ptr) {
---
> static __inline__ __ATTRS_o_ai void vec_xst(vector unsigned long long __vec,
> ptrdiff_t __offset,
> unsigned long long *__ptr) {
18050,18051c18053,18054
< static inline __ATTRS_o_ai void vec_xst(vector double __vec, ptrdiff_t __offset,
< double *__ptr) {
---
> static __inline__ __ATTRS_o_ai void vec_xst(vector double __vec,
> ptrdiff_t __offset, double *__ptr) {
18059,18061c18062,18064
< static inline __ATTRS_o_ai void vec_xst(vector signed __int128 __vec,
< ptrdiff_t __offset,
< signed __int128 *__ptr) {
---
> static __inline__ __ATTRS_o_ai void vec_xst(vector signed __int128 __vec,
> ptrdiff_t __offset,
> signed __int128 *__ptr) {
18066,18068c18069,18071
< static inline __ATTRS_o_ai void vec_xst(vector unsigned __int128 __vec,
< ptrdiff_t __offset,
< unsigned __int128 *__ptr) {
---
> static __inline__ __ATTRS_o_ai void vec_xst(vector unsigned __int128 __vec,
> ptrdiff_t __offset,
> unsigned __int128 *__ptr) {
18078,18080c18081,18083
< static inline __ATTRS_o_ai void vec_xst_trunc(vector signed __int128 __vec,
< ptrdiff_t __offset,
< signed char *__ptr) {
---
> static __inline__ __ATTRS_o_ai void vec_xst_trunc(vector signed __int128 __vec,
> ptrdiff_t __offset,
> signed char *__ptr) {
18084,18086c18087,18089
< static inline __ATTRS_o_ai void vec_xst_trunc(vector unsigned __int128 __vec,
< ptrdiff_t __offset,
< unsigned char *__ptr) {
---
> static __inline__ __ATTRS_o_ai void
> vec_xst_trunc(vector unsigned __int128 __vec, ptrdiff_t __offset,
> unsigned char *__ptr) {
18090,18092c18093,18095
< static inline __ATTRS_o_ai void vec_xst_trunc(vector signed __int128 __vec,
< ptrdiff_t __offset,
< signed short *__ptr) {
---
> static __inline__ __ATTRS_o_ai void vec_xst_trunc(vector signed __int128 __vec,
> ptrdiff_t __offset,
> signed short *__ptr) {
18096,18098c18099,18101
< static inline __ATTRS_o_ai void vec_xst_trunc(vector unsigned __int128 __vec,
< ptrdiff_t __offset,
< unsigned short *__ptr) {
---
> static __inline__ __ATTRS_o_ai void
> vec_xst_trunc(vector unsigned __int128 __vec, ptrdiff_t __offset,
> unsigned short *__ptr) {
18102,18104c18105,18107
< static inline __ATTRS_o_ai void vec_xst_trunc(vector signed __int128 __vec,
< ptrdiff_t __offset,
< signed int *__ptr) {
---
> static __inline__ __ATTRS_o_ai void vec_xst_trunc(vector signed __int128 __vec,
> ptrdiff_t __offset,
> signed int *__ptr) {
18108,18110c18111,18113
< static inline __ATTRS_o_ai void vec_xst_trunc(vector unsigned __int128 __vec,
< ptrdiff_t __offset,
< unsigned int *__ptr) {
---
> static __inline__ __ATTRS_o_ai void
> vec_xst_trunc(vector unsigned __int128 __vec, ptrdiff_t __offset,
> unsigned int *__ptr) {
18114,18116c18117,18119
< static inline __ATTRS_o_ai void vec_xst_trunc(vector signed __int128 __vec,
< ptrdiff_t __offset,
< signed long long *__ptr) {
---
> static __inline__ __ATTRS_o_ai void vec_xst_trunc(vector signed __int128 __vec,
> ptrdiff_t __offset,
> signed long long *__ptr) {
18120,18122c18123,18125
< static inline __ATTRS_o_ai void vec_xst_trunc(vector unsigned __int128 __vec,
< ptrdiff_t __offset,
< unsigned long long *__ptr) {
---
> static __inline__ __ATTRS_o_ai void
> vec_xst_trunc(vector unsigned __int128 __vec, ptrdiff_t __offset,
> unsigned long long *__ptr) {
diff -r android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/amdgpuintrin.h android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/amdgpuintrin.h
16,20c16,17
< #include <stdint.h>
<
< #if !defined(__cplusplus)
< _Pragma("push_macro(\"bool\")");
< #define bool _Bool
---
> #ifndef __GPUINTRIN_H
> #error "Never use <amdgpuintrin.h> directly; include <gpuintrin.h> instead"
118,126d114
< // Copies the value from the first active thread in the wavefront to the rest.
< _DEFAULT_FN_ATTRS __inline__ uint64_t
< __gpu_read_first_lane_u64(uint64_t __lane_mask, uint64_t __x) {
< uint32_t __hi = (uint32_t)(__x >> 32ull);
< uint32_t __lo = (uint32_t)(__x & 0xFFFFFFFF);
< return ((uint64_t)__builtin_amdgcn_readfirstlane(__hi) << 32ull) |
< ((uint64_t)__builtin_amdgcn_readfirstlane(__lo));
< }
<
148,149c136,139
< __gpu_shuffle_idx_u32(uint64_t __lane_mask, uint32_t __idx, uint32_t __x) {
< return __builtin_amdgcn_ds_bpermute(__idx << 2, __x);
---
> __gpu_shuffle_idx_u32(uint64_t __lane_mask, uint32_t __idx, uint32_t __x,
> uint32_t __width) {
> uint32_t __lane = __idx + (__gpu_lane_id() & ~(__width - 1));
> return __builtin_amdgcn_ds_bpermute(__lane << 2, __x);
152c142
< // Shuffles the the lanes inside the wavefront according to the given index.
---
> // Returns a bitmask marking all lanes that have the same value of __x.
154,158c144,145
< __gpu_shuffle_idx_u64(uint64_t __lane_mask, uint32_t __idx, uint64_t __x) {
< uint32_t __hi = (uint32_t)(__x >> 32ull);
< uint32_t __lo = (uint32_t)(__x & 0xFFFFFFFF);
< return ((uint64_t)__builtin_amdgcn_ds_bpermute(__idx << 2, __hi) << 32ull) |
< ((uint64_t)__builtin_amdgcn_ds_bpermute(__idx << 2, __lo));
---
> __gpu_match_any_u32(uint64_t __lane_mask, uint32_t __x) {
> return __gpu_match_any_u32_impl(__lane_mask, __x);
160a148,165
> // Returns a bitmask marking all lanes that have the same value of __x.
> _DEFAULT_FN_ATTRS static __inline__ uint64_t
> __gpu_match_any_u64(uint64_t __lane_mask, uint64_t __x) {
> return __gpu_match_any_u64_impl(__lane_mask, __x);
> }
>
> // Returns the current lane mask if every lane contains __x.
> _DEFAULT_FN_ATTRS static __inline__ uint64_t
> __gpu_match_all_u32(uint64_t __lane_mask, uint32_t __x) {
> return __gpu_match_all_u32_impl(__lane_mask, __x);
> }
>
> // Returns the current lane mask if every lane contains __x.
> _DEFAULT_FN_ATTRS static __inline__ uint64_t
> __gpu_match_all_u64(uint64_t __lane_mask, uint64_t __x) {
> return __gpu_match_all_u64_impl(__lane_mask, __x);
> }
>
185,188d189
<
< #if !defined(__cplusplus)
< _Pragma("pop_macro(\"bool\")");
< #endif
diff -r android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/amxavx512intrin.h android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/amxavx512intrin.h
231c231
< #define _tile_movrow(a, b) __builtin_ia32_tilemovrow(a, b)
---
> #define _tile_movrow(a, b) ((__m512i)__builtin_ia32_tilemovrow(a, b))
diff -r android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/amxcomplexintrin.h android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/amxcomplexintrin.h
138,140c138,139
< __DEFAULT_FN_ATTRS_COMPLEX
< static void __tile_cmmimfp16ps(__tile1024i *dst, __tile1024i src0,
< __tile1024i src1) {
---
> static __inline__ void __DEFAULT_FN_ATTRS_COMPLEX
> __tile_cmmimfp16ps(__tile1024i *dst, __tile1024i src0, __tile1024i src1) {
161,163c160,161
< __DEFAULT_FN_ATTRS_COMPLEX
< static void __tile_cmmrlfp16ps(__tile1024i *dst, __tile1024i src0,
< __tile1024i src1) {
---
> static __inline__ void __DEFAULT_FN_ATTRS_COMPLEX
> __tile_cmmrlfp16ps(__tile1024i *dst, __tile1024i src0, __tile1024i src1) {
diff -r android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/arm_acle.h android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/arm_acle.h
45c45,46
< static __inline__ void __attribute__((__always_inline__, __nodebug__)) __wfi(void) {
---
> static __inline__ void __attribute__((__always_inline__, __nodebug__))
> __wfi(void) {
51c52,53
< static __inline__ void __attribute__((__always_inline__, __nodebug__)) __wfe(void) {
---
> static __inline__ void __attribute__((__always_inline__, __nodebug__))
> __wfe(void) {
57c59,60
< static __inline__ void __attribute__((__always_inline__, __nodebug__)) __sev(void) {
---
> static __inline__ void __attribute__((__always_inline__, __nodebug__))
> __sev(void) {
63c66,67
< static __inline__ void __attribute__((__always_inline__, __nodebug__)) __sevl(void) {
---
> static __inline__ void __attribute__((__always_inline__, __nodebug__))
> __sevl(void) {
69c73,74
< static __inline__ void __attribute__((__always_inline__, __nodebug__)) __yield(void) {
---
> static __inline__ void __attribute__((__always_inline__, __nodebug__))
> __yield(void) {
875,876c880,882
< static __inline__ const void * __attribute__((__always_inline__, __nodebug__, target("gcs")))
< __gcsss(const void *__stack) {
---
> static __inline__ void *__attribute__((__always_inline__, __nodebug__,
> target("gcs")))
> __gcsss(void *__stack) {
diff -r android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/arm_fp16.h android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/arm_fp16.h
37c37
< __ret = (float16_t) __builtin_neon_vabdh_f16(__s0, __s1); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vabdh_f16(__s0, __s1)); \
43c43
< __ret = (float16_t) __builtin_neon_vabsh_f16(__s0); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vabsh_f16(__s0)); \
50c50
< __ret = (float16_t) __builtin_neon_vaddh_f16(__s0, __s1); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vaddh_f16(__s0, __s1)); \
57c57
< __ret = (uint16_t) __builtin_neon_vcageh_f16(__s0, __s1); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vcageh_f16(__s0, __s1)); \
64c64
< __ret = (uint16_t) __builtin_neon_vcagth_f16(__s0, __s1); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vcagth_f16(__s0, __s1)); \
71c71
< __ret = (uint16_t) __builtin_neon_vcaleh_f16(__s0, __s1); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vcaleh_f16(__s0, __s1)); \
78c78
< __ret = (uint16_t) __builtin_neon_vcalth_f16(__s0, __s1); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vcalth_f16(__s0, __s1)); \
85c85
< __ret = (uint16_t) __builtin_neon_vceqh_f16(__s0, __s1); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vceqh_f16(__s0, __s1)); \
91c91
< __ret = (uint16_t) __builtin_neon_vceqzh_f16(__s0); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vceqzh_f16(__s0)); \
98c98
< __ret = (uint16_t) __builtin_neon_vcgeh_f16(__s0, __s1); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vcgeh_f16(__s0, __s1)); \
104c104
< __ret = (uint16_t) __builtin_neon_vcgezh_f16(__s0); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vcgezh_f16(__s0)); \
111c111
< __ret = (uint16_t) __builtin_neon_vcgth_f16(__s0, __s1); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vcgth_f16(__s0, __s1)); \
117c117
< __ret = (uint16_t) __builtin_neon_vcgtzh_f16(__s0); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vcgtzh_f16(__s0)); \
124c124
< __ret = (uint16_t) __builtin_neon_vcleh_f16(__s0, __s1); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vcleh_f16(__s0, __s1)); \
130c130
< __ret = (uint16_t) __builtin_neon_vclezh_f16(__s0); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vclezh_f16(__s0)); \
137c137
< __ret = (uint16_t) __builtin_neon_vclth_f16(__s0, __s1); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vclth_f16(__s0, __s1)); \
143c143
< __ret = (uint16_t) __builtin_neon_vcltzh_f16(__s0); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vcltzh_f16(__s0)); \
149c149
< __ret = (int16_t) __builtin_neon_vcvth_n_s16_f16(__s0, __p1); \
---
> __ret = __builtin_bit_cast(int16_t, __builtin_neon_vcvth_n_s16_f16(__s0, __p1)); \
155c155
< __ret = (int32_t) __builtin_neon_vcvth_n_s32_f16(__s0, __p1); \
---
> __ret = __builtin_bit_cast(int32_t, __builtin_neon_vcvth_n_s32_f16(__s0, __p1)); \
161c161
< __ret = (int64_t) __builtin_neon_vcvth_n_s64_f16(__s0, __p1); \
---
> __ret = __builtin_bit_cast(int64_t, __builtin_neon_vcvth_n_s64_f16(__s0, __p1)); \
167c167
< __ret = (uint16_t) __builtin_neon_vcvth_n_u16_f16(__s0, __p1); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vcvth_n_u16_f16(__s0, __p1)); \
173c173
< __ret = (uint32_t) __builtin_neon_vcvth_n_u32_f16(__s0, __p1); \
---
> __ret = __builtin_bit_cast(uint32_t, __builtin_neon_vcvth_n_u32_f16(__s0, __p1)); \
179c179
< __ret = (uint64_t) __builtin_neon_vcvth_n_u64_f16(__s0, __p1); \
---
> __ret = __builtin_bit_cast(uint64_t, __builtin_neon_vcvth_n_u64_f16(__s0, __p1)); \
185c185
< __ret = (int16_t) __builtin_neon_vcvth_s16_f16(__s0); \
---
> __ret = __builtin_bit_cast(int16_t, __builtin_neon_vcvth_s16_f16(__s0)); \
191c191
< __ret = (int32_t) __builtin_neon_vcvth_s32_f16(__s0); \
---
> __ret = __builtin_bit_cast(int32_t, __builtin_neon_vcvth_s32_f16(__s0)); \
197c197
< __ret = (int64_t) __builtin_neon_vcvth_s64_f16(__s0); \
---
> __ret = __builtin_bit_cast(int64_t, __builtin_neon_vcvth_s64_f16(__s0)); \
203c203
< __ret = (uint16_t) __builtin_neon_vcvth_u16_f16(__s0); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vcvth_u16_f16(__s0)); \
209c209
< __ret = (uint32_t) __builtin_neon_vcvth_u32_f16(__s0); \
---
> __ret = __builtin_bit_cast(uint32_t, __builtin_neon_vcvth_u32_f16(__s0)); \
215c215
< __ret = (uint64_t) __builtin_neon_vcvth_u64_f16(__s0); \
---
> __ret = __builtin_bit_cast(uint64_t, __builtin_neon_vcvth_u64_f16(__s0)); \
221c221
< __ret = (int16_t) __builtin_neon_vcvtah_s16_f16(__s0); \
---
> __ret = __builtin_bit_cast(int16_t, __builtin_neon_vcvtah_s16_f16(__s0)); \
227c227
< __ret = (int32_t) __builtin_neon_vcvtah_s32_f16(__s0); \
---
> __ret = __builtin_bit_cast(int32_t, __builtin_neon_vcvtah_s32_f16(__s0)); \
233c233
< __ret = (int64_t) __builtin_neon_vcvtah_s64_f16(__s0); \
---
> __ret = __builtin_bit_cast(int64_t, __builtin_neon_vcvtah_s64_f16(__s0)); \
239c239
< __ret = (uint16_t) __builtin_neon_vcvtah_u16_f16(__s0); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vcvtah_u16_f16(__s0)); \
245c245
< __ret = (uint32_t) __builtin_neon_vcvtah_u32_f16(__s0); \
---
> __ret = __builtin_bit_cast(uint32_t, __builtin_neon_vcvtah_u32_f16(__s0)); \
251c251
< __ret = (uint64_t) __builtin_neon_vcvtah_u64_f16(__s0); \
---
> __ret = __builtin_bit_cast(uint64_t, __builtin_neon_vcvtah_u64_f16(__s0)); \
257c257
< __ret = (float16_t) __builtin_neon_vcvth_f16_u16(__s0); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vcvth_f16_u16(__s0)); \
263c263
< __ret = (float16_t) __builtin_neon_vcvth_f16_s16(__s0); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vcvth_f16_s16(__s0)); \
269c269
< __ret = (float16_t) __builtin_neon_vcvth_f16_u32(__s0); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vcvth_f16_u32(__s0)); \
275c275
< __ret = (float16_t) __builtin_neon_vcvth_f16_s32(__s0); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vcvth_f16_s32(__s0)); \
281c281
< __ret = (float16_t) __builtin_neon_vcvth_f16_u64(__s0); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vcvth_f16_u64(__s0)); \
287c287
< __ret = (float16_t) __builtin_neon_vcvth_f16_s64(__s0); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vcvth_f16_s64(__s0)); \
293c293
< __ret = (float16_t) __builtin_neon_vcvth_n_f16_u32(__s0, __p1); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vcvth_n_f16_u32(__s0, __p1)); \
299c299
< __ret = (float16_t) __builtin_neon_vcvth_n_f16_s32(__s0, __p1); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vcvth_n_f16_s32(__s0, __p1)); \
305c305
< __ret = (float16_t) __builtin_neon_vcvth_n_f16_u64(__s0, __p1); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vcvth_n_f16_u64(__s0, __p1)); \
311c311
< __ret = (float16_t) __builtin_neon_vcvth_n_f16_s64(__s0, __p1); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vcvth_n_f16_s64(__s0, __p1)); \
317c317
< __ret = (float16_t) __builtin_neon_vcvth_n_f16_u16(__s0, __p1); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vcvth_n_f16_u16(__s0, __p1)); \
323c323
< __ret = (float16_t) __builtin_neon_vcvth_n_f16_s16(__s0, __p1); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vcvth_n_f16_s16(__s0, __p1)); \
329c329
< __ret = (int16_t) __builtin_neon_vcvtmh_s16_f16(__s0); \
---
> __ret = __builtin_bit_cast(int16_t, __builtin_neon_vcvtmh_s16_f16(__s0)); \
335c335
< __ret = (int32_t) __builtin_neon_vcvtmh_s32_f16(__s0); \
---
> __ret = __builtin_bit_cast(int32_t, __builtin_neon_vcvtmh_s32_f16(__s0)); \
341c341
< __ret = (int64_t) __builtin_neon_vcvtmh_s64_f16(__s0); \
---
> __ret = __builtin_bit_cast(int64_t, __builtin_neon_vcvtmh_s64_f16(__s0)); \
347c347
< __ret = (uint16_t) __builtin_neon_vcvtmh_u16_f16(__s0); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vcvtmh_u16_f16(__s0)); \
353c353
< __ret = (uint32_t) __builtin_neon_vcvtmh_u32_f16(__s0); \
---
> __ret = __builtin_bit_cast(uint32_t, __builtin_neon_vcvtmh_u32_f16(__s0)); \
359c359
< __ret = (uint64_t) __builtin_neon_vcvtmh_u64_f16(__s0); \
---
> __ret = __builtin_bit_cast(uint64_t, __builtin_neon_vcvtmh_u64_f16(__s0)); \
365c365
< __ret = (int16_t) __builtin_neon_vcvtnh_s16_f16(__s0); \
---
> __ret = __builtin_bit_cast(int16_t, __builtin_neon_vcvtnh_s16_f16(__s0)); \
371c371
< __ret = (int32_t) __builtin_neon_vcvtnh_s32_f16(__s0); \
---
> __ret = __builtin_bit_cast(int32_t, __builtin_neon_vcvtnh_s32_f16(__s0)); \
377c377
< __ret = (int64_t) __builtin_neon_vcvtnh_s64_f16(__s0); \
---
> __ret = __builtin_bit_cast(int64_t, __builtin_neon_vcvtnh_s64_f16(__s0)); \
383c383
< __ret = (uint16_t) __builtin_neon_vcvtnh_u16_f16(__s0); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vcvtnh_u16_f16(__s0)); \
389c389
< __ret = (uint32_t) __builtin_neon_vcvtnh_u32_f16(__s0); \
---
> __ret = __builtin_bit_cast(uint32_t, __builtin_neon_vcvtnh_u32_f16(__s0)); \
395c395
< __ret = (uint64_t) __builtin_neon_vcvtnh_u64_f16(__s0); \
---
> __ret = __builtin_bit_cast(uint64_t, __builtin_neon_vcvtnh_u64_f16(__s0)); \
401c401
< __ret = (int16_t) __builtin_neon_vcvtph_s16_f16(__s0); \
---
> __ret = __builtin_bit_cast(int16_t, __builtin_neon_vcvtph_s16_f16(__s0)); \
407c407
< __ret = (int32_t) __builtin_neon_vcvtph_s32_f16(__s0); \
---
> __ret = __builtin_bit_cast(int32_t, __builtin_neon_vcvtph_s32_f16(__s0)); \
413c413
< __ret = (int64_t) __builtin_neon_vcvtph_s64_f16(__s0); \
---
> __ret = __builtin_bit_cast(int64_t, __builtin_neon_vcvtph_s64_f16(__s0)); \
419c419
< __ret = (uint16_t) __builtin_neon_vcvtph_u16_f16(__s0); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vcvtph_u16_f16(__s0)); \
425c425
< __ret = (uint32_t) __builtin_neon_vcvtph_u32_f16(__s0); \
---
> __ret = __builtin_bit_cast(uint32_t, __builtin_neon_vcvtph_u32_f16(__s0)); \
431c431
< __ret = (uint64_t) __builtin_neon_vcvtph_u64_f16(__s0); \
---
> __ret = __builtin_bit_cast(uint64_t, __builtin_neon_vcvtph_u64_f16(__s0)); \
438c438
< __ret = (float16_t) __builtin_neon_vdivh_f16(__s0, __s1); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vdivh_f16(__s0, __s1)); \
446c446
< __ret = (float16_t) __builtin_neon_vfmah_f16(__s0, __s1, __s2); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vfmah_f16(__s0, __s1, __s2)); \
454c454
< __ret = (float16_t) __builtin_neon_vfmsh_f16(__s0, __s1, __s2); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vfmsh_f16(__s0, __s1, __s2)); \
461c461
< __ret = (float16_t) __builtin_neon_vmaxh_f16(__s0, __s1); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vmaxh_f16(__s0, __s1)); \
468c468
< __ret = (float16_t) __builtin_neon_vmaxnmh_f16(__s0, __s1); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vmaxnmh_f16(__s0, __s1)); \
475c475
< __ret = (float16_t) __builtin_neon_vminh_f16(__s0, __s1); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vminh_f16(__s0, __s1)); \
482c482
< __ret = (float16_t) __builtin_neon_vminnmh_f16(__s0, __s1); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vminnmh_f16(__s0, __s1)); \
489c489
< __ret = (float16_t) __builtin_neon_vmulh_f16(__s0, __s1); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vmulh_f16(__s0, __s1)); \
496c496
< __ret = (float16_t) __builtin_neon_vmulxh_f16(__s0, __s1); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vmulxh_f16(__s0, __s1)); \
502c502
< __ret = (float16_t) __builtin_neon_vnegh_f16(__s0); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vnegh_f16(__s0)); \
508c508
< __ret = (float16_t) __builtin_neon_vrecpeh_f16(__s0); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vrecpeh_f16(__s0)); \
515c515
< __ret = (float16_t) __builtin_neon_vrecpsh_f16(__s0, __s1); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vrecpsh_f16(__s0, __s1)); \
521c521
< __ret = (float16_t) __builtin_neon_vrecpxh_f16(__s0); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vrecpxh_f16(__s0)); \
527c527
< __ret = (float16_t) __builtin_neon_vrndh_f16(__s0); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vrndh_f16(__s0)); \
533c533
< __ret = (float16_t) __builtin_neon_vrndah_f16(__s0); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vrndah_f16(__s0)); \
539c539
< __ret = (float16_t) __builtin_neon_vrndih_f16(__s0); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vrndih_f16(__s0)); \
545c545
< __ret = (float16_t) __builtin_neon_vrndmh_f16(__s0); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vrndmh_f16(__s0)); \
551c551
< __ret = (float16_t) __builtin_neon_vrndnh_f16(__s0); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vrndnh_f16(__s0)); \
557c557
< __ret = (float16_t) __builtin_neon_vrndph_f16(__s0); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vrndph_f16(__s0)); \
563c563
< __ret = (float16_t) __builtin_neon_vrndxh_f16(__s0); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vrndxh_f16(__s0)); \
569c569
< __ret = (float16_t) __builtin_neon_vrsqrteh_f16(__s0); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vrsqrteh_f16(__s0)); \
576c576
< __ret = (float16_t) __builtin_neon_vrsqrtsh_f16(__s0, __s1); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vrsqrtsh_f16(__s0, __s1)); \
582c582
< __ret = (float16_t) __builtin_neon_vsqrth_f16(__s0); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vsqrth_f16(__s0)); \
589c589
< __ret = (float16_t) __builtin_neon_vsubh_f16(__s0, __s1); \
---
> __ret = __builtin_bit_cast(float16_t, __builtin_neon_vsubh_f16(__s0, __s1)); \
diff -r android-ndk-r29/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/arm_neon.h android-ndk-r30-beta1/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/21/include/arm_neon.h
125a126,144
> #if !defined(__LITTLE_ENDIAN__)
> #if defined(__aarch64__) || defined(__arm64ec__)
> #define __lane_reverse_64_32 1,0
> #define __lane_reverse_64_16 3,2,1,0
> #define __lane_reverse_64_8 7,6,5,4,3,2,1,0
> #define __lane_reverse_128_64 1,0
> #define __lane_reverse_128_32 3,2,1,0
> #define __lane_reverse_128_16 7,6,5,4,3,2,1,0
> #define __lane_reverse_128_8 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
> #else
> #define __lane_reverse_64_32 1,0
> #define __lane_reverse_64_16 3,2,1,0
> #define __lane_reverse_64_8 7,6,5,4,3,2,1,0
> #define __lane_reverse_128_64 0,1
> #define __lane_reverse_128_32 1,0,3,2
> #define __lane_reverse_128_16 3,2,1,0,7,6,5,4
> #define __lane_reverse_128_8 7,6,5,4,3,2,1,0,15,14,13,12,11,10,9,8
> #endif
> #endif
130c149
< __ret = (bfloat16x8_t) __builtin_neon_splatq_lane_bf16((int8x8_t)__s0, __p1, 11); \
---
> __ret = __builtin_bit_cast(bfloat16x8_t, __builtin_neon_splatq_lane_bf16(__builtin_bit_cast(int8x8_t, __s0), __p1, 11)); \
137,139c156,158
< bfloat16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (bfloat16x8_t) __builtin_neon_splatq_lane_bf16((int8x8_t)__rev0, __p1, 11); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> bfloat16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(bfloat16x8_t, __builtin_neon_splatq_lane_bf16(__builtin_bit_cast(int8x8_t, __rev0), __p1, 11)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
145c164
< __ret = (bfloat16x8_t) __builtin_neon_splatq_lane_bf16((int8x8_t)__s0, __p1, 11); \
---
> __ret = __builtin_bit_cast(bfloat16x8_t, __builtin_neon_splatq_lane_bf16(__builtin_bit_cast(int8x8_t, __s0), __p1, 11)); \
154c173
< __ret = (bfloat16x4_t) __builtin_neon_splat_lane_bf16((int8x8_t)__s0, __p1, 11); \
---
> __ret = __builtin_bit_cast(bfloat16x4_t, __builtin_neon_splat_lane_bf16(__builtin_bit_cast(int8x8_t, __s0), __p1, 11)); \
161,163c180,182
< bfloat16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (bfloat16x4_t) __builtin_neon_splat_lane_bf16((int8x8_t)__rev0, __p1, 11); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> bfloat16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(bfloat16x4_t, __builtin_neon_splat_lane_bf16(__builtin_bit_cast(int8x8_t, __rev0), __p1, 11)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
169c188
< __ret = (bfloat16x4_t) __builtin_neon_splat_lane_bf16((int8x8_t)__s0, __p1, 11); \
---
> __ret = __builtin_bit_cast(bfloat16x4_t, __builtin_neon_splat_lane_bf16(__builtin_bit_cast(int8x8_t, __s0), __p1, 11)); \
178c197
< __ret = (bfloat16x8_t) __builtin_neon_splatq_laneq_bf16((int8x16_t)__s0, __p1, 43); \
---
> __ret = __builtin_bit_cast(bfloat16x8_t, __builtin_neon_splatq_laneq_bf16(__builtin_bit_cast(int8x16_t, __s0), __p1, 43)); \
185,187c204,206
< bfloat16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (bfloat16x8_t) __builtin_neon_splatq_laneq_bf16((int8x16_t)__rev0, __p1, 43); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> bfloat16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(bfloat16x8_t, __builtin_neon_splatq_laneq_bf16(__builtin_bit_cast(int8x16_t, __rev0), __p1, 43)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
193c212
< __ret = (bfloat16x8_t) __builtin_neon_splatq_laneq_bf16((int8x16_t)__s0, __p1, 43); \
---
> __ret = __builtin_bit_cast(bfloat16x8_t, __builtin_neon_splatq_laneq_bf16(__builtin_bit_cast(int8x16_t, __s0), __p1, 43)); \
202c221
< __ret = (bfloat16x4_t) __builtin_neon_splat_laneq_bf16((int8x16_t)__s0, __p1, 43); \
---
> __ret = __builtin_bit_cast(bfloat16x4_t, __builtin_neon_splat_laneq_bf16(__builtin_bit_cast(int8x16_t, __s0), __p1, 43)); \
209,211c228,230
< bfloat16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (bfloat16x4_t) __builtin_neon_splat_laneq_bf16((int8x16_t)__rev0, __p1, 43); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> bfloat16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(bfloat16x4_t, __builtin_neon_splat_laneq_bf16(__builtin_bit_cast(int8x16_t, __rev0), __p1, 43)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
217c236
< __ret = (bfloat16x4_t) __builtin_neon_splat_laneq_bf16((int8x16_t)__s0, __p1, 43); \
---
> __ret = __builtin_bit_cast(bfloat16x4_t, __builtin_neon_splat_laneq_bf16(__builtin_bit_cast(int8x16_t, __s0), __p1, 43)); \
225c244
< __ret = (float32x4_t) __builtin_neon_vbfdotq_f32((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 41);
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vbfdotq_f32(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 41));
231,235c250,254
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< bfloat16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< bfloat16x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (float32x4_t) __builtin_neon_vbfdotq_f32((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 41);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> bfloat16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> bfloat16x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vbfdotq_f32(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 41));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
240c259
< __ret = (float32x4_t) __builtin_neon_vbfdotq_f32((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 41);
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vbfdotq_f32(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 41));
248c267
< __ret = (float32x2_t) __builtin_neon_vbfdot_f32((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 9);
---
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vbfdot_f32(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 9));
254,258c273,277
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< bfloat16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< bfloat16x4_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 3, 2, 1, 0);
< __ret = (float32x2_t) __builtin_neon_vbfdot_f32((int8x8_t)__rev0, (int8x8_t)__rev1, (int8x8_t)__rev2, 9);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> bfloat16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> bfloat16x4_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vbfdot_f32(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __builtin_bit_cast(int8x8_t, __rev2), 9));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
263c282
< __ret = (float32x2_t) __builtin_neon_vbfdot_f32((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 9);
---
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vbfdot_f32(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 9));
271c290
< __ret = (float32x4_t) __builtin_neon_vbfmlalbq_f32((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 41);
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vbfmlalbq_f32(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 41));
277,281c296,300
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< bfloat16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< bfloat16x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (float32x4_t) __builtin_neon_vbfmlalbq_f32((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 41);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> bfloat16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> bfloat16x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vbfmlalbq_f32(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 41));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
286c305
< __ret = (float32x4_t) __builtin_neon_vbfmlalbq_f32((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 41);
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vbfmlalbq_f32(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 41));
294c313
< __ret = (float32x4_t) __builtin_neon_vbfmlaltq_f32((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 41);
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vbfmlaltq_f32(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 41));
300,304c319,323
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< bfloat16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< bfloat16x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (float32x4_t) __builtin_neon_vbfmlaltq_f32((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 41);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> bfloat16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> bfloat16x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vbfmlaltq_f32(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 41));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
309c328
< __ret = (float32x4_t) __builtin_neon_vbfmlaltq_f32((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 41);
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vbfmlaltq_f32(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 41));
317c336
< __ret = (float32x4_t) __builtin_neon_vbfmmlaq_f32((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 41);
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vbfmmlaq_f32(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 41));
323,327c342,346
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< bfloat16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< bfloat16x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (float32x4_t) __builtin_neon_vbfmmlaq_f32((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 41);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> bfloat16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> bfloat16x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vbfmmlaq_f32(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 41));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
341,342c360,361
< bfloat16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< bfloat16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> bfloat16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> bfloat16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
344c363
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
357c376
< __ret = (bfloat16x4_t)(__promote); \
---
> __ret = __builtin_bit_cast(bfloat16x4_t, __promote); \
367c386
< __ret = (bfloat16_t) __builtin_neon_vcvth_bf16_f32(__p0);
---
> __ret = __builtin_bit_cast(bfloat16_t, __builtin_neon_vcvth_bf16_f32(__p0));
374c393
< __ret = (bfloat16_t) __builtin_neon_vduph_lane_bf16((bfloat16x4_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(bfloat16_t, __builtin_neon_vduph_lane_bf16(__s0, __p1)); \
381,382c400,401
< bfloat16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (bfloat16_t) __builtin_neon_vduph_lane_bf16((bfloat16x4_t)__rev0, __p1); \
---
> bfloat16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(bfloat16_t, __builtin_neon_vduph_lane_bf16(__rev0, __p1)); \
398c417
< bfloat16x4_t __rev0_1; __rev0_1 = __builtin_shufflevector(__s0_1, __s0_1, 3, 2, 1, 0); \
---
> bfloat16x4_t __rev0_1; __rev0_1 = __builtin_shufflevector(__s0_1, __s0_1, __lane_reverse_64_16); \
400c419
< __ret_1 = __builtin_shufflevector(__ret_1, __ret_1, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret_1 = __builtin_shufflevector(__ret_1, __ret_1, __lane_reverse_128_16); \
416c435
< bfloat16x4_t __rev0_3; __rev0_3 = __builtin_shufflevector(__s0_3, __s0_3, 3, 2, 1, 0); \
---
> bfloat16x4_t __rev0_3; __rev0_3 = __builtin_shufflevector(__s0_3, __s0_3, __lane_reverse_64_16); \
418c437
< __ret_3 = __builtin_shufflevector(__ret_3, __ret_3, 3, 2, 1, 0); \
---
> __ret_3 = __builtin_shufflevector(__ret_3, __ret_3, __lane_reverse_64_16); \
427c446
< __ret = (bfloat16_t) __builtin_neon_vduph_laneq_bf16((bfloat16x8_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(bfloat16_t, __builtin_neon_vduph_laneq_bf16(__s0, __p1)); \
434,435c453,454
< bfloat16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (bfloat16_t) __builtin_neon_vduph_laneq_bf16((bfloat16x8_t)__rev0, __p1); \
---
> bfloat16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(bfloat16_t, __builtin_neon_vduph_laneq_bf16(__rev0, __p1)); \
451c470
< bfloat16x8_t __rev0_5; __rev0_5 = __builtin_shufflevector(__s0_5, __s0_5, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> bfloat16x8_t __rev0_5; __rev0_5 = __builtin_shufflevector(__s0_5, __s0_5, __lane_reverse_128_16); \
453c472
< __ret_5 = __builtin_shufflevector(__ret_5, __ret_5, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret_5 = __builtin_shufflevector(__ret_5, __ret_5, __lane_reverse_128_16); \
469c488
< bfloat16x8_t __rev0_7; __rev0_7 = __builtin_shufflevector(__s0_7, __s0_7, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> bfloat16x8_t __rev0_7; __rev0_7 = __builtin_shufflevector(__s0_7, __s0_7, __lane_reverse_128_16); \
471c490
< __ret_7 = __builtin_shufflevector(__ret_7, __ret_7, 3, 2, 1, 0); \
---
> __ret_7 = __builtin_shufflevector(__ret_7, __ret_7, __lane_reverse_64_16); \
486c505
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
501c520
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
515c534
< bfloat16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
---
> bfloat16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
517c536
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
531c550
< __ret = (bfloat16_t) __builtin_neon_vgetq_lane_bf16((bfloat16x8_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(bfloat16_t, __builtin_neon_vgetq_lane_bf16(__s0, __p1)); \
538,539c557,558
< bfloat16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (bfloat16_t) __builtin_neon_vgetq_lane_bf16((bfloat16x8_t)__rev0, __p1); \
---
> bfloat16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(bfloat16_t, __builtin_neon_vgetq_lane_bf16(__rev0, __p1)); \
545c564
< __ret = (bfloat16_t) __builtin_neon_vgetq_lane_bf16((bfloat16x8_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(bfloat16_t, __builtin_neon_vgetq_lane_bf16(__s0, __p1)); \
554c573
< __ret = (bfloat16_t) __builtin_neon_vget_lane_bf16((bfloat16x4_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(bfloat16_t, __builtin_neon_vget_lane_bf16(__s0, __p1)); \
561,562c580,581
< bfloat16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (bfloat16_t) __builtin_neon_vget_lane_bf16((bfloat16x4_t)__rev0, __p1); \
---
> bfloat16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(bfloat16_t, __builtin_neon_vget_lane_bf16(__rev0, __p1)); \
568c587
< __ret = (bfloat16_t) __builtin_neon_vget_lane_bf16((bfloat16x4_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(bfloat16_t, __builtin_neon_vget_lane_bf16(__s0, __p1)); \
582c601
< bfloat16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
---
> bfloat16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
584c603
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
597c616
< __ret = (bfloat16x8_t) __builtin_neon_vld1q_bf16(__p0, 43); \
---
> __ret = __builtin_bit_cast(bfloat16x8_t, __builtin_neon_vld1q_bf16(__p0, 43)); \
603,604c622,623
< __ret = (bfloat16x8_t) __builtin_neon_vld1q_bf16(__p0, 43); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(bfloat16x8_t, __builtin_neon_vld1q_bf16(__p0, 43)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
612c631
< __ret = (bfloat16x4_t) __builtin_neon_vld1_bf16(__p0, 11); \
---
> __ret = __builtin_bit_cast(bfloat16x4_t, __builtin_neon_vld1_bf16(__p0, 11)); \
618,619c637,638
< __ret = (bfloat16x4_t) __builtin_neon_vld1_bf16(__p0, 11); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(bfloat16x4_t, __builtin_neon_vld1_bf16(__p0, 11)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
627c646
< __ret = (bfloat16x8_t) __builtin_neon_vld1q_dup_bf16(__p0, 43); \
---
> __ret = __builtin_bit_cast(bfloat16x8_t, __builtin_neon_vld1q_dup_bf16(__p0, 43)); \
633,634c652,653
< __ret = (bfloat16x8_t) __builtin_neon_vld1q_dup_bf16(__p0, 43); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(bfloat16x8_t, __builtin_neon_vld1q_dup_bf16(__p0, 43)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
642c661
< __ret = (bfloat16x4_t) __builtin_neon_vld1_dup_bf16(__p0, 11); \
---
> __ret = __builtin_bit_cast(bfloat16x4_t, __builtin_neon_vld1_dup_bf16(__p0, 11)); \
648,649c667,668
< __ret = (bfloat16x4_t) __builtin_neon_vld1_dup_bf16(__p0, 11); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(bfloat16x4_t, __builtin_neon_vld1_dup_bf16(__p0, 11)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
658c677
< __ret = (bfloat16x8_t) __builtin_neon_vld1q_lane_bf16(__p0, (int8x16_t)__s1, __p2, 43); \
---
> __ret = __builtin_bit_cast(bfloat16x8_t, __builtin_neon_vld1q_lane_bf16(__p0, __builtin_bit_cast(int8x16_t, __s1), __p2, 43)); \
665,667c684,686
< bfloat16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (bfloat16x8_t) __builtin_neon_vld1q_lane_bf16(__p0, (int8x16_t)__rev1, __p2, 43); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> bfloat16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(bfloat16x8_t, __builtin_neon_vld1q_lane_bf16(__p0, __builtin_bit_cast(int8x16_t, __rev1), __p2, 43)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
676c695
< __ret = (bfloat16x4_t) __builtin_neon_vld1_lane_bf16(__p0, (int8x8_t)__s1, __p2, 11); \
---
> __ret = __builtin_bit_cast(bfloat16x4_t, __builtin_neon_vld1_lane_bf16(__p0, __builtin_bit_cast(int8x8_t, __s1), __p2, 11)); \
683,685c702,704
< bfloat16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 3, 2, 1, 0); \
< __ret = (bfloat16x4_t) __builtin_neon_vld1_lane_bf16(__p0, (int8x8_t)__rev1, __p2, 11); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> bfloat16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(bfloat16x4_t, __builtin_neon_vld1_lane_bf16(__p0, __builtin_bit_cast(int8x8_t, __rev1), __p2, 11)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
701,702c720,721
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
718,719c737,738
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
735,737c754,756
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
753,755c772,774
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
771,774c790,793
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_16); \
790,793c809,812
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_16); \
809,810c828,829
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
826,827c845,846
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
843,844c862,863
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
860,861c879,880
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
870c889
< __builtin_neon_vld2q_lane_bf16(&__ret, __p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], __p2, 43); \
---
> __builtin_neon_vld2q_lane_bf16(&__ret, __p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __p2, 43); \
878,880c897,899
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vld2q_lane_bf16(&__ret, __p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], __p2, 43); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_16); \
> __builtin_neon_vld2q_lane_bf16(&__ret, __p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __p2, 43); \
882,883c901,902
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
892c911
< __builtin_neon_vld2_lane_bf16(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], __p2, 11); \
---
> __builtin_neon_vld2_lane_bf16(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __p2, 11); \
900,902c919,921
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __builtin_neon_vld2_lane_bf16(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], __p2, 11); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_16); \
> __builtin_neon_vld2_lane_bf16(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __p2, 11); \
904,905c923,924
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
921,923c940,942
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
939,941c958,960
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
957,959c976,978
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
975,977c994,996
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
986c1005
< __builtin_neon_vld3q_lane_bf16(&__ret, __p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], (int8x16_t)__s1.val[2], __p2, 43); \
---
> __builtin_neon_vld3q_lane_bf16(&__ret, __p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __builtin_bit_cast(int8x16_t, __s1.val[2]), __p2, 43); \
994,997c1013,1016
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vld3q_lane_bf16(&__ret, __p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], (int8x16_t)__rev1.val[2], __p2, 43); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_128_16); \
> __builtin_neon_vld3q_lane_bf16(&__ret, __p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __builtin_bit_cast(int8x16_t, __rev1.val[2]), __p2, 43); \
999,1001c1018,1020
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
1010c1029
< __builtin_neon_vld3_lane_bf16(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], (int8x8_t)__s1.val[2], __p2, 11); \
---
> __builtin_neon_vld3_lane_bf16(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __builtin_bit_cast(int8x8_t, __s1.val[2]), __p2, 11); \
1018,1021c1037,1040
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 3, 2, 1, 0); \
< __builtin_neon_vld3_lane_bf16(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], (int8x8_t)__rev1.val[2], __p2, 11); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_64_16); \
> __builtin_neon_vld3_lane_bf16(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __builtin_bit_cast(int8x8_t, __rev1.val[2]), __p2, 11); \
1023,1025c1042,1044
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
1041,1044c1060,1063
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_16); \
1060,1063c1079,1082
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_16); \
1079,1082c1098,1101
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_16); \
1098,1101c1117,1120
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_16); \
1110c1129
< __builtin_neon_vld4q_lane_bf16(&__ret, __p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], (int8x16_t)__s1.val[2], (int8x16_t)__s1.val[3], __p2, 43); \
---
> __builtin_neon_vld4q_lane_bf16(&__ret, __p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __builtin_bit_cast(int8x16_t, __s1.val[2]), __builtin_bit_cast(int8x16_t, __s1.val[3]), __p2, 43); \
1118,1122c1137,1141
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vld4q_lane_bf16(&__ret, __p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], (int8x16_t)__rev1.val[2], (int8x16_t)__rev1.val[3], __p2, 43); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_128_16); \
> __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], __lane_reverse_128_16); \
> __builtin_neon_vld4q_lane_bf16(&__ret, __p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __builtin_bit_cast(int8x16_t, __rev1.val[2]), __builtin_bit_cast(int8x16_t, __rev1.val[3]), __p2, 43); \
1124,1127c1143,1146
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_16); \
1136c1155
< __builtin_neon_vld4_lane_bf16(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], (int8x8_t)__s1.val[2], (int8x8_t)__s1.val[3], __p2, 11); \
---
> __builtin_neon_vld4_lane_bf16(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __builtin_bit_cast(int8x8_t, __s1.val[2]), __builtin_bit_cast(int8x8_t, __s1.val[3]), __p2, 11); \
1144,1148c1163,1167
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 3, 2, 1, 0); \
< __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], 3, 2, 1, 0); \
< __builtin_neon_vld4_lane_bf16(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], (int8x8_t)__rev1.val[2], (int8x8_t)__rev1.val[3], __p2, 11); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_64_16); \
> __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], __lane_reverse_64_16); \
> __builtin_neon_vld4_lane_bf16(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __builtin_bit_cast(int8x8_t, __rev1.val[2]), __builtin_bit_cast(int8x8_t, __rev1.val[3]), __p2, 11); \
1150,1153c1169,1172
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_16); \
1163c1182
< __ret = (bfloat16x8_t) __builtin_neon_vsetq_lane_bf16(__s0, (bfloat16x8_t)__s1, __p2); \
---
> __ret = __builtin_bit_cast(bfloat16x8_t, __builtin_neon_vsetq_lane_bf16(__s0, __s1, __p2)); \
1171,1173c1190,1192
< bfloat16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (bfloat16x8_t) __builtin_neon_vsetq_lane_bf16(__s0, (bfloat16x8_t)__rev1, __p2); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> bfloat16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(bfloat16x8_t, __builtin_neon_vsetq_lane_bf16(__s0, __rev1, __p2)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
1180c1199
< __ret = (bfloat16x8_t) __builtin_neon_vsetq_lane_bf16(__s0, (bfloat16x8_t)__s1, __p2); \
---
> __ret = __builtin_bit_cast(bfloat16x8_t, __builtin_neon_vsetq_lane_bf16(__s0, __s1, __p2)); \
1190c1209
< __ret = (bfloat16x4_t) __builtin_neon_vset_lane_bf16(__s0, (bfloat16x4_t)__s1, __p2); \
---
> __ret = __builtin_bit_cast(bfloat16x4_t, __builtin_neon_vset_lane_bf16(__s0, __s1, __p2)); \
1198,1200c1217,1219
< bfloat16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 3, 2, 1, 0); \
< __ret = (bfloat16x4_t) __builtin_neon_vset_lane_bf16(__s0, (bfloat16x4_t)__rev1, __p2); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> bfloat16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(bfloat16x4_t, __builtin_neon_vset_lane_bf16(__s0, __rev1, __p2)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
1207c1226
< __ret = (bfloat16x4_t) __builtin_neon_vset_lane_bf16(__s0, (bfloat16x4_t)__s1, __p2); \
---
> __ret = __builtin_bit_cast(bfloat16x4_t, __builtin_neon_vset_lane_bf16(__s0, __s1, __p2)); \
1215c1234
< __builtin_neon_vst1q_bf16(__p0, (int8x16_t)__s1, 43); \
---
> __builtin_neon_vst1q_bf16(__p0, __builtin_bit_cast(int8x16_t, __s1), 43); \
1220,1221c1239,1240
< bfloat16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vst1q_bf16(__p0, (int8x16_t)__rev1, 43); \
---
> bfloat16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_16); \
> __builtin_neon_vst1q_bf16(__p0, __builtin_bit_cast(int8x16_t, __rev1), 43); \
1228c1247
< __builtin_neon_vst1_bf16(__p0, (int8x8_t)__s1, 11); \
---
> __builtin_neon_vst1_bf16(__p0, __builtin_bit_cast(int8x8_t, __s1), 11); \
1233,1234c1252,1253
< bfloat16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 3, 2, 1, 0); \
< __builtin_neon_vst1_bf16(__p0, (int8x8_t)__rev1, 11); \
---
> bfloat16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_16); \
> __builtin_neon_vst1_bf16(__p0, __builtin_bit_cast(int8x8_t, __rev1), 11); \
1241c1260
< __builtin_neon_vst1q_lane_bf16(__p0, (int8x16_t)__s1, __p2, 43); \
---
> __builtin_neon_vst1q_lane_bf16(__p0, __builtin_bit_cast(int8x16_t, __s1), __p2, 43); \
1246,1247c1265,1266
< bfloat16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vst1q_lane_bf16(__p0, (int8x16_t)__rev1, __p2, 43); \
---
> bfloat16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_16); \
> __builtin_neon_vst1q_lane_bf16(__p0, __builtin_bit_cast(int8x16_t, __rev1), __p2, 43); \
1254c1273
< __builtin_neon_vst1_lane_bf16(__p0, (int8x8_t)__s1, __p2, 11); \
---
> __builtin_neon_vst1_lane_bf16(__p0, __builtin_bit_cast(int8x8_t, __s1), __p2, 11); \
1259,1260c1278,1279
< bfloat16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 3, 2, 1, 0); \
< __builtin_neon_vst1_lane_bf16(__p0, (int8x8_t)__rev1, __p2, 11); \
---
> bfloat16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_16); \
> __builtin_neon_vst1_lane_bf16(__p0, __builtin_bit_cast(int8x8_t, __rev1), __p2, 11); \
1267c1286
< __builtin_neon_vst1q_bf16_x2(__p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], 43); \
---
> __builtin_neon_vst1q_bf16_x2(__p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), 43); \
1273,1275c1292,1294
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vst1q_bf16_x2(__p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], 43); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_16); \
> __builtin_neon_vst1q_bf16_x2(__p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), 43); \
1282c1301
< __builtin_neon_vst1_bf16_x2(__p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], 11); \
---
> __builtin_neon_vst1_bf16_x2(__p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), 11); \
1288,1290c1307,1309
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __builtin_neon_vst1_bf16_x2(__p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], 11); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_16); \
> __builtin_neon_vst1_bf16_x2(__p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), 11); \
1297c1316
< __builtin_neon_vst1q_bf16_x3(__p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], (int8x16_t)__s1.val[2], 43); \
---
> __builtin_neon_vst1q_bf16_x3(__p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __builtin_bit_cast(int8x16_t, __s1.val[2]), 43); \
1303,1306c1322,1325
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vst1q_bf16_x3(__p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], (int8x16_t)__rev1.val[2], 43); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_128_16); \
> __builtin_neon_vst1q_bf16_x3(__p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __builtin_bit_cast(int8x16_t, __rev1.val[2]), 43); \
1313c1332
< __builtin_neon_vst1_bf16_x3(__p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], (int8x8_t)__s1.val[2], 11); \
---
> __builtin_neon_vst1_bf16_x3(__p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __builtin_bit_cast(int8x8_t, __s1.val[2]), 11); \
1319,1322c1338,1341
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 3, 2, 1, 0); \
< __builtin_neon_vst1_bf16_x3(__p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], (int8x8_t)__rev1.val[2], 11); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_64_16); \
> __builtin_neon_vst1_bf16_x3(__p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __builtin_bit_cast(int8x8_t, __rev1.val[2]), 11); \
1329c1348
< __builtin_neon_vst1q_bf16_x4(__p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], (int8x16_t)__s1.val[2], (int8x16_t)__s1.val[3], 43); \
---
> __builtin_neon_vst1q_bf16_x4(__p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __builtin_bit_cast(int8x16_t, __s1.val[2]), __builtin_bit_cast(int8x16_t, __s1.val[3]), 43); \
1335,1339c1354,1358
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vst1q_bf16_x4(__p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], (int8x16_t)__rev1.val[2], (int8x16_t)__rev1.val[3], 43); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_128_16); \
> __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], __lane_reverse_128_16); \
> __builtin_neon_vst1q_bf16_x4(__p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __builtin_bit_cast(int8x16_t, __rev1.val[2]), __builtin_bit_cast(int8x16_t, __rev1.val[3]), 43); \
1346c1365
< __builtin_neon_vst1_bf16_x4(__p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], (int8x8_t)__s1.val[2], (int8x8_t)__s1.val[3], 11); \
---
> __builtin_neon_vst1_bf16_x4(__p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __builtin_bit_cast(int8x8_t, __s1.val[2]), __builtin_bit_cast(int8x8_t, __s1.val[3]), 11); \
1352,1356c1371,1375
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 3, 2, 1, 0); \
< __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], 3, 2, 1, 0); \
< __builtin_neon_vst1_bf16_x4(__p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], (int8x8_t)__rev1.val[2], (int8x8_t)__rev1.val[3], 11); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_64_16); \
> __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], __lane_reverse_64_16); \
> __builtin_neon_vst1_bf16_x4(__p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __builtin_bit_cast(int8x8_t, __rev1.val[2]), __builtin_bit_cast(int8x8_t, __rev1.val[3]), 11); \
1363c1382
< __builtin_neon_vst2q_bf16(__p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], 43); \
---
> __builtin_neon_vst2q_bf16(__p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), 43); \
1369,1371c1388,1390
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vst2q_bf16(__p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], 43); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_16); \
> __builtin_neon_vst2q_bf16(__p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), 43); \
1378c1397
< __builtin_neon_vst2_bf16(__p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], 11); \
---
> __builtin_neon_vst2_bf16(__p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), 11); \
1384,1386c1403,1405
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __builtin_neon_vst2_bf16(__p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], 11); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_16); \
> __builtin_neon_vst2_bf16(__p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), 11); \
1393c1412
< __builtin_neon_vst2q_lane_bf16(__p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], __p2, 43); \
---
> __builtin_neon_vst2q_lane_bf16(__p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __p2, 43); \
1399,1401c1418,1420
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vst2q_lane_bf16(__p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], __p2, 43); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_16); \
> __builtin_neon_vst2q_lane_bf16(__p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __p2, 43); \
1408c1427
< __builtin_neon_vst2_lane_bf16(__p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], __p2, 11); \
---
> __builtin_neon_vst2_lane_bf16(__p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __p2, 11); \
1414,1416c1433,1435
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __builtin_neon_vst2_lane_bf16(__p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], __p2, 11); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_16); \
> __builtin_neon_vst2_lane_bf16(__p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __p2, 11); \
1423c1442
< __builtin_neon_vst3q_bf16(__p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], (int8x16_t)__s1.val[2], 43); \
---
> __builtin_neon_vst3q_bf16(__p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __builtin_bit_cast(int8x16_t, __s1.val[2]), 43); \
1429,1432c1448,1451
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vst3q_bf16(__p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], (int8x16_t)__rev1.val[2], 43); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_128_16); \
> __builtin_neon_vst3q_bf16(__p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __builtin_bit_cast(int8x16_t, __rev1.val[2]), 43); \
1439c1458
< __builtin_neon_vst3_bf16(__p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], (int8x8_t)__s1.val[2], 11); \
---
> __builtin_neon_vst3_bf16(__p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __builtin_bit_cast(int8x8_t, __s1.val[2]), 11); \
1445,1448c1464,1467
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 3, 2, 1, 0); \
< __builtin_neon_vst3_bf16(__p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], (int8x8_t)__rev1.val[2], 11); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_64_16); \
> __builtin_neon_vst3_bf16(__p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __builtin_bit_cast(int8x8_t, __rev1.val[2]), 11); \
1455c1474
< __builtin_neon_vst3q_lane_bf16(__p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], (int8x16_t)__s1.val[2], __p2, 43); \
---
> __builtin_neon_vst3q_lane_bf16(__p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __builtin_bit_cast(int8x16_t, __s1.val[2]), __p2, 43); \
1461,1464c1480,1483
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vst3q_lane_bf16(__p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], (int8x16_t)__rev1.val[2], __p2, 43); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_128_16); \
> __builtin_neon_vst3q_lane_bf16(__p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __builtin_bit_cast(int8x16_t, __rev1.val[2]), __p2, 43); \
1471c1490
< __builtin_neon_vst3_lane_bf16(__p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], (int8x8_t)__s1.val[2], __p2, 11); \
---
> __builtin_neon_vst3_lane_bf16(__p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __builtin_bit_cast(int8x8_t, __s1.val[2]), __p2, 11); \
1477,1480c1496,1499
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 3, 2, 1, 0); \
< __builtin_neon_vst3_lane_bf16(__p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], (int8x8_t)__rev1.val[2], __p2, 11); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_64_16); \
> __builtin_neon_vst3_lane_bf16(__p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __builtin_bit_cast(int8x8_t, __rev1.val[2]), __p2, 11); \
1487c1506
< __builtin_neon_vst4q_bf16(__p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], (int8x16_t)__s1.val[2], (int8x16_t)__s1.val[3], 43); \
---
> __builtin_neon_vst4q_bf16(__p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __builtin_bit_cast(int8x16_t, __s1.val[2]), __builtin_bit_cast(int8x16_t, __s1.val[3]), 43); \
1493,1497c1512,1516
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vst4q_bf16(__p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], (int8x16_t)__rev1.val[2], (int8x16_t)__rev1.val[3], 43); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_128_16); \
> __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], __lane_reverse_128_16); \
> __builtin_neon_vst4q_bf16(__p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __builtin_bit_cast(int8x16_t, __rev1.val[2]), __builtin_bit_cast(int8x16_t, __rev1.val[3]), 43); \
1504c1523
< __builtin_neon_vst4_bf16(__p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], (int8x8_t)__s1.val[2], (int8x8_t)__s1.val[3], 11); \
---
> __builtin_neon_vst4_bf16(__p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __builtin_bit_cast(int8x8_t, __s1.val[2]), __builtin_bit_cast(int8x8_t, __s1.val[3]), 11); \
1510,1514c1529,1533
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 3, 2, 1, 0); \
< __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], 3, 2, 1, 0); \
< __builtin_neon_vst4_bf16(__p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], (int8x8_t)__rev1.val[2], (int8x8_t)__rev1.val[3], 11); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_64_16); \
> __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], __lane_reverse_64_16); \
> __builtin_neon_vst4_bf16(__p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __builtin_bit_cast(int8x8_t, __rev1.val[2]), __builtin_bit_cast(int8x8_t, __rev1.val[3]), 11); \
1521c1540
< __builtin_neon_vst4q_lane_bf16(__p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], (int8x16_t)__s1.val[2], (int8x16_t)__s1.val[3], __p2, 43); \
---
> __builtin_neon_vst4q_lane_bf16(__p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __builtin_bit_cast(int8x16_t, __s1.val[2]), __builtin_bit_cast(int8x16_t, __s1.val[3]), __p2, 43); \
1527,1531c1546,1550
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vst4q_lane_bf16(__p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], (int8x16_t)__rev1.val[2], (int8x16_t)__rev1.val[3], __p2, 43); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_128_16); \
> __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], __lane_reverse_128_16); \
> __builtin_neon_vst4q_lane_bf16(__p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __builtin_bit_cast(int8x16_t, __rev1.val[2]), __builtin_bit_cast(int8x16_t, __rev1.val[3]), __p2, 43); \
1538c1557
< __builtin_neon_vst4_lane_bf16(__p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], (int8x8_t)__s1.val[2], (int8x8_t)__s1.val[3], __p2, 11); \
---
> __builtin_neon_vst4_lane_bf16(__p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __builtin_bit_cast(int8x8_t, __s1.val[2]), __builtin_bit_cast(int8x8_t, __s1.val[3]), __p2, 11); \
1544,1548c1563,1567
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 3, 2, 1, 0); \
< __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], 3, 2, 1, 0); \
< __builtin_neon_vst4_lane_bf16(__p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], (int8x8_t)__rev1.val[2], (int8x8_t)__rev1.val[3], __p2, 11); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_64_16); \
> __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], __lane_reverse_64_16); \
> __builtin_neon_vst4_lane_bf16(__p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __builtin_bit_cast(int8x8_t, __rev1.val[2]), __builtin_bit_cast(int8x8_t, __rev1.val[3]), __p2, 11); \
1555c1574
< __ret = (uint32x4_t) __builtin_neon_vdotq_u32((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 50);
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vdotq_u32(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 50));
1561,1565c1580,1584
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x16_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint32x4_t) __builtin_neon_vdotq_u32((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 50);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> uint8x16_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vdotq_u32(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 50));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
1570c1589
< __ret = (uint32x4_t) __builtin_neon_vdotq_u32((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 50);
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vdotq_u32(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 50));
1578c1597
< __ret = (int32x4_t) __builtin_neon_vdotq_s32((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 34);
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vdotq_s32(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 34));
1584,1588c1603,1607
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x16_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int32x4_t) __builtin_neon_vdotq_s32((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 34);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> int8x16_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vdotq_s32(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 34));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
1593c1612
< __ret = (int32x4_t) __builtin_neon_vdotq_s32((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 34);
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vdotq_s32(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 34));
1601c1620
< __ret = (uint32x2_t) __builtin_neon_vdot_u32((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 18);
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vdot_u32(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 18));
1607,1611c1626,1630
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint32x2_t) __builtin_neon_vdot_u32((int8x8_t)__rev0, (int8x8_t)__rev1, (int8x8_t)__rev2, 18);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> uint8x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vdot_u32(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __builtin_bit_cast(int8x8_t, __rev2), 18));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
1616c1635
< __ret = (uint32x2_t) __builtin_neon_vdot_u32((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 18);
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vdot_u32(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 18));
1624c1643
< __ret = (int32x2_t) __builtin_neon_vdot_s32((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 2);
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vdot_s32(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 2));
1630,1634c1649,1653
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int32x2_t) __builtin_neon_vdot_s32((int8x8_t)__rev0, (int8x8_t)__rev1, (int8x8_t)__rev2, 2);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> int8x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vdot_s32(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __builtin_bit_cast(int8x8_t, __rev2), 2));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
1639c1658
< __ret = (int32x2_t) __builtin_neon_vdot_s32((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 2);
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vdot_s32(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 2));
1647c1666
< __ret = (float16x8_t) __builtin_neon_vabdq_f16((int8x16_t)__p0, (int8x16_t)__p1, 40);
---
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vabdq_f16(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 40));
1653,1656c1672,1675
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (float16x8_t) __builtin_neon_vabdq_f16((int8x16_t)__rev0, (int8x16_t)__rev1, 40);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vabdq_f16(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 40));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
1664c1683
< __ret = (float16x4_t) __builtin_neon_vabd_f16((int8x8_t)__p0, (int8x8_t)__p1, 8);
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vabd_f16(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 8));
1670,1673c1689,1692
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (float16x4_t) __builtin_neon_vabd_f16((int8x8_t)__rev0, (int8x8_t)__rev1, 8);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vabd_f16(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 8));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
1681c1700
< __ret = (float16x8_t) __builtin_neon_vabsq_f16((int8x16_t)__p0, 40);
---
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vabsq_f16(__builtin_bit_cast(int8x16_t, __p0), 40));
1687,1689c1706,1708
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (float16x8_t) __builtin_neon_vabsq_f16((int8x16_t)__rev0, 40);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vabsq_f16(__builtin_bit_cast(int8x16_t, __rev0), 40));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
1697c1716
< __ret = (float16x4_t) __builtin_neon_vabs_f16((int8x8_t)__p0, 8);
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vabs_f16(__builtin_bit_cast(int8x8_t, __p0), 8));
1703,1705c1722,1724
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (float16x4_t) __builtin_neon_vabs_f16((int8x8_t)__rev0, 8);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vabs_f16(__builtin_bit_cast(int8x8_t, __rev0), 8));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
1719,1720c1738,1739
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
1722c1741
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
1736,1737c1755,1756
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
1739c1758
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
1747c1766
< __ret = (uint16x8_t) __builtin_neon_vcageq_f16((int8x16_t)__p0, (int8x16_t)__p1, 49);
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcageq_f16(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 49));
1753,1756c1772,1775
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t) __builtin_neon_vcageq_f16((int8x16_t)__rev0, (int8x16_t)__rev1, 49);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcageq_f16(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 49));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
1764c1783
< __ret = (uint16x4_t) __builtin_neon_vcage_f16((int8x8_t)__p0, (int8x8_t)__p1, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcage_f16(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 17));
1770,1773c1789,1792
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t) __builtin_neon_vcage_f16((int8x8_t)__rev0, (int8x8_t)__rev1, 17);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcage_f16(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 17));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
1781c1800
< __ret = (uint16x8_t) __builtin_neon_vcagtq_f16((int8x16_t)__p0, (int8x16_t)__p1, 49);
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcagtq_f16(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 49));
1787,1790c1806,1809
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t) __builtin_neon_vcagtq_f16((int8x16_t)__rev0, (int8x16_t)__rev1, 49);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcagtq_f16(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 49));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
1798c1817
< __ret = (uint16x4_t) __builtin_neon_vcagt_f16((int8x8_t)__p0, (int8x8_t)__p1, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcagt_f16(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 17));
1804,1807c1823,1826
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t) __builtin_neon_vcagt_f16((int8x8_t)__rev0, (int8x8_t)__rev1, 17);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcagt_f16(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 17));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
1815c1834
< __ret = (uint16x8_t) __builtin_neon_vcaleq_f16((int8x16_t)__p0, (int8x16_t)__p1, 49);
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcaleq_f16(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 49));
1821,1824c1840,1843
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t) __builtin_neon_vcaleq_f16((int8x16_t)__rev0, (int8x16_t)__rev1, 49);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcaleq_f16(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 49));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
1832c1851
< __ret = (uint16x4_t) __builtin_neon_vcale_f16((int8x8_t)__p0, (int8x8_t)__p1, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcale_f16(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 17));
1838,1841c1857,1860
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t) __builtin_neon_vcale_f16((int8x8_t)__rev0, (int8x8_t)__rev1, 17);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcale_f16(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 17));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
1849c1868
< __ret = (uint16x8_t) __builtin_neon_vcaltq_f16((int8x16_t)__p0, (int8x16_t)__p1, 49);
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcaltq_f16(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 49));
1855,1858c1874,1877
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t) __builtin_neon_vcaltq_f16((int8x16_t)__rev0, (int8x16_t)__rev1, 49);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcaltq_f16(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 49));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
1866c1885
< __ret = (uint16x4_t) __builtin_neon_vcalt_f16((int8x8_t)__p0, (int8x8_t)__p1, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcalt_f16(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 17));
1872,1875c1891,1894
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t) __builtin_neon_vcalt_f16((int8x8_t)__rev0, (int8x8_t)__rev1, 17);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcalt_f16(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 17));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
1883c1902
< __ret = (uint16x8_t)(__p0 == __p1);
---
> __ret = __builtin_bit_cast(uint16x8_t, __p0 == __p1);
1889,1892c1908,1911
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t)(__rev0 == __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __rev0 == __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
1900c1919
< __ret = (uint16x4_t)(__p0 == __p1);
---
> __ret = __builtin_bit_cast(uint16x4_t, __p0 == __p1);
1906,1909c1925,1928
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t)(__rev0 == __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __rev0 == __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
1917c1936
< __ret = (uint16x8_t) __builtin_neon_vceqzq_f16((int8x16_t)__p0, 49);
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vceqzq_f16(__builtin_bit_cast(int8x16_t, __p0), 40));
1923,1925c1942,1944
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t) __builtin_neon_vceqzq_f16((int8x16_t)__rev0, 49);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vceqzq_f16(__builtin_bit_cast(int8x16_t, __rev0), 40));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
1933c1952
< __ret = (uint16x4_t) __builtin_neon_vceqz_f16((int8x8_t)__p0, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vceqz_f16(__builtin_bit_cast(int8x8_t, __p0), 8));
1939,1941c1958,1960
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (uint16x4_t) __builtin_neon_vceqz_f16((int8x8_t)__rev0, 17);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vceqz_f16(__builtin_bit_cast(int8x8_t, __rev0), 8));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
1949c1968
< __ret = (uint16x8_t)(__p0 >= __p1);
---
> __ret = __builtin_bit_cast(uint16x8_t, __p0 >= __p1);
1955,1958c1974,1977
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t)(__rev0 >= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __rev0 >= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
1966c1985
< __ret = (uint16x4_t)(__p0 >= __p1);
---
> __ret = __builtin_bit_cast(uint16x4_t, __p0 >= __p1);
1972,1975c1991,1994
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t)(__rev0 >= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __rev0 >= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
1983c2002
< __ret = (uint16x8_t) __builtin_neon_vcgezq_f16((int8x16_t)__p0, 49);
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcgezq_f16(__builtin_bit_cast(int8x16_t, __p0), 40));
1989,1991c2008,2010
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t) __builtin_neon_vcgezq_f16((int8x16_t)__rev0, 49);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcgezq_f16(__builtin_bit_cast(int8x16_t, __rev0), 40));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
1999c2018
< __ret = (uint16x4_t) __builtin_neon_vcgez_f16((int8x8_t)__p0, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcgez_f16(__builtin_bit_cast(int8x8_t, __p0), 8));
2005,2007c2024,2026
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (uint16x4_t) __builtin_neon_vcgez_f16((int8x8_t)__rev0, 17);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcgez_f16(__builtin_bit_cast(int8x8_t, __rev0), 8));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2015c2034
< __ret = (uint16x8_t)(__p0 > __p1);
---
> __ret = __builtin_bit_cast(uint16x8_t, __p0 > __p1);
2021,2024c2040,2043
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t)(__rev0 > __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __rev0 > __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2032c2051
< __ret = (uint16x4_t)(__p0 > __p1);
---
> __ret = __builtin_bit_cast(uint16x4_t, __p0 > __p1);
2038,2041c2057,2060
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t)(__rev0 > __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __rev0 > __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2049c2068
< __ret = (uint16x8_t) __builtin_neon_vcgtzq_f16((int8x16_t)__p0, 49);
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcgtzq_f16(__builtin_bit_cast(int8x16_t, __p0), 40));
2055,2057c2074,2076
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t) __builtin_neon_vcgtzq_f16((int8x16_t)__rev0, 49);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcgtzq_f16(__builtin_bit_cast(int8x16_t, __rev0), 40));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2065c2084
< __ret = (uint16x4_t) __builtin_neon_vcgtz_f16((int8x8_t)__p0, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcgtz_f16(__builtin_bit_cast(int8x8_t, __p0), 8));
2071,2073c2090,2092
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (uint16x4_t) __builtin_neon_vcgtz_f16((int8x8_t)__rev0, 17);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcgtz_f16(__builtin_bit_cast(int8x8_t, __rev0), 8));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2081c2100
< __ret = (uint16x8_t)(__p0 <= __p1);
---
> __ret = __builtin_bit_cast(uint16x8_t, __p0 <= __p1);
2087,2090c2106,2109
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t)(__rev0 <= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __rev0 <= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2098c2117
< __ret = (uint16x4_t)(__p0 <= __p1);
---
> __ret = __builtin_bit_cast(uint16x4_t, __p0 <= __p1);
2104,2107c2123,2126
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t)(__rev0 <= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __rev0 <= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2115c2134
< __ret = (uint16x8_t) __builtin_neon_vclezq_f16((int8x16_t)__p0, 49);
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vclezq_f16(__builtin_bit_cast(int8x16_t, __p0), 40));
2121,2123c2140,2142
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t) __builtin_neon_vclezq_f16((int8x16_t)__rev0, 49);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vclezq_f16(__builtin_bit_cast(int8x16_t, __rev0), 40));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2131c2150
< __ret = (uint16x4_t) __builtin_neon_vclez_f16((int8x8_t)__p0, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vclez_f16(__builtin_bit_cast(int8x8_t, __p0), 8));
2137,2139c2156,2158
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (uint16x4_t) __builtin_neon_vclez_f16((int8x8_t)__rev0, 17);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vclez_f16(__builtin_bit_cast(int8x8_t, __rev0), 8));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2147c2166
< __ret = (uint16x8_t)(__p0 < __p1);
---
> __ret = __builtin_bit_cast(uint16x8_t, __p0 < __p1);
2153,2156c2172,2175
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t)(__rev0 < __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __rev0 < __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2164c2183
< __ret = (uint16x4_t)(__p0 < __p1);
---
> __ret = __builtin_bit_cast(uint16x4_t, __p0 < __p1);
2170,2173c2189,2192
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t)(__rev0 < __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __rev0 < __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2181c2200
< __ret = (uint16x8_t) __builtin_neon_vcltzq_f16((int8x16_t)__p0, 49);
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcltzq_f16(__builtin_bit_cast(int8x16_t, __p0), 40));
2187,2189c2206,2208
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t) __builtin_neon_vcltzq_f16((int8x16_t)__rev0, 49);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcltzq_f16(__builtin_bit_cast(int8x16_t, __rev0), 40));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2197c2216
< __ret = (uint16x4_t) __builtin_neon_vcltz_f16((int8x8_t)__p0, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcltz_f16(__builtin_bit_cast(int8x8_t, __p0), 8));
2203,2205c2222,2224
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (uint16x4_t) __builtin_neon_vcltz_f16((int8x8_t)__rev0, 17);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcltz_f16(__builtin_bit_cast(int8x8_t, __rev0), 8));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2213c2232
< __ret = (float16x8_t) __builtin_neon_vcvtq_f16_u16((int8x16_t)__p0, 49);
---
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vcvtq_f16_u16(__builtin_bit_cast(int8x16_t, __p0), 49));
2219,2221c2238,2240
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (float16x8_t) __builtin_neon_vcvtq_f16_u16((int8x16_t)__rev0, 49);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vcvtq_f16_u16(__builtin_bit_cast(int8x16_t, __rev0), 49));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2229c2248
< __ret = (float16x8_t) __builtin_neon_vcvtq_f16_s16((int8x16_t)__p0, 33);
---
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vcvtq_f16_s16(__builtin_bit_cast(int8x16_t, __p0), 33));
2235,2237c2254,2256
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (float16x8_t) __builtin_neon_vcvtq_f16_s16((int8x16_t)__rev0, 33);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vcvtq_f16_s16(__builtin_bit_cast(int8x16_t, __rev0), 33));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2245c2264
< __ret = (float16x4_t) __builtin_neon_vcvt_f16_u16((int8x8_t)__p0, 17);
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vcvt_f16_u16(__builtin_bit_cast(int8x8_t, __p0), 17));
2251,2253c2270,2272
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (float16x4_t) __builtin_neon_vcvt_f16_u16((int8x8_t)__rev0, 17);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vcvt_f16_u16(__builtin_bit_cast(int8x8_t, __rev0), 17));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2261c2280
< __ret = (float16x4_t) __builtin_neon_vcvt_f16_s16((int8x8_t)__p0, 1);
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vcvt_f16_s16(__builtin_bit_cast(int8x8_t, __p0), 1));
2267,2269c2286,2288
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (float16x4_t) __builtin_neon_vcvt_f16_s16((int8x8_t)__rev0, 1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vcvt_f16_s16(__builtin_bit_cast(int8x8_t, __rev0), 1));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2278c2297
< __ret = (float16x8_t) __builtin_neon_vcvtq_n_f16_u16((int8x16_t)__s0, __p1, 49); \
---
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vcvtq_n_f16_u16(__builtin_bit_cast(int8x16_t, __s0), __p1, 49)); \
2285,2287c2304,2306
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (float16x8_t) __builtin_neon_vcvtq_n_f16_u16((int8x16_t)__rev0, __p1, 49); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vcvtq_n_f16_u16(__builtin_bit_cast(int8x16_t, __rev0), __p1, 49)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
2296c2315
< __ret = (float16x8_t) __builtin_neon_vcvtq_n_f16_s16((int8x16_t)__s0, __p1, 33); \
---
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vcvtq_n_f16_s16(__builtin_bit_cast(int8x16_t, __s0), __p1, 33)); \
2303,2305c2322,2324
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (float16x8_t) __builtin_neon_vcvtq_n_f16_s16((int8x16_t)__rev0, __p1, 33); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vcvtq_n_f16_s16(__builtin_bit_cast(int8x16_t, __rev0), __p1, 33)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
2314c2333
< __ret = (float16x4_t) __builtin_neon_vcvt_n_f16_u16((int8x8_t)__s0, __p1, 17); \
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vcvt_n_f16_u16(__builtin_bit_cast(int8x8_t, __s0), __p1, 17)); \
2321,2323c2340,2342
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (float16x4_t) __builtin_neon_vcvt_n_f16_u16((int8x8_t)__rev0, __p1, 17); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vcvt_n_f16_u16(__builtin_bit_cast(int8x8_t, __rev0), __p1, 17)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
2332c2351
< __ret = (float16x4_t) __builtin_neon_vcvt_n_f16_s16((int8x8_t)__s0, __p1, 1); \
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vcvt_n_f16_s16(__builtin_bit_cast(int8x8_t, __s0), __p1, 1)); \
2339,2341c2358,2360
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (float16x4_t) __builtin_neon_vcvt_n_f16_s16((int8x8_t)__rev0, __p1, 1); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vcvt_n_f16_s16(__builtin_bit_cast(int8x8_t, __rev0), __p1, 1)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
2350c2369
< __ret = (int16x8_t) __builtin_neon_vcvtq_n_s16_f16((int8x16_t)__s0, __p1, 33); \
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vcvtq_n_s16_f16(__builtin_bit_cast(int8x16_t, __s0), __p1, 33)); \
2357,2359c2376,2378
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (int16x8_t) __builtin_neon_vcvtq_n_s16_f16((int8x16_t)__rev0, __p1, 33); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vcvtq_n_s16_f16(__builtin_bit_cast(int8x16_t, __rev0), __p1, 33)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
2368c2387
< __ret = (int16x4_t) __builtin_neon_vcvt_n_s16_f16((int8x8_t)__s0, __p1, 1); \
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vcvt_n_s16_f16(__builtin_bit_cast(int8x8_t, __s0), __p1, 1)); \
2375,2377c2394,2396
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (int16x4_t) __builtin_neon_vcvt_n_s16_f16((int8x8_t)__rev0, __p1, 1); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vcvt_n_s16_f16(__builtin_bit_cast(int8x8_t, __rev0), __p1, 1)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
2386c2405
< __ret = (uint16x8_t) __builtin_neon_vcvtq_n_u16_f16((int8x16_t)__s0, __p1, 49); \
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcvtq_n_u16_f16(__builtin_bit_cast(int8x16_t, __s0), __p1, 49)); \
2393,2395c2412,2414
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (uint16x8_t) __builtin_neon_vcvtq_n_u16_f16((int8x16_t)__rev0, __p1, 49); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcvtq_n_u16_f16(__builtin_bit_cast(int8x16_t, __rev0), __p1, 49)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
2404c2423
< __ret = (uint16x4_t) __builtin_neon_vcvt_n_u16_f16((int8x8_t)__s0, __p1, 17); \
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcvt_n_u16_f16(__builtin_bit_cast(int8x8_t, __s0), __p1, 17)); \
2411,2413c2430,2432
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (uint16x4_t) __builtin_neon_vcvt_n_u16_f16((int8x8_t)__rev0, __p1, 17); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcvt_n_u16_f16(__builtin_bit_cast(int8x8_t, __rev0), __p1, 17)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
2421c2440
< __ret = (int16x8_t) __builtin_neon_vcvtq_s16_f16((int8x16_t)__p0, 33);
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vcvtq_s16_f16(__builtin_bit_cast(int8x16_t, __p0), 33));
2427,2429c2446,2448
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int16x8_t) __builtin_neon_vcvtq_s16_f16((int8x16_t)__rev0, 33);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vcvtq_s16_f16(__builtin_bit_cast(int8x16_t, __rev0), 33));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2437c2456
< __ret = (int16x4_t) __builtin_neon_vcvt_s16_f16((int8x8_t)__p0, 1);
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vcvt_s16_f16(__builtin_bit_cast(int8x8_t, __p0), 1));
2443,2445c2462,2464
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (int16x4_t) __builtin_neon_vcvt_s16_f16((int8x8_t)__rev0, 1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vcvt_s16_f16(__builtin_bit_cast(int8x8_t, __rev0), 1));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2453c2472
< __ret = (uint16x8_t) __builtin_neon_vcvtq_u16_f16((int8x16_t)__p0, 49);
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcvtq_u16_f16(__builtin_bit_cast(int8x16_t, __p0), 49));
2459,2461c2478,2480
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t) __builtin_neon_vcvtq_u16_f16((int8x16_t)__rev0, 49);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcvtq_u16_f16(__builtin_bit_cast(int8x16_t, __rev0), 49));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2469c2488
< __ret = (uint16x4_t) __builtin_neon_vcvt_u16_f16((int8x8_t)__p0, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcvt_u16_f16(__builtin_bit_cast(int8x8_t, __p0), 17));
2475,2477c2494,2496
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (uint16x4_t) __builtin_neon_vcvt_u16_f16((int8x8_t)__rev0, 17);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcvt_u16_f16(__builtin_bit_cast(int8x8_t, __rev0), 17));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2485c2504
< __ret = (int16x8_t) __builtin_neon_vcvtaq_s16_f16((int8x16_t)__p0, 33);
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vcvtaq_s16_f16(__builtin_bit_cast(int8x16_t, __p0), 33));
2491,2493c2510,2512
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int16x8_t) __builtin_neon_vcvtaq_s16_f16((int8x16_t)__rev0, 33);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vcvtaq_s16_f16(__builtin_bit_cast(int8x16_t, __rev0), 33));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2501c2520
< __ret = (int16x4_t) __builtin_neon_vcvta_s16_f16((int8x8_t)__p0, 1);
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vcvta_s16_f16(__builtin_bit_cast(int8x8_t, __p0), 1));
2507,2509c2526,2528
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (int16x4_t) __builtin_neon_vcvta_s16_f16((int8x8_t)__rev0, 1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vcvta_s16_f16(__builtin_bit_cast(int8x8_t, __rev0), 1));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2517c2536
< __ret = (uint16x8_t) __builtin_neon_vcvtaq_u16_f16((int8x16_t)__p0, 49);
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcvtaq_u16_f16(__builtin_bit_cast(int8x16_t, __p0), 49));
2523,2525c2542,2544
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t) __builtin_neon_vcvtaq_u16_f16((int8x16_t)__rev0, 49);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcvtaq_u16_f16(__builtin_bit_cast(int8x16_t, __rev0), 49));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2533c2552
< __ret = (uint16x4_t) __builtin_neon_vcvta_u16_f16((int8x8_t)__p0, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcvta_u16_f16(__builtin_bit_cast(int8x8_t, __p0), 17));
2539,2541c2558,2560
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (uint16x4_t) __builtin_neon_vcvta_u16_f16((int8x8_t)__rev0, 17);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcvta_u16_f16(__builtin_bit_cast(int8x8_t, __rev0), 17));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2549c2568
< __ret = (int16x8_t) __builtin_neon_vcvtmq_s16_f16((int8x16_t)__p0, 33);
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vcvtmq_s16_f16(__builtin_bit_cast(int8x16_t, __p0), 33));
2555,2557c2574,2576
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int16x8_t) __builtin_neon_vcvtmq_s16_f16((int8x16_t)__rev0, 33);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vcvtmq_s16_f16(__builtin_bit_cast(int8x16_t, __rev0), 33));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2565c2584
< __ret = (int16x4_t) __builtin_neon_vcvtm_s16_f16((int8x8_t)__p0, 1);
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vcvtm_s16_f16(__builtin_bit_cast(int8x8_t, __p0), 1));
2571,2573c2590,2592
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (int16x4_t) __builtin_neon_vcvtm_s16_f16((int8x8_t)__rev0, 1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vcvtm_s16_f16(__builtin_bit_cast(int8x8_t, __rev0), 1));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2581c2600
< __ret = (uint16x8_t) __builtin_neon_vcvtmq_u16_f16((int8x16_t)__p0, 49);
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcvtmq_u16_f16(__builtin_bit_cast(int8x16_t, __p0), 49));
2587,2589c2606,2608
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t) __builtin_neon_vcvtmq_u16_f16((int8x16_t)__rev0, 49);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcvtmq_u16_f16(__builtin_bit_cast(int8x16_t, __rev0), 49));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2597c2616
< __ret = (uint16x4_t) __builtin_neon_vcvtm_u16_f16((int8x8_t)__p0, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcvtm_u16_f16(__builtin_bit_cast(int8x8_t, __p0), 17));
2603,2605c2622,2624
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (uint16x4_t) __builtin_neon_vcvtm_u16_f16((int8x8_t)__rev0, 17);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcvtm_u16_f16(__builtin_bit_cast(int8x8_t, __rev0), 17));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2613c2632
< __ret = (int16x8_t) __builtin_neon_vcvtnq_s16_f16((int8x16_t)__p0, 33);
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vcvtnq_s16_f16(__builtin_bit_cast(int8x16_t, __p0), 33));
2619,2621c2638,2640
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int16x8_t) __builtin_neon_vcvtnq_s16_f16((int8x16_t)__rev0, 33);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vcvtnq_s16_f16(__builtin_bit_cast(int8x16_t, __rev0), 33));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2629c2648
< __ret = (int16x4_t) __builtin_neon_vcvtn_s16_f16((int8x8_t)__p0, 1);
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vcvtn_s16_f16(__builtin_bit_cast(int8x8_t, __p0), 1));
2635,2637c2654,2656
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (int16x4_t) __builtin_neon_vcvtn_s16_f16((int8x8_t)__rev0, 1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vcvtn_s16_f16(__builtin_bit_cast(int8x8_t, __rev0), 1));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2645c2664
< __ret = (uint16x8_t) __builtin_neon_vcvtnq_u16_f16((int8x16_t)__p0, 49);
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcvtnq_u16_f16(__builtin_bit_cast(int8x16_t, __p0), 49));
2651,2653c2670,2672
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t) __builtin_neon_vcvtnq_u16_f16((int8x16_t)__rev0, 49);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcvtnq_u16_f16(__builtin_bit_cast(int8x16_t, __rev0), 49));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2661c2680
< __ret = (uint16x4_t) __builtin_neon_vcvtn_u16_f16((int8x8_t)__p0, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcvtn_u16_f16(__builtin_bit_cast(int8x8_t, __p0), 17));
2667,2669c2686,2688
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (uint16x4_t) __builtin_neon_vcvtn_u16_f16((int8x8_t)__rev0, 17);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcvtn_u16_f16(__builtin_bit_cast(int8x8_t, __rev0), 17));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2677c2696
< __ret = (int16x8_t) __builtin_neon_vcvtpq_s16_f16((int8x16_t)__p0, 33);
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vcvtpq_s16_f16(__builtin_bit_cast(int8x16_t, __p0), 33));
2683,2685c2702,2704
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int16x8_t) __builtin_neon_vcvtpq_s16_f16((int8x16_t)__rev0, 33);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vcvtpq_s16_f16(__builtin_bit_cast(int8x16_t, __rev0), 33));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2693c2712
< __ret = (int16x4_t) __builtin_neon_vcvtp_s16_f16((int8x8_t)__p0, 1);
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vcvtp_s16_f16(__builtin_bit_cast(int8x8_t, __p0), 1));
2699,2701c2718,2720
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (int16x4_t) __builtin_neon_vcvtp_s16_f16((int8x8_t)__rev0, 1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vcvtp_s16_f16(__builtin_bit_cast(int8x8_t, __rev0), 1));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2709c2728
< __ret = (uint16x8_t) __builtin_neon_vcvtpq_u16_f16((int8x16_t)__p0, 49);
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcvtpq_u16_f16(__builtin_bit_cast(int8x16_t, __p0), 49));
2715,2717c2734,2736
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t) __builtin_neon_vcvtpq_u16_f16((int8x16_t)__rev0, 49);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vcvtpq_u16_f16(__builtin_bit_cast(int8x16_t, __rev0), 49));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2725c2744
< __ret = (uint16x4_t) __builtin_neon_vcvtp_u16_f16((int8x8_t)__p0, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcvtp_u16_f16(__builtin_bit_cast(int8x8_t, __p0), 17));
2731,2733c2750,2752
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (uint16x4_t) __builtin_neon_vcvtp_u16_f16((int8x8_t)__rev0, 17);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vcvtp_u16_f16(__builtin_bit_cast(int8x8_t, __rev0), 17));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2741c2760
< __ret = (float16x8_t) __builtin_neon_vfmaq_f16((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 40);
---
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vfmaq_f16(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 40));
2747,2751c2766,2770
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (float16x8_t) __builtin_neon_vfmaq_f16((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 40);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> float16x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vfmaq_f16(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 40));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2756c2775
< __ret = (float16x8_t) __builtin_neon_vfmaq_f16((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 40);
---
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vfmaq_f16(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 40));
2764c2783
< __ret = (float16x4_t) __builtin_neon_vfma_f16((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 8);
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vfma_f16(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 8));
2770,2774c2789,2793
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< float16x4_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 3, 2, 1, 0);
< __ret = (float16x4_t) __builtin_neon_vfma_f16((int8x8_t)__rev0, (int8x8_t)__rev1, (int8x8_t)__rev2, 8);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> float16x4_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vfma_f16(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __builtin_bit_cast(int8x8_t, __rev2), 8));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2779c2798
< __ret = (float16x4_t) __builtin_neon_vfma_f16((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 8);
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vfma_f16(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 8));
2793,2795c2812,2814
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> float16x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_16);
2797c2816
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2811,2813c2830,2832
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< float16x4_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> float16x4_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_64_16);
2815c2834
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2823c2842
< __ret = (float16x8_t) __builtin_neon_vmaxq_f16((int8x16_t)__p0, (int8x16_t)__p1, 40);
---
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vmaxq_f16(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 40));
2829,2832c2848,2851
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (float16x8_t) __builtin_neon_vmaxq_f16((int8x16_t)__rev0, (int8x16_t)__rev1, 40);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vmaxq_f16(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 40));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2840c2859
< __ret = (float16x4_t) __builtin_neon_vmax_f16((int8x8_t)__p0, (int8x8_t)__p1, 8);
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vmax_f16(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 8));
2846,2849c2865,2868
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (float16x4_t) __builtin_neon_vmax_f16((int8x8_t)__rev0, (int8x8_t)__rev1, 8);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vmax_f16(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 8));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2857c2876
< __ret = (float16x8_t) __builtin_neon_vminq_f16((int8x16_t)__p0, (int8x16_t)__p1, 40);
---
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vminq_f16(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 40));
2863,2866c2882,2885
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (float16x8_t) __builtin_neon_vminq_f16((int8x16_t)__rev0, (int8x16_t)__rev1, 40);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vminq_f16(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 40));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2874c2893
< __ret = (float16x4_t) __builtin_neon_vmin_f16((int8x8_t)__p0, (int8x8_t)__p1, 8);
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vmin_f16(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 8));
2880,2883c2899,2902
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (float16x4_t) __builtin_neon_vmin_f16((int8x8_t)__rev0, (int8x8_t)__rev1, 8);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vmin_f16(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 8));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2897,2898c2916,2917
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
2900c2919
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2914,2915c2933,2934
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
2917c2936
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2935c2954
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
2937c2956
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
2955c2974
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
2957c2976
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
2971c2990
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
2973c2992
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
2987c3006
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
2989c3008
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
2997c3016
< __ret = (float16x4_t) __builtin_neon_vpadd_f16((int8x8_t)__p0, (int8x8_t)__p1, 8);
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vpadd_f16(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 8));
3003,3006c3022,3025
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (float16x4_t) __builtin_neon_vpadd_f16((int8x8_t)__rev0, (int8x8_t)__rev1, 8);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vpadd_f16(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 8));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
3014c3033
< __ret = (float16x4_t) __builtin_neon_vpmax_f16((int8x8_t)__p0, (int8x8_t)__p1, 8);
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vpmax_f16(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 8));
3020,3023c3039,3042
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (float16x4_t) __builtin_neon_vpmax_f16((int8x8_t)__rev0, (int8x8_t)__rev1, 8);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vpmax_f16(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 8));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
3031c3050
< __ret = (float16x4_t) __builtin_neon_vpmin_f16((int8x8_t)__p0, (int8x8_t)__p1, 8);
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vpmin_f16(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 8));
3037,3040c3056,3059
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (float16x4_t) __builtin_neon_vpmin_f16((int8x8_t)__rev0, (int8x8_t)__rev1, 8);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vpmin_f16(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 8));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
3048c3067
< __ret = (float16x8_t) __builtin_neon_vrecpeq_f16((int8x16_t)__p0, 40);
---
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vrecpeq_f16(__builtin_bit_cast(int8x16_t, __p0), 40));
3054,3056c3073,3075
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (float16x8_t) __builtin_neon_vrecpeq_f16((int8x16_t)__rev0, 40);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vrecpeq_f16(__builtin_bit_cast(int8x16_t, __rev0), 40));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
3064c3083
< __ret = (float16x4_t) __builtin_neon_vrecpe_f16((int8x8_t)__p0, 8);
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vrecpe_f16(__builtin_bit_cast(int8x8_t, __p0), 8));
3070,3072c3089,3091
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (float16x4_t) __builtin_neon_vrecpe_f16((int8x8_t)__rev0, 8);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vrecpe_f16(__builtin_bit_cast(int8x8_t, __rev0), 8));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
3080c3099
< __ret = (float16x8_t) __builtin_neon_vrecpsq_f16((int8x16_t)__p0, (int8x16_t)__p1, 40);
---
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vrecpsq_f16(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 40));
3086,3089c3105,3108
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (float16x8_t) __builtin_neon_vrecpsq_f16((int8x16_t)__rev0, (int8x16_t)__rev1, 40);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vrecpsq_f16(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 40));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
3097c3116
< __ret = (float16x4_t) __builtin_neon_vrecps_f16((int8x8_t)__p0, (int8x8_t)__p1, 8);
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vrecps_f16(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 8));
3103,3106c3122,3125
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (float16x4_t) __builtin_neon_vrecps_f16((int8x8_t)__rev0, (int8x8_t)__rev1, 8);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vrecps_f16(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 8));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
3114c3133
< __ret = (float16x8_t) __builtin_neon_vrsqrteq_f16((int8x16_t)__p0, 40);
---
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vrsqrteq_f16(__builtin_bit_cast(int8x16_t, __p0), 40));
3120,3122c3139,3141
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (float16x8_t) __builtin_neon_vrsqrteq_f16((int8x16_t)__rev0, 40);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vrsqrteq_f16(__builtin_bit_cast(int8x16_t, __rev0), 40));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
3130c3149
< __ret = (float16x4_t) __builtin_neon_vrsqrte_f16((int8x8_t)__p0, 8);
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vrsqrte_f16(__builtin_bit_cast(int8x8_t, __p0), 8));
3136,3138c3155,3157
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (float16x4_t) __builtin_neon_vrsqrte_f16((int8x8_t)__rev0, 8);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vrsqrte_f16(__builtin_bit_cast(int8x8_t, __rev0), 8));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
3146c3165
< __ret = (float16x8_t) __builtin_neon_vrsqrtsq_f16((int8x16_t)__p0, (int8x16_t)__p1, 40);
---
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vrsqrtsq_f16(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 40));
3152,3155c3171,3174
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (float16x8_t) __builtin_neon_vrsqrtsq_f16((int8x16_t)__rev0, (int8x16_t)__rev1, 40);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vrsqrtsq_f16(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 40));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
3163c3182
< __ret = (float16x4_t) __builtin_neon_vrsqrts_f16((int8x8_t)__p0, (int8x8_t)__p1, 8);
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vrsqrts_f16(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 8));
3169,3172c3188,3191
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (float16x4_t) __builtin_neon_vrsqrts_f16((int8x8_t)__rev0, (int8x8_t)__rev1, 8);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vrsqrts_f16(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 8));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
3186,3187c3205,3206
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
3189c3208
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
3203,3204c3222,3223
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
3206c3225
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
3214c3233
< __ret = (uint32x4_t) __builtin_neon_vmmlaq_u32((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 50);
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vmmlaq_u32(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 50));
3220,3224c3239,3243
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x16_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint32x4_t) __builtin_neon_vmmlaq_u32((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 50);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> uint8x16_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vmmlaq_u32(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 50));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
3232c3251
< __ret = (int32x4_t) __builtin_neon_vmmlaq_s32((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 34);
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vmmlaq_s32(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 34));
3238,3242c3257,3261
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x16_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int32x4_t) __builtin_neon_vmmlaq_s32((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 34);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> int8x16_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vmmlaq_s32(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 34));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
3250c3269
< __ret = (int32x4_t) __builtin_neon_vusdotq_s32((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 34);
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vusdotq_s32(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 34));
3256,3260c3275,3279
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x16_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int32x4_t) __builtin_neon_vusdotq_s32((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 34);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> int8x16_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vusdotq_s32(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 34));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
3265c3284
< __ret = (int32x4_t) __builtin_neon_vusdotq_s32((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 34);
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vusdotq_s32(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 34));
3273c3292
< __ret = (int32x2_t) __builtin_neon_vusdot_s32((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 2);
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vusdot_s32(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 2));
3279,3283c3298,3302
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int32x2_t) __builtin_neon_vusdot_s32((int8x8_t)__rev0, (int8x8_t)__rev1, (int8x8_t)__rev2, 2);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> int8x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vusdot_s32(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __builtin_bit_cast(int8x8_t, __rev2), 2));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
3288c3307
< __ret = (int32x2_t) __builtin_neon_vusdot_s32((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 2);
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vusdot_s32(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 2));
3296c3315
< __ret = (int32x4_t) __builtin_neon_vusmmlaq_s32((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 34);
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vusmmlaq_s32(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 34));
3302,3306c3321,3325
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x16_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int32x4_t) __builtin_neon_vusmmlaq_s32((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 34);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> int8x16_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vusmmlaq_s32(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 34));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
3315c3334
< __ret = (poly8x8_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 4); \
---
> __ret = __builtin_bit_cast(poly8x8_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 4)); \
3322,3324c3341,3343
< poly8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (poly8x8_t) __builtin_neon_splat_lane_v((int8x8_t)__rev0, __p1, 4); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> poly8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_8); \
> __ret = __builtin_bit_cast(poly8x8_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 4)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8); \
3330c3349
< __ret = (poly8x8_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 4); \
---
> __ret = __builtin_bit_cast(poly8x8_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 4)); \
3338c3357
< __ret = (poly64x1_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 6); \
---
> __ret = __builtin_bit_cast(poly64x1_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 6)); \
3345c3364
< __ret = (poly16x4_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 5); \
---
> __ret = __builtin_bit_cast(poly16x4_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 5)); \
3352,3354c3371,3373
< poly16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (poly16x4_t) __builtin_neon_splat_lane_v((int8x8_t)__rev0, __p1, 5); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> poly16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(poly16x4_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 5)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
3360c3379
< __ret = (poly16x4_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 5); \
---
> __ret = __builtin_bit_cast(poly16x4_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 5)); \
3369c3388
< __ret = (poly8x16_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 4); \
---
> __ret = __builtin_bit_cast(poly8x16_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 4)); \
3376,3378c3395,3397
< poly8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (poly8x16_t) __builtin_neon_splatq_lane_v((int8x8_t)__rev0, __p1, 4); \
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> poly8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_8); \
> __ret = __builtin_bit_cast(poly8x16_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 4)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8); \
3384c3403
< __ret = (poly8x16_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 4); \
---
> __ret = __builtin_bit_cast(poly8x16_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 4)); \
3393c3412
< __ret = (poly64x2_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 6); \
---
> __ret = __builtin_bit_cast(poly64x2_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 6)); \
3400,3401c3419,3420
< __ret = (poly64x2_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 6); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> __ret = __builtin_bit_cast(poly64x2_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 6)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64); \
3407c3426
< __ret = (poly64x2_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 6); \
---
> __ret = __builtin_bit_cast(poly64x2_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 6)); \
3416c3435
< __ret = (poly16x8_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 5); \
---
> __ret = __builtin_bit_cast(poly16x8_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 5)); \
3423,3425c3442,3444
< poly16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (poly16x8_t) __builtin_neon_splatq_lane_v((int8x8_t)__rev0, __p1, 5); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> poly16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(poly16x8_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 5)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
3431c3450
< __ret = (poly16x8_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 5); \
---
> __ret = __builtin_bit_cast(poly16x8_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 5)); \
3440c3459
< __ret = (uint8x16_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 16); \
---
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 16)); \
3447,3449c3466,3468
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (uint8x16_t) __builtin_neon_splatq_lane_v((int8x8_t)__rev0, __p1, 16); \
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_8); \
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 16)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8); \
3455c3474
< __ret = (uint8x16_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 16); \
---
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 16)); \
3464c3483
< __ret = (uint32x4_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 18); \
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 18)); \
3471,3473c3490,3492
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (uint32x4_t) __builtin_neon_splatq_lane_v((int8x8_t)__rev0, __p1, 18); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_32); \
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 18)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
3479c3498
< __ret = (uint32x4_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 18); \
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 18)); \
3488c3507
< __ret = (uint64x2_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 19); \
---
> __ret = __builtin_bit_cast(uint64x2_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 19)); \
3495,3496c3514,3515
< __ret = (uint64x2_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 19); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> __ret = __builtin_bit_cast(uint64x2_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 19)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64); \
3502c3521
< __ret = (uint64x2_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 19); \
---
> __ret = __builtin_bit_cast(uint64x2_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 19)); \
3511c3530
< __ret = (uint16x8_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 17); \
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 17)); \
3518,3520c3537,3539
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (uint16x8_t) __builtin_neon_splatq_lane_v((int8x8_t)__rev0, __p1, 17); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 17)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
3526c3545
< __ret = (uint16x8_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 17); \
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 17)); \
3535c3554
< __ret = (int8x16_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 0); \
---
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 0)); \
3542,3544c3561,3563
< int8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (int8x16_t) __builtin_neon_splatq_lane_v((int8x8_t)__rev0, __p1, 0); \
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> int8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_8); \
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 0)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8); \
3550c3569
< __ret = (int8x16_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 0); \
---
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 0)); \
3559c3578
< __ret = (float64x2_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 10); \
---
> __ret = __builtin_bit_cast(float64x2_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 10)); \
3566,3567c3585,3586
< __ret = (float64x2_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 10); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> __ret = __builtin_bit_cast(float64x2_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 10)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64); \
3573c3592
< __ret = (float64x2_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 10); \
---
> __ret = __builtin_bit_cast(float64x2_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 10)); \
3582c3601
< __ret = (float32x4_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 9); \
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 9)); \
3589,3591c3608,3610
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (float32x4_t) __builtin_neon_splatq_lane_v((int8x8_t)__rev0, __p1, 9); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_32); \
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 9)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
3597c3616
< __ret = (float32x4_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 9); \
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 9)); \
3606c3625
< __ret = (float16x8_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 8); \
---
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 8)); \
3613,3615c3632,3634
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (float16x8_t) __builtin_neon_splatq_lane_v((int8x8_t)__rev0, __p1, 8); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 8)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
3621c3640
< __ret = (float16x8_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 8); \
---
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 8)); \
3630c3649
< __ret = (int32x4_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 2); \
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 2)); \
3637,3639c3656,3658
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (int32x4_t) __builtin_neon_splatq_lane_v((int8x8_t)__rev0, __p1, 2); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_32); \
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 2)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
3645c3664
< __ret = (int32x4_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 2); \
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 2)); \
3654c3673
< __ret = (int64x2_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 3); \
---
> __ret = __builtin_bit_cast(int64x2_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 3)); \
3661,3662c3680,3681
< __ret = (int64x2_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 3); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> __ret = __builtin_bit_cast(int64x2_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 3)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64); \
3668c3687
< __ret = (int64x2_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 3); \
---
> __ret = __builtin_bit_cast(int64x2_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 3)); \
3677c3696
< __ret = (int16x8_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 1); \
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 1)); \
3684,3686c3703,3705
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (int16x8_t) __builtin_neon_splatq_lane_v((int8x8_t)__rev0, __p1, 1); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 1)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
3692c3711
< __ret = (int16x8_t) __builtin_neon_splatq_lane_v((int8x8_t)__s0, __p1, 1); \
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_splatq_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 1)); \
3701c3720
< __ret = (uint8x8_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 16); \
---
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 16)); \
3708,3710c3727,3729
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (uint8x8_t) __builtin_neon_splat_lane_v((int8x8_t)__rev0, __p1, 16); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_8); \
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 16)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8); \
3716c3735
< __ret = (uint8x8_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 16); \
---
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 16)); \
3725c3744
< __ret = (uint32x2_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 18); \
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 18)); \
3732,3734c3751,3753
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (uint32x2_t) __builtin_neon_splat_lane_v((int8x8_t)__rev0, __p1, 18); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_32); \
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 18)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
3740c3759
< __ret = (uint32x2_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 18); \
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 18)); \
3748c3767
< __ret = (uint64x1_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 19); \
---
> __ret = __builtin_bit_cast(uint64x1_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 19)); \
3755c3774
< __ret = (uint16x4_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 17); \
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 17)); \
3762,3764c3781,3783
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (uint16x4_t) __builtin_neon_splat_lane_v((int8x8_t)__rev0, __p1, 17); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 17)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
3770c3789
< __ret = (uint16x4_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 17); \
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 17)); \
3779c3798
< __ret = (int8x8_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 0); \
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 0)); \
3786,3788c3805,3807
< int8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (int8x8_t) __builtin_neon_splat_lane_v((int8x8_t)__rev0, __p1, 0); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> int8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_8); \
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 0)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8); \
3794c3813
< __ret = (int8x8_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 0); \
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 0)); \
3802c3821
< __ret = (float64x1_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 10); \
---
> __ret = __builtin_bit_cast(float64x1_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 10)); \
3809c3828
< __ret = (float32x2_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 9); \
---
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 9)); \
3816,3818c3835,3837
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (float32x2_t) __builtin_neon_splat_lane_v((int8x8_t)__rev0, __p1, 9); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_32); \
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 9)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
3824c3843
< __ret = (float32x2_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 9); \
---
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 9)); \
3833c3852
< __ret = (float16x4_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 8); \
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 8)); \
3840,3842c3859,3861
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (float16x4_t) __builtin_neon_splat_lane_v((int8x8_t)__rev0, __p1, 8); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 8)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
3848c3867
< __ret = (float16x4_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 8); \
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 8)); \
3857c3876
< __ret = (int32x2_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 2); \
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 2)); \
3864,3866c3883,3885
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (int32x2_t) __builtin_neon_splat_lane_v((int8x8_t)__rev0, __p1, 2); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_32); \
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 2)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
3872c3891
< __ret = (int32x2_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 2); \
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 2)); \
3880c3899
< __ret = (int64x1_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 3); \
---
> __ret = __builtin_bit_cast(int64x1_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 3)); \
3887c3906
< __ret = (int16x4_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 1); \
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 1)); \
3894,3896c3913,3915
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (int16x4_t) __builtin_neon_splat_lane_v((int8x8_t)__rev0, __p1, 1); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 1)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
3902c3921
< __ret = (int16x4_t) __builtin_neon_splat_lane_v((int8x8_t)__s0, __p1, 1); \
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_splat_lane_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 1)); \
3911c3930
< __ret = (poly8x8_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 36); \
---
> __ret = __builtin_bit_cast(poly8x8_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 36)); \
3918,3920c3937,3939
< poly8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (poly8x8_t) __builtin_neon_splat_laneq_v((int8x16_t)__rev0, __p1, 36); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> poly8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_8); \
> __ret = __builtin_bit_cast(poly8x8_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 36)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8); \
3926c3945
< __ret = (poly8x8_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 36); \
---
> __ret = __builtin_bit_cast(poly8x8_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 36)); \
3935c3954
< __ret = (poly64x1_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 38); \
---
> __ret = __builtin_bit_cast(poly64x1_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 38)); \
3942,3943c3961,3962
< poly64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (poly64x1_t) __builtin_neon_splat_laneq_v((int8x16_t)__rev0, __p1, 38); \
---
> poly64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_64); \
> __ret = __builtin_bit_cast(poly64x1_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 38)); \
3949c3968
< __ret = (poly64x1_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 38); \
---
> __ret = __builtin_bit_cast(poly64x1_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 38)); \
3958c3977
< __ret = (poly16x4_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 37); \
---
> __ret = __builtin_bit_cast(poly16x4_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 37)); \
3965,3967c3984,3986
< poly16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (poly16x4_t) __builtin_neon_splat_laneq_v((int8x16_t)__rev0, __p1, 37); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> poly16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(poly16x4_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 37)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
3973c3992
< __ret = (poly16x4_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 37); \
---
> __ret = __builtin_bit_cast(poly16x4_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 37)); \
3982c4001
< __ret = (poly8x16_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 36); \
---
> __ret = __builtin_bit_cast(poly8x16_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 36)); \
3989,3991c4008,4010
< poly8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (poly8x16_t) __builtin_neon_splatq_laneq_v((int8x16_t)__rev0, __p1, 36); \
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> poly8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_8); \
> __ret = __builtin_bit_cast(poly8x16_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 36)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8); \
3997c4016
< __ret = (poly8x16_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 36); \
---
> __ret = __builtin_bit_cast(poly8x16_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 36)); \
4006c4025
< __ret = (poly64x2_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 38); \
---
> __ret = __builtin_bit_cast(poly64x2_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 38)); \
4013,4015c4032,4034
< poly64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (poly64x2_t) __builtin_neon_splatq_laneq_v((int8x16_t)__rev0, __p1, 38); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> poly64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_64); \
> __ret = __builtin_bit_cast(poly64x2_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 38)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64); \
4021c4040
< __ret = (poly64x2_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 38); \
---
> __ret = __builtin_bit_cast(poly64x2_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 38)); \
4030c4049
< __ret = (poly16x8_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 37); \
---
> __ret = __builtin_bit_cast(poly16x8_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 37)); \
4037,4039c4056,4058
< poly16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (poly16x8_t) __builtin_neon_splatq_laneq_v((int8x16_t)__rev0, __p1, 37); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> poly16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(poly16x8_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 37)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
4045c4064
< __ret = (poly16x8_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 37); \
---
> __ret = __builtin_bit_cast(poly16x8_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 37)); \
4054c4073
< __ret = (uint8x16_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 48); \
---
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 48)); \
4061,4063c4080,4082
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (uint8x16_t) __builtin_neon_splatq_laneq_v((int8x16_t)__rev0, __p1, 48); \
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_8); \
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 48)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8); \
4069c4088
< __ret = (uint8x16_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 48); \
---
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 48)); \
4078c4097
< __ret = (uint32x4_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 50); \
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 50)); \
4085,4087c4104,4106
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (uint32x4_t) __builtin_neon_splatq_laneq_v((int8x16_t)__rev0, __p1, 50); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_32); \
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 50)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
4093c4112
< __ret = (uint32x4_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 50); \
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 50)); \
4102c4121
< __ret = (uint64x2_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 51); \
---
> __ret = __builtin_bit_cast(uint64x2_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 51)); \
4109,4111c4128,4130
< uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (uint64x2_t) __builtin_neon_splatq_laneq_v((int8x16_t)__rev0, __p1, 51); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_64); \
> __ret = __builtin_bit_cast(uint64x2_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 51)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64); \
4117c4136
< __ret = (uint64x2_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 51); \
---
> __ret = __builtin_bit_cast(uint64x2_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 51)); \
4126c4145
< __ret = (uint16x8_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 49); \
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 49)); \
4133,4135c4152,4154
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (uint16x8_t) __builtin_neon_splatq_laneq_v((int8x16_t)__rev0, __p1, 49); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 49)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
4141c4160
< __ret = (uint16x8_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 49); \
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 49)); \
4150c4169
< __ret = (int8x16_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 32); \
---
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 32)); \
4157,4159c4176,4178
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (int8x16_t) __builtin_neon_splatq_laneq_v((int8x16_t)__rev0, __p1, 32); \
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_8); \
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 32)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8); \
4165c4184
< __ret = (int8x16_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 32); \
---
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 32)); \
4174c4193
< __ret = (float64x2_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 42); \
---
> __ret = __builtin_bit_cast(float64x2_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 42)); \
4181,4183c4200,4202
< float64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (float64x2_t) __builtin_neon_splatq_laneq_v((int8x16_t)__rev0, __p1, 42); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> float64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_64); \
> __ret = __builtin_bit_cast(float64x2_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 42)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64); \
4189c4208
< __ret = (float64x2_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 42); \
---
> __ret = __builtin_bit_cast(float64x2_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 42)); \
4198c4217
< __ret = (float32x4_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 41); \
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 41)); \
4205,4207c4224,4226
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (float32x4_t) __builtin_neon_splatq_laneq_v((int8x16_t)__rev0, __p1, 41); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_32); \
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 41)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
4213c4232
< __ret = (float32x4_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 41); \
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 41)); \
4222c4241
< __ret = (float16x8_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 40); \
---
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 40)); \
4229,4231c4248,4250
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (float16x8_t) __builtin_neon_splatq_laneq_v((int8x16_t)__rev0, __p1, 40); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 40)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
4237c4256
< __ret = (float16x8_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 40); \
---
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 40)); \
4246c4265
< __ret = (int32x4_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 34); \
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 34)); \
4253,4255c4272,4274
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (int32x4_t) __builtin_neon_splatq_laneq_v((int8x16_t)__rev0, __p1, 34); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_32); \
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 34)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
4261c4280
< __ret = (int32x4_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 34); \
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 34)); \
4270c4289
< __ret = (int64x2_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 35); \
---
> __ret = __builtin_bit_cast(int64x2_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 35)); \
4277,4279c4296,4298
< int64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (int64x2_t) __builtin_neon_splatq_laneq_v((int8x16_t)__rev0, __p1, 35); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> int64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_64); \
> __ret = __builtin_bit_cast(int64x2_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 35)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64); \
4285c4304
< __ret = (int64x2_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 35); \
---
> __ret = __builtin_bit_cast(int64x2_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 35)); \
4294c4313
< __ret = (int16x8_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 33); \
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 33)); \
4301,4303c4320,4322
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (int16x8_t) __builtin_neon_splatq_laneq_v((int8x16_t)__rev0, __p1, 33); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 33)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
4309c4328
< __ret = (int16x8_t) __builtin_neon_splatq_laneq_v((int8x16_t)__s0, __p1, 33); \
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_splatq_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 33)); \
4318c4337
< __ret = (uint8x8_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 48); \
---
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 48)); \
4325,4327c4344,4346
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (uint8x8_t) __builtin_neon_splat_laneq_v((int8x16_t)__rev0, __p1, 48); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_8); \
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 48)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8); \
4333c4352
< __ret = (uint8x8_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 48); \
---
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 48)); \
4342c4361
< __ret = (uint32x2_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 50); \
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 50)); \
4349,4351c4368,4370
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (uint32x2_t) __builtin_neon_splat_laneq_v((int8x16_t)__rev0, __p1, 50); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_32); \
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 50)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
4357c4376
< __ret = (uint32x2_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 50); \
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 50)); \
4366c4385
< __ret = (uint64x1_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 51); \
---
> __ret = __builtin_bit_cast(uint64x1_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 51)); \
4373,4374c4392,4393
< uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (uint64x1_t) __builtin_neon_splat_laneq_v((int8x16_t)__rev0, __p1, 51); \
---
> uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_64); \
> __ret = __builtin_bit_cast(uint64x1_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 51)); \
4380c4399
< __ret = (uint64x1_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 51); \
---
> __ret = __builtin_bit_cast(uint64x1_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 51)); \
4389c4408
< __ret = (uint16x4_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 49); \
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 49)); \
4396,4398c4415,4417
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (uint16x4_t) __builtin_neon_splat_laneq_v((int8x16_t)__rev0, __p1, 49); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 49)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
4404c4423
< __ret = (uint16x4_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 49); \
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 49)); \
4413c4432
< __ret = (int8x8_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 32); \
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 32)); \
4420,4422c4439,4441
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (int8x8_t) __builtin_neon_splat_laneq_v((int8x16_t)__rev0, __p1, 32); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_8); \
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 32)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8); \
4428c4447
< __ret = (int8x8_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 32); \
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 32)); \
4437c4456
< __ret = (float64x1_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 42); \
---
> __ret = __builtin_bit_cast(float64x1_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 42)); \
4444,4445c4463,4464
< float64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (float64x1_t) __builtin_neon_splat_laneq_v((int8x16_t)__rev0, __p1, 42); \
---
> float64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_64); \
> __ret = __builtin_bit_cast(float64x1_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 42)); \
4451c4470
< __ret = (float64x1_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 42); \
---
> __ret = __builtin_bit_cast(float64x1_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 42)); \
4460c4479
< __ret = (float32x2_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 41); \
---
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 41)); \
4467,4469c4486,4488
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (float32x2_t) __builtin_neon_splat_laneq_v((int8x16_t)__rev0, __p1, 41); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_32); \
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 41)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
4475c4494
< __ret = (float32x2_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 41); \
---
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 41)); \
4484c4503
< __ret = (float16x4_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 40); \
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 40)); \
4491,4493c4510,4512
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (float16x4_t) __builtin_neon_splat_laneq_v((int8x16_t)__rev0, __p1, 40); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 40)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
4499c4518
< __ret = (float16x4_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 40); \
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 40)); \
4508c4527
< __ret = (int32x2_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 34); \
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 34)); \
4515,4517c4534,4536
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (int32x2_t) __builtin_neon_splat_laneq_v((int8x16_t)__rev0, __p1, 34); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_32); \
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 34)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
4523c4542
< __ret = (int32x2_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 34); \
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 34)); \
4532c4551
< __ret = (int64x1_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 35); \
---
> __ret = __builtin_bit_cast(int64x1_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 35)); \
4539,4540c4558,4559
< int64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (int64x1_t) __builtin_neon_splat_laneq_v((int8x16_t)__rev0, __p1, 35); \
---
> int64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_64); \
> __ret = __builtin_bit_cast(int64x1_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 35)); \
4546c4565
< __ret = (int64x1_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 35); \
---
> __ret = __builtin_bit_cast(int64x1_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 35)); \
4555c4574
< __ret = (int16x4_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 33); \
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 33)); \
4562,4564c4581,4583
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (int16x4_t) __builtin_neon_splat_laneq_v((int8x16_t)__rev0, __p1, 33); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 33)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
4570c4589
< __ret = (int16x4_t) __builtin_neon_splat_laneq_v((int8x16_t)__s0, __p1, 33); \
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_splat_laneq_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 33)); \
4578c4597
< __ret = (uint8x16_t) __builtin_neon_vabdq_v((int8x16_t)__p0, (int8x16_t)__p1, 48);
---
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_vabdq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 48));
4584,4587c4603,4606
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x16_t) __builtin_neon_vabdq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 48);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_vabdq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 48));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
4592c4611
< __ret = (uint8x16_t) __builtin_neon_vabdq_v((int8x16_t)__p0, (int8x16_t)__p1, 48);
---
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_vabdq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 48));
4600c4619
< __ret = (uint32x4_t) __builtin_neon_vabdq_v((int8x16_t)__p0, (int8x16_t)__p1, 50);
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vabdq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 50));
4606,4609c4625,4628
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t) __builtin_neon_vabdq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 50);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vabdq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 50));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
4614c4633
< __ret = (uint32x4_t) __builtin_neon_vabdq_v((int8x16_t)__p0, (int8x16_t)__p1, 50);
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vabdq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 50));
4622c4641
< __ret = (uint16x8_t) __builtin_neon_vabdq_v((int8x16_t)__p0, (int8x16_t)__p1, 49);
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vabdq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 49));
4628,4631c4647,4650
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t) __builtin_neon_vabdq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 49);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vabdq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 49));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
4636c4655
< __ret = (uint16x8_t) __builtin_neon_vabdq_v((int8x16_t)__p0, (int8x16_t)__p1, 49);
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vabdq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 49));
4644c4663
< __ret = (int8x16_t) __builtin_neon_vabdq_v((int8x16_t)__p0, (int8x16_t)__p1, 32);
---
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vabdq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 32));
4650,4653c4669,4672
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int8x16_t) __builtin_neon_vabdq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 32);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vabdq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 32));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
4658c4677
< __ret = (int8x16_t) __builtin_neon_vabdq_v((int8x16_t)__p0, (int8x16_t)__p1, 32);
---
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vabdq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 32));
4666c4685
< __ret = (float32x4_t) __builtin_neon_vabdq_v((int8x16_t)__p0, (int8x16_t)__p1, 41);
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vabdq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 41));
4672,4675c4691,4694
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (float32x4_t) __builtin_neon_vabdq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 41);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vabdq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 41));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
4683c4702
< __ret = (int32x4_t) __builtin_neon_vabdq_v((int8x16_t)__p0, (int8x16_t)__p1, 34);
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vabdq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 34));
4689,4692c4708,4711
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (int32x4_t) __builtin_neon_vabdq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 34);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vabdq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 34));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
4697c4716
< __ret = (int32x4_t) __builtin_neon_vabdq_v((int8x16_t)__p0, (int8x16_t)__p1, 34);
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vabdq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 34));
4705c4724
< __ret = (int16x8_t) __builtin_neon_vabdq_v((int8x16_t)__p0, (int8x16_t)__p1, 33);
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vabdq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 33));
4711,4714c4730,4733
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int16x8_t) __builtin_neon_vabdq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 33);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vabdq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 33));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
4719c4738
< __ret = (int16x8_t) __builtin_neon_vabdq_v((int8x16_t)__p0, (int8x16_t)__p1, 33);
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vabdq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 33));
4727c4746
< __ret = (uint8x8_t) __builtin_neon_vabd_v((int8x8_t)__p0, (int8x8_t)__p1, 16);
---
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vabd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 16));
4733,4736c4752,4755
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x8_t) __builtin_neon_vabd_v((int8x8_t)__rev0, (int8x8_t)__rev1, 16);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vabd_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 16));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
4741c4760
< __ret = (uint8x8_t) __builtin_neon_vabd_v((int8x8_t)__p0, (int8x8_t)__p1, 16);
---
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vabd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 16));
4749c4768
< __ret = (uint32x2_t) __builtin_neon_vabd_v((int8x8_t)__p0, (int8x8_t)__p1, 18);
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vabd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 18));
4755,4758c4774,4777
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t) __builtin_neon_vabd_v((int8x8_t)__rev0, (int8x8_t)__rev1, 18);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vabd_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 18));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
4763c4782
< __ret = (uint32x2_t) __builtin_neon_vabd_v((int8x8_t)__p0, (int8x8_t)__p1, 18);
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vabd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 18));
4771c4790
< __ret = (uint16x4_t) __builtin_neon_vabd_v((int8x8_t)__p0, (int8x8_t)__p1, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vabd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 17));
4777,4780c4796,4799
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t) __builtin_neon_vabd_v((int8x8_t)__rev0, (int8x8_t)__rev1, 17);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vabd_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 17));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
4785c4804
< __ret = (uint16x4_t) __builtin_neon_vabd_v((int8x8_t)__p0, (int8x8_t)__p1, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vabd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 17));
4793c4812
< __ret = (int8x8_t) __builtin_neon_vabd_v((int8x8_t)__p0, (int8x8_t)__p1, 0);
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vabd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 0));
4799,4802c4818,4821
< int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int8x8_t) __builtin_neon_vabd_v((int8x8_t)__rev0, (int8x8_t)__rev1, 0);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vabd_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 0));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
4807c4826
< __ret = (int8x8_t) __builtin_neon_vabd_v((int8x8_t)__p0, (int8x8_t)__p1, 0);
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vabd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 0));
4815c4834
< __ret = (float32x2_t) __builtin_neon_vabd_v((int8x8_t)__p0, (int8x8_t)__p1, 9);
---
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vabd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 9));
4821,4824c4840,4843
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (float32x2_t) __builtin_neon_vabd_v((int8x8_t)__rev0, (int8x8_t)__rev1, 9);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vabd_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 9));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
4832c4851
< __ret = (int32x2_t) __builtin_neon_vabd_v((int8x8_t)__p0, (int8x8_t)__p1, 2);
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vabd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 2));
4838,4841c4857,4860
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (int32x2_t) __builtin_neon_vabd_v((int8x8_t)__rev0, (int8x8_t)__rev1, 2);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vabd_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 2));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
4846c4865
< __ret = (int32x2_t) __builtin_neon_vabd_v((int8x8_t)__p0, (int8x8_t)__p1, 2);
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vabd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 2));
4854c4873
< __ret = (int16x4_t) __builtin_neon_vabd_v((int8x8_t)__p0, (int8x8_t)__p1, 1);
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vabd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 1));
4860,4863c4879,4882
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (int16x4_t) __builtin_neon_vabd_v((int8x8_t)__rev0, (int8x8_t)__rev1, 1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vabd_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 1));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
4868c4887
< __ret = (int16x4_t) __builtin_neon_vabd_v((int8x8_t)__p0, (int8x8_t)__p1, 1);
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vabd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 1));
4876c4895
< __ret = (int8x16_t) __builtin_neon_vabsq_v((int8x16_t)__p0, 32);
---
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vabsq_v(__builtin_bit_cast(int8x16_t, __p0), 32));
4882,4884c4901,4903
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int8x16_t) __builtin_neon_vabsq_v((int8x16_t)__rev0, 32);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vabsq_v(__builtin_bit_cast(int8x16_t, __rev0), 32));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
4892c4911
< __ret = (float32x4_t) __builtin_neon_vabsq_v((int8x16_t)__p0, 41);
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vabsq_v(__builtin_bit_cast(int8x16_t, __p0), 41));
4898,4900c4917,4919
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (float32x4_t) __builtin_neon_vabsq_v((int8x16_t)__rev0, 41);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vabsq_v(__builtin_bit_cast(int8x16_t, __rev0), 41));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
4908c4927
< __ret = (int32x4_t) __builtin_neon_vabsq_v((int8x16_t)__p0, 34);
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vabsq_v(__builtin_bit_cast(int8x16_t, __p0), 34));
4914,4916c4933,4935
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (int32x4_t) __builtin_neon_vabsq_v((int8x16_t)__rev0, 34);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vabsq_v(__builtin_bit_cast(int8x16_t, __rev0), 34));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
4924c4943
< __ret = (int16x8_t) __builtin_neon_vabsq_v((int8x16_t)__p0, 33);
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vabsq_v(__builtin_bit_cast(int8x16_t, __p0), 33));
4930,4932c4949,4951
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int16x8_t) __builtin_neon_vabsq_v((int8x16_t)__rev0, 33);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vabsq_v(__builtin_bit_cast(int8x16_t, __rev0), 33));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
4940c4959
< __ret = (int8x8_t) __builtin_neon_vabs_v((int8x8_t)__p0, 0);
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vabs_v(__builtin_bit_cast(int8x8_t, __p0), 0));
4946,4948c4965,4967
< int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int8x8_t) __builtin_neon_vabs_v((int8x8_t)__rev0, 0);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vabs_v(__builtin_bit_cast(int8x8_t, __rev0), 0));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
4956c4975
< __ret = (float32x2_t) __builtin_neon_vabs_v((int8x8_t)__p0, 9);
---
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vabs_v(__builtin_bit_cast(int8x8_t, __p0), 9));
4962,4964c4981,4983
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< __ret = (float32x2_t) __builtin_neon_vabs_v((int8x8_t)__rev0, 9);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vabs_v(__builtin_bit_cast(int8x8_t, __rev0), 9));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
4972c4991
< __ret = (int32x2_t) __builtin_neon_vabs_v((int8x8_t)__p0, 2);
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vabs_v(__builtin_bit_cast(int8x8_t, __p0), 2));
4978,4980c4997,4999
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< __ret = (int32x2_t) __builtin_neon_vabs_v((int8x8_t)__rev0, 2);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vabs_v(__builtin_bit_cast(int8x8_t, __rev0), 2));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
4988c5007
< __ret = (int16x4_t) __builtin_neon_vabs_v((int8x8_t)__p0, 1);
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vabs_v(__builtin_bit_cast(int8x8_t, __p0), 1));
4994,4996c5013,5015
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (int16x4_t) __builtin_neon_vabs_v((int8x8_t)__rev0, 1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vabs_v(__builtin_bit_cast(int8x8_t, __rev0), 1));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
5010,5011c5029,5030
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
5013c5032
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
5027,5028c5046,5047
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
5030c5049
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
5044,5045c5063,5064
< uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
---
> uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_64);
> uint64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_64);
5047c5066
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64);
5061,5062c5080,5081
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
5064c5083
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
5078,5079c5097,5098
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
5081c5100
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
5095,5096c5114,5115
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
5098c5117
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
5112,5113c5131,5132
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
5115c5134
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
5129,5130c5148,5149
< int64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< int64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
---
> int64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_64);
> int64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_64);
5132c5151
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64);
5146,5147c5165,5166
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
5149c5168
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
5163,5164c5182,5183
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
5166c5185
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
5180,5181c5199,5200
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
5183c5202
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
5202,5203c5221,5222
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
5205c5224
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
5219,5220c5238,5239
< int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
5222c5241
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
5236,5237c5255,5256
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
5239c5258
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
5253,5254c5272,5273
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
5256c5275
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
5275,5276c5294,5295
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
5278c5297
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
5286c5305
< __ret = (poly8x8_t) __builtin_neon_vadd_v((int8x8_t)__p0, (int8x8_t)__p1, 4);
---
> __ret = __builtin_bit_cast(poly8x8_t, __builtin_neon_vadd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 4));
5292,5295c5311,5314
< poly8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< poly8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (poly8x8_t) __builtin_neon_vadd_v((int8x8_t)__rev0, (int8x8_t)__rev1, 4);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> poly8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> poly8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(poly8x8_t, __builtin_neon_vadd_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 4));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
5302c5321
< __ret = (poly64x1_t) __builtin_neon_vadd_v((int8x8_t)__p0, (int8x8_t)__p1, 6);
---
> __ret = __builtin_bit_cast(poly64x1_t, __builtin_neon_vadd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 6));
5308c5327
< __ret = (poly16x4_t) __builtin_neon_vadd_v((int8x8_t)__p0, (int8x8_t)__p1, 5);
---
> __ret = __builtin_bit_cast(poly16x4_t, __builtin_neon_vadd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 5));
5314,5317c5333,5336
< poly16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< poly16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (poly16x4_t) __builtin_neon_vadd_v((int8x8_t)__rev0, (int8x8_t)__rev1, 5);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> poly16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> poly16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(poly16x4_t, __builtin_neon_vadd_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 5));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
5325c5344
< __ret = (poly8x16_t) __builtin_neon_vaddq_v((int8x16_t)__p0, (int8x16_t)__p1, 36);
---
> __ret = __builtin_bit_cast(poly8x16_t, __builtin_neon_vaddq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 36));
5331,5334c5350,5353
< poly8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< poly8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (poly8x16_t) __builtin_neon_vaddq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 36);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> poly8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> poly8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(poly8x16_t, __builtin_neon_vaddq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 36));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
5342c5361
< __ret = (poly64x2_t) __builtin_neon_vaddq_v((int8x16_t)__p0, (int8x16_t)__p1, 38);
---
> __ret = __builtin_bit_cast(poly64x2_t, __builtin_neon_vaddq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 38));
5348,5351c5367,5370
< poly64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< poly64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (poly64x2_t) __builtin_neon_vaddq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 38);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> poly64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_64);
> poly64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_64);
> __ret = __builtin_bit_cast(poly64x2_t, __builtin_neon_vaddq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 38));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64);
5359c5378
< __ret = (poly16x8_t) __builtin_neon_vaddq_v((int8x16_t)__p0, (int8x16_t)__p1, 37);
---
> __ret = __builtin_bit_cast(poly16x8_t, __builtin_neon_vaddq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 37));
5365,5368c5384,5387
< poly16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< poly16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (poly16x8_t) __builtin_neon_vaddq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 37);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> poly16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> poly16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(poly16x8_t, __builtin_neon_vaddq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 37));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
5376c5395
< __ret = (uint16x4_t) __builtin_neon_vaddhn_v((int8x16_t)__p0, (int8x16_t)__p1, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vaddhn_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 17));
5382,5385c5401,5404
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t) __builtin_neon_vaddhn_v((int8x16_t)__rev0, (int8x16_t)__rev1, 17);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vaddhn_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 17));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
5390c5409
< __ret = (uint16x4_t) __builtin_neon_vaddhn_v((int8x16_t)__p0, (int8x16_t)__p1, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vaddhn_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 17));
5398c5417
< __ret = (uint32x2_t) __builtin_neon_vaddhn_v((int8x16_t)__p0, (int8x16_t)__p1, 18);
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vaddhn_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 18));
5404,5407c5423,5426
< uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t) __builtin_neon_vaddhn_v((int8x16_t)__rev0, (int8x16_t)__rev1, 18);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_64);
> uint64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_64);
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vaddhn_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 18));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
5412c5431
< __ret = (uint32x2_t) __builtin_neon_vaddhn_v((int8x16_t)__p0, (int8x16_t)__p1, 18);
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vaddhn_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 18));
5420c5439
< __ret = (uint8x8_t) __builtin_neon_vaddhn_v((int8x16_t)__p0, (int8x16_t)__p1, 16);
---
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vaddhn_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 16));
5426,5429c5445,5448
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x8_t) __builtin_neon_vaddhn_v((int8x16_t)__rev0, (int8x16_t)__rev1, 16);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vaddhn_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 16));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
5434c5453
< __ret = (uint8x8_t) __builtin_neon_vaddhn_v((int8x16_t)__p0, (int8x16_t)__p1, 16);
---
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vaddhn_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 16));
5442c5461
< __ret = (int16x4_t) __builtin_neon_vaddhn_v((int8x16_t)__p0, (int8x16_t)__p1, 1);
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vaddhn_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 1));
5448,5451c5467,5470
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (int16x4_t) __builtin_neon_vaddhn_v((int8x16_t)__rev0, (int8x16_t)__rev1, 1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vaddhn_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 1));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
5456c5475
< __ret = (int16x4_t) __builtin_neon_vaddhn_v((int8x16_t)__p0, (int8x16_t)__p1, 1);
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vaddhn_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 1));
5464c5483
< __ret = (int32x2_t) __builtin_neon_vaddhn_v((int8x16_t)__p0, (int8x16_t)__p1, 2);
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vaddhn_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 2));
5470,5473c5489,5492
< int64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< int64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (int32x2_t) __builtin_neon_vaddhn_v((int8x16_t)__rev0, (int8x16_t)__rev1, 2);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> int64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_64);
> int64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_64);
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vaddhn_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 2));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
5478c5497
< __ret = (int32x2_t) __builtin_neon_vaddhn_v((int8x16_t)__p0, (int8x16_t)__p1, 2);
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vaddhn_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 2));
5486c5505
< __ret = (int8x8_t) __builtin_neon_vaddhn_v((int8x16_t)__p0, (int8x16_t)__p1, 0);
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vaddhn_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 0));
5492,5495c5511,5514
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int8x8_t) __builtin_neon_vaddhn_v((int8x16_t)__rev0, (int8x16_t)__rev1, 0);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vaddhn_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 0));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
5500c5519
< __ret = (int8x8_t) __builtin_neon_vaddhn_v((int8x16_t)__p0, (int8x16_t)__p1, 0);
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vaddhn_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 0));
5514,5515c5533,5534
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
5517c5536
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
5531,5532c5550,5551
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
5534c5553
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
5548,5549c5567,5568
< uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
---
> uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_64);
> uint64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_64);
5551c5570
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64);
5565,5566c5584,5585
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
5568c5587
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
5582,5583c5601,5602
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
5585c5604
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
5599,5600c5618,5619
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
5602c5621
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
5616,5617c5635,5636
< int64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< int64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
---
> int64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_64);
> int64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_64);
5619c5638
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64);
5633,5634c5652,5653
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
5636c5655
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
5650,5651c5669,5670
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
5653c5672
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
5667,5668c5686,5687
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
5670c5689
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
5689,5690c5708,5709
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
5692c5711
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
5706,5707c5725,5726
< int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
5709c5728
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
5723,5724c5742,5743
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
5726c5745
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
5745,5746c5764,5765
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
5748c5767
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
5762,5763c5781,5782
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
5765c5784
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
5779,5780c5798,5799
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
5782c5801
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
5796,5797c5815,5816
< uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
---
> uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_64);
> uint64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_64);
5799c5818
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64);
5813,5814c5832,5833
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
5816c5835
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
5830,5831c5849,5850
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
5833c5852
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
5847,5848c5866,5867
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
5850c5869
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
5864,5865c5883,5884
< int64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< int64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
---
> int64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_64);
> int64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_64);
5867c5886
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64);
5881,5882c5900,5901
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
5884c5903
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
5898,5899c5917,5918
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
5901c5920
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
5915,5916c5934,5935
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
5918c5937
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
5937,5938c5956,5957
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
5940c5959
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
5954,5955c5973,5974
< int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
5957c5976
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
5971,5972c5990,5991
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
5974c5993
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
5993,5994c6012,6013
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
5996c6015
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
6004c6023
< __ret = (poly8x8_t) __builtin_neon_vbsl_v((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 4);
---
> __ret = __builtin_bit_cast(poly8x8_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 4));
6010,6014c6029,6033
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< poly8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< poly8x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (poly8x8_t) __builtin_neon_vbsl_v((int8x8_t)__rev0, (int8x8_t)__rev1, (int8x8_t)__rev2, 4);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> poly8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> poly8x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(poly8x8_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __builtin_bit_cast(int8x8_t, __rev2), 4));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
6022c6041
< __ret = (poly16x4_t) __builtin_neon_vbsl_v((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 5);
---
> __ret = __builtin_bit_cast(poly16x4_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 5));
6028,6032c6047,6051
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< poly16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< poly16x4_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 3, 2, 1, 0);
< __ret = (poly16x4_t) __builtin_neon_vbsl_v((int8x8_t)__rev0, (int8x8_t)__rev1, (int8x8_t)__rev2, 5);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> poly16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> poly16x4_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(poly16x4_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __builtin_bit_cast(int8x8_t, __rev2), 5));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
6040c6059
< __ret = (poly8x16_t) __builtin_neon_vbslq_v((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 36);
---
> __ret = __builtin_bit_cast(poly8x16_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 36));
6046,6050c6065,6069
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< poly8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< poly8x16_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (poly8x16_t) __builtin_neon_vbslq_v((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 36);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> poly8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> poly8x16_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(poly8x16_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 36));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
6058c6077
< __ret = (poly16x8_t) __builtin_neon_vbslq_v((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 37);
---
> __ret = __builtin_bit_cast(poly16x8_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 37));
6064,6068c6083,6087
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< poly16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< poly16x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (poly16x8_t) __builtin_neon_vbslq_v((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 37);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> poly16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> poly16x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(poly16x8_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 37));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
6076c6095
< __ret = (uint8x16_t) __builtin_neon_vbslq_v((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 48);
---
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 48));
6082,6086c6101,6105
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x16_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x16_t) __builtin_neon_vbslq_v((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 48);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> uint8x16_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 48));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
6094c6113
< __ret = (uint32x4_t) __builtin_neon_vbslq_v((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 50);
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 50));
6100,6104c6119,6123
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< uint32x4_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 3, 2, 1, 0);
< __ret = (uint32x4_t) __builtin_neon_vbslq_v((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 50);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> uint32x4_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 50));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
6112c6131
< __ret = (uint64x2_t) __builtin_neon_vbslq_v((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 51);
---
> __ret = __builtin_bit_cast(uint64x2_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 51));
6118,6122c6137,6141
< uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< uint64x2_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 1, 0);
< __ret = (uint64x2_t) __builtin_neon_vbslq_v((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 51);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_64);
> uint64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_64);
> uint64x2_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_64);
> __ret = __builtin_bit_cast(uint64x2_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 51));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64);
6130c6149
< __ret = (uint16x8_t) __builtin_neon_vbslq_v((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 49);
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 49));
6136,6140c6155,6159
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< uint16x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t) __builtin_neon_vbslq_v((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 49);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> uint16x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 49));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
6148c6167
< __ret = (int8x16_t) __builtin_neon_vbslq_v((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 32);
---
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 32));
6154,6158c6173,6177
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x16_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int8x16_t) __builtin_neon_vbslq_v((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 32);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> int8x16_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 32));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
6166c6185
< __ret = (float32x4_t) __builtin_neon_vbslq_v((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 41);
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 41));
6172,6176c6191,6195
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< float32x4_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 3, 2, 1, 0);
< __ret = (float32x4_t) __builtin_neon_vbslq_v((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 41);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> float32x4_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 41));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
6184c6203
< __ret = (int32x4_t) __builtin_neon_vbslq_v((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 34);
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 34));
6190,6194c6209,6213
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< int32x4_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 3, 2, 1, 0);
< __ret = (int32x4_t) __builtin_neon_vbslq_v((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 34);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> int32x4_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 34));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
6202c6221
< __ret = (int64x2_t) __builtin_neon_vbslq_v((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 35);
---
> __ret = __builtin_bit_cast(int64x2_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 35));
6208,6212c6227,6231
< uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< int64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< int64x2_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 1, 0);
< __ret = (int64x2_t) __builtin_neon_vbslq_v((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 35);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_64);
> int64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_64);
> int64x2_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_64);
> __ret = __builtin_bit_cast(int64x2_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 35));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64);
6220c6239
< __ret = (int16x8_t) __builtin_neon_vbslq_v((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 33);
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 33));
6226,6230c6245,6249
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< int16x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int16x8_t) __builtin_neon_vbslq_v((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 33);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> int16x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 33));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
6238c6257
< __ret = (uint8x8_t) __builtin_neon_vbsl_v((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 16);
---
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 16));
6244,6248c6263,6267
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x8_t) __builtin_neon_vbsl_v((int8x8_t)__rev0, (int8x8_t)__rev1, (int8x8_t)__rev2, 16);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> uint8x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __builtin_bit_cast(int8x8_t, __rev2), 16));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
6256c6275
< __ret = (uint32x2_t) __builtin_neon_vbsl_v((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 18);
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 18));
6262,6266c6281,6285
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< uint32x2_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 1, 0);
< __ret = (uint32x2_t) __builtin_neon_vbsl_v((int8x8_t)__rev0, (int8x8_t)__rev1, (int8x8_t)__rev2, 18);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> uint32x2_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __builtin_bit_cast(int8x8_t, __rev2), 18));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
6273c6292
< __ret = (uint64x1_t) __builtin_neon_vbsl_v((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 19);
---
> __ret = __builtin_bit_cast(uint64x1_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 19));
6279c6298
< __ret = (uint16x4_t) __builtin_neon_vbsl_v((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 17));
6285,6289c6304,6308
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< uint16x4_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 3, 2, 1, 0);
< __ret = (uint16x4_t) __builtin_neon_vbsl_v((int8x8_t)__rev0, (int8x8_t)__rev1, (int8x8_t)__rev2, 17);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> uint16x4_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __builtin_bit_cast(int8x8_t, __rev2), 17));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
6297c6316
< __ret = (int8x8_t) __builtin_neon_vbsl_v((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 0);
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 0));
6303,6307c6322,6326
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int8x8_t) __builtin_neon_vbsl_v((int8x8_t)__rev0, (int8x8_t)__rev1, (int8x8_t)__rev2, 0);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> int8x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __builtin_bit_cast(int8x8_t, __rev2), 0));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
6315c6334
< __ret = (float32x2_t) __builtin_neon_vbsl_v((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 9);
---
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 9));
6321,6325c6340,6344
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< float32x2_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 1, 0);
< __ret = (float32x2_t) __builtin_neon_vbsl_v((int8x8_t)__rev0, (int8x8_t)__rev1, (int8x8_t)__rev2, 9);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> float32x2_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __builtin_bit_cast(int8x8_t, __rev2), 9));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
6333c6352
< __ret = (int32x2_t) __builtin_neon_vbsl_v((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 2);
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 2));
6339,6343c6358,6362
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< int32x2_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 1, 0);
< __ret = (int32x2_t) __builtin_neon_vbsl_v((int8x8_t)__rev0, (int8x8_t)__rev1, (int8x8_t)__rev2, 2);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> int32x2_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __builtin_bit_cast(int8x8_t, __rev2), 2));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
6350c6369
< __ret = (int64x1_t) __builtin_neon_vbsl_v((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 3);
---
> __ret = __builtin_bit_cast(int64x1_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 3));
6356c6375
< __ret = (int16x4_t) __builtin_neon_vbsl_v((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 1);
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 1));
6362,6366c6381,6385
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< int16x4_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 3, 2, 1, 0);
< __ret = (int16x4_t) __builtin_neon_vbsl_v((int8x8_t)__rev0, (int8x8_t)__rev1, (int8x8_t)__rev2, 1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> int16x4_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __builtin_bit_cast(int8x8_t, __rev2), 1));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
6374c6393
< __ret = (float16x8_t) __builtin_neon_vbslq_v((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 40);
---
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), __builtin_bit_cast(int8x16_t, __p2), 40));
6380,6384c6399,6403
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< float16x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (float16x8_t) __builtin_neon_vbslq_v((int8x16_t)__rev0, (int8x16_t)__rev1, (int8x16_t)__rev2, 40);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> float16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> float16x8_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vbslq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __builtin_bit_cast(int8x16_t, __rev2), 40));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
6392c6411
< __ret = (float16x4_t) __builtin_neon_vbsl_v((int8x8_t)__p0, (int8x8_t)__p1, (int8x8_t)__p2, 8);
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), __builtin_bit_cast(int8x8_t, __p2), 8));
6398,6402c6417,6421
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< float16x4_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, 3, 2, 1, 0);
< __ret = (float16x4_t) __builtin_neon_vbsl_v((int8x8_t)__rev0, (int8x8_t)__rev1, (int8x8_t)__rev2, 8);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> float16x4_t __rev2; __rev2 = __builtin_shufflevector(__p2, __p2, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vbsl_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __builtin_bit_cast(int8x8_t, __rev2), 8));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
6410c6429
< __ret = (uint32x4_t) __builtin_neon_vcageq_v((int8x16_t)__p0, (int8x16_t)__p1, 50);
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vcageq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 50));
6416,6419c6435,6438
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t) __builtin_neon_vcageq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 50);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vcageq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 50));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
6427c6446
< __ret = (uint32x2_t) __builtin_neon_vcage_v((int8x8_t)__p0, (int8x8_t)__p1, 18);
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vcage_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 18));
6433,6436c6452,6455
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t) __builtin_neon_vcage_v((int8x8_t)__rev0, (int8x8_t)__rev1, 18);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vcage_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 18));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
6444c6463
< __ret = (uint32x4_t) __builtin_neon_vcagtq_v((int8x16_t)__p0, (int8x16_t)__p1, 50);
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vcagtq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 50));
6450,6453c6469,6472
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t) __builtin_neon_vcagtq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 50);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vcagtq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 50));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
6461c6480
< __ret = (uint32x2_t) __builtin_neon_vcagt_v((int8x8_t)__p0, (int8x8_t)__p1, 18);
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vcagt_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 18));
6467,6470c6486,6489
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t) __builtin_neon_vcagt_v((int8x8_t)__rev0, (int8x8_t)__rev1, 18);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vcagt_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 18));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
6478c6497
< __ret = (uint32x4_t) __builtin_neon_vcaleq_v((int8x16_t)__p0, (int8x16_t)__p1, 50);
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vcaleq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 50));
6484,6487c6503,6506
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t) __builtin_neon_vcaleq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 50);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vcaleq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 50));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
6495c6514
< __ret = (uint32x2_t) __builtin_neon_vcale_v((int8x8_t)__p0, (int8x8_t)__p1, 18);
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vcale_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 18));
6501,6504c6520,6523
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t) __builtin_neon_vcale_v((int8x8_t)__rev0, (int8x8_t)__rev1, 18);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vcale_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 18));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
6512c6531
< __ret = (uint32x4_t) __builtin_neon_vcaltq_v((int8x16_t)__p0, (int8x16_t)__p1, 50);
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vcaltq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 50));
6518,6521c6537,6540
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t) __builtin_neon_vcaltq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 50);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vcaltq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 50));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
6529c6548
< __ret = (uint32x2_t) __builtin_neon_vcalt_v((int8x8_t)__p0, (int8x8_t)__p1, 18);
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vcalt_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 18));
6535,6538c6554,6557
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t) __builtin_neon_vcalt_v((int8x8_t)__rev0, (int8x8_t)__rev1, 18);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vcalt_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 18));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
6546c6565
< __ret = (uint8x8_t)(__p0 == __p1);
---
> __ret = __builtin_bit_cast(uint8x8_t, __p0 == __p1);
6552,6555c6571,6574
< poly8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< poly8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x8_t)(__rev0 == __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> poly8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> poly8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(uint8x8_t, __rev0 == __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
6563c6582
< __ret = (uint8x16_t)(__p0 == __p1);
---
> __ret = __builtin_bit_cast(uint8x16_t, __p0 == __p1);
6569,6572c6588,6591
< poly8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< poly8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x16_t)(__rev0 == __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> poly8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> poly8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(uint8x16_t, __rev0 == __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
6580c6599
< __ret = (uint8x16_t)(__p0 == __p1);
---
> __ret = __builtin_bit_cast(uint8x16_t, __p0 == __p1);
6586,6589c6605,6608
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x16_t)(__rev0 == __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(uint8x16_t, __rev0 == __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
6597c6616
< __ret = (uint32x4_t)(__p0 == __p1);
---
> __ret = __builtin_bit_cast(uint32x4_t, __p0 == __p1);
6603,6606c6622,6625
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t)(__rev0 == __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __rev0 == __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
6614c6633
< __ret = (uint16x8_t)(__p0 == __p1);
---
> __ret = __builtin_bit_cast(uint16x8_t, __p0 == __p1);
6620,6623c6639,6642
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t)(__rev0 == __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __rev0 == __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
6631c6650
< __ret = (uint8x16_t)(__p0 == __p1);
---
> __ret = __builtin_bit_cast(uint8x16_t, __p0 == __p1);
6637,6640c6656,6659
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x16_t)(__rev0 == __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(uint8x16_t, __rev0 == __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
6648c6667
< __ret = (uint32x4_t)(__p0 == __p1);
---
> __ret = __builtin_bit_cast(uint32x4_t, __p0 == __p1);
6654,6657c6673,6676
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t)(__rev0 == __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __rev0 == __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
6665c6684
< __ret = (uint32x4_t)(__p0 == __p1);
---
> __ret = __builtin_bit_cast(uint32x4_t, __p0 == __p1);
6671,6674c6690,6693
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t)(__rev0 == __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __rev0 == __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
6682c6701
< __ret = (uint16x8_t)(__p0 == __p1);
---
> __ret = __builtin_bit_cast(uint16x8_t, __p0 == __p1);
6688,6691c6707,6710
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t)(__rev0 == __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __rev0 == __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
6699c6718
< __ret = (uint8x8_t)(__p0 == __p1);
---
> __ret = __builtin_bit_cast(uint8x8_t, __p0 == __p1);
6705,6708c6724,6727
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x8_t)(__rev0 == __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(uint8x8_t, __rev0 == __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
6716c6735
< __ret = (uint32x2_t)(__p0 == __p1);
---
> __ret = __builtin_bit_cast(uint32x2_t, __p0 == __p1);
6722,6725c6741,6744
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t)(__rev0 == __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __rev0 == __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
6733c6752
< __ret = (uint16x4_t)(__p0 == __p1);
---
> __ret = __builtin_bit_cast(uint16x4_t, __p0 == __p1);
6739,6742c6758,6761
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t)(__rev0 == __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __rev0 == __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
6750c6769
< __ret = (uint8x8_t)(__p0 == __p1);
---
> __ret = __builtin_bit_cast(uint8x8_t, __p0 == __p1);
6756,6759c6775,6778
< int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x8_t)(__rev0 == __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(uint8x8_t, __rev0 == __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
6767c6786
< __ret = (uint32x2_t)(__p0 == __p1);
---
> __ret = __builtin_bit_cast(uint32x2_t, __p0 == __p1);
6773,6776c6792,6795
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t)(__rev0 == __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __rev0 == __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
6784c6803
< __ret = (uint32x2_t)(__p0 == __p1);
---
> __ret = __builtin_bit_cast(uint32x2_t, __p0 == __p1);
6790,6793c6809,6812
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t)(__rev0 == __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __rev0 == __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
6801c6820
< __ret = (uint16x4_t)(__p0 == __p1);
---
> __ret = __builtin_bit_cast(uint16x4_t, __p0 == __p1);
6807,6810c6826,6829
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t)(__rev0 == __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __rev0 == __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
6818c6837
< __ret = (uint8x16_t)(__p0 >= __p1);
---
> __ret = __builtin_bit_cast(uint8x16_t, __p0 >= __p1);
6824,6827c6843,6846
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x16_t)(__rev0 >= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(uint8x16_t, __rev0 >= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
6835c6854
< __ret = (uint32x4_t)(__p0 >= __p1);
---
> __ret = __builtin_bit_cast(uint32x4_t, __p0 >= __p1);
6841,6844c6860,6863
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t)(__rev0 >= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __rev0 >= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
6852c6871
< __ret = (uint16x8_t)(__p0 >= __p1);
---
> __ret = __builtin_bit_cast(uint16x8_t, __p0 >= __p1);
6858,6861c6877,6880
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t)(__rev0 >= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __rev0 >= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
6869c6888
< __ret = (uint8x16_t)(__p0 >= __p1);
---
> __ret = __builtin_bit_cast(uint8x16_t, __p0 >= __p1);
6875,6878c6894,6897
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x16_t)(__rev0 >= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(uint8x16_t, __rev0 >= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
6886c6905
< __ret = (uint32x4_t)(__p0 >= __p1);
---
> __ret = __builtin_bit_cast(uint32x4_t, __p0 >= __p1);
6892,6895c6911,6914
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t)(__rev0 >= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __rev0 >= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
6903c6922
< __ret = (uint32x4_t)(__p0 >= __p1);
---
> __ret = __builtin_bit_cast(uint32x4_t, __p0 >= __p1);
6909,6912c6928,6931
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t)(__rev0 >= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __rev0 >= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
6920c6939
< __ret = (uint16x8_t)(__p0 >= __p1);
---
> __ret = __builtin_bit_cast(uint16x8_t, __p0 >= __p1);
6926,6929c6945,6948
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t)(__rev0 >= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __rev0 >= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
6937c6956
< __ret = (uint8x8_t)(__p0 >= __p1);
---
> __ret = __builtin_bit_cast(uint8x8_t, __p0 >= __p1);
6943,6946c6962,6965
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x8_t)(__rev0 >= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(uint8x8_t, __rev0 >= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
6954c6973
< __ret = (uint32x2_t)(__p0 >= __p1);
---
> __ret = __builtin_bit_cast(uint32x2_t, __p0 >= __p1);
6960,6963c6979,6982
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t)(__rev0 >= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __rev0 >= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
6971c6990
< __ret = (uint16x4_t)(__p0 >= __p1);
---
> __ret = __builtin_bit_cast(uint16x4_t, __p0 >= __p1);
6977,6980c6996,6999
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t)(__rev0 >= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __rev0 >= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
6988c7007
< __ret = (uint8x8_t)(__p0 >= __p1);
---
> __ret = __builtin_bit_cast(uint8x8_t, __p0 >= __p1);
6994,6997c7013,7016
< int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x8_t)(__rev0 >= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(uint8x8_t, __rev0 >= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
7005c7024
< __ret = (uint32x2_t)(__p0 >= __p1);
---
> __ret = __builtin_bit_cast(uint32x2_t, __p0 >= __p1);
7011,7014c7030,7033
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t)(__rev0 >= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __rev0 >= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
7022c7041
< __ret = (uint32x2_t)(__p0 >= __p1);
---
> __ret = __builtin_bit_cast(uint32x2_t, __p0 >= __p1);
7028,7031c7047,7050
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t)(__rev0 >= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __rev0 >= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
7039c7058
< __ret = (uint16x4_t)(__p0 >= __p1);
---
> __ret = __builtin_bit_cast(uint16x4_t, __p0 >= __p1);
7045,7048c7064,7067
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t)(__rev0 >= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __rev0 >= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
7056c7075
< __ret = (uint8x16_t)(__p0 > __p1);
---
> __ret = __builtin_bit_cast(uint8x16_t, __p0 > __p1);
7062,7065c7081,7084
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x16_t)(__rev0 > __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(uint8x16_t, __rev0 > __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
7073c7092
< __ret = (uint32x4_t)(__p0 > __p1);
---
> __ret = __builtin_bit_cast(uint32x4_t, __p0 > __p1);
7079,7082c7098,7101
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t)(__rev0 > __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __rev0 > __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
7090c7109
< __ret = (uint16x8_t)(__p0 > __p1);
---
> __ret = __builtin_bit_cast(uint16x8_t, __p0 > __p1);
7096,7099c7115,7118
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t)(__rev0 > __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __rev0 > __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
7107c7126
< __ret = (uint8x16_t)(__p0 > __p1);
---
> __ret = __builtin_bit_cast(uint8x16_t, __p0 > __p1);
7113,7116c7132,7135
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x16_t)(__rev0 > __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(uint8x16_t, __rev0 > __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
7124c7143
< __ret = (uint32x4_t)(__p0 > __p1);
---
> __ret = __builtin_bit_cast(uint32x4_t, __p0 > __p1);
7130,7133c7149,7152
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t)(__rev0 > __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __rev0 > __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
7141c7160
< __ret = (uint32x4_t)(__p0 > __p1);
---
> __ret = __builtin_bit_cast(uint32x4_t, __p0 > __p1);
7147,7150c7166,7169
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t)(__rev0 > __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __rev0 > __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
7158c7177
< __ret = (uint16x8_t)(__p0 > __p1);
---
> __ret = __builtin_bit_cast(uint16x8_t, __p0 > __p1);
7164,7167c7183,7186
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t)(__rev0 > __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __rev0 > __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
7175c7194
< __ret = (uint8x8_t)(__p0 > __p1);
---
> __ret = __builtin_bit_cast(uint8x8_t, __p0 > __p1);
7181,7184c7200,7203
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x8_t)(__rev0 > __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(uint8x8_t, __rev0 > __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
7192c7211
< __ret = (uint32x2_t)(__p0 > __p1);
---
> __ret = __builtin_bit_cast(uint32x2_t, __p0 > __p1);
7198,7201c7217,7220
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t)(__rev0 > __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __rev0 > __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
7209c7228
< __ret = (uint16x4_t)(__p0 > __p1);
---
> __ret = __builtin_bit_cast(uint16x4_t, __p0 > __p1);
7215,7218c7234,7237
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t)(__rev0 > __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __rev0 > __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
7226c7245
< __ret = (uint8x8_t)(__p0 > __p1);
---
> __ret = __builtin_bit_cast(uint8x8_t, __p0 > __p1);
7232,7235c7251,7254
< int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x8_t)(__rev0 > __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(uint8x8_t, __rev0 > __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
7243c7262
< __ret = (uint32x2_t)(__p0 > __p1);
---
> __ret = __builtin_bit_cast(uint32x2_t, __p0 > __p1);
7249,7252c7268,7271
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t)(__rev0 > __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __rev0 > __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
7260c7279
< __ret = (uint32x2_t)(__p0 > __p1);
---
> __ret = __builtin_bit_cast(uint32x2_t, __p0 > __p1);
7266,7269c7285,7288
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t)(__rev0 > __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __rev0 > __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
7277c7296
< __ret = (uint16x4_t)(__p0 > __p1);
---
> __ret = __builtin_bit_cast(uint16x4_t, __p0 > __p1);
7283,7286c7302,7305
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t)(__rev0 > __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __rev0 > __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
7294c7313
< __ret = (uint8x16_t)(__p0 <= __p1);
---
> __ret = __builtin_bit_cast(uint8x16_t, __p0 <= __p1);
7300,7303c7319,7322
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x16_t)(__rev0 <= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(uint8x16_t, __rev0 <= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
7311c7330
< __ret = (uint32x4_t)(__p0 <= __p1);
---
> __ret = __builtin_bit_cast(uint32x4_t, __p0 <= __p1);
7317,7320c7336,7339
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t)(__rev0 <= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __rev0 <= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
7328c7347
< __ret = (uint16x8_t)(__p0 <= __p1);
---
> __ret = __builtin_bit_cast(uint16x8_t, __p0 <= __p1);
7334,7337c7353,7356
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t)(__rev0 <= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __rev0 <= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
7345c7364
< __ret = (uint8x16_t)(__p0 <= __p1);
---
> __ret = __builtin_bit_cast(uint8x16_t, __p0 <= __p1);
7351,7354c7370,7373
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x16_t)(__rev0 <= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(uint8x16_t, __rev0 <= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
7362c7381
< __ret = (uint32x4_t)(__p0 <= __p1);
---
> __ret = __builtin_bit_cast(uint32x4_t, __p0 <= __p1);
7368,7371c7387,7390
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t)(__rev0 <= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __rev0 <= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
7379c7398
< __ret = (uint32x4_t)(__p0 <= __p1);
---
> __ret = __builtin_bit_cast(uint32x4_t, __p0 <= __p1);
7385,7388c7404,7407
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t)(__rev0 <= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __rev0 <= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
7396c7415
< __ret = (uint16x8_t)(__p0 <= __p1);
---
> __ret = __builtin_bit_cast(uint16x8_t, __p0 <= __p1);
7402,7405c7421,7424
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t)(__rev0 <= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __rev0 <= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
7413c7432
< __ret = (uint8x8_t)(__p0 <= __p1);
---
> __ret = __builtin_bit_cast(uint8x8_t, __p0 <= __p1);
7419,7422c7438,7441
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x8_t)(__rev0 <= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(uint8x8_t, __rev0 <= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
7430c7449
< __ret = (uint32x2_t)(__p0 <= __p1);
---
> __ret = __builtin_bit_cast(uint32x2_t, __p0 <= __p1);
7436,7439c7455,7458
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t)(__rev0 <= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __rev0 <= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
7447c7466
< __ret = (uint16x4_t)(__p0 <= __p1);
---
> __ret = __builtin_bit_cast(uint16x4_t, __p0 <= __p1);
7453,7456c7472,7475
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t)(__rev0 <= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __rev0 <= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
7464c7483
< __ret = (uint8x8_t)(__p0 <= __p1);
---
> __ret = __builtin_bit_cast(uint8x8_t, __p0 <= __p1);
7470,7473c7489,7492
< int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x8_t)(__rev0 <= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(uint8x8_t, __rev0 <= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
7481c7500
< __ret = (uint32x2_t)(__p0 <= __p1);
---
> __ret = __builtin_bit_cast(uint32x2_t, __p0 <= __p1);
7487,7490c7506,7509
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t)(__rev0 <= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __rev0 <= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
7498c7517
< __ret = (uint32x2_t)(__p0 <= __p1);
---
> __ret = __builtin_bit_cast(uint32x2_t, __p0 <= __p1);
7504,7507c7523,7526
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t)(__rev0 <= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __rev0 <= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
7515c7534
< __ret = (uint16x4_t)(__p0 <= __p1);
---
> __ret = __builtin_bit_cast(uint16x4_t, __p0 <= __p1);
7521,7524c7540,7543
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t)(__rev0 <= __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __rev0 <= __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
7532c7551
< __ret = (int8x16_t) __builtin_neon_vclsq_v((int8x16_t)__p0, 32);
---
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vclsq_v(__builtin_bit_cast(int8x16_t, __p0), 32));
7538,7540c7557,7559
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int8x16_t) __builtin_neon_vclsq_v((int8x16_t)__rev0, 32);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vclsq_v(__builtin_bit_cast(int8x16_t, __rev0), 32));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
7548c7567
< __ret = (int32x4_t) __builtin_neon_vclsq_v((int8x16_t)__p0, 34);
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vclsq_v(__builtin_bit_cast(int8x16_t, __p0), 34));
7554,7556c7573,7575
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (int32x4_t) __builtin_neon_vclsq_v((int8x16_t)__rev0, 34);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vclsq_v(__builtin_bit_cast(int8x16_t, __rev0), 34));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
7564c7583
< __ret = (int16x8_t) __builtin_neon_vclsq_v((int8x16_t)__p0, 33);
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vclsq_v(__builtin_bit_cast(int8x16_t, __p0), 33));
7570,7572c7589,7591
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int16x8_t) __builtin_neon_vclsq_v((int8x16_t)__rev0, 33);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vclsq_v(__builtin_bit_cast(int8x16_t, __rev0), 33));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
7580c7599
< __ret = (int8x16_t) __builtin_neon_vclsq_v((int8x16_t)__p0, 32);
---
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vclsq_v(__builtin_bit_cast(int8x16_t, __p0), 32));
7586,7588c7605,7607
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int8x16_t) __builtin_neon_vclsq_v((int8x16_t)__rev0, 32);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vclsq_v(__builtin_bit_cast(int8x16_t, __rev0), 32));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
7596c7615
< __ret = (int32x4_t) __builtin_neon_vclsq_v((int8x16_t)__p0, 34);
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vclsq_v(__builtin_bit_cast(int8x16_t, __p0), 34));
7602,7604c7621,7623
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (int32x4_t) __builtin_neon_vclsq_v((int8x16_t)__rev0, 34);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vclsq_v(__builtin_bit_cast(int8x16_t, __rev0), 34));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
7612c7631
< __ret = (int16x8_t) __builtin_neon_vclsq_v((int8x16_t)__p0, 33);
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vclsq_v(__builtin_bit_cast(int8x16_t, __p0), 33));
7618,7620c7637,7639
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int16x8_t) __builtin_neon_vclsq_v((int8x16_t)__rev0, 33);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vclsq_v(__builtin_bit_cast(int8x16_t, __rev0), 33));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
7628c7647
< __ret = (int8x8_t) __builtin_neon_vcls_v((int8x8_t)__p0, 0);
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vcls_v(__builtin_bit_cast(int8x8_t, __p0), 0));
7634,7636c7653,7655
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int8x8_t) __builtin_neon_vcls_v((int8x8_t)__rev0, 0);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vcls_v(__builtin_bit_cast(int8x8_t, __rev0), 0));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
7644c7663
< __ret = (int32x2_t) __builtin_neon_vcls_v((int8x8_t)__p0, 2);
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vcls_v(__builtin_bit_cast(int8x8_t, __p0), 2));
7650,7652c7669,7671
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< __ret = (int32x2_t) __builtin_neon_vcls_v((int8x8_t)__rev0, 2);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vcls_v(__builtin_bit_cast(int8x8_t, __rev0), 2));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
7660c7679
< __ret = (int16x4_t) __builtin_neon_vcls_v((int8x8_t)__p0, 1);
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vcls_v(__builtin_bit_cast(int8x8_t, __p0), 1));
7666,7668c7685,7687
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (int16x4_t) __builtin_neon_vcls_v((int8x8_t)__rev0, 1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vcls_v(__builtin_bit_cast(int8x8_t, __rev0), 1));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
7676c7695
< __ret = (int8x8_t) __builtin_neon_vcls_v((int8x8_t)__p0, 0);
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vcls_v(__builtin_bit_cast(int8x8_t, __p0), 0));
7682,7684c7701,7703
< int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int8x8_t) __builtin_neon_vcls_v((int8x8_t)__rev0, 0);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vcls_v(__builtin_bit_cast(int8x8_t, __rev0), 0));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
7692c7711
< __ret = (int32x2_t) __builtin_neon_vcls_v((int8x8_t)__p0, 2);
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vcls_v(__builtin_bit_cast(int8x8_t, __p0), 2));
7698,7700c7717,7719
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< __ret = (int32x2_t) __builtin_neon_vcls_v((int8x8_t)__rev0, 2);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vcls_v(__builtin_bit_cast(int8x8_t, __rev0), 2));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
7708c7727
< __ret = (int16x4_t) __builtin_neon_vcls_v((int8x8_t)__p0, 1);
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vcls_v(__builtin_bit_cast(int8x8_t, __p0), 1));
7714,7716c7733,7735
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (int16x4_t) __builtin_neon_vcls_v((int8x8_t)__rev0, 1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vcls_v(__builtin_bit_cast(int8x8_t, __rev0), 1));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
7724c7743
< __ret = (uint8x16_t)(__p0 < __p1);
---
> __ret = __builtin_bit_cast(uint8x16_t, __p0 < __p1);
7730,7733c7749,7752
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x16_t)(__rev0 < __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(uint8x16_t, __rev0 < __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
7741c7760
< __ret = (uint32x4_t)(__p0 < __p1);
---
> __ret = __builtin_bit_cast(uint32x4_t, __p0 < __p1);
7747,7750c7766,7769
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t)(__rev0 < __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __rev0 < __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
7758c7777
< __ret = (uint16x8_t)(__p0 < __p1);
---
> __ret = __builtin_bit_cast(uint16x8_t, __p0 < __p1);
7764,7767c7783,7786
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t)(__rev0 < __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __rev0 < __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
7775c7794
< __ret = (uint8x16_t)(__p0 < __p1);
---
> __ret = __builtin_bit_cast(uint8x16_t, __p0 < __p1);
7781,7784c7800,7803
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x16_t)(__rev0 < __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(uint8x16_t, __rev0 < __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
7792c7811
< __ret = (uint32x4_t)(__p0 < __p1);
---
> __ret = __builtin_bit_cast(uint32x4_t, __p0 < __p1);
7798,7801c7817,7820
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t)(__rev0 < __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> float32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __rev0 < __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
7809c7828
< __ret = (uint32x4_t)(__p0 < __p1);
---
> __ret = __builtin_bit_cast(uint32x4_t, __p0 < __p1);
7815,7818c7834,7837
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t)(__rev0 < __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __rev0 < __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
7826c7845
< __ret = (uint16x8_t)(__p0 < __p1);
---
> __ret = __builtin_bit_cast(uint16x8_t, __p0 < __p1);
7832,7835c7851,7854
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t)(__rev0 < __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __rev0 < __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
7843c7862
< __ret = (uint8x8_t)(__p0 < __p1);
---
> __ret = __builtin_bit_cast(uint8x8_t, __p0 < __p1);
7849,7852c7868,7871
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x8_t)(__rev0 < __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(uint8x8_t, __rev0 < __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
7860c7879
< __ret = (uint32x2_t)(__p0 < __p1);
---
> __ret = __builtin_bit_cast(uint32x2_t, __p0 < __p1);
7866,7869c7885,7888
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t)(__rev0 < __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __rev0 < __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
7877c7896
< __ret = (uint16x4_t)(__p0 < __p1);
---
> __ret = __builtin_bit_cast(uint16x4_t, __p0 < __p1);
7883,7886c7902,7905
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t)(__rev0 < __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __rev0 < __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
7894c7913
< __ret = (uint8x8_t)(__p0 < __p1);
---
> __ret = __builtin_bit_cast(uint8x8_t, __p0 < __p1);
7900,7903c7919,7922
< int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x8_t)(__rev0 < __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(uint8x8_t, __rev0 < __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
7911c7930
< __ret = (uint32x2_t)(__p0 < __p1);
---
> __ret = __builtin_bit_cast(uint32x2_t, __p0 < __p1);
7917,7920c7936,7939
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t)(__rev0 < __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __rev0 < __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
7928c7947
< __ret = (uint32x2_t)(__p0 < __p1);
---
> __ret = __builtin_bit_cast(uint32x2_t, __p0 < __p1);
7934,7937c7953,7956
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t)(__rev0 < __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __rev0 < __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
7945c7964
< __ret = (uint16x4_t)(__p0 < __p1);
---
> __ret = __builtin_bit_cast(uint16x4_t, __p0 < __p1);
7951,7954c7970,7973
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t)(__rev0 < __rev1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __rev0 < __rev1);
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
7962c7981
< __ret = (uint8x16_t) __builtin_neon_vclzq_v((int8x16_t)__p0, 48);
---
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_vclzq_v(__builtin_bit_cast(int8x16_t, __p0), 48));
7968,7970c7987,7989
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x16_t) __builtin_neon_vclzq_v((int8x16_t)__rev0, 48);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_vclzq_v(__builtin_bit_cast(int8x16_t, __rev0), 48));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
7978c7997
< __ret = (uint32x4_t) __builtin_neon_vclzq_v((int8x16_t)__p0, 50);
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vclzq_v(__builtin_bit_cast(int8x16_t, __p0), 50));
7984,7986c8003,8005
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (uint32x4_t) __builtin_neon_vclzq_v((int8x16_t)__rev0, 50);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vclzq_v(__builtin_bit_cast(int8x16_t, __rev0), 50));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
7994c8013
< __ret = (uint16x8_t) __builtin_neon_vclzq_v((int8x16_t)__p0, 49);
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vclzq_v(__builtin_bit_cast(int8x16_t, __p0), 49));
8000,8002c8019,8021
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t) __builtin_neon_vclzq_v((int8x16_t)__rev0, 49);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vclzq_v(__builtin_bit_cast(int8x16_t, __rev0), 49));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
8010c8029
< __ret = (int8x16_t) __builtin_neon_vclzq_v((int8x16_t)__p0, 32);
---
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vclzq_v(__builtin_bit_cast(int8x16_t, __p0), 32));
8016,8018c8035,8037
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int8x16_t) __builtin_neon_vclzq_v((int8x16_t)__rev0, 32);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vclzq_v(__builtin_bit_cast(int8x16_t, __rev0), 32));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
8026c8045
< __ret = (int32x4_t) __builtin_neon_vclzq_v((int8x16_t)__p0, 34);
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vclzq_v(__builtin_bit_cast(int8x16_t, __p0), 34));
8032,8034c8051,8053
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (int32x4_t) __builtin_neon_vclzq_v((int8x16_t)__rev0, 34);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vclzq_v(__builtin_bit_cast(int8x16_t, __rev0), 34));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
8042c8061
< __ret = (int16x8_t) __builtin_neon_vclzq_v((int8x16_t)__p0, 33);
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vclzq_v(__builtin_bit_cast(int8x16_t, __p0), 33));
8048,8050c8067,8069
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int16x8_t) __builtin_neon_vclzq_v((int8x16_t)__rev0, 33);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vclzq_v(__builtin_bit_cast(int8x16_t, __rev0), 33));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
8058c8077
< __ret = (uint8x8_t) __builtin_neon_vclz_v((int8x8_t)__p0, 16);
---
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vclz_v(__builtin_bit_cast(int8x8_t, __p0), 16));
8064,8066c8083,8085
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x8_t) __builtin_neon_vclz_v((int8x8_t)__rev0, 16);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vclz_v(__builtin_bit_cast(int8x8_t, __rev0), 16));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
8074c8093
< __ret = (uint32x2_t) __builtin_neon_vclz_v((int8x8_t)__p0, 18);
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vclz_v(__builtin_bit_cast(int8x8_t, __p0), 18));
8080,8082c8099,8101
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< __ret = (uint32x2_t) __builtin_neon_vclz_v((int8x8_t)__rev0, 18);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vclz_v(__builtin_bit_cast(int8x8_t, __rev0), 18));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
8090c8109
< __ret = (uint16x4_t) __builtin_neon_vclz_v((int8x8_t)__p0, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vclz_v(__builtin_bit_cast(int8x8_t, __p0), 17));
8096,8098c8115,8117
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (uint16x4_t) __builtin_neon_vclz_v((int8x8_t)__rev0, 17);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vclz_v(__builtin_bit_cast(int8x8_t, __rev0), 17));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
8106c8125
< __ret = (int8x8_t) __builtin_neon_vclz_v((int8x8_t)__p0, 0);
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vclz_v(__builtin_bit_cast(int8x8_t, __p0), 0));
8112,8114c8131,8133
< int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int8x8_t) __builtin_neon_vclz_v((int8x8_t)__rev0, 0);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vclz_v(__builtin_bit_cast(int8x8_t, __rev0), 0));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
8122c8141
< __ret = (int32x2_t) __builtin_neon_vclz_v((int8x8_t)__p0, 2);
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vclz_v(__builtin_bit_cast(int8x8_t, __p0), 2));
8128,8130c8147,8149
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< __ret = (int32x2_t) __builtin_neon_vclz_v((int8x8_t)__rev0, 2);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vclz_v(__builtin_bit_cast(int8x8_t, __rev0), 2));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
8138c8157
< __ret = (int16x4_t) __builtin_neon_vclz_v((int8x8_t)__p0, 1);
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vclz_v(__builtin_bit_cast(int8x8_t, __p0), 1));
8144,8146c8163,8165
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (int16x4_t) __builtin_neon_vclz_v((int8x8_t)__rev0, 1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vclz_v(__builtin_bit_cast(int8x8_t, __rev0), 1));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
8154c8173
< __ret = (poly8x8_t) __builtin_neon_vcnt_v((int8x8_t)__p0, 4);
---
> __ret = __builtin_bit_cast(poly8x8_t, __builtin_neon_vcnt_v(__builtin_bit_cast(int8x8_t, __p0), 4));
8160,8162c8179,8181
< poly8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (poly8x8_t) __builtin_neon_vcnt_v((int8x8_t)__rev0, 4);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> poly8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(poly8x8_t, __builtin_neon_vcnt_v(__builtin_bit_cast(int8x8_t, __rev0), 4));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
8170c8189
< __ret = (poly8x16_t) __builtin_neon_vcntq_v((int8x16_t)__p0, 36);
---
> __ret = __builtin_bit_cast(poly8x16_t, __builtin_neon_vcntq_v(__builtin_bit_cast(int8x16_t, __p0), 36));
8176,8178c8195,8197
< poly8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (poly8x16_t) __builtin_neon_vcntq_v((int8x16_t)__rev0, 36);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> poly8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(poly8x16_t, __builtin_neon_vcntq_v(__builtin_bit_cast(int8x16_t, __rev0), 36));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
8186c8205
< __ret = (uint8x16_t) __builtin_neon_vcntq_v((int8x16_t)__p0, 48);
---
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_vcntq_v(__builtin_bit_cast(int8x16_t, __p0), 48));
8192,8194c8211,8213
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x16_t) __builtin_neon_vcntq_v((int8x16_t)__rev0, 48);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_vcntq_v(__builtin_bit_cast(int8x16_t, __rev0), 48));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
8202c8221
< __ret = (int8x16_t) __builtin_neon_vcntq_v((int8x16_t)__p0, 32);
---
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vcntq_v(__builtin_bit_cast(int8x16_t, __p0), 32));
8208,8210c8227,8229
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int8x16_t) __builtin_neon_vcntq_v((int8x16_t)__rev0, 32);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vcntq_v(__builtin_bit_cast(int8x16_t, __rev0), 32));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
8218c8237
< __ret = (uint8x8_t) __builtin_neon_vcnt_v((int8x8_t)__p0, 16);
---
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vcnt_v(__builtin_bit_cast(int8x8_t, __p0), 16));
8224,8226c8243,8245
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x8_t) __builtin_neon_vcnt_v((int8x8_t)__rev0, 16);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vcnt_v(__builtin_bit_cast(int8x8_t, __rev0), 16));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
8234c8253
< __ret = (int8x8_t) __builtin_neon_vcnt_v((int8x8_t)__p0, 0);
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vcnt_v(__builtin_bit_cast(int8x8_t, __p0), 0));
8240,8242c8259,8261
< int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int8x8_t) __builtin_neon_vcnt_v((int8x8_t)__rev0, 0);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vcnt_v(__builtin_bit_cast(int8x8_t, __rev0), 0));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
8256,8257c8275,8276
< poly8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< poly8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> poly8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> poly8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
8259c8278
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
8273,8274c8292,8293
< poly16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< poly16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> poly16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> poly16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
8276c8295
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
8290,8291c8309,8310
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
8293c8312
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
8312,8313c8331,8332
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
8315c8334
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
8335c8354
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64);
8349,8350c8368,8369
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
8352c8371
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
8371,8372c8390,8391
< int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
8374c8393
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
8393,8394c8412,8413
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> float32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
8396c8415
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
8415,8416c8434,8435
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
8418c8437
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
8437,8438c8456,8457
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
8440c8459
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
8460c8479
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64);
8474,8475c8493,8494
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
8477c8496
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
8490c8509
< __ret = (poly8x8_t)(__promote); \
---
> __ret = __builtin_bit_cast(poly8x8_t, __promote); \
8496c8515
< __ret = (poly16x4_t)(__promote); \
---
> __ret = __builtin_bit_cast(poly16x4_t, __promote); \
8502c8521
< __ret = (uint8x8_t)(__promote); \
---
> __ret = __builtin_bit_cast(uint8x8_t, __promote); \
8508c8527
< __ret = (uint32x2_t)(__promote); \
---
> __ret = __builtin_bit_cast(uint32x2_t, __promote); \
8514c8533
< __ret = (uint64x1_t)(__promote); \
---
> __ret = __builtin_bit_cast(uint64x1_t, __promote); \
8520c8539
< __ret = (uint16x4_t)(__promote); \
---
> __ret = __builtin_bit_cast(uint16x4_t, __promote); \
8526c8545
< __ret = (int8x8_t)(__promote); \
---
> __ret = __builtin_bit_cast(int8x8_t, __promote); \
8532c8551
< __ret = (float32x2_t)(__promote); \
---
> __ret = __builtin_bit_cast(float32x2_t, __promote); \
8538c8557
< __ret = (float16x4_t)(__promote); \
---
> __ret = __builtin_bit_cast(float16x4_t, __promote); \
8544c8563
< __ret = (int32x2_t)(__promote); \
---
> __ret = __builtin_bit_cast(int32x2_t, __promote); \
8550c8569
< __ret = (int64x1_t)(__promote); \
---
> __ret = __builtin_bit_cast(int64x1_t, __promote); \
8556c8575
< __ret = (int16x4_t)(__promote); \
---
> __ret = __builtin_bit_cast(int16x4_t, __promote); \
8562c8581
< __ret = (float32x4_t) __builtin_neon_vcvtq_f32_v((int8x16_t)__p0, 50);
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vcvtq_f32_v(__builtin_bit_cast(int8x16_t, __p0), 50));
8568,8570c8587,8589
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (float32x4_t) __builtin_neon_vcvtq_f32_v((int8x16_t)__rev0, 50);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vcvtq_f32_v(__builtin_bit_cast(int8x16_t, __rev0), 50));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
8578c8597
< __ret = (float32x4_t) __builtin_neon_vcvtq_f32_v((int8x16_t)__p0, 34);
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vcvtq_f32_v(__builtin_bit_cast(int8x16_t, __p0), 34));
8584,8586c8603,8605
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (float32x4_t) __builtin_neon_vcvtq_f32_v((int8x16_t)__rev0, 34);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vcvtq_f32_v(__builtin_bit_cast(int8x16_t, __rev0), 34));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
8594c8613
< __ret = (float32x2_t) __builtin_neon_vcvt_f32_v((int8x8_t)__p0, 18);
---
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vcvt_f32_v(__builtin_bit_cast(int8x8_t, __p0), 18));
8600,8602c8619,8621
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< __ret = (float32x2_t) __builtin_neon_vcvt_f32_v((int8x8_t)__rev0, 18);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vcvt_f32_v(__builtin_bit_cast(int8x8_t, __rev0), 18));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
8610c8629
< __ret = (float32x2_t) __builtin_neon_vcvt_f32_v((int8x8_t)__p0, 2);
---
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vcvt_f32_v(__builtin_bit_cast(int8x8_t, __p0), 2));
8616,8618c8635,8637
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< __ret = (float32x2_t) __builtin_neon_vcvt_f32_v((int8x8_t)__rev0, 2);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vcvt_f32_v(__builtin_bit_cast(int8x8_t, __rev0), 2));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
8627c8646
< __ret = (float32x4_t) __builtin_neon_vcvtq_n_f32_v((int8x16_t)__s0, __p1, 50); \
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vcvtq_n_f32_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 50)); \
8634,8636c8653,8655
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (float32x4_t) __builtin_neon_vcvtq_n_f32_v((int8x16_t)__rev0, __p1, 50); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_32); \
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vcvtq_n_f32_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 50)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
8645c8664
< __ret = (float32x4_t) __builtin_neon_vcvtq_n_f32_v((int8x16_t)__s0, __p1, 34); \
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vcvtq_n_f32_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 34)); \
8652,8654c8671,8673
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (float32x4_t) __builtin_neon_vcvtq_n_f32_v((int8x16_t)__rev0, __p1, 34); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_32); \
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vcvtq_n_f32_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 34)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
8663c8682
< __ret = (float32x2_t) __builtin_neon_vcvt_n_f32_v((int8x8_t)__s0, __p1, 18); \
---
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vcvt_n_f32_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 18)); \
8670,8672c8689,8691
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (float32x2_t) __builtin_neon_vcvt_n_f32_v((int8x8_t)__rev0, __p1, 18); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_32); \
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vcvt_n_f32_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 18)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
8681c8700
< __ret = (float32x2_t) __builtin_neon_vcvt_n_f32_v((int8x8_t)__s0, __p1, 2); \
---
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vcvt_n_f32_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 2)); \
8688,8690c8707,8709
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (float32x2_t) __builtin_neon_vcvt_n_f32_v((int8x8_t)__rev0, __p1, 2); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_32); \
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vcvt_n_f32_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 2)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
8699c8718
< __ret = (int32x4_t) __builtin_neon_vcvtq_n_s32_v((int8x16_t)__s0, __p1, 34); \
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vcvtq_n_s32_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 34)); \
8706,8708c8725,8727
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (int32x4_t) __builtin_neon_vcvtq_n_s32_v((int8x16_t)__rev0, __p1, 34); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_32); \
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vcvtq_n_s32_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 34)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
8717c8736
< __ret = (int32x2_t) __builtin_neon_vcvt_n_s32_v((int8x8_t)__s0, __p1, 2); \
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vcvt_n_s32_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 2)); \
8724,8726c8743,8745
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (int32x2_t) __builtin_neon_vcvt_n_s32_v((int8x8_t)__rev0, __p1, 2); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_32); \
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vcvt_n_s32_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 2)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
8735c8754
< __ret = (uint32x4_t) __builtin_neon_vcvtq_n_u32_v((int8x16_t)__s0, __p1, 50); \
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vcvtq_n_u32_v(__builtin_bit_cast(int8x16_t, __s0), __p1, 50)); \
8742,8744c8761,8763
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (uint32x4_t) __builtin_neon_vcvtq_n_u32_v((int8x16_t)__rev0, __p1, 50); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_32); \
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vcvtq_n_u32_v(__builtin_bit_cast(int8x16_t, __rev0), __p1, 50)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
8753c8772
< __ret = (uint32x2_t) __builtin_neon_vcvt_n_u32_v((int8x8_t)__s0, __p1, 18); \
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vcvt_n_u32_v(__builtin_bit_cast(int8x8_t, __s0), __p1, 18)); \
8760,8762c8779,8781
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (uint32x2_t) __builtin_neon_vcvt_n_u32_v((int8x8_t)__rev0, __p1, 18); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_32); \
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vcvt_n_u32_v(__builtin_bit_cast(int8x8_t, __rev0), __p1, 18)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
8770c8789
< __ret = (int32x4_t) __builtin_neon_vcvtq_s32_v((int8x16_t)__p0, 34);
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vcvtq_s32_v(__builtin_bit_cast(int8x16_t, __p0), 34));
8776,8778c8795,8797
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (int32x4_t) __builtin_neon_vcvtq_s32_v((int8x16_t)__rev0, 34);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vcvtq_s32_v(__builtin_bit_cast(int8x16_t, __rev0), 34));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
8786c8805
< __ret = (int32x2_t) __builtin_neon_vcvt_s32_v((int8x8_t)__p0, 2);
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vcvt_s32_v(__builtin_bit_cast(int8x8_t, __p0), 2));
8792,8794c8811,8813
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< __ret = (int32x2_t) __builtin_neon_vcvt_s32_v((int8x8_t)__rev0, 2);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vcvt_s32_v(__builtin_bit_cast(int8x8_t, __rev0), 2));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
8802c8821
< __ret = (uint32x4_t) __builtin_neon_vcvtq_u32_v((int8x16_t)__p0, 50);
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vcvtq_u32_v(__builtin_bit_cast(int8x16_t, __p0), 50));
8808,8810c8827,8829
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< __ret = (uint32x4_t) __builtin_neon_vcvtq_u32_v((int8x16_t)__rev0, 50);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vcvtq_u32_v(__builtin_bit_cast(int8x16_t, __rev0), 50));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
8818c8837
< __ret = (uint32x2_t) __builtin_neon_vcvt_u32_v((int8x8_t)__p0, 18);
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vcvt_u32_v(__builtin_bit_cast(int8x8_t, __p0), 18));
8824,8826c8843,8845
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< __ret = (uint32x2_t) __builtin_neon_vcvt_u32_v((int8x8_t)__rev0, 18);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vcvt_u32_v(__builtin_bit_cast(int8x8_t, __rev0), 18));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
8842c8861
< poly8x8_t __rev0_9; __rev0_9 = __builtin_shufflevector(__s0_9, __s0_9, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> poly8x8_t __rev0_9; __rev0_9 = __builtin_shufflevector(__s0_9, __s0_9, __lane_reverse_64_8); \
8844c8863
< __ret_9 = __builtin_shufflevector(__ret_9, __ret_9, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret_9 = __builtin_shufflevector(__ret_9, __ret_9, __lane_reverse_64_8); \
8860c8879
< poly16x4_t __rev0_11; __rev0_11 = __builtin_shufflevector(__s0_11, __s0_11, 3, 2, 1, 0); \
---
> poly16x4_t __rev0_11; __rev0_11 = __builtin_shufflevector(__s0_11, __s0_11, __lane_reverse_64_16); \
8862c8881
< __ret_11 = __builtin_shufflevector(__ret_11, __ret_11, 3, 2, 1, 0); \
---
> __ret_11 = __builtin_shufflevector(__ret_11, __ret_11, __lane_reverse_64_16); \
8878c8897
< poly8x8_t __rev0_13; __rev0_13 = __builtin_shufflevector(__s0_13, __s0_13, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> poly8x8_t __rev0_13; __rev0_13 = __builtin_shufflevector(__s0_13, __s0_13, __lane_reverse_64_8); \
8880c8899
< __ret_13 = __builtin_shufflevector(__ret_13, __ret_13, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret_13 = __builtin_shufflevector(__ret_13, __ret_13, __lane_reverse_128_8); \
8896c8915
< poly16x4_t __rev0_15; __rev0_15 = __builtin_shufflevector(__s0_15, __s0_15, 3, 2, 1, 0); \
---
> poly16x4_t __rev0_15; __rev0_15 = __builtin_shufflevector(__s0_15, __s0_15, __lane_reverse_64_16); \
8898c8917
< __ret_15 = __builtin_shufflevector(__ret_15, __ret_15, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret_15 = __builtin_shufflevector(__ret_15, __ret_15, __lane_reverse_128_16); \
8914c8933
< uint8x8_t __rev0_17; __rev0_17 = __builtin_shufflevector(__s0_17, __s0_17, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> uint8x8_t __rev0_17; __rev0_17 = __builtin_shufflevector(__s0_17, __s0_17, __lane_reverse_64_8); \
8916c8935
< __ret_17 = __builtin_shufflevector(__ret_17, __ret_17, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret_17 = __builtin_shufflevector(__ret_17, __ret_17, __lane_reverse_128_8); \
8932c8951
< uint32x2_t __rev0_19; __rev0_19 = __builtin_shufflevector(__s0_19, __s0_19, 1, 0); \
---
> uint32x2_t __rev0_19; __rev0_19 = __builtin_shufflevector(__s0_19, __s0_19, __lane_reverse_64_32); \
8934c8953
< __ret_19 = __builtin_shufflevector(__ret_19, __ret_19, 3, 2, 1, 0); \
---
> __ret_19 = __builtin_shufflevector(__ret_19, __ret_19, __lane_reverse_128_32); \
8951c8970
< __ret_21 = __builtin_shufflevector(__ret_21, __ret_21, 1, 0); \
---
> __ret_21 = __builtin_shufflevector(__ret_21, __ret_21, __lane_reverse_128_64); \
8967c8986
< uint16x4_t __rev0_23; __rev0_23 = __builtin_shufflevector(__s0_23, __s0_23, 3, 2, 1, 0); \
---
> uint16x4_t __rev0_23; __rev0_23 = __builtin_shufflevector(__s0_23, __s0_23, __lane_reverse_64_16); \
8969c8988
< __ret_23 = __builtin_shufflevector(__ret_23, __ret_23, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret_23 = __builtin_shufflevector(__ret_23, __ret_23, __lane_reverse_128_16); \
8985c9004
< int8x8_t __rev0_25; __rev0_25 = __builtin_shufflevector(__s0_25, __s0_25, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> int8x8_t __rev0_25; __rev0_25 = __builtin_shufflevector(__s0_25, __s0_25, __lane_reverse_64_8); \
8987c9006
< __ret_25 = __builtin_shufflevector(__ret_25, __ret_25, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret_25 = __builtin_shufflevector(__ret_25, __ret_25, __lane_reverse_128_8); \
9003c9022
< float32x2_t __rev0_27; __rev0_27 = __builtin_shufflevector(__s0_27, __s0_27, 1, 0); \
---
> float32x2_t __rev0_27; __rev0_27 = __builtin_shufflevector(__s0_27, __s0_27, __lane_reverse_64_32); \
9005c9024
< __ret_27 = __builtin_shufflevector(__ret_27, __ret_27, 3, 2, 1, 0); \
---
> __ret_27 = __builtin_shufflevector(__ret_27, __ret_27, __lane_reverse_128_32); \
9021c9040
< float16x4_t __rev0_29; __rev0_29 = __builtin_shufflevector(__s0_29, __s0_29, 3, 2, 1, 0); \
---
> float16x4_t __rev0_29; __rev0_29 = __builtin_shufflevector(__s0_29, __s0_29, __lane_reverse_64_16); \
9023c9042
< __ret_29 = __builtin_shufflevector(__ret_29, __ret_29, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret_29 = __builtin_shufflevector(__ret_29, __ret_29, __lane_reverse_128_16); \
9039c9058
< int32x2_t __rev0_31; __rev0_31 = __builtin_shufflevector(__s0_31, __s0_31, 1, 0); \
---
> int32x2_t __rev0_31; __rev0_31 = __builtin_shufflevector(__s0_31, __s0_31, __lane_reverse_64_32); \
9041c9060
< __ret_31 = __builtin_shufflevector(__ret_31, __ret_31, 3, 2, 1, 0); \
---
> __ret_31 = __builtin_shufflevector(__ret_31, __ret_31, __lane_reverse_128_32); \
9058c9077
< __ret_33 = __builtin_shufflevector(__ret_33, __ret_33, 1, 0); \
---
> __ret_33 = __builtin_shufflevector(__ret_33, __ret_33, __lane_reverse_128_64); \
9074c9093
< int16x4_t __rev0_35; __rev0_35 = __builtin_shufflevector(__s0_35, __s0_35, 3, 2, 1, 0); \
---
> int16x4_t __rev0_35; __rev0_35 = __builtin_shufflevector(__s0_35, __s0_35, __lane_reverse_64_16); \
9076c9095
< __ret_35 = __builtin_shufflevector(__ret_35, __ret_35, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret_35 = __builtin_shufflevector(__ret_35, __ret_35, __lane_reverse_128_16); \
9092c9111
< uint8x8_t __rev0_37; __rev0_37 = __builtin_shufflevector(__s0_37, __s0_37, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> uint8x8_t __rev0_37; __rev0_37 = __builtin_shufflevector(__s0_37, __s0_37, __lane_reverse_64_8); \
9094c9113
< __ret_37 = __builtin_shufflevector(__ret_37, __ret_37, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret_37 = __builtin_shufflevector(__ret_37, __ret_37, __lane_reverse_64_8); \
9110c9129
< uint32x2_t __rev0_39; __rev0_39 = __builtin_shufflevector(__s0_39, __s0_39, 1, 0); \
---
> uint32x2_t __rev0_39; __rev0_39 = __builtin_shufflevector(__s0_39, __s0_39, __lane_reverse_64_32); \
9112c9131
< __ret_39 = __builtin_shufflevector(__ret_39, __ret_39, 1, 0); \
---
> __ret_39 = __builtin_shufflevector(__ret_39, __ret_39, __lane_reverse_64_32); \
9134c9153
< uint16x4_t __rev0_42; __rev0_42 = __builtin_shufflevector(__s0_42, __s0_42, 3, 2, 1, 0); \
---
> uint16x4_t __rev0_42; __rev0_42 = __builtin_shufflevector(__s0_42, __s0_42, __lane_reverse_64_16); \
9136c9155
< __ret_42 = __builtin_shufflevector(__ret_42, __ret_42, 3, 2, 1, 0); \
---
> __ret_42 = __builtin_shufflevector(__ret_42, __ret_42, __lane_reverse_64_16); \
9152c9171
< int8x8_t __rev0_44; __rev0_44 = __builtin_shufflevector(__s0_44, __s0_44, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> int8x8_t __rev0_44; __rev0_44 = __builtin_shufflevector(__s0_44, __s0_44, __lane_reverse_64_8); \
9154c9173
< __ret_44 = __builtin_shufflevector(__ret_44, __ret_44, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret_44 = __builtin_shufflevector(__ret_44, __ret_44, __lane_reverse_64_8); \
9170c9189
< float32x2_t __rev0_46; __rev0_46 = __builtin_shufflevector(__s0_46, __s0_46, 1, 0); \
---
> float32x2_t __rev0_46; __rev0_46 = __builtin_shufflevector(__s0_46, __s0_46, __lane_reverse_64_32); \
9172c9191
< __ret_46 = __builtin_shufflevector(__ret_46, __ret_46, 1, 0); \
---
> __ret_46 = __builtin_shufflevector(__ret_46, __ret_46, __lane_reverse_64_32); \
9188c9207
< float16x4_t __rev0_48; __rev0_48 = __builtin_shufflevector(__s0_48, __s0_48, 3, 2, 1, 0); \
---
> float16x4_t __rev0_48; __rev0_48 = __builtin_shufflevector(__s0_48, __s0_48, __lane_reverse_64_16); \
9190c9209
< __ret_48 = __builtin_shufflevector(__ret_48, __ret_48, 3, 2, 1, 0); \
---
> __ret_48 = __builtin_shufflevector(__ret_48, __ret_48, __lane_reverse_64_16); \
9206c9225
< int32x2_t __rev0_50; __rev0_50 = __builtin_shufflevector(__s0_50, __s0_50, 1, 0); \
---
> int32x2_t __rev0_50; __rev0_50 = __builtin_shufflevector(__s0_50, __s0_50, __lane_reverse_64_32); \
9208c9227
< __ret_50 = __builtin_shufflevector(__ret_50, __ret_50, 1, 0); \
---
> __ret_50 = __builtin_shufflevector(__ret_50, __ret_50, __lane_reverse_64_32); \
9230c9249
< int16x4_t __rev0_53; __rev0_53 = __builtin_shufflevector(__s0_53, __s0_53, 3, 2, 1, 0); \
---
> int16x4_t __rev0_53; __rev0_53 = __builtin_shufflevector(__s0_53, __s0_53, __lane_reverse_64_16); \
9232c9251
< __ret_53 = __builtin_shufflevector(__ret_53, __ret_53, 3, 2, 1, 0); \
---
> __ret_53 = __builtin_shufflevector(__ret_53, __ret_53, __lane_reverse_64_16); \
9247c9266
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
9262c9281
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
9277c9296
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
9292c9311
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
9307c9326
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
9322c9341
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
9337c9356
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64);
9352c9371
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
9367c9386
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
9382c9401
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
9399c9418
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
9414c9433
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
9429c9448
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64);
9444c9463
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
9459c9478
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
9474c9493
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
9494c9513
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
9509c9528
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
9524c9543
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
9541c9560
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
9556c9575
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
9576c9595
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
9590,9591c9609,9610
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
9593c9612
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
9607,9608c9626,9627
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
9610c9629
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
9624,9625c9643,9644
< uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
---
> uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_64);
> uint64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_64);
9627c9646
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64);
9641,9642c9660,9661
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
9644c9663
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
9658,9659c9677,9678
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
9661c9680
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
9675,9676c9694,9695
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
9678c9697
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
9692,9693c9711,9712
< int64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< int64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
---
> int64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_64);
> int64x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_64);
9695c9714
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64);
9709,9710c9728,9729
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
9712c9731
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
9726,9727c9745,9746
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
9729c9748
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
9743,9744c9762,9763
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
9746c9765
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
9765,9766c9784,9785
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
9768c9787
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
9782,9783c9801,9802
< int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
9785c9804
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
9799,9800c9818,9819
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
9802c9821
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
9821,9822c9840,9841
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
9824c9843
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
9834c9853
< __ret = (poly8x8_t) __builtin_neon_vext_v((int8x8_t)__s0, (int8x8_t)__s1, __p2, 4); \
---
> __ret = __builtin_bit_cast(poly8x8_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __s0), __builtin_bit_cast(int8x8_t, __s1), __p2, 4)); \
9842,9845c9861,9864
< poly8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< poly8x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (poly8x8_t) __builtin_neon_vext_v((int8x8_t)__rev0, (int8x8_t)__rev1, __p2, 4); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> poly8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_8); \
> poly8x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_8); \
> __ret = __builtin_bit_cast(poly8x8_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __p2, 4)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8); \
9855c9874
< __ret = (poly16x4_t) __builtin_neon_vext_v((int8x8_t)__s0, (int8x8_t)__s1, __p2, 5); \
---
> __ret = __builtin_bit_cast(poly16x4_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __s0), __builtin_bit_cast(int8x8_t, __s1), __p2, 5)); \
9863,9866c9882,9885
< poly16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< poly16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 3, 2, 1, 0); \
< __ret = (poly16x4_t) __builtin_neon_vext_v((int8x8_t)__rev0, (int8x8_t)__rev1, __p2, 5); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> poly16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> poly16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(poly16x4_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __p2, 5)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
9876c9895
< __ret = (poly8x16_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 36); \
---
> __ret = __builtin_bit_cast(poly8x16_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __s0), __builtin_bit_cast(int8x16_t, __s1), __p2, 36)); \
9884,9887c9903,9906
< poly8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< poly8x16_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (poly8x16_t) __builtin_neon_vextq_v((int8x16_t)__rev0, (int8x16_t)__rev1, __p2, 36); \
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> poly8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_8); \
> poly8x16_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_8); \
> __ret = __builtin_bit_cast(poly8x16_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __p2, 36)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8); \
9897c9916
< __ret = (poly16x8_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 37); \
---
> __ret = __builtin_bit_cast(poly16x8_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __s0), __builtin_bit_cast(int8x16_t, __s1), __p2, 37)); \
9905,9908c9924,9927
< poly16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< poly16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (poly16x8_t) __builtin_neon_vextq_v((int8x16_t)__rev0, (int8x16_t)__rev1, __p2, 37); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> poly16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> poly16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(poly16x8_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __p2, 37)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
9918c9937
< __ret = (uint8x16_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 48); \
---
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __s0), __builtin_bit_cast(int8x16_t, __s1), __p2, 48)); \
9926,9929c9945,9948
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (uint8x16_t) __builtin_neon_vextq_v((int8x16_t)__rev0, (int8x16_t)__rev1, __p2, 48); \
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_8); \
> uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_8); \
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __p2, 48)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8); \
9939c9958
< __ret = (uint32x4_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 50); \
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __s0), __builtin_bit_cast(int8x16_t, __s1), __p2, 50)); \
9947,9950c9966,9969
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 3, 2, 1, 0); \
< __ret = (uint32x4_t) __builtin_neon_vextq_v((int8x16_t)__rev0, (int8x16_t)__rev1, __p2, 50); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_32); \
> uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_32); \
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __p2, 50)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
9960c9979
< __ret = (uint64x2_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 51); \
---
> __ret = __builtin_bit_cast(uint64x2_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __s0), __builtin_bit_cast(int8x16_t, __s1), __p2, 51)); \
9968,9971c9987,9990
< uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< uint64x2_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 1, 0); \
< __ret = (uint64x2_t) __builtin_neon_vextq_v((int8x16_t)__rev0, (int8x16_t)__rev1, __p2, 51); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_64); \
> uint64x2_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_64); \
> __ret = __builtin_bit_cast(uint64x2_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __p2, 51)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64); \
9981c10000
< __ret = (uint16x8_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 49); \
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __s0), __builtin_bit_cast(int8x16_t, __s1), __p2, 49)); \
9989,9992c10008,10011
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (uint16x8_t) __builtin_neon_vextq_v((int8x16_t)__rev0, (int8x16_t)__rev1, __p2, 49); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __p2, 49)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
10002c10021
< __ret = (int8x16_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 32); \
---
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __s0), __builtin_bit_cast(int8x16_t, __s1), __p2, 32)); \
10010,10013c10029,10032
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< int8x16_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (int8x16_t) __builtin_neon_vextq_v((int8x16_t)__rev0, (int8x16_t)__rev1, __p2, 32); \
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_8); \
> int8x16_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_8); \
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __p2, 32)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8); \
10023c10042
< __ret = (float32x4_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 41); \
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __s0), __builtin_bit_cast(int8x16_t, __s1), __p2, 41)); \
10031,10034c10050,10053
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< float32x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 3, 2, 1, 0); \
< __ret = (float32x4_t) __builtin_neon_vextq_v((int8x16_t)__rev0, (int8x16_t)__rev1, __p2, 41); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_32); \
> float32x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_32); \
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __p2, 41)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
10044c10063
< __ret = (int32x4_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 34); \
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __s0), __builtin_bit_cast(int8x16_t, __s1), __p2, 34)); \
10052,10055c10071,10074
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< int32x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 3, 2, 1, 0); \
< __ret = (int32x4_t) __builtin_neon_vextq_v((int8x16_t)__rev0, (int8x16_t)__rev1, __p2, 34); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_32); \
> int32x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_32); \
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __p2, 34)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
10065c10084
< __ret = (int64x2_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 35); \
---
> __ret = __builtin_bit_cast(int64x2_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __s0), __builtin_bit_cast(int8x16_t, __s1), __p2, 35)); \
10073,10076c10092,10095
< int64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< int64x2_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 1, 0); \
< __ret = (int64x2_t) __builtin_neon_vextq_v((int8x16_t)__rev0, (int8x16_t)__rev1, __p2, 35); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> int64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_64); \
> int64x2_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_64); \
> __ret = __builtin_bit_cast(int64x2_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __p2, 35)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64); \
10086c10105
< __ret = (int16x8_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 33); \
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __s0), __builtin_bit_cast(int8x16_t, __s1), __p2, 33)); \
10094,10097c10113,10116
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< int16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (int16x8_t) __builtin_neon_vextq_v((int8x16_t)__rev0, (int8x16_t)__rev1, __p2, 33); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> int16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __p2, 33)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
10107c10126
< __ret = (uint8x8_t) __builtin_neon_vext_v((int8x8_t)__s0, (int8x8_t)__s1, __p2, 16); \
---
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __s0), __builtin_bit_cast(int8x8_t, __s1), __p2, 16)); \
10115,10118c10134,10137
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (uint8x8_t) __builtin_neon_vext_v((int8x8_t)__rev0, (int8x8_t)__rev1, __p2, 16); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_8); \
> uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_8); \
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __p2, 16)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8); \
10128c10147
< __ret = (uint32x2_t) __builtin_neon_vext_v((int8x8_t)__s0, (int8x8_t)__s1, __p2, 18); \
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __s0), __builtin_bit_cast(int8x8_t, __s1), __p2, 18)); \
10136,10139c10155,10158
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 1, 0); \
< __ret = (uint32x2_t) __builtin_neon_vext_v((int8x8_t)__rev0, (int8x8_t)__rev1, __p2, 18); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_32); \
> uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_32); \
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __p2, 18)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
10148c10167
< __ret = (uint64x1_t) __builtin_neon_vext_v((int8x8_t)__s0, (int8x8_t)__s1, __p2, 19); \
---
> __ret = __builtin_bit_cast(uint64x1_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __s0), __builtin_bit_cast(int8x8_t, __s1), __p2, 19)); \
10156c10175
< __ret = (uint16x4_t) __builtin_neon_vext_v((int8x8_t)__s0, (int8x8_t)__s1, __p2, 17); \
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __s0), __builtin_bit_cast(int8x8_t, __s1), __p2, 17)); \
10164,10167c10183,10186
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 3, 2, 1, 0); \
< __ret = (uint16x4_t) __builtin_neon_vext_v((int8x8_t)__rev0, (int8x8_t)__rev1, __p2, 17); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __p2, 17)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
10177c10196
< __ret = (int8x8_t) __builtin_neon_vext_v((int8x8_t)__s0, (int8x8_t)__s1, __p2, 0); \
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __s0), __builtin_bit_cast(int8x8_t, __s1), __p2, 0)); \
10185,10188c10204,10207
< int8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< int8x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (int8x8_t) __builtin_neon_vext_v((int8x8_t)__rev0, (int8x8_t)__rev1, __p2, 0); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> int8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_8); \
> int8x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_8); \
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __p2, 0)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8); \
10198c10217
< __ret = (float32x2_t) __builtin_neon_vext_v((int8x8_t)__s0, (int8x8_t)__s1, __p2, 9); \
---
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __s0), __builtin_bit_cast(int8x8_t, __s1), __p2, 9)); \
10206,10209c10225,10228
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< float32x2_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 1, 0); \
< __ret = (float32x2_t) __builtin_neon_vext_v((int8x8_t)__rev0, (int8x8_t)__rev1, __p2, 9); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_32); \
> float32x2_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_32); \
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __p2, 9)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
10219c10238
< __ret = (int32x2_t) __builtin_neon_vext_v((int8x8_t)__s0, (int8x8_t)__s1, __p2, 2); \
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __s0), __builtin_bit_cast(int8x8_t, __s1), __p2, 2)); \
10227,10230c10246,10249
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< int32x2_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 1, 0); \
< __ret = (int32x2_t) __builtin_neon_vext_v((int8x8_t)__rev0, (int8x8_t)__rev1, __p2, 2); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_32); \
> int32x2_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_32); \
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __p2, 2)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
10239c10258
< __ret = (int64x1_t) __builtin_neon_vext_v((int8x8_t)__s0, (int8x8_t)__s1, __p2, 3); \
---
> __ret = __builtin_bit_cast(int64x1_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __s0), __builtin_bit_cast(int8x8_t, __s1), __p2, 3)); \
10247c10266
< __ret = (int16x4_t) __builtin_neon_vext_v((int8x8_t)__s0, (int8x8_t)__s1, __p2, 1); \
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __s0), __builtin_bit_cast(int8x8_t, __s1), __p2, 1)); \
10255,10258c10274,10277
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< int16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 3, 2, 1, 0); \
< __ret = (int16x4_t) __builtin_neon_vext_v((int8x8_t)__rev0, (int8x8_t)__rev1, __p2, 1); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> int16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __p2, 1)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
10268c10287
< __ret = (float16x8_t) __builtin_neon_vextq_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 40); \
---
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __s0), __builtin_bit_cast(int8x16_t, __s1), __p2, 40)); \
10276,10279c10295,10298
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< float16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (float16x8_t) __builtin_neon_vextq_v((int8x16_t)__rev0, (int8x16_t)__rev1, __p2, 40); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> float16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(float16x8_t, __builtin_neon_vextq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), __p2, 40)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
10289c10308
< __ret = (float16x4_t) __builtin_neon_vext_v((int8x8_t)__s0, (int8x8_t)__s1, __p2, 8); \
---
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __s0), __builtin_bit_cast(int8x8_t, __s1), __p2, 8)); \
10297,10300c10316,10319
< float16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< float16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 3, 2, 1, 0); \
< __ret = (float16x4_t) __builtin_neon_vext_v((int8x8_t)__rev0, (int8x8_t)__rev1, __p2, 8); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> float16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> float16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(float16x4_t, __builtin_neon_vext_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), __p2, 8)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
10314c10333
< poly8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> poly8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
10316c10335
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
10335c10354
< poly16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
---
> poly16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
10337c10356
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
10351c10370
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
10353c10372
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
10372c10391
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
10374c10393
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
10393c10412
< uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
---
> uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_64);
10408c10427
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
10410c10429
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
10429c10448
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
10431c10450
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
10450c10469
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
10452c10471
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
10471c10490
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
10473c10492
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
10492c10511
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
10494c10513
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
10513c10532
< int64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
---
> int64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_64);
10528c10547
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
10530c10549
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
10544c10563
< __ret = (poly8_t) __builtin_neon_vget_lane_i8((poly8x8_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(poly8_t, __builtin_neon_vget_lane_i8(__s0, __p1)); \
10551,10552c10570,10571
< poly8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (poly8_t) __builtin_neon_vget_lane_i8((poly8x8_t)__rev0, __p1); \
---
> poly8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_8); \
> __ret = __builtin_bit_cast(poly8_t, __builtin_neon_vget_lane_i8(__rev0, __p1)); \
10558c10577
< __ret = (poly8_t) __builtin_neon_vget_lane_i8((poly8x8_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(poly8_t, __builtin_neon_vget_lane_i8(__s0, __p1)); \
10567c10586
< __ret = (poly16_t) __builtin_neon_vget_lane_i16((poly16x4_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(poly16_t, __builtin_neon_vget_lane_i16(__s0, __p1)); \
10574,10575c10593,10594
< poly16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (poly16_t) __builtin_neon_vget_lane_i16((poly16x4_t)__rev0, __p1); \
---
> poly16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(poly16_t, __builtin_neon_vget_lane_i16(__rev0, __p1)); \
10581c10600
< __ret = (poly16_t) __builtin_neon_vget_lane_i16((poly16x4_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(poly16_t, __builtin_neon_vget_lane_i16(__s0, __p1)); \
10590c10609
< __ret = (poly8_t) __builtin_neon_vgetq_lane_i8((poly8x16_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(poly8_t, __builtin_neon_vgetq_lane_i8(__s0, __p1)); \
10597,10598c10616,10617
< poly8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (poly8_t) __builtin_neon_vgetq_lane_i8((poly8x16_t)__rev0, __p1); \
---
> poly8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_8); \
> __ret = __builtin_bit_cast(poly8_t, __builtin_neon_vgetq_lane_i8(__rev0, __p1)); \
10604c10623
< __ret = (poly8_t) __builtin_neon_vgetq_lane_i8((poly8x16_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(poly8_t, __builtin_neon_vgetq_lane_i8(__s0, __p1)); \
10613c10632
< __ret = (poly16_t) __builtin_neon_vgetq_lane_i16((poly16x8_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(poly16_t, __builtin_neon_vgetq_lane_i16(__s0, __p1)); \
10620,10621c10639,10640
< poly16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (poly16_t) __builtin_neon_vgetq_lane_i16((poly16x8_t)__rev0, __p1); \
---
> poly16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(poly16_t, __builtin_neon_vgetq_lane_i16(__rev0, __p1)); \
10627c10646
< __ret = (poly16_t) __builtin_neon_vgetq_lane_i16((poly16x8_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(poly16_t, __builtin_neon_vgetq_lane_i16(__s0, __p1)); \
10636c10655
< __ret = (uint8_t) __builtin_neon_vgetq_lane_i8((int8x16_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(uint8_t, __builtin_neon_vgetq_lane_i8(__builtin_bit_cast(int8x16_t, __s0), __p1)); \
10643,10644c10662,10663
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (uint8_t) __builtin_neon_vgetq_lane_i8((int8x16_t)__rev0, __p1); \
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_8); \
> __ret = __builtin_bit_cast(uint8_t, __builtin_neon_vgetq_lane_i8(__builtin_bit_cast(int8x16_t, __rev0), __p1)); \
10650c10669
< __ret = (uint8_t) __builtin_neon_vgetq_lane_i8((int8x16_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(uint8_t, __builtin_neon_vgetq_lane_i8(__builtin_bit_cast(int8x16_t, __s0), __p1)); \
10659c10678
< __ret = (uint32_t) __builtin_neon_vgetq_lane_i32((int32x4_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(uint32_t, __builtin_neon_vgetq_lane_i32(__builtin_bit_cast(int32x4_t, __s0), __p1)); \
10666,10667c10685,10686
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (uint32_t) __builtin_neon_vgetq_lane_i32((int32x4_t)__rev0, __p1); \
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_32); \
> __ret = __builtin_bit_cast(uint32_t, __builtin_neon_vgetq_lane_i32(__builtin_bit_cast(int32x4_t, __rev0), __p1)); \
10673c10692
< __ret = (uint32_t) __builtin_neon_vgetq_lane_i32((int32x4_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(uint32_t, __builtin_neon_vgetq_lane_i32(__builtin_bit_cast(int32x4_t, __s0), __p1)); \
10682c10701
< __ret = (uint64_t) __builtin_neon_vgetq_lane_i64((int64x2_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(uint64_t, __builtin_neon_vgetq_lane_i64(__builtin_bit_cast(int64x2_t, __s0), __p1)); \
10689,10690c10708,10709
< uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (uint64_t) __builtin_neon_vgetq_lane_i64((int64x2_t)__rev0, __p1); \
---
> uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_64); \
> __ret = __builtin_bit_cast(uint64_t, __builtin_neon_vgetq_lane_i64(__builtin_bit_cast(int64x2_t, __rev0), __p1)); \
10696c10715
< __ret = (uint64_t) __builtin_neon_vgetq_lane_i64((int64x2_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(uint64_t, __builtin_neon_vgetq_lane_i64(__builtin_bit_cast(int64x2_t, __s0), __p1)); \
10705c10724
< __ret = (uint16_t) __builtin_neon_vgetq_lane_i16((int16x8_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vgetq_lane_i16(__builtin_bit_cast(int16x8_t, __s0), __p1)); \
10712,10713c10731,10732
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (uint16_t) __builtin_neon_vgetq_lane_i16((int16x8_t)__rev0, __p1); \
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vgetq_lane_i16(__builtin_bit_cast(int16x8_t, __rev0), __p1)); \
10719c10738
< __ret = (uint16_t) __builtin_neon_vgetq_lane_i16((int16x8_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vgetq_lane_i16(__builtin_bit_cast(int16x8_t, __s0), __p1)); \
10728c10747
< __ret = (int8_t) __builtin_neon_vgetq_lane_i8((int8x16_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(int8_t, __builtin_neon_vgetq_lane_i8(__builtin_bit_cast(int8x16_t, __s0), __p1)); \
10735,10736c10754,10755
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (int8_t) __builtin_neon_vgetq_lane_i8((int8x16_t)__rev0, __p1); \
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_8); \
> __ret = __builtin_bit_cast(int8_t, __builtin_neon_vgetq_lane_i8(__builtin_bit_cast(int8x16_t, __rev0), __p1)); \
10742c10761
< __ret = (int8_t) __builtin_neon_vgetq_lane_i8((int8x16_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(int8_t, __builtin_neon_vgetq_lane_i8(__builtin_bit_cast(int8x16_t, __s0), __p1)); \
10751c10770
< __ret = (float32_t) __builtin_neon_vgetq_lane_f32((float32x4_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(float32_t, __builtin_neon_vgetq_lane_f32(__s0, __p1)); \
10758,10759c10777,10778
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (float32_t) __builtin_neon_vgetq_lane_f32((float32x4_t)__rev0, __p1); \
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_32); \
> __ret = __builtin_bit_cast(float32_t, __builtin_neon_vgetq_lane_f32(__rev0, __p1)); \
10765c10784
< __ret = (float32_t) __builtin_neon_vgetq_lane_f32((float32x4_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(float32_t, __builtin_neon_vgetq_lane_f32(__s0, __p1)); \
10774c10793
< __ret = (int32_t) __builtin_neon_vgetq_lane_i32((int32x4_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(int32_t, __builtin_neon_vgetq_lane_i32(__builtin_bit_cast(int32x4_t, __s0), __p1)); \
10781,10782c10800,10801
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (int32_t) __builtin_neon_vgetq_lane_i32((int32x4_t)__rev0, __p1); \
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_32); \
> __ret = __builtin_bit_cast(int32_t, __builtin_neon_vgetq_lane_i32(__builtin_bit_cast(int32x4_t, __rev0), __p1)); \
10788c10807
< __ret = (int32_t) __builtin_neon_vgetq_lane_i32((int32x4_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(int32_t, __builtin_neon_vgetq_lane_i32(__builtin_bit_cast(int32x4_t, __s0), __p1)); \
10797c10816
< __ret = (int64_t) __builtin_neon_vgetq_lane_i64((int64x2_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(int64_t, __builtin_neon_vgetq_lane_i64(__builtin_bit_cast(int64x2_t, __s0), __p1)); \
10804,10805c10823,10824
< int64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (int64_t) __builtin_neon_vgetq_lane_i64((int64x2_t)__rev0, __p1); \
---
> int64x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_64); \
> __ret = __builtin_bit_cast(int64_t, __builtin_neon_vgetq_lane_i64(__builtin_bit_cast(int64x2_t, __rev0), __p1)); \
10811c10830
< __ret = (int64_t) __builtin_neon_vgetq_lane_i64((int64x2_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(int64_t, __builtin_neon_vgetq_lane_i64(__builtin_bit_cast(int64x2_t, __s0), __p1)); \
10820c10839
< __ret = (int16_t) __builtin_neon_vgetq_lane_i16((int16x8_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(int16_t, __builtin_neon_vgetq_lane_i16(__builtin_bit_cast(int16x8_t, __s0), __p1)); \
10827,10828c10846,10847
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (int16_t) __builtin_neon_vgetq_lane_i16((int16x8_t)__rev0, __p1); \
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(int16_t, __builtin_neon_vgetq_lane_i16(__builtin_bit_cast(int16x8_t, __rev0), __p1)); \
10834c10853
< __ret = (int16_t) __builtin_neon_vgetq_lane_i16((int16x8_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(int16_t, __builtin_neon_vgetq_lane_i16(__builtin_bit_cast(int16x8_t, __s0), __p1)); \
10843c10862
< __ret = (uint8_t) __builtin_neon_vget_lane_i8((int8x8_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(uint8_t, __builtin_neon_vget_lane_i8(__builtin_bit_cast(int8x8_t, __s0), __p1)); \
10850,10851c10869,10870
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (uint8_t) __builtin_neon_vget_lane_i8((int8x8_t)__rev0, __p1); \
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_8); \
> __ret = __builtin_bit_cast(uint8_t, __builtin_neon_vget_lane_i8(__builtin_bit_cast(int8x8_t, __rev0), __p1)); \
10857c10876
< __ret = (uint8_t) __builtin_neon_vget_lane_i8((int8x8_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(uint8_t, __builtin_neon_vget_lane_i8(__builtin_bit_cast(int8x8_t, __s0), __p1)); \
10866c10885
< __ret = (uint32_t) __builtin_neon_vget_lane_i32((int32x2_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(uint32_t, __builtin_neon_vget_lane_i32(__builtin_bit_cast(int32x2_t, __s0), __p1)); \
10873,10874c10892,10893
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (uint32_t) __builtin_neon_vget_lane_i32((int32x2_t)__rev0, __p1); \
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_32); \
> __ret = __builtin_bit_cast(uint32_t, __builtin_neon_vget_lane_i32(__builtin_bit_cast(int32x2_t, __rev0), __p1)); \
10880c10899
< __ret = (uint32_t) __builtin_neon_vget_lane_i32((int32x2_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(uint32_t, __builtin_neon_vget_lane_i32(__builtin_bit_cast(int32x2_t, __s0), __p1)); \
10888c10907
< __ret = (uint64_t) __builtin_neon_vget_lane_i64((int64x1_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(uint64_t, __builtin_neon_vget_lane_i64(__builtin_bit_cast(int64x1_t, __s0), __p1)); \
10895c10914
< __ret = (uint16_t) __builtin_neon_vget_lane_i16((int16x4_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vget_lane_i16(__builtin_bit_cast(int16x4_t, __s0), __p1)); \
10902,10903c10921,10922
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (uint16_t) __builtin_neon_vget_lane_i16((int16x4_t)__rev0, __p1); \
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vget_lane_i16(__builtin_bit_cast(int16x4_t, __rev0), __p1)); \
10909c10928
< __ret = (uint16_t) __builtin_neon_vget_lane_i16((int16x4_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(uint16_t, __builtin_neon_vget_lane_i16(__builtin_bit_cast(int16x4_t, __s0), __p1)); \
10918c10937
< __ret = (int8_t) __builtin_neon_vget_lane_i8((int8x8_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(int8_t, __builtin_neon_vget_lane_i8(__builtin_bit_cast(int8x8_t, __s0), __p1)); \
10925,10926c10944,10945
< int8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (int8_t) __builtin_neon_vget_lane_i8((int8x8_t)__rev0, __p1); \
---
> int8x8_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_8); \
> __ret = __builtin_bit_cast(int8_t, __builtin_neon_vget_lane_i8(__builtin_bit_cast(int8x8_t, __rev0), __p1)); \
10932c10951
< __ret = (int8_t) __builtin_neon_vget_lane_i8((int8x8_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(int8_t, __builtin_neon_vget_lane_i8(__builtin_bit_cast(int8x8_t, __s0), __p1)); \
10941c10960
< __ret = (float32_t) __builtin_neon_vget_lane_f32((float32x2_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(float32_t, __builtin_neon_vget_lane_f32(__s0, __p1)); \
10948,10949c10967,10968
< float32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (float32_t) __builtin_neon_vget_lane_f32((float32x2_t)__rev0, __p1); \
---
> float32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_32); \
> __ret = __builtin_bit_cast(float32_t, __builtin_neon_vget_lane_f32(__rev0, __p1)); \
10955c10974
< __ret = (float32_t) __builtin_neon_vget_lane_f32((float32x2_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(float32_t, __builtin_neon_vget_lane_f32(__s0, __p1)); \
10964c10983
< __ret = (int32_t) __builtin_neon_vget_lane_i32((int32x2_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(int32_t, __builtin_neon_vget_lane_i32(__builtin_bit_cast(int32x2_t, __s0), __p1)); \
10971,10972c10990,10991
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 1, 0); \
< __ret = (int32_t) __builtin_neon_vget_lane_i32((int32x2_t)__rev0, __p1); \
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_32); \
> __ret = __builtin_bit_cast(int32_t, __builtin_neon_vget_lane_i32(__builtin_bit_cast(int32x2_t, __rev0), __p1)); \
10978c10997
< __ret = (int32_t) __builtin_neon_vget_lane_i32((int32x2_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(int32_t, __builtin_neon_vget_lane_i32(__builtin_bit_cast(int32x2_t, __s0), __p1)); \
10986c11005
< __ret = (int64_t) __builtin_neon_vget_lane_i64((int64x1_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(int64_t, __builtin_neon_vget_lane_i64(__builtin_bit_cast(int64x1_t, __s0), __p1)); \
10993c11012
< __ret = (int16_t) __builtin_neon_vget_lane_i16((int16x4_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(int16_t, __builtin_neon_vget_lane_i16(__builtin_bit_cast(int16x4_t, __s0), __p1)); \
11000,11001c11019,11020
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, 3, 2, 1, 0); \
< __ret = (int16_t) __builtin_neon_vget_lane_i16((int16x4_t)__rev0, __p1); \
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__s0, __s0, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(int16_t, __builtin_neon_vget_lane_i16(__builtin_bit_cast(int16x4_t, __rev0), __p1)); \
11007c11026
< __ret = (int16_t) __builtin_neon_vget_lane_i16((int16x4_t)__s0, __p1); \
---
> __ret = __builtin_bit_cast(int16_t, __builtin_neon_vget_lane_i16(__builtin_bit_cast(int16x4_t, __s0), __p1)); \
11021c11040
< poly8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> poly8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
11023c11042
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
11037c11056
< poly16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
---
> poly16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
11039c11058
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
11053c11072
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
11055c11074
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
11069c11088
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
11071c11090
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
11085c11104
< uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
---
> uint64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_64);
11100c11119
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
11102c11121
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
11116c11135
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
11118c11137
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
11132c11151
< float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
---
> float32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
11134c11153
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
11148c11167
< float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
---
> float16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
11150c11169
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
11164c11183
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
11166c11185
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
11180c11199
< int64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
---
> int64x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_64);
11195c11214
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
11197c11216
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
11205c11224
< __ret = (uint8x16_t) __builtin_neon_vhaddq_v((int8x16_t)__p0, (int8x16_t)__p1, 48);
---
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_vhaddq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 48));
11211,11214c11230,11233
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x16_t) __builtin_neon_vhaddq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 48);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_vhaddq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 48));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
11222c11241
< __ret = (uint32x4_t) __builtin_neon_vhaddq_v((int8x16_t)__p0, (int8x16_t)__p1, 50);
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vhaddq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 50));
11228,11231c11247,11250
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t) __builtin_neon_vhaddq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 50);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vhaddq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 50));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
11239c11258
< __ret = (uint16x8_t) __builtin_neon_vhaddq_v((int8x16_t)__p0, (int8x16_t)__p1, 49);
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vhaddq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 49));
11245,11248c11264,11267
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t) __builtin_neon_vhaddq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 49);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vhaddq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 49));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
11256c11275
< __ret = (int8x16_t) __builtin_neon_vhaddq_v((int8x16_t)__p0, (int8x16_t)__p1, 32);
---
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vhaddq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 32));
11262,11265c11281,11284
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int8x16_t) __builtin_neon_vhaddq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 32);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vhaddq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 32));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
11273c11292
< __ret = (int32x4_t) __builtin_neon_vhaddq_v((int8x16_t)__p0, (int8x16_t)__p1, 34);
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vhaddq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 34));
11279,11282c11298,11301
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (int32x4_t) __builtin_neon_vhaddq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 34);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vhaddq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 34));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
11290c11309
< __ret = (int16x8_t) __builtin_neon_vhaddq_v((int8x16_t)__p0, (int8x16_t)__p1, 33);
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vhaddq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 33));
11296,11299c11315,11318
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int16x8_t) __builtin_neon_vhaddq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 33);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vhaddq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 33));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
11307c11326
< __ret = (uint8x8_t) __builtin_neon_vhadd_v((int8x8_t)__p0, (int8x8_t)__p1, 16);
---
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vhadd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 16));
11313,11316c11332,11335
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x8_t) __builtin_neon_vhadd_v((int8x8_t)__rev0, (int8x8_t)__rev1, 16);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vhadd_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 16));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
11324c11343
< __ret = (uint32x2_t) __builtin_neon_vhadd_v((int8x8_t)__p0, (int8x8_t)__p1, 18);
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vhadd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 18));
11330,11333c11349,11352
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t) __builtin_neon_vhadd_v((int8x8_t)__rev0, (int8x8_t)__rev1, 18);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vhadd_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 18));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
11341c11360
< __ret = (uint16x4_t) __builtin_neon_vhadd_v((int8x8_t)__p0, (int8x8_t)__p1, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vhadd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 17));
11347,11350c11366,11369
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t) __builtin_neon_vhadd_v((int8x8_t)__rev0, (int8x8_t)__rev1, 17);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vhadd_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 17));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
11358c11377
< __ret = (int8x8_t) __builtin_neon_vhadd_v((int8x8_t)__p0, (int8x8_t)__p1, 0);
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vhadd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 0));
11364,11367c11383,11386
< int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int8x8_t) __builtin_neon_vhadd_v((int8x8_t)__rev0, (int8x8_t)__rev1, 0);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vhadd_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 0));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
11375c11394
< __ret = (int32x2_t) __builtin_neon_vhadd_v((int8x8_t)__p0, (int8x8_t)__p1, 2);
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vhadd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 2));
11381,11384c11400,11403
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (int32x2_t) __builtin_neon_vhadd_v((int8x8_t)__rev0, (int8x8_t)__rev1, 2);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vhadd_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 2));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
11392c11411
< __ret = (int16x4_t) __builtin_neon_vhadd_v((int8x8_t)__p0, (int8x8_t)__p1, 1);
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vhadd_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 1));
11398,11401c11417,11420
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (int16x4_t) __builtin_neon_vhadd_v((int8x8_t)__rev0, (int8x8_t)__rev1, 1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vhadd_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 1));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
11409c11428
< __ret = (uint8x16_t) __builtin_neon_vhsubq_v((int8x16_t)__p0, (int8x16_t)__p1, 48);
---
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_vhsubq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 48));
11415,11418c11434,11437
< uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x16_t) __builtin_neon_vhsubq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 48);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_vhsubq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 48));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
11426c11445
< __ret = (uint32x4_t) __builtin_neon_vhsubq_v((int8x16_t)__p0, (int8x16_t)__p1, 50);
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vhsubq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 50));
11432,11435c11451,11454
< uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint32x4_t) __builtin_neon_vhsubq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 50);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vhsubq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 50));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
11443c11462
< __ret = (uint16x8_t) __builtin_neon_vhsubq_v((int8x16_t)__p0, (int8x16_t)__p1, 49);
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vhsubq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 49));
11449,11452c11468,11471
< uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint16x8_t) __builtin_neon_vhsubq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 49);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vhsubq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 49));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
11460c11479
< __ret = (int8x16_t) __builtin_neon_vhsubq_v((int8x16_t)__p0, (int8x16_t)__p1, 32);
---
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vhsubq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 32));
11466,11469c11485,11488
< int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int8x16_t) __builtin_neon_vhsubq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 32);
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x16_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_8);
> int8x16_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_8);
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vhsubq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 32));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8);
11477c11496
< __ret = (int32x4_t) __builtin_neon_vhsubq_v((int8x16_t)__p0, (int8x16_t)__p1, 34);
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vhsubq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 34));
11483,11486c11502,11505
< int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (int32x4_t) __builtin_neon_vhsubq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 34);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int32x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_32);
> int32x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_32);
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vhsubq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 34));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32);
11494c11513
< __ret = (int16x8_t) __builtin_neon_vhsubq_v((int8x16_t)__p0, (int8x16_t)__p1, 33);
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vhsubq_v(__builtin_bit_cast(int8x16_t, __p0), __builtin_bit_cast(int8x16_t, __p1), 33));
11500,11503c11519,11522
< int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int16x8_t) __builtin_neon_vhsubq_v((int8x16_t)__rev0, (int8x16_t)__rev1, 33);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int16x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_128_16);
> int16x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_128_16);
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vhsubq_v(__builtin_bit_cast(int8x16_t, __rev0), __builtin_bit_cast(int8x16_t, __rev1), 33));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16);
11511c11530
< __ret = (uint8x8_t) __builtin_neon_vhsub_v((int8x8_t)__p0, (int8x8_t)__p1, 16);
---
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vhsub_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 16));
11517,11520c11536,11539
< uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (uint8x8_t) __builtin_neon_vhsub_v((int8x8_t)__rev0, (int8x8_t)__rev1, 16);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> uint8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vhsub_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 16));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
11528c11547
< __ret = (uint32x2_t) __builtin_neon_vhsub_v((int8x8_t)__p0, (int8x8_t)__p1, 18);
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vhsub_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 18));
11534,11537c11553,11556
< uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (uint32x2_t) __builtin_neon_vhsub_v((int8x8_t)__rev0, (int8x8_t)__rev1, 18);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> uint32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vhsub_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 18));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
11545c11564
< __ret = (uint16x4_t) __builtin_neon_vhsub_v((int8x8_t)__p0, (int8x8_t)__p1, 17);
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vhsub_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 17));
11551,11554c11570,11573
< uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (uint16x4_t) __builtin_neon_vhsub_v((int8x8_t)__rev0, (int8x8_t)__rev1, 17);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> uint16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vhsub_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 17));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
11562c11581
< __ret = (int8x8_t) __builtin_neon_vhsub_v((int8x8_t)__p0, (int8x8_t)__p1, 0);
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vhsub_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 0));
11568,11571c11587,11590
< int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
< int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
< __ret = (int8x8_t) __builtin_neon_vhsub_v((int8x8_t)__rev0, (int8x8_t)__rev1, 0);
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
---
> int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
> int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vhsub_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 0));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
11579c11598
< __ret = (int32x2_t) __builtin_neon_vhsub_v((int8x8_t)__p0, (int8x8_t)__p1, 2);
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vhsub_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 2));
11585,11588c11604,11607
< int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 1, 0);
< int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 1, 0);
< __ret = (int32x2_t) __builtin_neon_vhsub_v((int8x8_t)__rev0, (int8x8_t)__rev1, 2);
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0);
---
> int32x2_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_32);
> int32x2_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_32);
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vhsub_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 2));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32);
11596c11615
< __ret = (int16x4_t) __builtin_neon_vhsub_v((int8x8_t)__p0, (int8x8_t)__p1, 1);
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vhsub_v(__builtin_bit_cast(int8x8_t, __p0), __builtin_bit_cast(int8x8_t, __p1), 1));
11602,11605c11621,11624
< int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 3, 2, 1, 0);
< int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 3, 2, 1, 0);
< __ret = (int16x4_t) __builtin_neon_vhsub_v((int8x8_t)__rev0, (int8x8_t)__rev1, 1);
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0);
---
> int16x4_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_16);
> int16x4_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_16);
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vhsub_v(__builtin_bit_cast(int8x8_t, __rev0), __builtin_bit_cast(int8x8_t, __rev1), 1));
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16);
11613c11632
< __ret = (poly8x8_t) __builtin_neon_vld1_v(__p0, 4); \
---
> __ret = __builtin_bit_cast(poly8x8_t, __builtin_neon_vld1_v(__p0, 4)); \
11619,11620c11638,11639
< __ret = (poly8x8_t) __builtin_neon_vld1_v(__p0, 4); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(poly8x8_t, __builtin_neon_vld1_v(__p0, 4)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8); \
11628c11647
< __ret = (poly16x4_t) __builtin_neon_vld1_v(__p0, 5); \
---
> __ret = __builtin_bit_cast(poly16x4_t, __builtin_neon_vld1_v(__p0, 5)); \
11634,11635c11653,11654
< __ret = (poly16x4_t) __builtin_neon_vld1_v(__p0, 5); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(poly16x4_t, __builtin_neon_vld1_v(__p0, 5)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
11643c11662
< __ret = (poly8x16_t) __builtin_neon_vld1q_v(__p0, 36); \
---
> __ret = __builtin_bit_cast(poly8x16_t, __builtin_neon_vld1q_v(__p0, 36)); \
11649,11650c11668,11669
< __ret = (poly8x16_t) __builtin_neon_vld1q_v(__p0, 36); \
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(poly8x16_t, __builtin_neon_vld1q_v(__p0, 36)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8); \
11658c11677
< __ret = (poly16x8_t) __builtin_neon_vld1q_v(__p0, 37); \
---
> __ret = __builtin_bit_cast(poly16x8_t, __builtin_neon_vld1q_v(__p0, 37)); \
11664,11665c11683,11684
< __ret = (poly16x8_t) __builtin_neon_vld1q_v(__p0, 37); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(poly16x8_t, __builtin_neon_vld1q_v(__p0, 37)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
11673c11692
< __ret = (uint8x16_t) __builtin_neon_vld1q_v(__p0, 48); \
---
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_vld1q_v(__p0, 48)); \
11679,11680c11698,11699
< __ret = (uint8x16_t) __builtin_neon_vld1q_v(__p0, 48); \
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_vld1q_v(__p0, 48)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8); \
11688c11707
< __ret = (uint32x4_t) __builtin_neon_vld1q_v(__p0, 50); \
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vld1q_v(__p0, 50)); \
11694,11695c11713,11714
< __ret = (uint32x4_t) __builtin_neon_vld1q_v(__p0, 50); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vld1q_v(__p0, 50)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
11703c11722
< __ret = (uint64x2_t) __builtin_neon_vld1q_v(__p0, 51); \
---
> __ret = __builtin_bit_cast(uint64x2_t, __builtin_neon_vld1q_v(__p0, 51)); \
11709,11710c11728,11729
< __ret = (uint64x2_t) __builtin_neon_vld1q_v(__p0, 51); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> __ret = __builtin_bit_cast(uint64x2_t, __builtin_neon_vld1q_v(__p0, 51)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64); \
11718c11737
< __ret = (uint16x8_t) __builtin_neon_vld1q_v(__p0, 49); \
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vld1q_v(__p0, 49)); \
11724,11725c11743,11744
< __ret = (uint16x8_t) __builtin_neon_vld1q_v(__p0, 49); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vld1q_v(__p0, 49)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
11733c11752
< __ret = (int8x16_t) __builtin_neon_vld1q_v(__p0, 32); \
---
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vld1q_v(__p0, 32)); \
11739,11740c11758,11759
< __ret = (int8x16_t) __builtin_neon_vld1q_v(__p0, 32); \
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vld1q_v(__p0, 32)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8); \
11748c11767
< __ret = (float32x4_t) __builtin_neon_vld1q_v(__p0, 41); \
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vld1q_v(__p0, 41)); \
11754,11755c11773,11774
< __ret = (float32x4_t) __builtin_neon_vld1q_v(__p0, 41); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vld1q_v(__p0, 41)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
11763c11782
< __ret = (int32x4_t) __builtin_neon_vld1q_v(__p0, 34); \
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vld1q_v(__p0, 34)); \
11769,11770c11788,11789
< __ret = (int32x4_t) __builtin_neon_vld1q_v(__p0, 34); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vld1q_v(__p0, 34)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
11778c11797
< __ret = (int64x2_t) __builtin_neon_vld1q_v(__p0, 35); \
---
> __ret = __builtin_bit_cast(int64x2_t, __builtin_neon_vld1q_v(__p0, 35)); \
11784,11785c11803,11804
< __ret = (int64x2_t) __builtin_neon_vld1q_v(__p0, 35); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> __ret = __builtin_bit_cast(int64x2_t, __builtin_neon_vld1q_v(__p0, 35)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64); \
11793c11812
< __ret = (int16x8_t) __builtin_neon_vld1q_v(__p0, 33); \
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vld1q_v(__p0, 33)); \
11799,11800c11818,11819
< __ret = (int16x8_t) __builtin_neon_vld1q_v(__p0, 33); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vld1q_v(__p0, 33)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
11808c11827
< __ret = (uint8x8_t) __builtin_neon_vld1_v(__p0, 16); \
---
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vld1_v(__p0, 16)); \
11814,11815c11833,11834
< __ret = (uint8x8_t) __builtin_neon_vld1_v(__p0, 16); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vld1_v(__p0, 16)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8); \
11823c11842
< __ret = (uint32x2_t) __builtin_neon_vld1_v(__p0, 18); \
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vld1_v(__p0, 18)); \
11829,11830c11848,11849
< __ret = (uint32x2_t) __builtin_neon_vld1_v(__p0, 18); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vld1_v(__p0, 18)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
11837c11856
< __ret = (uint64x1_t) __builtin_neon_vld1_v(__p0, 19); \
---
> __ret = __builtin_bit_cast(uint64x1_t, __builtin_neon_vld1_v(__p0, 19)); \
11843c11862
< __ret = (uint16x4_t) __builtin_neon_vld1_v(__p0, 17); \
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vld1_v(__p0, 17)); \
11849,11850c11868,11869
< __ret = (uint16x4_t) __builtin_neon_vld1_v(__p0, 17); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vld1_v(__p0, 17)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
11858c11877
< __ret = (int8x8_t) __builtin_neon_vld1_v(__p0, 0); \
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vld1_v(__p0, 0)); \
11864,11865c11883,11884
< __ret = (int8x8_t) __builtin_neon_vld1_v(__p0, 0); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vld1_v(__p0, 0)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8); \
11873c11892
< __ret = (float32x2_t) __builtin_neon_vld1_v(__p0, 9); \
---
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vld1_v(__p0, 9)); \
11879,11880c11898,11899
< __ret = (float32x2_t) __builtin_neon_vld1_v(__p0, 9); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vld1_v(__p0, 9)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
11888c11907
< __ret = (int32x2_t) __builtin_neon_vld1_v(__p0, 2); \
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vld1_v(__p0, 2)); \
11894,11895c11913,11914
< __ret = (int32x2_t) __builtin_neon_vld1_v(__p0, 2); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vld1_v(__p0, 2)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
11902c11921
< __ret = (int64x1_t) __builtin_neon_vld1_v(__p0, 3); \
---
> __ret = __builtin_bit_cast(int64x1_t, __builtin_neon_vld1_v(__p0, 3)); \
11908c11927
< __ret = (int16x4_t) __builtin_neon_vld1_v(__p0, 1); \
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vld1_v(__p0, 1)); \
11914,11915c11933,11934
< __ret = (int16x4_t) __builtin_neon_vld1_v(__p0, 1); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vld1_v(__p0, 1)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
11923c11942
< __ret = (poly8x8_t) __builtin_neon_vld1_dup_v(__p0, 4); \
---
> __ret = __builtin_bit_cast(poly8x8_t, __builtin_neon_vld1_dup_v(__p0, 4)); \
11929,11930c11948,11949
< __ret = (poly8x8_t) __builtin_neon_vld1_dup_v(__p0, 4); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(poly8x8_t, __builtin_neon_vld1_dup_v(__p0, 4)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8); \
11938c11957
< __ret = (poly16x4_t) __builtin_neon_vld1_dup_v(__p0, 5); \
---
> __ret = __builtin_bit_cast(poly16x4_t, __builtin_neon_vld1_dup_v(__p0, 5)); \
11944,11945c11963,11964
< __ret = (poly16x4_t) __builtin_neon_vld1_dup_v(__p0, 5); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(poly16x4_t, __builtin_neon_vld1_dup_v(__p0, 5)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
11953c11972
< __ret = (poly8x16_t) __builtin_neon_vld1q_dup_v(__p0, 36); \
---
> __ret = __builtin_bit_cast(poly8x16_t, __builtin_neon_vld1q_dup_v(__p0, 36)); \
11959,11960c11978,11979
< __ret = (poly8x16_t) __builtin_neon_vld1q_dup_v(__p0, 36); \
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(poly8x16_t, __builtin_neon_vld1q_dup_v(__p0, 36)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8); \
11968c11987
< __ret = (poly16x8_t) __builtin_neon_vld1q_dup_v(__p0, 37); \
---
> __ret = __builtin_bit_cast(poly16x8_t, __builtin_neon_vld1q_dup_v(__p0, 37)); \
11974,11975c11993,11994
< __ret = (poly16x8_t) __builtin_neon_vld1q_dup_v(__p0, 37); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(poly16x8_t, __builtin_neon_vld1q_dup_v(__p0, 37)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
11983c12002
< __ret = (uint8x16_t) __builtin_neon_vld1q_dup_v(__p0, 48); \
---
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_vld1q_dup_v(__p0, 48)); \
11989,11990c12008,12009
< __ret = (uint8x16_t) __builtin_neon_vld1q_dup_v(__p0, 48); \
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_vld1q_dup_v(__p0, 48)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8); \
11998c12017
< __ret = (uint32x4_t) __builtin_neon_vld1q_dup_v(__p0, 50); \
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vld1q_dup_v(__p0, 50)); \
12004,12005c12023,12024
< __ret = (uint32x4_t) __builtin_neon_vld1q_dup_v(__p0, 50); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vld1q_dup_v(__p0, 50)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
12013c12032
< __ret = (uint64x2_t) __builtin_neon_vld1q_dup_v(__p0, 51); \
---
> __ret = __builtin_bit_cast(uint64x2_t, __builtin_neon_vld1q_dup_v(__p0, 51)); \
12019,12020c12038,12039
< __ret = (uint64x2_t) __builtin_neon_vld1q_dup_v(__p0, 51); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> __ret = __builtin_bit_cast(uint64x2_t, __builtin_neon_vld1q_dup_v(__p0, 51)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64); \
12028c12047
< __ret = (uint16x8_t) __builtin_neon_vld1q_dup_v(__p0, 49); \
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vld1q_dup_v(__p0, 49)); \
12034,12035c12053,12054
< __ret = (uint16x8_t) __builtin_neon_vld1q_dup_v(__p0, 49); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vld1q_dup_v(__p0, 49)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
12043c12062
< __ret = (int8x16_t) __builtin_neon_vld1q_dup_v(__p0, 32); \
---
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vld1q_dup_v(__p0, 32)); \
12049,12050c12068,12069
< __ret = (int8x16_t) __builtin_neon_vld1q_dup_v(__p0, 32); \
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vld1q_dup_v(__p0, 32)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8); \
12058c12077
< __ret = (float32x4_t) __builtin_neon_vld1q_dup_v(__p0, 41); \
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vld1q_dup_v(__p0, 41)); \
12064,12065c12083,12084
< __ret = (float32x4_t) __builtin_neon_vld1q_dup_v(__p0, 41); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vld1q_dup_v(__p0, 41)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
12073c12092
< __ret = (int32x4_t) __builtin_neon_vld1q_dup_v(__p0, 34); \
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vld1q_dup_v(__p0, 34)); \
12079,12080c12098,12099
< __ret = (int32x4_t) __builtin_neon_vld1q_dup_v(__p0, 34); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vld1q_dup_v(__p0, 34)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
12088c12107
< __ret = (int64x2_t) __builtin_neon_vld1q_dup_v(__p0, 35); \
---
> __ret = __builtin_bit_cast(int64x2_t, __builtin_neon_vld1q_dup_v(__p0, 35)); \
12094,12095c12113,12114
< __ret = (int64x2_t) __builtin_neon_vld1q_dup_v(__p0, 35); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> __ret = __builtin_bit_cast(int64x2_t, __builtin_neon_vld1q_dup_v(__p0, 35)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64); \
12103c12122
< __ret = (int16x8_t) __builtin_neon_vld1q_dup_v(__p0, 33); \
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vld1q_dup_v(__p0, 33)); \
12109,12110c12128,12129
< __ret = (int16x8_t) __builtin_neon_vld1q_dup_v(__p0, 33); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vld1q_dup_v(__p0, 33)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
12118c12137
< __ret = (uint8x8_t) __builtin_neon_vld1_dup_v(__p0, 16); \
---
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vld1_dup_v(__p0, 16)); \
12124,12125c12143,12144
< __ret = (uint8x8_t) __builtin_neon_vld1_dup_v(__p0, 16); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vld1_dup_v(__p0, 16)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8); \
12133c12152
< __ret = (uint32x2_t) __builtin_neon_vld1_dup_v(__p0, 18); \
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vld1_dup_v(__p0, 18)); \
12139,12140c12158,12159
< __ret = (uint32x2_t) __builtin_neon_vld1_dup_v(__p0, 18); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vld1_dup_v(__p0, 18)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
12147c12166
< __ret = (uint64x1_t) __builtin_neon_vld1_dup_v(__p0, 19); \
---
> __ret = __builtin_bit_cast(uint64x1_t, __builtin_neon_vld1_dup_v(__p0, 19)); \
12153c12172
< __ret = (uint16x4_t) __builtin_neon_vld1_dup_v(__p0, 17); \
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vld1_dup_v(__p0, 17)); \
12159,12160c12178,12179
< __ret = (uint16x4_t) __builtin_neon_vld1_dup_v(__p0, 17); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vld1_dup_v(__p0, 17)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
12168c12187
< __ret = (int8x8_t) __builtin_neon_vld1_dup_v(__p0, 0); \
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vld1_dup_v(__p0, 0)); \
12174,12175c12193,12194
< __ret = (int8x8_t) __builtin_neon_vld1_dup_v(__p0, 0); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vld1_dup_v(__p0, 0)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8); \
12183c12202
< __ret = (float32x2_t) __builtin_neon_vld1_dup_v(__p0, 9); \
---
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vld1_dup_v(__p0, 9)); \
12189,12190c12208,12209
< __ret = (float32x2_t) __builtin_neon_vld1_dup_v(__p0, 9); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vld1_dup_v(__p0, 9)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
12198c12217
< __ret = (int32x2_t) __builtin_neon_vld1_dup_v(__p0, 2); \
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vld1_dup_v(__p0, 2)); \
12204,12205c12223,12224
< __ret = (int32x2_t) __builtin_neon_vld1_dup_v(__p0, 2); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vld1_dup_v(__p0, 2)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
12212c12231
< __ret = (int64x1_t) __builtin_neon_vld1_dup_v(__p0, 3); \
---
> __ret = __builtin_bit_cast(int64x1_t, __builtin_neon_vld1_dup_v(__p0, 3)); \
12218c12237
< __ret = (int16x4_t) __builtin_neon_vld1_dup_v(__p0, 1); \
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vld1_dup_v(__p0, 1)); \
12224,12225c12243,12244
< __ret = (int16x4_t) __builtin_neon_vld1_dup_v(__p0, 1); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vld1_dup_v(__p0, 1)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
12234c12253
< __ret = (poly8x8_t) __builtin_neon_vld1_lane_v(__p0, (int8x8_t)__s1, __p2, 4); \
---
> __ret = __builtin_bit_cast(poly8x8_t, __builtin_neon_vld1_lane_v(__p0, __builtin_bit_cast(int8x8_t, __s1), __p2, 4)); \
12241,12243c12260,12262
< poly8x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (poly8x8_t) __builtin_neon_vld1_lane_v(__p0, (int8x8_t)__rev1, __p2, 4); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> poly8x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_8); \
> __ret = __builtin_bit_cast(poly8x8_t, __builtin_neon_vld1_lane_v(__p0, __builtin_bit_cast(int8x8_t, __rev1), __p2, 4)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8); \
12252c12271
< __ret = (poly16x4_t) __builtin_neon_vld1_lane_v(__p0, (int8x8_t)__s1, __p2, 5); \
---
> __ret = __builtin_bit_cast(poly16x4_t, __builtin_neon_vld1_lane_v(__p0, __builtin_bit_cast(int8x8_t, __s1), __p2, 5)); \
12259,12261c12278,12280
< poly16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 3, 2, 1, 0); \
< __ret = (poly16x4_t) __builtin_neon_vld1_lane_v(__p0, (int8x8_t)__rev1, __p2, 5); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> poly16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(poly16x4_t, __builtin_neon_vld1_lane_v(__p0, __builtin_bit_cast(int8x8_t, __rev1), __p2, 5)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
12270c12289
< __ret = (poly8x16_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__s1, __p2, 36); \
---
> __ret = __builtin_bit_cast(poly8x16_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __s1), __p2, 36)); \
12277,12279c12296,12298
< poly8x16_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (poly8x16_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__rev1, __p2, 36); \
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> poly8x16_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_8); \
> __ret = __builtin_bit_cast(poly8x16_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __rev1), __p2, 36)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8); \
12288c12307
< __ret = (poly16x8_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__s1, __p2, 37); \
---
> __ret = __builtin_bit_cast(poly16x8_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __s1), __p2, 37)); \
12295,12297c12314,12316
< poly16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (poly16x8_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__rev1, __p2, 37); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> poly16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(poly16x8_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __rev1), __p2, 37)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
12306c12325
< __ret = (uint8x16_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__s1, __p2, 48); \
---
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __s1), __p2, 48)); \
12313,12315c12332,12334
< uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (uint8x16_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__rev1, __p2, 48); \
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> uint8x16_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_8); \
> __ret = __builtin_bit_cast(uint8x16_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __rev1), __p2, 48)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8); \
12324c12343
< __ret = (uint32x4_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__s1, __p2, 50); \
---
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __s1), __p2, 50)); \
12331,12333c12350,12352
< uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 3, 2, 1, 0); \
< __ret = (uint32x4_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__rev1, __p2, 50); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> uint32x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_32); \
> __ret = __builtin_bit_cast(uint32x4_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __rev1), __p2, 50)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
12342c12361
< __ret = (uint64x2_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__s1, __p2, 51); \
---
> __ret = __builtin_bit_cast(uint64x2_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __s1), __p2, 51)); \
12349,12351c12368,12370
< uint64x2_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 1, 0); \
< __ret = (uint64x2_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__rev1, __p2, 51); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> uint64x2_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_64); \
> __ret = __builtin_bit_cast(uint64x2_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __rev1), __p2, 51)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64); \
12360c12379
< __ret = (uint16x8_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__s1, __p2, 49); \
---
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __s1), __p2, 49)); \
12367,12369c12386,12388
< uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (uint16x8_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__rev1, __p2, 49); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> uint16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(uint16x8_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __rev1), __p2, 49)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
12378c12397
< __ret = (int8x16_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__s1, __p2, 32); \
---
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __s1), __p2, 32)); \
12385,12387c12404,12406
< int8x16_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (int8x16_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__rev1, __p2, 32); \
< __ret = __builtin_shufflevector(__ret, __ret, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> int8x16_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_8); \
> __ret = __builtin_bit_cast(int8x16_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __rev1), __p2, 32)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_8); \
12396c12415
< __ret = (float32x4_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__s1, __p2, 41); \
---
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __s1), __p2, 41)); \
12403,12405c12422,12424
< float32x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 3, 2, 1, 0); \
< __ret = (float32x4_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__rev1, __p2, 41); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> float32x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_32); \
> __ret = __builtin_bit_cast(float32x4_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __rev1), __p2, 41)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
12414c12433
< __ret = (int32x4_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__s1, __p2, 34); \
---
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __s1), __p2, 34)); \
12421,12423c12440,12442
< int32x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 3, 2, 1, 0); \
< __ret = (int32x4_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__rev1, __p2, 34); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> int32x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_32); \
> __ret = __builtin_bit_cast(int32x4_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __rev1), __p2, 34)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_32); \
12432c12451
< __ret = (int64x2_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__s1, __p2, 35); \
---
> __ret = __builtin_bit_cast(int64x2_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __s1), __p2, 35)); \
12439,12441c12458,12460
< int64x2_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 1, 0); \
< __ret = (int64x2_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__rev1, __p2, 35); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> int64x2_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_64); \
> __ret = __builtin_bit_cast(int64x2_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __rev1), __p2, 35)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_64); \
12450c12469
< __ret = (int16x8_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__s1, __p2, 33); \
---
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __s1), __p2, 33)); \
12457,12459c12476,12478
< int16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (int16x8_t) __builtin_neon_vld1q_lane_v(__p0, (int8x16_t)__rev1, __p2, 33); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> int16x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_128_16); \
> __ret = __builtin_bit_cast(int16x8_t, __builtin_neon_vld1q_lane_v(__p0, __builtin_bit_cast(int8x16_t, __rev1), __p2, 33)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_128_16); \
12468c12487
< __ret = (uint8x8_t) __builtin_neon_vld1_lane_v(__p0, (int8x8_t)__s1, __p2, 16); \
---
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vld1_lane_v(__p0, __builtin_bit_cast(int8x8_t, __s1), __p2, 16)); \
12475,12477c12494,12496
< uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (uint8x8_t) __builtin_neon_vld1_lane_v(__p0, (int8x8_t)__rev1, __p2, 16); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> uint8x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_8); \
> __ret = __builtin_bit_cast(uint8x8_t, __builtin_neon_vld1_lane_v(__p0, __builtin_bit_cast(int8x8_t, __rev1), __p2, 16)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8); \
12486c12505
< __ret = (uint32x2_t) __builtin_neon_vld1_lane_v(__p0, (int8x8_t)__s1, __p2, 18); \
---
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vld1_lane_v(__p0, __builtin_bit_cast(int8x8_t, __s1), __p2, 18)); \
12493,12495c12512,12514
< uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 1, 0); \
< __ret = (uint32x2_t) __builtin_neon_vld1_lane_v(__p0, (int8x8_t)__rev1, __p2, 18); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> uint32x2_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_32); \
> __ret = __builtin_bit_cast(uint32x2_t, __builtin_neon_vld1_lane_v(__p0, __builtin_bit_cast(int8x8_t, __rev1), __p2, 18)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
12503c12522
< __ret = (uint64x1_t) __builtin_neon_vld1_lane_v(__p0, (int8x8_t)__s1, __p2, 19); \
---
> __ret = __builtin_bit_cast(uint64x1_t, __builtin_neon_vld1_lane_v(__p0, __builtin_bit_cast(int8x8_t, __s1), __p2, 19)); \
12510c12529
< __ret = (uint16x4_t) __builtin_neon_vld1_lane_v(__p0, (int8x8_t)__s1, __p2, 17); \
---
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vld1_lane_v(__p0, __builtin_bit_cast(int8x8_t, __s1), __p2, 17)); \
12517,12519c12536,12538
< uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 3, 2, 1, 0); \
< __ret = (uint16x4_t) __builtin_neon_vld1_lane_v(__p0, (int8x8_t)__rev1, __p2, 17); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> uint16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(uint16x4_t, __builtin_neon_vld1_lane_v(__p0, __builtin_bit_cast(int8x8_t, __rev1), __p2, 17)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
12528c12547
< __ret = (int8x8_t) __builtin_neon_vld1_lane_v(__p0, (int8x8_t)__s1, __p2, 0); \
---
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vld1_lane_v(__p0, __builtin_bit_cast(int8x8_t, __s1), __p2, 0)); \
12535,12537c12554,12556
< int8x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret = (int8x8_t) __builtin_neon_vld1_lane_v(__p0, (int8x8_t)__rev1, __p2, 0); \
< __ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> int8x8_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_8); \
> __ret = __builtin_bit_cast(int8x8_t, __builtin_neon_vld1_lane_v(__p0, __builtin_bit_cast(int8x8_t, __rev1), __p2, 0)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8); \
12546c12565
< __ret = (float32x2_t) __builtin_neon_vld1_lane_v(__p0, (int8x8_t)__s1, __p2, 9); \
---
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vld1_lane_v(__p0, __builtin_bit_cast(int8x8_t, __s1), __p2, 9)); \
12553,12555c12572,12574
< float32x2_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 1, 0); \
< __ret = (float32x2_t) __builtin_neon_vld1_lane_v(__p0, (int8x8_t)__rev1, __p2, 9); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> float32x2_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_32); \
> __ret = __builtin_bit_cast(float32x2_t, __builtin_neon_vld1_lane_v(__p0, __builtin_bit_cast(int8x8_t, __rev1), __p2, 9)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
12564c12583
< __ret = (int32x2_t) __builtin_neon_vld1_lane_v(__p0, (int8x8_t)__s1, __p2, 2); \
---
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vld1_lane_v(__p0, __builtin_bit_cast(int8x8_t, __s1), __p2, 2)); \
12571,12573c12590,12592
< int32x2_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 1, 0); \
< __ret = (int32x2_t) __builtin_neon_vld1_lane_v(__p0, (int8x8_t)__rev1, __p2, 2); \
< __ret = __builtin_shufflevector(__ret, __ret, 1, 0); \
---
> int32x2_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_32); \
> __ret = __builtin_bit_cast(int32x2_t, __builtin_neon_vld1_lane_v(__p0, __builtin_bit_cast(int8x8_t, __rev1), __p2, 2)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_32); \
12581c12600
< __ret = (int64x1_t) __builtin_neon_vld1_lane_v(__p0, (int8x8_t)__s1, __p2, 3); \
---
> __ret = __builtin_bit_cast(int64x1_t, __builtin_neon_vld1_lane_v(__p0, __builtin_bit_cast(int8x8_t, __s1), __p2, 3)); \
12588c12607
< __ret = (int16x4_t) __builtin_neon_vld1_lane_v(__p0, (int8x8_t)__s1, __p2, 1); \
---
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vld1_lane_v(__p0, __builtin_bit_cast(int8x8_t, __s1), __p2, 1)); \
12595,12597c12614,12616
< int16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, 3, 2, 1, 0); \
< __ret = (int16x4_t) __builtin_neon_vld1_lane_v(__p0, (int8x8_t)__rev1, __p2, 1); \
< __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
---
> int16x4_t __rev1; __rev1 = __builtin_shufflevector(__s1, __s1, __lane_reverse_64_16); \
> __ret = __builtin_bit_cast(int16x4_t, __builtin_neon_vld1_lane_v(__p0, __builtin_bit_cast(int8x8_t, __rev1), __p2, 1)); \
> __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_16); \
12613,12614c12632,12633
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
12630,12631c12649,12650
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
12647,12648c12666,12667
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
12664,12665c12683,12684
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
12681,12682c12700,12701
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
12698,12699c12717,12718
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
12715,12716c12734,12735
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_64); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_64); \
12732,12733c12751,12752
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
12749,12750c12768,12769
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
12766,12767c12785,12786
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
12783,12784c12802,12803
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
12800,12801c12819,12820
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_64); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_64); \
12817,12818c12836,12837
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
12834,12835c12853,12854
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
12851,12852c12870,12871
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
12873,12874c12892,12893
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
12890,12891c12909,12910
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
12907,12908c12926,12927
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
12924,12925c12943,12944
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
12946,12947c12965,12966
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
12963,12965c12982,12984
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
12981,12983c13000,13002
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
12999,13001c13018,13020
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_8); \
13017,13019c13036,13038
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
13035,13037c13054,13056
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_8); \
13053,13055c13072,13074
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
13071,13073c13090,13092
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_64); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_64); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_64); \
13089,13091c13108,13110
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
13107,13109c13126,13128
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_8); \
13125,13127c13144,13146
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
13143,13145c13162,13164
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
13161,13163c13180,13182
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_64); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_64); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_64); \
13179,13181c13198,13200
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
13197,13199c13216,13218
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
13215,13217c13234,13236
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_32); \
13238,13240c13257,13259
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
13256,13258c13275,13277
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
13274,13276c13293,13295
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_32); \
13292,13294c13311,13313
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_32); \
13315,13317c13334,13336
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
13333,13336c13352,13355
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_8); \
13352,13355c13371,13374
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_16); \
13371,13374c13390,13393
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_8); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_8); \
13390,13393c13409,13412
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_16); \
13409,13412c13428,13431
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_8); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_8); \
13428,13431c13447,13450
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_32); \
13447,13450c13466,13469
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_64); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_64); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_64); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_64); \
13466,13469c13485,13488
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_16); \
13485,13488c13504,13507
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_8); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_8); \
13504,13507c13523,13526
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_32); \
13523,13526c13542,13545
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_32); \
13542,13545c13561,13564
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_64); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_64); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_64); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_64); \
13561,13564c13580,13583
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_16); \
13580,13583c13599,13602
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_8); \
13599,13602c13618,13621
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_32); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_32); \
13623,13626c13642,13645
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_16); \
13642,13645c13661,13664
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_8); \
13661,13664c13680,13683
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_32); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_32); \
13680,13683c13699,13702
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_32); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_32); \
13704,13707c13723,13726
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_16); \
13723,13724c13742,13743
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
13740,13741c13759,13760
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
13757,13758c13776,13777
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
13774,13775c13793,13794
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
13791,13792c13810,13811
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
13808,13809c13827,13828
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
13825,13826c13844,13845
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
13842,13843c13861,13862
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
13859,13860c13878,13879
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
13876,13877c13895,13896
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
13893,13894c13912,13913
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
13910,13911c13929,13930
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
13927,13928c13946,13947
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
13949,13950c13968,13969
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
13966,13967c13985,13986
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
13983,13984c14002,14003
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
14000,14001c14019,14020
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
14022,14023c14041,14042
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
14039,14040c14058,14059
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
14056,14057c14075,14076
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
14073,14074c14092,14093
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
14090,14091c14109,14110
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
14107,14108c14126,14127
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
14124,14125c14143,14144
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
14141,14142c14160,14161
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_64); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_64); \
14158,14159c14177,14178
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
14175,14176c14194,14195
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
14192,14193c14211,14212
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
14209,14210c14228,14229
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
14226,14227c14245,14246
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_64); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_64); \
14243,14244c14262,14263
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
14260,14261c14279,14280
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
14277,14278c14296,14297
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
14299,14300c14318,14319
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
14316,14317c14335,14336
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
14333,14334c14352,14353
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
14350,14351c14369,14370
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
14372,14373c14391,14392
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
14382c14401
< __builtin_neon_vld2_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], __p2, 4); \
---
> __builtin_neon_vld2_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __p2, 4); \
14390,14392c14409,14411
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vld2_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], __p2, 4); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_8); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_8); \
> __builtin_neon_vld2_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __p2, 4); \
14394,14395c14413,14414
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
14404c14423
< __builtin_neon_vld2_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], __p2, 5); \
---
> __builtin_neon_vld2_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __p2, 5); \
14412,14414c14431,14433
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __builtin_neon_vld2_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], __p2, 5); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_16); \
> __builtin_neon_vld2_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __p2, 5); \
14416,14417c14435,14436
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
14426c14445
< __builtin_neon_vld2q_lane_v(&__ret, __p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], __p2, 37); \
---
> __builtin_neon_vld2q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __p2, 37); \
14434,14436c14453,14455
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vld2q_lane_v(&__ret, __p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], __p2, 37); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_16); \
> __builtin_neon_vld2q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __p2, 37); \
14438,14439c14457,14458
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
14448c14467
< __builtin_neon_vld2q_lane_v(&__ret, __p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], __p2, 50); \
---
> __builtin_neon_vld2q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __p2, 50); \
14456,14458c14475,14477
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __builtin_neon_vld2q_lane_v(&__ret, __p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], __p2, 50); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_32); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_32); \
> __builtin_neon_vld2q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __p2, 50); \
14460,14461c14479,14480
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
14470c14489
< __builtin_neon_vld2q_lane_v(&__ret, __p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], __p2, 49); \
---
> __builtin_neon_vld2q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __p2, 49); \
14478,14480c14497,14499
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vld2q_lane_v(&__ret, __p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], __p2, 49); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_16); \
> __builtin_neon_vld2q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __p2, 49); \
14482,14483c14501,14502
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
14492c14511
< __builtin_neon_vld2q_lane_v(&__ret, __p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], __p2, 41); \
---
> __builtin_neon_vld2q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __p2, 41); \
14500,14502c14519,14521
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __builtin_neon_vld2q_lane_v(&__ret, __p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], __p2, 41); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_32); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_32); \
> __builtin_neon_vld2q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __p2, 41); \
14504,14505c14523,14524
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
14514c14533
< __builtin_neon_vld2q_lane_v(&__ret, __p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], __p2, 34); \
---
> __builtin_neon_vld2q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __p2, 34); \
14522,14524c14541,14543
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __builtin_neon_vld2q_lane_v(&__ret, __p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], __p2, 34); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_32); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_32); \
> __builtin_neon_vld2q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __p2, 34); \
14526,14527c14545,14546
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
14536c14555
< __builtin_neon_vld2q_lane_v(&__ret, __p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], __p2, 33); \
---
> __builtin_neon_vld2q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __p2, 33); \
14544,14546c14563,14565
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vld2q_lane_v(&__ret, __p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], __p2, 33); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_16); \
> __builtin_neon_vld2q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __p2, 33); \
14548,14549c14567,14568
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
14558c14577
< __builtin_neon_vld2_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], __p2, 16); \
---
> __builtin_neon_vld2_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __p2, 16); \
14566,14568c14585,14587
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vld2_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], __p2, 16); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_8); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_8); \
> __builtin_neon_vld2_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __p2, 16); \
14570,14571c14589,14590
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
14580c14599
< __builtin_neon_vld2_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], __p2, 18); \
---
> __builtin_neon_vld2_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __p2, 18); \
14588,14590c14607,14609
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 1, 0); \
< __builtin_neon_vld2_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], __p2, 18); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_32); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_32); \
> __builtin_neon_vld2_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __p2, 18); \
14592,14593c14611,14612
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
14602c14621
< __builtin_neon_vld2_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], __p2, 17); \
---
> __builtin_neon_vld2_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __p2, 17); \
14610,14612c14629,14631
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __builtin_neon_vld2_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], __p2, 17); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_16); \
> __builtin_neon_vld2_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __p2, 17); \
14614,14615c14633,14634
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
14624c14643
< __builtin_neon_vld2_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], __p2, 0); \
---
> __builtin_neon_vld2_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __p2, 0); \
14632,14634c14651,14653
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vld2_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], __p2, 0); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_8); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_8); \
> __builtin_neon_vld2_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __p2, 0); \
14636,14637c14655,14656
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
14646c14665
< __builtin_neon_vld2_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], __p2, 9); \
---
> __builtin_neon_vld2_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __p2, 9); \
14654,14656c14673,14675
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 1, 0); \
< __builtin_neon_vld2_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], __p2, 9); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_32); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_32); \
> __builtin_neon_vld2_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __p2, 9); \
14658,14659c14677,14678
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
14668c14687
< __builtin_neon_vld2_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], __p2, 2); \
---
> __builtin_neon_vld2_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __p2, 2); \
14676,14678c14695,14697
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 1, 0); \
< __builtin_neon_vld2_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], __p2, 2); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_32); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_32); \
> __builtin_neon_vld2_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __p2, 2); \
14680,14681c14699,14700
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
14690c14709
< __builtin_neon_vld2_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], __p2, 1); \
---
> __builtin_neon_vld2_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __p2, 1); \
14698,14700c14717,14719
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __builtin_neon_vld2_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], __p2, 1); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_16); \
> __builtin_neon_vld2_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __p2, 1); \
14702,14703c14721,14722
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
14719,14721c14738,14740
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
14737,14739c14756,14758
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
14755,14757c14774,14776
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_8); \
14773,14775c14792,14794
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
14791,14793c14810,14812
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_8); \
14809,14811c14828,14830
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
14827,14829c14846,14848
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
14845,14847c14864,14866
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_8); \
14863,14865c14882,14884
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
14881,14883c14900,14902
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
14899,14901c14918,14920
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
14917,14919c14936,14938
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
14935,14937c14954,14956
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_32); \
14958,14960c14977,14979
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
14976,14978c14995,14997
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
14994,14996c15013,15015
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_32); \
15012,15014c15031,15033
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_32); \
15035,15037c15054,15056
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
15053,15055c15072,15074
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
15071,15073c15090,15092
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
15089,15091c15108,15110
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_8); \
15107,15109c15126,15128
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
15125,15127c15144,15146
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_8); \
15143,15145c15162,15164
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
15161,15163c15180,15182
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_64); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_64); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_64); \
15179,15181c15198,15200
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
15197,15199c15216,15218
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_8); \
15215,15217c15234,15236
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
15233,15235c15252,15254
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
15251,15253c15270,15272
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_64); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_64); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_64); \
15269,15271c15288,15290
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
15287,15289c15306,15308
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
15305,15307c15324,15326
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_32); \
15328,15330c15347,15349
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
15346,15348c15365,15367
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
15364,15366c15383,15385
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_32); \
15382,15384c15401,15403
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_32); \
15405,15407c15424,15426
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
15416c15435
< __builtin_neon_vld3_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], (int8x8_t)__s1.val[2], __p2, 4); \
---
> __builtin_neon_vld3_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __builtin_bit_cast(int8x8_t, __s1.val[2]), __p2, 4); \
15424,15427c15443,15446
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vld3_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], (int8x8_t)__rev1.val[2], __p2, 4); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_8); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_8); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_64_8); \
> __builtin_neon_vld3_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __builtin_bit_cast(int8x8_t, __rev1.val[2]), __p2, 4); \
15429,15431c15448,15450
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
15440c15459
< __builtin_neon_vld3_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], (int8x8_t)__s1.val[2], __p2, 5); \
---
> __builtin_neon_vld3_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __builtin_bit_cast(int8x8_t, __s1.val[2]), __p2, 5); \
15448,15451c15467,15470
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 3, 2, 1, 0); \
< __builtin_neon_vld3_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], (int8x8_t)__rev1.val[2], __p2, 5); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_64_16); \
> __builtin_neon_vld3_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __builtin_bit_cast(int8x8_t, __rev1.val[2]), __p2, 5); \
15453,15455c15472,15474
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
15464c15483
< __builtin_neon_vld3q_lane_v(&__ret, __p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], (int8x16_t)__s1.val[2], __p2, 37); \
---
> __builtin_neon_vld3q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __builtin_bit_cast(int8x16_t, __s1.val[2]), __p2, 37); \
15472,15475c15491,15494
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vld3q_lane_v(&__ret, __p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], (int8x16_t)__rev1.val[2], __p2, 37); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_128_16); \
> __builtin_neon_vld3q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __builtin_bit_cast(int8x16_t, __rev1.val[2]), __p2, 37); \
15477,15479c15496,15498
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
15488c15507
< __builtin_neon_vld3q_lane_v(&__ret, __p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], (int8x16_t)__s1.val[2], __p2, 50); \
---
> __builtin_neon_vld3q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __builtin_bit_cast(int8x16_t, __s1.val[2]), __p2, 50); \
15496,15499c15515,15518
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 3, 2, 1, 0); \
< __builtin_neon_vld3q_lane_v(&__ret, __p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], (int8x16_t)__rev1.val[2], __p2, 50); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_32); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_32); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_128_32); \
> __builtin_neon_vld3q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __builtin_bit_cast(int8x16_t, __rev1.val[2]), __p2, 50); \
15501,15503c15520,15522
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
15512c15531
< __builtin_neon_vld3q_lane_v(&__ret, __p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], (int8x16_t)__s1.val[2], __p2, 49); \
---
> __builtin_neon_vld3q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __builtin_bit_cast(int8x16_t, __s1.val[2]), __p2, 49); \
15520,15523c15539,15542
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vld3q_lane_v(&__ret, __p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], (int8x16_t)__rev1.val[2], __p2, 49); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_128_16); \
> __builtin_neon_vld3q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __builtin_bit_cast(int8x16_t, __rev1.val[2]), __p2, 49); \
15525,15527c15544,15546
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
15536c15555
< __builtin_neon_vld3q_lane_v(&__ret, __p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], (int8x16_t)__s1.val[2], __p2, 41); \
---
> __builtin_neon_vld3q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __builtin_bit_cast(int8x16_t, __s1.val[2]), __p2, 41); \
15544,15547c15563,15566
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 3, 2, 1, 0); \
< __builtin_neon_vld3q_lane_v(&__ret, __p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], (int8x16_t)__rev1.val[2], __p2, 41); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_32); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_32); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_128_32); \
> __builtin_neon_vld3q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __builtin_bit_cast(int8x16_t, __rev1.val[2]), __p2, 41); \
15549,15551c15568,15570
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
15560c15579
< __builtin_neon_vld3q_lane_v(&__ret, __p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], (int8x16_t)__s1.val[2], __p2, 34); \
---
> __builtin_neon_vld3q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __builtin_bit_cast(int8x16_t, __s1.val[2]), __p2, 34); \
15568,15571c15587,15590
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 3, 2, 1, 0); \
< __builtin_neon_vld3q_lane_v(&__ret, __p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], (int8x16_t)__rev1.val[2], __p2, 34); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_32); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_32); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_128_32); \
> __builtin_neon_vld3q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __builtin_bit_cast(int8x16_t, __rev1.val[2]), __p2, 34); \
15573,15575c15592,15594
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
15584c15603
< __builtin_neon_vld3q_lane_v(&__ret, __p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], (int8x16_t)__s1.val[2], __p2, 33); \
---
> __builtin_neon_vld3q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __builtin_bit_cast(int8x16_t, __s1.val[2]), __p2, 33); \
15592,15595c15611,15614
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vld3q_lane_v(&__ret, __p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], (int8x16_t)__rev1.val[2], __p2, 33); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_128_16); \
> __builtin_neon_vld3q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __builtin_bit_cast(int8x16_t, __rev1.val[2]), __p2, 33); \
15597,15599c15616,15618
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
15608c15627
< __builtin_neon_vld3_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], (int8x8_t)__s1.val[2], __p2, 16); \
---
> __builtin_neon_vld3_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __builtin_bit_cast(int8x8_t, __s1.val[2]), __p2, 16); \
15616,15619c15635,15638
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vld3_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], (int8x8_t)__rev1.val[2], __p2, 16); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_8); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_8); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_64_8); \
> __builtin_neon_vld3_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __builtin_bit_cast(int8x8_t, __rev1.val[2]), __p2, 16); \
15621,15623c15640,15642
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
15632c15651
< __builtin_neon_vld3_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], (int8x8_t)__s1.val[2], __p2, 18); \
---
> __builtin_neon_vld3_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __builtin_bit_cast(int8x8_t, __s1.val[2]), __p2, 18); \
15640,15643c15659,15662
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 1, 0); \
< __builtin_neon_vld3_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], (int8x8_t)__rev1.val[2], __p2, 18); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_32); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_32); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_64_32); \
> __builtin_neon_vld3_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __builtin_bit_cast(int8x8_t, __rev1.val[2]), __p2, 18); \
15645,15647c15664,15666
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_32); \
15656c15675
< __builtin_neon_vld3_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], (int8x8_t)__s1.val[2], __p2, 17); \
---
> __builtin_neon_vld3_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __builtin_bit_cast(int8x8_t, __s1.val[2]), __p2, 17); \
15664,15667c15683,15686
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 3, 2, 1, 0); \
< __builtin_neon_vld3_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], (int8x8_t)__rev1.val[2], __p2, 17); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_64_16); \
> __builtin_neon_vld3_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __builtin_bit_cast(int8x8_t, __rev1.val[2]), __p2, 17); \
15669,15671c15688,15690
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
15680c15699
< __builtin_neon_vld3_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], (int8x8_t)__s1.val[2], __p2, 0); \
---
> __builtin_neon_vld3_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __builtin_bit_cast(int8x8_t, __s1.val[2]), __p2, 0); \
15688,15691c15707,15710
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vld3_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], (int8x8_t)__rev1.val[2], __p2, 0); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_8); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_8); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_64_8); \
> __builtin_neon_vld3_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __builtin_bit_cast(int8x8_t, __rev1.val[2]), __p2, 0); \
15693,15695c15712,15714
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
15704c15723
< __builtin_neon_vld3_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], (int8x8_t)__s1.val[2], __p2, 9); \
---
> __builtin_neon_vld3_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __builtin_bit_cast(int8x8_t, __s1.val[2]), __p2, 9); \
15712,15715c15731,15734
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 1, 0); \
< __builtin_neon_vld3_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], (int8x8_t)__rev1.val[2], __p2, 9); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_32); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_32); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_64_32); \
> __builtin_neon_vld3_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __builtin_bit_cast(int8x8_t, __rev1.val[2]), __p2, 9); \
15717,15719c15736,15738
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_32); \
15728c15747
< __builtin_neon_vld3_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], (int8x8_t)__s1.val[2], __p2, 2); \
---
> __builtin_neon_vld3_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __builtin_bit_cast(int8x8_t, __s1.val[2]), __p2, 2); \
15736,15739c15755,15758
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 1, 0); \
< __builtin_neon_vld3_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], (int8x8_t)__rev1.val[2], __p2, 2); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_32); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_32); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_64_32); \
> __builtin_neon_vld3_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __builtin_bit_cast(int8x8_t, __rev1.val[2]), __p2, 2); \
15741,15743c15760,15762
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_32); \
15752c15771
< __builtin_neon_vld3_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], (int8x8_t)__s1.val[2], __p2, 1); \
---
> __builtin_neon_vld3_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __builtin_bit_cast(int8x8_t, __s1.val[2]), __p2, 1); \
15760,15763c15779,15782
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 3, 2, 1, 0); \
< __builtin_neon_vld3_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], (int8x8_t)__rev1.val[2], __p2, 1); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_64_16); \
> __builtin_neon_vld3_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __builtin_bit_cast(int8x8_t, __rev1.val[2]), __p2, 1); \
15765,15767c15784,15786
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
15783,15786c15802,15805
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_8); \
15802,15805c15821,15824
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_16); \
15821,15824c15840,15843
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_8); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_8); \
15840,15843c15859,15862
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_16); \
15859,15862c15878,15881
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_8); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_8); \
15878,15881c15897,15900
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_32); \
15897,15900c15916,15919
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_16); \
15916,15919c15935,15938
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_8); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_8); \
15935,15938c15954,15957
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_32); \
15954,15957c15973,15976
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_32); \
15973,15976c15992,15995
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_16); \
15992,15995c16011,16014
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_8); \
16011,16014c16030,16033
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_32); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_32); \
16035,16038c16054,16057
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_16); \
16054,16057c16073,16076
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_8); \
16073,16076c16092,16095
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_32); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_32); \
16092,16095c16111,16114
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_32); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_32); \
16116,16119c16135,16138
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_16); \
16135,16138c16154,16157
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_8); \
16154,16157c16173,16176
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_16); \
16173,16176c16192,16195
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_8); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_8); \
16192,16195c16211,16214
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_16); \
16211,16214c16230,16233
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_8); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_8); \
16230,16233c16249,16252
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_32); \
16249,16252c16268,16271
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_64); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_64); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_64); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_64); \
16268,16271c16287,16290
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_16); \
16287,16290c16306,16309
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_8); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_8); \
16306,16309c16325,16328
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_32); \
16325,16328c16344,16347
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_32); \
16344,16347c16363,16366
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_64); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_64); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_64); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_64); \
16363,16366c16382,16385
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_16); \
16382,16385c16401,16404
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_8); \
16401,16404c16420,16423
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_32); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_32); \
16425,16428c16444,16447
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_16); \
16444,16447c16463,16466
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_8); \
16463,16466c16482,16485
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_32); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_32); \
16482,16485c16501,16504
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_32); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_32); \
16506,16509c16525,16528
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_16); \
16518c16537
< __builtin_neon_vld4_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], (int8x8_t)__s1.val[2], (int8x8_t)__s1.val[3], __p2, 4); \
---
> __builtin_neon_vld4_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __builtin_bit_cast(int8x8_t, __s1.val[2]), __builtin_bit_cast(int8x8_t, __s1.val[3]), __p2, 4); \
16526,16530c16545,16549
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vld4_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], (int8x8_t)__rev1.val[2], (int8x8_t)__rev1.val[3], __p2, 4); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_8); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_8); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_64_8); \
> __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], __lane_reverse_64_8); \
> __builtin_neon_vld4_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __builtin_bit_cast(int8x8_t, __rev1.val[2]), __builtin_bit_cast(int8x8_t, __rev1.val[3]), __p2, 4); \
16532,16535c16551,16554
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_8); \
16544c16563
< __builtin_neon_vld4_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], (int8x8_t)__s1.val[2], (int8x8_t)__s1.val[3], __p2, 5); \
---
> __builtin_neon_vld4_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __builtin_bit_cast(int8x8_t, __s1.val[2]), __builtin_bit_cast(int8x8_t, __s1.val[3]), __p2, 5); \
16552,16556c16571,16575
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 3, 2, 1, 0); \
< __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], 3, 2, 1, 0); \
< __builtin_neon_vld4_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], (int8x8_t)__rev1.val[2], (int8x8_t)__rev1.val[3], __p2, 5); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_64_16); \
> __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], __lane_reverse_64_16); \
> __builtin_neon_vld4_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __builtin_bit_cast(int8x8_t, __rev1.val[2]), __builtin_bit_cast(int8x8_t, __rev1.val[3]), __p2, 5); \
16558,16561c16577,16580
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_16); \
16570c16589
< __builtin_neon_vld4q_lane_v(&__ret, __p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], (int8x16_t)__s1.val[2], (int8x16_t)__s1.val[3], __p2, 37); \
---
> __builtin_neon_vld4q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __builtin_bit_cast(int8x16_t, __s1.val[2]), __builtin_bit_cast(int8x16_t, __s1.val[3]), __p2, 37); \
16578,16582c16597,16601
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vld4q_lane_v(&__ret, __p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], (int8x16_t)__rev1.val[2], (int8x16_t)__rev1.val[3], __p2, 37); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_128_16); \
> __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], __lane_reverse_128_16); \
> __builtin_neon_vld4q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __builtin_bit_cast(int8x16_t, __rev1.val[2]), __builtin_bit_cast(int8x16_t, __rev1.val[3]), __p2, 37); \
16584,16587c16603,16606
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_16); \
16596c16615
< __builtin_neon_vld4q_lane_v(&__ret, __p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], (int8x16_t)__s1.val[2], (int8x16_t)__s1.val[3], __p2, 50); \
---
> __builtin_neon_vld4q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __builtin_bit_cast(int8x16_t, __s1.val[2]), __builtin_bit_cast(int8x16_t, __s1.val[3]), __p2, 50); \
16604,16608c16623,16627
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 3, 2, 1, 0); \
< __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], 3, 2, 1, 0); \
< __builtin_neon_vld4q_lane_v(&__ret, __p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], (int8x16_t)__rev1.val[2], (int8x16_t)__rev1.val[3], __p2, 50); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_32); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_32); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_128_32); \
> __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], __lane_reverse_128_32); \
> __builtin_neon_vld4q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __builtin_bit_cast(int8x16_t, __rev1.val[2]), __builtin_bit_cast(int8x16_t, __rev1.val[3]), __p2, 50); \
16610,16613c16629,16632
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_32); \
16622c16641
< __builtin_neon_vld4q_lane_v(&__ret, __p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], (int8x16_t)__s1.val[2], (int8x16_t)__s1.val[3], __p2, 49); \
---
> __builtin_neon_vld4q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __builtin_bit_cast(int8x16_t, __s1.val[2]), __builtin_bit_cast(int8x16_t, __s1.val[3]), __p2, 49); \
16630,16634c16649,16653
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vld4q_lane_v(&__ret, __p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], (int8x16_t)__rev1.val[2], (int8x16_t)__rev1.val[3], __p2, 49); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_128_16); \
> __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], __lane_reverse_128_16); \
> __builtin_neon_vld4q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __builtin_bit_cast(int8x16_t, __rev1.val[2]), __builtin_bit_cast(int8x16_t, __rev1.val[3]), __p2, 49); \
16636,16639c16655,16658
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_16); \
16648c16667
< __builtin_neon_vld4q_lane_v(&__ret, __p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], (int8x16_t)__s1.val[2], (int8x16_t)__s1.val[3], __p2, 41); \
---
> __builtin_neon_vld4q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __builtin_bit_cast(int8x16_t, __s1.val[2]), __builtin_bit_cast(int8x16_t, __s1.val[3]), __p2, 41); \
16656,16660c16675,16679
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 3, 2, 1, 0); \
< __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], 3, 2, 1, 0); \
< __builtin_neon_vld4q_lane_v(&__ret, __p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], (int8x16_t)__rev1.val[2], (int8x16_t)__rev1.val[3], __p2, 41); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_32); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_32); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_128_32); \
> __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], __lane_reverse_128_32); \
> __builtin_neon_vld4q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __builtin_bit_cast(int8x16_t, __rev1.val[2]), __builtin_bit_cast(int8x16_t, __rev1.val[3]), __p2, 41); \
16662,16665c16681,16684
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_32); \
16674c16693
< __builtin_neon_vld4q_lane_v(&__ret, __p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], (int8x16_t)__s1.val[2], (int8x16_t)__s1.val[3], __p2, 34); \
---
> __builtin_neon_vld4q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __builtin_bit_cast(int8x16_t, __s1.val[2]), __builtin_bit_cast(int8x16_t, __s1.val[3]), __p2, 34); \
16682,16686c16701,16705
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 3, 2, 1, 0); \
< __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], 3, 2, 1, 0); \
< __builtin_neon_vld4q_lane_v(&__ret, __p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], (int8x16_t)__rev1.val[2], (int8x16_t)__rev1.val[3], __p2, 34); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_32); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_32); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_128_32); \
> __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], __lane_reverse_128_32); \
> __builtin_neon_vld4q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __builtin_bit_cast(int8x16_t, __rev1.val[2]), __builtin_bit_cast(int8x16_t, __rev1.val[3]), __p2, 34); \
16688,16691c16707,16710
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_32); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_32); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_32); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_32); \
16700c16719
< __builtin_neon_vld4q_lane_v(&__ret, __p0, (int8x16_t)__s1.val[0], (int8x16_t)__s1.val[1], (int8x16_t)__s1.val[2], (int8x16_t)__s1.val[3], __p2, 33); \
---
> __builtin_neon_vld4q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __s1.val[0]), __builtin_bit_cast(int8x16_t, __s1.val[1]), __builtin_bit_cast(int8x16_t, __s1.val[2]), __builtin_bit_cast(int8x16_t, __s1.val[3]), __p2, 33); \
16708,16712c16727,16731
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vld4q_lane_v(&__ret, __p0, (int8x16_t)__rev1.val[0], (int8x16_t)__rev1.val[1], (int8x16_t)__rev1.val[2], (int8x16_t)__rev1.val[3], __p2, 33); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_128_16); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_128_16); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_128_16); \
> __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], __lane_reverse_128_16); \
> __builtin_neon_vld4q_lane_v(&__ret, __p0, __builtin_bit_cast(int8x16_t, __rev1.val[0]), __builtin_bit_cast(int8x16_t, __rev1.val[1]), __builtin_bit_cast(int8x16_t, __rev1.val[2]), __builtin_bit_cast(int8x16_t, __rev1.val[3]), __p2, 33); \
16714,16717c16733,16736
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_128_16); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_128_16); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_128_16); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_128_16); \
16726c16745
< __builtin_neon_vld4_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], (int8x8_t)__s1.val[2], (int8x8_t)__s1.val[3], __p2, 16); \
---
> __builtin_neon_vld4_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __builtin_bit_cast(int8x8_t, __s1.val[2]), __builtin_bit_cast(int8x8_t, __s1.val[3]), __p2, 16); \
16734,16738c16753,16757
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
< __builtin_neon_vld4_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], (int8x8_t)__rev1.val[2], (int8x8_t)__rev1.val[3], __p2, 16); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_8); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_8); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_64_8); \
> __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], __lane_reverse_64_8); \
> __builtin_neon_vld4_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __builtin_bit_cast(int8x8_t, __rev1.val[1]), __builtin_bit_cast(int8x8_t, __rev1.val[2]), __builtin_bit_cast(int8x8_t, __rev1.val[3]), __p2, 16); \
16740,16743c16759,16762
< __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], 7, 6, 5, 4, 3, 2, 1, 0); \
< __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], 7, 6, 5, 4, 3, 2, 1, 0); \
---
> __ret.val[0] = __builtin_shufflevector(__ret.val[0], __ret.val[0], __lane_reverse_64_8); \
> __ret.val[1] = __builtin_shufflevector(__ret.val[1], __ret.val[1], __lane_reverse_64_8); \
> __ret.val[2] = __builtin_shufflevector(__ret.val[2], __ret.val[2], __lane_reverse_64_8); \
> __ret.val[3] = __builtin_shufflevector(__ret.val[3], __ret.val[3], __lane_reverse_64_8); \
16752c16771
< __builtin_neon_vld4_lane_v(&__ret, __p0, (int8x8_t)__s1.val[0], (int8x8_t)__s1.val[1], (int8x8_t)__s1.val[2], (int8x8_t)__s1.val[3], __p2, 18); \
---
> __builtin_neon_vld4_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __s1.val[0]), __builtin_bit_cast(int8x8_t, __s1.val[1]), __builtin_bit_cast(int8x8_t, __s1.val[2]), __builtin_bit_cast(int8x8_t, __s1.val[3]), __p2, 18); \
16760,16764c16779,16783
< __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], 1, 0); \
< __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], 1, 0); \
< __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], 1, 0); \
< __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], 1, 0); \
< __builtin_neon_vld4_lane_v(&__ret, __p0, (int8x8_t)__rev1.val[0], (int8x8_t)__rev1.val[1], (int8x8_t)__rev1.val[2], (int8x8_t)__rev1.val[3], __p2, 18); \
---
> __rev1.val[0] = __builtin_shufflevector(__s1.val[0], __s1.val[0], __lane_reverse_64_32); \
> __rev1.val[1] = __builtin_shufflevector(__s1.val[1], __s1.val[1], __lane_reverse_64_32); \
> __rev1.val[2] = __builtin_shufflevector(__s1.val[2], __s1.val[2], __lane_reverse_64_32); \
> __rev1.val[3] = __builtin_shufflevector(__s1.val[3], __s1.val[3], __lane_reverse_64_32); \
> __builtin_neon_vld4_lane_v(&__ret, __p0, __builtin_bit_cast(int8x8_t, __rev1.val[0]), __