Merge branch 'xmrig:master' into nodonate

2023-09-19 00:41:43 +08:00 · 3fd311a202
commit 3fd311a202
parent 6bb0028476 2e77faa80c
548 changed files with 108694 additions and 22872 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,193 @@
+# v6.20.0
+- Added new ARM CPU names.
+- [#2394](https://github.com/xmrig/xmrig/pull/2394) Added new CMake options `ARM_V8` and `ARM_V7`.
+- [#2830](https://github.com/xmrig/xmrig/pull/2830) Added API rebind polling.
+- [#2927](https://github.com/xmrig/xmrig/pull/2927) Fixed compatibility with hwloc 1.11.x.
+- [#3060](https://github.com/xmrig/xmrig/pull/3060) Added x86 to `README.md`.
+- [#3236](https://github.com/xmrig/xmrig/pull/3236) Fixed: receive CUDA loader error on Linux too.
+- [#3290](https://github.com/xmrig/xmrig/pull/3290) Added [Zephyr](https://www.zephyrprotocol.com/) coin support for solo mining.
+
+# v6.19.3
+- [#3245](https://github.com/xmrig/xmrig/issues/3245) Improved algorithm negotiation for donation rounds by sending extra information about current mining job.
+- [#3254](https://github.com/xmrig/xmrig/pull/3254) Tweaked auto-tuning for Intel CPUs.
+- [#3271](https://github.com/xmrig/xmrig/pull/3271) RandomX: optimized program generation.
+- [#3273](https://github.com/xmrig/xmrig/pull/3273) RandomX: fixed undefined behavior.
+- [#3275](https://github.com/xmrig/xmrig/pull/3275) RandomX: fixed `jccErratum` list.
+- [#3280](https://github.com/xmrig/xmrig/pull/3280) Updated example scripts.
+
+# v6.19.2
+- [#3230](https://github.com/xmrig/xmrig/pull/3230) Fixed parsing of `TX_EXTRA_MERGE_MINING_TAG`.
+- [#3232](https://github.com/xmrig/xmrig/pull/3232) Added new `X-Hash-Difficulty` HTTP header.
+- [#3240](https://github.com/xmrig/xmrig/pull/3240) Improved .cmd files when run by shortcuts on another drive.
+- [#3241](https://github.com/xmrig/xmrig/pull/3241) Added view tag calculation (fixes Wownero solo mining issue).
+
+# v6.19.1
+- Resolved deprecated methods warnings with OpenSSL 3.0.
+- [#3213](https://github.com/xmrig/xmrig/pull/3213) Fixed build with 32-bit clang 15.
+- [#3218](https://github.com/xmrig/xmrig/pull/3218) Fixed: `--randomx-wrmsr=-1` worked only on Intel.
+- [#3228](https://github.com/xmrig/xmrig/pull/3228) Fixed build with gcc 13.
+
+# v6.19.0
+- [#3144](https://github.com/xmrig/xmrig/pull/3144) Update to latest `sse2neon.h`.
+- [#3161](https://github.com/xmrig/xmrig/pull/3161) MSVC build: enabled parallel compilation.
+- [#3163](https://github.com/xmrig/xmrig/pull/3163) Improved Zen 3 MSR mod.
+- [#3176](https://github.com/xmrig/xmrig/pull/3176) Update cmake required version to 3.1.
+- [#3182](https://github.com/xmrig/xmrig/pull/3182) DragonflyBSD compilation fixes.
+- [#3196](https://github.com/xmrig/xmrig/pull/3196) Show IP address for failed connections.
+- [#3185](https://github.com/xmrig/xmrig/issues/3185) Fixed macOS DMI reader.
+- [#3198](https://github.com/xmrig/xmrig/pull/3198) Fixed broken RandomX light mode mining.
+- [#3202](https://github.com/xmrig/xmrig/pull/3202) Solo mining: added job timeout (default is 15 seconds).
+
+# v6.18.1
+- [#3129](https://github.com/xmrig/xmrig/pull/3129) Fix: protectRX flushed CPU cache only on MacOS/iOS.
+- [#3126](https://github.com/xmrig/xmrig/pull/3126) Don't reset when pool sends the same job blob.
+- [#3120](https://github.com/xmrig/xmrig/pull/3120) RandomX: optimized `CFROUND` elimination.
+- [#3109](https://github.com/xmrig/xmrig/pull/3109) RandomX: added Blake2 AVX2 version.
+- [#3082](https://github.com/xmrig/xmrig/pull/3082) Fixed GCC 12 warnings.
+- [#3075](https://github.com/xmrig/xmrig/pull/3075) Recognize `armv7ve` as valid ARMv7 target.
+- [#3132](https://github.com/xmrig/xmrig/pull/3132) RandomX: added MSR mod for Zen 4.
+- [#3134](https://github.com/xmrig/xmrig/pull/3134) Added Zen4 to `randomx_boost.sh`.
+
+# v6.18.0
+- [#3067](https://github.com/xmrig/xmrig/pull/3067) Monero v15 network upgrade support and more house keeping.
+  - Removed deprecated AstroBWTv1 and v2.
+  - Fixed debug GhostRider build.
+  - Monero v15 network upgrade support.
+  - Fixed ZMQ debug log.
+  - Improved daemon ZMQ mining stability.
+- [#3054](https://github.com/xmrig/xmrig/pull/3054) Fixes for 32-bit ARM.
+- [#3042](https://github.com/xmrig/xmrig/pull/3042) Fixed being unable to resume from `pause-on-battery`.
+- [#3031](https://github.com/xmrig/xmrig/pull/3031) Fixed `--cpu-priority` not working sometimes.
+- [#3020](https://github.com/xmrig/xmrig/pull/3020) Removed old AstroBWT algorithm.
+
+# v6.17.0
+- [#2954](https://github.com/xmrig/xmrig/pull/2954) **Dero HE fork support (`astrobwt/v2` algorithm).**
+  - [#2961](https://github.com/xmrig/xmrig/pull/2961) Dero HE (`astrobwt/v2`) CUDA config generator.
+  - [#2969](https://github.com/xmrig/xmrig/pull/2969) Dero HE (`astrobwt/v2`) OpenCL support.
+- Fixed displayed DMI memory information for empty slots.
+- [#2932](https://github.com/xmrig/xmrig/pull/2932) Fixed GhostRider with hwloc disabled.
+
+# v6.16.4
+- [#2904](https://github.com/xmrig/xmrig/pull/2904) Fixed unaligned memory accesses.
+- [#2908](https://github.com/xmrig/xmrig/pull/2908) Added MSVC/2022 to `version.h`.
+- [#2910](https://github.com/xmrig/xmrig/issues/2910) Fixed donation for GhostRider/RTM.
+
+# v6.16.3
+- [#2778](https://github.com/xmrig/xmrig/pull/2778) Fixed `READY threads X/X` display after algorithm switching.
+- [#2782](https://github.com/xmrig/xmrig/pull/2782) Updated GhostRider documentation.
+- [#2815](https://github.com/xmrig/xmrig/pull/2815) Fixed `cn-heavy` in 32-bit builds.
+- [#2827](https://github.com/xmrig/xmrig/pull/2827) GhostRider: set correct priority for helper threads.
+- [#2837](https://github.com/xmrig/xmrig/pull/2837) RandomX: don't restart mining threads when the seed changes.
+- [#2848](https://github.com/xmrig/xmrig/pull/2848) GhostRider: added support for `client.reconnect` method.
+- [#2856](https://github.com/xmrig/xmrig/pull/2856) Fix for short responses from some Raptoreum pools.
+- [#2873](https://github.com/xmrig/xmrig/pull/2873) Fixed GhostRider benchmark on single-core systems.
+- [#2882](https://github.com/xmrig/xmrig/pull/2882) Fixed ARMv7 compilation.
+- [#2893](https://github.com/xmrig/xmrig/pull/2893) KawPow OpenCL: use separate UV loop for building programs.
+
+# v6.16.2
+- [#2751](https://github.com/xmrig/xmrig/pull/2751) Fixed crash on CPUs supporting VAES and running GCC-compiled xmrig.
+- [#2761](https://github.com/xmrig/xmrig/pull/2761) Fixed broken auto-tuning in GCC Windows build.
+- [#2771](https://github.com/xmrig/xmrig/issues/2771) Fixed environment variables support for GhostRider and KawPow. 
+- [#2769](https://github.com/xmrig/xmrig/pull/2769) Performance fixes:
+  - Fixed several performance bottlenecks introduced in v6.16.1.
+  - Fixed overall GCC-compiled build performance, it's the same speed as MSVC build now.
+  - **Linux builds are up to 10% faster now compared to v6.16.0 GCC build.**
+  - **Windows builds are up to 5% faster now compared to v6.16.0 MSVC build.**
+
+# v6.16.1
+- [#2729](https://github.com/xmrig/xmrig/pull/2729) GhostRider fixes:
+  - Added average hashrate display.
+  - Fixed the number of threads shown at startup.
+  - Fixed `--threads` or `-t` command line option (but `--cpu-max-threads-hint` is recommended to use).
+- [#2738](https://github.com/xmrig/xmrig/pull/2738) GhostRider fixes:
+  - Fixed "difficulty is not a number" error when diff is high on some pools.
+  - Fixed GhostRider compilation when `WITH_KAWPOW=OFF`.
+- [#2740](https://github.com/xmrig/xmrig/pull/2740) Added VAES support for Cryptonight variants **+4% speedup on Zen3**.
+  - VAES instructions are available on Intel Ice Lake/AMD Zen3 and newer CPUs.
+  - +4% speedup on Ryzen 5 5600X.
+
+# v6.16.0
+- [#2712](https://github.com/xmrig/xmrig/pull/2712) **GhostRider algorithm (Raptoreum) support**: read the [RELEASE NOTES](src/crypto/ghostrider/README.md) for quick start guide and performance comparisons.
+- [#2682](https://github.com/xmrig/xmrig/pull/2682) Fixed: use cn-heavy optimization only for Vermeer CPUs.
+- [#2684](https://github.com/xmrig/xmrig/pull/2684) MSR mod: fix for error 183.
+
+# v6.15.3
+- [#2614](https://github.com/xmrig/xmrig/pull/2614) OpenCL fixes for non-AMD platforms.
+- [#2623](https://github.com/xmrig/xmrig/pull/2623) Fixed compiling without kawpow.
+- [#2636](https://github.com/xmrig/xmrig/pull/2636) [#2639](https://github.com/xmrig/xmrig/pull/2639) AstroBWT speedup (up to +35%).
+- [#2646](https://github.com/xmrig/xmrig/pull/2646) Fixed MSVC compilation error.
+
+# v6.15.2
+- [#2606](https://github.com/xmrig/xmrig/pull/2606) Fixed: AstroBWT auto-config ignored `max-threads-hint`.
+- Fixed possible crash on Windows (regression in v6.15.1).
+
+# v6.15.1
+- [#2586](https://github.com/xmrig/xmrig/pull/2586) Fixed Windows 7 compatibility.
+- [#2594](https://github.com/xmrig/xmrig/pull/2594) Added Windows taskbar icon colors.
+
+# v6.15.0
+- [#2548](https://github.com/xmrig/xmrig/pull/2548) Added automatic coin detection for daemon mining.
+- [#2563](https://github.com/xmrig/xmrig/pull/2563) Added new algorithm RandomX Graft (`rx/graft`).
+- [#2565](https://github.com/xmrig/xmrig/pull/2565) AstroBWT: added AVX2 Salsa20 implementation.
+- Added support for new CUDA plugin API (previous API still supported).
+
+# v6.14.1
+- [#2532](https://github.com/xmrig/xmrig/pull/2532) Refactoring: stable (persistent) algorithms IDs.
+- [#2537](https://github.com/xmrig/xmrig/pull/2537) Fixed Termux build.
+
+# v6.14.0
+- [#2484](https://github.com/xmrig/xmrig/pull/2484) Added ZeroMQ support for solo mining.
+- [#2476](https://github.com/xmrig/xmrig/issues/2476) Fixed crash in DMI memory reader.
+- [#2492](https://github.com/xmrig/xmrig/issues/2492) Added missing `--huge-pages-jit` command line option.
+- [#2512](https://github.com/xmrig/xmrig/pull/2512) Added show the number of transactions in pool job.
+
+# v6.13.1
+- [#2468](https://github.com/xmrig/xmrig/pull/2468) Fixed regression in previous version: don't send miner signature during regular mining.
+
+# v6.13.0
+- [#2445](https://github.com/xmrig/xmrig/pull/2445) Added support for solo mining with miner signatures for the upcoming Wownero fork.
+
+# v6.12.2
+- [#2280](https://github.com/xmrig/xmrig/issues/2280) GPU backends are now disabled in benchmark mode.
+- [#2322](https://github.com/xmrig/xmrig/pull/2322) Improved MSR compatibility with recent Linux kernels and updated `randomx_boost.sh`.
+- [#2340](https://github.com/xmrig/xmrig/pull/2340) Fixed AES detection on FreeBSD on ARM.
+- [#2341](https://github.com/xmrig/xmrig/pull/2341) `sse2neon` updated to the latest version.
+- [#2351](https://github.com/xmrig/xmrig/issues/2351) Fixed help output for `--cpu-priority` and `--cpu-affinity` option.
+- [#2375](https://github.com/xmrig/xmrig/pull/2375) Fixed macOS CUDA backend default loader name.
+- [#2378](https://github.com/xmrig/xmrig/pull/2378) Fixed broken light mode mining on x86.
+- [#2379](https://github.com/xmrig/xmrig/pull/2379) Fixed CL code for KawPow where it assumes everything is AMD.
+- [#2386](https://github.com/xmrig/xmrig/pull/2386) RandomX: enabled `IMUL_RCP` optimization for light mode mining.
+- [#2393](https://github.com/xmrig/xmrig/pull/2393) RandomX: added BMI2 version for scratchpad prefetch.
+- [#2395](https://github.com/xmrig/xmrig/pull/2395) RandomX: rewrote dataset read code.
+- [#2398](https://github.com/xmrig/xmrig/pull/2398) RandomX: optimized ARMv8 dataset read.
+- Added `argon2/ninja` alias for `argon2/wrkz` algorithm.
+
+# v6.12.1
+- [#2296](https://github.com/xmrig/xmrig/pull/2296) Fixed Zen3 assembly code for `cn/upx2` algorithm.
+
+# v6.12.0
+- [#2276](https://github.com/xmrig/xmrig/pull/2276) Added support for Uplexa (`cn/upx2` algorithm).
+- [#2261](https://github.com/xmrig/xmrig/pull/2261) Show total hashrate if compiled without OpenCL.
+- [#2289](https://github.com/xmrig/xmrig/pull/2289) RandomX: optimized `IMUL_RCP` instruction.
+- Added support for `--user` command line option for online benchmark.
+
+# v6.11.2
+- [#2207](https://github.com/xmrig/xmrig/issues/2207) Fixed regression in HTTP parser and llhttp updated to v5.1.0.
+
+# v6.11.1
+- [#2239](https://github.com/xmrig/xmrig/pull/2239) Fixed broken `coin` setting functionality.
+
+# v6.11.0
+- [#2196](https://github.com/xmrig/xmrig/pull/2196) Improved DNS subsystem and added new DNS specific options.
+- [#2172](https://github.com/xmrig/xmrig/pull/2172) Fixed build on Alpine 3.13.
+- [#2177](https://github.com/xmrig/xmrig/pull/2177) Fixed ARM specific compilation error with GCC 10.2.
+- [#2214](https://github.com/xmrig/xmrig/pull/2214) [#2216](https://github.com/xmrig/xmrig/pull/2216) [#2235](https://github.com/xmrig/xmrig/pull/2235) Optimized `cn-heavy` algorithm.
+- [#2217](https://github.com/xmrig/xmrig/pull/2217) Fixed mining job creation sequence.
+- [#2225](https://github.com/xmrig/xmrig/pull/2225) Fixed build without OpenCL support on some systems.
+- [#2229](https://github.com/xmrig/xmrig/pull/2229) Don't use RandomX JIT if `WITH_ASM=OFF`.
+- [#2228](https://github.com/xmrig/xmrig/pull/2228) Removed useless code for cryptonight algorithms.
+- [#2234](https://github.com/xmrig/xmrig/pull/2234) Fixed build error on gcc 4.8.
+
 # v6.10.0
 - [#2122](https://github.com/xmrig/xmrig/pull/2122) Fixed pause logic when both pause on battery and user activity are enabled.
 - [#2123](https://github.com/xmrig/xmrig/issues/2123) Fixed compatibility with gcc 4.8.
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -1,14 +1,15 @@
-cmake_minimum_required(VERSION 2.8.12)
+cmake_minimum_required(VERSION 3.1)
 project(xmrig)

 option(WITH_HWLOC           "Enable hwloc support" ON)
 option(WITH_CN_LITE         "Enable CryptoNight-Lite algorithms family" ON)
 option(WITH_CN_HEAVY        "Enable CryptoNight-Heavy algorithms family" ON)
 option(WITH_CN_PICO         "Enable CryptoNight-Pico algorithm" ON)
+option(WITH_CN_FEMTO        "Enable CryptoNight-UPX2 algorithm" ON)
 option(WITH_RANDOMX         "Enable RandomX algorithms family" ON)
 option(WITH_ARGON2          "Enable Argon2 algorithms family" ON)
-option(WITH_ASTROBWT        "Enable AstroBWT algorithms family" ON)
 option(WITH_KAWPOW          "Enable KawPow algorithms family" ON)
+option(WITH_GHOSTRIDER      "Enable GhostRider algorithm" ON)
 option(WITH_HTTP            "Enable HTTP protocol support (client/server)" ON)
 option(WITH_DEBUG_LOG       "Enable debug log output" OFF)
 option(WITH_TLS             "Enable OpenSSL support" ON)
@@ -17,6 +18,8 @@ option(WITH_MSR             "Enable MSR mod & 1st-gen Ryzen fix" ON)
 option(WITH_ENV_VARS        "Enable environment variables support in config file" ON)
 option(WITH_EMBEDDED_CONFIG "Enable internal embedded JSON config" OFF)
 option(WITH_OPENCL          "Enable OpenCL backend" ON)
+set(WITH_OPENCL_VERSION 200 CACHE STRING "Target OpenCL version")
+set_property(CACHE WITH_OPENCL_VERSION PROPERTY STRINGS 120 200 210 220)
 option(WITH_CUDA            "Enable CUDA backend" ON)
 option(WITH_NVML            "Enable NVML (NVIDIA Management Library) support (only if CUDA backend enabled)" ON)
 option(WITH_ADL             "Enable ADL (AMD Display Library) or sysfs support (only if OpenCL backend enabled)" ON)
@@ -24,12 +27,15 @@ option(WITH_STRICT_CACHE    "Enable strict checks for OpenCL cache" ON)
 option(WITH_INTERLEAVE_DEBUG_LOG "Enable debug log for threads interleave" OFF)
 option(WITH_PROFILING       "Enable profiling for developers" OFF)
 option(WITH_SSE4_1          "Enable SSE 4.1 for Blake2" ON)
+option(WITH_AVX2            "Enable AVX2 for Blake2" ON)
+option(WITH_VAES            "Enable VAES instructions for Cryptonight" ON)
 option(WITH_BENCHMARK       "Enable builtin RandomX benchmark and stress test" ON)
 option(WITH_SECURE_JIT      "Enable secure access to JIT memory" OFF)
 option(WITH_DMI             "Enable DMI/SMBIOS reader" ON)

 option(BUILD_STATIC         "Build static binary" OFF)
-option(ARM_TARGET           "Force use specific ARM target 8 or 7" 0)
+option(ARM_V8               "Force ARMv8 (64 bit) architecture, use with caution if automatic detection fails, but you sure it may work" OFF)
+option(ARM_V7               "Force ARMv7 (32 bit) architecture, use with caution if automatic detection fails, but you sure it may work" OFF)
 option(HWLOC_DEBUG          "Enable hwloc debug helpers and log" OFF)


@@ -55,6 +61,7 @@ set(HEADERS
    src/core/config/usage.h
    src/core/Controller.h
    src/core/Miner.h
+    src/core/Taskbar.h
    src/net/interfaces/IJobResultListener.h
    src/net/JobResult.h
    src/net/JobResults.h
@@ -103,6 +110,7 @@ set(SOURCES
    src/core/config/ConfigTransform.cpp
    src/core/Controller.cpp
    src/core/Miner.cpp
+    src/core/Taskbar.cpp
    src/net/JobResults.cpp
    src/net/Network.cpp
    src/net/strategies/DonateStrategy.cpp
@@ -123,6 +131,19 @@ set(SOURCES_CRYPTO
    src/crypto/common/VirtualMemory.cpp
   )

+if (CMAKE_C_COMPILER_ID MATCHES GNU)
+    set_source_files_properties(src/crypto/cn/CnHash.cpp PROPERTIES COMPILE_FLAGS "-Ofast -fno-tree-vectorize")
+endif()
+
+if (WITH_VAES)
+    add_definitions(-DXMRIG_VAES)
+    set(HEADERS_CRYPTO "${HEADERS_CRYPTO}" src/crypto/cn/CryptoNight_x86_vaes.h)
+    set(SOURCES_CRYPTO "${SOURCES_CRYPTO}" src/crypto/cn/CryptoNight_x86_vaes.cpp)
+    if (CMAKE_C_COMPILER_ID MATCHES GNU OR CMAKE_C_COMPILER_ID MATCHES Clang)
+        set_source_files_properties(src/crypto/cn/CryptoNight_x86_vaes.cpp PROPERTIES COMPILE_FLAGS "-Ofast -fno-tree-vectorize -mavx2 -mvaes")
+    endif()
+endif()
+
 if (WITH_HWLOC)
    list(APPEND HEADERS_CRYPTO
        src/crypto/common/NUMAMemoryPool.h
@@ -179,8 +200,8 @@ find_package(UV REQUIRED)
 include(cmake/flags.cmake)
 include(cmake/randomx.cmake)
 include(cmake/argon2.cmake)
-include(cmake/astrobwt.cmake)
 include(cmake/kawpow.cmake)
+include(cmake/ghostrider.cmake)
 include(cmake/OpenSSL.cmake)
 include(cmake/asm.cmake)

@@ -196,6 +217,10 @@ if (WITH_CN_PICO)
    add_definitions(/DXMRIG_ALGO_CN_PICO)
 endif()

+if (WITH_CN_FEMTO)
+    add_definitions(/DXMRIG_ALGO_CN_FEMTO)
+endif()
+
 if (WITH_EMBEDDED_CONFIG)
    add_definitions(/DXMRIG_FEATURE_EMBEDDED_CONFIG)
 endif()
@@ -212,7 +237,7 @@ if (WITH_DEBUG_LOG)
 endif()

 add_executable(${CMAKE_PROJECT_NAME} ${HEADERS} ${SOURCES} ${SOURCES_OS} ${HEADERS_CRYPTO} ${SOURCES_CRYPTO} ${SOURCES_SYSLOG} ${TLS_SOURCES} ${XMRIG_ASM_SOURCES})
-target_link_libraries(${CMAKE_PROJECT_NAME} ${XMRIG_ASM_LIBRARY} ${OPENSSL_LIBRARIES} ${UV_LIBRARIES} ${EXTRA_LIBS} ${CPUID_LIB} ${ARGON2_LIBRARY} ${ETHASH_LIBRARY})
+target_link_libraries(${CMAKE_PROJECT_NAME} ${XMRIG_ASM_LIBRARY} ${OPENSSL_LIBRARIES} ${UV_LIBRARIES} ${EXTRA_LIBS} ${CPUID_LIB} ${ARGON2_LIBRARY} ${ETHASH_LIBRARY} ${GHOSTRIDER_LIBRARY})

 if (WIN32)
    add_custom_command(TARGET ${CMAKE_PROJECT_NAME} POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy_if_different "${CMAKE_SOURCE_DIR}/bin/WinRing0/WinRing0x64.sys" $<TARGET_FILE_DIR:${CMAKE_PROJECT_NAME}>)
@@ -220,6 +245,7 @@ if (WIN32)
    add_custom_command(TARGET ${CMAKE_PROJECT_NAME} POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy_if_different "${CMAKE_SOURCE_DIR}/scripts/benchmark_10M.cmd" $<TARGET_FILE_DIR:${CMAKE_PROJECT_NAME}>)
    add_custom_command(TARGET ${CMAKE_PROJECT_NAME} POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy_if_different "${CMAKE_SOURCE_DIR}/scripts/pool_mine_example.cmd" $<TARGET_FILE_DIR:${CMAKE_PROJECT_NAME}>)
    add_custom_command(TARGET ${CMAKE_PROJECT_NAME} POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy_if_different "${CMAKE_SOURCE_DIR}/scripts/solo_mine_example.cmd" $<TARGET_FILE_DIR:${CMAKE_PROJECT_NAME}>)
+    add_custom_command(TARGET ${CMAKE_PROJECT_NAME} POST_BUILD COMMAND ${CMAKE_COMMAND} -E copy_if_different "${CMAKE_SOURCE_DIR}/scripts/rtm_ghostrider_example.cmd" $<TARGET_FILE_DIR:${CMAKE_PROJECT_NAME}>)
 endif()

 if (CMAKE_CXX_COMPILER_ID MATCHES Clang AND CMAKE_BUILD_TYPE STREQUAL Release AND NOT CMAKE_GENERATOR STREQUAL Xcode)
--- a/README.md
+++ b/README.md
@@ -7,10 +7,10 @@
 [![GitHub stars](https://img.shields.io/github/stars/xmrig/xmrig.svg)](https://github.com/xmrig/xmrig/stargazers)
 [![GitHub forks](https://img.shields.io/github/forks/xmrig/xmrig.svg)](https://github.com/xmrig/xmrig/network)

-XMRig is a high performance, open source, cross platform RandomX, KawPow, CryptoNight and AstroBWT unified CPU/GPU miner and [RandomX benchmark](https://xmrig.com/benchmark). Official binaries are available for Windows, Linux, macOS and FreeBSD.
+XMRig is a high performance, open source, cross platform RandomX, KawPow, CryptoNight and [GhostRider](https://github.com/xmrig/xmrig/tree/master/src/crypto/ghostrider#readme) unified CPU/GPU miner and [RandomX benchmark](https://xmrig.com/benchmark). Official binaries are available for Windows, Linux, macOS and FreeBSD.

 ## Mining backends
-- **CPU** (x64/ARMv8)
+- **CPU** (x86/x64/ARMv7/ARMv8)
 - **OpenCL** for AMD GPUs.
 - **CUDA** for NVIDIA GPUs via external [CUDA plugin](https://github.com/xmrig/xmrig-cuda).

--- a/cmake/astrobwt.cmake
+++ b/cmake/astrobwt.cmake
@@ -1,45 +0,0 @@
-if (WITH_ASTROBWT)
-    add_definitions(/DXMRIG_ALGO_ASTROBWT)
-
-    list(APPEND HEADERS_CRYPTO
-        src/crypto/astrobwt/AstroBWT.h
-    )
-
-    list(APPEND SOURCES_CRYPTO
-        src/crypto/astrobwt/AstroBWT.cpp
-    )
-
-    if (XMRIG_ARM)
-        list(APPEND HEADERS_CRYPTO
-            src/crypto/astrobwt/salsa20_ref/ecrypt-config.h
-            src/crypto/astrobwt/salsa20_ref/ecrypt-machine.h
-            src/crypto/astrobwt/salsa20_ref/ecrypt-portable.h
-            src/crypto/astrobwt/salsa20_ref/ecrypt-sync.h
-        )
-
-        list(APPEND SOURCES_CRYPTO
-            src/crypto/astrobwt/salsa20_ref/salsa20.c
-        )
-    else()
-        if (CMAKE_SIZEOF_VOID_P EQUAL 8)
-            add_definitions(/DASTROBWT_AVX2)
-            if (CMAKE_C_COMPILER_ID MATCHES MSVC)
-                enable_language(ASM_MASM)
-                list(APPEND SOURCES_CRYPTO src/crypto/astrobwt/sha3_256_avx2.asm)
-            else()
-                enable_language(ASM)
-                list(APPEND SOURCES_CRYPTO src/crypto/astrobwt/sha3_256_avx2.S)
-            endif()
-        endif()
-
-        list(APPEND HEADERS_CRYPTO
-            src/crypto/astrobwt/Salsa20.hpp
-        )
-
-        list(APPEND SOURCES_CRYPTO
-            src/crypto/astrobwt/Salsa20.cpp
-        )
-    endif()
-else()
-    remove_definitions(/DXMRIG_ALGO_ASTROBWT)
-endif()
--- a/cmake/cpu.cmake
+++ b/cmake/cpu.cmake
@@ -1,47 +1,70 @@
+if (CMAKE_SIZEOF_VOID_P EQUAL 8)
+    set(XMRIG_64_BIT ON)
+    add_definitions(-DXMRIG_64_BIT)
+else()
+    set(XMRIG_64_BIT OFF)
+endif()
+
 if (NOT CMAKE_SYSTEM_PROCESSOR)
    message(WARNING "CMAKE_SYSTEM_PROCESSOR not defined")
 endif()

-if (CMAKE_SYSTEM_PROCESSOR MATCHES "^(x86_64|AMD64)$" AND CMAKE_SIZEOF_VOID_P EQUAL 8)
-    add_definitions(/DRAPIDJSON_SSE2)
+include(CheckCXXCompilerFlag)
+
+if (CMAKE_CXX_COMPILER_ID MATCHES MSVC)
+    set(VAES_SUPPORTED ON)
+else()
+    CHECK_CXX_COMPILER_FLAG("-mavx2 -mvaes" VAES_SUPPORTED)
+endif()
+
+if (NOT VAES_SUPPORTED)
+    set(WITH_VAES OFF)
+endif()
+
+if (XMRIG_64_BIT AND CMAKE_SYSTEM_PROCESSOR MATCHES "^(x86_64|AMD64)$")
+    add_definitions(-DRAPIDJSON_SSE2)
 else()
    set(WITH_SSE4_1 OFF)
+    set(WITH_AVX2 OFF)
+    set(WITH_VAES OFF)
+endif()
+
+if (ARM_V8)
+    set(ARM_TARGET 8)
+elseif (ARM_V7)
+    set(ARM_TARGET 7)
 endif()

 if (NOT ARM_TARGET)
    if (CMAKE_SYSTEM_PROCESSOR MATCHES "^(aarch64|arm64|armv8-a)$")
        set(ARM_TARGET 8)
-    elseif (CMAKE_SYSTEM_PROCESSOR MATCHES "^(armv7|armv7f|armv7s|armv7k|armv7-a|armv7l)$")
+    elseif (CMAKE_SYSTEM_PROCESSOR MATCHES "^(armv7|armv7f|armv7s|armv7k|armv7-a|armv7l|armv7ve)$")
        set(ARM_TARGET 7)
    endif()
 endif()

 if (ARM_TARGET AND ARM_TARGET GREATER 6)
-    set(XMRIG_ARM     ON)
-    add_definitions(/DXMRIG_ARM)
+    set(XMRIG_ARM ON)
+    add_definitions(-DXMRIG_ARM=${ARM_TARGET})

    message(STATUS "Use ARM_TARGET=${ARM_TARGET} (${CMAKE_SYSTEM_PROCESSOR})")

-    include(CheckCXXCompilerFlag)
-
    if (ARM_TARGET EQUAL 8)
-        set(XMRIG_ARMv8 ON)
-        add_definitions(/DXMRIG_ARMv8)
-
        CHECK_CXX_COMPILER_FLAG(-march=armv8-a+crypto XMRIG_ARM_CRYPTO)

        if (XMRIG_ARM_CRYPTO)
-            add_definitions(/DXMRIG_ARM_CRYPTO)
+            add_definitions(-DXMRIG_ARM_CRYPTO)
            set(ARM8_CXX_FLAGS "-march=armv8-a+crypto")
        else()
            set(ARM8_CXX_FLAGS "-march=armv8-a")
        endif()
-    elseif (ARM_TARGET EQUAL 7)
-        set(XMRIG_ARMv7 ON)
-        add_definitions(/DXMRIG_ARMv7)
    endif()
 endif()

 if (WITH_SSE4_1)
-    add_definitions(/DXMRIG_FEATURE_SSE4_1)
+    add_definitions(-DXMRIG_FEATURE_SSE4_1)
+endif()
+
+if (WITH_AVX2)
+    add_definitions(-DXMRIG_FEATURE_AVX2)
 endif()
--- a/cmake/flags.cmake
+++ b/cmake/flags.cmake
@@ -10,7 +10,7 @@ if ("${CMAKE_BUILD_TYPE}" STREQUAL "")
 endif()

 if (CMAKE_BUILD_TYPE STREQUAL "Release")
-    add_definitions(/DNDEBUG)
+    add_definitions(-DNDEBUG)
 endif()

 include(CheckSymbolExists)
@@ -22,17 +22,17 @@ if (CMAKE_CXX_COMPILER_ID MATCHES GNU)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -fexceptions -fno-rtti -Wno-strict-aliasing -Wno-class-memaccess")
    set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -Ofast -s")

-    if (XMRIG_ARMv8)
+    if (ARM_TARGET EQUAL 8)
        set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${ARM8_CXX_FLAGS}")
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${ARM8_CXX_FLAGS} -flax-vector-conversions")
-    elseif (XMRIG_ARMv7)
-        set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mfpu=neon")
-        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mfpu=neon -flax-vector-conversions")
+    elseif (ARM_TARGET EQUAL 7)
+        set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -march=armv7-a -mfpu=neon")
+        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=armv7-a -mfpu=neon -flax-vector-conversions")
    else()
        set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -maes")
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -maes")

-        add_definitions(/DHAVE_ROTR)
+        add_definitions(-DHAVE_ROTR)
    endif()

    if (WIN32)
@@ -49,28 +49,16 @@ if (CMAKE_CXX_COMPILER_ID MATCHES GNU)
        set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -static")
    endif()

-    add_definitions(/D_GNU_SOURCE)
-
-    if (${CMAKE_VERSION} VERSION_LESS "3.1.0")
-        set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -std=c99")
-        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
-    endif()
-
-    #set(CMAKE_C_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -gdwarf-2")
-
-    add_definitions(/DHAVE_BUILTIN_CLEAR_CACHE)
+    add_definitions(-D_GNU_SOURCE -DHAVE_BUILTIN_CLEAR_CACHE)

 elseif (CMAKE_CXX_COMPILER_ID MATCHES MSVC)
-    set(CMAKE_C_FLAGS_RELEASE "/MT /O2 /Oi /DNDEBUG /GL")
-    set(CMAKE_CXX_FLAGS_RELEASE "/MT /O2 /Oi /DNDEBUG /GL")
+    set(CMAKE_C_FLAGS_RELEASE "/MP /MT /O2 /Oi /DNDEBUG /GL")
+    set(CMAKE_CXX_FLAGS_RELEASE "/MP /MT /O2 /Oi /DNDEBUG /GL")

-    set(CMAKE_C_FLAGS_RELWITHDEBINFO "/Ob1 /Zi /DRELWITHDEBINFO")
-    set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "/Ob1 /Zi /DRELWITHDEBINFO")
+    set(CMAKE_C_FLAGS_RELWITHDEBINFO "/MP /Ob1 /Zi /DRELWITHDEBINFO")
+    set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "/MP /Ob1 /Zi /DRELWITHDEBINFO")

-    add_definitions(/D_CRT_SECURE_NO_WARNINGS)
-    add_definitions(/D_CRT_NONSTDC_NO_WARNINGS)
-    add_definitions(/DNOMINMAX)
-    add_definitions(/DHAVE_ROTR)
+    add_definitions(-D_CRT_SECURE_NO_WARNINGS -D_CRT_NONSTDC_NO_WARNINGS -DNOMINMAX -DHAVE_ROTR)

 elseif (CMAKE_CXX_COMPILER_ID MATCHES Clang)

@@ -80,10 +68,10 @@ elseif (CMAKE_CXX_COMPILER_ID MATCHES Clang)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -fexceptions -fno-rtti -Wno-missing-braces")
    set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -Ofast -funroll-loops -fmerge-all-constants")

-    if (XMRIG_ARMv8)
+    if (ARM_TARGET EQUAL 8)
        set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${ARM8_CXX_FLAGS}")
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${ARM8_CXX_FLAGS}")
-    elseif (XMRIG_ARMv7)
+    elseif (ARM_TARGET EQUAL 7)
        set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mfpu=neon -march=${CMAKE_SYSTEM_PROCESSOR}")
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mfpu=neon -march=${CMAKE_SYSTEM_PROCESSOR}")
    else()
@@ -92,7 +80,7 @@ elseif (CMAKE_CXX_COMPILER_ID MATCHES Clang)

        check_symbol_exists("_rotr" "x86intrin.h" HAVE_ROTR)
        if (HAVE_ROTR)
-            add_definitions(/DHAVE_ROTR)
+            add_definitions(-DHAVE_ROTR)
        endif()
    endif()

@@ -105,6 +93,6 @@ endif()
 if (NOT WIN32)
    check_symbol_exists("__builtin___clear_cache" "stdlib.h" HAVE_BUILTIN_CLEAR_CACHE)
    if (HAVE_BUILTIN_CLEAR_CACHE)
-        add_definitions(/DHAVE_BUILTIN_CLEAR_CACHE)
+        add_definitions(-DHAVE_BUILTIN_CLEAR_CACHE)
    endif()
 endif()
--- a/cmake/ghostrider.cmake
+++ b/cmake/ghostrider.cmake
@@ -0,0 +1,8 @@
+if (WITH_GHOSTRIDER)
+    add_definitions(/DXMRIG_ALGO_GHOSTRIDER)
+    add_subdirectory(src/crypto/ghostrider)
+    set(GHOSTRIDER_LIBRARY ghostrider)
+else()
+    remove_definitions(/DXMRIG_ALGO_GHOSTRIDER)
+    set(GHOSTRIDER_LIBRARY "")
+endif()
--- a/cmake/os.cmake
+++ b/cmake/os.cmake
@@ -15,39 +15,38 @@ else()
        set(XMRIG_OS_ANDROID ON)
    elseif(CMAKE_SYSTEM_NAME MATCHES "Linux")
        set(XMRIG_OS_LINUX ON)
-    elseif(CMAKE_SYSTEM_NAME STREQUAL FreeBSD)
+    elseif(CMAKE_SYSTEM_NAME STREQUAL FreeBSD OR CMAKE_SYSTEM_NAME STREQUAL DragonFly)
        set(XMRIG_OS_FREEBSD ON)
    endif()
 endif()


 if (XMRIG_OS_WIN)
-    add_definitions(/DWIN32)
-    add_definitions(/DXMRIG_OS_WIN)
+    add_definitions(-DWIN32 -DXMRIG_OS_WIN)
 elseif(XMRIG_OS_APPLE)
-    add_definitions(/DXMRIG_OS_APPLE)
+    add_definitions(-DXMRIG_OS_APPLE)

    if (XMRIG_OS_IOS)
-        add_definitions(/DXMRIG_OS_IOS)
+        add_definitions(-DXMRIG_OS_IOS)
    else()
-        add_definitions(/DXMRIG_OS_MACOS)
+        add_definitions(-DXMRIG_OS_MACOS)
    endif()

    if (XMRIG_ARM)
        set(WITH_SECURE_JIT ON)
    endif()
 elseif(XMRIG_OS_UNIX)
-    add_definitions(/DXMRIG_OS_UNIX)
+    add_definitions(-DXMRIG_OS_UNIX)

    if (XMRIG_OS_ANDROID)
-        add_definitions(/DXMRIG_OS_ANDROID)
+        add_definitions(-DXMRIG_OS_ANDROID)
    elseif (XMRIG_OS_LINUX)
-        add_definitions(/DXMRIG_OS_LINUX)
+        add_definitions(-DXMRIG_OS_LINUX)
    elseif (XMRIG_OS_FREEBSD)
-        add_definitions(/DXMRIG_OS_FREEBSD)
+        add_definitions(-DXMRIG_OS_FREEBSD)
    endif()
 endif()

 if (WITH_SECURE_JIT)
-    add_definitions(/DXMRIG_SECURE_JIT)
+    add_definitions(-DXMRIG_SECURE_JIT)
 endif()
--- a/cmake/randomx.cmake
+++ b/cmake/randomx.cmake
@@ -42,13 +42,13 @@ if (WITH_RANDOMX)
        src/crypto/rx/RxVm.cpp
    )

-    if (CMAKE_C_COMPILER_ID MATCHES MSVC)
+    if (WITH_ASM AND CMAKE_C_COMPILER_ID MATCHES MSVC)
        enable_language(ASM_MASM)
        list(APPEND SOURCES_CRYPTO
             src/crypto/randomx/jit_compiler_x86_static.asm
             src/crypto/randomx/jit_compiler_x86.cpp
            )
-    elseif (NOT XMRIG_ARM AND CMAKE_SIZEOF_VOID_P EQUAL 8)
+    elseif (WITH_ASM AND NOT XMRIG_ARM AND CMAKE_SIZEOF_VOID_P EQUAL 8)
        list(APPEND SOURCES_CRYPTO
             src/crypto/randomx/jit_compiler_x86_static.S
             src/crypto/randomx/jit_compiler_x86.cpp
@@ -76,7 +76,15 @@ if (WITH_RANDOMX)
        list(APPEND SOURCES_CRYPTO src/crypto/randomx/blake2/blake2b_sse41.c)

        if (CMAKE_C_COMPILER_ID MATCHES GNU OR CMAKE_C_COMPILER_ID MATCHES Clang)
-            set_source_files_properties(src/crypto/randomx/blake2/blake2b_sse41.c PROPERTIES COMPILE_FLAGS -msse4.1)
+            set_source_files_properties(src/crypto/randomx/blake2/blake2b_sse41.c PROPERTIES COMPILE_FLAGS "-Ofast -msse4.1")
+        endif()
+    endif()
+
+    if (WITH_AVX2)
+        list(APPEND SOURCES_CRYPTO src/crypto/randomx/blake2/avx2/blake2b_avx2.c)
+
+        if (CMAKE_C_COMPILER_ID MATCHES GNU OR CMAKE_C_COMPILER_ID MATCHES Clang)
+            set_source_files_properties(src/crypto/randomx/blake2/avx2/blake2b_avx2.c PROPERTIES COMPILE_FLAGS "-Ofast -mavx2")
        endif()
    endif()

--- a/scripts/benchmark_10M.cmd
+++ b/scripts/benchmark_10M.cmd
@ -1,4 +1,4 @@
@echo off
-cd %~dp0
+cd /d "%~dp0"
 xmrig.exe --bench=10M --submit
 pause
--- a/scripts/benchmark_1M.cmd
+++ b/scripts/benchmark_1M.cmd
@ -1,4 +1,4 @@
@echo off
-cd %~dp0
+cd /d "%~dp0"
 xmrig.exe --bench=1M --submit
 pause
--- a/scripts/build.hwloc.sh
+++ b/scripts/build.hwloc.sh
@ -1,6 +1,10 @@
 #!/bin/bash -e

-HWLOC_VERSION="2.4.1"
+HWLOC_VERSION_MAJOR="2"
+HWLOC_VERSION_MINOR="9"
+HWLOC_VERSION_PATCH="0"
+
+HWLOC_VERSION="${HWLOC_VERSION_MAJOR}.${HWLOC_VERSION_MINOR}.${HWLOC_VERSION_PATCH}"

 mkdir -p deps
 mkdir -p deps/include
@ -8,7 +12,7 @@ mkdir -p deps/lib

 mkdir -p build && cd build

-wget https://download.open-mpi.org/release/hwloc/v2.4/hwloc-${HWLOC_VERSION}.tar.gz -O hwloc-${HWLOC_VERSION}.tar.gz
+wget https://download.open-mpi.org/release/hwloc/v${HWLOC_VERSION_MAJOR}.${HWLOC_VERSION_MINOR}/hwloc-${HWLOC_VERSION}.tar.gz -O hwloc-${HWLOC_VERSION}.tar.gz
 tar -xzf hwloc-${HWLOC_VERSION}.tar.gz

 cd hwloc-${HWLOC_VERSION}
@ -16,4 +20,4 @@ cd hwloc-${HWLOC_VERSION}
 make -j$(nproc || sysctl -n hw.ncpu || sysctl -n hw.logicalcpu)
 cp -fr include ../../deps
 cp hwloc/.libs/libhwloc.a ../../deps/lib
-cd ..
+cd ..
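The hwloc download path depends only on the major and minor version, while the tarball name carries the full version. A minimal sketch of that derivation follows; `hwloc_url` is a hypothetical helper name used here for illustration and is not part of `scripts/build.hwloc.sh`.

```shell
# Sketch only: hwloc_url is a hypothetical helper, not part of the script.
# It assembles the download URL the same way the updated wget line does.
hwloc_url() {
    major="$1"
    minor="$2"
    patch="$3"
    version="${major}.${minor}.${patch}"
    # The release directory is "v<major>.<minor>"; the tarball uses the full version.
    printf 'https://download.open-mpi.org/release/hwloc/v%s.%s/hwloc-%s.tar.gz\n' \
        "$major" "$minor" "$version"
}

hwloc_url 2 9 0
```

With the three components split out, bumping the minor version updates both the directory and the file name in one place, which the previous hard-coded `v2.4` path did not.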
--- a/scripts/build.libressl.sh
+++ b/scripts/build.libressl.sh
@ -1,6 +1,6 @@
 #!/bin/bash -e

-LIBRESSL_VERSION="3.0.2"
+LIBRESSL_VERSION="3.5.2"

 mkdir -p deps
 mkdir -p deps/include
@ -17,4 +17,4 @@ make -j$(nproc || sysctl -n hw.ncpu || sysctl -n hw.logicalcpu)
 cp -fr include ../../deps
 cp crypto/.libs/libcrypto.a ../../deps/lib
 cp ssl/.libs/libssl.a ../../deps/lib
-cd ..
+cd ..
--- a/scripts/build.openssl.sh
+++ b/scripts/build.openssl.sh
@ -1,6 +1,6 @@
 #!/bin/bash -e

-OPENSSL_VERSION="1.1.1j"
+OPENSSL_VERSION="1.1.1s"

 mkdir -p deps
 mkdir -p deps/include
@ -17,4 +17,4 @@ make -j$(nproc || sysctl -n hw.ncpu || sysctl -n hw.logicalcpu)
 cp -fr include ../../deps
 cp libcrypto.a ../../deps/lib
 cp libssl.a ../../deps/lib
-cd ..
+cd ..
--- a/scripts/build.openssl3.sh
+++ b/scripts/build.openssl3.sh
@ -0,0 +1,20 @@
+#!/bin/bash -e
+
+OPENSSL_VERSION="3.0.7"
+
+mkdir -p deps
+mkdir -p deps/include
+mkdir -p deps/lib
+
+mkdir -p build && cd build
+
+wget https://www.openssl.org/source/openssl-${OPENSSL_VERSION}.tar.gz -O openssl-${OPENSSL_VERSION}.tar.gz
+tar -xzf openssl-${OPENSSL_VERSION}.tar.gz
+
+cd openssl-${OPENSSL_VERSION}
+./config -no-shared -no-asm -no-zlib -no-comp -no-dgram -no-filenames -no-cms
+make -j$(nproc || sysctl -n hw.ncpu || sysctl -n hw.logicalcpu)
+cp -fr include ../../deps
+cp libcrypto.a ../../deps/lib
+cp libssl.a ../../deps/lib
+cd ..
--- a/scripts/build.uv.sh
+++ b/scripts/build.uv.sh
@ -1,6 +1,6 @@
 #!/bin/bash -e

-UV_VERSION="1.41.0"
+UV_VERSION="1.44.2"

 mkdir -p deps
 mkdir -p deps/include
@ -17,4 +17,4 @@ sh autogen.sh
 make -j$(nproc || sysctl -n hw.ncpu || sysctl -n hw.logicalcpu)
 cp -fr include ../../deps
 cp .libs/libuv.a ../../deps/lib
-cd ..
+cd ..
--- a/scripts/generate_cl.js
+++ b/scripts/generate_cl.js
@ -51,6 +51,7 @@ function rx()
        'randomx_constants_wow.h',
        'randomx_constants_arqma.h',
        'randomx_constants_keva.h',
+        'randomx_constants_graft.h',
        'aes.cl',
        'blake2b.cl',
        'randomx_vm.cl',
@ -66,15 +67,6 @@ function rx()
 }


-function astrobwt()
-{
-    const astrobwt = opencl_minify(addIncludes('astrobwt.cl', [ 'BWT.cl', 'salsa20.cl', 'sha3.cl' ]));
-
-    // fs.writeFileSync('astrobwt_gen.cl', astrobwt);
-    fs.writeFileSync('astrobwt_cl.h', text2h(astrobwt, 'xmrig', 'astrobwt_cl'));
-}
-
-
 function kawpow()
 {
    const kawpow = opencl_minify(addIncludes('kawpow.cl', [ 'defs.h' ]));
@ -96,11 +88,6 @@ process.chdir(path.resolve('src/backend/opencl/cl/rx'));

 rx();

-process.chdir(cwd);
-process.chdir(path.resolve('src/backend/opencl/cl/astrobwt'));
-
-astrobwt();
-
 process.chdir(cwd);
 process.chdir(path.resolve('src/backend/opencl/cl/kawpow'));

--- a/scripts/pool_mine_example.cmd
+++ b/scripts/pool_mine_example.cmd
@ -15,6 +15,6 @@
 :: Choose pools outside of top 5 to help Monero network be more decentralized!
 :: Smaller pools also often have smaller fees/payout limits.

-cd %~dp0
-xmrig.exe -o pool.hashvault.pro:3333 -u 48edfHu7V9Z84YzzMa6fUueoELZ9ZRXq9VetWzYGzKt52XU5xvqgzYnDK9URnRoJMk1j8nLwEVsaSWJ4fhdUyZijBGUicoD -p x
+cd /d "%~dp0"
+xmrig.exe -o xmrpool.eu:3333 -u 48edfHu7V9Z84YzzMa6fUueoELZ9ZRXq9VetWzYGzKt52XU5xvqgzYnDK9URnRoJMk1j8nLwEVsaSWJ4fhdUyZijBGUicoD -p x
 pause
--- a/scripts/randomx_boost.sh
+++ b/scripts/randomx_boost.sh
@ -1,28 +1,44 @@
-#!/bin/bash
+#!/bin/sh -e

-modprobe msr
+MSR_FILE=/sys/module/msr/parameters/allow_writes

-if cat /proc/cpuinfo | grep "AMD Ryzen" > /dev/null;
+if test -e "$MSR_FILE"; then
+	echo on > $MSR_FILE
+else
+	modprobe msr allow_writes=on
+fi
+
+if grep -E 'AMD Ryzen|AMD EPYC' /proc/cpuinfo > /dev/null;
 	then
-	if cat /proc/cpuinfo | grep "cpu family[[:space:]]:[[:space:]]25" > /dev/null;
+	if grep "cpu family[[:space:]]\{1,\}:[[:space:]]25" /proc/cpuinfo > /dev/null;
 		then
-			echo "Detected Ryzen (Zen3)"
-			wrmsr -a 0xc0011020 0x4480000000000
-			wrmsr -a 0xc0011021 0x1c000200000040
-			wrmsr -a 0xc0011022 0xc000000401500000
-			wrmsr -a 0xc001102b 0x2000cc14
-			echo "MSR register values for Ryzen (Zen3) applied"
+			if grep "model[[:space:]]\{1,\}:[[:space:]]97" /proc/cpuinfo > /dev/null;
+				then
+					echo "Detected Zen4 CPU"
+					wrmsr -a 0xc0011020 0x4400000000000
+					wrmsr -a 0xc0011021 0x4000000000040
+					wrmsr -a 0xc0011022 0x8680000401570000
+					wrmsr -a 0xc001102b 0x2040cc10
+					echo "MSR register values for Zen4 applied"
+				else
+					echo "Detected Zen3 CPU"
+					wrmsr -a 0xc0011020 0x4480000000000
+					wrmsr -a 0xc0011021 0x1c000200000040
+					wrmsr -a 0xc0011022 0xc000000401570000
+					wrmsr -a 0xc001102b 0x2000cc10
+					echo "MSR register values for Zen3 applied"
+				fi
 		else
-			echo "Detected Ryzen (Zen1/Zen2)"
+			echo "Detected Zen1/Zen2 CPU"
 			wrmsr -a 0xc0011020 0
 			wrmsr -a 0xc0011021 0x40
 			wrmsr -a 0xc0011022 0x1510000
 			wrmsr -a 0xc001102b 0x2000cc16
-			echo "MSR register values for Ryzen (Zen1/Zen2) applied"
+			echo "MSR register values for Zen1/Zen2 applied"
 		fi
-elif cat /proc/cpuinfo | grep "Intel" > /dev/null;
+elif grep "Intel" /proc/cpuinfo > /dev/null;
 	then
-		echo "Detected Intel"
+		echo "Detected Intel CPU"
 		wrmsr -a 0x1a4 0xf
 		echo "MSR register values for Intel applied"
 else
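The detection order in the updated script (vendor string, then CPU family 25, then model 97) can be exercised without touching any MSR. The sketch below reuses the same `grep` patterns against an arbitrary file; `classify_cpu` and the labels it prints are assumptions made for illustration, and the `unknown` label stands in for the final `else` branch, whose body is not shown in this hunk.

```shell
# Sketch only: classify_cpu and its labels are hypothetical.
# It mirrors the branch order of the script but writes no MSR values.
classify_cpu() {
    info="$1"   # path to a cpuinfo-style file, e.g. /proc/cpuinfo
    if grep -E 'AMD Ryzen|AMD EPYC' "$info" > /dev/null; then
        if grep "cpu family[[:space:]]\{1,\}:[[:space:]]25" "$info" > /dev/null; then
            if grep "model[[:space:]]\{1,\}:[[:space:]]97" "$info" > /dev/null; then
                echo zen4
            else
                echo zen3
            fi
        else
            echo zen1_zen2
        fi
    elif grep "Intel" "$info" > /dev/null; then
        echo intel
    else
        echo unknown
    fi
}
```

Running `classify_cpu /proc/cpuinfo` on a Linux host prints the branch the script would take. The `\{1,\}` interval lets the patterns match one or more whitespace characters before the colon, and the `model` pattern does not match the `model name` line because a non-space character follows the whitespace there.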
--- a/scripts/rtm_ghostrider_example.cmd
+++ b/scripts/rtm_ghostrider_example.cmd
@ -0,0 +1,23 @@
+:: Example batch file for mining Raptoreum at a pool
+::
+:: Format:
+::      xmrig.exe -a gr -o <pool address>:<pool port> -u <pool username/wallet> -p <pool password>
+::
+:: Fields:
+::      pool address            The host name of the pool stratum or its IP address, for example raptoreumemporium.com
+::      pool port               The port of the pool's stratum to connect to, for example 3333. Check your pool's getting started page.
+::      pool username/wallet    For most pools, this is the wallet address you want to mine to. Some pools require a username.
+::      pool password           For most pools this can be just 'x'. For pools using usernames, you may need to provide a password as configured on the pool.
+::
+:: List of Raptoreum mining pools:
+::      https://miningpoolstats.stream/raptoreum
+::
+:: Choose pools outside of top 5 to help Raptoreum network be more decentralized!
+:: Smaller pools also often have smaller fees/payout limits.
+
+cd /d "%~dp0"
+:: Use this command line to connect to non-SSL port
+xmrig.exe -a gr -o raptoreumemporium.com:3008 -u WALLET_ADDRESS -p x
+:: Or use this command line to connect to an SSL port
+:: xmrig.exe -a gr -o rtm.suprnova.cc:4273 --tls -u WALLET_ADDRESS -p x
+pause
--- a/scripts/solo_mine_example.cmd
+++ b/scripts/solo_mine_example.cmd
@ -11,6 +11,6 @@
 :: Mining solo is the best way to help Monero network be more decentralized!
 :: But you will only get a payout when you find a block which can take more than a year for a single low-end PC.

-cd %~dp0
-xmrig.exe -o node.xmr.to:18081 -a rx/0 -u 48edfHu7V9Z84YzzMa6fUueoELZ9ZRXq9VetWzYGzKt52XU5xvqgzYnDK9URnRoJMk1j8nLwEVsaSWJ4fhdUyZijBGUicoD --daemon
+cd /d "%~dp0"
+xmrig.exe -o YOUR_NODE_IP:18081 -a rx/0 -u 48edfHu7V9Z84YzzMa6fUueoELZ9ZRXq9VetWzYGzKt52XU5xvqgzYnDK9URnRoJMk1j8nLwEVsaSWJ4fhdUyZijBGUicoD --daemon
 pause
--- a/src/3rdparty/CL/cl_dx9_media_sharing.h
+++ b/src/3rdparty/CL/cl_dx9_media_sharing.h
@ -44,7 +44,7 @@ extern "C" {

 typedef cl_uint             cl_dx9_media_adapter_type_khr;
 typedef cl_uint             cl_dx9_media_adapter_set_khr;
-    
+
 #if defined(_WIN32)
 #include <d3d9.h>
 typedef struct _cl_dx9_surface_info_khr
@ -105,7 +105,7 @@ typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromDX9MediaSurfaceKHR_fn)(
    cl_mem_flags                  flags,
    cl_dx9_media_adapter_type_khr adapter_type,
    void *                        surface_info,
-    cl_uint                       plane,                                                                          
+    cl_uint                       plane,
    cl_int *                      errcode_ret) CL_API_SUFFIX__VERSION_1_2;

 typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireDX9MediaSurfacesKHR_fn)(
--- a/src/3rdparty/CL/cl_gl_ext.h
+++ b/src/3rdparty/CL/cl_gl_ext.h
@ -35,7 +35,7 @@ extern "C" {

 #include <CL/cl_gl.h>

-/* 
+/*
 *  cl_khr_gl_event extension
 */
 #define CL_COMMAND_GL_FENCE_SYNC_OBJECT_KHR     0x200D
--- a/src/3rdparty/adl/adl_defines.h
+++ b/src/3rdparty/adl/adl_defines.h
@ -1471,7 +1471,7 @@ typedef enum _ADLProfilePropertyType
 #define ADL_HDR_FREESYNC_HDR    0x0004      ///< FreeSync HDR supported
 /// @}

-/// \defgroup define_FreesyncFlags ADLDDCInfo2 Freesync HDR flags 
+/// \defgroup define_FreesyncFlags ADLDDCInfo2 Freesync HDR flags
 /// @{
 /// defines for iFreesyncFlags in ADLDDCInfo2
 #define ADL_HDR_FREESYNC_BACKLIGHT_SUPPORT           0x0001      ///< Global backlight control supported
@ -1738,7 +1738,7 @@ enum ADLODNDPMMaskType
     ADL_ODN_DPM_MASK                = 1 << 2,
 };

-//ODN features Bits for ADLODNCapabilitiesX2 
+//ODN features Bits for ADLODNCapabilitiesX2
 enum ADLODNFeatureControl
 {
     ADL_ODN_SCLK_DPM                = 1 << 0,
@ -1764,7 +1764,7 @@ enum ADLODNFeatureControl

 //If any new feature is added, PPLIB only needs to add ext feature ID and Item ID(Setting ID). These IDs should match the drive defined in CWDDEPM.h
 enum ADLODNExtFeatureControl
-{	
+{
 	ADL_ODN_EXT_FEATURE_MEMORY_TIMING_TUNE = 1 << 0,
 	ADL_ODN_EXT_FEATURE_FAN_ZERO_RPM_CONTROL = 1 << 1,
 	ADL_ODN_EXT_FEATURE_AUTO_UV_ENGINE = 1 << 2,   //Auto under voltage
@ -1794,7 +1794,7 @@ enum ADLODNExtSettingId
 	ADL_ODN_PARAMETER_FAN_CURVE_SPEED_5,
    ADL_ODN_POWERGAUGE,
 	ODN_COUNT
-	
+
 } ;

 //OD8 Capability features bits
@ -1811,7 +1811,7 @@ enum ADLOD8FeatureControl
    ADL_OD8_MEMORY_TIMING_TUNE = 1 << 8,
    ADL_OD8_FAN_ZERO_RPM_CONTROL = 1 << 9 ,
 	ADL_OD8_AUTO_UV_ENGINE = 1 << 10,  //Auto under voltage
-	ADL_OD8_AUTO_OC_ENGINE = 1 << 11,  //Auto overclock engine     
+	ADL_OD8_AUTO_OC_ENGINE = 1 << 11,  //Auto overclock engine
 	ADL_OD8_AUTO_OC_MEMORY = 1 << 12,  //Auto overclock memory
 	ADL_OD8_FAN_CURVE = 1 << 13,   //Fan curve
 	ADL_OD8_WS_AUTO_FAN_ACOUSTIC_LIMIT = 1 << 14, //Workstation Manual Fan controller
@ -1888,7 +1888,7 @@ typedef enum _ADLSensorType
 	PMLOG_TEMPERATURE_VRSOC = 24,
 	PMLOG_TEMPERATURE_VRMVDD0 = 25,
 	PMLOG_TEMPERATURE_VRMVDD1 = 26,
-	PMLOG_TEMPERATURE_HOTSPOT = 27,    
+	PMLOG_TEMPERATURE_HOTSPOT = 27,
        PMLOG_TEMPERATURE_GFX = 28,
        PMLOG_TEMPERATURE_SOC = 29,
        PMLOG_GFX_POWER = 30,
--- a/src/3rdparty/adl/adl_sdk.h
+++ b/src/3rdparty/adl/adl_sdk.h
@ -37,7 +37,7 @@
 #define __stdcall
 #endif /* (LINUX) */

-/// Memory Allocation Call back 
+/// Memory Allocation Call back
 typedef void* ( __stdcall *ADL_MAIN_MALLOC_CALLBACK )( int );


--- a/src/3rdparty/adl/adl_structures.h
+++ b/src/3rdparty/adl/adl_structures.h
@ -1753,7 +1753,7 @@ typedef struct ADLPXConfigCaps
 ///\brief Enum containing PX or HG type
 ///
 /// This enum is used to get PX or hG type
-/// 
+///
 /// \nosubgrouping
 //////////////////////////////////////////////////////////////////////////////////////////
 enum ADLPxType
--- a/src/3rdparty/argon2/CMakeLists.txt
+++ b/src/3rdparty/argon2/CMakeLists.txt
@ -1,4 +1,4 @@
-cmake_minimum_required(VERSION 2.8.12)
+cmake_minimum_required(VERSION 3.1)

 project(argon2 C)
 set(CMAKE_C_STANDARD 99)
--- a/src/3rdparty/epee/LICENSE.txt
+++ b/src/3rdparty/epee/LICENSE.txt
@ -0,0 +1,25 @@
+Copyright (c) 2006-2013, Andrey N. Sabelnikov, www.sabelnikov.net
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+    * Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in the
+      documentation and/or other materials provided with the distribution.
+    * Neither the name of the Andrey N. Sabelnikov nor the
+      names of its contributors may be used to endorse or promote products
+      derived from this software without specific prior written permission.
+
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL Andrey N. Sabelnikov BE LIABLE FOR ANY
+DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--- a/src/3rdparty/epee/README.md
+++ b/src/3rdparty/epee/README.md
@ -0,0 +1 @@
+epee is a small library of helpers, wrappers, tools, and so on, used to make my life easier.
--- a/src/3rdparty/epee/span.h
+++ b/src/3rdparty/epee/span.h
@ -0,0 +1,176 @@
+// Copyright (c) 2017-2020, The Monero Project
+//
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without modification, are
+// permitted provided that the following conditions are met:
+//
+// 1. Redistributions of source code must retain the above copyright notice, this list of
+//    conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright notice, this list
+//    of conditions and the following disclaimer in the documentation and/or other
+//    materials provided with the distribution.
+//
+// 3. Neither the name of the copyright holder nor the names of its contributors may be
+//    used to endorse or promote products derived from this software without specific
+//    prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+// MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
+// THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
+// THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+#pragma once
+
+#include <algorithm>
+#include <cstdint>
+#include <memory>
+#include <string>
+#include <type_traits>
+
+namespace epee
+{
+  /*!
+    \brief Non-owning sequence of data. Does not deep copy
+
+    Inspired by `gsl::span` and/or `boost::iterator_range`. This class is
+    intended to be used as a parameter type for functions that need to take a
+    writable or read-only sequence of data. Most common cases are `span<char>`
+    and `span<std::uint8_t>`. Using as a class member is only recommended if
+    clearly documented as not doing a deep-copy. C-arrays are easily convertible
+    to this type.
+
+    \note Conversion from C string literal to `span<const char>` will include
+      the NULL-terminator.
+    \note Never allows derived-to-base pointer conversion; an array of derived
+      types is not an array of base types.
+   */
+  template<typename T>
+  class span
+  {
+    template<typename U>
+    static constexpr bool safe_conversion() noexcept
+    {
+      // Allow exact matches or `T*` -> `const T*`.
+      using with_const = typename std::add_const<U>::type;
+      return std::is_same<T, U>() ||
+        (std::is_const<T>() && std::is_same<T, with_const>());
+    }
+
+  public:
+    using value_type = T;
+    using size_type = std::size_t;
+    using difference_type = std::ptrdiff_t;
+    using pointer = T*;
+    using const_pointer = const T*;
+    using reference = T&;
+    using const_reference = const T&;
+    using iterator = pointer;
+    using const_iterator = const_pointer;
+
+    constexpr span() noexcept : ptr(nullptr), len(0) {}
+    constexpr span(std::nullptr_t) noexcept : span() {}
+
+    //! Prevent derived-to-base conversions; invalid in this context.
+    template<typename U, typename = typename std::enable_if<safe_conversion<U>()>::type>
+    constexpr span(U* const src_ptr, const std::size_t count) noexcept
+      : ptr(src_ptr), len(count) {}
+
+    //! Conversion from C-array. Prevents common bugs with sizeof + arrays.
+    template<std::size_t N>
+    constexpr span(T (&src)[N]) noexcept : span(src, N) {}
+
+    constexpr span(const span&) noexcept = default;
+    span& operator=(const span&) noexcept = default;
+
+    /*! Try to remove `amount` elements from beginning of span.
+    \return Number of elements removed. */
+    std::size_t remove_prefix(std::size_t amount) noexcept
+    {
+        amount = std::min(len, amount);
+        ptr += amount;
+        len -= amount;
+        return amount;
+    }
+
+    constexpr iterator begin() const noexcept { return ptr; }
+    constexpr const_iterator cbegin() const noexcept { return ptr; }
+
+    constexpr iterator end() const noexcept { return begin() + size(); }
+    constexpr const_iterator cend() const noexcept { return cbegin() + size(); }
+
+    constexpr bool empty() const noexcept { return size() == 0; }
+    constexpr pointer data() const noexcept { return ptr; }
+    constexpr std::size_t size() const noexcept { return len; }
+    constexpr std::size_t size_bytes() const noexcept { return size() * sizeof(value_type); }
+
+    T &operator[](size_t idx) noexcept { return ptr[idx]; }
+    const T &operator[](size_t idx) const noexcept { return ptr[idx]; }
+
+  private:
+    T* ptr;
+    std::size_t len;
+  };
+
+  //! \return `span<const T::value_type>` from a STL compatible `src`.
+  template<typename T>
+  constexpr span<const typename T::value_type> to_span(const T& src)
+  {
+    // compiler provides diagnostic if size() is not size_t.
+    return {src.data(), src.size()};
+  }
+
+  //! \return `span<T::value_type>` from a STL compatible `src`.
+  template<typename T>
+  constexpr span<typename T::value_type> to_mut_span(T& src)
+  {
+    // compiler provides diagnostic if size() is not size_t.
+    return {src.data(), src.size()};
+  }
+
+  template<typename T>
+  constexpr bool has_padding() noexcept
+  {
+    return !std::is_standard_layout<T>() || alignof(T) != 1;
+  }
+
+  //! \return Cast data from `src` as `span<const std::uint8_t>`.
+  template<typename T>
+  span<const std::uint8_t> to_byte_span(const span<const T> src) noexcept
+  {
+    static_assert(!has_padding<T>(), "source type may have padding");
+    return {reinterpret_cast<const std::uint8_t*>(src.data()), src.size_bytes()};
+  }
+
+  //! \return `span<const std::uint8_t>` which represents the bytes at `&src`.
+  template<typename T>
+  span<const std::uint8_t> as_byte_span(const T& src) noexcept
+  {
+    static_assert(!std::is_empty<T>(), "empty types will not work -> sizeof == 1");
+    static_assert(!has_padding<T>(), "source type may have padding");
+    return {reinterpret_cast<const std::uint8_t*>(std::addressof(src)), sizeof(T)};
+  }
+
+  //! \return `span<std::uint8_t>` which represents the bytes at `&src`.
+  template<typename T>
+  span<std::uint8_t> as_mut_byte_span(T& src) noexcept
+  {
+    static_assert(!std::is_empty<T>(), "empty types will not work -> sizeof == 1");
+    static_assert(!has_padding<T>(), "source type may have padding");
+    return {reinterpret_cast<std::uint8_t*>(std::addressof(src)), sizeof(T)};
+  }
+
+  //! make a span from a std::string
+  template<typename T>
+  span<const T> strspan(const std::string &s) noexcept
+  {
+    static_assert(std::is_same<T, char>() || std::is_same<T, unsigned char>() || std::is_same<T, int8_t>() || std::is_same<T, uint8_t>(), "Unexpected type");
+    return {reinterpret_cast<const T*>(s.data()), s.size()};
+  }
+}
--- a/src/3rdparty/fmt/README.rst
+++ b/src/3rdparty/fmt/README.rst
@ -81,7 +81,7 @@ Examples
 .. code:: c++

    #include <fmt/core.h>
-    
+
    int main() {
      fmt::print("Hello, world!\n");
    }
@ -293,11 +293,11 @@ Projects using this library
  An open-source library for mathematical programming

 * `Aseprite <https://github.com/aseprite/aseprite>`_:
-  Animated sprite editor & pixel art tool 
+  Animated sprite editor & pixel art tool

 * `AvioBook <https://www.aviobook.aero/en>`_: A comprehensive aircraft
  operations suite
-  
+
 * `Celestia <https://celestia.space/>`_: Real-time 3D visualization of space

 * `Ceph <https://ceph.com/>`_: A scalable distributed storage system
@ -351,7 +351,7 @@ Projects using this library

 * `quasardb <https://www.quasardb.net/>`_: A distributed, high-performance,
  associative database
-  
+
 * `Quill <https://github.com/odygrd/quill>`_: Asynchronous low-latency logging library

 * `QKW <https://github.com/ravijanjam/qkw>`_: Generalizing aliasing to simplify
--- a/src/3rdparty/getopt/getopt.h
+++ b/src/3rdparty/getopt/getopt.h
@ -3,9 +3,9 @@
 * DISCLAIMER
 * This file is part of the mingw-w64 runtime package.
 *
- * The mingw-w64 runtime package and its code is distributed in the hope that it 
- * will be useful but WITHOUT ANY WARRANTY.  ALL WARRANTIES, EXPRESSED OR 
- * IMPLIED ARE HEREBY DISCLAIMED.  This includes but is not limited to 
+ * The mingw-w64 runtime package and its code is distributed in the hope that it
+ * will be useful but WITHOUT ANY WARRANTY.  ALL WARRANTIES, EXPRESSED OR
+ * IMPLIED ARE HEREBY DISCLAIMED.  This includes but is not limited to
 * warranties of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 */
 /*
@ -109,11 +109,7 @@ char    *optarg;		/* argument associated with option */
 extern char __declspec(dllimport) *__progname;
 #endif

-#ifdef __CYGWIN__
 static char EMSG[] = "";
-#else
-#define	EMSG		""
-#endif

 static int getopt_internal(int, char * const *, const char *,
 			   const struct option *, int *, int);
--- a/src/3rdparty/hwloc/CMakeLists.txt
+++ b/src/3rdparty/hwloc/CMakeLists.txt
@ -1,4 +1,4 @@
-cmake_minimum_required (VERSION  2.8.12)
+cmake_minimum_required(VERSION 3.1)
 project (hwloc C)

 include_directories(include)
--- a/src/3rdparty/hwloc/NEWS
+++ b/src/3rdparty/hwloc/NEWS
@ -1,5 +1,5 @@
 Copyright © 2009 CNRS
-Copyright © 2009-2020 Inria.  All rights reserved.
+Copyright © 2009-2022 Inria.  All rights reserved.
 Copyright © 2009-2013 Université Bordeaux
 Copyright © 2009-2011 Cisco Systems, Inc.  All rights reserved.
 Copyright © 2020 Hewlett Packard Enterprise.  All rights reserved.
@ -17,6 +17,202 @@ bug fixes (and other actions) for each version of hwloc since version
 0.9.


+Version 2.9.0
+-------------
+* Backends
+  + Expose the memory size of CXL memory devices (Type 3) on Linux.
+  + The LevelZero backend now reports the "XeLinkBandwidth" distance
+    matrix between L0 devices (and subdevices) when available.
+  + Add support for CUDA compute capability up to 9.0.
+* Tools
+  + lstopo now switches to console mode when its output is redirected.
+    Graphical window mode may be forced back with --of window.
+  + hwloc-calc now accepts "numa" in -H, and I/O subtypes such as "gpu"
+    in -I and -N.
+
+
+Version 2.8.0
+-------------
+* API
+  + Add HWLOC_TOPOLOGY_FLAG_NO_DISTANCES, _NO_MEMATTRS and _NO_CPUKINDS
+    to reduce the overhead when unneeded.
+  + Add separate Read/Write Bandwidth/Latency memory attributes and
+    implement them on Linux.
+* Backends
+  + NUMA nodes may now have a subtype such as DRAM, HBM, SPM, or NVM
+    on heterogeneous memory platforms on Linux.
+    - Add DAXType and DAXParent attributes on Linux to tell where a
+      DAX device or its corresponding NUMA node come from (SPM for
+      Specific-Purpose or NVM for Non-Volatile Memory).
+  + Detect heterogeneous caches in hybrid CPUs on MacOS X,
+    thanks to Paul Bone for the help.
+  + Max frequencies are not ignored in Linux cpukinds anymore (they were
+    ignored in hwloc 2.7.0), but they may be slightly adjusted to avoid
+    reporting hybrid CPUs because of Intel Turbo Boost Max 3.0.
+    - See the documentation of environment variable HWLOC_CPUKINDS_MAXFREQ.
+  + Hardwire the PCI locality of HPE Cray EX235a nodes.
+* Tools
+  + lstopo and other tools may now load Linux and x86 cpuid topology files
+    from a tarball.
+  + lstopo may now replace the P# and L# index prefixes with custom strings
+    thanks to --os-index-prefix and --logical-index-prefix options.
+* Misc
+  + Add --disable-readme to avoid regenerating the top-level hwloc README
+    file from the documentation.
+
+
+Version 2.7.1
+-------------
+* Workaround crashes when virtual machines report incoherent x86 CPUID
+  information about numbers of cores and threads.
+  Thanks to Peter Bense for the report.
+* Use setenv() instead of putenv() when trying to force enable oneAPI L0
+  support, to avoid issues with applications that touch the environment,
+  thanks to Josh Hursey for the patch.
+* Add some warnings at the end of configure when GPU libraries are
+  missing on the system or their path is missing in the environment.
+
+
+Version 2.7.0
+-------------
+* Backends
+  + Add support for NUMA nodes and caches with more than 64 PUs across
+    multiple processor groups on Windows 11 and Windows Server 2022.
+  + Group objects are not created for Windows processor groups anymore,
+    except if HWLOC_WINDOWS_PROCESSOR_GROUP_OBJS=1 in the environment.
+  + Expose "Cluster" group objects on Linux kernel 5.16+ for CPUs
+    that share some internal cache or bus. This can be equivalent
+    to the L2 Cache level on some platforms (e.g. x86) or a specific
+    level between L2 and L3 on others (e.g. ARM Kunpeng 920).
+    Thanks to Jonathan Cameron for the help.
+    - HWLOC_DONT_MERGE_CLUSTER_GROUPS=1 may be set in the environment
+      to prevent these groups from being merged with identical caches, etc.
+  + Improve the oneAPI LevelZero backend:
+    - Expose subdevices such as "ze0.1" inside root OS devices ("ze0")
+      when the hardware contains multiple subdevices.
+    - Add many new attributes to describe device type, and the
+      numbers of slices, subslices, execution units and threads.
+    - Expose the memory information as LevelZeroHBM/DDR/MemorySize infos.
+  + Ignore the max frequencies of cores in Linux cpukinds when the
+    base frequencies are available (to avoid exposing hybrid CPUs
+    when Intel Turbo Boost Max 3.0 gives slightly different max
+    frequencies to CPU cores).
+    - May be reverted by setting HWLOC_CPUKINDS_MAXFREQ=1 in the environment.
+* Tools
+  + Add --grey and --palette options to switch lstopo to greyscale or
+    white-background-only graphics, or to tune individual colors.
+* Build
+  + Windows CMake builds now support non-MSVC compilers, detect several
+    features at build time, can build/run tests, etc.
+    Thanks to Michael Hirsch and Alexander Neumann.
+
+
+Version 2.6.0
+-------------
+* Backends
+  + Expose two cpukinds for energy-efficient cores (icestorm) and
+    high-performance cores (firestorm) on Apple M1 on Mac OS X.
+  + Use sysfs CPU "capacity" to rank hybrid cores by efficiency
+    on Linux when available (mostly on recent ARM platforms for now).
+  + Improve HWLOC_MEMBIND_BIND (without the STRICT flag) on Linux kernel
+    >= 5.15: If more than one node is given, the kernel may now use all
+    of them instead of only the first one before falling back to others.
+  + Expose cache os_index when available on Linux, it may be needed
+    when using resctrl to configure cache partitioning, memory bandwidth
+    monitoring, etc.
+  + Add a "XGMIHops" distances matrix in the RSMI backend for AMD GPU
+    interconnected through XGMI links.
+  + Expose AMD GPU memory information (VRAM and GTT) in the RSMI backend.
+  + Add OS devices such as "bxi0" for Atos/Bull BXI HCAs on Linux.
+* Tools
+  + lstopo has a better placement algorithm with respect to I/O
+    objects, see --children-order in the manpage for details.
+  + hwloc-annotate may now change object subtypes and cache or memory
+    sizes.
+* Build
+  + Allow to specify the ROCm installation for building the RSMI backend:
+    - Use a custom installation path if specified with --with-rocm=<dir>.
+    - Use /opt/rocm-<version> if specified with --with-rocm-version=<version>
+      or the ROCM_VERSION environment variable.
+    - Try /opt/rocm if it exists.
+    - See "How do I enable ROCm SMI and select which version to use?"
+      in the FAQ for details.
+  + Add a CMakeLists for Windows under contrib/windows-cmake/ .
+* Documentation
+  + Add FAQ entry "How do I create a custom heterogeneous and
+     asymmetric topology?"
+
+
+Version 2.5.0
+-------------
+* API
+  + Add hwloc/windows.h to query Windows processor groups.
+  + Add hwloc_get_obj_with_same_locality() to convert between objects
+    with same locality, for instance NUMA nodes and Packages,
+    or OS devices within a PCI device.
+  + Add hwloc_distances_transform() to modify distances structures.
+    - hwloc-annotate and lstopo have new distances-transform options.
+  + hwloc_distances_add() is replaced with _add_create() followed by
+    _add_values() and _add_commit(). See hwloc/distances.h for details.
+  + Add topology flags to mitigate binding modifications during
+    hwloc discovery, especially on Windows:
+    - HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING and _MEMBINDING
+      restrict discovery to PUs and NUMA nodes inside the binding.
+    - HWLOC_TOPOLOGY_FLAG_DONT_CHANGE_BINDING prevents from ever
+      changing the binding during discovery.
+* Backends
+  + Add a levelzero backend for oneAPI L0 devices, exposed as OS devices
+    of subtype "LevelZero" and name such as "ze0".
+    - Add hwloc/levelzero.h for interoperability between L0 API devices
+      and hwloc cpusets or OS devices.
+  + Expose NEC Vector Engine cards on Linux as OS devices of subtype
+    "VectorEngine" and name "ve0", etc.
+    Thanks to Anara Kozhokanova, Tim Cramer and Erich Focht for the help.
+  + Add a NVLinkBandwidth distances structure between NVIDIA GPUs
+    (and POWER processor or NVSwitches) in the NVML backend,
+    and a XGMIBandwidth distances structure between AMD GPUs
+    in the RSMI backends.
+    - See "Topology Attributes: Distances, Memory Attributes and CPU Kinds"
+      in the documentation for details about these new distances.
+  + Add support for NUMA node 0 being offline in Linux, thanks to Jirka Hladky.
+* Build
+  + Add --with-cuda-version=<version> or look at the CUDA_VERSION
+    environment variable to find the appropriate CUDA pkg-config files.
+    Thanks to Stephen Herbein for the suggestion.
+    - Also add --with-cuda=<dir> to specify the CUDA installation path
+      manually (and its NVML and OpenCL components).
+      Thanks to Andrea Bocci for the suggestion.
+    - See "How do I enable CUDA and select which CUDA version to use?"
+      in the FAQ for details.
+* Tools
+  + lstopo now has a --windows-processor-groups option on Windows.
+  + hwloc-ps now has a --short-name option to avoid long/truncated
+    command path.
+  + hwloc-ps now has a --single-ancestor option to return a single
+    (possibly too large) object where a process is bound.
+  + hwloc-ps --pid-cmd may now query environment variables,
+    including MPI-specific variables to find out process ranks.
+
+
+Version 2.4.1
+-------------
+* Fix AMD OpenCL device locality when PCI bus or device number >= 128.
+  Thanks to Edgar Leon for reporting the issue.
+  + Applications using any of the following inline functions must
+    be recompiled to get the fix: hwloc_opencl_get_device_pci_busid(),
+    hwloc_opencl_get_device_cpuset(), hwloc_opencl_get_device_osdev().
+* Fix the ranking of cpukinds on non-Windows systems,
+  thanks to Ivan Kochin for the report.
+* Fix the insertion of custom Groups after loading the topology,
+  thanks to Scott Hicks.
+* Add support for CPU0 being offline in Linux, thanks to Garrett Clay.
+* Fix missing x86 Package and Core objects on FreeBSD/NetBSD.
+  Thanks to Thibault Payet and Yuri Victorovich for the report.
+* Fix the import of very large distances with heterogeneous object types.
+* Fix a memory leak in the Linux backend,
+  thanks to Perceval Anichini.
+
+
 Version 2.4.0
 -------------
 * API
--- a/src/3rdparty/hwloc/README
+++ b/src/3rdparty/hwloc/README
@ -78,7 +78,7 @@ debug and report issues.
 Questions may be sent to the users or developers mailing lists (https://
 www.open-mpi.org/community/lists/hwloc.php).

-There is also a #hwloc IRC channel on Freenode (irc.freenode.net).
+There is also a #hwloc IRC channel on Libera Chat (irc.libera.chat).



--- a/src/3rdparty/hwloc/VERSION
+++ b/src/3rdparty/hwloc/VERSION
@ -8,7 +8,7 @@
 # Please update HWLOC_VERSION* in contrib/windows/hwloc_config.h too.

 major=2
-minor=4
+minor=9
 release=0

 # greek is used for alpha or beta release tags.  If it is non-empty,
@ -22,7 +22,7 @@ greek=

 # The date when this release was created

-date="Nov 26, 2020"
+date="Dec 14, 2022"

 # If snapshot=1, then use the value from snapshot_version as the
 # entire hwloc version (i.e., ignore major, minor, release, and
@ -41,7 +41,7 @@ snapshot_version=${major}.${minor}.${release}${greek}-git
 # 2. Version numbers are described in the Libtool current:revision:age
 # format.

-libhwloc_so_version=19:0:4
+libhwloc_so_version=21:1:6
 libnetloc_so_version=0:0:0

 # Please also update the <TargetName> lines in contrib/windows/libhwloc.vcxproj
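The `libhwloc_so_version` bump from `19:0:4` to `21:1:6` uses Libtool's `current:revision:age` triple. The sketch below shows how that triple maps to a shared-object suffix, assuming the usual Linux Libtool naming rule (`lib<name>.so.<current-age>.<age>.<revision>`). The struct and function names are hypothetical and not part of hwloc.

```c
#include <assert.h>
#include <stdio.h>

/* Libtool "current:revision:age" triple, as used by libhwloc_so_version.
 * Hypothetical helper type, not part of hwloc. */
struct libtool_version {
    unsigned current;
    unsigned revision;
    unsigned age;
};

/* The soname major number on Linux is current - age. Two builds with the
 * same major number are expected to be binary compatible. */
static unsigned linux_soname_major(struct libtool_version v)
{
    assert(v.age <= v.current); /* Libtool requires age <= current */
    return v.current - v.age;
}

/* Write the numeric suffix "<current-age>.<age>.<revision>" into buf.
 * Returns the snprintf() result. */
static int linux_so_suffix(struct libtool_version v, char *buf, size_t len)
{
    return snprintf(buf, len, "%u.%u.%u",
                    linux_soname_major(v), v.age, v.revision);
}
```

Under that rule both the old and the new triple give soname major 15, so the update should not change the soname that dependent binaries link against.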
--- a/src/3rdparty/hwloc/include/hwloc.h
+++ b/src/3rdparty/hwloc/include/hwloc.h
@ -1,6 +1,6 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2021 Inria.  All rights reserved.
+ * Copyright © 2009-2022 Inria.  All rights reserved.
 * Copyright © 2009-2012 Université Bordeaux
 * Copyright © 2009-2020 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@ -29,7 +29,7 @@
 * THAT IS IN THE PDF/HTML THAT IS ***NOT*** IN hwloc.h!
 *
 * There are entire paragraph-length descriptions, discussions, and
- * pretty prictures to explain subtle corner cases, provide concrete
+ * pretty pictures to explain subtle corner cases, provide concrete
 * examples, etc.
 *
 * Please, go read the documentation.  :-)
@ -93,7 +93,7 @@ extern "C" {
 * Two stable releases of the same series usually have the same ::HWLOC_API_VERSION
 * even if their HWLOC_VERSION are different.
 */
-#define HWLOC_API_VERSION 0x00020400
+#define HWLOC_API_VERSION 0x00020800

 /** \brief Indicate at runtime which hwloc API version was used at build time.
 *
@ -346,7 +346,8 @@ typedef enum hwloc_obj_osdev_type_e {
 				  * For instance the "eth0" interface on Linux. */
  HWLOC_OBJ_OSDEV_OPENFABRICS,	/**< \brief Operating system openfabrics device.
 				  * For instance the "mlx4_0" InfiniBand HCA,
-				  * or "hfi1_0" Omni-Path interface on Linux. */
+				  * "hfi1_0" Omni-Path interface,
+				  * or "bxi0" Atos/Bull BXI HCA on Linux. */
  HWLOC_OBJ_OSDEV_DMA,		/**< \brief Operating system dma engine device.
 				  * For instance the "dma0chan0" DMA channel on Linux. */
  HWLOC_OBJ_OSDEV_COPROC	/**< \brief Operating system co-processor device.
@ -516,7 +517,7 @@ struct hwloc_obj {
                                          * objects).
                                          *
                                          * If the ::HWLOC_TOPOLOGY_FLAG_INCLUDE_DISALLOWED configuration flag is set,
-                                          * some of these CPUs may not be allowed for binding,
+                                          * some of these CPUs may be online but not allowed for binding,
                                          * see hwloc_topology_get_allowed_cpuset().
                                          *
 					  * \note All objects have non-NULL CPU and node sets except Misc and I/O objects.
@ -548,7 +549,7 @@ struct hwloc_obj {
                                          * nodes more precisely.
                                          *
                                          * If the ::HWLOC_TOPOLOGY_FLAG_INCLUDE_DISALLOWED configuration flag is set,
-                                          * some of these nodes may not be allowed for allocation,
+                                          * some of these nodes may be online but not allowed for allocation,
                                          * see hwloc_topology_get_allowed_nodeset().
                                          *
                                          * If there are no NUMA nodes in the machine, all the memory is close to this
@ -641,7 +642,7 @@ union hwloc_obj_attr_u {
    unsigned char revision;
    float linkspeed; /* in GB/s */
  } pcidev;
-  /** \brief Bridge specific Object Attribues */
+  /** \brief Bridge specific Object Attributes */
  struct hwloc_bridge_attr_s {
    union {
      struct hwloc_pcidev_attr_s pci;
@ -970,7 +971,7 @@ HWLOC_DECLSPEC const char * hwloc_obj_type_string (hwloc_obj_type_t type) __hwlo
 *
 * If \p size is 0, \p string may safely be \c NULL.
 *
- * \return the number of character that were actually written if not truncating,
+ * \return the number of characters that were actually written if not truncating,
 * or that would have been written (not including the ending \\0).
 */
 HWLOC_DECLSPEC int hwloc_obj_type_snprintf(char * __hwloc_restrict string, size_t size,
@ -985,7 +986,7 @@ HWLOC_DECLSPEC int hwloc_obj_type_snprintf(char * __hwloc_restrict string, size_
 *
 * If \p size is 0, \p string may safely be \c NULL.
 *
- * \return the number of character that were actually written if not truncating,
+ * \return the number of characters that were actually written if not truncating,
 * or that would have been written (not including the ending \\0).
 */
 HWLOC_DECLSPEC int hwloc_obj_attr_snprintf(char * __hwloc_restrict string, size_t size,
@ -1088,7 +1089,7 @@ HWLOC_DECLSPEC int hwloc_obj_add_info(hwloc_obj_t obj, const char *name, const c
 *
 * Some operating systems only support binding threads or processes to a single PU.
 * Others allow binding to larger sets such as entire Cores or Packages or
- * even random sets of invididual PUs. In such operating system, the scheduler
+ * even random sets of individual PUs. In such operating systems, the scheduler
 * is free to run the task on one of these PU, then migrate it to another PU, etc.
 * It is often useful to call hwloc_bitmap_singlify() on the target CPU set before
 * passing it to the binding function to avoid these expensive migrations.
@ -1166,7 +1167,7 @@ typedef enum {
   * CPUs are idle, operating systems may execute the thread/process
   * on those other CPUs instead of the designated CPUs, to let them
   * progress anyway.  Strict binding means that the thread/process
-   * will _never_ execute on other cpus than the designated CPUs, even
+   * will _never_ execute on other CPUs than the designated CPUs, even
   * when those are busy with other tasks and other CPUs are idle.
   *
   * \note Depending on the operating system, strict binding may not
@ -1203,7 +1204,7 @@ typedef enum {
  HWLOC_CPUBIND_NOMEMBIND = (1<<3)
 } hwloc_cpubind_flags_t;

-/** \brief Bind current process or thread on cpus given in physical bitmap \p set.
+/** \brief Bind current process or thread on CPUs given in physical bitmap \p set.
 *
 * \return -1 with errno set to ENOSYS if the action is not supported
 * \return -1 with errno set to EXDEV if the binding cannot be enforced
@ -1212,12 +1213,13 @@ HWLOC_DECLSPEC int hwloc_set_cpubind(hwloc_topology_t topology, hwloc_const_cpus

 /** \brief Get current process or thread binding.
 *
- * Writes into \p set the physical cpuset which the process or thread (according to \e
- * flags) was last bound to.
+ * The CPU-set \p set (previously allocated by the caller)
+ * is filled with the list of PUs which the process or
+ * thread (according to \e flags) was last bound to.
 */
 HWLOC_DECLSPEC int hwloc_get_cpubind(hwloc_topology_t topology, hwloc_cpuset_t set, int flags);

-/** \brief Bind a process \p pid on cpus given in physical bitmap \p set.
+/** \brief Bind a process \p pid on CPUs given in physical bitmap \p set.
 *
 * \note \p hwloc_pid_t is \p pid_t on Unix platforms,
 * and \p HANDLE on native Windows platforms.
@ -1231,6 +1233,10 @@ HWLOC_DECLSPEC int hwloc_get_cpubind(hwloc_topology_t topology, hwloc_cpuset_t s
 HWLOC_DECLSPEC int hwloc_set_proc_cpubind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_const_cpuset_t set, int flags);

 /** \brief Get the current physical binding of process \p pid.
+ *
+ * The CPU-set \p set (previously allocated by the caller)
+ * is filled with the list of PUs which the process
+ * was last bound to.
 *
 * \note \p hwloc_pid_t is \p pid_t on Unix platforms,
 * and \p HANDLE on native Windows platforms.
@ -1244,7 +1250,7 @@ HWLOC_DECLSPEC int hwloc_set_proc_cpubind(hwloc_topology_t topology, hwloc_pid_t
 HWLOC_DECLSPEC int hwloc_get_proc_cpubind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_cpuset_t set, int flags);

 #ifdef hwloc_thread_t
-/** \brief Bind a thread \p thread on cpus given in physical bitmap \p set.
+/** \brief Bind a thread \p thread on CPUs given in physical bitmap \p set.
 *
 * \note \p hwloc_thread_t is \p pthread_t on Unix platforms,
 * and \p HANDLE on native Windows platforms.
@ -1256,6 +1262,10 @@ HWLOC_DECLSPEC int hwloc_set_thread_cpubind(hwloc_topology_t topology, hwloc_thr

 #ifdef hwloc_thread_t
 /** \brief Get the current physical binding of thread \p tid.
+ *
+ * The CPU-set \p set (previously allocated by the caller)
+ * is filled with the list of PUs which the thread
+ * was last bound to.
 *
 * \note \p hwloc_thread_t is \p pthread_t on Unix platforms,
 * and \p HANDLE on native Windows platforms.
@ -1266,6 +1276,10 @@ HWLOC_DECLSPEC int hwloc_get_thread_cpubind(hwloc_topology_t topology, hwloc_thr
 #endif

 /** \brief Get the last physical CPU where the current process or thread ran.
+ *
+ * The CPU-set \p set (previously allocated by the caller)
+ * is filled with the list of PUs which the process or
+ * thread (according to \e flags) last ran on.
 *
 * The operating system may move some tasks from one processor
 * to another at any time according to their binding,
@ -1281,6 +1295,10 @@ HWLOC_DECLSPEC int hwloc_get_thread_cpubind(hwloc_topology_t topology, hwloc_thr
 HWLOC_DECLSPEC int hwloc_get_last_cpu_location(hwloc_topology_t topology, hwloc_cpuset_t set, int flags);

 /** \brief Get the last physical CPU where a process ran.
+ *
+ * The CPU-set \p set (previously allocated by the caller)
+ * is filled with the list of PUs which the process
+ * last ran on.
 *
 * The operating system may move some tasks from one processor
 * to another at any time according to their binding,
@ -1511,6 +1529,9 @@ HWLOC_DECLSPEC int hwloc_set_membind(hwloc_topology_t topology, hwloc_const_bitm
 /** \brief Query the default memory binding policy and physical locality of the
 * current process or thread.
 *
+ * The bitmap \p set (previously allocated by the caller)
+ * is filled with the process or thread memory binding.
+ *
 * This function has two output parameters: \p set and \p policy.
 * The values returned in these parameters depend on both the \p flags
 * passed in and the current memory binding policies and nodesets in
@ -1571,6 +1592,9 @@ HWLOC_DECLSPEC int hwloc_set_proc_membind(hwloc_topology_t topology, hwloc_pid_t
 /** \brief Query the default memory binding policy and physical locality of the
 * specified process.
 *
+ * The bitmap \p set (previously allocated by the caller)
+ * is filled with the process memory binding.
+ *
 * This function has two output parameters: \p set and \p policy.
 * The values returned in these parameters depend on both the \p flags
 * passed in and the current memory binding policies and nodesets in
@ -1624,6 +1648,9 @@ HWLOC_DECLSPEC int hwloc_set_area_membind(hwloc_topology_t topology, const void
 /** \brief Query the CPUs near the physical NUMA node(s) and binding policy of
 * the memory identified by (\p addr, \p len ).
 *
+ * The bitmap \p set (previously allocated by the caller)
+ * is filled with the memory area binding.
+ *
 * This function has two output parameters: \p set and \p policy.
 * The values returned in these parameters depend on both the \p flags
 * passed in and the memory binding policies and nodesets of the pages
@ -1652,7 +1679,8 @@ HWLOC_DECLSPEC int hwloc_get_area_membind(hwloc_topology_t topology, const void

 /** \brief Get the NUMA nodes where memory identified by (\p addr, \p len ) is physically allocated.
 *
- * Fills \p set according to the NUMA nodes where the memory area pages
+ * The bitmap \p set (previously allocated by the caller)
+ * is filled according to the NUMA nodes where the memory area pages
 * are physically allocated. If no page is actually allocated yet,
 * \p set may be empty.
 *
@ -1698,9 +1726,12 @@ HWLOC_DECLSPEC void *hwloc_alloc_membind(hwloc_topology_t topology, size_t len,

 /** \brief Allocate some memory on NUMA memory nodes specified by \p set
 *
- * This is similar to hwloc_alloc_membind_nodeset() except that it is allowed to change
- * the current memory binding policy, thus providing more binding support, at
- * the expense of changing the current state.
+ * First, try to allocate properly with hwloc_alloc_membind().
+ * On failure, the current process or thread memory binding policy
+ * is changed with hwloc_set_membind() before allocating memory.
+ * Thus this function works in more cases, at the expense of changing
+ * the current state (possibly affecting future allocations that
+ * would not specify any policy).
 *
 * If ::HWLOC_MEMBIND_BYNODESET is specified, set is considered a nodeset.
 * Otherwise it's a cpuset.
@ -1883,8 +1914,9 @@ HWLOC_DECLSPEC int hwloc_topology_set_components(hwloc_topology_t __hwloc_restri
 enum hwloc_topology_flags_e {
 /** \brief Detect the whole system, ignore reservations, include disallowed objects.
   *
-   * Gather all resources, even if some were disabled by the administrator.
+   * Gather all online resources, even if some were disabled by the administrator.
   * For instance, ignore Linux Cgroup/Cpusets and gather all processors and memory nodes.
+   * However offline PUs and NUMA nodes are still ignored.
   *
   * When this flag is not set, PUs and NUMA nodes that are disallowed are not added to the topology.
   * Parent objects (package, core, cache, etc.) are added only if some of their children are allowed.
@ -1966,17 +1998,100 @@ enum hwloc_topology_flags_e {
   * hwloc and machine support.
   *
   */
-  HWLOC_TOPOLOGY_FLAG_IMPORT_SUPPORT = (1UL<<3)
+  HWLOC_TOPOLOGY_FLAG_IMPORT_SUPPORT = (1UL<<3),
+
+  /** \brief Do not consider resources outside of the process CPU binding.
+   *
+   * If the binding of the process is limited to a subset of cores,
+   * ignore the other cores during discovery.
+   *
+   * The resulting topology is identical to what a call to hwloc_topology_restrict()
+   * would generate, but this flag also prevents hwloc from ever touching other
+   * resources during the discovery.
+   *
+   * This flag especially tells the x86 backend to never temporarily
+   * rebind a thread on any excluded core. This is useful on Windows
+   * because such temporary rebinding can change the process binding.
+   * Another use-case is to avoid cores that would not be able to
+   * perform the hwloc discovery anytime soon because they are busy
+   * executing some high-priority real-time tasks.
+   *
+   * If process CPU binding is not supported,
+   * the thread CPU binding is considered instead if supported,
+   * or the flag is ignored.
+   *
+   * This flag requires ::HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM as well
+   * since binding support is required.
+   */
+  HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING = (1UL<<4),
+
+  /** \brief Do not consider resources outside of the process memory binding.
+   *
+   * If the binding of the process is limited to a subset of NUMA nodes,
+   * ignore the other NUMA nodes during discovery.
+   *
+   * The resulting topology is identical to what a call to hwloc_topology_restrict()
+   * would generate, but this flag also prevents hwloc from ever touching other
+   * resources during the discovery.
+   *
+   * This flag is meant to be used together with
+   * ::HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING when both cores
+   * and NUMA nodes should be ignored outside of the process binding.
+   *
+   * If process memory binding is not supported,
+   * the thread memory binding is considered instead if supported,
+   * or the flag is ignored.
+   *
+   * This flag requires ::HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM as well
+   * since binding support is required.
+   */
+  HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_MEMBINDING = (1UL<<5),
+
+  /** \brief Do not ever modify the process or thread binding during discovery.
+   *
+   * This flag disables all hwloc discovery steps that require a change of
+   * the process or thread binding. This currently only affects the x86
+   * backend which gets entirely disabled.
+   *
+   * This is useful when hwloc_topology_load() is called while the
+   * application also creates additional threads or modifies the binding.
+   *
+   * This flag is also a strict way to make sure the process binding will
+   * not change due to thread binding changes on Windows
+   * (see ::HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING).
+   */
+  HWLOC_TOPOLOGY_FLAG_DONT_CHANGE_BINDING = (1UL<<6),
+
+  /** \brief Ignore distances.
+   *
+   * Ignore distance information from the operating systems (and from XML)
+   * and hence do not use distances for grouping.
+   */
+  HWLOC_TOPOLOGY_FLAG_NO_DISTANCES = (1UL<<7),
+
+  /** \brief Ignore memory attributes.
+   *
+   * Ignore memory attributes from the operating systems (and from XML).
+   */
+  HWLOC_TOPOLOGY_FLAG_NO_MEMATTRS = (1UL<<8),
+
+  /** \brief Ignore CPU Kinds.
+   *
+   * Ignore CPU kind information from the operating systems (and from XML).
+   */
+  HWLOC_TOPOLOGY_FLAG_NO_CPUKINDS = (1UL<<9)
 };

 /** \brief Set OR'ed flags to non-yet-loaded topology.
 *
 * Set a OR'ed set of ::hwloc_topology_flags_e onto a topology that was not yet loaded.
 *
- * If this function is called multiple times, the last invokation will erase
+ * If this function is called multiple times, the last invocation will erase
 * and replace the set of flags that was previously set.
 *
- * The flags set in a topology may be retrieved with hwloc_topology_get_flags()
+ * By default, no flags are set (\c 0).
+ *
+ * The flags set in a topology may be retrieved with hwloc_topology_get_flags().
 */
 HWLOC_DECLSPEC int hwloc_topology_set_flags (hwloc_topology_t topology, unsigned long flags);

@ -1984,6 +2099,9 @@ HWLOC_DECLSPEC int hwloc_topology_set_flags (hwloc_topology_t topology, unsigned
 *
 * Get the OR'ed set of ::hwloc_topology_flags_e of a topology.
 *
+ * If hwloc_topology_set_flags() was not called earlier,
+ * no flags are set (\c 0 is returned).
+ *
 * \return the flags previously set with hwloc_topology_set_flags().
 */
 HWLOC_DECLSPEC unsigned long hwloc_topology_get_flags (hwloc_topology_t topology);
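The hwloc.h changes above note that each call to `hwloc_topology_set_flags()` replaces the previously set flags, so all wanted flags have to be OR'ed into one value first. The sketch below builds such a value from the new binding-related flags. It uses local copies of the bit values shown in the diff so that it compiles without hwloc installed; the `SKETCH_*` names and the helper function are hypothetical.

```c
#include <assert.h>

/* Local copies of the bit values shown in the hwloc.h diff above.
 * Real code would include <hwloc.h> and use the
 * HWLOC_TOPOLOGY_FLAG_* names instead of these hypothetical ones. */
enum sketch_topology_flags {
    SKETCH_FLAG_RESTRICT_TO_CPUBINDING = (1UL << 4),
    SKETCH_FLAG_RESTRICT_TO_MEMBINDING = (1UL << 5),
    SKETCH_FLAG_DONT_CHANGE_BINDING    = (1UL << 6)
};

/* Combine the requested options into a single flags value.
 * Starting from 0 matches the documented default (no flags set). */
static unsigned long sketch_discovery_flags(int restrict_cpu,
                                            int restrict_mem,
                                            int never_rebind)
{
    unsigned long flags = 0;
    if (restrict_cpu)
        flags |= SKETCH_FLAG_RESTRICT_TO_CPUBINDING;
    if (restrict_mem)
        flags |= SKETCH_FLAG_RESTRICT_TO_MEMBINDING;
    if (never_rebind)
        flags |= SKETCH_FLAG_DONT_CHANGE_BINDING;
    return flags;
}
```

With hwloc available, the resulting value would be passed once to `hwloc_topology_set_flags(topology, flags)` before `hwloc_topology_load()`, because the flags apply only to a topology that is not yet loaded.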
--- a/src/3rdparty/hwloc/include/hwloc/autogen/config.h
+++ b/src/3rdparty/hwloc/include/hwloc/autogen/config.h
@ -1,6 +1,6 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2022 Inria.  All rights reserved.
 * Copyright © 2009-2012 Université Bordeaux
 * Copyright © 2009-2011 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@ -11,10 +11,10 @@
 #ifndef HWLOC_CONFIG_H
 #define HWLOC_CONFIG_H

-#define HWLOC_VERSION "2.4.1"
+#define HWLOC_VERSION "2.9.0"
 #define HWLOC_VERSION_MAJOR 2
-#define HWLOC_VERSION_MINOR 4
-#define HWLOC_VERSION_RELEASE 1
+#define HWLOC_VERSION_MINOR 9
+#define HWLOC_VERSION_RELEASE 0
 #define HWLOC_VERSION_GREEK ""

 #define __hwloc_restrict
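The release version in config.h moves to 2.9.0, while `HWLOC_API_VERSION` in hwloc.h moves only to `0x00020800`. The two can differ because the API version changes only when a release modifies the API. The sketch below decodes such a value, assuming the `(X<<16)+(Y<<8)+Z` encoding described in the hwloc documentation. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Decoded form of an HWLOC_API_VERSION-style value.
 * Hypothetical helper type, not part of hwloc. */
struct sketch_api_version {
    unsigned major;
    unsigned minor;
    unsigned release;
};

/* Split a value encoded as (X<<16)+(Y<<8)+Z into its three fields. */
static struct sketch_api_version sketch_decode_api_version(unsigned long v)
{
    struct sketch_api_version out;
    out.major   = (unsigned)(v >> 16);
    out.minor   = (unsigned)((v >> 8) & 0xffUL);
    out.release = (unsigned)(v & 0xffUL);
    return out;
}
```

In practice applications usually compare the macro directly at build time, for example `#if HWLOC_API_VERSION >= 0x00020500`, to guard code that depends on interfaces added in a given API revision.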
--- a/src/3rdparty/hwloc/include/hwloc/bitmap.h
+++ b/src/3rdparty/hwloc/include/hwloc/bitmap.h
@ -1,6 +1,6 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2022 Inria.  All rights reserved.
 * Copyright © 2009-2012 Université Bordeaux
 * Copyright © 2009-2011 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@ -112,7 +112,7 @@ HWLOC_DECLSPEC int hwloc_bitmap_copy(hwloc_bitmap_t dst, hwloc_const_bitmap_t sr
 *
 * If \p buflen is 0, \p buf may safely be \c NULL.
 *
- * \return the number of character that were actually written if not truncating,
+ * \return the number of characters that were actually written if not truncating,
 * or that would have been written (not including the ending \\0).
 */
 HWLOC_DECLSPEC int hwloc_bitmap_snprintf(char * __hwloc_restrict buf, size_t buflen, hwloc_const_bitmap_t bitmap);
@ -137,7 +137,7 @@ HWLOC_DECLSPEC int hwloc_bitmap_sscanf(hwloc_bitmap_t bitmap, const char * __hwl
 *
 * If \p buflen is 0, \p buf may safely be \c NULL.
 *
- * \return the number of character that were actually written if not truncating,
+ * \return the number of characters that were actually written if not truncating,
 * or that would have been written (not including the ending \\0).
 */
 HWLOC_DECLSPEC int hwloc_bitmap_list_snprintf(char * __hwloc_restrict buf, size_t buflen, hwloc_const_bitmap_t bitmap);
@ -161,7 +161,7 @@ HWLOC_DECLSPEC int hwloc_bitmap_list_sscanf(hwloc_bitmap_t bitmap, const char *
 *
 * If \p buflen is 0, \p buf may safely be \c NULL.
 *
- * \return the number of character that were actually written if not truncating,
+ * \return the number of characters that were actually written if not truncating,
 * or that would have been written (not including the ending \\0).
 */
 HWLOC_DECLSPEC int hwloc_bitmap_taskset_snprintf(char * __hwloc_restrict buf, size_t buflen, hwloc_const_bitmap_t bitmap);
@ -357,11 +357,11 @@ HWLOC_DECLSPEC int hwloc_bitmap_last_unset(hwloc_const_bitmap_t bitmap) __hwloc_
 * The loop must start with hwloc_bitmap_foreach_begin() and end
 * with hwloc_bitmap_foreach_end() followed by a terminating ';'.
 *
- * \p index is the loop variable; it should be an unsigned int.  The
- * first iteration will set \p index to the lowest index in the bitmap.
+ * \p id is the loop variable; it should be an unsigned int.  The
+ * first iteration will set \p id to the lowest index in the bitmap.
 * Successive iterations will iterate through, in order, all remaining
 * indexes set in the bitmap.  To be specific: each iteration will return a
- * value for \p index such that hwloc_bitmap_isset(bitmap, index) is true.
+ * value for \p id such that hwloc_bitmap_isset(bitmap, id) is true.
 *
 * The assert prevents the loop from being infinite if the bitmap is infinitely set.
 *
--- a/src/3rdparty/hwloc/include/hwloc/cpukinds.h
+++ b/src/3rdparty/hwloc/include/hwloc/cpukinds.h
@ -1,5 +1,5 @@
 /*
- * Copyright © 2020 Inria.  All rights reserved.
+ * Copyright © 2020-2021 Inria.  All rights reserved.
 * See COPYING in top-level directory.
 */

@ -42,18 +42,23 @@ extern "C" {
 * (for instance the "CoreType" and "FrequencyMaxMHz",
 *  see \ref topoattrs_cpukinds).
 *
- * A higher efficiency value means intrinsic greater performance
+ * A higher efficiency value means greater intrinsic performance
 * (and possibly less performance/power efficiency).
- * Kinds with lower efficiency are ranked first:
+ * Kinds with lower efficiency values are ranked first:
 * Passing 0 as \p kind_index to hwloc_cpukinds_get_info() will
- * return information about the less efficient CPU kind.
+ * return information about the CPU kind with lower performance
+ * but higher energy-efficiency.
+ * Higher \p kind_index values would rather return information
+ * about power-hungry high-performance cores.
 *
- * When available, efficiency values are gathered from the operating
- * system (when \p cpukind_efficiency is set in the
- * struct hwloc_topology_discovery_support array, only on Windows 10 for now).
- * Otherwise hwloc tries to compute efficiencies
- * by comparing CPU kinds using frequencies (on ARM),
- * or core types and frequencies (on other architectures).
+ * When available, efficiency values are gathered from the operating system.
+ * If so, \p cpukind_efficiency is set in the struct hwloc_topology_discovery_support array.
+ * This is currently available on Windows 10, Mac OS X (Darwin),
+ * and on some Linux platforms where core "capacity" is exposed in sysfs.
+ *
+ * If the operating system does not expose core efficiencies natively,
+ * hwloc tries to compute efficiencies by comparing CPU kinds using
+ * frequencies (on ARM), or core types and frequencies (on other architectures).
 * The environment variable HWLOC_CPUKINDS_RANKING may be used
 * to change this heuristics, see \ref envvar.
 *
--- a/src/3rdparty/hwloc/include/hwloc/cuda.h
+++ b/src/3rdparty/hwloc/include/hwloc/cuda.h
@ -1,5 +1,5 @@
 /*
- * Copyright © 2010-2020 Inria.  All rights reserved.
+ * Copyright © 2010-2021 Inria.  All rights reserved.
 * Copyright © 2010-2011 Université Bordeaux
 * Copyright © 2011 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@ -75,7 +75,7 @@ hwloc_cuda_get_device_pci_ids(hwloc_topology_t topology __hwloc_attribute_unused
 /** \brief Get the CPU set of processors that are physically
 * close to device \p cudevice.
 *
- * Return the CPU set describing the locality of the CUDA device \p cudevice.
+ * Store in \p set the CPU-set describing the locality of the CUDA device \p cudevice.
 *
 * Topology \p topology and device \p cudevice must match the local machine.
 * I/O devices detection and the CUDA component are not needed in the topology.
@ -120,8 +120,8 @@ hwloc_cuda_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,
 /** \brief Get the hwloc PCI device object corresponding to the
 * CUDA device \p cudevice.
 *
- * Return the PCI device object describing the CUDA device \p cudevice.
- * Return NULL if there is none.
+ * \return The hwloc PCI device object describing the CUDA device \p cudevice.
+ * \return \c NULL if none could be found.
 *
 * Topology \p topology and device \p cudevice must match the local machine.
 * I/O devices detection must be enabled in topology \p topology.
@ -140,8 +140,8 @@ hwloc_cuda_get_device_pcidev(hwloc_topology_t topology, CUdevice cudevice)

 /** \brief Get the hwloc OS device object corresponding to CUDA device \p cudevice.
 *
- * Return the hwloc OS device object that describes the given
- * CUDA device \p cudevice. Return NULL if there is none.
+ * \return The hwloc OS device object that describes the given CUDA device \p cudevice.
+ * \return \c NULL if none could be found.
 *
 * Topology \p topology and device \p cudevice must match the local machine.
 * I/O devices detection and the CUDA component must be enabled in the topology.
@ -183,8 +183,8 @@ hwloc_cuda_get_device_osdev(hwloc_topology_t topology, CUdevice cudevice)
 /** \brief Get the hwloc OS device object corresponding to the
 * CUDA device whose index is \p idx.
 *
- * Return the OS device object describing the CUDA device whose
- * index is \p idx. Return NULL if there is none.
+ * \return The hwloc OS device object describing the CUDA device whose index is \p idx.
+ * \return \c NULL if none could be found.
 *
 * The topology \p topology does not necessarily have to match the current
 * machine. For instance the topology may be an XML import of a remote host.
--- a/src/3rdparty/hwloc/include/hwloc/cudart.h
+++ b/src/3rdparty/hwloc/include/hwloc/cudart.h
@ -1,5 +1,5 @@
 /*
- * Copyright © 2010-2020 Inria.  All rights reserved.
+ * Copyright © 2010-2021 Inria.  All rights reserved.
 * Copyright © 2010-2011 Université Bordeaux
 * Copyright © 2011 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@ -72,7 +72,7 @@ hwloc_cudart_get_device_pci_ids(hwloc_topology_t topology __hwloc_attribute_unus
 /** \brief Get the CPU set of processors that are physically
 * close to device \p idx.
 *
- * Return the CPU set describing the locality of the CUDA device
+ * Store in \p set the CPU-set describing the locality of the CUDA device
 * whose index is \p idx.
 *
 * Topology \p topology and device \p idx must match the local machine.
@ -117,8 +117,8 @@ hwloc_cudart_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unuse
 /** \brief Get the hwloc PCI device object corresponding to the
 * CUDA device whose index is \p idx.
 *
- * Return the PCI device object describing the CUDA device whose
- * index is \p idx. Return NULL if there is none.
+ * \return The hwloc PCI device object describing the CUDA device whose index is \p idx.
+ * \return \c NULL if none could be found.
 *
 * Topology \p topology and device \p idx must match the local machine.
 * I/O devices detection must be enabled in topology \p topology.
@ -138,8 +138,8 @@ hwloc_cudart_get_device_pcidev(hwloc_topology_t topology, int idx)
 /** \brief Get the hwloc OS device object corresponding to the
 * CUDA device whose index is \p idx.
 *
- * Return the OS device object describing the CUDA device whose
- * index is \p idx. Return NULL if there is none.
+ * \return The hwloc OS device object describing the CUDA device whose index is \p idx.
+ * \return \c NULL if none could be found.
 *
 * The topology \p topology does not necessarily have to match the current
 * machine. For instance the topology may be an XML import of a remote host.
--- a/src/3rdparty/hwloc/include/hwloc/deprecated.h
+++ b/src/3rdparty/hwloc/include/hwloc/deprecated.h
@ -1,6 +1,6 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2018 Inria.  All rights reserved.
+ * Copyright © 2009-2022 Inria.  All rights reserved.
 * Copyright © 2009-2012 Université Bordeaux
 * Copyright © 2009-2010 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@ -30,6 +30,15 @@ extern "C" {
 /* backward compat with v1.10 before Node->NUMANode clarification */
 #define HWLOC_OBJ_NODE HWLOC_OBJ_NUMANODE

+/** \brief Add a distances structure.
+ *
+ * Superseded by hwloc_distances_add_create()+hwloc_distances_add_values()+hwloc_distances_add_commit()
+ * in v2.5.
+ */
+HWLOC_DECLSPEC int hwloc_distances_add(hwloc_topology_t topology,
+				       unsigned nbobjs, hwloc_obj_t *objs, hwloc_uint64_t *values,
+				       unsigned long kind, unsigned long flags) __hwloc_attribute_deprecated;
+
 /** \brief Insert a misc object by parent.
 *
 * Identical to hwloc_topology_insert_misc_object().
@ -46,7 +55,7 @@ hwloc_topology_insert_misc_object_by_parent(hwloc_topology_t topology, hwloc_obj
 *
 * If \p size is 0, \p string may safely be \c NULL.
 *
- * \return the number of character that were actually written if not truncating,
+ * \return the number of characters that were actually written if not truncating,
 * or that would have been written (not including the ending \\0).
 */
 static __hwloc_inline int
--- a/src/3rdparty/hwloc/include/hwloc/distances.h
+++ b/src/3rdparty/hwloc/include/hwloc/distances.h
@ -1,5 +1,5 @@
 /*
- * Copyright © 2010-2020 Inria.  All rights reserved.
+ * Copyright © 2010-2022 Inria.  All rights reserved.
 * See COPYING in top-level directory.
 */

@ -35,9 +35,20 @@ extern "C" {
 * from a core in another node.
 * The corresponding kind is ::HWLOC_DISTANCES_KIND_FROM_OS | ::HWLOC_DISTANCES_KIND_FROM_USER.
 * The name of this distances structure is "NUMALatency".
+ * Other distance structures include "XGMIBandwidth", "XGMIHops",
+ * "XeLinkBandwidth" and "NVLinkBandwidth".
 *
 * The matrix may also contain bandwidths between random sets of objects,
 * possibly provided by the user, as specified in the \p kind attribute.
+ *
+ * Pointers \p objs and \p values should not be replaced, reallocated, freed, etc.
+ * However callers are allowed to modify \p kind as well as the contents
+ * of \p objs and \p values arrays.
+ * For instance, if there is a single NUMA node per Package,
+ * hwloc_get_obj_with_same_locality() may be used to convert between them
+ * and replace NUMA nodes in the \p objs array with the corresponding Packages.
+ * See also hwloc_distances_transform() for applying some transformations
+ * to the structure.
 */
 struct hwloc_distances_s {
  unsigned nbobjs;		/**< \brief Number of objects described by the distance matrix. */
@ -91,6 +102,8 @@ enum hwloc_distances_kind_e {
  HWLOC_DISTANCES_KIND_MEANS_BANDWIDTH = (1UL<<3),

  /** \brief This distances structure covers objects of different types.
+   * This may apply to the "NVLinkBandwidth" structure in presence
+   * of a NVSwitch or POWER processor NVLink port.
   * \hideinitializer
   */
  HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES = (1UL<<4)
@ -147,6 +160,8 @@ hwloc_distances_get_by_type(hwloc_topology_t topology, hwloc_obj_type_t type,
 * Usually only one distances structure may match a given name.
 *
 * The name of the most common structure is "NUMALatency".
+ * Others include "XGMIBandwidth", "XGMIHops", "XeLinkBandwidth",
+ * and "NVLinkBandwidth".
 */
 HWLOC_DECLSPEC int
 hwloc_distances_get_by_name(hwloc_topology_t topology, const char *name,
@ -168,6 +183,85 @@ hwloc_distances_get_name(hwloc_topology_t topology, struct hwloc_distances_s *di
 HWLOC_DECLSPEC void
 hwloc_distances_release(hwloc_topology_t topology, struct hwloc_distances_s *distances);

+/** \brief Transformations of distances structures. */
+enum hwloc_distances_transform_e {
+  /** \brief Remove \c NULL objects from the distances structure.
+   *
+   * Every object that was replaced with \c NULL in the \p objs array
+   * is removed and the \p values array is updated accordingly.
+   *
+   * At least \c 2 objects must remain, otherwise hwloc_distances_transform()
+   * will return \c -1 with \p errno set to \c EINVAL.
+   *
+   * \p kind will be updated with or without ::HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES
+   * according to the remaining objects.
+   *
+   * \hideinitializer
+   */
+  HWLOC_DISTANCES_TRANSFORM_REMOVE_NULL = 0,
+
+  /** \brief Replace bandwidth values with a number of links.
+   *
+   * Usually all values will be either \c 0 (no link) or \c 1 (one link).
+   * However some matrices could get larger values if some pairs of
+   * peers are connected by different numbers of links.
+   *
+   * Values on the diagonal are set to \c 0.
+   *
+   * This transformation only applies to bandwidth matrices.
+   *
+   * \hideinitializer
+   */
+  HWLOC_DISTANCES_TRANSFORM_LINKS = 1,
+
+  /** \brief Merge switches with multiple ports into a single object.
+   * This currently only applies to NVSwitches where GPUs appear to be connected to
+   * separate switch ports in the NVLinkBandwidth matrix. This transformation will
+   * replace all of them with the same port connected to all GPUs.
+   * Other ports are removed by applying ::HWLOC_DISTANCES_TRANSFORM_REMOVE_NULL internally.
+   * \hideinitializer
+   */
+  HWLOC_DISTANCES_TRANSFORM_MERGE_SWITCH_PORTS = 2,
+
+  /** \brief Apply a transitive closure to the matrix to connect objects across switches.
+   * This currently only applies to GPUs and NVSwitches in the NVLinkBandwidth matrix.
+   * All pairs of GPUs will be reported as directly connected.
+   * \hideinitializer
+   */
+  HWLOC_DISTANCES_TRANSFORM_TRANSITIVE_CLOSURE = 3
+};
+
+/** \brief Apply a transformation to a distances structure.
+ *
+ * Modify a distances structure that was previously obtained with
+ * hwloc_distances_get() or one of its variants.
+ *
+ * This modifies the local copy of the distances structure but does
+ * not modify the distances information stored inside the topology
+ * (retrieved by another call to hwloc_distances_get() or exported to XML).
+ * To do so, one should add a new distances structure with the same
+ * name, kind, objects and values (see \ref hwlocality_distances_add)
+ * and then remove the old one with hwloc_distances_release_remove().
+ *
+ * \p transform must be one of the transformations listed
+ * in ::hwloc_distances_transform_e.
+ *
+ * These transformations may modify the contents of the \p objs or \p values arrays.
+ *
+ * \p transform_attr must be \c NULL for now.
+ *
+ * \p flags must be \c 0 for now.
+ *
+ * \note Objects in distances array \p objs may be directly modified
+ * in place without using hwloc_distances_transform().
+ * One may use hwloc_get_obj_with_same_locality() to easily convert
+ * between similar objects of different types.
+ */
+HWLOC_DECLSPEC int hwloc_distances_transform(hwloc_topology_t topology, struct hwloc_distances_s *distances,
+                                             enum hwloc_distances_transform_e transform,
+                                             void *transform_attr,
+                                             unsigned long flags);
+
 /** @} */


@ -215,13 +309,84 @@ hwloc_distances_obj_pair_values(struct hwloc_distances_s *distances,



-/** \defgroup hwlocality_distances_add Add or remove distances between objects
+/** \defgroup hwlocality_distances_add Add distances between objects
+ *
+ * The usual way to add distances is:
+ * \code
+ * hwloc_distances_add_handle_t handle;
+ * int err = -1;
+ * handle = hwloc_distances_add_create(topology, "name", kind, 0);
+ * if (handle) {
+ *   err = hwloc_distances_add_values(topology, handle, nbobjs, objs, values, 0);
+ *   if (!err)
+ *     err = hwloc_distances_add_commit(topology, handle, flags);
+ * }
+ * \endcode
+ * If \p err is \c 0 at the end, then addition was successful.
+ *
 * @{
 */

+/** \brief Handle to a new distances structure during its addition to the topology. */
+typedef void * hwloc_distances_add_handle_t;
+
+/** \brief Create a new empty distances structure.
+ *
+ * Create an empty distances structure
+ * to be filled with hwloc_distances_add_values()
+ * and then committed with hwloc_distances_add_commit().
+ *
+ * Parameter \p name is optional, it may be \c NULL.
+ * Otherwise, it will be copied internally and may later be freed by the caller.
+ *
+ * \p kind specifies the kind of distance as an OR'ed set of ::hwloc_distances_kind_e.
+ * Kind ::HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES will be automatically set
+ * according to objects having different types in hwloc_distances_add_values().
+ *
+ * \p flags must be \c 0 for now.
+ *
+ * \return A hwloc_distances_add_handle_t that should then be passed
+ * to hwloc_distances_add_values() and hwloc_distances_add_commit().
+ *
+ * \return \c NULL on error.
+ */
+HWLOC_DECLSPEC hwloc_distances_add_handle_t
+hwloc_distances_add_create(hwloc_topology_t topology,
+                           const char *name, unsigned long kind,
+                           unsigned long flags);
+
+/** \brief Specify the objects and values in a new empty distances structure.
+ *
+ * Specify the objects and values for a new distances structure
+ * that was returned as a handle by hwloc_distances_add_create().
+ * The structure must then be committed with hwloc_distances_add_commit().
+ *
+ * The number of objects is \p nbobjs and the array of objects is \p objs.
+ * Distance values are stored as a one-dimension array in \p values.
+ * The distance from object i to object j is in slot i*nbobjs+j.
+ *
+ * \p nbobjs must be at least 2.
+ *
+ * Arrays \p objs and \p values will be copied internally,
+ * they may later be freed by the caller.
+ *
+ * On error, the temporary distances structure and its content are destroyed.
+ *
+ * \p flags must be \c 0 for now.
+ *
+ * \return \c 0 on success.
+ * \return \c -1 on error.
+ */
+HWLOC_DECLSPEC int hwloc_distances_add_values(hwloc_topology_t topology,
+                                              hwloc_distances_add_handle_t handle,
+                                              unsigned nbobjs, hwloc_obj_t *objs,
+                                              hwloc_uint64_t *values,
+                                              unsigned long flags);
+
 /** \brief Flags for adding a new distances to a topology. */
 enum hwloc_distances_add_flag_e {
  /** \brief Try to group objects based on the newly provided distance information.
+   * This is ignored for distances between objects of different types.
   * \hideinitializer
   */
  HWLOC_DISTANCES_ADD_FLAG_GROUP = (1UL<<0),
@ -233,23 +398,33 @@ enum hwloc_distances_add_flag_e {
  HWLOC_DISTANCES_ADD_FLAG_GROUP_INACCURATE = (1UL<<1)
 };

-/** \brief Provide a new distance matrix.
+/** \brief Commit a new distances structure.
 *
- * Provide the matrix of distances between a set of objects given by \p nbobjs
- * and the \p objs array. \p nbobjs must be at least 2.
- * The distances are stored as a one-dimension array in \p values.
- * The distance from object i to object j is in slot i*nbobjs+j.
+ * This function finalizes the distances structure and inserts it in the topology.
 *
- * \p kind specifies the kind of distance as a OR'ed set of ::hwloc_distances_kind_e.
- * Kind ::HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES will be automatically added
- * if objects of different types are given.
+ * Parameter \p handle was previously returned by hwloc_distances_add_create().
+ * Then objects and values were specified with hwloc_distances_add_values().
 *
 * \p flags configures the behavior of the function using an optional OR'ed set of
 * ::hwloc_distances_add_flag_e.
+ * It may be used to request the grouping of existing objects based on distances.
+ *
+ * On error, the temporary distances structure and its content are destroyed.
+ *
+ * \return \c 0 on success.
+ * \return \c -1 on error.
+ */
+HWLOC_DECLSPEC int hwloc_distances_add_commit(hwloc_topology_t topology,
+                                              hwloc_distances_add_handle_t handle,
+                                              unsigned long flags);
+
+/** @} */
+
+
+
+/** \defgroup hwlocality_distances_remove Remove distances between objects
+ * @{
 */
-HWLOC_DECLSPEC int hwloc_distances_add(hwloc_topology_t topology,
-				       unsigned nbobjs, hwloc_obj_t *objs, hwloc_uint64_t *values,
-				       unsigned long kind, unsigned long flags);

 /** \brief Remove all distance matrices from a topology.
 *
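Reviewer note on the distances.h changes above: the comments describe a flat layout (the distance from object i to object j is in slot i*nbobjs+j) and a REMOVE_NULL transformation that drops every object replaced with NULL and needs at least 2 objects to remain. The following is a minimal standalone sketch of that layout and of such a compaction. It does not use hwloc and is not hwloc's implementation; the names distance_slot and compact_non_null are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Slot of the distance from object i to object j in a flat
 * nbobjs x nbobjs array, following the i*nbobjs+j rule. */
static size_t distance_slot(unsigned i, unsigned j, unsigned nbobjs)
{
  assert(i < nbobjs && j < nbobjs);
  return (size_t) i * nbobjs + j;
}

/* Hypothetical helper (not an hwloc function): drop NULL entries from
 * objs and the matching rows and columns from values, in place.
 * Returns the new object count, or 0 if fewer than 2 objects would
 * remain, in which case both arrays are left untouched. */
static unsigned compact_non_null(const void **objs, uint64_t *values, unsigned nbobjs)
{
  unsigned i, j, ki, kj, kept = 0;

  for (i = 0; i < nbobjs; i++)
    if (objs[i])
      kept++;
  if (kept < 2)
    return 0;

  /* Destination slots never exceed source slots, so an in-place,
   * row-major copy does not overwrite values that are still unread. */
  ki = 0;
  for (i = 0; i < nbobjs; i++) {
    if (!objs[i])
      continue;
    kj = 0;
    for (j = 0; j < nbobjs; j++) {
      if (!objs[j])
        continue;
      values[(size_t) ki * kept + kj] = values[(size_t) i * nbobjs + j];
      kj++;
    }
    ki++;
  }

  /* Compact the object array the same way. */
  ki = 0;
  for (i = 0; i < nbobjs; i++)
    if (objs[i])
      objs[ki++] = objs[i];

  return kept;
}
```

With three objects where the middle one is NULL, the 3x3 matrix shrinks to the 2x2 matrix made of slots 0, 2, 6 and 8 of the original array. Real code should call hwloc_distances_transform() rather than reimplementing this.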
--- a/src/3rdparty/hwloc/include/hwloc/gl.h
+++ b/src/3rdparty/hwloc/include/hwloc/gl.h
@ -1,6 +1,6 @@
 /*
 * Copyright © 2012 Blue Brain Project, EPFL. All rights reserved.
- * Copyright © 2012-2013 Inria.  All rights reserved.
+ * Copyright © 2012-2021 Inria.  All rights reserved.
 * See COPYING in top-level directory.
 */

@ -39,9 +39,9 @@ extern "C" {
 /** \brief Get the hwloc OS device object corresponding to the
 * OpenGL display given by port and device index.
 *
- * Return the OS device object describing the OpenGL display
+ * \return The hwloc OS device object describing the OpenGL display
 * whose port (server) is \p port and device (screen) is \p device.
- * Return NULL if there is none.
+ * \return \c NULL if none could be found.
 *
 * The topology \p topology does not necessarily have to match the current
 * machine. For instance the topology may be an XML import of a remote host.
@ -70,9 +70,9 @@ hwloc_gl_get_display_osdev_by_port_device(hwloc_topology_t topology,
 /** \brief Get the hwloc OS device object corresponding to the
 * OpenGL display given by name.
 *
- * Return the OS device object describing the OpenGL display
+ * \return The hwloc OS device object describing the OpenGL display
 * whose name is \p name, built as ":port.device" such as ":0.0" .
- * Return NULL if there is none.
+ * \return \c NULL if none could be found.
 *
 * The topology \p topology does not necessarily have to match the current
 * machine. For instance the topology may be an XML import of a remote host.
@ -99,9 +99,10 @@ hwloc_gl_get_display_osdev_by_name(hwloc_topology_t topology,
 /** \brief Get the OpenGL display port and device corresponding
 * to the given hwloc OS object.
 *
- * Return the OpenGL display port (server) in \p port and device (screen)
+ * Retrieves the OpenGL display port (server) in \p port and device (screen)
 * in \p screen that correspond to the given hwloc OS device object.
- * Return \c -1 if there is none.
+ *
+ * \return \c -1 if none could be found.
 *
 * The topology \p topology does not necessarily have to match the current
 * machine. For instance the topology may be an XML import of a remote host.
--- a/src/3rdparty/hwloc/include/hwloc/helper.h
+++ b/src/3rdparty/hwloc/include/hwloc/helper.h
@ -1,6 +1,6 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2022 Inria.  All rights reserved.
 * Copyright © 2009-2012 Université Bordeaux
 * Copyright © 2009-2010 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@ -807,6 +807,49 @@ hwloc_get_obj_below_array_by_type (hwloc_topology_t topology, int nr, hwloc_obj_
  return obj;
 }

+/** \brief Return an object of a different type with the same locality.
+ *
+ * If the source object \p src is a normal or memory type,
+ * this function returns an object of type \p type with same
+ * CPU and node sets, either below or above in the hierarchy.
+ *
+ * If the source object \p src is a PCI or an OS device within a PCI
+ * device, the function may either return that PCI device, or another
+ * OS device in the same PCI parent.
+ * This may for instance be useful for converting OS devices
+ * such as "nvml0" or "rsmi1" used in distance structures into
+ * the PCI device, or the CUDA or OpenCL OS device, that corresponds
+ * to the same physical card.
+ *
+ * If not \c NULL, parameter \p subtype only selects objects whose
+ * subtype attribute exists and is \p subtype (case-insensitively),
+ * for instance "OpenCL" or "CUDA".
+ *
+ * If not \c NULL, parameter \p nameprefix only selects objects whose
+ * name attribute exists and starts with \p nameprefix (case-insensitively),
+ * for instance "rsmi" for matching "rsmi0".
+ *
+ * If multiple objects match, the first one is returned.
+ *
+ * This function will not walk the hierarchy across bridges since
+ * the PCI locality may become different.
+ * This function also cannot convert between normal/memory objects
+ * and I/O or Misc objects.
+ *
+ * \p flags must be \c 0 for now.
+ *
+ * \return An object with identical locality,
+ * matching \p subtype and \p nameprefix if any.
+ *
+ * \return \c NULL if no matching object could be found,
+ * or if the source object and target type are incompatible,
+ * for instance if converting between CPU and I/O objects.
+ */
+HWLOC_DECLSPEC hwloc_obj_t
+hwloc_get_obj_with_same_locality(hwloc_topology_t topology, hwloc_obj_t src,
+                                 hwloc_obj_type_t type, const char *subtype, const char *nameprefix,
+                                 unsigned long flags);
+
 /** @} */


@ -843,9 +886,6 @@ enum hwloc_distrib_flags_e {
 * \p flags should be 0 or a OR'ed set of ::hwloc_distrib_flags_e.
 *
 * \note This function requires the \p roots objects to have a CPU set.
- *
- * \note This function replaces the now deprecated hwloc_distribute()
- * and hwloc_distributev() functions.
 */
 static __hwloc_inline int
 hwloc_distrib(hwloc_topology_t topology,
--- a/src/3rdparty/hwloc/include/hwloc/intel-mic.h
+++ b/src/3rdparty/hwloc/include/hwloc/intel-mic.h
@ -1,136 +0,0 @@
-/*
- * Copyright © 2013-2016 Inria.  All rights reserved.
- * See COPYING in top-level directory.
- */
-
-/** \file
- * \brief Macros to help interaction between hwloc and Intel Xeon Phi (MIC).
- *
- * Applications that use both hwloc and Intel Xeon Phi (MIC) may want to
- * include this file so as to get topology information for MIC devices.
- */
-
-#ifndef HWLOC_INTEL_MIC_H
-#define HWLOC_INTEL_MIC_H
-
-#include "hwloc.h"
-#include "hwloc/autogen/config.h"
-#include "hwloc/helper.h"
-
-#ifdef HWLOC_LINUX_SYS
-#include "hwloc/linux.h"
-
-#include <dirent.h>
-#include <string.h>
-#endif
-
-#include <stdio.h>
-#include <stdlib.h>
-
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-
-/** \defgroup hwlocality_intel_mic Interoperability with Intel Xeon Phi (MIC)
- *
- * This interface offers ways to retrieve topology information about
- * Intel Xeon Phi (MIC) devices.
- *
- * @{
- */
-
-/** \brief Get the CPU set of logical processors that are physically
- * close to MIC device whose index is \p idx.
- *
- * Return the CPU set describing the locality of the MIC device whose index is \p idx.
- *
- * Topology \p topology and device index \p idx must match the local machine.
- * I/O devices detection is not needed in the topology.
- *
- * The function only returns the locality of the device.
- * If more information about the device is needed, OS objects should
- * be used instead, see hwloc_intel_mic_get_device_osdev_by_index().
- *
- * This function is currently only implemented in a meaningful way for
- * Linux; other systems will simply get a full cpuset.
- */
-static __hwloc_inline int
-hwloc_intel_mic_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,
-				  int idx __hwloc_attribute_unused,
-				  hwloc_cpuset_t set)
-{
-#ifdef HWLOC_LINUX_SYS
-	/* If we're on Linux, use the sysfs mechanism to get the local cpus */
-#define HWLOC_INTEL_MIC_DEVICE_SYSFS_PATH_MAX 128
-	char path[HWLOC_INTEL_MIC_DEVICE_SYSFS_PATH_MAX];
-	DIR *sysdir = NULL;
-	struct dirent *dirent;
-	unsigned pcibus, pcidev, pcifunc;
-
-	if (!hwloc_topology_is_thissystem(topology)) {
-		errno = EINVAL;
-		return -1;
-	}
-
-	sprintf(path, "/sys/class/mic/mic%d", idx);
-	sysdir = opendir(path);
-	if (!sysdir)
-		return -1;
-
-	while ((dirent = readdir(sysdir)) != NULL) {
-		if (sscanf(dirent->d_name, "pci_%02x:%02x.%02x", &pcibus, &pcidev, &pcifunc) == 3) {
-			sprintf(path, "/sys/class/mic/mic%d/pci_%02x:%02x.%02x/local_cpus", idx, pcibus, pcidev, pcifunc);
-			if (hwloc_linux_read_path_as_cpumask(path, set) < 0
-			    || hwloc_bitmap_iszero(set))
-				hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));
-			break;
-		}
-	}
-
-	closedir(sysdir);
-#else
-	/* Non-Linux systems simply get a full cpuset */
-	hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));
-#endif
-	return 0;
-}
-
-/** \brief Get the hwloc OS device object corresponding to the
- * MIC device for the given index.
- *
- * Return the OS device object describing the MIC device whose index is \p idx.
- * Return NULL if there is none.
- *
- * The topology \p topology does not necessarily have to match the current
- * machine. For instance the topology may be an XML import of a remote host.
- * I/O devices detection must be enabled in the topology.
- *
- * \note The corresponding PCI device object can be obtained by looking
- * at the OS device parent object.
- */
-static __hwloc_inline hwloc_obj_t
-hwloc_intel_mic_get_device_osdev_by_index(hwloc_topology_t topology,
-					  unsigned idx)
-{
-	hwloc_obj_t osdev = NULL;
-	while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) {
-		if (HWLOC_OBJ_OSDEV_COPROC == osdev->attr->osdev.type
-                    && osdev->name
-		    && !strncmp("mic", osdev->name, 3)
-		    && atoi(osdev->name + 3) == (int) idx)
-                        return osdev;
-        }
-        return NULL;
-}
-
-/** @} */
-
-
-#ifdef __cplusplus
-} /* extern "C" */
-#endif
-
-
-#endif /* HWLOC_INTEL_MIC_H */
--- a/src/3rdparty/hwloc/include/hwloc/levelzero.h
+++ b/src/3rdparty/hwloc/include/hwloc/levelzero.h
@ -0,0 +1,157 @@
+/*
+ * Copyright © 2021 Inria.  All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+/** \file
+ * \brief Macros to help interaction between hwloc and the oneAPI Level Zero interface.
+ *
+ * Applications that use both hwloc and Level Zero may want to
+ * include this file so as to get topology information for L0 devices.
+ */
+
+#ifndef HWLOC_LEVELZERO_H
+#define HWLOC_LEVELZERO_H
+
+#include "hwloc.h"
+#include "hwloc/autogen/config.h"
+#include "hwloc/helper.h"
+#ifdef HWLOC_LINUX_SYS
+#include "hwloc/linux.h"
+#endif
+
+#include <level_zero/ze_api.h>
+#include <level_zero/zes_api.h>
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+/** \defgroup hwlocality_levelzero Interoperability with the oneAPI Level Zero interface.
+ *
+ * This interface offers ways to retrieve topology information about
+ * devices managed by the Level Zero API.
+ *
+ * @{
+ */
+
+/** \brief Get the CPU set of logical processors that are physically
+ * close to the Level Zero device \p device
+ *
+ * Store in \p set the CPU-set describing the locality of
+ * the Level Zero device \p device.
+ *
+ * Topology \p topology and device \p device must match the local machine.
+ * Level Zero must have been initialized with Sysman enabled
+ * (ZES_ENABLE_SYSMAN=1 in the environment).
+ * I/O devices detection and the Level Zero component are not needed in the
+ * topology.
+ *
+ * The function only returns the locality of the device.
+ * If more information about the device is needed, OS objects should
+ * be used instead, see hwloc_levelzero_get_device_osdev().
+ *
+ * This function is currently only implemented in a meaningful way for
+ * Linux; other systems will simply get a full cpuset.
+ */
+static __hwloc_inline int
+hwloc_levelzero_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,
+                                  ze_device_handle_t device, hwloc_cpuset_t set)
+{
+#ifdef HWLOC_LINUX_SYS
+  /* If we're on Linux, use the sysfs mechanism to get the local cpus */
+#define HWLOC_LEVELZERO_DEVICE_SYSFS_PATH_MAX 128
+  char path[HWLOC_LEVELZERO_DEVICE_SYSFS_PATH_MAX];
+  zes_pci_properties_t pci;
+  zes_device_handle_t sdevice = device;
+  ze_result_t res;
+
+  if (!hwloc_topology_is_thissystem(topology)) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  res = zesDevicePciGetProperties(sdevice, &pci);
+  if (res != ZE_RESULT_SUCCESS) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  sprintf(path, "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/local_cpus",
+          pci.address.domain, pci.address.bus, pci.address.device, pci.address.function);
+  if (hwloc_linux_read_path_as_cpumask(path, set) < 0
+      || hwloc_bitmap_iszero(set))
+    hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));
+#else
+  /* Non-Linux systems simply get a full cpuset */
+  hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));
+#endif
+  return 0;
+}
+
+/** \brief Get the hwloc OS device object corresponding to Level Zero device
+ * \p device.
+ *
+ * \return The hwloc OS device object that describes the given Level Zero device \p device.
+ * \return \c NULL if none could be found.
+ *
+ * Topology \p topology and device \p device must match the local machine.
+ * I/O devices detection and the Level Zero component must be enabled in the
+ * topology. If not, the locality of the object may still be found using
+ * hwloc_levelzero_get_device_cpuset().
+ *
+ * \note The corresponding hwloc PCI device may be found by looking
+ * at the result parent pointer (unless PCI devices are filtered out).
+ */
+static __hwloc_inline hwloc_obj_t
+hwloc_levelzero_get_device_osdev(hwloc_topology_t topology, ze_device_handle_t device)
+{
+  zes_device_handle_t sdevice = device;
+  zes_pci_properties_t pci;
+  ze_result_t res;
+  hwloc_obj_t osdev;
+
+  if (!hwloc_topology_is_thissystem(topology)) {
+    errno = EINVAL;
+    return NULL;
+  }
+
+  res = zesDevicePciGetProperties(sdevice, &pci);
+  if (res != ZE_RESULT_SUCCESS) {
+    /* L0 was likely initialized without sysman, don't bother */
+    errno = EINVAL;
+    return NULL;
+  }
+
+  osdev = NULL;
+  while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) {
+    hwloc_obj_t pcidev = osdev->parent;
+
+    if (strncmp(osdev->name, "ze", 2))
+      continue;
+
+    if (pcidev
+      && pcidev->type == HWLOC_OBJ_PCI_DEVICE
+      && pcidev->attr->pcidev.domain == pci.address.domain
+      && pcidev->attr->pcidev.bus == pci.address.bus
+      && pcidev->attr->pcidev.dev == pci.address.device
+      && pcidev->attr->pcidev.func == pci.address.function)
+      return osdev;
+
+    /* FIXME: once a serial number is available, try it in case PCI is filtered out */
+  }
+
+  return NULL;
+}
+
+/** @} */
+
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+
+#endif /* HWLOC_LEVELZERO_H */
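Reviewer note on levelzero.h above: hwloc_levelzero_get_device_cpuset() builds the sysfs path of the device's local_cpus file from the PCI domain, bus, device and function reported by Sysman. The sketch below shows the same path format in isolation, using snprintf() with an explicit truncation check. It is a standalone illustration only: struct pci_addr and pci_local_cpus_path are hypothetical names, not part of hwloc or Level Zero, and the header itself uses sprintf() into a 128-byte buffer.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the domain/bus/device/function fields
 * of a PCI address. */
struct pci_addr {
  unsigned domain, bus, device, function;
};

/* Format the sysfs local_cpus path for a PCI address into buf.
 * Returns 0 on success, -1 if formatting failed or the path
 * would not fit in len bytes. */
static int pci_local_cpus_path(char *buf, size_t len, const struct pci_addr *a)
{
  int n = snprintf(buf, len,
                   "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/local_cpus",
                   a->domain, a->bus, a->device, a->function);
  return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

For domain 0, bus 0x3b, device 0, function 0 the helper writes /sys/bus/pci/devices/0000:3b:00.0/local_cpus. Whether that file exists depends on the machine, which is why the header falls back to the complete cpuset when reading it fails.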
--- a/src/3rdparty/hwloc/include/hwloc/linux.h
+++ b/src/3rdparty/hwloc/include/hwloc/linux.h
@ -1,6 +1,6 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2016 Inria.  All rights reserved.
+ * Copyright © 2009-2021 Inria.  All rights reserved.
 * Copyright © 2009-2011 Université Bordeaux
 * See COPYING in top-level directory.
 */
@ -44,6 +44,10 @@ extern "C" {
 HWLOC_DECLSPEC int hwloc_linux_set_tid_cpubind(hwloc_topology_t topology, pid_t tid, hwloc_const_cpuset_t set);

 /** \brief Get the current binding of thread \p tid
+ *
+ * The CPU-set \p set (previously allocated by the caller)
+ * is filled with the list of PUs which the thread
+ * was last bound to.
 *
 * The behavior is exactly the same as the Linux sched_getaffinity system call,
 * but uses a hwloc cpuset.
@ -54,6 +58,9 @@ HWLOC_DECLSPEC int hwloc_linux_set_tid_cpubind(hwloc_topology_t topology, pid_t
 HWLOC_DECLSPEC int hwloc_linux_get_tid_cpubind(hwloc_topology_t topology, pid_t tid, hwloc_cpuset_t set);

 /** \brief Get the last physical CPU where thread \p tid ran.
+ *
+ * The CPU-set \p set (previously allocated by the caller)
+ * is filled with the PU which the thread last ran on.
 *
 * \note This is equivalent to calling hwloc_get_proc_last_cpu_location() with
 * ::HWLOC_CPUBIND_THREAD as flags.
--- a/src/3rdparty/hwloc/include/hwloc/memattrs.h
+++ b/src/3rdparty/hwloc/include/hwloc/memattrs.h
@ -1,5 +1,5 @@
 /*
- * Copyright © 2019-2020 Inria.  All rights reserved.
+ * Copyright © 2019-2022 Inria.  All rights reserved.
 * See COPYING in top-level directory.
 */

@ -54,6 +54,8 @@ extern "C" {
 * Attribute values for these nodes, if any, may then be obtained with
 * hwloc_memattr_get_value() and manually compared with the desired criteria.
 *
+ * \sa An example is available in doc/examples/memory-attributes.c in the source tree.
+ *
 * \note The API also supports specific objects as initiator,
 * but it is currently not used internally by hwloc.
 * Users may for instance use it to provide custom performance
@ -65,19 +67,19 @@ extern "C" {

 /** \brief Memory node attributes. */
 enum hwloc_memattr_id_e {
-  /** \brief "Capacity".
-   * The capacity is returned in bytes
-   * (local_memory attribute in objects).
+  /** \brief
+   * The \"Capacity\" is returned in bytes (local_memory attribute in objects).
   *
   * Best capacity nodes are nodes with <b>higher capacity</b>.
   *
   * No initiator is involved when looking at this attribute.
   * The corresponding attribute flags are ::HWLOC_MEMATTR_FLAG_HIGHER_FIRST.
+   * \hideinitializer
   */
  HWLOC_MEMATTR_ID_CAPACITY = 0,

-  /** \brief "Locality".
-   * The locality is returned as the number of PUs in that locality
+  /** \brief
+   * The \"Locality\" is returned as the number of PUs in that locality
   * (e.g. the weight of its cpuset).
   *
   * Best locality nodes are nodes with <b>smaller locality</b>
@ -87,26 +89,87 @@ enum hwloc_memattr_id_e {
   *
   * No initiator is involved when looking at this attribute.
   * The corresponding attribute flags are ::HWLOC_MEMATTR_FLAG_HIGHER_FIRST.
+   * \hideinitializer
   */
  HWLOC_MEMATTR_ID_LOCALITY = 1,

-  /** \brief "Bandwidth".
-   * The bandwidth is returned in MiB/s, as seen from the given initiator location.
+  /** \brief
+   * The \"Bandwidth\" is returned in MiB/s, as seen from the given initiator location.
+   *
   * Best bandwidth nodes are nodes with <b>higher bandwidth</b>.
+   *
   * The corresponding attribute flags are ::HWLOC_MEMATTR_FLAG_HIGHER_FIRST
   * and ::HWLOC_MEMATTR_FLAG_NEED_INITIATOR.
+   *
+   * This is the average bandwidth for read and write accesses. If the platform
+   * provides individual read and write bandwidths but no explicit average value,
+   * hwloc computes and returns the average.
+   * \hideinitializer
   */
  HWLOC_MEMATTR_ID_BANDWIDTH = 2,

-  /** \brief "Latency".
-   * The latency is returned as nanoseconds, as seen from the given initiator location.
+  /** \brief
+   * The \"ReadBandwidth\" is returned in MiB/s, as seen from the given initiator location.
+   *
+   * Best bandwidth nodes are nodes with <b>higher bandwidth</b>.
+   *
+   * The corresponding attribute flags are ::HWLOC_MEMATTR_FLAG_HIGHER_FIRST
+   * and ::HWLOC_MEMATTR_FLAG_NEED_INITIATOR.
+   * \hideinitializer
+   */
+  HWLOC_MEMATTR_ID_READ_BANDWIDTH = 4,
+
+  /** \brief
+   * The \"WriteBandwidth\" is returned in MiB/s, as seen from the given initiator location.
+   *
+   * Best bandwidth nodes are nodes with <b>higher bandwidth</b>.
+   *
+   * The corresponding attribute flags are ::HWLOC_MEMATTR_FLAG_HIGHER_FIRST
+   * and ::HWLOC_MEMATTR_FLAG_NEED_INITIATOR.
+   * \hideinitializer
+   */
+  HWLOC_MEMATTR_ID_WRITE_BANDWIDTH = 5,
+
+  /** \brief
+   * The \"Latency\" is returned as nanoseconds, as seen from the given initiator location.
+   *
   * Best latency nodes are nodes with <b>smaller latency</b>.
+   *
   * The corresponding attribute flags are ::HWLOC_MEMATTR_FLAG_LOWER_FIRST
   * and ::HWLOC_MEMATTR_FLAG_NEED_INITIATOR.
+   *
+   * This is the average latency for read and write accesses. If the platform
+   * provides individual read and write latencies but no explicit average value,
+   * hwloc computes and returns the average.
+   * \hideinitializer
   */
-  HWLOC_MEMATTR_ID_LATENCY = 3
+  HWLOC_MEMATTR_ID_LATENCY = 3,

-  /* TODO read vs write, persistence? */
+  /** \brief
+   * The \"ReadLatency\" is returned as nanoseconds, as seen from the given initiator location.
+   *
+   * Best latency nodes are nodes with <b>smaller latency</b>.
+   *
+   * The corresponding attribute flags are ::HWLOC_MEMATTR_FLAG_LOWER_FIRST
+   * and ::HWLOC_MEMATTR_FLAG_NEED_INITIATOR.
+   * \hideinitializer
+   */
+  HWLOC_MEMATTR_ID_READ_LATENCY = 6,
+
+  /** \brief
+   * The \"WriteLatency\" is returned as nanoseconds, as seen from the given initiator location.
+   *
+   * Best latency nodes are nodes with <b>smaller latency</b>.
+   *
+   * The corresponding attribute flags are ::HWLOC_MEMATTR_FLAG_LOWER_FIRST
+   * and ::HWLOC_MEMATTR_FLAG_NEED_INITIATOR.
+   * \hideinitializer
+   */
+  HWLOC_MEMATTR_ID_WRITE_LATENCY = 7,
+
+  /* TODO persistence? */
+
+  HWLOC_MEMATTR_ID_MAX /**< \private Sentinel value */
 };

 /** \brief A memory attribute identifier.
@ -354,7 +417,7 @@ hwloc_memattr_register(hwloc_topology_t topology,
 * \p flags must be \c 0 for now.
 *
 * \note The initiator \p initiator should be of type ::HWLOC_LOCATION_TYPE_CPUSET
- * when refering to accesses performed by CPU cores.
+ * when referring to accesses performed by CPU cores.
 * ::HWLOC_LOCATION_TYPE_OBJECT is currently unused internally by hwloc,
 * but users may for instance use it to provide custom information about
 * host memory accesses performed by GPUs.
@ -398,7 +461,7 @@ hwloc_memattr_set_value(hwloc_topology_t topology,
 * values.
 *
 * \note The initiator \p initiator should be of type ::HWLOC_LOCATION_TYPE_CPUSET
- * when refering to accesses performed by CPU cores.
+ * when referring to accesses performed by CPU cores.
 * ::HWLOC_LOCATION_TYPE_OBJECT is currently unused internally by hwloc,
 * but users may for instance use it to provide custom information about
 * host memory accesses performed by GPUs.
@ -408,7 +471,7 @@ hwloc_memattr_get_targets(hwloc_topology_t topology,
                          hwloc_memattr_id_t attribute,
                          struct hwloc_location *initiator,
                          unsigned long flags,
-                          unsigned *nrp, hwloc_obj_t *targets, hwloc_uint64_t *values);
+                          unsigned *nr, hwloc_obj_t *targets, hwloc_uint64_t *values);

 /** \brief Return the initiators that have values for a given attribute for a specific target NUMA node.
 *
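Review note on the memattrs.h hunks above: the new attribute IDs differ only in whether a higher or a lower value counts as "best", and the `Latency` comment says hwloc averages read and write latency when the platform gives no explicit average. The block below is a minimal stand-alone sketch of those two ideas, not hwloc code. All `demo_` names are hypothetical, and the rounding used for the average is an assumption, since this diff does not show how hwloc rounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ordering flag mirroring the HIGHER_FIRST / LOWER_FIRST distinction. */
enum demo_order { DEMO_HIGHER_FIRST, DEMO_LOWER_FIRST };

/* Return the index of the best value in values[0..nr-1] under `order`.
 * Ties keep the earliest index. */
static size_t demo_best_target(const uint64_t *values, size_t nr, enum demo_order order)
{
  size_t best = 0, i;
  assert(values != NULL && nr > 0);
  for (i = 1; i < nr; i++) {
    if (order == DEMO_HIGHER_FIRST ? values[i] > values[best]
                                   : values[i] < values[best])
      best = i;
  }
  return best;
}

/* Average of read and write latency (rounded down), written so that the
 * intermediate sum cannot overflow. The rounding choice is an assumption. */
static uint64_t demo_average_latency(uint64_t read_ns, uint64_t write_ns)
{
  return read_ns / 2 + write_ns / 2 + (read_ns & write_ns & 1);
}
```

With bandwidths {100, 300, 200} and the higher-first ordering, index 1 is selected. With latencies {90, 120, 80} and the lower-first ordering, index 2 is selected.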
--- a/src/3rdparty/hwloc/include/hwloc/nvml.h
+++ b/src/3rdparty/hwloc/include/hwloc/nvml.h
@ -1,5 +1,5 @@
 /*
- * Copyright © 2012-2020 Inria.  All rights reserved.
+ * Copyright © 2012-2021 Inria.  All rights reserved.
 * See COPYING in top-level directory.
 */

@ -39,7 +39,7 @@ extern "C" {
 /** \brief Get the CPU set of processors that are physically
 * close to NVML device \p device.
 *
- * Return the CPU set describing the locality of the NVML device \p device.
+ * Store in \p set the CPU-set describing the locality of the NVML device \p device.
 *
 * Topology \p topology and device \p device must match the local machine.
 * I/O devices detection and the NVML component are not needed in the topology.
@ -88,8 +88,8 @@ hwloc_nvml_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,
 /** \brief Get the hwloc OS device object corresponding to the
 * NVML device whose index is \p idx.
 *
- * Return the OS device object describing the NVML device whose
- * index is \p idx. Returns NULL if there is none.
+ * \return The hwloc OS device object describing the NVML device whose index is \p idx.
+ * \return \c NULL if none could be found.
 *
 * The topology \p topology does not necessarily have to match the current
 * machine. For instance the topology may be an XML import of a remote host.
@ -114,8 +114,8 @@ hwloc_nvml_get_device_osdev_by_index(hwloc_topology_t topology, unsigned idx)

 /** \brief Get the hwloc OS device object corresponding to NVML device \p device.
 *
- * Return the hwloc OS device object that describes the given
- * NVML device \p device. Return NULL if there is none.
+ * \return The hwloc OS device object that describes the given NVML device \p device.
+ * \return \c NULL if none could be found.
 *
 * Topology \p topology and device \p device must match the local machine.
 * I/O devices detection and the NVML component must be enabled in the topology.
--- a/src/3rdparty/hwloc/include/hwloc/opencl.h
+++ b/src/3rdparty/hwloc/include/hwloc/opencl.h
@ -113,7 +113,7 @@ hwloc_opencl_get_device_pci_busid(cl_device_id device,
 /** \brief Get the CPU set of processors that are physically
 * close to OpenCL device \p device.
 *
- * Return the CPU set describing the locality of the OpenCL device \p device.
+ * Store in \p set the CPU-set describing the locality of the OpenCL device \p device.
 *
 * Topology \p topology and device \p device must match the local machine.
 * I/O devices detection and the OpenCL component are not needed in the topology.
@ -162,10 +162,10 @@ hwloc_opencl_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unuse
 /** \brief Get the hwloc OS device object corresponding to the
 * OpenCL device for the given indexes.
 *
- * Return the OS device object describing the OpenCL device
+ * \return The hwloc OS device object describing the OpenCL device
 * whose platform index is \p platform_index,
 * and whose device index within this platform if \p device_index.
- * Return NULL if there is none.
+ * \return \c NULL if there is none.
 *
 * The topology \p topology does not necessarily have to match the current
 * machine. For instance the topology may be an XML import of a remote host.
@ -192,8 +192,9 @@ hwloc_opencl_get_device_osdev_by_index(hwloc_topology_t topology,

 /** \brief Get the hwloc OS device object corresponding to OpenCL device \p deviceX.
 *
- * Use OpenCL device attributes to find the corresponding hwloc OS device object.
- * Return NULL if there is none or if useful attributes are not available.
+ * \return The hwloc OS device object corresponding to the given OpenCL device \p device.
+ * \return \c NULL if none could be found, for instance
+ * if required OpenCL attributes are not available.
 *
 * This function currently only works on AMD and NVIDIA OpenCL devices that support
 * relevant OpenCL extensions. hwloc_opencl_get_device_osdev_by_index()
--- a/src/3rdparty/hwloc/include/hwloc/openfabrics-verbs.h
+++ b/src/3rdparty/hwloc/include/hwloc/openfabrics-verbs.h
@ -1,6 +1,6 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2021 Inria.  All rights reserved.
 * Copyright © 2009-2010 Université Bordeaux
 * Copyright © 2009-2011 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@ -44,7 +44,7 @@ extern "C" {
 /** \brief Get the CPU set of processors that are physically
 * close to device \p ibdev.
 *
- * Return the CPU set describing the locality of the OpenFabrics
+ * Store in \p set the CPU-set describing the locality of the OpenFabrics
 * device \p ibdev (InfiniBand, etc).
 *
 * Topology \p topology and device \p ibdev must match the local machine.
@ -88,10 +88,11 @@ hwloc_ibv_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,
 /** \brief Get the hwloc OS device object corresponding to the OpenFabrics
 * device named \p ibname.
 *
- * Return the OS device object describing the OpenFabrics device
+ * \return The hwloc OS device object describing the OpenFabrics device
 * (InfiniBand, Omni-Path, usNIC, etc) whose name is \p ibname
 * (mlx5_0, hfi1_0, usnic_0, qib0, etc).
- * Returns NULL if there is none.
+ * \return \c NULL if none could be found.
+ *
 * The name \p ibname is usually obtained from ibv_get_device_name().
 *
 * The topology \p topology does not necessarily have to match the current
@ -117,8 +118,9 @@ hwloc_ibv_get_device_osdev_by_name(hwloc_topology_t topology,
 /** \brief Get the hwloc OS device object corresponding to the OpenFabrics
 * device \p ibdev.
 *
- * Return the OS device object describing the OpenFabrics device \p ibdev
- * (InfiniBand, etc). Returns NULL if there is none.
+ * \return The hwloc OS device object describing the OpenFabrics
+ * device \p ibdev (InfiniBand, etc).
+ * \return \c NULL if none could be found.
 *
 * Topology \p topology and device \p ibdev must match the local machine.
 * I/O devices detection must be enabled in the topology.
--- a/src/3rdparty/hwloc/include/hwloc/plugins.h
+++ b/src/3rdparty/hwloc/include/hwloc/plugins.h
@ -1,5 +1,5 @@
 /*
- * Copyright © 2013-2020 Inria.  All rights reserved.
+ * Copyright © 2013-2022 Inria.  All rights reserved.
 * Copyright © 2016 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
 */
@ -27,6 +27,9 @@ struct hwloc_backend;


 /** \defgroup hwlocality_disc_components Components and Plugins: Discovery components
+ *
+ * \note These structures and functions may change when ::HWLOC_COMPONENT_ABI is modified.
+ *
 * @{
 */

@ -93,6 +96,9 @@ struct hwloc_disc_component {


 /** \defgroup hwlocality_disc_backends Components and Plugins: Discovery backends
+ *
+ * \note These structures and functions may change when ::HWLOC_COMPONENT_ABI is modified.
+ *
 * @{
 */

@ -241,6 +247,9 @@ HWLOC_DECLSPEC int hwloc_backend_enable(struct hwloc_backend *backend);


 /** \defgroup hwlocality_generic_components Components and Plugins: Generic components
+ *
+ * \note These structures and functions may change when ::HWLOC_COMPONENT_ABI is modified.
+ *
 * @{
 */

@ -310,12 +319,34 @@ struct hwloc_component {


 /** \defgroup hwlocality_components_core_funcs Components and Plugins: Core functions to be used by components
+ *
+ * \note These structures and functions may change when ::HWLOC_COMPONENT_ABI is modified.
+ *
 * @{
 */

-/** \brief Check whether insertion errors are hidden */
+/** \brief Check whether error messages are hidden.
+ *
+ * Callers should print critical error messages
+ * (e.g. invalid hw topo info, invalid config)
+ * only if this function returns strictly less than 2.
+ *
+ * Callers should print non-critical error messages
+ * (e.g. failure to initialize CUDA)
+ * if this function returns 0.
+ *
+ * This function returns 1 by default (show critical only),
+ * 0 in lstopo (show all),
+ * or anything set in HWLOC_HIDE_ERRORS in the environment.
+ *
+ * Use macros HWLOC_SHOW_CRITICAL_ERRORS() and HWLOC_SHOW_ALL_ERRORS()
+ * for clarity.
+ */
 HWLOC_DECLSPEC int hwloc_hide_errors(void);

+#define HWLOC_SHOW_CRITICAL_ERRORS() (hwloc_hide_errors() < 2)
+#define HWLOC_SHOW_ALL_ERRORS() (hwloc_hide_errors() == 0)
+
 /** \brief Add an object to the topology.
 *
 * Insert new object \p obj in the topology starting under existing object \p root
@ -455,6 +486,9 @@ hwloc_plugin_check_namespace(const char *pluginname __hwloc_attribute_unused, co


 /** \defgroup hwlocality_components_filtering Components and Plugins: Filtering objects
+ *
+ * \note These structures and functions may change when ::HWLOC_COMPONENT_ABI is modified.
+ *
 * @{
 */

@ -469,9 +503,12 @@ hwloc_filter_check_pcidev_subtype_important(unsigned classid)
  return (baseclass == 0x03 /* PCI_BASE_CLASS_DISPLAY */
 	  || baseclass == 0x02 /* PCI_BASE_CLASS_NETWORK */
 	  || baseclass == 0x01 /* PCI_BASE_CLASS_STORAGE */
+	  || baseclass == 0x00 /* Unclassified, for Atos/Bull BXI */
 	  || baseclass == 0x0b /* PCI_BASE_CLASS_PROCESSOR */
 	  || classid == 0x0c04 /* PCI_CLASS_SERIAL_FIBER */
 	  || classid == 0x0c06 /* PCI_CLASS_SERIAL_INFINIBAND */
+          || classid == 0x0502 /* PCI_CLASS_MEMORY_CXL */
+          || baseclass == 0x06 /* PCI_BASE_CLASS_BRIDGE with non-PCI downstream. the core will drop the useless ones later */
 	  || baseclass == 0x12 /* Processing Accelerators */);
 }

@ -527,6 +564,9 @@ hwloc_filter_check_keep_object(hwloc_topology_t topology, hwloc_obj_t obj)


 /** \defgroup hwlocality_components_pcidisc Components and Plugins: helpers for PCI discovery
+ *
+ * \note These structures and functions may change when ::HWLOC_COMPONENT_ABI is modified.
+ *
 * @{
 */

@ -578,18 +618,76 @@ HWLOC_DECLSPEC int hwloc_pcidisc_tree_attach(struct hwloc_topology *topology, st


 /** \defgroup hwlocality_components_pcifind Components and Plugins: finding PCI objects during other discoveries
+ *
+ * \note These structures and functions may change when ::HWLOC_COMPONENT_ABI is modified.
+ *
 * @{
 */

-/** \brief Find the normal parent of a PCI bus ID.
+/** \brief Find the object or a parent of a PCI bus ID.
 *
- * Look at PCI affinity to find out where the given PCI bus ID should be attached.
+ * When attaching a new object (typically an OS device) whose locality
+ * is specified by PCI bus ID, this function returns the PCI object
+ * to use as a parent for attaching.
 *
- * This function should be used to attach an I/O device under the corresponding
- * PCI object (if any), or under a normal (non-I/O) object with same locality.
+ * If the exact PCI device with this bus ID exists, it is returned.
+ * Otherwise (for instance if it was filtered out), the function returns
+ * another object with similar locality (for instance a parent bridge,
+ * or the local CPU Package).
 */
 HWLOC_DECLSPEC struct hwloc_obj * hwloc_pci_find_parent_by_busid(struct hwloc_topology *topology, unsigned domain, unsigned bus, unsigned dev, unsigned func);

+/** \brief Find the PCI device or bridge matching a PCI bus ID exactly.
+ *
+ * This is useful for adding specific information about some objects
+ * based on their PCI id. When it comes to attaching objects based on
+ * PCI locality, hwloc_pci_find_parent_by_busid() should be preferred.
+ */
+HWLOC_DECLSPEC struct hwloc_obj * hwloc_pci_find_by_busid(struct hwloc_topology *topology, unsigned domain, unsigned bus, unsigned dev, unsigned func);
+
+/** \brief Handle to a new distances structure during its addition to the topology. */
+typedef void * hwloc_backend_distances_add_handle_t;
+
+/** \brief Create a new empty distances structure.
+ *
+ * This is identical to hwloc_distances_add_create()
+ * but this variant is designed for backend inserting
+ * distances during topology discovery.
+ */
+HWLOC_DECLSPEC hwloc_backend_distances_add_handle_t
+hwloc_backend_distances_add_create(hwloc_topology_t topology,
+                                   const char *name, unsigned long kind,
+                                   unsigned long flags);
+
+/** \brief Specify the objects and values in a new empty distances structure.
+ *
+ * This is similar to hwloc_distances_add_values()
+ * but this variant is designed for backend inserting
+ * distances during topology discovery.
+ *
+ * The only semantic difference is that \p objs and \p values
+ * are not duplicated, but directly attached to the topology.
+ * On success, these arrays are given to the core and should not
+ * ever be freed by the caller anymore.
+ */
+HWLOC_DECLSPEC int
+hwloc_backend_distances_add_values(hwloc_topology_t topology,
+                                   hwloc_backend_distances_add_handle_t handle,
+                                   unsigned nbobjs, hwloc_obj_t *objs,
+                                   hwloc_uint64_t *values,
+                                   unsigned long flags);
+
+/** \brief Commit a new distances structure.
+ *
+ * This is similar to hwloc_distances_add_commit()
+ * but this variant is designed for backend inserting
+ * distances during topology discovery.
+ */
+HWLOC_DECLSPEC int
+hwloc_backend_distances_add_commit(hwloc_topology_t topology,
+                                   hwloc_backend_distances_add_handle_t handle,
+                                   unsigned long flags);
+
 /** @} */
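Review note on the hwloc_hide_errors() change in plugins.h above: the documented levels are 0 (show all), 1 (show critical only, the default) and 2 or more (hide everything). The block below is a minimal stand-alone sketch of that rule, not hwloc code. `demo_hide_level`, `demo_hide_errors()` and `demo_report()` are hypothetical stand-ins, and the two `DEMO_` macros simply copy the comparisons used by HWLOC_SHOW_CRITICAL_ERRORS() and HWLOC_SHOW_ALL_ERRORS().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the value hwloc_hide_errors() would return:
 * 0 = show all, 1 = show critical only (documented default), 2+ = hide all. */
static int demo_hide_level = 1;

static int demo_hide_errors(void)
{
  return demo_hide_level;
}

/* Same comparisons as the macros added in this hunk. */
#define DEMO_SHOW_CRITICAL_ERRORS() (demo_hide_errors() < 2)
#define DEMO_SHOW_ALL_ERRORS() (demo_hide_errors() == 0)

/* Print `msg` to stderr if its severity is currently visible.
 * Returns 1 if the message was printed, 0 if it was hidden. */
static int demo_report(int critical, const char *msg)
{
  int show;
  assert(msg != NULL);
  show = critical ? DEMO_SHOW_CRITICAL_ERRORS() : DEMO_SHOW_ALL_ERRORS();
  if (show)
    fprintf(stderr, "%s\n", msg);
  return show;
}
```

At the default level, a critical message such as an invalid topology description is printed, while a non-critical one such as a CUDA initialization failure is suppressed.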


--- a/src/3rdparty/hwloc/include/hwloc/rename.h
+++ b/src/3rdparty/hwloc/include/hwloc/rename.h
@ -1,6 +1,6 @@
 /*
 * Copyright © 2009-2011 Cisco Systems, Inc.  All rights reserved.
- * Copyright © 2010-2020 Inria.  All rights reserved.
+ * Copyright © 2010-2022 Inria.  All rights reserved.
 * See COPYING in top-level directory.
 */

@ -120,6 +120,12 @@ extern "C" {
 #define HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM HWLOC_NAME_CAPS(TOPOLOGY_FLAG_IS_THISSYSTEM)
 #define HWLOC_TOPOLOGY_FLAG_THISSYSTEM_ALLOWED_RESOURCES HWLOC_NAME_CAPS(TOPOLOGY_FLAG_THISSYSTEM_ALLOWED_RESOURCES)
 #define HWLOC_TOPOLOGY_FLAG_IMPORT_SUPPORT HWLOC_NAME_CAPS(TOPOLOGY_FLAG_IMPORT_SUPPORT)
+#define HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING HWLOC_NAME_CAPS(TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING)
+#define HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_MEMBINDING HWLOC_NAME_CAPS(TOPOLOGY_FLAG_RESTRICT_TO_MEMBINDING)
+#define HWLOC_TOPOLOGY_FLAG_DONT_CHANGE_BINDING HWLOC_NAME_CAPS(TOPOLOGY_FLAG_DONT_CHANGE_BINDING)
+#define HWLOC_TOPOLOGY_FLAG_NO_DISTANCES HWLOC_NAME_CAPS(TOPOLOGY_FLAG_NO_DISTANCES)
+#define HWLOC_TOPOLOGY_FLAG_NO_MEMATTRS HWLOC_NAME_CAPS(TOPOLOGY_FLAG_NO_MEMATTRS)
+#define HWLOC_TOPOLOGY_FLAG_NO_CPUKINDS HWLOC_NAME_CAPS(TOPOLOGY_FLAG_NO_CPUKINDS)

 #define hwloc_topology_set_pid HWLOC_NAME(topology_set_pid)
 #define hwloc_topology_set_synthetic HWLOC_NAME(topology_set_synthetic)
@ -356,6 +362,7 @@ extern "C" {
 #define hwloc_get_closest_objs HWLOC_NAME(get_closest_objs)
 #define hwloc_get_obj_below_by_type HWLOC_NAME(get_obj_below_by_type)
 #define hwloc_get_obj_below_array_by_type HWLOC_NAME(get_obj_below_array_by_type)
+#define hwloc_get_obj_with_same_locality HWLOC_NAME(get_obj_with_same_locality)
 #define hwloc_distrib_flags_e HWLOC_NAME(distrib_flags_e)
 #define HWLOC_DISTRIB_FLAG_REVERSE HWLOC_NAME_CAPS(DISTRIB_FLAG_REVERSE)
 #define hwloc_distrib HWLOC_NAME(distrib)
@ -377,6 +384,11 @@ extern "C" {
 #define HWLOC_MEMATTR_ID_LOCALITY HWLOC_NAME_CAPS(MEMATTR_ID_LOCALITY)
 #define HWLOC_MEMATTR_ID_BANDWIDTH HWLOC_NAME_CAPS(MEMATTR_ID_BANDWIDTH)
 #define HWLOC_MEMATTR_ID_LATENCY HWLOC_NAME_CAPS(MEMATTR_ID_LATENCY)
+#define HWLOC_MEMATTR_ID_READ_BANDWIDTH HWLOC_NAME_CAPS(MEMATTR_ID_READ_BANDWIDTH)
+#define HWLOC_MEMATTR_ID_WRITE_BANDWIDTH HWLOC_NAME_CAPS(MEMATTR_ID_WRITE_BANDWIDTH)
+#define HWLOC_MEMATTR_ID_READ_LATENCY HWLOC_NAME_CAPS(MEMATTR_ID_READ_LATENCY)
+#define HWLOC_MEMATTR_ID_WRITE_LATENCY HWLOC_NAME_CAPS(MEMATTR_ID_WRITE_LATENCY)
+#define HWLOC_MEMATTR_ID_MAX HWLOC_NAME_CAPS(MEMATTR_ID_MAX)

 #define hwloc_memattr_id_t HWLOC_NAME(memattr_id_t)
 #define hwloc_memattr_get_by_name HWLOC_NAME(memattr_get_by_name)
@ -454,11 +466,22 @@ extern "C" {
 #define hwloc_distances_obj_index HWLOC_NAME(distances_obj_index)
 #define hwloc_distances_obj_pair_values HWLOC_NAME(distances_pair_values)

+#define hwloc_distances_transform_e HWLOC_NAME(distances_transform_e)
+#define HWLOC_DISTANCES_TRANSFORM_REMOVE_NULL HWLOC_NAME_CAPS(DISTANCES_TRANSFORM_REMOVE_NULL)
+#define HWLOC_DISTANCES_TRANSFORM_LINKS HWLOC_NAME_CAPS(DISTANCES_TRANSFORM_LINKS)
+#define HWLOC_DISTANCES_TRANSFORM_MERGE_SWITCH_PORTS HWLOC_NAME_CAPS(DISTANCES_TRANSFORM_MERGE_SWITCH_PORTS)
+#define HWLOC_DISTANCES_TRANSFORM_TRANSITIVE_CLOSURE HWLOC_NAME_CAPS(DISTANCES_TRANSFORM_TRANSITIVE_CLOSURE)
+#define hwloc_distances_transform HWLOC_NAME(distances_transform)
+
 #define hwloc_distances_add_flag_e HWLOC_NAME(distances_add_flag_e)
 #define HWLOC_DISTANCES_ADD_FLAG_GROUP HWLOC_NAME_CAPS(DISTANCES_ADD_FLAG_GROUP)
 #define HWLOC_DISTANCES_ADD_FLAG_GROUP_INACCURATE HWLOC_NAME_CAPS(DISTANCES_ADD_FLAG_GROUP_INACCURATE)

-#define hwloc_distances_add HWLOC_NAME(distances_add)
+#define hwloc_distances_add_handle_t HWLOC_NAME(distances_add_handle_t)
+#define hwloc_distances_add_create HWLOC_NAME(distances_add_create)
+#define hwloc_distances_add_values HWLOC_NAME(distances_add_values)
+#define hwloc_distances_add_commit HWLOC_NAME(distances_add_commit)
+
 #define hwloc_distances_remove HWLOC_NAME(distances_remove)
 #define hwloc_distances_remove_by_depth HWLOC_NAME(distances_remove_by_depth)
 #define hwloc_distances_remove_by_type HWLOC_NAME(distances_remove_by_type)
@ -523,6 +546,11 @@ extern "C" {
 #define hwloc_linux_get_tid_last_cpu_location HWLOC_NAME(linux_get_tid_last_cpu_location)
 #define hwloc_linux_read_path_as_cpumask HWLOC_NAME(linux_read_file_cpumask)

+/* windows.h */
+
+#define hwloc_windows_get_nr_processor_groups HWLOC_NAME(windows_get_nr_processor_groups)
+#define hwloc_windows_get_processor_group_cpuset HWLOC_NAME(windows_get_processor_group_cpuset)
+
 /* openfabrics-verbs.h */

 #define hwloc_ibv_get_device_cpuset HWLOC_NAME(ibv_get_device_cpuset)
@ -564,6 +592,11 @@ extern "C" {
 #define hwloc_rsmi_get_device_osdev HWLOC_NAME(rsmi_get_device_osdev)
 #define hwloc_rsmi_get_device_osdev_by_index HWLOC_NAME(rsmi_get_device_osdev_by_index)

+/* levelzero.h */
+
+#define hwloc_levelzero_get_device_cpuset HWLOC_NAME(levelzero_get_device_cpuset)
+#define hwloc_levelzero_get_device_osdev HWLOC_NAME(levelzero_get_device_osdev)
+
 /* gl.h */

 #define hwloc_gl_get_display_osdev_by_port_device HWLOC_NAME(gl_get_display_osdev_by_port_device)
@ -620,10 +653,18 @@ extern "C" {
 #define hwloc_pcidisc_tree_insert_by_busid HWLOC_NAME(pcidisc_tree_insert_by_busid)
 #define hwloc_pcidisc_tree_attach HWLOC_NAME(pcidisc_tree_attach)

+#define hwloc_pci_find_by_busid HWLOC_NAME(pcidisc_find_by_busid)
 #define hwloc_pci_find_parent_by_busid HWLOC_NAME(pcidisc_find_busid_parent)

+#define hwloc_backend_distances_add_handle_t HWLOC_NAME(backend_distances_add_handle_t)
+#define hwloc_backend_distances_add_create HWLOC_NAME(backend_distances_add_create)
+#define hwloc_backend_distances_add_values HWLOC_NAME(backend_distances_add_values)
+#define hwloc_backend_distances_add_commit HWLOC_NAME(backend_distances_add_commit)
+
 /* hwloc/deprecated.h */

+#define hwloc_distances_add HWLOC_NAME(distances_add)
+
 #define hwloc_topology_insert_misc_object_by_parent HWLOC_NAME(topology_insert_misc_object_by_parent)
 #define hwloc_obj_cpuset_snprintf HWLOC_NAME(obj_cpuset_snprintf)
 #define hwloc_obj_type_sscanf HWLOC_NAME(obj_type_sscanf)
@ -733,6 +774,7 @@ extern "C" {

 #define hwloc_cuda_component HWLOC_NAME(cuda_component)
 #define hwloc_gl_component HWLOC_NAME(gl_component)
+#define hwloc_levelzero_component HWLOC_NAME(levelzero_component)
 #define hwloc_nvml_component HWLOC_NAME(nvml_component)
 #define hwloc_rsmi_component HWLOC_NAME(rsmi_component)
 #define hwloc_opencl_component HWLOC_NAME(opencl_component)
@ -772,7 +814,6 @@ extern "C" {
 #define hwloc_pci_discovery_init HWLOC_NAME(pci_discovery_init)
 #define hwloc_pci_discovery_prepare HWLOC_NAME(pci_discovery_prepare)
 #define hwloc_pci_discovery_exit HWLOC_NAME(pci_discovery_exit)
-#define hwloc_pci_find_by_busid HWLOC_NAME(pcidisc_find_by_busid)
 #define hwloc_find_insert_io_parent_by_complete_cpuset HWLOC_NAME(hwloc_find_insert_io_parent_by_complete_cpuset)

 #define hwloc__add_info HWLOC_NAME(_add_info)
@ -816,7 +857,6 @@ extern "C" {
 #define hwloc_internal_distances_dup HWLOC_NAME(internal_distances_dup)
 #define hwloc_internal_distances_refresh HWLOC_NAME(internal_distances_refresh)
 #define hwloc_internal_distances_destroy HWLOC_NAME(internal_distances_destroy)
-
 #define hwloc_internal_distances_add HWLOC_NAME(internal_distances_add)
 #define hwloc_internal_distances_add_by_index HWLOC_NAME(internal_distances_add_by_index)
 #define hwloc_internal_distances_invalidate_cached_objs HWLOC_NAME(hwloc_internal_distances_invalidate_cached_objs)
@ -830,6 +870,7 @@ extern "C" {
 #define hwloc_internal_memattrs_destroy HWLOC_NAME(internal_memattrs_destroy)
 #define hwloc_internal_memattrs_need_refresh HWLOC_NAME(internal_memattrs_need_refresh)
 #define hwloc_internal_memattrs_refresh HWLOC_NAME(internal_memattrs_refresh)
+#define hwloc_internal_memattrs_guess_memory_tiers HWLOC_NAME(internal_memattrs_guess_memory_tiers)

 #define hwloc_internal_cpukind_s HWLOC_NAME(internal_cpukind_s)
 #define hwloc_internal_cpukinds_init HWLOC_NAME(internal_cpukinds_init)
--- a/src/3rdparty/hwloc/include/hwloc/rsmi.h
+++ b/src/3rdparty/hwloc/include/hwloc/rsmi.h
@ -1,5 +1,5 @@
 /*
- * Copyright © 2012-2020 Inria.  All rights reserved.
+ * Copyright © 2012-2021 Inria.  All rights reserved.
 * Copyright (c) 2020, Advanced Micro Devices, Inc. All rights reserved.
 * Written by Advanced Micro Devices,
 * See COPYING in top-level directory.
@ -41,7 +41,7 @@ extern "C" {
 /** \brief Get the CPU set of logical processors that are physically
 * close to AMD GPU device whose index is \p dv_ind.
 *
- * Return the CPU set describing the locality of the AMD GPU device
+ * Store in \p set the CPU-set describing the locality of the AMD GPU device
 * whose index is \p dv_ind.
 *
 * Topology \p topology and device \p dv_ind must match the local machine.
@ -96,8 +96,9 @@ hwloc_rsmi_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused,
 /** \brief Get the hwloc OS device object corresponding to the
 * AMD GPU device whose index is \p dv_ind.
 *
- * Return the OS device object describing the AMD GPU device whose
- * index is \p dv_ind. Returns NULL if there is none.
+ * \return The hwloc OS device object describing the AMD GPU device whose
+ * index is \p dv_ind.
+ * \return \c NULL if none could be found.
 *
 * The topology \p topology does not necessarily have to match the current
 * machine. For instance the topology may be an XML import of a remote host.
@ -124,8 +125,9 @@ hwloc_rsmi_get_device_osdev_by_index(hwloc_topology_t topology, uint32_t dv_ind)
 /** \brief Get the hwloc OS device object corresponding to AMD GPU device,
 * whose index is \p dv_ind.
 *
- * Return the hwloc OS device object that describes the given
- * AMD GPU, whose index is \p dv_ind Return NULL if there is none.
+ * \return The hwloc OS device object that describes the given
+ * AMD GPU, whose index is \p dv_ind.
+ * \return \c NULL if none could be found.
 *
 * Topology \p topology and device \p dv_ind must match the local machine.
 * I/O devices detection and the ROCm SMI component must be enabled in the
--- a/src/3rdparty/hwloc/include/hwloc/windows.h
+++ b/src/3rdparty/hwloc/include/hwloc/windows.h
@ -0,0 +1,76 @@
+/*
+ * Copyright © 2021 Inria.  All rights reserved.
+ * See COPYING in top-level directory.
+ */
+
+/** \file
+ * \brief Macros to help interaction between hwloc and Windows.
+ *
+ * Applications that use hwloc on Windows may want to include this file
+ * for Windows specific hwloc features.
+ */
+
+#ifndef HWLOC_WINDOWS_H
+#define HWLOC_WINDOWS_H
+
+#include "hwloc.h"
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+/** \defgroup hwlocality_windows Windows-specific helpers
+ *
+ * These functions query Windows processor groups.
+ * These groups partition the operating system into virtual sets
+ * of up to 64 neighbor PUs.
+ * Threads and processes may only be bound inside a single group.
+ * Although Windows processor groups may be exposed in the hwloc
+ * hierarchy as hwloc Groups, they are also often merged into
+ * existing hwloc objects such as NUMA nodes or Packages.
+ * This API provides explicit information about Windows processor
+ * groups so that applications know whether binding to a large
+ * set of PUs may fail because it spans over multiple Windows
+ * processor groups.
+ *
+ * @{
+ */
+
+
+/** \brief Get the number of Windows processor groups
+ *
+ * \p flags must be 0 for now.
+ *
+ * \return at least \c 1 on success.
+ * \return -1 on error, for instance if the topology does not match
+ * the current system (e.g. loaded from another machine through XML).
+ */
+HWLOC_DECLSPEC int hwloc_windows_get_nr_processor_groups(hwloc_topology_t topology, unsigned long flags);
+
+/** \brief Get the CPU-set of a Windows processor group.
+ *
+ * Get the set of PUs included in the processor group specified
+ * by \p pg_index.
+ * \p pg_index must be between \c 0 and the value returned
+ * by hwloc_windows_get_nr_processor_groups() minus 1.
+ *
+ * \p flags must be 0 for now.
+ *
+ * \return \c 0 on success.
+ * \return \c -1 on error, for instance if \p pg_index is invalid,
+ * or if the topology does not match the current system (e.g. loaded
+ * from another machine through XML).
+ */
+HWLOC_DECLSPEC int hwloc_windows_get_processor_group_cpuset(hwloc_topology_t topology, unsigned pg_index, hwloc_cpuset_t cpuset, unsigned long flags);
+
+/** @} */
+
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+
+#endif /* HWLOC_WINDOWS_H */
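Review note on the new windows.h above: its documentation warns that binding to a large set of PUs may fail when the set spans several Windows processor groups. The block below is a minimal stand-alone sketch of that check, not hwloc code. It assumes a simplified model in which PUs are numbered consecutively across groups and `group_sizes[i]` (hypothetical) gives the number of PUs in group `i`. A real caller would presumably fetch each group's CPU set with hwloc_windows_get_processor_group_cpuset() and compare bitmaps instead.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the group containing PU `pu`, or -1 if `pu` is out of range.
 * Simplified model: group i holds group_sizes[i] consecutive PUs (1 to 64). */
static int demo_group_of_pu(const unsigned *group_sizes, unsigned nr_groups, unsigned pu)
{
  unsigned first = 0;
  unsigned i;
  assert(group_sizes != NULL);
  for (i = 0; i < nr_groups; i++) {
    assert(group_sizes[i] >= 1 && group_sizes[i] <= 64);
    if (pu < first + group_sizes[i])
      return (int) i;
    first += group_sizes[i];
  }
  return -1;
}

/* Return 1 if PUs first_pu..last_pu all belong to one group, 0 otherwise
 * (including when either end is outside every group). */
static int demo_range_fits_one_group(const unsigned *group_sizes, unsigned nr_groups,
                                     unsigned first_pu, unsigned last_pu)
{
  int g1 = demo_group_of_pu(group_sizes, nr_groups, first_pu);
  int g2 = demo_group_of_pu(group_sizes, nr_groups, last_pu);
  return g1 >= 0 && g1 == g2;
}
```

On a hypothetical machine with groups of 64 and 32 PUs, the range 0..63 fits in one group, while 60..70 crosses the boundary and would need to be split before binding.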
--- a/src/3rdparty/hwloc/include/private/autogen/config.h
+++ b/src/3rdparty/hwloc/include/private/autogen/config.h
@ -1,6 +1,6 @@
 /*
 * Copyright © 2009, 2011, 2012 CNRS.  All rights reserved.
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2021 Inria.  All rights reserved.
 * Copyright © 2009, 2011, 2012, 2015 Université Bordeaux.  All rights reserved.
 * Copyright © 2009-2020 Cisco Systems, Inc.  All rights reserved.
 * $COPYRIGHT$
@ -290,10 +290,6 @@
 /* Define to '1' if sysctlbyname is present and usable */
 /* #undef HAVE_SYSCTLBYNAME */

-/* Define to 1 if the system has the type
-   `SYSTEM_LOGICAL_PROCESSOR_INFORMATION'. */
-#define HAVE_SYSTEM_LOGICAL_PROCESSOR_INFORMATION 1
-
 /* Define to 1 if the system has the type
   `SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX'. */
 #define HAVE_SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX 1
--- a/src/3rdparty/hwloc/include/private/internal-components.h
+++ b/src/3rdparty/hwloc/include/private/internal-components.h
@ -1,5 +1,5 @@
 /*
- * Copyright © 2018-2019 Inria.  All rights reserved.
+ * Copyright © 2018-2020 Inria.  All rights reserved.
 *
 * See COPYING in top-level directory.
 */
@ -31,6 +31,7 @@ HWLOC_DECLSPEC extern const struct hwloc_component hwloc_cuda_component;
 HWLOC_DECLSPEC extern const struct hwloc_component hwloc_gl_component;
 HWLOC_DECLSPEC extern const struct hwloc_component hwloc_nvml_component;
 HWLOC_DECLSPEC extern const struct hwloc_component hwloc_rsmi_component;
+HWLOC_DECLSPEC extern const struct hwloc_component hwloc_levelzero_component;
 HWLOC_DECLSPEC extern const struct hwloc_component hwloc_opencl_component;
 HWLOC_DECLSPEC extern const struct hwloc_component hwloc_pci_component;

--- a/src/3rdparty/hwloc/include/private/misc.h
+++ b/src/3rdparty/hwloc/include/private/misc.h
@ -504,7 +504,7 @@ hwloc__obj_type_is_icache(hwloc_obj_type_t type)
  }                                    \
 } while(0)
 #else /* HAVE_USELOCALE */
-#if __HWLOC_HAVE_ATTRIBUTE_UNUSED
+#if HWLOC_HAVE_ATTRIBUTE_UNUSED
 #define hwloc_localeswitch_declare int __dummy_nolocale __hwloc_attribute_unused
 #define hwloc_localeswitch_init()
 #else
--- a/src/3rdparty/hwloc/include/private/private.h
+++ b/src/3rdparty/hwloc/include/private/private.h
@ -1,6 +1,6 @@
 /*
 * Copyright © 2009      CNRS
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2022 Inria.  All rights reserved.
 * Copyright © 2009-2012, 2020 Université Bordeaux
 * Copyright © 2009-2011 Cisco Systems, Inc.  All rights reserved.
 *
@ -166,6 +166,7 @@ struct hwloc_topology {
    unsigned long kind;

 #define HWLOC_INTERNAL_DIST_FLAG_OBJS_VALID (1U<<0) /* if the objs array is valid below */
+#define HWLOC_INTERNAL_DIST_FLAG_NOT_COMMITTED (1U<<1) /* if the distances isn't in the list yet */
    unsigned iflags;

    /* objects are currently stored in physical_index order */
@ -258,6 +259,7 @@ struct hwloc_topology {
    unsigned bus_first, bus_last;
    hwloc_bitmap_t cpuset;
  } * pci_forced_locality;
+  hwloc_uint64_t pci_locality_quirks;

  /* component blacklisting */
  unsigned nr_blacklisted_components;
@ -304,11 +306,6 @@ extern void hwloc_pci_discovery_init(struct hwloc_topology *topology);
 extern void hwloc_pci_discovery_prepare(struct hwloc_topology *topology);
 extern void hwloc_pci_discovery_exit(struct hwloc_topology *topology);

-/* Look for an object matching the given domain/bus/func,
- * either exactly or return the smallest container bridge
- */
-extern struct hwloc_obj * hwloc_pci_find_by_busid(struct hwloc_topology *topology, unsigned domain, unsigned bus, unsigned dev, unsigned func);
-
 /* Look for an object matching complete cpuset exactly, or insert one.
 * Return NULL on failure.
 * Return a good fallback (object above) on failure to insert.
@ -408,10 +405,14 @@ extern void hwloc_internal_distances_prepare(hwloc_topology_t topology);
 extern void hwloc_internal_distances_destroy(hwloc_topology_t topology);
 extern int hwloc_internal_distances_dup(hwloc_topology_t new, hwloc_topology_t old);
 extern void hwloc_internal_distances_refresh(hwloc_topology_t topology);
-extern int hwloc_internal_distances_add(hwloc_topology_t topology, const char *name, unsigned nbobjs, hwloc_obj_t *objs, uint64_t *values, unsigned long kind, unsigned long flags);
-extern int hwloc_internal_distances_add_by_index(hwloc_topology_t topology, const char *name, hwloc_obj_type_t unique_type, hwloc_obj_type_t *different_types, unsigned nbobjs, uint64_t *indexes, uint64_t *values, unsigned long kind, unsigned long flags);
 extern void hwloc_internal_distances_invalidate_cached_objs(hwloc_topology_t topology);

+/* these distances_add() functions are higher-level than those in hwloc/plugins.h
+ * but they may change in the future, hence they are not exported to plugins.
+ */
+extern int hwloc_internal_distances_add_by_index(hwloc_topology_t topology, const char *name, hwloc_obj_type_t unique_type, hwloc_obj_type_t *different_types, unsigned nbobjs, uint64_t *indexes, uint64_t *values, unsigned long kind, unsigned long flags);
+extern int hwloc_internal_distances_add(hwloc_topology_t topology, const char *name, unsigned nbobjs, hwloc_obj_t *objs, uint64_t *values, unsigned long kind, unsigned long flags);
+
 extern void hwloc_internal_memattrs_init(hwloc_topology_t topology);
 extern void hwloc_internal_memattrs_prepare(hwloc_topology_t topology);
 extern void hwloc_internal_memattrs_destroy(hwloc_topology_t topology);
@ -419,6 +420,7 @@ extern void hwloc_internal_memattrs_need_refresh(hwloc_topology_t topology);
 extern void hwloc_internal_memattrs_refresh(hwloc_topology_t topology);
 extern int hwloc_internal_memattrs_dup(hwloc_topology_t new, hwloc_topology_t old);
 extern int hwloc_internal_memattr_set_value(hwloc_topology_t topology, hwloc_memattr_id_t id, hwloc_obj_type_t target_type, hwloc_uint64_t target_gp_index, unsigned target_os_index, struct hwloc_internal_location_s *initiator, hwloc_uint64_t value);
+extern int hwloc_internal_memattrs_guess_memory_tiers(hwloc_topology_t topology);

 extern void hwloc_internal_cpukinds_init(hwloc_topology_t topology);
 extern int hwloc_internal_cpukinds_rank(hwloc_topology_t topology);
@ -480,6 +482,7 @@ extern char * hwloc_progname(struct hwloc_topology *topology);
 #define HWLOC_GROUP_KIND_AIX_SDL_UNKNOWN		210	/* subkind is SDL level */
 #define HWLOC_GROUP_KIND_WINDOWS_PROCESSOR_GROUP	220	/* no subkind */
 #define HWLOC_GROUP_KIND_WINDOWS_RELATIONSHIP_UNKNOWN	221	/* no subkind */
+#define HWLOC_GROUP_KIND_LINUX_CLUSTER                  222     /* no subkind */
 /* distance groups */
 #define HWLOC_GROUP_KIND_DISTANCE			900	/* subkind is round of adding these groups during distance based grouping */
 /* finally, hwloc-specific groups required to insert something else, should disappear as soon as possible */
--- a/src/3rdparty/hwloc/include/private/windows.h
+++ b/src/3rdparty/hwloc/include/private/windows.h
@ -0,0 +1,30 @@
+/*
+ * Copyright © 2009 Université Bordeaux
+ * Copyright © 2020-2022 Inria.  All rights reserved.
+ *
+ * See COPYING in top-level directory.
+ */
+
+#ifndef HWLOC_PRIVATE_WINDOWS_H
+#define HWLOC_PRIVATE_WINDOWS_H
+
+#ifndef _ANONYMOUS_UNION
+#ifdef __GNUC__
+#define _ANONYMOUS_UNION __extension__
+#else
+#define _ANONYMOUS_UNION
+#endif /* __GNUC__ */
+#endif /* _ANONYMOUS_UNION */
+
+#ifndef _ANONYMOUS_STRUCT
+#ifdef __GNUC__
+#define _ANONYMOUS_STRUCT __extension__
+#else
+#define _ANONYMOUS_STRUCT
+#endif /* __GNUC__ */
+#endif /* _ANONYMOUS_STRUCT */
+
+#define DUMMYUNIONNAME
+#define DUMMYSTRUCTNAME
+
+#endif /* HWLOC_PRIVATE_WINDOWS_H */
--- a/src/3rdparty/hwloc/src/components.c
+++ b/src/3rdparty/hwloc/src/components.c
@ -1,5 +1,5 @@
 /*
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2022 Inria.  All rights reserved.
 * Copyright © 2012 Université Bordeaux
 * See COPYING in top-level directory.
 */
@ -124,7 +124,7 @@ hwloc_dlforeachfile(const char *_paths,
      *colon = '\0';

    if (hwloc_plugins_verbose)
-      fprintf(stderr, " Looking under %s\n", path);
+      fprintf(stderr, "hwloc:  Looking under %s\n", path);

    dir = opendir(path);
    if (!dir)
@ -198,7 +198,7 @@ hwloc__dlforeach_cb(const char *filename, void *_data __hwloc_attribute_unused)
  char *componentsymbolname;

  if (hwloc_plugins_verbose)
-    fprintf(stderr, "Plugin dlforeach found `%s'\n", filename);
+    fprintf(stderr, "hwloc: Plugin dlforeach found `%s'\n", filename);

  basename = strrchr(filename, '/');
  if (!basename)
@ -208,7 +208,7 @@ hwloc__dlforeach_cb(const char *filename, void *_data __hwloc_attribute_unused)

  if (hwloc_plugins_blacklist && strstr(hwloc_plugins_blacklist, basename)) {
    if (hwloc_plugins_verbose)
-      fprintf(stderr, "Plugin `%s' is blacklisted in the environment\n", basename);
+      fprintf(stderr, "hwloc: Plugin `%s' is blacklisted in the environment\n", basename);
    goto out;
  }

@ -216,14 +216,14 @@ hwloc__dlforeach_cb(const char *filename, void *_data __hwloc_attribute_unused)
  handle = hwloc_dlopenext(filename);
  if (!handle) {
    if (hwloc_plugins_verbose)
-      fprintf(stderr, "Failed to load plugin: %s\n", hwloc_dlerror());
+      fprintf(stderr, "hwloc: Failed to load plugin: %s\n", hwloc_dlerror());
    goto out;
  }

  componentsymbolname = malloc(strlen(basename)+10+1);
  if (!componentsymbolname) {
    if (hwloc_plugins_verbose)
-      fprintf(stderr, "Failed to allocation component `%s' symbol\n",
+      fprintf(stderr, "hwloc: Failed to allocate component `%s' symbol\n",
 	      basename);
    goto out_with_handle;
  }
@ -231,38 +231,38 @@ hwloc__dlforeach_cb(const char *filename, void *_data __hwloc_attribute_unused)
  component = hwloc_dlsym(handle, componentsymbolname);
  if (!component) {
    if (hwloc_plugins_verbose)
-      fprintf(stderr, "Failed to find component symbol `%s'\n",
+      fprintf(stderr, "hwloc: Failed to find component symbol `%s'\n",
 	      componentsymbolname);
    free(componentsymbolname);
    goto out_with_handle;
  }
  if (component->abi != HWLOC_COMPONENT_ABI) {
    if (hwloc_plugins_verbose)
-      fprintf(stderr, "Plugin symbol ABI %u instead of %d\n",
+      fprintf(stderr, "hwloc: Plugin symbol ABI %u instead of %d\n",
 	      component->abi, HWLOC_COMPONENT_ABI);
    free(componentsymbolname);
    goto out_with_handle;
  }
  if (hwloc_plugins_verbose)
-    fprintf(stderr, "Plugin contains expected symbol `%s'\n",
+    fprintf(stderr, "hwloc: Plugin contains expected symbol `%s'\n",
 	    componentsymbolname);
  free(componentsymbolname);

  if (HWLOC_COMPONENT_TYPE_DISC == component->type) {
    if (strncmp(basename, "hwloc_", 6)) {
      if (hwloc_plugins_verbose)
-	fprintf(stderr, "Plugin name `%s' doesn't match its type DISCOVERY\n", basename);
+	fprintf(stderr, "hwloc: Plugin name `%s' doesn't match its type DISCOVERY\n", basename);
      goto out_with_handle;
    }
  } else if (HWLOC_COMPONENT_TYPE_XML == component->type) {
    if (strncmp(basename, "hwloc_xml_", 10)) {
      if (hwloc_plugins_verbose)
-	fprintf(stderr, "Plugin name `%s' doesn't match its type XML\n", basename);
+	fprintf(stderr, "hwloc: Plugin name `%s' doesn't match its type XML\n", basename);
      goto out_with_handle;
    }
  } else {
    if (hwloc_plugins_verbose)
-      fprintf(stderr, "Plugin name `%s' has invalid type %u\n",
+      fprintf(stderr, "hwloc: Plugin name `%s' has invalid type %u\n",
 	      basename, (unsigned) component->type);
    goto out_with_handle;
  }
@ -277,7 +277,7 @@ hwloc__dlforeach_cb(const char *filename, void *_data __hwloc_attribute_unused)
  desc->handle = handle;
  desc->next = NULL;
  if (hwloc_plugins_verbose)
-    fprintf(stderr, "Plugin descriptor `%s' ready\n", basename);
+    fprintf(stderr, "hwloc: Plugin descriptor `%s' ready\n", basename);

  /* append to the list */
  prevdesc = &hwloc_plugins;
@ -285,7 +285,7 @@ hwloc__dlforeach_cb(const char *filename, void *_data __hwloc_attribute_unused)
    prevdesc = &((*prevdesc)->next);
  *prevdesc = desc;
  if (hwloc_plugins_verbose)
-    fprintf(stderr, "Plugin descriptor `%s' queued\n", basename);
+    fprintf(stderr, "hwloc: Plugin descriptor `%s' queued\n", basename);
  return 0;

 out_with_handle:
@ -300,7 +300,7 @@ hwloc_plugins_exit(void)
  struct hwloc__plugin_desc *desc, *next;

  if (hwloc_plugins_verbose)
-    fprintf(stderr, "Closing all plugins\n");
+    fprintf(stderr, "hwloc: Closing all plugins\n");

  desc = hwloc_plugins;
  while (desc) {
@ -340,7 +340,7 @@ hwloc_plugins_init(void)
  hwloc_plugins = NULL;

  if (hwloc_plugins_verbose)
-    fprintf(stderr, "Starting plugin dlforeach in %s\n", path);
+    fprintf(stderr, "hwloc: Starting plugin dlforeach in %s\n", path);
  err = hwloc_dlforeachfile(path, hwloc__dlforeach_cb, NULL);
  if (err)
    goto out_with_init;
@ -364,14 +364,14 @@ hwloc_disc_component_register(struct hwloc_disc_component *component,
  /* check that the component name is valid */
  if (!strcmp(component->name, HWLOC_COMPONENT_STOP_NAME)) {
    if (hwloc_components_verbose)
-      fprintf(stderr, "Cannot register discovery component with reserved name `" HWLOC_COMPONENT_STOP_NAME "'\n");
+      fprintf(stderr, "hwloc: Cannot register discovery component with reserved name `" HWLOC_COMPONENT_STOP_NAME "'\n");
    return -1;
  }
  if (strchr(component->name, HWLOC_COMPONENT_EXCLUDE_CHAR)
      || strchr(component->name, HWLOC_COMPONENT_PHASESEP_CHAR)
      || strcspn(component->name, HWLOC_COMPONENT_SEPS) != strlen(component->name)) {
    if (hwloc_components_verbose)
-      fprintf(stderr, "Cannot register discovery component with name `%s' containing reserved characters `%c" HWLOC_COMPONENT_SEPS "'\n",
+      fprintf(stderr, "hwloc: Cannot register discovery component with name `%s' containing reserved characters `%c" HWLOC_COMPONENT_SEPS "'\n",
 	      component->name, HWLOC_COMPONENT_EXCLUDE_CHAR);
    return -1;
  }
@ -386,8 +386,9 @@ hwloc_disc_component_register(struct hwloc_disc_component *component,
 				   |HWLOC_DISC_PHASE_MISC
 				   |HWLOC_DISC_PHASE_ANNOTATE
 				   |HWLOC_DISC_PHASE_TWEAK))) {
-    fprintf(stderr, "Cannot register discovery component `%s' with invalid phases 0x%x\n",
-	    component->name, component->phases);
+    if (HWLOC_SHOW_CRITICAL_ERRORS())
+      fprintf(stderr, "hwloc: Cannot register discovery component `%s' with invalid phases 0x%x\n",
+              component->name, component->phases);
    return -1;
  }

@ -398,13 +399,13 @@ hwloc_disc_component_register(struct hwloc_disc_component *component,
      if ((*prev)->priority < component->priority) {
 	/* drop the existing component */
 	if (hwloc_components_verbose)
-	  fprintf(stderr, "Dropping previously registered discovery component `%s', priority %u lower than new one %u\n",
+	  fprintf(stderr, "hwloc: Dropping previously registered discovery component `%s', priority %u lower than new one %u\n",
 		  (*prev)->name, (*prev)->priority, component->priority);
 	*prev = (*prev)->next;
      } else {
 	/* drop the new one */
 	if (hwloc_components_verbose)
-	  fprintf(stderr, "Ignoring new discovery component `%s', priority %u lower than previously registered one %u\n",
+	  fprintf(stderr, "hwloc: Ignoring new discovery component `%s', priority %u lower than previously registered one %u\n",
 		  component->name, component->priority, (*prev)->priority);
 	return -1;
      }
@ -412,7 +413,7 @@ hwloc_disc_component_register(struct hwloc_disc_component *component,
    prev = &((*prev)->next);
  }
  if (hwloc_components_verbose)
-    fprintf(stderr, "Registered discovery component `%s' phases 0x%x with priority %u (%s%s)\n",
+    fprintf(stderr, "hwloc: Registered discovery component `%s' phases 0x%x with priority %u (%s%s)\n",
 	    component->name, component->phases, component->priority,
 	    filename ? "from plugin " : "statically build", filename ? filename : "");

@ -475,15 +476,16 @@ hwloc_components_init(void)
  /* hwloc_static_components is created by configure in static-components.h */
  for(i=0; NULL != hwloc_static_components[i]; i++) {
    if (hwloc_static_components[i]->flags) {
-      fprintf(stderr, "Ignoring static component with invalid flags %lx\n",
-	      hwloc_static_components[i]->flags);
+      if (HWLOC_SHOW_CRITICAL_ERRORS())
+        fprintf(stderr, "hwloc: Ignoring static component with invalid flags %lx\n",
+                hwloc_static_components[i]->flags);
      continue;
    }

    /* initialize the component */
    if (hwloc_static_components[i]->init && hwloc_static_components[i]->init(0) < 0) {
      if (hwloc_components_verbose)
-	fprintf(stderr, "Ignoring static component, failed to initialize\n");
+	fprintf(stderr, "hwloc: Ignoring static component, failed to initialize\n");
      continue;
    }
    /* queue ->finalize() callback if any */
@ -503,15 +505,16 @@ hwloc_components_init(void)
 #ifdef HWLOC_HAVE_PLUGINS
  for(desc = hwloc_plugins; NULL != desc; desc = desc->next) {
    if (desc->component->flags) {
-      fprintf(stderr, "Ignoring plugin `%s' component with invalid flags %lx\n",
-	      desc->name, desc->component->flags);
+      if (HWLOC_SHOW_CRITICAL_ERRORS())
+        fprintf(stderr, "hwloc: Ignoring plugin `%s' component with invalid flags %lx\n",
+                desc->name, desc->component->flags);
      continue;
    }

    /* initialize the component */
    if (desc->component->init && desc->component->init(0) < 0) {
      if (hwloc_components_verbose)
-	fprintf(stderr, "Ignoring plugin `%s', failed to initialize\n", desc->name);
+	fprintf(stderr, "hwloc: Ignoring plugin `%s', failed to initialize\n", desc->name);
      continue;
    }
    /* queue ->finalize() callback if any */
@ -608,7 +611,7 @@ hwloc_disc_component_blacklist_one(struct hwloc_topology *topology,
    /* replace linuxpci and linuxio with linux (with IO phases)
     * for backward compatibility with pre-v2.0 and v2.0 respectively */
    if (hwloc_components_verbose)
-      fprintf(stderr, "Replacing deprecated component `%s' with `linux' IO phases in blacklisting\n", name);
+      fprintf(stderr, "hwloc: Replacing deprecated component `%s' with `linux' IO phases in blacklisting\n", name);
    comp = hwloc_disc_component_find("linux", NULL);
    phases = HWLOC_DISC_PHASE_PCI | HWLOC_DISC_PHASE_IO | HWLOC_DISC_PHASE_MISC | HWLOC_DISC_PHASE_ANNOTATE;

@ -624,7 +627,7 @@ hwloc_disc_component_blacklist_one(struct hwloc_topology *topology,
  }

  if (hwloc_components_verbose)
-    fprintf(stderr, "Blacklisting component `%s` phases 0x%x\n", comp->name, phases);
+    fprintf(stderr, "hwloc: Blacklisting component `%s` phases 0x%x\n", comp->name, phases);

  for(i=0; i<topology->nr_blacklisted_components; i++) {
    if (topology->blacklisted_components[i].component == comp) {
@ -727,7 +730,7 @@ hwloc_disc_component_try_enable(struct hwloc_topology *topology,
    if (hwloc_components_verbose)
      /* do not warn if envvar_forced since system-wide HWLOC_COMPONENTS must be silently ignored after set_xml() etc.
       */
-      fprintf(stderr, "Excluding discovery component `%s' phases 0x%x, conflicts with excludes 0x%x\n",
+      fprintf(stderr, "hwloc: Excluding discovery component `%s' phases 0x%x, conflicts with excludes 0x%x\n",
 	      comp->name, comp->phases, topology->backend_excluded_phases);
    return -1;
  }
@ -735,8 +738,8 @@ hwloc_disc_component_try_enable(struct hwloc_topology *topology,
  backend = comp->instantiate(topology, comp, topology->backend_excluded_phases | blacklisted_phases,
 			      NULL, NULL, NULL);
  if (!backend) {
-    if (hwloc_components_verbose || envvar_forced)
-      fprintf(stderr, "Failed to instantiate discovery component `%s'\n", comp->name);
+    if (hwloc_components_verbose || (envvar_forced && HWLOC_SHOW_CRITICAL_ERRORS()))
+      fprintf(stderr, "hwloc: Failed to instantiate discovery component `%s'\n", comp->name);
    return -1;
  }

@ -817,7 +820,7 @@ hwloc_disc_components_enable_others(struct hwloc_topology *topology)
 	name = curenv;
 	if (!strcmp(name, "linuxpci") || !strcmp(name, "linuxio")) {
 	  if (hwloc_components_verbose)
-	    fprintf(stderr, "Replacing deprecated component `%s' with `linux' in envvar forcing\n", name);
+	    fprintf(stderr, "hwloc: Replacing deprecated component `%s' with `linux' in envvar forcing\n", name);
 	  name = "linux";
 	}

@ -832,7 +835,8 @@ hwloc_disc_components_enable_others(struct hwloc_topology *topology)
 	  if (comp->phases & ~blacklisted_phases)
 	    hwloc_disc_component_try_enable(topology, comp, 1 /* envvar forced */, blacklisted_phases);
 	} else {
-	  fprintf(stderr, "Cannot find discovery component `%s'\n", name);
+          if (HWLOC_SHOW_CRITICAL_ERRORS())
+            fprintf(stderr, "hwloc: Cannot find discovery component `%s'\n", name);
 	}

 	/* restore chars (the second loop below needs env to be unmodified) */
@ -864,7 +868,7 @@ hwloc_disc_components_enable_others(struct hwloc_topology *topology)

      if (!(comp->phases & ~blacklisted_phases)) {
 	if (hwloc_components_verbose)
-	  fprintf(stderr, "Excluding blacklisted discovery component `%s' phases 0x%x\n",
+	  fprintf(stderr, "hwloc: Excluding blacklisted discovery component `%s' phases 0x%x\n",
 		  comp->name, comp->phases);
 	goto nextcomp;
      }
@ -879,7 +883,7 @@ nextcomp:
    /* print a summary */
    int first = 1;
    backend = topology->backends;
-    fprintf(stderr, "Final list of enabled discovery components: ");
+    fprintf(stderr, "hwloc: Final list of enabled discovery components: ");
    while (backend != NULL) {
      fprintf(stderr, "%s%s(0x%x)", first ? "" : ",", backend->component->name, backend->phases);
      backend = backend->next;
@ -935,7 +939,7 @@ hwloc_backend_alloc(struct hwloc_topology *topology,
  /* filter-out component phases that are excluded */
  backend->phases = component->phases & ~topology->backend_excluded_phases;
  if (backend->phases != component->phases && hwloc_components_verbose)
-    fprintf(stderr, "Trying discovery component `%s' with phases 0x%x instead of 0x%x\n",
+    fprintf(stderr, "hwloc: Trying discovery component `%s' with phases 0x%x instead of 0x%x\n",
 	    component->name, backend->phases, component->phases);
  backend->flags = 0;
  backend->discover = NULL;
@ -963,8 +967,9 @@ hwloc_backend_enable(struct hwloc_backend *backend)

  /* check backend flags */
  if (backend->flags) {
-    fprintf(stderr, "Cannot enable discovery component `%s' phases 0x%x with unknown flags %lx\n",
-	    backend->component->name, backend->component->phases, backend->flags);
+    if (HWLOC_SHOW_CRITICAL_ERRORS())
+      fprintf(stderr, "hwloc: Cannot enable discovery component `%s' phases 0x%x with unknown flags %lx\n",
+              backend->component->name, backend->component->phases, backend->flags);
    return -1;
  }

@ -973,7 +978,7 @@ hwloc_backend_enable(struct hwloc_backend *backend)
  while (NULL != *pprev) {
    if ((*pprev)->component == backend->component) {
      if (hwloc_components_verbose)
-	fprintf(stderr, "Cannot enable  discovery component `%s' phases 0x%x twice\n",
+	fprintf(stderr, "hwloc: Cannot enable  discovery component `%s' phases 0x%x twice\n",
 		backend->component->name, backend->component->phases);
      hwloc_backend_disable(backend);
      errno = EBUSY;
@ -983,7 +988,7 @@ hwloc_backend_enable(struct hwloc_backend *backend)
  }

  if (hwloc_components_verbose)
-    fprintf(stderr, "Enabling discovery component `%s' with phases 0x%x (among 0x%x)\n",
+    fprintf(stderr, "hwloc: Enabling discovery component `%s' with phases 0x%x (among 0x%x)\n",
 	    backend->component->name, backend->phases, backend->component->phases);

  /* enqueue at the end */
@ -1067,7 +1072,7 @@ hwloc_backends_disable_all(struct hwloc_topology *topology)
  while (NULL != (backend = topology->backends)) {
    struct hwloc_backend *next = backend->next;
    if (hwloc_components_verbose)
-      fprintf(stderr, "Disabling discovery component `%s'\n",
+      fprintf(stderr, "hwloc: Disabling discovery component `%s'\n",
 	      backend->component->name);
    hwloc_backend_disable(backend);
    topology->backends = next;
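
Review note on the `components.c` hunks: messages that used to be printed unconditionally are now guarded by `HWLOC_SHOW_CRITICAL_ERRORS()`, and every message gains a `hwloc: ` prefix. The macro's definition is not part of this diff, so the following is only a minimal sketch of how such a gate could behave. The level semantics (0 shows everything, 2 hides everything) and all `demo_*` names are assumptions made for illustration, not hwloc's actual definitions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical policy, NOT taken from hwloc: level 0 shows every error,
 * level 1 hides non-critical ones, level 2 hides everything. */
int demo_show_critical_errors(int hide_level)
{
  return hide_level < 2;
}

/* Format an "invalid phases" message the way the patched code prints it:
 * prefixed with "hwloc: " and only when critical errors are enabled.
 * Returns snprintf()'s result, or 0 when the message is suppressed
 * (buf is then set to the empty string). */
int demo_report_invalid_phases(char *buf, size_t len, int hide_level,
                               const char *name, unsigned phases)
{
  assert(buf && len > 0 && name);
  if (!demo_show_critical_errors(hide_level)) {
    buf[0] = '\0';
    return 0;
  }
  return snprintf(buf, len,
                  "hwloc: Cannot register discovery component `%s' with invalid phases 0x%x\n",
                  name, phases);
}
```

Calling `demo_report_invalid_phases()` with level 2 leaves the buffer empty, while levels 0 and 1 produce the prefixed message. Writing into a buffer instead of `stderr` is a choice made here only to keep the sketch easy to check.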
--- a/src/3rdparty/hwloc/src/cpukinds.c
+++ b/src/3rdparty/hwloc/src/cpukinds.c
@ -1,5 +1,5 @@
 /*
- * Copyright © 2020-2021 Inria.  All rights reserved.
+ * Copyright © 2020-2022 Inria.  All rights reserved.
 * See COPYING in top-level directory.
 */

@ -42,6 +42,9 @@ hwloc_internal_cpukinds_dup(hwloc_topology_t new, hwloc_topology_t old)
  struct hwloc_internal_cpukind_s *kinds;
  unsigned i;

+  if (!old->nr_cpukinds)
+    return 0;
+
  kinds = hwloc_tma_malloc(tma, old->nr_cpukinds * sizeof(*kinds));
  if (!kinds)
    return -1;
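
Review note on the early return added to `hwloc_internal_cpukinds_dup()`: one plausible motivation is that a zero-sized allocation may legitimately return NULL, which the `if (!kinds) return -1;` check would then report as a failure. The sketch below reproduces that guard with plain `malloc()` instead of `hwloc_tma_malloc()`. The `demo_*` types and function are hypothetical and only mirror the shape of the patched code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_kind { unsigned long long ranking_value; };

struct demo_set {
  unsigned nr;
  struct demo_kind *kinds;
};

/* Duplicate src into dst. Returns 0 on success, -1 on allocation failure.
 * An empty source is a success that leaves dst empty, without calling
 * malloc(0), whose result (NULL or a unique pointer) is implementation-defined. */
int demo_set_dup(struct demo_set *dst, const struct demo_set *src)
{
  assert(dst && src);
  dst->nr = 0;
  dst->kinds = NULL;

  if (!src->nr)
    return 0;

  dst->kinds = malloc(src->nr * sizeof(*dst->kinds));
  if (!dst->kinds)
    return -1;
  memcpy(dst->kinds, src->kinds, src->nr * sizeof(*dst->kinds));
  dst->nr = src->nr;
  return 0;
}
```

Without the `if (!src->nr)` test, duplicating an empty set could fail on platforms where `malloc(0)` returns NULL even though nothing needs to be copied.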
@ -343,7 +346,8 @@ enum hwloc_cpukinds_ranking {
  HWLOC_CPUKINDS_RANKING_DEFAULT, /* forced + frequency on ARM, forced + coretype_frequency otherwise */
  HWLOC_CPUKINDS_RANKING_NO_FORCED_EFFICIENCY, /* default without forced */
  HWLOC_CPUKINDS_RANKING_FORCED_EFFICIENCY,
-  HWLOC_CPUKINDS_RANKING_CORETYPE_FREQUENCY,
+  HWLOC_CPUKINDS_RANKING_CORETYPE_FREQUENCY, /* either coretype or frequency or both */
+  HWLOC_CPUKINDS_RANKING_CORETYPE_FREQUENCY_STRICT, /* both coretype and frequency are required */
  HWLOC_CPUKINDS_RANKING_CORETYPE,
  HWLOC_CPUKINDS_RANKING_FREQUENCY,
  HWLOC_CPUKINDS_RANKING_FREQUENCY_MAX,
@ -358,9 +362,9 @@ hwloc__cpukinds_try_rank_by_info(struct hwloc_topology *topology,
 {
  unsigned i;

-  if (HWLOC_CPUKINDS_RANKING_CORETYPE_FREQUENCY == heuristics) {
-    hwloc_debug("Trying to rank cpukinds by coretype+frequency...\n");
-    /* we need intel_core_type + (base or max freq) for all kinds */
+  if (HWLOC_CPUKINDS_RANKING_CORETYPE_FREQUENCY_STRICT == heuristics) {
+    hwloc_debug("Trying to rank cpukinds by coretype+frequency_strict...\n");
+    /* we need intel_core_type AND (base or max freq) for all kinds */
    if (!summary->have_intel_core_type
        || (!summary->have_max_freq && !summary->have_base_freq))
      return -1;
@ -373,6 +377,21 @@ hwloc__cpukinds_try_rank_by_info(struct hwloc_topology *topology,
        kind->ranking_value = (summary->summaries[i].intel_core_type << 20) + summary->summaries[i].max_freq;
    }

+  } else if (HWLOC_CPUKINDS_RANKING_CORETYPE_FREQUENCY == heuristics) {
+    hwloc_debug("Trying to rank cpukinds by coretype+frequency...\n");
+    /* we need intel_core_type OR (base or max freq) for all kinds */
+    if (!summary->have_intel_core_type
+        && (!summary->have_max_freq && !summary->have_base_freq))
+      return -1;
+    /* rank first by coretype (Core>>Atom) then by frequency, base if available, max otherwise */
+    for(i=0; i<topology->nr_cpukinds; i++) {
+      struct hwloc_internal_cpukind_s *kind = &topology->cpukinds[i];
+      if (summary->have_base_freq)
+        kind->ranking_value = (summary->summaries[i].intel_core_type << 20) + summary->summaries[i].base_freq;
+      else
+        kind->ranking_value = (summary->summaries[i].intel_core_type << 20) + summary->summaries[i].max_freq;
+    }
+
  } else if (HWLOC_CPUKINDS_RANKING_CORETYPE == heuristics) {
    hwloc_debug("Trying to rank cpukinds by coretype...\n");
    /* we need intel_core_type */
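
Review note on the two `coretype+frequency` branches: they compute the same ranking value and differ only in their precondition. The strict variant requires the core type AND a frequency, the relaxed one accepts the core type OR a frequency. A minimal sketch of that difference follows, assuming a reduced summary structure. All `demo_*` names are hypothetical; only the `<< 20` composition is taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced summary: which attributes all kinds provide. */
struct demo_summary {
  int have_intel_core_type;
  int have_base_freq;
  int have_max_freq;
};

/* Strict variant: needs the core type AND at least one frequency. */
int demo_can_rank_strict(const struct demo_summary *s)
{
  assert(s);
  return s->have_intel_core_type && (s->have_base_freq || s->have_max_freq);
}

/* Relaxed variant: needs the core type OR at least one frequency. */
int demo_can_rank_relaxed(const struct demo_summary *s)
{
  assert(s);
  return s->have_intel_core_type || s->have_base_freq || s->have_max_freq;
}

/* Ranking key shared by both branches: the core type is shifted left by
 * 20 bits and the frequency is added, so the core type dominates as long
 * as the frequency stays below 2^20. */
uint64_t demo_ranking_value(unsigned core_type, unsigned freq)
{
  return ((uint64_t)core_type << 20) + freq;
}
```

With the relaxed test, a platform that reports only frequencies can still be ranked: the core-type term is then presumably zero for every kind and the frequency alone decides the order.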
@ -429,7 +448,9 @@ static int hwloc__cpukinds_compare_ranking_values(const void *_a, const void *_b
 {
  const struct hwloc_internal_cpukind_s *a = _a;
  const struct hwloc_internal_cpukind_s *b = _b;
-  return a->ranking_value - b->ranking_value;
+  uint64_t arv = a->ranking_value;
+  uint64_t brv = b->ranking_value;
+  return arv < brv ? -1 : arv > brv ? 1 : 0;
 }

 /* this function requires ranking values to be unique */
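
Review note on the comparator change: the old code returned the difference of two ranking values as an `int`. When the values are 64 bits wide, that difference does not fit: two values that differ by exactly 2^32, for example, typically truncate to 0 on common platforms and would compare as equal. The patched version compares explicitly. A minimal sketch of the same three-way comparison used with `qsort()`, with hypothetical `demo_*` names:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_kind { uint64_t ranking_value; };

/* Three-way comparison without subtraction, as in the patched code:
 * returns -1, 0 or 1 and cannot overflow or truncate. */
int demo_compare_ranking_values(const void *_a, const void *_b)
{
  const struct demo_kind *a = _a;
  const struct demo_kind *b = _b;
  assert(a && b);
  return a->ranking_value < b->ranking_value ? -1
       : a->ranking_value > b->ranking_value ? 1 : 0;
}

/* Sort kinds by increasing ranking value. */
void demo_sort_kinds(struct demo_kind *kinds, size_t nr)
{
  qsort(kinds, nr, sizeof(*kinds), demo_compare_ranking_values);
}
```

The explicit comparison costs nothing measurable and stays correct for the whole `uint64_t` range, including values close to `UINT64_MAX`.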
@ -469,6 +490,8 @@ hwloc_internal_cpukinds_rank(struct hwloc_topology *topology)
      heuristics = HWLOC_CPUKINDS_RANKING_NONE;
    else if (!strcmp(env, "coretype+frequency"))
      heuristics = HWLOC_CPUKINDS_RANKING_CORETYPE_FREQUENCY;
+    else if (!strcmp(env, "coretype+frequency_strict"))
+      heuristics = HWLOC_CPUKINDS_RANKING_CORETYPE_FREQUENCY_STRICT;
    else if (!strcmp(env, "coretype"))
      heuristics = HWLOC_CPUKINDS_RANKING_CORETYPE;
    else if (!strcmp(env, "frequency"))
@ -481,16 +504,14 @@ hwloc_internal_cpukinds_rank(struct hwloc_topology *topology)
      heuristics = HWLOC_CPUKINDS_RANKING_FORCED_EFFICIENCY;
    else if (!strcmp(env, "no_forced_efficiency"))
      heuristics = HWLOC_CPUKINDS_RANKING_NO_FORCED_EFFICIENCY;
-    else if (!hwloc_hide_errors())
-      fprintf(stderr, "Failed to recognize HWLOC_CPUKINDS_RANKING value %s\n", env);
+    else if (HWLOC_SHOW_CRITICAL_ERRORS())
+      fprintf(stderr, "hwloc: Failed to recognize HWLOC_CPUKINDS_RANKING value %s\n", env);
  }

  if (heuristics == HWLOC_CPUKINDS_RANKING_DEFAULT
      || heuristics == HWLOC_CPUKINDS_RANKING_NO_FORCED_EFFICIENCY) {
    /* default is forced_efficiency first */
    struct hwloc_cpukinds_info_summary summary;
-    enum hwloc_cpukinds_ranking subheuristics;
-    const char *arch;

    if (heuristics == HWLOC_CPUKINDS_RANKING_DEFAULT)
      hwloc_debug("Using default ranking strategy...\n");
@ -508,16 +529,7 @@ hwloc_internal_cpukinds_rank(struct hwloc_topology *topology)
      goto failed;
    hwloc__cpukinds_summarize_info(topology, &summary);

-    arch = hwloc_obj_get_info_by_name(topology->levels[0][0], "Architecture");
-    /* TODO: rather coretype_frequency only on x86/Intel? */
-    if (arch && (!strncmp(arch, "arm", 3) || !strncmp(arch, "aarch", 5)))
-      /* then frequency on ARM */
-      subheuristics = HWLOC_CPUKINDS_RANKING_FREQUENCY;
-    else
-      /* or coretype+frequency otherwise */
-      subheuristics = HWLOC_CPUKINDS_RANKING_CORETYPE_FREQUENCY;
-
-    err = hwloc__cpukinds_try_rank_by_info(topology, subheuristics, &summary);
+    err = hwloc__cpukinds_try_rank_by_info(topology, HWLOC_CPUKINDS_RANKING_CORETYPE_FREQUENCY, &summary);
    free(summary.summaries);
    if (!err)
      goto ready;
--- a/src/3rdparty/hwloc/src/diff.c
+++ b/src/3rdparty/hwloc/src/diff.c
@ -1,5 +1,5 @@
 /*
- * Copyright © 2013-2020 Inria.  All rights reserved.
+ * Copyright © 2013-2022 Inria.  All rights reserved.
 * See COPYING in top-level directory.
 */

@ -218,7 +218,7 @@ hwloc_diff_trees(hwloc_topology_t topo1, hwloc_obj_t obj1,
 		struct hwloc_info_s *info1 = &obj1->infos[i], *info2 = &obj2->infos[i];
 		if (strcmp(info1->name, info2->name))
 			goto out_too_complex;
-		if (strcmp(obj1->infos[i].value, obj2->infos[i].value)) {
+		if (strcmp(info1->value, info2->value)) {
 			err = hwloc_append_diff_obj_attr_string(obj1,
 								HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_INFO,
 								info1->name,
--- a/src/3rdparty/hwloc/src/distances.c
+++ b/src/3rdparty/hwloc/src/distances.c
@ -1,5 +1,5 @@
 /*
- * Copyright © 2010-2020 Inria.  All rights reserved.
+ * Copyright © 2010-2022 Inria.  All rights reserved.
 * Copyright © 2011-2012 Université Bordeaux
 * Copyright © 2011 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@ -17,6 +17,37 @@
 static struct hwloc_internal_distances_s *
 hwloc__internal_distances_from_public(hwloc_topology_t topology, struct hwloc_distances_s *distances);

+static void
+hwloc__groups_by_distances(struct hwloc_topology *topology, unsigned nbobjs, struct hwloc_obj **objs, uint64_t *values, unsigned long kind, unsigned nbaccuracies, float *accuracies, int needcheck);
+
+static void
+hwloc_internal_distances_restrict(hwloc_obj_t *objs,
+				  uint64_t *indexes,
+				  hwloc_obj_type_t *different_types,
+				  uint64_t *values,
+				  unsigned nbobjs, unsigned disappeared);
+
+static void
+hwloc_internal_distances_print_matrix(struct hwloc_internal_distances_s *dist)
+{
+  unsigned nbobjs = dist->nbobjs;
+  hwloc_obj_t *objs = dist->objs;
+  hwloc_uint64_t *values = dist->values;
+  int gp = !HWLOC_DIST_TYPE_USE_OS_INDEX(dist->unique_type);
+  unsigned i, j;
+
+  fprintf(stderr, "%s", gp ? "gp_index" : "os_index");
+  for(j=0; j<nbobjs; j++)
+    fprintf(stderr, " % 5d", (int)(gp ? objs[j]->gp_index : objs[j]->os_index));
+  fprintf(stderr, "\n");
+  for(i=0; i<nbobjs; i++) {
+    fprintf(stderr, "  % 5d", (int)(gp ? objs[i]->gp_index : objs[i]->os_index));
+    for(j=0; j<nbobjs; j++)
+      fprintf(stderr, " % 5lld", (long long) values[i*nbobjs + j]);
+    fprintf(stderr, "\n");
+  }
+}
+
 /******************************************************
 * Global init, prepare, destroy, dup
 */
@ -244,27 +275,33 @@ int hwloc_distances_release_remove(hwloc_topology_t topology,
  return 0;
 }

-/******************************************************
- * Add distances to the topology
+/*********************************************************
+ * Backend functions for adding distances to the topology
 */

+/* cancel a distances handle. only needed internally for now */
 static void
-hwloc__groups_by_distances(struct hwloc_topology *topology, unsigned nbobjs, struct hwloc_obj **objs, uint64_t *values, unsigned long kind, unsigned nbaccuracies, float *accuracies, int needcheck);
+hwloc_backend_distances_add__cancel(struct hwloc_internal_distances_s *dist)
+{
+  /* everything is set to NULL in hwloc_backend_distances_add_create() */
+  free(dist->name);
+  free(dist->indexes);
+  free(dist->objs);
+  free(dist->different_types);
+  free(dist->values);
+  free(dist);
+}

-/* insert a distance matrix in the topology.
- * the caller gives us the distances and objs pointers, we'll free them later.
+/* prepare a distances handle for later commit in the topology.
+ * we duplicate the caller's name.
 */
-static int
-hwloc_internal_distances__add(hwloc_topology_t topology, const char *name,
-			      hwloc_obj_type_t unique_type, hwloc_obj_type_t *different_types,
-			      unsigned nbobjs, hwloc_obj_t *objs, uint64_t *indexes, uint64_t *values,
-			      unsigned long kind, unsigned iflags)
+hwloc_backend_distances_add_handle_t
+hwloc_backend_distances_add_create(hwloc_topology_t topology,
+                                   const char *name, unsigned long kind, unsigned long flags)
 {
  struct hwloc_internal_distances_s *dist;

-  if (different_types) {
-    kind |= HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES; /* the user isn't forced to give it */
-  } else if (kind & HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES) {
+  if (flags) {
    errno = EINVAL;
    goto err;
  }
@ -273,110 +310,54 @@ hwloc_internal_distances__add(hwloc_topology_t topology, const char *name,
  if (!dist)
    goto err;

-  if (name)
+  if (name) {
    dist->name = strdup(name); /* ignore failure */
-
-  dist->unique_type = unique_type;
-  dist->different_types = different_types;
-  dist->nbobjs = nbobjs;
-  dist->kind = kind;
-  dist->iflags = iflags;
-
-  assert(!!(iflags & HWLOC_INTERNAL_DIST_FLAG_OBJS_VALID) == !!objs);
-
-  if (!objs) {
-    assert(indexes);
-    /* we only have indexes, we'll refresh objs from there */
-    dist->indexes = indexes;
-    dist->objs = calloc(nbobjs, sizeof(hwloc_obj_t));
-    if (!dist->objs)
+    if (!dist->name)
      goto err_with_dist;
-
-  } else {
-    unsigned i;
-    assert(!indexes);
-    /* we only have objs, generate the indexes arrays so that we can refresh objs later */
-    dist->objs = objs;
-    dist->indexes = malloc(nbobjs * sizeof(*dist->indexes));
-    if (!dist->indexes)
-      goto err_with_dist;
-    if (HWLOC_DIST_TYPE_USE_OS_INDEX(dist->unique_type)) {
-      for(i=0; i<nbobjs; i++)
-	dist->indexes[i] = objs[i]->os_index;
-    } else {
-      for(i=0; i<nbobjs; i++)
-	dist->indexes[i] = objs[i]->gp_index;
-    }
  }

-  dist->values = values;
+  dist->kind = kind;
+  dist->iflags = HWLOC_INTERNAL_DIST_FLAG_NOT_COMMITTED;
+
+  dist->unique_type = HWLOC_OBJ_TYPE_NONE;
+  dist->different_types = NULL;
+  dist->nbobjs = 0;
+  dist->indexes = NULL;
+  dist->objs = NULL;
+  dist->values = NULL;

  dist->id = topology->next_dist_id++;
-
-  if (topology->last_dist)
-    topology->last_dist->next = dist;
-  else
-    topology->first_dist = dist;
-  dist->prev = topology->last_dist;
-  dist->next = NULL;
-  topology->last_dist = dist;
-  return 0;
+  return dist;

 err_with_dist:
-  if (name)
-    free(dist->name);
-  free(dist);
+  hwloc_backend_distances_add__cancel(dist);
 err:
-  free(different_types);
-  free(objs);
-  free(indexes);
-  free(values);
-  return -1;
+  return NULL;
 }

-int hwloc_internal_distances_add_by_index(hwloc_topology_t topology, const char *name,
-					  hwloc_obj_type_t unique_type, hwloc_obj_type_t *different_types, unsigned nbobjs, uint64_t *indexes, uint64_t *values,
-					  unsigned long kind, unsigned long flags)
+/* attach objects and values to a distances handle.
+ * on success, objs and values arrays are attached and will be freed with the distances.
+ * on failure, the handle is freed.
+ */
+int
+hwloc_backend_distances_add_values(hwloc_topology_t topology __hwloc_attribute_unused,
+                                   hwloc_backend_distances_add_handle_t handle,
+                                   unsigned nbobjs, hwloc_obj_t *objs,
+                                   hwloc_uint64_t *values,
+                                   unsigned long flags)
 {
-  unsigned iflags = 0; /* objs not valid */
-
-  if (nbobjs < 2) {
-    errno = EINVAL;
-    goto err;
-  }
-
-  /* cannot group without objects,
-   * and we don't group from XML anyway since the hwloc that generated the XML should have grouped already.
-   */
-  if (flags & HWLOC_DISTANCES_ADD_FLAG_GROUP) {
-    errno = EINVAL;
-    goto err;
-  }
-
-  return hwloc_internal_distances__add(topology, name, unique_type, different_types, nbobjs, NULL, indexes, values, kind, iflags);
-
- err:
-  free(indexes);
-  free(values);
-  free(different_types);
-  return -1;
-}
-
-static void
-hwloc_internal_distances_restrict(hwloc_obj_t *objs,
-				  uint64_t *indexes,
-				  uint64_t *values,
-				  unsigned nbobjs, unsigned disappeared);
-
-int hwloc_internal_distances_add(hwloc_topology_t topology, const char *name,
-				 unsigned nbobjs, hwloc_obj_t *objs, uint64_t *values,
-				 unsigned long kind, unsigned long flags)
-{
-  hwloc_obj_type_t unique_type, *different_types;
+  struct hwloc_internal_distances_s *dist = handle;
+  hwloc_obj_type_t unique_type, *different_types = NULL;
+  hwloc_uint64_t *indexes = NULL;
  unsigned i, disappeared = 0;
-  unsigned iflags = HWLOC_INTERNAL_DIST_FLAG_OBJS_VALID;

-  if (nbobjs < 2) {
+  if (dist->nbobjs || !(dist->iflags & HWLOC_INTERNAL_DIST_FLAG_NOT_COMMITTED)) {
+    /* target distances is already set */
+    errno = EINVAL;
+    goto err;
+  }
+
+  if (flags || nbobjs < 2 || !objs || !values) {
    errno = EINVAL;
    goto err;
  }
@ -389,15 +370,18 @@ int hwloc_internal_distances_add(hwloc_topology_t topology, const char *name,
    /* some objects are NULL */
    if (disappeared == nbobjs) {
      /* nothing left, drop the matrix */
-      free(objs);
-      free(values);
-      return 0;
+      errno = ENOENT;
+      goto err;
    }
    /* restrict the matrix */
-    hwloc_internal_distances_restrict(objs, NULL, values, nbobjs, disappeared);
+    hwloc_internal_distances_restrict(objs, NULL, NULL, values, nbobjs, disappeared);
    nbobjs -= disappeared;
  }

+  indexes = malloc(nbobjs * sizeof(*indexes));
+  if (!indexes)
+    goto err;
+
  unique_type = objs[0]->type;
  for(i=1; i<nbobjs; i++)
    if (objs[i]->type != unique_type) {
@ -408,16 +392,108 @@ int hwloc_internal_distances_add(hwloc_topology_t topology, const char *name,
    /* heterogeneous types */
    different_types = malloc(nbobjs * sizeof(*different_types));
    if (!different_types)
-      goto err;
+      goto err_with_indexes;
    for(i=0; i<nbobjs; i++)
      different_types[i] = objs[i]->type;
-
-  } else {
-    /* homogeneous types */
-    different_types = NULL;
  }

-  if (topology->grouping && (flags & HWLOC_DISTANCES_ADD_FLAG_GROUP) && !different_types) {
+  dist->nbobjs = nbobjs;
+  dist->objs = objs;
+  dist->iflags |= HWLOC_INTERNAL_DIST_FLAG_OBJS_VALID;
+  dist->indexes = indexes;
+  dist->unique_type = unique_type;
+  dist->different_types = different_types;
+  dist->values = values;
+
+  if (different_types)
+    dist->kind |= HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES;
+
+  if (HWLOC_DIST_TYPE_USE_OS_INDEX(dist->unique_type)) {
+    for(i=0; i<nbobjs; i++)
+      dist->indexes[i] = objs[i]->os_index;
+  } else {
+    for(i=0; i<nbobjs; i++)
+      dist->indexes[i] = objs[i]->gp_index;
+  }
+
+  return 0;
+
+ err_with_indexes:
+  free(indexes);
+ err:
+  hwloc_backend_distances_add__cancel(dist);
+  return -1;
+}
+
+/* attach indexes and values to a distances handle.
+ * on success, indexes, different_types and values arrays are attached and will be freed with the distances.
+ * on failure, the handle is freed.
+ */
+static int
+hwloc_backend_distances_add_values_by_index(hwloc_topology_t topology __hwloc_attribute_unused,
+                                            hwloc_backend_distances_add_handle_t handle,
+                                            unsigned nbobjs, hwloc_obj_type_t unique_type, hwloc_obj_type_t *different_types, hwloc_uint64_t *indexes,
+                                            hwloc_uint64_t *values)
+{
+  struct hwloc_internal_distances_s *dist = handle;
+  hwloc_obj_t *objs;
+
+  if (dist->nbobjs || !(dist->iflags & HWLOC_INTERNAL_DIST_FLAG_NOT_COMMITTED)) {
+    /* target distances is already set */
+    errno = EINVAL;
+    goto err;
+  }
+  if (nbobjs < 2 || !indexes || !values || (unique_type == HWLOC_OBJ_TYPE_NONE && !different_types)) {
+    errno = EINVAL;
+    goto err;
+  }
+
+  objs = malloc(nbobjs * sizeof(*objs));
+  if (!objs)
+    goto err;
+
+  dist->nbobjs = nbobjs;
+  dist->objs = objs;
+  dist->indexes = indexes;
+  dist->unique_type = unique_type;
+  dist->different_types = different_types;
+  dist->values = values;
+
+  if (different_types)
+    dist->kind |= HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES;
+
+  return 0;
+
+ err:
+  hwloc_backend_distances_add__cancel(dist);
+  return -1;
+}
+
+/* commit a distances handle.
+ * on failure, the handle is freed with its objects and values arrays.
+ */
+int
+hwloc_backend_distances_add_commit(hwloc_topology_t topology,
+                                   hwloc_backend_distances_add_handle_t handle,
+                                   unsigned long flags)
+{
+  struct hwloc_internal_distances_s *dist = handle;
+
+  if (!dist->nbobjs || !(dist->iflags & HWLOC_INTERNAL_DIST_FLAG_NOT_COMMITTED)) {
+    /* target distances not ready for commit */
+    errno = EINVAL;
+    goto err;
+  }
+
+  if ((flags & HWLOC_DISTANCES_ADD_FLAG_GROUP) && !dist->objs) {
+    /* cannot group without objects,
+     * and we don't group from XML anyway since the hwloc that generated the XML should have grouped already.
+     */
+    errno = EINVAL;
+    goto err;
+  }
+
+  if (topology->grouping && (flags & HWLOC_DISTANCES_ADD_FLAG_GROUP) && !dist->different_types) {
    float full_accuracy = 0.f;
    float *accuracies;
    unsigned nbaccuracies;
@ -431,26 +507,94 @@ int hwloc_internal_distances_add(hwloc_topology_t topology, const char *name,
    }

    if (topology->grouping_verbose) {
-      unsigned j;
-      int gp = !HWLOC_DIST_TYPE_USE_OS_INDEX(unique_type);
      fprintf(stderr, "Trying to group objects using distance matrix:\n");
-      fprintf(stderr, "%s", gp ? "gp_index" : "os_index");
-      for(j=0; j<nbobjs; j++)
-	fprintf(stderr, " % 5d", (int)(gp ? objs[j]->gp_index : objs[j]->os_index));
-      fprintf(stderr, "\n");
-      for(i=0; i<nbobjs; i++) {
-	fprintf(stderr, "  % 5d", (int)(gp ? objs[i]->gp_index : objs[i]->os_index));
-	for(j=0; j<nbobjs; j++)
-	  fprintf(stderr, " % 5lld", (long long) values[i*nbobjs + j]);
-	fprintf(stderr, "\n");
-      }
+      hwloc_internal_distances_print_matrix(dist);
    }

-    hwloc__groups_by_distances(topology, nbobjs, objs, values,
-			       kind, nbaccuracies, accuracies, 1 /* check the first matrice */);
+    hwloc__groups_by_distances(topology, dist->nbobjs, dist->objs, dist->values,
+			       dist->kind, nbaccuracies, accuracies, 1 /* check the first matrix */);
  }

-  return hwloc_internal_distances__add(topology, name, unique_type, different_types, nbobjs, objs, NULL, values, kind, iflags);
+  if (topology->last_dist)
+    topology->last_dist->next = dist;
+  else
+    topology->first_dist = dist;
+  dist->prev = topology->last_dist;
+  dist->next = NULL;
+  topology->last_dist = dist;
+
+  dist->iflags &= ~HWLOC_INTERNAL_DIST_FLAG_NOT_COMMITTED;
+  return 0;
+
+ err:
+  hwloc_backend_distances_add__cancel(dist);
+  return -1;
+}
+
+/* all-in-one backend function not exported to plugins, only used by XML for now */
+int hwloc_internal_distances_add_by_index(hwloc_topology_t topology, const char *name,
+                                          hwloc_obj_type_t unique_type, hwloc_obj_type_t *different_types, unsigned nbobjs, uint64_t *indexes, uint64_t *values,
+                                          unsigned long kind, unsigned long flags)
+{
+  hwloc_backend_distances_add_handle_t handle;
+  int err;
+
+  handle = hwloc_backend_distances_add_create(topology, name, kind, 0);
+  if (!handle)
+    goto err;
+
+  err = hwloc_backend_distances_add_values_by_index(topology, handle,
+                                                    nbobjs, unique_type, different_types, indexes,
+                                                    values);
+  if (err < 0)
+    goto err;
+
+  /* arrays are now attached to the handle */
+  indexes = NULL;
+  different_types = NULL;
+  values = NULL;
+
+  err = hwloc_backend_distances_add_commit(topology, handle, flags);
+  if (err < 0)
+    goto err;
+
+  return 0;
+
+ err:
+  free(indexes);
+  free(different_types);
+  free(values);
+  return -1;
+}
+
+/* all-in-one backend function not exported to plugins, used by OS backends */
+int hwloc_internal_distances_add(hwloc_topology_t topology, const char *name,
+                                 unsigned nbobjs, hwloc_obj_t *objs, uint64_t *values,
+                                 unsigned long kind, unsigned long flags)
+{
+  hwloc_backend_distances_add_handle_t handle;
+  int err;
+
+  handle = hwloc_backend_distances_add_create(topology, name, kind, 0);
+  if (!handle)
+    goto err;
+
+  err = hwloc_backend_distances_add_values(topology, handle,
+                                           nbobjs, objs,
+                                           values,
+                                           0);
+  if (err < 0)
+    goto err;
+
+  /* arrays are now attached to the handle */
+  objs = NULL;
+  values = NULL;
+
+  err = hwloc_backend_distances_add_commit(topology, handle, flags);
+  if (err < 0)
+    goto err;
+
+  return 0;

 err:
  free(objs);
@ -458,44 +602,54 @@ int hwloc_internal_distances_add(hwloc_topology_t topology, const char *name,
  return -1;
 }

+/********************************
+ * User API for adding distances
+ */
+
 #define HWLOC_DISTANCES_KIND_FROM_ALL (HWLOC_DISTANCES_KIND_FROM_OS|HWLOC_DISTANCES_KIND_FROM_USER)
 #define HWLOC_DISTANCES_KIND_MEANS_ALL (HWLOC_DISTANCES_KIND_MEANS_LATENCY|HWLOC_DISTANCES_KIND_MEANS_BANDWIDTH)
-#define HWLOC_DISTANCES_KIND_ALL (HWLOC_DISTANCES_KIND_FROM_ALL|HWLOC_DISTANCES_KIND_MEANS_ALL)
+#define HWLOC_DISTANCES_KIND_ALL (HWLOC_DISTANCES_KIND_FROM_ALL|HWLOC_DISTANCES_KIND_MEANS_ALL|HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES)
 #define HWLOC_DISTANCES_ADD_FLAG_ALL (HWLOC_DISTANCES_ADD_FLAG_GROUP|HWLOC_DISTANCES_ADD_FLAG_GROUP_INACCURATE)

-/* The actual function exported to the user
- */
-int hwloc_distances_add(hwloc_topology_t topology,
-			unsigned nbobjs, hwloc_obj_t *objs, hwloc_uint64_t *values,
-			unsigned long kind, unsigned long flags)
+void * hwloc_distances_add_create(hwloc_topology_t topology,
+                                  const char *name, unsigned long kind,
+                                  unsigned long flags)
+{
+  if (!topology->is_loaded) {
+    errno = EINVAL;
+    return NULL;
+  }
+  if (topology->adopted_shmem_addr) {
+    errno = EPERM;
+    return NULL;
+  }
+  if ((kind & ~HWLOC_DISTANCES_KIND_ALL)
+      || hwloc_weight_long(kind & HWLOC_DISTANCES_KIND_FROM_ALL) != 1
+      || hwloc_weight_long(kind & HWLOC_DISTANCES_KIND_MEANS_ALL) != 1) {
+    errno = EINVAL;
+    return NULL;
+  }
+
+  return hwloc_backend_distances_add_create(topology, name, kind, flags);
+}
+
+int hwloc_distances_add_values(hwloc_topology_t topology,
+                               void *handle,
+                               unsigned nbobjs, hwloc_obj_t *objs,
+                               hwloc_uint64_t *values,
+                               unsigned long flags)
 {
  unsigned i;
  uint64_t *_values;
  hwloc_obj_t *_objs;
  int err;

-  if (nbobjs < 2 || !objs || !values || !topology->is_loaded) {
-    errno = EINVAL;
-    return -1;
-  }
-  if (topology->adopted_shmem_addr) {
-    errno = EPERM;
-    return -1;
-  }
-  if ((kind & ~HWLOC_DISTANCES_KIND_ALL)
-      || hwloc_weight_long(kind & HWLOC_DISTANCES_KIND_FROM_ALL) != 1
-      || hwloc_weight_long(kind & HWLOC_DISTANCES_KIND_MEANS_ALL) != 1
-      || (flags & ~HWLOC_DISTANCES_ADD_FLAG_ALL)) {
-    errno = EINVAL;
-    return -1;
-  }
-
  /* no strict need to check for duplicates, things shouldn't break */

  for(i=1; i<nbobjs; i++)
    if (!objs[i]) {
      errno = EINVAL;
-      return -1;
+      goto out;
    }

  /* copy the input arrays and give them to the topology */
@ -506,22 +660,78 @@ int hwloc_distances_add(hwloc_topology_t topology,

  memcpy(_objs, objs, nbobjs*sizeof(hwloc_obj_t));
  memcpy(_values, values, nbobjs*nbobjs*sizeof(*_values));
-  err = hwloc_internal_distances_add(topology, NULL, nbobjs, _objs, _values, kind, flags);
-  if (err < 0)
-    goto out; /* _objs and _values freed in hwloc_internal_distances_add() */
+
+  err = hwloc_backend_distances_add_values(topology, handle, nbobjs, _objs, _values, flags);
+  if (err < 0) {
+    /* handle was canceled inside hwloc_backend_distances_add_values */
+    handle = NULL;
+    goto out_with_arrays;
+  }
+
+  return 0;
+
+ out_with_arrays:
+  free(_objs);
+  free(_values);
+ out:
+  if (handle)
+    hwloc_backend_distances_add__cancel(handle);
+  return -1;
+}
+
+int
+hwloc_distances_add_commit(hwloc_topology_t topology,
+                           void *handle,
+                           unsigned long flags)
+{
+  int err;
+
+  if (flags & ~HWLOC_DISTANCES_ADD_FLAG_ALL) {
+    errno = EINVAL;
+    goto out;
+  }
+
+  err = hwloc_backend_distances_add_commit(topology, handle, flags);
+  if (err < 0) {
+    /* handle was canceled inside hwloc_backend_distances_add_commit */
+    handle = NULL;
+    goto out;
+  }

  /* in case we added some groups, see if we need to reconnect */
  hwloc_topology_reconnect(topology, 0);

  return 0;

- out_with_arrays:
-  free(_values);
-  free(_objs);
 out:
+  if (handle)
+    hwloc_backend_distances_add__cancel(handle);
  return -1;
 }

+/* deprecated all-in-one user function */
+int hwloc_distances_add(hwloc_topology_t topology,
+			unsigned nbobjs, hwloc_obj_t *objs, hwloc_uint64_t *values,
+			unsigned long kind, unsigned long flags)
+{
+  void *handle;
+  int err;
+
+  handle = hwloc_distances_add_create(topology, NULL, kind, 0);
+  if (!handle)
+    return -1;
+
+  err = hwloc_distances_add_values(topology, handle, nbobjs, objs, values, 0);
+  if (err < 0)
+    return -1;
+
+  err = hwloc_distances_add_commit(topology, handle, flags);
+  if (err < 0)
+    return -1;
+
+  return 0;
+}
+
 /******************************************************
 * Refresh objects in distances
 */
@ -529,6 +739,7 @@ int hwloc_distances_add(hwloc_topology_t topology,
 static void
 hwloc_internal_distances_restrict(hwloc_obj_t *objs,
 				  uint64_t *indexes,
+                                  hwloc_obj_type_t *different_types,
 				  uint64_t *values,
 				  unsigned nbobjs, unsigned disappeared)
 {
@ -550,6 +761,8 @@ hwloc_internal_distances_restrict(hwloc_obj_t *objs,
      objs[newi] = objs[i];
      if (indexes)
 	indexes[newi] = indexes[i];
+      if (different_types)
+        different_types[newi] = different_types[i];
      newi++;
    }
 }
@ -594,7 +807,7 @@ hwloc_internal_distances_refresh_one(hwloc_topology_t topology,
    return -1;

  if (disappeared) {
-    hwloc_internal_distances_restrict(objs, dist->indexes, dist->values, nbobjs, disappeared);
+    hwloc_internal_distances_restrict(objs, dist->indexes, dist->different_types, dist->values, nbobjs, disappeared);
    dist->nbobjs -= disappeared;
  }

@ -647,7 +860,7 @@ struct hwloc_distances_container_s {
  struct hwloc_distances_s distances;
 };

-#define HWLOC_DISTANCES_CONTAINER_OFFSET ((char*)&((struct hwloc_distances_container_s*)NULL)->distances - (char*)NULL)
+#define HWLOC_DISTANCES_CONTAINER_OFFSET ((uintptr_t)(&((struct hwloc_distances_container_s*)NULL)->distances) - (uintptr_t)NULL)
 #define HWLOC_DISTANCES_CONTAINER(_d) (struct hwloc_distances_container_s *) ( ((char*)_d) - HWLOC_DISTANCES_CONTAINER_OFFSET )

 static struct hwloc_internal_distances_s *
@ -1087,3 +1300,210 @@ hwloc__groups_by_distances(struct hwloc_topology *topology,
 out_with_groupids:
  free(groupids);
 }
+
+static int
+hwloc__distances_transform_remove_null(struct hwloc_distances_s *distances)
+{
+  hwloc_uint64_t *values = distances->values;
+  hwloc_obj_t *objs = distances->objs;
+  unsigned i, nb, nbobjs = distances->nbobjs;
+  hwloc_obj_type_t unique_type;
+
+  for(i=0, nb=0; i<nbobjs; i++)
+    if (objs[i])
+      nb++;
+
+  if (nb < 2) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  if (nb == nbobjs)
+    return 0;
+
+  hwloc_internal_distances_restrict(objs, NULL, NULL, values, nbobjs, nbobjs-nb);
+  distances->nbobjs = nb;
+
+  /* update HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES for convenience */
+  unique_type = objs[0]->type;
+  for(i=1; i<nb; i++)
+    if (objs[i]->type != unique_type) {
+      unique_type = HWLOC_OBJ_TYPE_NONE;
+      break;
+    }
+  if (unique_type == HWLOC_OBJ_TYPE_NONE)
+    distances->kind |= HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES;
+  else
+    distances->kind &= ~HWLOC_DISTANCES_KIND_HETEROGENEOUS_TYPES;
+
+  return 0;
+}
+
+static int
+hwloc__distances_transform_links(struct hwloc_distances_s *distances)
+{
+  /* FIXME: we should look for the greatest common divisor
+   * but we just use the smallest positive value, that's enough for current use-cases.
+   * We'll return -1 in other cases.
+   */
+  hwloc_uint64_t divider, *values = distances->values;
+  unsigned i, nbobjs = distances->nbobjs;
+
+  if (!(distances->kind & HWLOC_DISTANCES_KIND_MEANS_BANDWIDTH)) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  for(i=0; i<nbobjs; i++)
+    values[i*nbobjs+i] = 0;
+
+  /* find the smallest positive value */
+  divider = 0;
+  for(i=0; i<nbobjs*nbobjs; i++)
+    if (values[i] && (!divider || values[i] < divider))
+      divider = values[i];
+
+  if (!divider)
+    /* only zeroes? do nothing */
+    return 0;
+
+  /* check it divides all values */
+  for(i=0; i<nbobjs*nbobjs; i++)
+    if (values[i]%divider) {
+      errno = ENOENT;
+      return -1;
+    }
+
+  /* ok, now divide for real */
+  for(i=0; i<nbobjs*nbobjs; i++)
+    values[i] /= divider;
+
+  return 0;
+}
+
+static __hwloc_inline int is_nvswitch(hwloc_obj_t obj)
+{
+  return obj && obj->subtype && !strcmp(obj->subtype, "NVSwitch");
+}
+
+static int
+hwloc__distances_transform_merge_switch_ports(hwloc_topology_t topology,
+                                              struct hwloc_distances_s *distances)
+{
+  struct hwloc_internal_distances_s *dist = hwloc__internal_distances_from_public(topology, distances);
+  hwloc_obj_t *objs = distances->objs;
+  hwloc_uint64_t *values = distances->values;
+  unsigned first, i, j, nbobjs = distances->nbobjs;
+
+  if (strcmp(dist->name, "NVLinkBandwidth")) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  /* find the first port */
+  first = (unsigned) -1;
+  for(i=0; i<nbobjs; i++)
+    if (is_nvswitch(objs[i])) {
+      first = i;
+      break;
+    }
+  if (first == (unsigned)-1) {
+    errno = ENOENT;
+    return -1;
+  }
+
+  for(j=i+1; j<nbobjs; j++) {
+    if (is_nvswitch(objs[j])) {
+      /* another port, merge it */
+      unsigned k;
+      for(k=0; k<nbobjs; k++) {
+        if (k==i || k==j)
+          continue;
+        values[k*nbobjs+i] += values[k*nbobjs+j];
+        values[k*nbobjs+j] = 0;
+        values[i*nbobjs+k] += values[j*nbobjs+k];
+        values[j*nbobjs+k] = 0;
+      }
+      values[i*nbobjs+i] += values[j*nbobjs+j];
+      values[j*nbobjs+j] = 0;
+    }
+    /* the caller will also call REMOVE_NULL to remove other ports */
+    objs[j] = NULL;
+  }
+
+  return 0;
+}
+
+static int
+hwloc__distances_transform_transitive_closure(hwloc_topology_t topology,
+                                              struct hwloc_distances_s *distances)
+{
+  struct hwloc_internal_distances_s *dist = hwloc__internal_distances_from_public(topology, distances);
+  hwloc_obj_t *objs = distances->objs;
+  hwloc_uint64_t *values = distances->values;
+  unsigned nbobjs = distances->nbobjs;
+  unsigned i, j, k;
+
+  if (strcmp(dist->name, "NVLinkBandwidth")) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  for(i=0; i<nbobjs; i++) {
+    hwloc_uint64_t bw_i2sw = 0;
+    if (is_nvswitch(objs[i]))
+      continue;
+    /* count our BW to the switch */
+    for(k=0; k<nbobjs; k++)
+      if (is_nvswitch(objs[k]))
+        bw_i2sw += values[i*nbobjs+k];
+
+    for(j=0; j<nbobjs; j++) {
+      hwloc_uint64_t bw_sw2j = 0;
+      if (i == j || is_nvswitch(objs[j]))
+        continue;
+      /* count our BW from the switch */
+      for(k=0; k<nbobjs; k++)
+        if (is_nvswitch(objs[k]))
+          bw_sw2j += values[k*nbobjs+j];
+
+      /* bandwidth from i to j is now min(i2sw,sw2j) */
+      values[i*nbobjs+j] = bw_i2sw > bw_sw2j ? bw_sw2j : bw_i2sw;
+    }
+  }
+
+  return 0;
+}
+
+int
+hwloc_distances_transform(hwloc_topology_t topology,
+                          struct hwloc_distances_s *distances,
+                          enum hwloc_distances_transform_e transform,
+                          void *transform_attr,
+                          unsigned long flags)
+{
+  if (flags || transform_attr) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  switch (transform) {
+  case HWLOC_DISTANCES_TRANSFORM_REMOVE_NULL:
+    return hwloc__distances_transform_remove_null(distances);
+  case HWLOC_DISTANCES_TRANSFORM_LINKS:
+    return hwloc__distances_transform_links(distances);
+  case HWLOC_DISTANCES_TRANSFORM_MERGE_SWITCH_PORTS:
+  {
+    int err;
+    err = hwloc__distances_transform_merge_switch_ports(topology, distances);
+    if (!err)
+      err = hwloc__distances_transform_remove_null(distances);
+    return err;
+  }
+  case HWLOC_DISTANCES_TRANSFORM_TRANSITIVE_CLOSURE:
+    return hwloc__distances_transform_transitive_closure(topology, distances);
+  default:
+    errno = EINVAL;
+    return -1;
+  }
+}
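
The `HWLOC_DISTANCES_TRANSFORM_LINKS` case above can be illustrated outside of hwloc. The following is a minimal standalone sketch, not the hwloc implementation itself: `links_normalize` is a hypothetical helper name, and it operates on a bare row-major `n*n` array instead of a `struct hwloc_distances_s`. It follows the same steps as the diff: clear the diagonal, pick the smallest positive value as divider, fail with `ENOENT` if that divider does not divide every value, otherwise divide in place.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper (not part of hwloc): turn a bandwidth matrix into a
 * "number of links" matrix, using the smallest positive value as the unit.
 * values is a row-major n*n array and is modified in place.
 * Returns 0 on success, -1 with errno set to ENOENT if some value is not
 * a multiple of the smallest positive value.
 */
static int links_normalize(uint64_t *values, unsigned n)
{
  uint64_t divider = 0;
  unsigned i;

  /* the diagonal is cleared first, even if the function fails later */
  for (i = 0; i < n; i++)
    values[i * n + i] = 0;

  /* find the smallest positive value */
  for (i = 0; i < n * n; i++)
    if (values[i] && (!divider || values[i] < divider))
      divider = values[i];

  if (!divider)
    return 0; /* only zeroes, nothing to do */

  /* check that it divides all values */
  for (i = 0; i < n * n; i++)
    if (values[i] % divider) {
      errno = ENOENT;
      return -1;
    }

  /* now divide for real */
  for (i = 0; i < n * n; i++)
    values[i] /= divider;

  return 0;
}
```

As the FIXME in the diff notes, the smallest positive value is only a stand-in for the greatest common divisor: a matrix whose off-diagonal values are 20 and 30 is rejected, although dividing by 10 would have worked.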
--- a/src/3rdparty/hwloc/src/memattrs.c
+++ b/src/3rdparty/hwloc/src/memattrs.c
@ -1,11 +1,12 @@
 /*
- * Copyright © 2020 Inria.  All rights reserved.
+ * Copyright © 2020-2022 Inria.  All rights reserved.
 * See COPYING in top-level directory.
 */

 #include "private/autogen/config.h"
 #include "hwloc.h"
 #include "private/private.h"
+#include "private/debug.h"


 /*****************************
@ -49,36 +50,51 @@ hwloc__setup_memattr(struct hwloc_internal_memattr_s *imattr,
 void
 hwloc_internal_memattrs_prepare(struct hwloc_topology *topology)
 {
-#define NR_DEFAULT_MEMATTRS 4
-  topology->memattrs = malloc(NR_DEFAULT_MEMATTRS * sizeof(*topology->memattrs));
+  topology->memattrs = malloc(HWLOC_MEMATTR_ID_MAX * sizeof(*topology->memattrs));
  if (!topology->memattrs)
    return;

-  assert(HWLOC_MEMATTR_ID_CAPACITY < NR_DEFAULT_MEMATTRS);
  hwloc__setup_memattr(&topology->memattrs[HWLOC_MEMATTR_ID_CAPACITY],
                       (char *) "Capacity",
                       HWLOC_MEMATTR_FLAG_HIGHER_FIRST,
                       HWLOC_IMATTR_FLAG_STATIC_NAME|HWLOC_IMATTR_FLAG_CONVENIENCE);

-  assert(HWLOC_MEMATTR_ID_LOCALITY < NR_DEFAULT_MEMATTRS);
  hwloc__setup_memattr(&topology->memattrs[HWLOC_MEMATTR_ID_LOCALITY],
                       (char *) "Locality",
                       HWLOC_MEMATTR_FLAG_LOWER_FIRST,
                       HWLOC_IMATTR_FLAG_STATIC_NAME|HWLOC_IMATTR_FLAG_CONVENIENCE);

-  assert(HWLOC_MEMATTR_ID_BANDWIDTH < NR_DEFAULT_MEMATTRS);
  hwloc__setup_memattr(&topology->memattrs[HWLOC_MEMATTR_ID_BANDWIDTH],
                       (char *) "Bandwidth",
                       HWLOC_MEMATTR_FLAG_HIGHER_FIRST|HWLOC_MEMATTR_FLAG_NEED_INITIATOR,
                       HWLOC_IMATTR_FLAG_STATIC_NAME);

-  assert(HWLOC_MEMATTR_ID_LATENCY < NR_DEFAULT_MEMATTRS);
+  hwloc__setup_memattr(&topology->memattrs[HWLOC_MEMATTR_ID_READ_BANDWIDTH],
+                       (char *) "ReadBandwidth",
+                       HWLOC_MEMATTR_FLAG_HIGHER_FIRST|HWLOC_MEMATTR_FLAG_NEED_INITIATOR,
+                       HWLOC_IMATTR_FLAG_STATIC_NAME);
+
+  hwloc__setup_memattr(&topology->memattrs[HWLOC_MEMATTR_ID_WRITE_BANDWIDTH],
+                       (char *) "WriteBandwidth",
+                       HWLOC_MEMATTR_FLAG_HIGHER_FIRST|HWLOC_MEMATTR_FLAG_NEED_INITIATOR,
+                       HWLOC_IMATTR_FLAG_STATIC_NAME);
+
  hwloc__setup_memattr(&topology->memattrs[HWLOC_MEMATTR_ID_LATENCY],
                       (char *) "Latency",
                       HWLOC_MEMATTR_FLAG_LOWER_FIRST|HWLOC_MEMATTR_FLAG_NEED_INITIATOR,
                       HWLOC_IMATTR_FLAG_STATIC_NAME);

-  topology->nr_memattrs = NR_DEFAULT_MEMATTRS;
+  hwloc__setup_memattr(&topology->memattrs[HWLOC_MEMATTR_ID_READ_LATENCY],
+                       (char *) "ReadLatency",
+                       HWLOC_MEMATTR_FLAG_LOWER_FIRST|HWLOC_MEMATTR_FLAG_NEED_INITIATOR,
+                       HWLOC_IMATTR_FLAG_STATIC_NAME);
+
+  hwloc__setup_memattr(&topology->memattrs[HWLOC_MEMATTR_ID_WRITE_LATENCY],
+                       (char *) "WriteLatency",
+                       HWLOC_MEMATTR_FLAG_LOWER_FIRST|HWLOC_MEMATTR_FLAG_NEED_INITIATOR,
+                       HWLOC_IMATTR_FLAG_STATIC_NAME);
+
+  topology->nr_memattrs = HWLOC_MEMATTR_ID_MAX;
 }

 static void
@ -127,6 +143,8 @@ hwloc_internal_memattrs_dup(struct hwloc_topology *new, struct hwloc_topology *o
  struct hwloc_internal_memattr_s *imattrs;
  hwloc_memattr_id_t id;

+  /* old->nr_memattrs is always > 0 thanks to default memattrs */
+
  imattrs = hwloc_tma_malloc(tma, old->nr_memattrs * sizeof(*imattrs));
  if (!imattrs)
    return -1;
@ -1195,3 +1213,214 @@ hwloc_get_local_numanode_objs(hwloc_topology_t topology,
  *nrp = i;
  return 0;
 }
+
+
+/**************************************
+ * Using memattrs to identify HBM/DRAM
+ */
+
+struct hwloc_memory_tier_s {
+  hwloc_obj_t node;
+  uint64_t local_bw;
+  enum hwloc_memory_tier_type_e {
+    /* warning: the order is important for guess_memory_tiers() after qsort() */
+    HWLOC_MEMORY_TIER_UNKNOWN,
+    HWLOC_MEMORY_TIER_DRAM,
+    HWLOC_MEMORY_TIER_HBM,
+    HWLOC_MEMORY_TIER_SPM, /* Specific-Purpose Memory is usually HBM, we'll use BW to confirm */
+    HWLOC_MEMORY_TIER_NVM,
+    HWLOC_MEMORY_TIER_GPU,
+  } type;
+};
+
+static int compare_tiers(const void *_a, const void *_b)
+{
+  const struct hwloc_memory_tier_s *a = _a, *b = _b;
+  /* sort by type of tier first */
+  if (a->type != b->type)
+    return a->type - b->type;
+  /* then by bandwidth */
+  if (a->local_bw > b->local_bw)
+    return -1;
+  else if (a->local_bw < b->local_bw)
+    return 1;
+  return 0;
+}
+
+int
+hwloc_internal_memattrs_guess_memory_tiers(hwloc_topology_t topology)
+{
+  struct hwloc_internal_memattr_s *imattr;
+  struct hwloc_memory_tier_s *tiers;
+  unsigned i, j, n;
+  const char *env;
+  int spm_is_hbm = -1; /* -1 will guess from BW, 0 no, 1 forced */
+  int mark_dram = 1;
+  unsigned first_spm, first_nvm;
+  hwloc_uint64_t max_unknown_bw, min_spm_bw;
+
+  env = getenv("HWLOC_MEMTIERS_GUESS");
+  if (env) {
+    if (!strcmp(env, "none")) {
+      return 0;
+    } else if (!strcmp(env, "default")) {
+      /* nothing */
+    } else if (!strcmp(env, "spm_is_hbm")) {
+      hwloc_debug("Assuming SPM-tier is HBM, ignore bandwidth\n");
+      spm_is_hbm = 1;
+    } else if (HWLOC_SHOW_CRITICAL_ERRORS()) {
+      fprintf(stderr, "hwloc: Failed to recognize HWLOC_MEMTIERS_GUESS value %s\n", env);
+    }
+  }
+
+  imattr = &topology->memattrs[HWLOC_MEMATTR_ID_BANDWIDTH];
+
+  if (!(imattr->iflags & HWLOC_IMATTR_FLAG_CACHE_VALID))
+    hwloc__imattr_refresh(topology, imattr);
+
+  n = hwloc_get_nbobjs_by_depth(topology, HWLOC_TYPE_DEPTH_NUMANODE);
+  assert(n);
+
+  tiers = malloc(n * sizeof(*tiers));
+  if (!tiers)
+    return -1;
+
+  for(i=0; i<n; i++) {
+    hwloc_obj_t node;
+    const char *daxtype;
+    struct hwloc_internal_location_s iloc;
+    struct hwloc_internal_memattr_target_s *imtg = NULL;
+    struct hwloc_internal_memattr_initiator_s *imi;
+
+    node = hwloc_get_obj_by_depth(topology, HWLOC_TYPE_DEPTH_NUMANODE, i);
+    assert(node);
+    tiers[i].node = node;
+
+    /* defaults */
+    tiers[i].type = HWLOC_MEMORY_TIER_UNKNOWN;
+    tiers[i].local_bw = 0; /* unknown */
+
+    daxtype = hwloc_obj_get_info_by_name(node, "DAXType");
+    /* mark NVM, SPM and GPU nodes */
+    if (daxtype && !strcmp(daxtype, "NVM"))
+      tiers[i].type = HWLOC_MEMORY_TIER_NVM;
+    if (daxtype && !strcmp(daxtype, "SPM"))
+      tiers[i].type = HWLOC_MEMORY_TIER_SPM;
+    if (node->subtype && !strcmp(node->subtype, "GPUMemory"))
+      tiers[i].type = HWLOC_MEMORY_TIER_GPU;
+
+    if (spm_is_hbm == -1) {
+      for(j=0; j<imattr->nr_targets; j++)
+        if (imattr->targets[j].obj == node) {
+          imtg = &imattr->targets[j];
+          break;
+        }
+      if (imtg && !hwloc_bitmap_iszero(node->cpuset)) {
+        iloc.type = HWLOC_LOCATION_TYPE_CPUSET;
+        iloc.location.cpuset = node->cpuset;
+        imi = hwloc__memattr_target_get_initiator(imtg, &iloc, 0);
+        if (imi)
+          tiers[i].local_bw = imi->value;
+      }
+    }
+  }
+
+  /* sort tiers */
+  qsort(tiers, n, sizeof(*tiers), compare_tiers);
+  hwloc_debug("Sorting memory tiers...\n");
+  for(i=0; i<n; i++)
+    hwloc_debug("  tier %u = node L#%u P#%u with tier type %d and local BW #%llu\n",
+                i,
+                tiers[i].node->logical_index, tiers[i].node->os_index,
+                tiers[i].type, (unsigned long long) tiers[i].local_bw);
+
+  /* now we have UNKNOWN tiers (sorted by BW), then SPM tiers (sorted by BW), then NVM, then GPU */
+
+  /* iterate over UNKNOWN tiers, and find their BW */
+  for(i=0; i<n; i++) {
+    if (tiers[i].type > HWLOC_MEMORY_TIER_UNKNOWN)
+      break;
+  }
+  first_spm = i;
+  /* get max BW from first */
+  if (first_spm > 0)
+    max_unknown_bw = tiers[0].local_bw;
+  else
+    max_unknown_bw = 0;
+
+  /* there are no DRAM or HBM tiers yet */
+
+  /* iterate over SPM tiers, and find their BW */
+  for(i=first_spm; i<n; i++) {
+    if (tiers[i].type > HWLOC_MEMORY_TIER_SPM)
+      break;
+  }
+  first_nvm = i;
+  /* get min BW from last */
+  if (first_nvm > first_spm)
+    min_spm_bw = tiers[first_nvm-1].local_bw;
+  else
+    min_spm_bw = 0;
+
+  /* FIXME: if there's more than 10% between some sets of nodes inside a tier, split it? */
+  /* FIXME: if there are cpuset-intersecting nodes in same tier, abort? */
+
+  if (spm_is_hbm == -1) {
+    /* if we have BW for all SPM and UNKNOWN tiers,
+     * and every SPM BW is more than 2x every UNKNOWN BW,
+     * assume SPM means HBM
+     */
+    hwloc_debug("UNKNOWN-memory-tier max bandwidth %llu\n", (unsigned long long) max_unknown_bw);
+    hwloc_debug("SPM-memory-tier min bandwidth %llu\n", (unsigned long long) min_spm_bw);
+    if (max_unknown_bw > 0 && min_spm_bw > 0 && max_unknown_bw*2 < min_spm_bw) {
+      hwloc_debug("assuming SPM means HBM and !SPM means DRAM since bandwidths are very different\n");
+      spm_is_hbm = 1;
+    } else {
+      hwloc_debug("cannot assume SPM means HBM\n");
+      spm_is_hbm = 0;
+    }
+  }
+
+  if (spm_is_hbm) {
+    for(i=0; i<first_spm; i++)
+      tiers[i].type = HWLOC_MEMORY_TIER_DRAM;
+    for(i=first_spm; i<first_nvm; i++)
+      tiers[i].type = HWLOC_MEMORY_TIER_HBM;
+  }
+
+  if (first_spm == n)
+    mark_dram = 0;
+
+  /* now apply subtypes */
+  for(i=0; i<n; i++) {
+    const char *type = NULL;
+    if (tiers[i].node->subtype) /* don't overwrite the existing subtype */
+      continue;
+    switch (tiers[i].type) {
+    case HWLOC_MEMORY_TIER_DRAM:
+      if (mark_dram)
+        type = "DRAM";
+      break;
+    case HWLOC_MEMORY_TIER_HBM:
+      type = "HBM";
+      break;
+    case HWLOC_MEMORY_TIER_SPM:
+      type = "SPM";
+      break;
+    case HWLOC_MEMORY_TIER_NVM:
+      type = "NVM";
+      break;
+    default:
+      /* GPU memory is already marked with subtype="GPUMemory",
+       * UNKNOWN doesn't deserve any subtype
+       */
+      break;
+    }
+    if (type) {
+      hwloc_debug("Marking node L#%u P#%u as %s\n", tiers[i].node->logical_index, tiers[i].node->os_index, type);
+      tiers[i].node->subtype = strdup(type);
+    }
+  }
+
+  free(tiers);
+  return 0;
+}
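
The tier-guessing heuristic added above (sort by tier type, then by descending bandwidth, then compare the best UNKNOWN bandwidth against the worst SPM bandwidth) can be sketched without any hwloc types. This is an illustration under stated assumptions rather than the hwloc code: `struct tier`, the `TIER_*` names and `guess_spm_is_hbm` are hypothetical, bandwidths are plain integers where 0 means unknown, and the input is assumed to contain only UNKNOWN, SPM, NVM and GPU entries, as in the diff at that stage.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the hwloc tier types; the enum order matters
 * for the sort, exactly as the comment in the diff warns.
 */
enum tier_type { TIER_UNKNOWN, TIER_DRAM, TIER_HBM, TIER_SPM, TIER_NVM, TIER_GPU };

struct tier {
  enum tier_type type;
  uint64_t local_bw; /* 0 means unknown */
};

/* sort by tier type first, then by decreasing bandwidth */
static int compare_tiers(const void *_a, const void *_b)
{
  const struct tier *a = _a, *b = _b;
  if (a->type != b->type)
    return (int) a->type - (int) b->type;
  if (a->local_bw > b->local_bw)
    return -1;
  if (a->local_bw < b->local_bw)
    return 1;
  return 0;
}

/* Returns 1 and relabels UNKNOWN as DRAM and SPM as HBM when every SPM
 * bandwidth is more than twice every UNKNOWN bandwidth; returns 0 and
 * leaves the types alone otherwise. The array is sorted in both cases.
 */
static int guess_spm_is_hbm(struct tier *tiers, unsigned n)
{
  unsigned i, first_spm, first_nvm;
  uint64_t max_unknown_bw, min_spm_bw;

  qsort(tiers, n, sizeof(*tiers), compare_tiers);

  /* UNKNOWN tiers come first, highest bandwidth at index 0 */
  i = 0;
  while (i < n && tiers[i].type <= TIER_UNKNOWN)
    i++;
  first_spm = i;
  max_unknown_bw = first_spm > 0 ? tiers[0].local_bw : 0;

  /* SPM tiers follow, lowest bandwidth just before the first NVM */
  while (i < n && tiers[i].type <= TIER_SPM)
    i++;
  first_nvm = i;
  min_spm_bw = first_nvm > first_spm ? tiers[first_nvm - 1].local_bw : 0;

  if (!(max_unknown_bw > 0 && min_spm_bw > 0 && max_unknown_bw * 2 < min_spm_bw))
    return 0;

  for (i = 0; i < first_spm; i++)
    tiers[i].type = TIER_DRAM;
  for (i = first_spm; i < first_nvm; i++)
    tiers[i].type = TIER_HBM;
  return 1;
}
```

With two UNKNOWN nodes at 100 and 90, one SPM node at 400 and one NVM node, the sketch relabels the first two as DRAM and the SPM node as HBM. With UNKNOWN at 100 and SPM at 150 the gap is too small and nothing is relabeled, which corresponds to the "cannot assume SPM means HBM" branch in the diff.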
--- a/src/3rdparty/hwloc/src/pci-common.c
+++ b/src/3rdparty/hwloc/src/pci-common.c
@ -1,5 +1,5 @@
 /*
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2022 Inria.  All rights reserved.
 * See COPYING in top-level directory.
 */

@ -119,6 +119,13 @@ hwloc_pci_discovery_init(struct hwloc_topology *topology)
  topology->pci_forced_locality = NULL;

  topology->first_pci_locality = topology->last_pci_locality = NULL;
+
+#define HWLOC_PCI_LOCALITY_QUIRK_CRAY_EX235A (1ULL<<0)
+#define HWLOC_PCI_LOCALITY_QUIRK_FAKE (1ULL<<62)
+  topology->pci_locality_quirks = (uint64_t) -1;
+  /* -1 is unknown, 0 is disabled, >0 is bitmask of enabled quirks.
+   * bit 63 should remain unused so that -1 is inaccessible as a bitmask.
+   */
 }

 void
@ -146,8 +153,9 @@ hwloc_pci_discovery_prepare(struct hwloc_topology *topology)
 	  }
 	  free(buffer);
 	} else {
-	  fprintf(stderr, "Ignoring HWLOC_PCI_LOCALITY file `%s' too large (%lu bytes)\n",
-		  env, (unsigned long) st.st_size);
+          if (HWLOC_SHOW_CRITICAL_ERRORS())
+            fprintf(stderr, "hwloc/pci: Ignoring HWLOC_PCI_LOCALITY file `%s' too large (%lu bytes)\n",
+                    env, (unsigned long) st.st_size);
 	}
      }
      close(fd);
@ -206,8 +214,11 @@ hwloc_pci_traverse_print_cb(void * cbdata __hwloc_attribute_unused,
    else
      hwloc_debug("%s Bridge [%04x:%04x]", busid,
 		  pcidev->attr->pcidev.vendor_id, pcidev->attr->pcidev.device_id);
-    hwloc_debug(" to %04x:[%02x:%02x]\n",
-		pcidev->attr->bridge.downstream.pci.domain, pcidev->attr->bridge.downstream.pci.secondary_bus, pcidev->attr->bridge.downstream.pci.subordinate_bus);
+    if (pcidev->attr->bridge.downstream_type == HWLOC_OBJ_BRIDGE_PCI)
+      hwloc_debug(" to %04x:[%02x:%02x]\n",
+                  pcidev->attr->bridge.downstream.pci.domain, pcidev->attr->bridge.downstream.pci.secondary_bus, pcidev->attr->bridge.downstream.pci.subordinate_bus);
+    else
+      assert(0);
  } else
    hwloc_debug("%s Device [%04x:%04x (%04x:%04x) rev=%02x class=%04x]\n", busid,
 		pcidev->attr->pcidev.vendor_id, pcidev->attr->pcidev.device_id,
@ -251,11 +262,11 @@ hwloc_pci_compare_busids(struct hwloc_obj *a, struct hwloc_obj *b)
  if (a->attr->pcidev.domain > b->attr->pcidev.domain)
    return HWLOC_PCI_BUSID_HIGHER;

-  if (a->type == HWLOC_OBJ_BRIDGE
+  if (a->type == HWLOC_OBJ_BRIDGE && a->attr->bridge.downstream_type == HWLOC_OBJ_BRIDGE_PCI
      && b->attr->pcidev.bus >= a->attr->bridge.downstream.pci.secondary_bus
      && b->attr->pcidev.bus <= a->attr->bridge.downstream.pci.subordinate_bus)
    return HWLOC_PCI_BUSID_SUPERSET;
-  if (b->type == HWLOC_OBJ_BRIDGE
+  if (b->type == HWLOC_OBJ_BRIDGE && b->attr->bridge.downstream_type == HWLOC_OBJ_BRIDGE_PCI
      && a->attr->pcidev.bus >= b->attr->bridge.downstream.pci.secondary_bus
      && a->attr->pcidev.bus <= b->attr->bridge.downstream.pci.subordinate_bus)
    return HWLOC_PCI_BUSID_INCLUDED;
@ -302,7 +313,7 @@ hwloc_pci_add_object(struct hwloc_obj *parent, struct hwloc_obj **parent_io_firs
      new->next_sibling = *curp;
      *curp = new;
      new->parent = parent;
-      if (new->type == HWLOC_OBJ_BRIDGE) {
+      if (new->type == HWLOC_OBJ_BRIDGE && new->attr->bridge.downstream_type == HWLOC_OBJ_BRIDGE_PCI) {
 	/* look at remaining siblings and move some below new */
 	childp = &new->io_first_child;
 	curp = &new->next_sibling;
@ -329,7 +340,7 @@ hwloc_pci_add_object(struct hwloc_obj *parent, struct hwloc_obj **parent_io_firs
    }
    case HWLOC_PCI_BUSID_EQUAL: {
      static int reported = 0;
-      if (!reported && !hwloc_hide_errors()) {
+      if (!reported && HWLOC_SHOW_CRITICAL_ERRORS()) {
        fprintf(stderr, "*********************************************************\n");
        fprintf(stderr, "* hwloc %s received invalid PCI information.\n", HWLOC_VERSION);
        fprintf(stderr, "*\n");
@ -411,7 +422,7 @@ hwloc_pcidisc_add_hostbridges(struct hwloc_topology *topology,
    dstnextp = &child->next_sibling;

    /* compute hostbridge secondary/subordinate buses */
-    if (child->type == HWLOC_OBJ_BRIDGE
+    if (child->type == HWLOC_OBJ_BRIDGE && child->attr->bridge.downstream_type == HWLOC_OBJ_BRIDGE_PCI
 	&& child->attr->bridge.downstream.pci.subordinate_bus > current_subordinate)
      current_subordinate = child->attr->bridge.downstream.pci.subordinate_bus;

@ -438,13 +449,90 @@ hwloc_pcidisc_add_hostbridges(struct hwloc_topology *topology,
  return new;
 }

-static struct hwloc_obj *
-hwloc_pci_fixup_busid_parent(struct hwloc_topology *topology __hwloc_attribute_unused,
-			     struct hwloc_pcidev_attr_s *busid __hwloc_attribute_unused,
-			     struct hwloc_obj *parent __hwloc_attribute_unused)
+/* return 1 if a quirk was applied */
+static int
+hwloc__pci_find_busid_parent_quirk(struct hwloc_topology *topology,
+                                   struct hwloc_pcidev_attr_s *busid,
+                                   hwloc_cpuset_t cpuset)
 {
-  /* no quirk for now */
-  return parent;
+  if (topology->pci_locality_quirks == (uint64_t)-1 /* unknown */) {
+    const char *dmi_board_name, *env;
+
+    /* first invocation, detect which quirks are needed */
+    topology->pci_locality_quirks = 0; /* no quirk yet */
+
+    dmi_board_name = hwloc_obj_get_info_by_name(hwloc_get_root_obj(topology), "DMIBoardName");
+    if (dmi_board_name && !strcmp(dmi_board_name, "HPE CRAY EX235A")) {
+      hwloc_debug("enabling PCI locality quirk for HPE Cray EX235A\n");
+      topology->pci_locality_quirks |= HWLOC_PCI_LOCALITY_QUIRK_CRAY_EX235A;
+    }
+
+    env = getenv("HWLOC_PCI_LOCALITY_QUIRK_FAKE");
+    if (env && atoi(env)) {
+      hwloc_debug("enabling fake PCI locality quirk (attaching everything to last PU)\n");
+      topology->pci_locality_quirks |= HWLOC_PCI_LOCALITY_QUIRK_FAKE;
+    }
+  }
+
+  if (topology->pci_locality_quirks & HWLOC_PCI_LOCALITY_QUIRK_FAKE) {
+    unsigned last = hwloc_bitmap_last(hwloc_topology_get_topology_cpuset(topology));
+    hwloc_bitmap_set(cpuset, last);
+    return 1;
+  }
+
+  if (topology->pci_locality_quirks & HWLOC_PCI_LOCALITY_QUIRK_CRAY_EX235A) {
+    /* AMD Trento has xGMI ports connected to individual CCDs (8 cores + L3)
+     * instead of NUMA nodes (pairs of CCDs within Trento) as is usual in AMD EPYC CPUs.
+     * This is not described by the ACPI tables, hence we need to manually hardwire
+     * the xGMI locality for the single server that currently uses that CPU.
+     * It's not clear if ACPI tables can/will ever be fixed (would require one initiator
+     * proximity domain per CCD), or if Linux can/will work around the issue.
+     */
+    if (busid->domain == 0) {
+      if (busid->bus >= 0xd0 && busid->bus <= 0xd1) {
+        hwloc_bitmap_set_range(cpuset, 0, 7);
+        hwloc_bitmap_set_range(cpuset, 64, 71);
+        return 1;
+      }
+      if (busid->bus >= 0xd4 && busid->bus <= 0xd6) {
+        hwloc_bitmap_set_range(cpuset, 8, 15);
+        hwloc_bitmap_set_range(cpuset, 72, 79);
+        return 1;
+      }
+      if (busid->bus >= 0xc8 && busid->bus <= 0xc9) {
+        hwloc_bitmap_set_range(cpuset, 16, 23);
+        hwloc_bitmap_set_range(cpuset, 80, 87);
+        return 1;
+      }
+      if (busid->bus >= 0xcc && busid->bus <= 0xce) {
+        hwloc_bitmap_set_range(cpuset, 24, 31);
+        hwloc_bitmap_set_range(cpuset, 88, 95);
+        return 1;
+      }
+      if (busid->bus >= 0xd8 && busid->bus <= 0xd9) {
+        hwloc_bitmap_set_range(cpuset, 32, 39);
+        hwloc_bitmap_set_range(cpuset, 96, 103);
+        return 1;
+      }
+      if (busid->bus >= 0xdc && busid->bus <= 0xde) {
+        hwloc_bitmap_set_range(cpuset, 40, 47);
+        hwloc_bitmap_set_range(cpuset, 104, 111);
+        return 1;
+      }
+      if (busid->bus >= 0xc0 && busid->bus <= 0xc1) {
+        hwloc_bitmap_set_range(cpuset, 48, 55);
+        hwloc_bitmap_set_range(cpuset, 112, 119);
+        return 1;
+      }
+      if (busid->bus >= 0xc4 && busid->bus <= 0xc6) {
+        hwloc_bitmap_set_range(cpuset, 56, 63);
+        hwloc_bitmap_set_range(cpuset, 120, 127);
+        return 1;
+      }
+    }
+  }
+
+  return 0;
 }

 static struct hwloc_obj *
@ -453,7 +541,7 @@ hwloc__pci_find_busid_parent(struct hwloc_topology *topology, struct hwloc_pcide
  hwloc_bitmap_t cpuset = hwloc_bitmap_alloc();
  hwloc_obj_t parent;
  int forced = 0;
-  int noquirks = 0;
+  int noquirks = 0, got_quirked = 0;
  unsigned i;
  int err;

@ -486,7 +574,8 @@ hwloc__pci_find_busid_parent(struct hwloc_topology *topology, struct hwloc_pcide
    if (env) {
      static int reported = 0;
      if (!topology->pci_has_forced_locality && !reported) {
-	fprintf(stderr, "Environment variable %s is deprecated, please use HWLOC_PCI_LOCALITY instead.\n", env);
+        if (HWLOC_SHOW_ALL_ERRORS())
+          fprintf(stderr, "hwloc/pci: Environment variable %s is deprecated, please use HWLOC_PCI_LOCALITY instead.\n", env);
 	reported = 1;
      }
      if (*env) {
@ -500,7 +589,13 @@ hwloc__pci_find_busid_parent(struct hwloc_topology *topology, struct hwloc_pcide
    }
  }

-  if (!forced) {
+  if (!forced && !noquirks && topology->pci_locality_quirks /* either quirks are unknown yet, or some are enabled */) {
+    err = hwloc__pci_find_busid_parent_quirk(topology, busid, cpuset);
+    if (err > 0)
+      got_quirked = 1;
+  }
+
+  if (!forced && !got_quirked) {
    /* get the cpuset by asking the backend that provides the relevant hook, if any. */
    struct hwloc_backend *backend = topology->get_pci_busid_cpuset_backend;
    if (backend)
@ -515,11 +610,7 @@ hwloc__pci_find_busid_parent(struct hwloc_topology *topology, struct hwloc_pcide
  hwloc_debug_bitmap("  will attach PCI bus to cpuset %s\n", cpuset);

  parent = hwloc_find_insert_io_parent_by_complete_cpuset(topology, cpuset);
-  if (parent) {
-    if (!noquirks)
-      /* We found a valid parent. Check that the OS didn't report invalid locality */
-      parent = hwloc_pci_fixup_busid_parent(topology, busid, parent);
-  } else {
+  if (!parent) {
    /* Fallback to root */
    parent = hwloc_get_root_obj(topology);
  }
@ -565,7 +656,7 @@ hwloc_pcidisc_tree_attach(struct hwloc_topology *topology, struct hwloc_obj *tre
    assert(pciobj->type == HWLOC_OBJ_PCI_DEVICE
 	   || (pciobj->type == HWLOC_OBJ_BRIDGE && pciobj->attr->bridge.upstream_type == HWLOC_OBJ_BRIDGE_PCI));

-    if (obj->type == HWLOC_OBJ_BRIDGE) {
+    if (obj->type == HWLOC_OBJ_BRIDGE && obj->attr->bridge.downstream_type == HWLOC_OBJ_BRIDGE_PCI) {
      domain = obj->attr->bridge.downstream.pci.domain;
      bus_min = obj->attr->bridge.downstream.pci.secondary_bus;
      bus_max = obj->attr->bridge.downstream.pci.subordinate_bus;
@ -800,18 +891,28 @@ hwloc_pcidisc_find_linkspeed(const unsigned char *config,
  memcpy(&linksta, &config[offset + HWLOC_PCI_EXP_LNKSTA], 4);
  speed = linksta & HWLOC_PCI_EXP_LNKSTA_SPEED; /* PCIe generation */
  width = (linksta & HWLOC_PCI_EXP_LNKSTA_WIDTH) >> 4; /* how many lanes */
-  /* PCIe Gen1 = 2.5GT/s signal-rate per lane with 8/10 encoding    = 0.25GB/s data-rate per lane
-   * PCIe Gen2 = 5  GT/s signal-rate per lane with 8/10 encoding    = 0.5 GB/s data-rate per lane
-   * PCIe Gen3 = 8  GT/s signal-rate per lane with 128/130 encoding = 1   GB/s data-rate per lane
-   * PCIe Gen4 = 16 GT/s signal-rate per lane with 128/130 encoding = 2   GB/s data-rate per lane
-   * PCIe Gen5 = 32 GT/s signal-rate per lane with 128/130 encoding = 4   GB/s data-rate per lane
+  /*
+   * These are single-direction bandwidths only.
+   *
+   * Gen1 used NRZ with 8/10 encoding.
+   * PCIe Gen1 = 2.5GT/s signal-rate per lane x 8/10    =  0.25GB/s data-rate per lane
+   * PCIe Gen2 = 5  GT/s signal-rate per lane x 8/10    =  0.5 GB/s data-rate per lane
+   * Gen3 switched to NRZ with 128/130 encoding.
+   * PCIe Gen3 = 8  GT/s signal-rate per lane x 128/130 =  1   GB/s data-rate per lane
+   * PCIe Gen4 = 16 GT/s signal-rate per lane x 128/130 =  2   GB/s data-rate per lane
+   * PCIe Gen5 = 32 GT/s signal-rate per lane x 128/130 =  4   GB/s data-rate per lane
+   * Gen6 switched to PAM with 242/256 FLIT (242B payload protected by 8B CRC + 6B FEC).
+   * PCIe Gen6 = 64 GT/s signal-rate per lane x 242/256 =  8   GB/s data-rate per lane
+   * PCIe Gen7 = 128GT/s signal-rate per lane x 242/256 = 16   GB/s data-rate per lane
   */

  /* lanespeed in Gbit/s */
  if (speed <= 2)
    lanespeed = 2.5f * speed * 0.8f;
+  else if (speed <= 5)
+    lanespeed = 8.0f * (1<<(speed-3)) * 128/130;
  else
-    lanespeed = 8.0f * (1<<(speed-3)) * 128/130; /* assume Gen6 will be 64 GT/s and so on */
+    lanespeed = 8.0f * (1<<(speed-3)) * 242/256; /* assume Gen8 will be 256 GT/s and so on */

  /* linkspeed in GB/s */
  *linkspeed = lanespeed * width / 8;
@ -938,6 +1039,7 @@ hwloc_pci_class_string(unsigned short class_id)
      switch (class_id) {
 	case 0x0500: return "RAM";
 	case 0x0501: return "Flash";
+        case 0x0502: return "CXLMem";
      }
      return "Memory";
    case 0x06:
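The link-speed arithmetic changed in `hwloc_pcidisc_find_linkspeed()` above can be checked in isolation. The following is a minimal sketch, not the hwloc function itself: `pcie_lane_gbits()` and `pcie_link_gbytes()` are hypothetical helpers that take the PCIe generation and lane count directly instead of decoding them from config space, and the `assert` on the generation is an added assumption.

```c
#include <assert.h>

/* Per-lane data rate in Gbit/s for a PCIe generation number
 * (1 = Gen1, 2 = Gen2, ...), using the encodings listed in the diff. */
static float pcie_lane_gbits(unsigned gen)
{
  assert(gen >= 1);   /* assumption: callers pass a valid generation */
  if (gen <= 2)
    return 2.5f * (float) gen * 0.8f;                      /* 8/10 encoding */
  if (gen <= 5)
    return 8.0f * (float) (1u << (gen - 3)) * 128 / 130;   /* 128/130 encoding */
  return 8.0f * (float) (1u << (gen - 3)) * 242 / 256;     /* 242/256 FLIT */
}

/* Single-direction link bandwidth in GB/s: lane rate times lanes, 8 bits per byte. */
static float pcie_link_gbytes(unsigned gen, unsigned width)
{
  return pcie_lane_gbits(gen) * (float) width / 8;
}
```

For example, `pcie_link_gbytes(3, 16)` comes out a little under 16 GB/s because of the 128/130 overhead, while a Gen6 x16 link yields 121 GB/s with the 242/256 FLIT factor.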
--- a/src/3rdparty/hwloc/src/topology-synthetic.c
+++ b/src/3rdparty/hwloc/src/topology-synthetic.c
@ -1,6 +1,6 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2022 Inria.  All rights reserved.
 * Copyright © 2009-2010 Université Bordeaux
 * Copyright © 2009-2011 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@ -323,17 +323,29 @@ hwloc_synthetic_parse_memory_attr(const char *attr, const char **endp)
  hwloc_uint64_t size;
  size = strtoull(attr, (char **) &endptr, 0);
  if (!hwloc_strncasecmp(endptr, "TB", 2)) {
+    size *= 1000ULL*1000ULL*1000ULL*1000ULL;
+    endptr += 2;
+  } else if (!hwloc_strncasecmp(endptr, "TiB", 3)) {
    size <<= 40;
-    endptr += 2;
+    endptr += 3;
  } else if (!hwloc_strncasecmp(endptr, "GB", 2)) {
+    size *= 1000ULL*1000ULL*1000ULL;
+    endptr += 2;
+  } else if (!hwloc_strncasecmp(endptr, "GiB", 3)) {
    size <<= 30;
-    endptr += 2;
+    endptr += 3;
  } else if (!hwloc_strncasecmp(endptr, "MB", 2)) {
+    size *= 1000ULL*1000ULL;
+    endptr += 2;
+  } else if (!hwloc_strncasecmp(endptr, "MiB", 3)) {
    size <<= 20;
-    endptr += 2;
+    endptr += 3;
  } else if (!hwloc_strncasecmp(endptr, "kB", 2)) {
-    size <<= 10;
+    size *= 1000ULL;
    endptr += 2;
+  } else if (!hwloc_strncasecmp(endptr, "kiB", 3)) {
+    size <<= 10;
+    endptr += 3;
  }
  *endp = endptr;
  return size;
@ -802,15 +814,15 @@ hwloc_backend_synthetic_init(struct hwloc_synthetic_backend_data_s *data,
    } else if (hwloc__obj_type_is_cache(type)) {
      if (!curlevel->attr.memorysize) {
 	if (1 == curlevel->attr.depth)
-	  /* 32Kb in L1 */
+	  /* 32KiB in L1 */
 	  curlevel->attr.memorysize = 32*1024;
 	else
-	  /* *4 at each level, starting from 1MB for L2, unified */
+	  /* *4 at each level, starting from 1MiB for L2, unified */
 	  curlevel->attr.memorysize = 256ULL*1024 << (2*curlevel->attr.depth);
      }

    } else if (type == HWLOC_OBJ_NUMANODE && !curlevel->attr.memorysize) {
-      /* 1GB in memory nodes. */
+      /* 1GiB in memory nodes. */
      curlevel->attr.memorysize = 1024*1024*1024;
    }

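The suffix handling introduced in `hwloc_synthetic_parse_memory_attr()` above distinguishes decimal units (`kB`, `MB`, `GB`, `TB`) from binary ones (`kiB`, `MiB`, `GiB`, `TiB`). The table-driven sketch below shows the same behavior under stated assumptions: `parse_size()` and `has_prefix_nocase()` are hypothetical names, and the latter stands in for `hwloc_strncasecmp()`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Case-insensitive prefix test, a stand-in for hwloc_strncasecmp(). */
static int has_prefix_nocase(const char *s, const char *prefix)
{
  for (; *prefix; s++, prefix++)
    if (tolower((unsigned char) *s) != tolower((unsigned char) *prefix))
      return 0;
  return 1;
}

/* Parses a number followed by an optional unit suffix.
 * "4GB" is 4*10^9 bytes, "4GiB" is 4*2^30 bytes. */
static uint64_t parse_size(const char *attr, const char **endp)
{
  static const struct { const char *suffix; uint64_t factor; } units[] = {
    { "TB", 1000ULL * 1000 * 1000 * 1000 }, { "TiB", 1ULL << 40 },
    { "GB", 1000ULL * 1000 * 1000 },        { "GiB", 1ULL << 30 },
    { "MB", 1000ULL * 1000 },               { "MiB", 1ULL << 20 },
    { "kB", 1000ULL },                      { "kiB", 1ULL << 10 },
  };
  char *end;
  uint64_t size;
  size_t i;

  assert(attr != NULL && endp != NULL);
  size = strtoull(attr, &end, 0);
  for (i = 0; i < sizeof(units) / sizeof(units[0]); i++) {
    if (has_prefix_nocase(end, units[i].suffix)) {
      size *= units[i].factor;
      end += strlen(units[i].suffix);
      break;
    }
  }
  *endp = end;   /* first character after the number and its suffix */
  return size;
}
```

Note the behavior change this implies for existing synthetic descriptions: a string such as `4GB` previously meant 4 × 2^30 bytes and now means 4 × 10^9 bytes, so callers that want the old value need `4GiB`.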
--- a/src/3rdparty/hwloc/src/topology-windows.c
+++ b/src/3rdparty/hwloc/src/topology-windows.c
@ -1,6 +1,6 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2022 Inria.  All rights reserved.
 * Copyright © 2009-2012, 2020 Université Bordeaux
 * Copyright © 2011 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@ -11,7 +11,9 @@

 #include "private/autogen/config.h"
 #include "hwloc.h"
+#include "hwloc/windows.h"
 #include "private/private.h"
+#include "private/windows.h" /* must be before windows.h */
 #include "private/debug.h"

 #include <windows.h>
@ -64,26 +66,6 @@ typedef enum _LOGICAL_PROCESSOR_RELATIONSHIP {
 #  endif /* HAVE_RELATIONPROCESSORPACKAGE */
 #endif /* HAVE_LOGICAL_PROCESSOR_RELATIONSHIP */

-#ifndef HAVE_SYSTEM_LOGICAL_PROCESSOR_INFORMATION
-typedef struct _SYSTEM_LOGICAL_PROCESSOR_INFORMATION {
-  ULONG_PTR ProcessorMask;
-  LOGICAL_PROCESSOR_RELATIONSHIP Relationship;
-  _ANONYMOUS_UNION
-  union {
-    struct {
-      BYTE flags;
-    } ProcessorCore;
-    struct {
-      DWORD NodeNumber;
-    } NumaNode;
-    CACHE_DESCRIPTOR Cache;
-    ULONGLONG Reserved[2];
-  } DUMMYUNIONNAME;
-} SYSTEM_LOGICAL_PROCESSOR_INFORMATION, *PSYSTEM_LOGICAL_PROCESSOR_INFORMATION;
-#endif
-
-/* Extended interface, for group support */
-
 #ifndef HAVE_GROUP_AFFINITY
 typedef struct _GROUP_AFFINITY {
  KAFFINITY Mask;
@ -92,35 +74,40 @@ typedef struct _GROUP_AFFINITY {
 } GROUP_AFFINITY, *PGROUP_AFFINITY;
 #endif

-#ifndef HAVE_PROCESSOR_RELATIONSHIP
+/* always use our own structure because the EfficiencyClass field didn't exist before Win10 */
 typedef struct HWLOC_PROCESSOR_RELATIONSHIP {
  BYTE Flags;
-  BYTE EfficiencyClass; /* for RelationProcessorCore, higher means greater performance but less efficiency, only available in Win10+ */
+  BYTE EfficiencyClass; /* for RelationProcessorCore, higher means greater performance but less efficiency */
  BYTE Reserved[20];
  WORD GroupCount;
  GROUP_AFFINITY GroupMask[ANYSIZE_ARRAY];
-} PROCESSOR_RELATIONSHIP, *PPROCESSOR_RELATIONSHIP;
-#endif
+} HWLOC_PROCESSOR_RELATIONSHIP;

-#ifndef HAVE_NUMA_NODE_RELATIONSHIP
-typedef struct _NUMA_NODE_RELATIONSHIP {
+/* always use our own structure because the GroupCount and GroupMasks fields didn't exist in some Win10 */
+typedef struct HWLOC_NUMA_NODE_RELATIONSHIP {
  DWORD NodeNumber;
-  BYTE Reserved[20];
-  GROUP_AFFINITY GroupMask;
-} NUMA_NODE_RELATIONSHIP, *PNUMA_NODE_RELATIONSHIP;
-#endif
+  BYTE Reserved[18];
+  WORD GroupCount;
+  _ANONYMOUS_UNION
+  union {
+    GROUP_AFFINITY GroupMask;
+    GROUP_AFFINITY GroupMasks[ANYSIZE_ARRAY];
+  } DUMMYUNIONNAME;
+} HWLOC_NUMA_NODE_RELATIONSHIP;

-#ifndef HAVE_CACHE_RELATIONSHIP
-typedef struct _CACHE_RELATIONSHIP {
+typedef struct HWLOC_CACHE_RELATIONSHIP {
  BYTE Level;
  BYTE Associativity;
  WORD LineSize;
  DWORD CacheSize;
  PROCESSOR_CACHE_TYPE Type;
-  BYTE Reserved[20];
-  GROUP_AFFINITY GroupMask;
-} CACHE_RELATIONSHIP, *PCACHE_RELATIONSHIP;
-#endif
+  BYTE Reserved[18];
+  WORD GroupCount;
+  union {
+    GROUP_AFFINITY GroupMask;
+    GROUP_AFFINITY GroupMasks[ANYSIZE_ARRAY];
+  } DUMMYUNIONNAME;
+} HWLOC_CACHE_RELATIONSHIP;

 #ifndef HAVE_PROCESSOR_GROUP_INFO
 typedef struct _PROCESSOR_GROUP_INFO {
@ -140,20 +127,19 @@ typedef struct _GROUP_RELATIONSHIP {
 } GROUP_RELATIONSHIP, *PGROUP_RELATIONSHIP;
 #endif

-#ifndef HAVE_SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX
-typedef struct _SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX {
+/* always use our own structure because we need our own HWLOC_PROCESSOR/CACHE/NUMA_NODE_RELATIONSHIP */
+typedef struct HWLOC_SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX {
  LOGICAL_PROCESSOR_RELATIONSHIP Relationship;
  DWORD Size;
  _ANONYMOUS_UNION
  union {
-    PROCESSOR_RELATIONSHIP Processor;
-    NUMA_NODE_RELATIONSHIP NumaNode;
-    CACHE_RELATIONSHIP Cache;
+    HWLOC_PROCESSOR_RELATIONSHIP Processor;
+    HWLOC_NUMA_NODE_RELATIONSHIP NumaNode;
+    HWLOC_CACHE_RELATIONSHIP Cache;
    GROUP_RELATIONSHIP Group;
    /* Odd: no member to tell the cpu mask of the package... */
  } DUMMYUNIONNAME;
-} SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX, *PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX;
-#endif
+} HWLOC_SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX;

 #ifndef HAVE_PSAPI_WORKING_SET_EX_BLOCK
 typedef union _PSAPI_WORKING_SET_EX_BLOCK {
@ -190,9 +176,6 @@ typedef struct _PROCESSOR_NUMBER {
 typedef WORD (WINAPI *PFN_GETACTIVEPROCESSORGROUPCOUNT)(void);
 static PFN_GETACTIVEPROCESSORGROUPCOUNT GetActiveProcessorGroupCountProc;

-static unsigned long nr_processor_groups = 1;
-static unsigned long max_numanode_index = 0;
-
 typedef WORD (WINAPI *PFN_GETACTIVEPROCESSORCOUNT)(WORD);
 static PFN_GETACTIVEPROCESSORCOUNT GetActiveProcessorCountProc;

@ -202,10 +185,7 @@ static PFN_GETCURRENTPROCESSORNUMBER GetCurrentProcessorNumberProc;
 typedef VOID (WINAPI *PFN_GETCURRENTPROCESSORNUMBEREX)(PPROCESSOR_NUMBER);
 static PFN_GETCURRENTPROCESSORNUMBEREX GetCurrentProcessorNumberExProc;

-typedef BOOL (WINAPI *PFN_GETLOGICALPROCESSORINFORMATION)(PSYSTEM_LOGICAL_PROCESSOR_INFORMATION Buffer, PDWORD ReturnLength);
-static PFN_GETLOGICALPROCESSORINFORMATION GetLogicalProcessorInformationProc;
-
-typedef BOOL (WINAPI *PFN_GETLOGICALPROCESSORINFORMATIONEX)(LOGICAL_PROCESSOR_RELATIONSHIP relationship, PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX Buffer, PDWORD ReturnLength);
+typedef BOOL (WINAPI *PFN_GETLOGICALPROCESSORINFORMATIONEX)(LOGICAL_PROCESSOR_RELATIONSHIP relationship, HWLOC_SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *Buffer, PDWORD ReturnLength);
 static PFN_GETLOGICALPROCESSORINFORMATIONEX GetLogicalProcessorInformationExProc;

 typedef BOOL (WINAPI *PFN_SETTHREADGROUPAFFINITY)(HANDLE hThread, const GROUP_AFFINITY *GroupAffinity, PGROUP_AFFINITY PreviousGroupAffinity);
@ -246,8 +226,6 @@ static void hwloc_win_get_function_ptrs(void)
 	(PFN_GETACTIVEPROCESSORGROUPCOUNT) GetProcAddress(kernel32, "GetActiveProcessorGroupCount");
      GetActiveProcessorCountProc =
 	(PFN_GETACTIVEPROCESSORCOUNT) GetProcAddress(kernel32, "GetActiveProcessorCount");
-      GetLogicalProcessorInformationProc =
-	(PFN_GETLOGICALPROCESSORINFORMATION) GetProcAddress(kernel32, "GetLogicalProcessorInformation");
      GetCurrentProcessorNumberProc =
 	(PFN_GETCURRENTPROCESSORNUMBER) GetProcAddress(kernel32, "GetCurrentProcessorNumber");
      GetCurrentProcessorNumberExProc =
@ -270,9 +248,6 @@ static void hwloc_win_get_function_ptrs(void)
 	(PFN_VIRTUALFREEEX) GetProcAddress(kernel32, "VirtualFreeEx");
    }

-    if (GetActiveProcessorGroupCountProc)
-      nr_processor_groups = GetActiveProcessorGroupCountProc();
-
    if (!QueryWorkingSetExProc) {
      HMODULE psapi = LoadLibrary("psapi.dll");
      if (psapi)
@ -363,6 +338,173 @@ static int hwloc_bitmap_to_single_ULONG_PTR(hwloc_const_bitmap_t set, unsigned *
  return 0;
 }

+/**********************
+ * Processor Groups
+ */
+
+static unsigned long max_numanode_index = 0;
+
+static unsigned long nr_processor_groups = 1;
+static hwloc_cpuset_t * processor_group_cpusets = NULL;
+
+static void
+hwloc_win_get_processor_groups(void)
+{
+  HWLOC_SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *procInfoTotal, *tmpprocInfoTotal, *procInfo;
+  DWORD length;
+  unsigned i;
+
+  hwloc_debug("querying windows processor groups\n");
+
+  if (!GetActiveProcessorGroupCountProc || !GetLogicalProcessorInformationExProc)
+    goto error;
+
+  nr_processor_groups = GetActiveProcessorGroupCountProc();
+  if (!nr_processor_groups)
+    goto error;
+
+  hwloc_debug("found %lu windows processor groups\n", nr_processor_groups);
+
+  if (nr_processor_groups > 1 && SIZEOF_VOID_P == 4) {
+    if (HWLOC_SHOW_ALL_ERRORS())
+      fprintf(stderr, "hwloc: multiple processor groups found on 32-bit Windows, topology may be invalid/incomplete.\n");
+  }
+
+  length = 0;
+  procInfoTotal = NULL;
+
+  while (1) {
+    if (GetLogicalProcessorInformationExProc(RelationGroup, procInfoTotal, &length))
+      break;
+    if (GetLastError() != ERROR_INSUFFICIENT_BUFFER)
+      goto error;
+    tmpprocInfoTotal = realloc(procInfoTotal, length);
+    if (!tmpprocInfoTotal)
+      goto error_with_procinfo;
+    procInfoTotal = tmpprocInfoTotal;
+  }
+
+  processor_group_cpusets = calloc(nr_processor_groups, sizeof(*processor_group_cpusets));
+  if (!processor_group_cpusets)
+    goto error_with_procinfo;
+
+  for (procInfo = procInfoTotal;
+       (void*) procInfo < (void*) ((uintptr_t) procInfoTotal + length);
+       procInfo = (void*) ((uintptr_t) procInfo + procInfo->Size)) {
+    unsigned id;
+
+    assert(procInfo->Relationship == RelationGroup);
+
+    hwloc_debug("Found %u active windows processor groups\n",
+                (unsigned) procInfo->Group.ActiveGroupCount);
+    for (id = 0; id < procInfo->Group.ActiveGroupCount; id++) {
+      KAFFINITY mask;
+      hwloc_bitmap_t set;
+
+      set = hwloc_bitmap_alloc();
+      if (!set)
+        goto error_with_cpusets;
+
+      mask = procInfo->Group.GroupInfo[id].ActiveProcessorMask;
+      hwloc_debug("group %u with %u cpus mask 0x%llx\n", id,
+                  (unsigned) procInfo->Group.GroupInfo[id].ActiveProcessorCount, (unsigned long long) mask);
+      /* KAFFINITY is ULONG_PTR */
+      hwloc_bitmap_set_ith_ULONG_PTR(set, id, mask);
+      /* FIXME: what if running 32bits on a 64bits windows with 64-processor groups?
+       * ULONG_PTR is 32bits, so half the group is invisible?
+       * maybe scale id to id*8/sizeof(ULONG_PTR) so that groups are 64-PU aligned?
+       */
+      hwloc_debug_2args_bitmap("group %u %d bitmap %s\n", id, procInfo->Group.GroupInfo[id].ActiveProcessorCount, set);
+      processor_group_cpusets[id] = set;
+    }
+  }
+
+  free(procInfoTotal);
+  return;
+
+ error_with_cpusets:
+  for(i=0; i<nr_processor_groups; i++) {
+    if (processor_group_cpusets[i])
+      hwloc_bitmap_free(processor_group_cpusets[i]);
+  }
+  free(processor_group_cpusets);
+  processor_group_cpusets = NULL;
+ error_with_procinfo:
+  free(procInfoTotal);
+ error:
+  /* on error set nr to 1 and keep cpusets NULL. We'll use the topology cpuset whenever needed */
+  nr_processor_groups = 1;
+}
+
+static void
+hwloc_win_free_processor_groups(void)
+{
+  unsigned i;
+  for(i=0; i<nr_processor_groups; i++) {
+    if (processor_group_cpusets[i])
+      hwloc_bitmap_free(processor_group_cpusets[i]);
+  }
+  free(processor_group_cpusets);
+  processor_group_cpusets = NULL;
+  nr_processor_groups = 1;
+}
+
+
+int
+hwloc_windows_get_nr_processor_groups(hwloc_topology_t topology, unsigned long flags)
+{
+  if (!topology->is_loaded || !topology->is_thissystem) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  if (flags) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  return nr_processor_groups;
+}
+
+int
+hwloc_windows_get_processor_group_cpuset(hwloc_topology_t topology, unsigned pg_index, hwloc_cpuset_t cpuset, unsigned long flags)
+{
+  if (!topology->is_loaded || !topology->is_thissystem) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  if (!cpuset) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  if (flags) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  if (pg_index >= nr_processor_groups) {
+    errno = ENOENT;
+    return -1;
+  }
+
+  if (!processor_group_cpusets) {
+    assert(nr_processor_groups == 1);
+    /* we found no processor groups, return the entire topology as a single one */
+    hwloc_bitmap_copy(cpuset, topology->levels[0][0]->cpuset);
+    return 0;
+  }
+
+  if (!processor_group_cpusets[pg_index]) {
+    errno = ENOENT;
+    return -1;
+  }
+
+  hwloc_bitmap_copy(cpuset, processor_group_cpusets[pg_index]);
+  return 0;
+}
+
 /**************************************************************
 * hwloc PU numbering with respect to Windows processor groups
 *
@ -848,6 +990,8 @@ hwloc_look_windows(struct hwloc_backend *backend, struct hwloc_disc_status *dsta
  unsigned hostname_size = sizeof(hostname);
  int has_efficiencyclass = 0;
  struct hwloc_win_efficiency_classes eclasses;
+  char *env = getenv("HWLOC_WINDOWS_PROCESSOR_GROUP_OBJS");
+  int keep_pgroup_objs = (env && atoi(env));

  assert(dstatus->phase == HWLOC_DISC_PHASE_CPU);

@ -878,137 +1022,8 @@ hwloc_look_windows(struct hwloc_backend *backend, struct hwloc_disc_status *dsta

  GetSystemInfo(&SystemInfo);

-  if (!GetLogicalProcessorInformationExProc && GetLogicalProcessorInformationProc) {
-      PSYSTEM_LOGICAL_PROCESSOR_INFORMATION procInfo, tmpprocInfo;
-      unsigned id;
-      unsigned i;
-      struct hwloc_obj *obj;
-      hwloc_obj_type_t type;
-
-      length = 0;
-      procInfo = NULL;
-
-      while (1) {
-	if (GetLogicalProcessorInformationProc(procInfo, &length))
-	  break;
-	if (GetLastError() != ERROR_INSUFFICIENT_BUFFER)
-	  return -1;
-	tmpprocInfo = realloc(procInfo, length);
-	if (!tmpprocInfo) {
-	  free(procInfo);
-	  goto out;
-	}
-	procInfo = tmpprocInfo;
-      }
-
-      assert(!length || procInfo);
-
-      for (i = 0; i < length / sizeof(*procInfo); i++) {
-
-        /* Ignore unknown caches */
-	if (procInfo->Relationship == RelationCache
-		&& procInfo->Cache.Type != CacheUnified
-		&& procInfo->Cache.Type != CacheData
-		&& procInfo->Cache.Type != CacheInstruction)
-	  continue;
-
-	id = HWLOC_UNKNOWN_INDEX;
-	switch (procInfo[i].Relationship) {
-	  case RelationNumaNode:
-	    type = HWLOC_OBJ_NUMANODE;
-	    id = procInfo[i].NumaNode.NodeNumber;
-	    gotnuma++;
-	    if (id > max_numanode_index)
-	      max_numanode_index = id;
-	    break;
-	  case RelationProcessorPackage:
-	    type = HWLOC_OBJ_PACKAGE;
-	    break;
-	  case RelationCache:
-	    type = (procInfo[i].Cache.Type == CacheInstruction ? HWLOC_OBJ_L1ICACHE : HWLOC_OBJ_L1CACHE) + procInfo[i].Cache.Level - 1;
-	    break;
-	  case RelationProcessorCore:
-	    type = HWLOC_OBJ_CORE;
-	    break;
-	  case RelationGroup:
-	  default:
-	    type = HWLOC_OBJ_GROUP;
-	    break;
-	}
-
-	if (!hwloc_filter_check_keep_object_type(topology, type))
-	  continue;
-
-	obj = hwloc_alloc_setup_object(topology, type, id);
-        obj->cpuset = hwloc_bitmap_alloc();
-	hwloc_debug("%s#%u mask %llx\n", hwloc_obj_type_string(type), id, (unsigned long long) procInfo[i].ProcessorMask);
-	/* ProcessorMask is a ULONG_PTR */
-	hwloc_bitmap_set_ith_ULONG_PTR(obj->cpuset, 0, procInfo[i].ProcessorMask);
-	hwloc_debug_2args_bitmap("%s#%u bitmap %s\n", hwloc_obj_type_string(type), id, obj->cpuset);
-
-	switch (type) {
-	  case HWLOC_OBJ_NUMANODE:
-	    {
-	      ULONGLONG avail;
-	      obj->nodeset = hwloc_bitmap_alloc();
-	      hwloc_bitmap_set(obj->nodeset, id);
-	      if ((GetNumaAvailableMemoryNodeExProc && GetNumaAvailableMemoryNodeExProc(id, &avail))
-		  || (GetNumaAvailableMemoryNodeProc && GetNumaAvailableMemoryNodeProc(id, &avail))) {
-		obj->attr->numanode.local_memory = avail;
-		gotnumamemory++;
-	      }
-	      obj->attr->numanode.page_types_len = 2;
-	      obj->attr->numanode.page_types = malloc(2 * sizeof(*obj->attr->numanode.page_types));
-	      memset(obj->attr->numanode.page_types, 0, 2 * sizeof(*obj->attr->numanode.page_types));
-	      obj->attr->numanode.page_types_len = 1;
-	      obj->attr->numanode.page_types[0].size = SystemInfo.dwPageSize;
-#if HAVE_DECL__SC_LARGE_PAGESIZE
-	      obj->attr->numanode.page_types_len++;
-	      obj->attr->numanode.page_types[1].size = sysconf(_SC_LARGE_PAGESIZE);
-#endif
-	      break;
-	    }
-	  case HWLOC_OBJ_L1CACHE:
-	  case HWLOC_OBJ_L2CACHE:
-	  case HWLOC_OBJ_L3CACHE:
-	  case HWLOC_OBJ_L4CACHE:
-	  case HWLOC_OBJ_L5CACHE:
-	  case HWLOC_OBJ_L1ICACHE:
-	  case HWLOC_OBJ_L2ICACHE:
-	  case HWLOC_OBJ_L3ICACHE:
-	    obj->attr->cache.size = procInfo[i].Cache.Size;
-	    obj->attr->cache.associativity = procInfo[i].Cache.Associativity == CACHE_FULLY_ASSOCIATIVE ? -1 : procInfo[i].Cache.Associativity ;
-	    obj->attr->cache.linesize = procInfo[i].Cache.LineSize;
-	    obj->attr->cache.depth = procInfo[i].Cache.Level;
-	    switch (procInfo->Cache.Type) {
-	      case CacheUnified:
-		obj->attr->cache.type = HWLOC_OBJ_CACHE_UNIFIED;
-		break;
-	      case CacheData:
-		obj->attr->cache.type = HWLOC_OBJ_CACHE_DATA;
-		break;
-	      case CacheInstruction:
-		obj->attr->cache.type = HWLOC_OBJ_CACHE_INSTRUCTION;
-		break;
-	      default:
-		hwloc_free_unlinked_object(obj);
-		continue;
-	    }
-	    break;
-	  case HWLOC_OBJ_GROUP:
-	    obj->attr->group.kind = procInfo[i].Relationship == RelationGroup ? HWLOC_GROUP_KIND_WINDOWS_PROCESSOR_GROUP : HWLOC_GROUP_KIND_WINDOWS_RELATIONSHIP_UNKNOWN;
-	    break;
-	  default:
-	    break;
-	}
-	hwloc__insert_object_by_cpuset(topology, NULL, obj, "windows:GetLogicalProcessorInformation");
-      }
-
-      free(procInfo);
-  }
-
  if (GetLogicalProcessorInformationExProc) {
-      PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX procInfoTotal, tmpprocInfoTotal, procInfo;
+      HWLOC_SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *procInfoTotal, *tmpprocInfoTotal, *procInfo;
      unsigned id;
      struct hwloc_obj *obj;
      hwloc_obj_type_t type;
@ -1047,8 +1062,16 @@ hwloc_look_windows(struct hwloc_backend *backend, struct hwloc_disc_status *dsta
 	switch (procInfo->Relationship) {
 	  case RelationNumaNode:
 	    type = HWLOC_OBJ_NUMANODE;
-            num = 1;
-            GroupMask = &procInfo->NumaNode.GroupMask;
+            /* Starting with Windows 11 and Server 2022, the GroupCount field is valid and >=1
+             * and we may read GroupMasks[]. Older releases have GroupCount==0 and we must read GroupMask.
+             */
+            if (procInfo->NumaNode.GroupCount) {
+              num = procInfo->NumaNode.GroupCount;
+              GroupMask = procInfo->NumaNode.GroupMasks;
+            } else {
+              num = 1;
+              GroupMask = &procInfo->NumaNode.GroupMask;
+            }
 	    id = procInfo->NumaNode.NodeNumber;
 	    gotnuma++;
 	    if (id > max_numanode_index)
@ -1061,18 +1084,20 @@ hwloc_look_windows(struct hwloc_backend *backend, struct hwloc_disc_status *dsta
 	    break;
 	  case RelationCache:
 	    type = (procInfo->Cache.Type == CacheInstruction ? HWLOC_OBJ_L1ICACHE : HWLOC_OBJ_L1CACHE) + procInfo->Cache.Level - 1;
-            num = 1;
-            GroupMask = &procInfo->Cache.GroupMask;
+            /* GroupCount added approximately with NumaNode.GroupCount above */
+            if (procInfo->Cache.GroupCount) {
+              num = procInfo->Cache.GroupCount;
+              GroupMask = procInfo->Cache.GroupMasks;
+            } else {
+              num = 1;
+              GroupMask = &procInfo->Cache.GroupMask;
+            }
 	    break;
 	  case RelationProcessorCore:
 	    type = HWLOC_OBJ_CORE;
            num = procInfo->Processor.GroupCount;
            GroupMask = procInfo->Processor.GroupMask;
-            if (has_efficiencyclass)
-              /* the EfficiencyClass field didn't exist before Windows10 and recent MSVC headers,
-               * so just access it manually instead of trying to detect it.
-               */
-              efficiency_class = * ((&procInfo->Processor.Flags) + 1);
+            efficiency_class = procInfo->Processor.EfficiencyClass;
 	    break;
 	  case RelationGroup:
 	    /* So strange an interface... */
@ -1097,11 +1122,12 @@ hwloc_look_windows(struct hwloc_backend *backend, struct hwloc_disc_status *dsta
 		groups_pu_set = hwloc_bitmap_alloc();
 	      hwloc_bitmap_or(groups_pu_set, groups_pu_set, set);

-	      if (hwloc_filter_check_keep_object_type(topology, HWLOC_OBJ_GROUP)) {
+              /* Ignore processor groups unless requested and filtered-in */
+              if (keep_pgroup_objs && hwloc_filter_check_keep_object_type(topology, HWLOC_OBJ_GROUP)) {
 		obj = hwloc_alloc_setup_object(topology, HWLOC_OBJ_GROUP, id);
 		obj->cpuset = set;
 		obj->attr->group.kind = HWLOC_GROUP_KIND_WINDOWS_PROCESSOR_GROUP;
-		hwloc__insert_object_by_cpuset(topology, NULL, obj, "windows:GetLogicalProcessorInformation:ProcessorGroup");
+		hwloc__insert_object_by_cpuset(topology, NULL, obj, "windows:GetLogicalProcessorInformationEx:ProcessorGroup");
 	      } else
 		hwloc_bitmap_free(set);
 	    }
@ -1328,11 +1354,13 @@ hwloc_set_windows_hooks(struct hwloc_binding_hooks *hooks,
 static int hwloc_windows_component_init(unsigned long flags __hwloc_attribute_unused)
 {
  hwloc_win_get_function_ptrs();
+  hwloc_win_get_processor_groups();
  return 0;
 }

 static void hwloc_windows_component_finalize(unsigned long flags __hwloc_attribute_unused)
 {
+  hwloc_win_free_processor_groups();
 }

 static struct hwloc_backend *
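Note on the topology-windows.c hunks above: the NUMA node and cache cases now share one pattern, which is to trust `GroupCount` when it is non-zero and otherwise fall back to the single legacy `GroupMask`. The following is a minimal sketch of that selection logic only. The `mock_*` types and `select_group_masks()` are hypothetical stand-ins invented for illustration; they are not the real Windows structures or any hwloc function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Windows structures (NOT the real layout). */
struct mock_group_affinity {
  unsigned long long mask;
  unsigned short group;
};

struct mock_numa_node {
  unsigned node_number;
  unsigned short group_count; /* 0 on older releases, >= 1 on newer ones */
  union {
    struct mock_group_affinity group_mask;     /* read when group_count == 0 */
    struct mock_group_affinity group_masks[4]; /* read when group_count >= 1 */
  } u;
};

/* Hypothetical helper: return how many masks are valid and point *masks
 * at the first one, mirroring the if/else added in the diff. */
static unsigned select_group_masks(const struct mock_numa_node *node,
                                   const struct mock_group_affinity **masks)
{
  assert(node && masks);
  if (node->group_count) {
    /* newer releases: an array of group_count entries */
    *masks = node->u.group_masks;
    return node->group_count;
  }
  /* older releases: exactly one legacy mask */
  *masks = &node->u.group_mask;
  return 1;
}
```

Returning the count together with a pointer to the first mask lets the caller run the same loop over the masks in both cases.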
--- a/src/3rdparty/hwloc/src/topology-x86.c
+++ b/src/3rdparty/hwloc/src/topology-x86.c
@ -1,5 +1,5 @@
 /*
- * Copyright © 2010-2021 Inria.  All rights reserved.
+ * Copyright © 2010-2022 Inria.  All rights reserved.
 * Copyright © 2010-2013 Université Bordeaux
 * Copyright © 2010-2011 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@ -7,11 +7,14 @@
 *
 * This backend is only used when the operating system does not export
 * the necessary hardware topology information to user-space applications.
- * Currently, only the FreeBSD backend relies on this x86 backend.
+ * Currently, FreeBSD and NetBSD only add PUs and then fall back to this
+ * backend for CPU/Cache discovery.
 *
 * Other backends such as Linux have their own way to retrieve various
 * pieces of hardware topology information from the operating system
 * on various architectures, without having to use this x86-specific code.
+ * But this backend is still used after them to annotate some objects with
+ * additional details (CPU info in Package, Inclusiveness in Caches).
 */

 #include "private/autogen/config.h"
@ -497,7 +500,8 @@ static void read_amd_cores_topoext(struct procinfo *infos, unsigned long flags,
      nodes_per_proc = ((ecx >> 8) & 7) + 1;
    }
    if ((infos->cpufamilynumber == 0x15 && nodes_per_proc > 2)
-	|| ((infos->cpufamilynumber == 0x17 || infos->cpufamilynumber == 0x18) && nodes_per_proc > 4)) {
+	|| ((infos->cpufamilynumber == 0x17 || infos->cpufamilynumber == 0x18) && nodes_per_proc > 4)
+        || (infos->cpufamilynumber == 0x19 && nodes_per_proc > 1)) {
      hwloc_debug("warning: undefined nodes_per_proc value %u, assuming it means %u\n", nodes_per_proc, nodes_per_proc);
    }
  }
@ -610,10 +614,13 @@ static void look_proc(struct hwloc_backend *backend, struct procinfo *infos, uns
  eax = 0x01;
  cpuid_or_from_dump(&eax, &ebx, &ecx, &edx, src_cpuiddump);
  infos->apicid = ebx >> 24;
-  if (edx & (1 << 28))
+  if (edx & (1 << 28)) {
    legacy_max_log_proc = 1 << hwloc_flsl(((ebx >> 16) & 0xff) - 1);
-  else
+  } else {
+    hwloc_debug("HTT bit not set in CPUID 0x01.edx, assuming legacy_max_log_proc = 1\n");
    legacy_max_log_proc = 1;
+  }
+
  hwloc_debug("APIC ID 0x%02x legacy_max_log_proc %u\n", infos->apicid, legacy_max_log_proc);
  infos->ids[PKG] = infos->apicid / legacy_max_log_proc;
  legacy_log_proc_id = infos->apicid % legacy_max_log_proc;
@ -676,12 +683,23 @@ static void look_proc(struct hwloc_backend *backend, struct procinfo *infos, uns
      unsigned max_nbcores;
      unsigned max_nbthreads;
      unsigned threadid __hwloc_attribute_unused;
+      hwloc_debug("Trying to get core/thread IDs from 0x04...\n");
      max_nbcores = ((eax >> 26) & 0x3f) + 1;
-      max_nbthreads = legacy_max_log_proc / max_nbcores;
-      hwloc_debug("thus %u threads\n", max_nbthreads);
-      threadid = legacy_log_proc_id % max_nbthreads;
-      infos->ids[CORE] = legacy_log_proc_id / max_nbthreads;
-      hwloc_debug("this is thread %u of core %u\n", threadid, infos->ids[CORE]);
+      hwloc_debug("found %u cores max\n", max_nbcores);
+      /* some VMs (e.g. issue#525) don't report valid information, check things before dividing by 0. */
+      if (!max_nbcores) {
+        hwloc_debug("cannot detect core/thread IDs from 0x04 without a valid max of cores\n");
+      } else {
+        max_nbthreads = legacy_max_log_proc / max_nbcores;
+        hwloc_debug("found %u threads max\n", max_nbthreads);
+        if (!max_nbthreads) {
+          hwloc_debug("cannot detect core/thread IDs from 0x04 without a valid max of threads\n");
+        } else {
+          threadid = legacy_log_proc_id % max_nbthreads;
+          infos->ids[CORE] = legacy_log_proc_id / max_nbthreads;
+          hwloc_debug("this is thread %u of core %u\n", threadid, infos->ids[CORE]);
+        }
+      }
    }
  }

@ -772,13 +790,19 @@ static void look_proc(struct hwloc_backend *backend, struct procinfo *infos, uns

    } else if (cpuid_type == amd) {
      /* AMD quirks */
-      if (infos->cpufamilynumber == 0x17
-	  && cache->level == 3 && cache->nbthreads_sharing == 6) {
-	/* AMD family 0x17 always shares L3 between 8 APIC ids,
-	 * even when only 6 APIC ids are enabled and reported in nbthreads_sharing
-	 * (on 24-core CPUs).
+      if (infos->cpufamilynumber >= 0x17 && cache->level == 3) {
+	/* AMD family 0x19 always shares L3 between 16 APIC ids (8 HT cores),
+         * while family 0x17 shares between 8 APIC ids (4 HT cores).
+         * But many models have fewer APIC ids enabled and reported in nbthreads_sharing.
+         * It means we must round up nbthreads_sharing to the next power of 2
+         * before computing cacheid.
 	 */
-	cache->cacheid = infos->apicid / 8;
+        unsigned nbapics_sharing = cache->nbthreads_sharing;
+        if (nbapics_sharing & (nbapics_sharing-1))
+          /* not a power of two, round-up */
+          nbapics_sharing = 1U<<(1+hwloc_ffsl(nbapics_sharing));
+
+	cache->cacheid = infos->apicid / nbapics_sharing;

      } else if (infos->cpufamilynumber== 0x10 && infos->cpumodelnumber == 0x9
 	  && cache->level == 3
@ -804,7 +828,7 @@ static void look_proc(struct hwloc_backend *backend, struct procinfo *infos, uns
      } else if (infos->cpufamilynumber == 0x15
 		 && (infos->cpumodelnumber == 0x1 /* Bulldozer */ || infos->cpumodelnumber == 0x2 /* Piledriver */)
 		 && cache->level == 3 && cache->nbthreads_sharing == 6) {
-	/* AMD Bulldozer and Piledriver 12-core processors have same APIC ids as Magny-Cours below,
+	/* AMD Bulldozer and Piledriver 12-core processors have same APIC ids as Magny-Cours above,
 	 * but we can't merge the checks because the original nbthreads_sharing must be exactly 6 here.
 	 */
 	cache->cacheid = (infos->apicid % legacy_max_log_proc) / cache->nbthreads_sharing /* cacheid within the package */
@ -1228,6 +1252,18 @@ static void summarize(struct hwloc_backend *backend, struct procinfo *infos, uns
 	    }
 	  }
 	  cache = hwloc_alloc_setup_object(topology, otype, HWLOC_UNKNOWN_INDEX);
+          /* We don't specify the os_index of caches because we want to be
+           * 100% sure they are identical to what the Linux kernel reports
+           * (so that things like resctrl work).
+           * However, vendor/model-specific quirks in the x86 code above
+           * make this difficult.
+           *
+           * Caveat: if the x86 backend is used on Linux to avoid kernel bugs,
+           * IDs won't be available to resctrl users. But resctrl heavily
+           * relies on the kernel x86 discovery being non-buggy anyway.
+           *
+           * TODO: make this optional? or only disable it on Linux?
+           */
 	  cache->attr->cache.depth = level;
 	  cache->attr->cache.size = infos[i].cache[l].size;
 	  cache->attr->cache.linesize = infos[i].cache[l].linesize;
@ -1257,7 +1293,8 @@ static int
 look_procs(struct hwloc_backend *backend, struct procinfo *infos, unsigned long flags,
 	   unsigned highest_cpuid, unsigned highest_ext_cpuid, unsigned *features, enum cpuid_type cpuid_type,
 	   int (*get_cpubind)(hwloc_topology_t topology, hwloc_cpuset_t set, int flags),
-	   int (*set_cpubind)(hwloc_topology_t topology, hwloc_const_cpuset_t set, int flags))
+	   int (*set_cpubind)(hwloc_topology_t topology, hwloc_const_cpuset_t set, int flags),
+           hwloc_bitmap_t restrict_set)
 {
  struct hwloc_x86_backend_data_s *data = backend->private_data;
  struct hwloc_topology *topology = backend->topology;
@ -1277,6 +1314,12 @@ look_procs(struct hwloc_backend *backend, struct procinfo *infos, unsigned long

  for (i = 0; i < nbprocs; i++) {
    struct cpuiddump *src_cpuiddump = NULL;
+
+    if (restrict_set && !hwloc_bitmap_isset(restrict_set, i)) {
+      /* skip this CPU outside of the binding mask */
+      continue;
+    }
+
    if (data->src_cpuiddump_path) {
      src_cpuiddump = cpuiddump_read(data->src_cpuiddump_path, i);
      if (!src_cpuiddump)
@ -1306,7 +1349,7 @@ look_procs(struct hwloc_backend *backend, struct procinfo *infos, unsigned long
  if (data->apicid_unique) {
    summarize(backend, infos, flags);

-    if (has_hybrid(features)) {
+    if (has_hybrid(features) && !(topology->flags & HWLOC_TOPOLOGY_FLAG_NO_CPUKINDS)) {
      /* use hybrid info for cpukinds */
      hwloc_bitmap_t atomset = hwloc_bitmap_alloc();
      hwloc_bitmap_t coreset = hwloc_bitmap_alloc();
@ -1410,6 +1453,7 @@ static
 int hwloc_look_x86(struct hwloc_backend *backend, unsigned long flags)
 {
  struct hwloc_x86_backend_data_s *data = backend->private_data;
+  struct hwloc_topology *topology = backend->topology;
  unsigned nbprocs = data->nbprocs;
  unsigned eax, ebx, ecx = 0, edx;
  unsigned i;
@ -1425,9 +1469,21 @@ int hwloc_look_x86(struct hwloc_backend *backend, unsigned long flags)
  struct hwloc_topology_membind_support memsupport __hwloc_attribute_unused;
  int (*get_cpubind)(hwloc_topology_t topology, hwloc_cpuset_t set, int flags) = NULL;
  int (*set_cpubind)(hwloc_topology_t topology, hwloc_const_cpuset_t set, int flags) = NULL;
+  hwloc_bitmap_t restrict_set = NULL;
  struct cpuiddump *src_cpuiddump = NULL;
  int ret = -1;

+  /* check if binding works */
+  memset(&hooks, 0, sizeof(hooks));
+  support.membind = &memsupport;
+  /* We could just copy the main hooks (except in some corner cases),
+   * but the current overhead is negligible, so just always reget them.
+   */
+  hwloc_set_native_binding_hooks(&hooks, &support);
+  /* in theory, those are only needed if !data->src_cpuiddump_path || HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING
+   * but that's the vast majority of cases anyway, and the overhead is very small.
+   */
+
  if (data->src_cpuiddump_path) {
    /* Just read cpuid from the dump (implies !topology->is_thissystem by default) */
    src_cpuiddump = cpuiddump_read(data->src_cpuiddump_path, 0);
@ -1440,13 +1496,6 @@ int hwloc_look_x86(struct hwloc_backend *backend, unsigned long flags)
     * we may still force use this backend when debugging with !thissystem.
     */

-    /* check if binding works */
-    memset(&hooks, 0, sizeof(hooks));
-    support.membind = &memsupport;
-    /* We could just copy the main hooks (except in some corner cases),
-     * but the current overhead is negligible, so just always reget them.
-     */
-    hwloc_set_native_binding_hooks(&hooks, &support);
    if (hooks.get_thisthread_cpubind && hooks.set_thisthread_cpubind) {
      get_cpubind = hooks.get_thisthread_cpubind;
      set_cpubind = hooks.set_thisthread_cpubind;
@ -1466,6 +1515,20 @@ int hwloc_look_x86(struct hwloc_backend *backend, unsigned long flags)
    }
  }

+  if (topology->flags & HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING) {
+    restrict_set = hwloc_bitmap_alloc();
+    if (!restrict_set)
+      goto out;
+    if (hooks.get_thisproc_cpubind)
+      hooks.get_thisproc_cpubind(topology, restrict_set, 0);
+    else if (hooks.get_thisthread_cpubind)
+      hooks.get_thisthread_cpubind(topology, restrict_set, 0);
+    if (hwloc_bitmap_iszero(restrict_set)) {
+      hwloc_bitmap_free(restrict_set);
+      restrict_set = NULL;
+    }
+  }
+
  if (!src_cpuiddump && !hwloc_have_x86_cpuid())
    goto out;

@ -1530,7 +1593,7 @@ int hwloc_look_x86(struct hwloc_backend *backend, unsigned long flags)

  ret = look_procs(backend, infos, flags,
 		   highest_cpuid, highest_ext_cpuid, features, cpuid_type,
-		   get_cpubind, set_cpubind);
+		   get_cpubind, set_cpubind, restrict_set);
  if (!ret)
    /* success, we're done */
    goto out_with_os_state;
@ -1555,6 +1618,7 @@ out_with_infos:
  }

 out:
+  hwloc_bitmap_free(restrict_set);
  if (src_cpuiddump)
    cpuiddump_free(src_cpuiddump);
  return ret;
@ -1571,6 +1635,11 @@ hwloc_x86_discover(struct hwloc_backend *backend, struct hwloc_disc_status *dsta

  assert(dstatus->phase == HWLOC_DISC_PHASE_CPU);

+  if (topology->flags & HWLOC_TOPOLOGY_FLAG_DONT_CHANGE_BINDING) {
+    /* TODO: Things would work if there's a single PU, no need to rebind */
+    return 0;
+  }
+
  if (getenv("HWLOC_X86_TOPOEXT_NUMANODES")) {
    flags |= HWLOC_X86_DISC_FLAG_TOPOEXT_NUMANODES;
  }
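Note on the topology-x86.c hunks above: two arithmetic steps appear in them. One checks the core and thread counts before dividing by them; the other rounds `nbthreads_sharing` up to a power of two before computing the L3 `cacheid`. The sketch below illustrates both under stated assumptions. All function names are hypothetical, and the rounding uses a plain doubling loop rather than the `hwloc_ffsl()`-based expression in the diff.

```c
#include <assert.h>

/* Hypothetical helper: round n up to the next power of two.
 * Powers of two are returned unchanged. Requires 1 <= n <= 2^31. */
static unsigned round_up_pow2(unsigned n)
{
  unsigned p = 1;
  assert(n >= 1 && n <= 0x80000000u);
  while (p < n)
    p <<= 1;
  return p;
}

/* Hypothetical helper: L3 cache id from an APIC id, assuming the L3 is
 * shared by a power-of-two number of APIC ids even when fewer are enabled. */
static unsigned l3_cacheid(unsigned apicid, unsigned nbthreads_sharing)
{
  return apicid / round_up_pow2(nbthreads_sharing);
}

/* Hypothetical helper: split a logical processor id into core and thread
 * ids. Returns 0 on success, or -1 when the reported counts would lead to
 * a division by zero (as some VMs may report). */
static int split_core_thread(unsigned log_proc_id, unsigned max_log_proc,
                             unsigned max_cores,
                             unsigned *core, unsigned *thread)
{
  unsigned max_threads;
  if (!max_cores)
    return -1;
  max_threads = max_log_proc / max_cores;
  if (!max_threads)
    return -1;
  *thread = log_proc_id % max_threads;
  *core = log_proc_id / max_threads;
  return 0;
}
```

For example, with 6 APIC ids reported as sharing an L3, the divisor becomes 8, so APIC ids 0 to 7 map to cache 0 and 8 to 15 map to cache 1.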
--- a/src/3rdparty/hwloc/src/topology-xml.c
+++ b/src/3rdparty/hwloc/src/topology-xml.c
@ -1,6 +1,6 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2022 Inria.  All rights reserved.
 * Copyright © 2009-2011, 2020 Université Bordeaux
 * Copyright © 2009-2018 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@ -123,6 +123,17 @@ hwloc__xml_import_object_attr(struct hwloc_topology *topology,
      fprintf(stderr, "%s: unexpected zero gp_index, topology may be invalid\n", state->global->msgprefix);
    if (obj->gp_index >= topology->next_gp_index)
      topology->next_gp_index = obj->gp_index + 1;
+  } else if (!strcmp(name, "id")) { /* forward compat */
+    if (!strncmp(value, "obj", 3)) {
+      obj->gp_index = strtoull(value+3, NULL, 10);
+      if (!obj->gp_index && hwloc__xml_verbose())
+        fprintf(stderr, "%s: unexpected zero id, topology may be invalid\n", state->global->msgprefix);
+      if (obj->gp_index >= topology->next_gp_index)
+        topology->next_gp_index = obj->gp_index + 1;
+    } else {
+      if (hwloc__xml_verbose())
+        fprintf(stderr, "%s: unexpected id `%s' not-starting with `obj', ignoring\n", state->global->msgprefix, value);
+    }
  } else if (!strcmp(name, "cpuset")) {
    if (!obj->cpuset)
      obj->cpuset = hwloc_bitmap_alloc();
@ -192,8 +203,9 @@ hwloc__xml_import_object_attr(struct hwloc_topology *topology,
 	  || lvalue == HWLOC_OBJ_CACHE_INSTRUCTION)
 	obj->attr->cache.type = (hwloc_obj_cache_type_t) lvalue;
      else
-	fprintf(stderr, "%s: ignoring invalid cache_type attribute %lu\n",
-		state->global->msgprefix, lvalue);
+        if (hwloc__xml_verbose())
+          fprintf(stderr, "%s: ignoring invalid cache_type attribute %lu\n",
+                  state->global->msgprefix, lvalue);
    } else if (hwloc__xml_verbose())
      fprintf(stderr, "%s: ignoring cache_type attribute for non-cache object type\n",
 	      state->global->msgprefix);
@ -242,7 +254,7 @@ hwloc__xml_import_object_attr(struct hwloc_topology *topology,
  else if (!strcmp(name, "dont_merge")) {
    unsigned long lvalue = strtoul(value, NULL, 10);
    if (obj->type == HWLOC_OBJ_GROUP)
-      obj->attr->group.dont_merge = lvalue;
+      obj->attr->group.dont_merge = (unsigned char) lvalue;
    else if (hwloc__xml_verbose())
      fprintf(stderr, "%s: ignoring dont_merge attribute for non-group object type\n",
 	      state->global->msgprefix);
@ -262,8 +274,8 @@ hwloc__xml_import_object_attr(struct hwloc_topology *topology,
 #ifndef HWLOC_HAVE_32BITS_PCI_DOMAIN
      } else if (domain > 0xffff) {
 	static int warned = 0;
-	if (!warned && !hwloc_hide_errors())
-	  fprintf(stderr, "Ignoring PCI device with non-16bit domain.\nPass --enable-32bits-pci-domain to configure to support such devices\n(warning: it would break the library ABI, don't enable unless really needed).\n");
+	if (!warned && HWLOC_SHOW_ALL_ERRORS())
+	  fprintf(stderr, "hwloc/xml: Ignoring PCI device with non-16bit domain.\nPass --enable-32bits-pci-domain to configure to support such devices\n(warning: it would break the library ABI, don't enable unless really needed).\n");
 	warned = 1;
 	*ignore = 1;
 #endif
@ -337,6 +349,7 @@ hwloc__xml_import_object_attr(struct hwloc_topology *topology,
      } else {
 	obj->attr->bridge.upstream_type = (hwloc_obj_bridge_type_t) upstream_type;
 	obj->attr->bridge.downstream_type = (hwloc_obj_bridge_type_t) downstream_type;
+        /* FIXME verify that upstream/downstream type is valid */
      };
      break;
    }
@ -361,12 +374,13 @@ hwloc__xml_import_object_attr(struct hwloc_topology *topology,
 #ifndef HWLOC_HAVE_32BITS_PCI_DOMAIN
      } else if (domain > 0xffff) {
 	static int warned = 0;
-	if (!warned && !hwloc_hide_errors())
-	  fprintf(stderr, "Ignoring bridge to PCI with non-16bit domain.\nPass --enable-32bits-pci-domain to configure to support such devices\n(warning: it would break the library ABI, don't enable unless really needed).\n");
+	if (!warned && HWLOC_SHOW_ALL_ERRORS())
+	  fprintf(stderr, "hwloc/xml: Ignoring bridge to PCI with non-16bit domain.\nPass --enable-32bits-pci-domain to configure to support such devices\n(warning: it would break the library ABI, don't enable unless really needed).\n");
 	warned = 1;
 	*ignore = 1;
 #endif
      } else {
+        /* FIXME verify that downstream type vs pci info are valid */
 	obj->attr->bridge.downstream.pci.domain = domain;
 	obj->attr->bridge.downstream.pci.secondary_bus = secbus;
 	obj->attr->bridge.downstream.pci.subordinate_bus = subbus;
@ -1232,7 +1246,7 @@ hwloc__xml_import_object(hwloc_topology_t topology,
 	/* next should be before cur */
 	if (!childrengotignored) {
 	  static int reported = 0;
-	  if (!reported && !hwloc_hide_errors()) {
+	  if (!reported && HWLOC_SHOW_CRITICAL_ERRORS()) {
 	    hwloc__xml_import_report_outoforder(topology, next, cur);
 	    reported = 1;
 	  }
@ -1565,7 +1579,10 @@ hwloc__xml_v2import_distances(hwloc_topology_t topology,
    }
  }

-  hwloc_internal_distances_add_by_index(topology, name, unique_type, different_types, nbobjs, indexes, u64values, kind, 0);
+  if (topology->flags & HWLOC_TOPOLOGY_FLAG_NO_DISTANCES)
+    goto out_ignore;
+
+  hwloc_internal_distances_add_by_index(topology, name, unique_type, different_types, nbobjs, indexes, u64values, kind, 0 /* assume grouping was applied when this matrix was discovered before exporting to XML */);

  /* prevent freeing below */
  indexes = NULL;
@ -1719,7 +1736,8 @@ hwloc__xml_import_memattr(hwloc_topology_t topology,
    }
  }

-  if (name && flags != (unsigned long) -1) {
+  if (name && flags != (unsigned long) -1
+      && !(topology->flags & HWLOC_TOPOLOGY_FLAG_NO_MEMATTRS)) {
    hwloc_memattr_id_t _id;

    ret = hwloc_memattr_get_by_name(topology, name, &_id);
@ -1830,7 +1848,13 @@ hwloc__xml_import_cpukind(hwloc_topology_t topology,
    goto error;
  }

-  hwloc_internal_cpukinds_register(topology, cpuset, forced_efficiency, infos, nr_infos, HWLOC_CPUKINDS_REGISTER_FLAG_OVERWRITE_FORCED_EFFICIENCY);
+  if (topology->flags & HWLOC_TOPOLOGY_FLAG_NO_CPUKINDS) {
+    hwloc__free_infos(infos, nr_infos);
+    hwloc_bitmap_free(cpuset);
+  } else {
+    hwloc_internal_cpukinds_register(topology, cpuset, forced_efficiency, infos, nr_infos, HWLOC_CPUKINDS_REGISTER_FLAG_OVERWRITE_FORCED_EFFICIENCY);
+    hwloc__free_infos(infos, nr_infos);
+  }

  return state->global->close_tag(state);

@ -2165,7 +2189,8 @@ done:
       * but it would require to have those objects in the original XML order (like the first_numanode cousin-list).
       * because the topology order can be different if some parents are ignored during load.
       */
-      if (nbobjs == data->nbnumanodes) {
+      if (nbobjs == data->nbnumanodes
+          && !(topology->flags & HWLOC_TOPOLOGY_FLAG_NO_DISTANCES)) {
 	hwloc_obj_t *objs = malloc(nbobjs*sizeof(hwloc_obj_t));
 	uint64_t *values = malloc(nbobjs*nbobjs*sizeof(*values));
        assert(data->nbnumanodes > 0); /* v1dist->nbobjs is >0 after import */
@ -2647,7 +2672,8 @@ hwloc__xml_export_object_contents (hwloc__xml_export_state_t state, hwloc_topolo

      logical_to_v2array = malloc(nbobjs * sizeof(*logical_to_v2array));
      if (!logical_to_v2array) {
-	fprintf(stderr, "xml/export/v1: failed to allocated logical_to_v2array\n");
+        if (HWLOC_SHOW_ALL_ERRORS())
+          fprintf(stderr, "hwloc/xml/export/v1: failed to allocated logical_to_v2array\n");
 	continue;
      }

@ -2821,6 +2847,7 @@ hwloc__xml_v1export_object_with_memory(hwloc__xml_export_state_t parentstate, hw
    /* child has sibling, we must add a Group around those memory children */
    hwloc_obj_t group = parentstate->global->v1_memory_group;
    parentstate->new_child(parentstate, &gstate, "object");
+    group->parent = obj->parent;
    group->cpuset = obj->cpuset;
    group->complete_cpuset = obj->complete_cpuset;
    group->nodeset = obj->nodeset;
@ -3119,9 +3146,11 @@ hwloc__xml_export_memattrs(hwloc__xml_export_state_t state, hwloc_topology_t top
      continue;

    imattr = &topology->memattrs[id];
-    if ((id == HWLOC_MEMATTR_ID_LATENCY || id == HWLOC_MEMATTR_ID_BANDWIDTH)
-        && !imattr->nr_targets)
-      /* no need to export target-less attributes for initial attributes, no release support attributes without those definitions */
+    if (id < HWLOC_MEMATTR_ID_MAX && !imattr->nr_targets)
+      /* no need to export standard attributes without any target,
+       * their definition is now standardized,
+       * the old hwloc importing this XML may recreate these attributes just like it would for a non-imported topology.
+       */
      continue;

    state->new_child(state, &mstate, "memattr");
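Note on the topology-xml.c hunks above: the new forward-compat branch accepts an `id` attribute of the form `obj<N>`, stores `<N>` as the object's `gp_index`, and keeps `next_gp_index` strictly above every index seen. A minimal standalone sketch of that parsing step follows. `parse_obj_id()` and `bump_next_index()` are hypothetical names for illustration, not hwloc functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse "obj<N>" into *gp_index.
 * Returns 0 on success, -1 when the value does not start with "obj". */
static int parse_obj_id(const char *value, unsigned long long *gp_index)
{
  assert(value && gp_index);
  if (strncmp(value, "obj", 3) != 0)
    return -1;
  /* as in the diff, a missing or zero number yields 0, which the caller
   * may report as a possibly invalid topology */
  *gp_index = strtoull(value + 3, NULL, 10);
  return 0;
}

/* Hypothetical helper: keep the next free index above every index seen. */
static void bump_next_index(unsigned long long gp_index,
                            unsigned long long *next_gp_index)
{
  assert(next_gp_index);
  if (gp_index >= *next_gp_index)
    *next_gp_index = gp_index + 1;
}
```

Values without the `obj` prefix are rejected rather than guessed at, which matches the "ignoring" branch in the diff.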
--- a/src/3rdparty/hwloc/src/topology.c
+++ b/src/3rdparty/hwloc/src/topology.c
@ -1,8 +1,9 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2021 Inria.  All rights reserved.
+ * Copyright © 2009-2022 Inria.  All rights reserved.
 * Copyright © 2009-2012, 2020 Université Bordeaux
 * Copyright © 2009-2011 Cisco Systems, Inc.  All rights reserved.
+ * Copyright © 2022 IBM Corporation.  All rights reserved.
 * See COPYING in top-level directory.
 */

@ -52,6 +53,57 @@
 #include <windows.h>
 #endif

+
+#ifdef HWLOC_HAVE_LEVELZERO
+/*
+ * Define ZES_ENABLE_SYSMAN=1 early so that the LevelZero backend gets Sysman enabled.
+ *
+ * Only if the levelzero was enabled in this build so that we don't enable sysman
+ * for external levelzero users when hwloc doesn't need it. If somebody ever loads
+ * an external levelzero plugin in a hwloc library built without levelzero (unlikely),
+ * he may have to manually set ZES_ENABLE_SYSMAN=1.
+ *
+ * Use the constructor if supported and/or the Windows DllMain callback.
+ * Do it in the main hwloc library instead of the levelzero component because
+ * the latter could be loaded later as a plugin.
+ *
+ * L0 seems to be using getenv() to check this variable on Windows
+ * (at least in the Intel Compute-Runtime of March 2021),
+ * but setenv() doesn't seem to exist on Windows, hence use putenv() to set the variable.
+ *
+ * For the record, Get/SetEnvironmentVariable() is not exactly the same as getenv/putenv():
+ * - getenv() doesn't see what was set with SetEnvironmentVariable()
+ * - GetEnvironmentVariable() doesn't see putenv() in cygwin (while it does in MSVC and MinGW).
+ * Hence, if L0 ever switches from getenv() to GetEnvironmentVariable(),
+ * it will break in cygwin, we'll have to use both putenv() and SetEnvironmentVariable().
+ * Hopefully L0 will provide a way to enable Sysman without env vars before it happens.
+ */
+#if HWLOC_HAVE_ATTRIBUTE_CONSTRUCTOR
+static void hwloc_constructor(void) __attribute__((constructor));
+static void hwloc_constructor(void)
+{
+  if (!getenv("ZES_ENABLE_SYSMAN"))
+#ifdef HWLOC_WIN_SYS
+    putenv("ZES_ENABLE_SYSMAN=1");
+#else
+    setenv("ZES_ENABLE_SYSMAN", "1", 1);
+#endif
+}
+#endif
+#ifdef HWLOC_WIN_SYS
+BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpReserved)
+{
+  if (fdwReason == DLL_PROCESS_ATTACH) {
+    if (!getenv("ZES_ENABLE_SYSMAN"))
+      /* Windows does not have a setenv, so use putenv. */
+      putenv((char *) "ZES_ENABLE_SYSMAN=1");
+  }
+  return TRUE;
+}
+#endif
+#endif /* HWLOC_HAVE_LEVELZERO */
+
+
 unsigned hwloc_get_api_version(void)
 {
  return HWLOC_API_VERSION;
@ -62,14 +114,25 @@ int hwloc_topology_abi_check(hwloc_topology_t topology)
  return topology->topology_abi != HWLOC_TOPOLOGY_ABI ? -1 : 0;
 }

+/* callers should rather use wrappers HWLOC_SHOW_ALL_ERRORS() and HWLOC_SHOW_CRITICAL_ERRORS() for clarity */
 int hwloc_hide_errors(void)
 {
-  static int hide = 0;
+  static int hide = 1; /* only show critical errors by default. lstopo will show others */
  static int checked = 0;
  if (!checked) {
    const char *envvar = getenv("HWLOC_HIDE_ERRORS");
-    if (envvar)
+    if (envvar) {
      hide = atoi(envvar);
+#ifdef HWLOC_DEBUG
+    } else {
+      /* if debug is enabled and HWLOC_DEBUG_VERBOSE isn't forced to 0,
+       * show all errors just like we show all debug messages.
+       */
+      envvar = getenv("HWLOC_DEBUG_VERBOSE");
+      if (!envvar || atoi(envvar))
+        hide = 0;
+#endif
+    }
    checked = 1;
  }
  return hide;
@ -106,7 +169,7 @@ static void report_insert_error(hwloc_obj_t new, hwloc_obj_t old, const char *ms
 {
  static int reported = 0;

-  if (reason && !reported && !hwloc_hide_errors()) {
+  if (reason && !reported && HWLOC_SHOW_CRITICAL_ERRORS()) {
    char newstr[512];
    char oldstr[512];
    report_insert_error_format_obj(newstr, sizeof(newstr), new);
@ -1865,6 +1928,9 @@ hwloc_topology_alloc_group_object(struct hwloc_topology *topology)
 static void hwloc_propagate_symmetric_subtree(hwloc_topology_t topology, hwloc_obj_t root);
 static void propagate_total_memory(hwloc_obj_t obj);
 static void hwloc_set_group_depth(hwloc_topology_t topology);
+static void hwloc_connect_children(hwloc_obj_t parent);
+static int hwloc_connect_levels(hwloc_topology_t topology);
+static int hwloc_connect_special_levels(hwloc_topology_t topology);

 hwloc_obj_t
 hwloc_topology_insert_group_object(struct hwloc_topology *topology, hwloc_obj_t obj)
@ -2307,9 +2373,15 @@ hwloc__filter_bridges(hwloc_topology_t topology, hwloc_obj_t root, unsigned dept

    child->attr->bridge.depth = depth;

-    if (child->type == HWLOC_OBJ_BRIDGE
-	&& filter == HWLOC_TYPE_FILTER_KEEP_IMPORTANT
-	&& !child->io_first_child) {
+    /* remove bridges that have no child,
+     * and pci-to-non-pci bridges (pcidev) that have no child either.
+     * keep NVSwitch since they may be used in NVLink matrices.
+     */
+    if (filter == HWLOC_TYPE_FILTER_KEEP_IMPORTANT
+	&& !child->io_first_child
+        && (child->type == HWLOC_OBJ_BRIDGE
+            || (child->type == HWLOC_OBJ_PCI_DEVICE && (child->attr->pcidev.class_id >> 8) == 0x06
+                && (!child->subtype || strcmp(child->subtype, "NVSwitch"))))) {
      unlink_and_free_single_object(pchild);
      topology->modified = 1;
    }
@ -2432,13 +2504,26 @@ hwloc_compare_levels_structure(hwloc_topology_t topology, unsigned i)
  return 0;
 }

-/* return > 0 if any level was removed, which means reconnect is needed */
-static void
+/* return > 0 if any level was removed.
+ * performs its own reconnect internally if needed
+ */
+static int
 hwloc_filter_levels_keep_structure(hwloc_topology_t topology)
 {
  unsigned i, j;
  int res = 0;

+  if (topology->modified) {
+    /* WARNING: hwloc_topology_reconnect() is duplicated partially here
+     * and at the end of this function:
+     * - we need normal levels before merging.
+     * - and we'll need to update special levels after merging.
+     */
+    hwloc_connect_children(topology->levels[0][0]);
+    if (hwloc_connect_levels(topology) < 0)
+      return -1;
+  }
+
  /* start from the bottom since we'll remove intermediate levels */
  for(i=topology->nb_levels-1; i>0; i--) {
    int replacechild = 0, replaceparent = 0;
@ -2604,6 +2689,22 @@ hwloc_filter_levels_keep_structure(hwloc_topology_t topology)
 	topology->type_depth[type] = HWLOC_TYPE_DEPTH_MULTIPLE;
    }
  }
+
+
+  if (res > 0 || topology-> modified) {
+    /* WARNING: hwloc_topology_reconnect() is duplicated partially here
+     * and at the beginning of this function.
+     * If we merged some levels, some child+parent special children lists
+     * may have been merged, hence special levels might need reordering,
+     * So reconnect special levels only here at the end
+     * (it's not needed at the beginning of this function).
+     */
+    if (hwloc_connect_special_levels(topology) < 0)
+      return -1;
+    topology->modified = 0;
+  }
+
+  return 0;
 }

 static void
@ -2921,9 +3022,9 @@ hwloc_list_special_objects(hwloc_topology_t topology, hwloc_obj_t obj)
  }
 }

-/* Build I/O levels */
+/* Build Memory, I/O and Misc levels */
 static int
-hwloc_connect_io_misc_levels(hwloc_topology_t topology)
+hwloc_connect_special_levels(hwloc_topology_t topology)
 {
  unsigned i;

@ -3088,7 +3189,8 @@ hwloc_connect_levels(hwloc_topology_t topology)
      tmpnbobjs = realloc(topology->level_nbobjects,
 			  2 * topology->nb_levels_allocated * sizeof(*topology->level_nbobjects));
      if (!tmplevels || !tmpnbobjs) {
-	fprintf(stderr, "hwloc failed to realloc level arrays to %u\n", topology->nb_levels_allocated * 2);
+        if (HWLOC_SHOW_CRITICAL_ERRORS())
+          fprintf(stderr, "hwloc: failed to realloc level arrays to %u\n", topology->nb_levels_allocated * 2);

 	/* if one realloc succeeded, make sure the caller will free the new buffer */
 	if (tmplevels)
@ -3133,6 +3235,10 @@ hwloc_connect_levels(hwloc_topology_t topology)
 int
 hwloc_topology_reconnect(struct hwloc_topology *topology, unsigned long flags)
 {
+  /* WARNING: when updating this function, the replicated code must
+   * also be updated inside hwloc_filter_levels_keep_structure()
+   */
+
  if (flags) {
    errno = EINVAL;
    return -1;
@ -3145,7 +3251,7 @@ hwloc_topology_reconnect(struct hwloc_topology *topology, unsigned long flags)
  if (hwloc_connect_levels(topology) < 0)
    return -1;

-  if (hwloc_connect_io_misc_levels(topology) < 0)
+  if (hwloc_connect_special_levels(topology) < 0)
    return -1;

  topology->modified = 0;
@ -3441,6 +3547,8 @@ hwloc_discover(struct hwloc_topology *topology,
  /*
   * Additional discovery
   */
+  hwloc_pci_discovery_prepare(topology);
+
  if (topology->backend_phases & HWLOC_DISC_PHASE_PCI) {
    dstatus->phase = HWLOC_DISC_PHASE_PCI;
    hwloc_discover_by_phase(topology, dstatus, "PCI");
@ -3458,6 +3566,8 @@ hwloc_discover(struct hwloc_topology *topology,
    hwloc_discover_by_phase(topology, dstatus, "ANNOTATE");
  }

+  hwloc_pci_discovery_exit(topology); /* pci needed up to annotate */
+
  if (getenv("HWLOC_DEBUG_SORT_CHILDREN"))
    hwloc_debug_sort_children(topology->levels[0][0]);

@ -3470,28 +3580,28 @@ hwloc_discover(struct hwloc_topology *topology,
  hwloc_debug("%s", "\nRemoving empty objects\n");
  remove_empty(topology, &topology->levels[0][0]);
  if (!topology->levels[0][0]) {
-    fprintf(stderr, "Topology became empty, aborting!\n");
+    if (HWLOC_SHOW_CRITICAL_ERRORS())
+      fprintf(stderr, "hwloc: Topology became empty, aborting!\n");
    return -1;
  }
  if (hwloc_bitmap_iszero(topology->levels[0][0]->cpuset)) {
-    fprintf(stderr, "Topology does not contain any PU, aborting!\n");
+    if (HWLOC_SHOW_CRITICAL_ERRORS())
+      fprintf(stderr, "hwloc: Topology does not contain any PU, aborting!\n");
    return -1;
  }
  if (hwloc_bitmap_iszero(topology->levels[0][0]->nodeset)) {
-    fprintf(stderr, "Topology does not contain any NUMA node, aborting!\n");
+    if (HWLOC_SHOW_CRITICAL_ERRORS())
+      fprintf(stderr, "hwloc: Topology does not contain any NUMA node, aborting!\n");
    return -1;
  }
  hwloc_debug_print_objects(0, topology->levels[0][0]);

-  /* Reconnect things after all these changes.
-   * Often needed because of Groups inserted for I/Os.
-   * And required for KEEP_STRUCTURE below.
-   */
-  if (hwloc_topology_reconnect(topology, 0) < 0)
-    return -1;
-
  hwloc_debug("%s", "\nRemoving levels with HWLOC_TYPE_FILTER_KEEP_STRUCTURE\n");
-  hwloc_filter_levels_keep_structure(topology);
+  if (hwloc_filter_levels_keep_structure(topology) < 0)
+    return -1;
+  /* This takes care of reconnecting children/levels internally
+   * because it needs normal levels, and reconnecting is often needed
+   * below anyway because of Groups inserted for I/Os. */
  hwloc_debug_print_objects(0, topology->levels[0][0]);

  /* accumulate children memory in total_memory fields (only once parent is set) */
@ -3716,7 +3826,27 @@ hwloc_topology_set_flags (struct hwloc_topology *topology, unsigned long flags)
    return -1;
  }

-  if (flags & ~(HWLOC_TOPOLOGY_FLAG_INCLUDE_DISALLOWED|HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM|HWLOC_TOPOLOGY_FLAG_THISSYSTEM_ALLOWED_RESOURCES|HWLOC_TOPOLOGY_FLAG_IMPORT_SUPPORT)) {
+  if (flags & ~(HWLOC_TOPOLOGY_FLAG_INCLUDE_DISALLOWED
+                |HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM
+                |HWLOC_TOPOLOGY_FLAG_THISSYSTEM_ALLOWED_RESOURCES
+                |HWLOC_TOPOLOGY_FLAG_IMPORT_SUPPORT
+                |HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING
+                |HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_MEMBINDING
+                |HWLOC_TOPOLOGY_FLAG_DONT_CHANGE_BINDING
+                |HWLOC_TOPOLOGY_FLAG_NO_DISTANCES
+                |HWLOC_TOPOLOGY_FLAG_NO_MEMATTRS
+                |HWLOC_TOPOLOGY_FLAG_NO_CPUKINDS)) {
+    errno = EINVAL;
+    return -1;
+  }
+
+  if ((flags & (HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING|HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM)) == HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING) {
+    /* RESTRICT_TO_CPUBINDING requires THISSYSTEM for binding */
+    errno = EINVAL;
+    return -1;
+  }
+  if ((flags & (HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_MEMBINDING|HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM)) == HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_MEMBINDING) {
+    /* RESTRICT_TO_MEMBINDING requires THISSYSTEM for binding */
    errno = EINVAL;
    return -1;
  }
@ -3970,15 +4100,11 @@ hwloc_topology_load (struct hwloc_topology *topology)
   */
  hwloc_set_binding_hooks(topology);

-  hwloc_pci_discovery_prepare(topology);
-
  /* actual topology discovery */
  err = hwloc_discover(topology, &dstatus);
  if (err < 0)
    goto out;

-  hwloc_pci_discovery_exit(topology);
-
 #ifndef HWLOC_DEBUG
  if (getenv("HWLOC_DEBUG_CHECK"))
 #endif
@ -4000,9 +4126,35 @@ hwloc_topology_load (struct hwloc_topology *topology)
  /* Same for memattrs */
  hwloc_internal_memattrs_need_refresh(topology);
  hwloc_internal_memattrs_refresh(topology);
+  hwloc_internal_memattrs_guess_memory_tiers(topology);

  topology->is_loaded = 1;

+  if (topology->flags & HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_CPUBINDING) {
+    /* FIXME: filter directly in backends during the discovery.
+     * Only x86 does it because binding may cause issues on Windows.
+     */
+    hwloc_bitmap_t set = hwloc_bitmap_alloc();
+    if (set) {
+      err = hwloc_get_cpubind(topology, set, HWLOC_CPUBIND_STRICT);
+      if (!err)
+        hwloc_topology_restrict(topology, set, 0);
+      hwloc_bitmap_free(set);
+    }
+  }
+  if (topology->flags & HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_MEMBINDING) {
+    /* FIXME: filter directly in backends during the discovery.
+     */
+    hwloc_bitmap_t set = hwloc_bitmap_alloc();
+    hwloc_membind_policy_t policy;
+    if (set) {
+      err = hwloc_get_membind(topology, set, &policy, HWLOC_MEMBIND_STRICT | HWLOC_MEMBIND_BYNODESET);
+      if (!err)
+        hwloc_topology_restrict(topology, set, HWLOC_RESTRICT_FLAG_BYNODESET);
+      hwloc_bitmap_free(set);
+    }
+  }
+
  if (topology->backend_phases & HWLOC_DISC_PHASE_TWEAK) {
    dstatus.phase = HWLOC_DISC_PHASE_TWEAK;
    hwloc_discover_by_phase(topology, &dstatus, "TWEAK");
@ -4278,14 +4430,13 @@ hwloc_topology_restrict(struct hwloc_topology *topology, hwloc_const_bitmap_t se
  hwloc_bitmap_free(droppedcpuset);
  hwloc_bitmap_free(droppednodeset);

-  if (hwloc_topology_reconnect(topology, 0) < 0)
+  if (hwloc_filter_levels_keep_structure(topology) < 0) /* takes care of reconnecting internally */
    goto out;

  /* some objects may have disappeared, we need to update distances objs arrays */
  hwloc_internal_distances_invalidate_cached_objs(topology);
  hwloc_internal_memattrs_need_refresh(topology);

-  hwloc_filter_levels_keep_structure(topology);
  hwloc_propagate_symmetric_subtree(topology, topology->levels[0][0]);
  propagate_total_memory(topology->levels[0][0]);
  hwloc_internal_cpukinds_restrict(topology);
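The flag validation added to `hwloc_topology_set_flags()` above uses the idiom `(flags & (A|B)) == A`, which is true only when `A` is set and `B` is not. A minimal sketch of that check follows, using hypothetical `DEMO_*` flag values and a hypothetical `demo_check_flags()` helper; the real `HWLOC_TOPOLOGY_FLAG_*` constants come from `hwloc.h` and are not reproduced here.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bit values for illustration only; these are NOT the
 * real HWLOC_TOPOLOGY_FLAG_* values. */
#define DEMO_FLAG_IS_THISSYSTEM          (1UL << 1)
#define DEMO_FLAG_RESTRICT_TO_CPUBINDING (1UL << 4)

/* Returns 0 if the combination is acceptable, or -1 with errno set to
 * EINVAL when RESTRICT_TO_CPUBINDING is requested without IS_THISSYSTEM. */
static int demo_check_flags(unsigned long flags)
{
  const unsigned long pair =
    DEMO_FLAG_RESTRICT_TO_CPUBINDING | DEMO_FLAG_IS_THISSYSTEM;

  /* Mask out everything but the two flags of interest, then compare:
   * the result equals RESTRICT alone only if IS_THISSYSTEM is missing. */
  if ((flags & pair) == DEMO_FLAG_RESTRICT_TO_CPUBINDING) {
    errno = EINVAL;
    return -1;
  }
  return 0;
}
```

With this shape, passing both flags or neither is accepted, while passing the dependent flag alone is rejected, which appears to be the intent of the two new checks in the hunk.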
--- a/src/3rdparty/hwloc/src/traversal.c
+++ b/src/3rdparty/hwloc/src/traversal.c
@ -1,6 +1,6 @@
 /*
 * Copyright © 2009 CNRS
- * Copyright © 2009-2020 Inria.  All rights reserved.
+ * Copyright © 2009-2021 Inria.  All rights reserved.
 * Copyright © 2009-2010, 2020 Université Bordeaux
 * Copyright © 2009-2011 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
@ -395,6 +395,8 @@ hwloc_type_sscanf(const char *string, hwloc_obj_type_t *typep,
  } else if (hwloc__type_match(string, "pcibridge", 5)) {
    type = HWLOC_OBJ_BRIDGE;
    ubtype = HWLOC_OBJ_BRIDGE_PCI;
+    /* if downstream_type can ever be non-PCI, we'll have to make strings more precise,
+     * or relax the hwloc_type_sscanf test */

  } else if (hwloc__type_match(string, "pcidev", 3)) {
    type = HWLOC_OBJ_PCI_DEVICE;
@ -448,7 +450,9 @@ hwloc_type_sscanf(const char *string, hwloc_obj_type_t *typep,
      attrp->group.depth = depthattr;
    } else if (type == HWLOC_OBJ_BRIDGE && attrsize >= sizeof(attrp->bridge)) {
      attrp->bridge.upstream_type = ubtype;
-      attrp->bridge.downstream_type = HWLOC_OBJ_BRIDGE_PCI; /* nothing else so far */
+      attrp->bridge.downstream_type = HWLOC_OBJ_BRIDGE_PCI;
+      /* if downstream_type can ever be non-PCI, we'll have to make strings more precise,
+       * or relax the hwloc_type_sscanf test */
    } else if (type == HWLOC_OBJ_OS_DEVICE && attrsize >= sizeof(attrp->osdev)) {
      attrp->osdev.type = ostype;
    }
@ -531,6 +535,9 @@ hwloc_obj_type_snprintf(char * __hwloc_restrict string, size_t size, hwloc_obj_t
    else
      return hwloc_snprintf(string, size, "%s", hwloc_obj_type_string(type));
  case HWLOC_OBJ_BRIDGE:
+    /* if downstream_type can ever be non-PCI, we'll have to make strings more precise,
+     * or relax the hwloc_type_sscanf test */
+    assert(obj->attr->bridge.downstream_type == HWLOC_OBJ_BRIDGE_PCI);
    return hwloc_snprintf(string, size, obj->attr->bridge.upstream_type == HWLOC_OBJ_BRIDGE_PCI ? "PCIBridge" : "HostBridge");
  case HWLOC_OBJ_PCI_DEVICE:
    return hwloc_snprintf(string, size, "PCI");
@ -648,8 +655,11 @@ hwloc_obj_attr_snprintf(char * __hwloc_restrict string, size_t size, hwloc_obj_t
      } else
        *up = '\0';
      /* downstream is_PCI */
-      snprintf(down, sizeof(down), "buses=%04x:[%02x-%02x]",
-	       obj->attr->bridge.downstream.pci.domain, obj->attr->bridge.downstream.pci.secondary_bus, obj->attr->bridge.downstream.pci.subordinate_bus);
+      if (obj->attr->bridge.downstream_type == HWLOC_OBJ_BRIDGE_PCI) {
+        snprintf(down, sizeof(down), "buses=%04x:[%02x-%02x]",
+                 obj->attr->bridge.downstream.pci.domain, obj->attr->bridge.downstream.pci.secondary_bus, obj->attr->bridge.downstream.pci.subordinate_bus);
+      } else
+        assert(0);
      if (*up)
 	res = hwloc_snprintf(string, size, "%s%s%s", up, separator, down);
      else
@ -736,3 +746,92 @@ int hwloc_bitmap_singlify_per_core(hwloc_topology_t topology, hwloc_bitmap_t cpu
  }
  return 0;
 }
+
+hwloc_obj_t
+hwloc_get_obj_with_same_locality(hwloc_topology_t topology, hwloc_obj_t src,
+                                 hwloc_obj_type_t type, const char *subtype, const char *nameprefix,
+                                 unsigned long flags)
+{
+  if (flags) {
+    errno = EINVAL;
+    return NULL;
+  }
+
+  if (hwloc_obj_type_is_normal(src->type) || hwloc_obj_type_is_memory(src->type)) {
+    /* normal/memory type, look for normal/memory type with same sets */
+    hwloc_obj_t obj;
+
+    if (!hwloc_obj_type_is_normal(type) && !hwloc_obj_type_is_memory(type)) {
+      errno = EINVAL;
+      return NULL;
+    }
+
+    obj = NULL;
+    while ((obj = hwloc_get_next_obj_by_type(topology, type, obj)) != NULL) {
+      if (!hwloc_bitmap_isequal(src->cpuset, obj->cpuset)
+          || !hwloc_bitmap_isequal(src->nodeset, obj->nodeset))
+        continue;
+      if (subtype && (!obj->subtype || strcasecmp(subtype, obj->subtype)))
+        continue;
+      if (nameprefix && (!obj->name || hwloc_strncasecmp(nameprefix, obj->name, strlen(nameprefix))))
+        continue;
+      return obj;
+    }
+    errno = ENOENT;
+    return NULL;
+
+  } else if (hwloc_obj_type_is_io(src->type)) {
+    /* I/O device, look for PCI/OS in same PCI */
+    hwloc_obj_t pci;
+
+    if ((src->type != HWLOC_OBJ_OS_DEVICE && src->type != HWLOC_OBJ_PCI_DEVICE)
+        || (type != HWLOC_OBJ_OS_DEVICE && type != HWLOC_OBJ_PCI_DEVICE)) {
+      errno = EINVAL;
+      return NULL;
+    }
+
+    /* walk up to find the container */
+    pci = src;
+    while (pci->type == HWLOC_OBJ_OS_DEVICE)
+      pci = pci->parent;
+
+    if (type == HWLOC_OBJ_PCI_DEVICE) {
+      if (pci->type != HWLOC_OBJ_PCI_DEVICE) {
+        errno = ENOENT;
+        return NULL;
+      }
+      if (subtype && (!pci->subtype || strcasecmp(subtype, pci->subtype))) {
+        errno = ENOENT;
+        return NULL;
+      }
+      if (nameprefix && (!pci->name || hwloc_strncasecmp(nameprefix, pci->name, strlen(nameprefix)))) {
+        errno = ENOENT;
+        return NULL;
+      }
+      return pci;
+
+    } else {
+      /* find a matching osdev child */
+      assert(type == HWLOC_OBJ_OS_DEVICE);
+      /* FIXME: won't work if we ever store osdevs in osdevs */
+      hwloc_obj_t child;
+      for(child = pci->io_first_child; child; child = child->next_sibling) {
+        if (child->type != HWLOC_OBJ_OS_DEVICE)
+          /* FIXME: should never occur currently */
+          continue;
+        if (subtype && (!child->subtype || strcasecmp(subtype, child->subtype)))
+          continue;
+        if (nameprefix && (!child->name || hwloc_strncasecmp(nameprefix, child->name, strlen(nameprefix))))
+          continue;
+        return child;
+      }
+    }
+    errno = ENOENT;
+    return NULL;
+
+  } else {
+    /* nothing for Misc */
+    errno = EINVAL;
+    return NULL;
+  }
+}
--- a/src/3rdparty/libethash/CMakeLists.txt
+++ b/src/3rdparty/libethash/CMakeLists.txt
@ -1,4 +1,4 @@
-cmake_minimum_required (VERSION 2.8.12)
+cmake_minimum_required(VERSION 3.1)
 project (ethash C)

 set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} -Os")
--- a/src/3rdparty/llhttp/api.c
+++ b/src/3rdparty/llhttp/api.c
@ -24,6 +24,70 @@ void llhttp_init(llhttp_t* parser, llhttp_type_t type,
 }


+#if defined(__wasm__)
+
+extern int wasm_on_message_begin(llhttp_t * p);
+extern int wasm_on_url(llhttp_t* p, const char* at, size_t length);
+extern int wasm_on_status(llhttp_t* p, const char* at, size_t length);
+extern int wasm_on_header_field(llhttp_t* p, const char* at, size_t length);
+extern int wasm_on_header_value(llhttp_t* p, const char* at, size_t length);
+extern int wasm_on_headers_complete(llhttp_t * p);
+extern int wasm_on_body(llhttp_t* p, const char* at, size_t length);
+extern int wasm_on_message_complete(llhttp_t * p);
+
+const llhttp_settings_t wasm_settings = {
+  wasm_on_message_begin,
+  wasm_on_url,
+  wasm_on_status,
+  wasm_on_header_field,
+  wasm_on_header_value,
+  wasm_on_headers_complete,
+  wasm_on_body,
+  wasm_on_message_complete,
+  NULL,
+  NULL,
+};
+
+
+llhttp_t* llhttp_alloc(llhttp_type_t type) {
+  llhttp_t* parser = malloc(sizeof(llhttp_t));
+  llhttp_init(parser, type, &wasm_settings);
+  return parser;
+}
+
+void llhttp_free(llhttp_t* parser) {
+  free(parser);
+}
+
+/* Some getters required to get stuff from the parser */
+
+uint8_t llhttp_get_type(llhttp_t* parser) {
+  return parser->type;
+}
+
+uint8_t llhttp_get_http_major(llhttp_t* parser) {
+  return parser->http_major;
+}
+
+uint8_t llhttp_get_http_minor(llhttp_t* parser) {
+  return parser->http_minor;
+}
+
+uint8_t llhttp_get_method(llhttp_t* parser) {
+  return parser->method;
+}
+
+int llhttp_get_status_code(llhttp_t* parser) {
+  return parser->status_code;
+}
+
+uint8_t llhttp_get_upgrade(llhttp_t* parser) {
+  return parser->upgrade;
+}
+
+#endif  // defined(__wasm__)
+
+
 void llhttp_reset(llhttp_t* parser) {
  llhttp_type_t type = parser->type;
  const llhttp_settings_t* settings = parser->settings;
@ -150,6 +214,7 @@ void llhttp_set_lenient_headers(llhttp_t* parser, int enabled) {
  }
 }

+
 void llhttp_set_lenient_chunked_length(llhttp_t* parser, int enabled) {
  if (enabled) {
    parser->lenient_flags |= LENIENT_CHUNKED_LENGTH;
@ -159,6 +224,14 @@ void llhttp_set_lenient_chunked_length(llhttp_t* parser, int enabled) {
 }


+void llhttp_set_lenient_keep_alive(llhttp_t* parser, int enabled) {
+  if (enabled) {
+    parser->lenient_flags |= LENIENT_KEEP_ALIVE;
+  } else {
+    parser->lenient_flags &= ~LENIENT_KEEP_ALIVE;
+  }
+}
+
 /* Callbacks */


--- a/src/3rdparty/llhttp/api.h
+++ b/src/3rdparty/llhttp/api.h
@ -5,6 +5,12 @@ extern "C" {
 #endif
 #include <stddef.h>

+#if defined(__wasm__)
+#define LLHTTP_EXPORT __attribute__((visibility("default")))
+#else
+#define LLHTTP_EXPORT
+#endif
+
 typedef llhttp__internal_t llhttp_t;
 typedef struct llhttp_settings_s llhttp_settings_t;

@ -55,15 +61,46 @@ struct llhttp_settings_s {
 * the `parser` here. In practice, `settings` has to be either a static
 * variable or be allocated with `malloc`, `new`, etc.
 */
+LLHTTP_EXPORT
 void llhttp_init(llhttp_t* parser, llhttp_type_t type,
                 const llhttp_settings_t* settings);

+#if defined(__wasm__)
+
+LLHTTP_EXPORT
+llhttp_t* llhttp_alloc(llhttp_type_t type);
+
+LLHTTP_EXPORT
+void llhttp_free(llhttp_t* parser);
+
+LLHTTP_EXPORT
+uint8_t llhttp_get_type(llhttp_t* parser);
+
+LLHTTP_EXPORT
+uint8_t llhttp_get_http_major(llhttp_t* parser);
+
+LLHTTP_EXPORT
+uint8_t llhttp_get_http_minor(llhttp_t* parser);
+
+LLHTTP_EXPORT
+uint8_t llhttp_get_method(llhttp_t* parser);
+
+LLHTTP_EXPORT
+int llhttp_get_status_code(llhttp_t* parser);
+
+LLHTTP_EXPORT
+uint8_t llhttp_get_upgrade(llhttp_t* parser);
+
+#endif  // defined(__wasm__)
+
 /* Reset an already initialized parser back to the start state, preserving the
 * existing parser type, callback settings, user data, and lenient flags.
 */
+LLHTTP_EXPORT
 void llhttp_reset(llhttp_t* parser);

 /* Initialize the settings object */
+LLHTTP_EXPORT
 void llhttp_settings_init(llhttp_settings_t* settings);

 /* Parse full or partial request/response, invoking user callbacks along the
@ -82,6 +119,7 @@ void llhttp_settings_init(llhttp_settings_t* settings);
 * to return the same error upon each successive call up until `llhttp_init()`
 * is called.
 */
+LLHTTP_EXPORT
 llhttp_errno_t llhttp_execute(llhttp_t* parser, const char* data, size_t len);

 /* This method should be called when the other side has no further bytes to
@ -92,16 +130,19 @@ llhttp_errno_t llhttp_execute(llhttp_t* parser, const char* data, size_t len);
 * connection. This method will invoke `on_message_complete()` callback if the
 * request was terminated safely. Otherwise a error code would be returned.
 */
+LLHTTP_EXPORT
 llhttp_errno_t llhttp_finish(llhttp_t* parser);

 /* Returns `1` if the incoming message is parsed until the last byte, and has
 * to be completed by calling `llhttp_finish()` on EOF
 */
+LLHTTP_EXPORT
 int llhttp_message_needs_eof(const llhttp_t* parser);

 /* Returns `1` if there might be any other messages following the last that was
 * successfully parsed.
 */
+LLHTTP_EXPORT
 int llhttp_should_keep_alive(const llhttp_t* parser);

 /* Make further calls of `llhttp_execute()` return `HPE_PAUSED` and set
@ -110,6 +151,7 @@ int llhttp_should_keep_alive(const llhttp_t* parser);
 * Important: do not call this from user callbacks! User callbacks must return
 * `HPE_PAUSED` if pausing is required.
 */
+LLHTTP_EXPORT
 void llhttp_pause(llhttp_t* parser);

 /* Might be called to resume the execution after the pause in user's callback.
@ -117,6 +159,7 @@ void llhttp_pause(llhttp_t* parser);
 *
 * Call this only if `llhttp_execute()` returns `HPE_PAUSED`.
 */
+LLHTTP_EXPORT
 void llhttp_resume(llhttp_t* parser);

 /* Might be called to resume the execution after the pause in user's callback.
@ -124,9 +167,11 @@ void llhttp_resume(llhttp_t* parser);
 *
 * Call this only if `llhttp_execute()` returns `HPE_PAUSED_UPGRADE`
 */
+LLHTTP_EXPORT
 void llhttp_resume_after_upgrade(llhttp_t* parser);

 /* Returns the latest return error */
+LLHTTP_EXPORT
 llhttp_errno_t llhttp_get_errno(const llhttp_t* parser);

 /* Returns the verbal explanation of the latest returned error.
@ -134,6 +179,7 @@ llhttp_errno_t llhttp_get_errno(const llhttp_t* parser);
 * Note: User callback should set error reason when returning the error. See
 * `llhttp_set_error_reason()` for details.
 */
+LLHTTP_EXPORT
 const char* llhttp_get_error_reason(const llhttp_t* parser);

 /* Assign verbal description to the returned error. Must be called in user
@ -141,6 +187,7 @@ const char* llhttp_get_error_reason(const llhttp_t* parser);
 *
 * Note: `HPE_USER` error code might be useful in user callbacks.
 */
+LLHTTP_EXPORT
 void llhttp_set_error_reason(llhttp_t* parser, const char* reason);

 /* Returns the pointer to the last parsed byte before the returned error. The
@ -148,12 +195,15 @@ void llhttp_set_error_reason(llhttp_t* parser, const char* reason);
 *
 * Note: this method might be useful for counting the number of parsed bytes.
 */
+LLHTTP_EXPORT
 const char* llhttp_get_error_pos(const llhttp_t* parser);

 /* Returns textual name of error code */
+LLHTTP_EXPORT
 const char* llhttp_errno_name(llhttp_errno_t err);

 /* Returns textual name of HTTP method */
+LLHTTP_EXPORT
 const char* llhttp_method_name(llhttp_method_t method);


@ -166,6 +216,7 @@ const char* llhttp_method_name(llhttp_method_t method);
 *
 * **(USE AT YOUR OWN RISK)**
 */
+LLHTTP_EXPORT
 void llhttp_set_lenient_headers(llhttp_t* parser, int enabled);


@ -179,8 +230,23 @@ void llhttp_set_lenient_headers(llhttp_t* parser, int enabled);
 *
 * **(USE AT YOUR OWN RISK)**
 */
+LLHTTP_EXPORT
 void llhttp_set_lenient_chunked_length(llhttp_t* parser, int enabled);

+
+/* Enables/disables lenient handling of `Connection: close` and HTTP/1.0
+ * requests/responses.
+ *
+ * Normally `llhttp` would error on (in strict mode) or discard (in loose mode)
+ * the HTTP request/response after the request/response with `Connection: close`
+ * and `Content-Length`. This is important to prevent cache poisoning attacks,
+ * but might interact badly with outdated and insecure clients. With this flag
+ * the extra request/response will be parsed normally.
+ *
+ * **(USE AT YOUR OWN RISK)**
+ */
+void llhttp_set_lenient_keep_alive(llhttp_t* parser, int enabled);
+
 #ifdef __cplusplus
 }  /* extern "C" */
 #endif
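The new `llhttp_set_lenient_keep_alive()` follows the same set/clear bitmask pattern as the other lenient setters. A minimal sketch of that pattern is shown below, assuming a stand-in `demo_parser` struct with an 8-bit `lenient_flags` field; the struct, the `DEMO_*` names, and the field width are assumptions for illustration, while the bit values `0x1`, `0x2`, and `0x4` mirror the `llhttp_lenient_flags` enum in this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values mirror the llhttp_lenient_flags enum in this commit;
 * the DEMO_ names themselves are hypothetical. */
#define DEMO_LENIENT_HEADERS        0x1
#define DEMO_LENIENT_CHUNKED_LENGTH 0x2
#define DEMO_LENIENT_KEEP_ALIVE     0x4

/* Stand-in for the parser state; the real llhttp_t has many more fields. */
struct demo_parser {
  uint8_t lenient_flags;
};

/* Set the bit when enabled is non-zero, clear it otherwise,
 * leaving all other lenient bits untouched. */
static void demo_set_lenient_keep_alive(struct demo_parser *p, int enabled)
{
  if (enabled) {
    p->lenient_flags |= DEMO_LENIENT_KEEP_ALIVE;
  } else {
    p->lenient_flags &= (uint8_t)~DEMO_LENIENT_KEEP_ALIVE;
  }
}
```

Because each option owns a distinct bit, toggling keep-alive leniency does not disturb a previously enabled `LENIENT_HEADERS` or `LENIENT_CHUNKED_LENGTH` setting.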
--- a/src/3rdparty/llhttp/llhttp.c
+++ b/src/3rdparty/llhttp/llhttp.c
--- a/src/3rdparty/llhttp/llhttp.h
+++ b/src/3rdparty/llhttp/llhttp.h
@ -1,8 +1,8 @@
 #ifndef INCLUDE_LLHTTP_H_
 #define INCLUDE_LLHTTP_H_

-#define LLHTTP_VERSION_MAJOR 4
-#define LLHTTP_VERSION_MINOR 0
+#define LLHTTP_VERSION_MAJOR 5
+#define LLHTTP_VERSION_MINOR 1
 #define LLHTTP_VERSION_PATCH 0

 #ifndef LLHTTP_STRICT_MODE
@ -79,7 +79,8 @@ enum llhttp_errno {
  HPE_CB_CHUNK_COMPLETE = 20,
  HPE_PAUSED = 21,
  HPE_PAUSED_UPGRADE = 22,
-  HPE_USER = 23
+  HPE_PAUSED_H2_UPGRADE = 23,
+  HPE_USER = 24
 };
 typedef enum llhttp_errno llhttp_errno_t;

@ -98,7 +99,8 @@ typedef enum llhttp_flags llhttp_flags_t;

 enum llhttp_lenient_flags {
  LENIENT_HEADERS = 0x1,
-  LENIENT_CHUNKED_LENGTH = 0x2
+  LENIENT_CHUNKED_LENGTH = 0x2,
+  LENIENT_KEEP_ALIVE = 0x4
 };
 typedef enum llhttp_lenient_flags llhttp_lenient_flags_t;

@ -190,7 +192,8 @@ typedef enum llhttp_method llhttp_method_t;
  XX(20, CB_CHUNK_COMPLETE, CB_CHUNK_COMPLETE) \
  XX(21, PAUSED, PAUSED) \
  XX(22, PAUSED_UPGRADE, PAUSED_UPGRADE) \
-  XX(23, USER, USER) \
+  XX(23, PAUSED_H2_UPGRADE, PAUSED_H2_UPGRADE) \
+  XX(24, USER, USER) \


 #define HTTP_METHOD_MAP(XX) \
@ -255,6 +258,12 @@ extern "C" {
 #endif
 #include <stddef.h>

+#if defined(__wasm__)
+#define LLHTTP_EXPORT __attribute__((visibility("default")))
+#else
+#define LLHTTP_EXPORT
+#endif
+
 typedef llhttp__internal_t llhttp_t;
 typedef struct llhttp_settings_s llhttp_settings_t;

@ -305,15 +314,46 @@ struct llhttp_settings_s {
 * the `parser` here. In practice, `settings` has to be either a static
 * variable or be allocated with `malloc`, `new`, etc.
 */
+LLHTTP_EXPORT
 void llhttp_init(llhttp_t* parser, llhttp_type_t type,
                 const llhttp_settings_t* settings);

+#if defined(__wasm__)
+
+LLHTTP_EXPORT
+llhttp_t* llhttp_alloc(llhttp_type_t type);
+
+LLHTTP_EXPORT
+void llhttp_free(llhttp_t* parser);
+
+LLHTTP_EXPORT
+uint8_t llhttp_get_type(llhttp_t* parser);
+
+LLHTTP_EXPORT
+uint8_t llhttp_get_http_major(llhttp_t* parser);
+
+LLHTTP_EXPORT
+uint8_t llhttp_get_http_minor(llhttp_t* parser);
+
+LLHTTP_EXPORT
+uint8_t llhttp_get_method(llhttp_t* parser);
+
+LLHTTP_EXPORT
+int llhttp_get_status_code(llhttp_t* parser);
+
+LLHTTP_EXPORT
+uint8_t llhttp_get_upgrade(llhttp_t* parser);
+
+#endif  // defined(__wasm__)
+
 /* Reset an already initialized parser back to the start state, preserving the
 * existing parser type, callback settings, user data, and lenient flags.
 */
+LLHTTP_EXPORT
 void llhttp_reset(llhttp_t* parser);

 /* Initialize the settings object */
+LLHTTP_EXPORT
 void llhttp_settings_init(llhttp_settings_t* settings);

 /* Parse full or partial request/response, invoking user callbacks along the
@ -332,6 +372,7 @@ void llhttp_settings_init(llhttp_settings_t* settings);
 * to return the same error upon each successive call up until `llhttp_init()`
 * is called.
 */
+LLHTTP_EXPORT
 llhttp_errno_t llhttp_execute(llhttp_t* parser, const char* data, size_t len);

 /* This method should be called when the other side has no further bytes to
@ -342,16 +383,19 @@ llhttp_errno_t llhttp_execute(llhttp_t* parser, const char* data, size_t len);
 * connection. This method will invoke `on_message_complete()` callback if the
 * request was terminated safely. Otherwise a error code would be returned.
 */
+LLHTTP_EXPORT
 llhttp_errno_t llhttp_finish(llhttp_t* parser);

 /* Returns `1` if the incoming message is parsed until the last byte, and has
 * to be completed by calling `llhttp_finish()` on EOF
 */
+LLHTTP_EXPORT
 int llhttp_message_needs_eof(const llhttp_t* parser);

 /* Returns `1` if there might be any other messages following the last that was
 * successfully parsed.
 */
+LLHTTP_EXPORT
 int llhttp_should_keep_alive(const llhttp_t* parser);

 /* Make further calls of `llhttp_execute()` return `HPE_PAUSED` and set
@ -360,6 +404,7 @@ int llhttp_should_keep_alive(const llhttp_t* parser);
 * Important: do not call this from user callbacks! User callbacks must return
 * `HPE_PAUSED` if pausing is required.
 */
+LLHTTP_EXPORT
 void llhttp_pause(llhttp_t* parser);

 /* Might be called to resume the execution after the pause in user's callback.
@ -367,6 +412,7 @@ void llhttp_pause(llhttp_t* parser);
 *
 * Call this only if `llhttp_execute()` returns `HPE_PAUSED`.
 */
+LLHTTP_EXPORT
 void llhttp_resume(llhttp_t* parser);

 /* Might be called to resume the execution after the pause in user's callback.
@ -374,9 +420,11 @@ void llhttp_resume(llhttp_t* parser);
 *
 * Call this only if `llhttp_execute()` returns `HPE_PAUSED_UPGRADE`
 */
+LLHTTP_EXPORT
 void llhttp_resume_after_upgrade(llhttp_t* parser);

 /* Returns the latest return error */
+LLHTTP_EXPORT
 llhttp_errno_t llhttp_get_errno(const llhttp_t* parser);

 /* Returns the verbal explanation of the latest returned error.
@ -384,6 +432,7 @@ llhttp_errno_t llhttp_get_errno(const llhttp_t* parser);
 * Note: User callback should set error reason when returning the error. See
 * `llhttp_set_error_reason()` for details.
 */
+LLHTTP_EXPORT
 const char* llhttp_get_error_reason(const llhttp_t* parser);

 /* Assign verbal description to the returned error. Must be called in user
@ -391,6 +440,7 @@ const char* llhttp_get_error_reason(const llhttp_t* parser);
 *
 * Note: `HPE_USER` error code might be useful in user callbacks.
 */
+LLHTTP_EXPORT
 void llhttp_set_error_reason(llhttp_t* parser, const char* reason);

 /* Returns the pointer to the last parsed byte before the returned error. The
@ -398,12 +448,15 @@ void llhttp_set_error_reason(llhttp_t* parser, const char* reason);
 *
 * Note: this method might be useful for counting the number of parsed bytes.
 */
+LLHTTP_EXPORT
 const char* llhttp_get_error_pos(const llhttp_t* parser);

 /* Returns textual name of error code */
+LLHTTP_EXPORT
 const char* llhttp_errno_name(llhttp_errno_t err);

 /* Returns textual name of HTTP method */
+LLHTTP_EXPORT
 const char* llhttp_method_name(llhttp_method_t method);


@ -416,6 +469,7 @@ const char* llhttp_method_name(llhttp_method_t method);
 *
 * **(USE AT YOUR OWN RISK)**
 */
+LLHTTP_EXPORT
 void llhttp_set_lenient_headers(llhttp_t* parser, int enabled);


@ -429,8 +483,23 @@ void llhttp_set_lenient_headers(llhttp_t* parser, int enabled);
 *
 * **(USE AT YOUR OWN RISK)**
 */
+LLHTTP_EXPORT
 void llhttp_set_lenient_chunked_length(llhttp_t* parser, int enabled);

+
+/* Enables/disables lenient handling of `Connection: close` and HTTP/1.0
+ * requests/responses.
+ *
+ * Normally `llhttp` would error on (in strict mode) or discard (in loose mode)
+ * the HTTP request/response after the request/response with `Connection: close`
+ * and `Content-Length`. This is important to prevent cache poisoning attacks,
+ * but might interact badly with outdated and insecure clients. With this flag
+ * the extra request/response will be parsed normally.
+ *
+ * **(USE AT YOUR OWN RISK)**
+ */
+void llhttp_set_lenient_keep_alive(llhttp_t* parser, int enabled);
+
 #ifdef __cplusplus
 }  /* extern "C" */
 #endif
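This header renumbers `HPE_USER` from 23 to 24 to make room for `HPE_PAUSED_H2_UPGRADE`, and the error map is an X-macro list. The sketch below shows, under stated assumptions, how such a list can generate both an enum and a name lookup so that callers rely on symbolic names rather than hard-coded numbers. The `DEMO_*` identifiers are hypothetical, only three entries are reproduced, and the returned strings are simplified compared with what `llhttp_errno_name()` actually returns.

```c
#include <assert.h>
#include <string.h>

/* A shortened, hypothetical copy of the tail of the error map. */
#define DEMO_ERRNO_MAP(XX) \
  XX(22, PAUSED_UPGRADE) \
  XX(23, PAUSED_H2_UPGRADE) \
  XX(24, USER)

/* Expand the map once into enum constants... */
enum demo_errno {
#define DEMO_GEN_ENUM(num, name) DEMO_##name = num,
  DEMO_ERRNO_MAP(DEMO_GEN_ENUM)
#undef DEMO_GEN_ENUM
  DEMO_ERRNO_END
};

/* ...and once more into a switch that maps codes back to names. */
static const char *demo_errno_name(int err)
{
  switch (err) {
#define DEMO_GEN_CASE(num, name) case num: return #name;
    DEMO_ERRNO_MAP(DEMO_GEN_CASE)
#undef DEMO_GEN_CASE
    default: return "UNKNOWN";
  }
}
```

Code that compared against the literal `23` before this update would now match the HTTP/2 upgrade pause rather than the user error, which is why comparing against the enum constant is the safer habit.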
--- a/src/3rdparty/rapidjson/allocators.h
+++ b/src/3rdparty/rapidjson/allocators.h
@ -1,21 +1,28 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_ALLOCATORS_H_
 #define RAPIDJSON_ALLOCATORS_H_

 #include "rapidjson.h"
+#include "internal/meta.h"
+
+#include <memory>
+
+#if RAPIDJSON_HAS_CXX11
+#include <type_traits>
+#endif

 RAPIDJSON_NAMESPACE_BEGIN

@ -24,10 +31,10 @@ RAPIDJSON_NAMESPACE_BEGIN

 /*! \class rapidjson::Allocator
    \brief Concept for allocating, resizing and freeing memory block.
-    
+
    Note that Malloc() and Realloc() are non-static but Free() is static.
-    
-    So if an allocator need to support Free(), it needs to put its pointer in 
+
+    So if an allocator need to support Free(), it needs to put its pointer in
    the header of memory block.

 \code
@ -75,28 +82,35 @@ concept Allocator {
 class CrtAllocator {
 public:
    static const bool kNeedFree = true;
-    void* Malloc(size_t size) { 
+    void* Malloc(size_t size) {
        if (size) //  behavior of malloc(0) is implementation defined.
-            return std::malloc(size);
+            return RAPIDJSON_MALLOC(size);
        else
            return NULL; // standardize to returning NULL.
    }
    void* Realloc(void* originalPtr, size_t originalSize, size_t newSize) {
        (void)originalSize;
        if (newSize == 0) {
-            std::free(originalPtr);
+            RAPIDJSON_FREE(originalPtr);
            return NULL;
        }
-        return std::realloc(originalPtr, newSize);
+        return RAPIDJSON_REALLOC(originalPtr, newSize);
+    }
+    static void Free(void *ptr) RAPIDJSON_NOEXCEPT { RAPIDJSON_FREE(ptr); }
+
+    bool operator==(const CrtAllocator&) const RAPIDJSON_NOEXCEPT {
+        return true;
+    }
+    bool operator!=(const CrtAllocator&) const RAPIDJSON_NOEXCEPT {
+        return false;
    }
-    static void Free(void *ptr) { std::free(ptr); }
 };

 ///////////////////////////////////////////////////////////////////////////////
 // MemoryPoolAllocator

 //! Default memory allocator used by the parser and DOM.
-/*! This allocator allocate memory blocks from pre-allocated memory chunks. 
+/*! This allocator allocate memory blocks from pre-allocated memory chunks.

    It does not free memory blocks. And Realloc() only allocate new memory.

@ -113,16 +127,64 @@ public:
 */
 template <typename BaseAllocator = CrtAllocator>
 class MemoryPoolAllocator {
+    //! Chunk header for prepending to each chunk.
+    /*! Chunks are stored as a singly linked list.
+    */
+    struct ChunkHeader {
+        size_t capacity;    //!< Capacity of the chunk in bytes (excluding the header itself).
+        size_t size;        //!< Current size of allocated memory in bytes.
+        ChunkHeader *next;  //!< Next chunk in the linked list.
+    };
+
+    struct SharedData {
+        ChunkHeader *chunkHead;  //!< Head of the chunk linked-list. Only the head chunk serves allocation.
+        BaseAllocator* ownBaseAllocator; //!< base allocator created by this object.
+        size_t refcount;
+        bool ownBuffer;
+    };
+
+    static const size_t SIZEOF_SHARED_DATA = RAPIDJSON_ALIGN(sizeof(SharedData));
+    static const size_t SIZEOF_CHUNK_HEADER = RAPIDJSON_ALIGN(sizeof(ChunkHeader));
+
+    static inline ChunkHeader *GetChunkHead(SharedData *shared)
+    {
+        return reinterpret_cast<ChunkHeader*>(reinterpret_cast<uint8_t*>(shared) + SIZEOF_SHARED_DATA);
+    }
+    static inline uint8_t *GetChunkBuffer(SharedData *shared)
+    {
+        return reinterpret_cast<uint8_t*>(shared->chunkHead) + SIZEOF_CHUNK_HEADER;
+    }
+
+    static const size_t kDefaultChunkCapacity = RAPIDJSON_ALLOCATOR_DEFAULT_CHUNK_CAPACITY; //!< Default chunk capacity.
+
 public:
    static const bool kNeedFree = false;    //!< Tell users that no need to call Free() with this allocator. (concept Allocator)
+    static const bool kRefCounted = true;   //!< Tell users that this allocator is reference counted on copy

    //! Constructor with chunkSize.
    /*! \param chunkSize The size of memory chunk. The default is kDefaultChunkCapacity.
        \param baseAllocator The allocator for allocating memory chunks.
    */
-    MemoryPoolAllocator(size_t chunkSize = kDefaultChunkCapacity, BaseAllocator* baseAllocator = 0) : 
-        chunkHead_(0), chunk_capacity_(chunkSize), userBuffer_(0), baseAllocator_(baseAllocator), ownBaseAllocator_(0)
+    explicit
+    MemoryPoolAllocator(size_t chunkSize = kDefaultChunkCapacity, BaseAllocator* baseAllocator = 0) :
+        chunk_capacity_(chunkSize),
+        baseAllocator_(baseAllocator ? baseAllocator : RAPIDJSON_NEW(BaseAllocator)()),
+        shared_(static_cast<SharedData*>(baseAllocator_ ? baseAllocator_->Malloc(SIZEOF_SHARED_DATA + SIZEOF_CHUNK_HEADER) : 0))
    {
+        RAPIDJSON_ASSERT(baseAllocator_ != 0);
+        RAPIDJSON_ASSERT(shared_ != 0);
+        if (baseAllocator) {
+            shared_->ownBaseAllocator = 0;
+        }
+        else {
+            shared_->ownBaseAllocator = baseAllocator_;
+        }
+        shared_->chunkHead = GetChunkHead(shared_);
+        shared_->chunkHead->capacity = 0;
+        shared_->chunkHead->size = 0;
+        shared_->chunkHead->next = 0;
+        shared_->ownBuffer = true;
+        shared_->refcount = 1;
    }

    //! Constructor with user-supplied buffer.
@ -136,41 +198,101 @@ public:
        \param baseAllocator The allocator for allocating memory chunks.
    */
    MemoryPoolAllocator(void *buffer, size_t size, size_t chunkSize = kDefaultChunkCapacity, BaseAllocator* baseAllocator = 0) :
-        chunkHead_(0), chunk_capacity_(chunkSize), userBuffer_(buffer), baseAllocator_(baseAllocator), ownBaseAllocator_(0)
+        chunk_capacity_(chunkSize),
+        baseAllocator_(baseAllocator),
+        shared_(static_cast<SharedData*>(AlignBuffer(buffer, size)))
    {
-        RAPIDJSON_ASSERT(buffer != 0);
-        RAPIDJSON_ASSERT(size > sizeof(ChunkHeader));
-        chunkHead_ = reinterpret_cast<ChunkHeader*>(buffer);
-        chunkHead_->capacity = size - sizeof(ChunkHeader);
-        chunkHead_->size = 0;
-        chunkHead_->next = 0;
+        RAPIDJSON_ASSERT(size >= SIZEOF_SHARED_DATA + SIZEOF_CHUNK_HEADER);
+        shared_->chunkHead = GetChunkHead(shared_);
+        shared_->chunkHead->capacity = size - SIZEOF_SHARED_DATA - SIZEOF_CHUNK_HEADER;
+        shared_->chunkHead->size = 0;
+        shared_->chunkHead->next = 0;
+        shared_->ownBaseAllocator = 0;
+        shared_->ownBuffer = false;
+        shared_->refcount = 1;
    }

+    MemoryPoolAllocator(const MemoryPoolAllocator& rhs) RAPIDJSON_NOEXCEPT :
+        chunk_capacity_(rhs.chunk_capacity_),
+        baseAllocator_(rhs.baseAllocator_),
+        shared_(rhs.shared_)
+    {
+        RAPIDJSON_NOEXCEPT_ASSERT(shared_->refcount > 0);
+        ++shared_->refcount;
+    }
+    MemoryPoolAllocator& operator=(const MemoryPoolAllocator& rhs) RAPIDJSON_NOEXCEPT
+    {
+        RAPIDJSON_NOEXCEPT_ASSERT(rhs.shared_->refcount > 0);
+        ++rhs.shared_->refcount;
+        this->~MemoryPoolAllocator();
+        baseAllocator_ = rhs.baseAllocator_;
+        chunk_capacity_ = rhs.chunk_capacity_;
+        shared_ = rhs.shared_;
+        return *this;
+    }
+
+#if RAPIDJSON_HAS_CXX11_RVALUE_REFS
+    MemoryPoolAllocator(MemoryPoolAllocator&& rhs) RAPIDJSON_NOEXCEPT :
+        chunk_capacity_(rhs.chunk_capacity_),
+        baseAllocator_(rhs.baseAllocator_),
+        shared_(rhs.shared_)
+    {
+        RAPIDJSON_NOEXCEPT_ASSERT(rhs.shared_->refcount > 0);
+        rhs.shared_ = 0;
+    }
+    MemoryPoolAllocator& operator=(MemoryPoolAllocator&& rhs) RAPIDJSON_NOEXCEPT
+    {
+        RAPIDJSON_NOEXCEPT_ASSERT(rhs.shared_->refcount > 0);
+        this->~MemoryPoolAllocator();
+        baseAllocator_ = rhs.baseAllocator_;
+        chunk_capacity_ = rhs.chunk_capacity_;
+        shared_ = rhs.shared_;
+        rhs.shared_ = 0;
+        return *this;
+    }
+#endif
+
    //! Destructor.
    /*! This deallocates all memory chunks, excluding the user-supplied buffer.
    */
-    ~MemoryPoolAllocator() {
+    ~MemoryPoolAllocator() RAPIDJSON_NOEXCEPT {
+        if (!shared_) {
+            // do nothing if moved
+            return;
+        }
+        if (shared_->refcount > 1) {
+            --shared_->refcount;
+            return;
+        }
        Clear();
-        RAPIDJSON_DELETE(ownBaseAllocator_);
+        BaseAllocator *a = shared_->ownBaseAllocator;
+        if (shared_->ownBuffer) {
+            baseAllocator_->Free(shared_);
+        }
+        RAPIDJSON_DELETE(a);
    }

-    //! Deallocates all memory chunks, excluding the user-supplied buffer.
-    void Clear() {
-        while (chunkHead_ && chunkHead_ != userBuffer_) {
-            ChunkHeader* next = chunkHead_->next;
-            baseAllocator_->Free(chunkHead_);
-            chunkHead_ = next;
+    //! Deallocates all memory chunks, excluding the first/user one.
+    void Clear() RAPIDJSON_NOEXCEPT {
+        RAPIDJSON_NOEXCEPT_ASSERT(shared_->refcount > 0);
+        for (;;) {
+            ChunkHeader* c = shared_->chunkHead;
+            if (!c->next) {
+                break;
+            }
+            shared_->chunkHead = c->next;
+            baseAllocator_->Free(c);
        }
-        if (chunkHead_ && chunkHead_ == userBuffer_)
-            chunkHead_->size = 0; // Clear user buffer
+        shared_->chunkHead->size = 0;
    }

    //! Computes the total capacity of allocated memory chunks.
    /*! \return total capacity in bytes.
    */
-    size_t Capacity() const {
+    size_t Capacity() const RAPIDJSON_NOEXCEPT {
+        RAPIDJSON_NOEXCEPT_ASSERT(shared_->refcount > 0);
        size_t capacity = 0;
-        for (ChunkHeader* c = chunkHead_; c != 0; c = c->next)
+        for (ChunkHeader* c = shared_->chunkHead; c != 0; c = c->next)
            capacity += c->capacity;
        return capacity;
    }
@ -178,25 +300,35 @@ public:
    //! Computes the memory blocks allocated.
    /*! \return total used bytes.
    */
-    size_t Size() const {
+    size_t Size() const RAPIDJSON_NOEXCEPT {
+        RAPIDJSON_NOEXCEPT_ASSERT(shared_->refcount > 0);
        size_t size = 0;
-        for (ChunkHeader* c = chunkHead_; c != 0; c = c->next)
+        for (ChunkHeader* c = shared_->chunkHead; c != 0; c = c->next)
            size += c->size;
        return size;
    }

+    //! Whether the allocator is shared.
+    /*! \return true or false.
+    */
+    bool Shared() const RAPIDJSON_NOEXCEPT {
+        RAPIDJSON_NOEXCEPT_ASSERT(shared_->refcount > 0);
+        return shared_->refcount > 1;
+    }
+
    //! Allocates a memory block. (concept Allocator)
    void* Malloc(size_t size) {
+        RAPIDJSON_NOEXCEPT_ASSERT(shared_->refcount > 0);
        if (!size)
            return NULL;

        size = RAPIDJSON_ALIGN(size);
-        if (chunkHead_ == 0 || chunkHead_->size + size > chunkHead_->capacity)
+        if (RAPIDJSON_UNLIKELY(shared_->chunkHead->size + size > shared_->chunkHead->capacity))
            if (!AddChunk(chunk_capacity_ > size ? chunk_capacity_ : size))
                return NULL;

-        void *buffer = reinterpret_cast<char *>(chunkHead_) + RAPIDJSON_ALIGN(sizeof(ChunkHeader)) + chunkHead_->size;
-        chunkHead_->size += size;
+        void *buffer = GetChunkBuffer(shared_) + shared_->chunkHead->size;
+        shared_->chunkHead->size += size;
        return buffer;
    }

@ -205,6 +337,7 @@ public:
        if (originalPtr == 0)
            return Malloc(newSize);

+        RAPIDJSON_NOEXCEPT_ASSERT(shared_->refcount > 0);
        if (newSize == 0)
            return NULL;

@ -216,10 +349,10 @@ public:
            return originalPtr;

        // Simply expand it if it is the last allocation and there is sufficient space
-        if (originalPtr == reinterpret_cast<char *>(chunkHead_) + RAPIDJSON_ALIGN(sizeof(ChunkHeader)) + chunkHead_->size - originalSize) {
+        if (originalPtr == GetChunkBuffer(shared_) + shared_->chunkHead->size - originalSize) {
            size_t increment = static_cast<size_t>(newSize - originalSize);
-            if (chunkHead_->size + increment <= chunkHead_->capacity) {
-                chunkHead_->size += increment;
+            if (shared_->chunkHead->size + increment <= shared_->chunkHead->capacity) {
+                shared_->chunkHead->size += increment;
                return originalPtr;
            }
        }
@ -235,50 +368,325 @@ public:
    }

    //! Frees a memory block (concept Allocator)
-    static void Free(void *ptr) { (void)ptr; } // Do nothing
+    static void Free(void *ptr) RAPIDJSON_NOEXCEPT { (void)ptr; } // Do nothing
+
+    //! Compare (equality) with another MemoryPoolAllocator
+    bool operator==(const MemoryPoolAllocator& rhs) const RAPIDJSON_NOEXCEPT {
+        RAPIDJSON_NOEXCEPT_ASSERT(shared_->refcount > 0);
+        RAPIDJSON_NOEXCEPT_ASSERT(rhs.shared_->refcount > 0);
+        return shared_ == rhs.shared_;
+    }
+    //! Compare (inequality) with another MemoryPoolAllocator
+    bool operator!=(const MemoryPoolAllocator& rhs) const RAPIDJSON_NOEXCEPT {
+        return !operator==(rhs);
+    }

 private:
-    //! Copy constructor is not permitted.
-    MemoryPoolAllocator(const MemoryPoolAllocator& rhs) /* = delete */;
-    //! Copy assignment operator is not permitted.
-    MemoryPoolAllocator& operator=(const MemoryPoolAllocator& rhs) /* = delete */;
-
    //! Creates a new chunk.
    /*! \param capacity Capacity of the chunk in bytes.
        \return true if success.
    */
    bool AddChunk(size_t capacity) {
        if (!baseAllocator_)
-            ownBaseAllocator_ = baseAllocator_ = RAPIDJSON_NEW(BaseAllocator)();
-        if (ChunkHeader* chunk = reinterpret_cast<ChunkHeader*>(baseAllocator_->Malloc(RAPIDJSON_ALIGN(sizeof(ChunkHeader)) + capacity))) {
+            shared_->ownBaseAllocator = baseAllocator_ = RAPIDJSON_NEW(BaseAllocator)();
+        if (ChunkHeader* chunk = static_cast<ChunkHeader*>(baseAllocator_->Malloc(SIZEOF_CHUNK_HEADER + capacity))) {
            chunk->capacity = capacity;
            chunk->size = 0;
-            chunk->next = chunkHead_;
-            chunkHead_ =  chunk;
+            chunk->next = shared_->chunkHead;
+            shared_->chunkHead = chunk;
            return true;
        }
        else
            return false;
    }

-    static const int kDefaultChunkCapacity = RAPIDJSON_ALLOCATOR_DEFAULT_CHUNK_CAPACITY; //!< Default chunk capacity.
+    static inline void* AlignBuffer(void* buf, size_t &size)
+    {
+        RAPIDJSON_NOEXCEPT_ASSERT(buf != 0);
+        const uintptr_t mask = sizeof(void*) - 1;
+        const uintptr_t ubuf = reinterpret_cast<uintptr_t>(buf);
+        if (RAPIDJSON_UNLIKELY(ubuf & mask)) {
+            const uintptr_t abuf = (ubuf + mask) & ~mask;
+            RAPIDJSON_ASSERT(size >= abuf - ubuf);
+            buf = reinterpret_cast<void*>(abuf);
+            size -= abuf - ubuf;
+        }
+        return buf;
+    }

-    //! Chunk header for perpending to each chunk.
-    /*! Chunks are stored as a singly linked list.
-    */
-    struct ChunkHeader {
-        size_t capacity;    //!< Capacity of the chunk in bytes (excluding the header itself).
-        size_t size;        //!< Current size of allocated memory in bytes.
-        ChunkHeader *next;  //!< Next chunk in the linked list.
+    size_t chunk_capacity_;     //!< The minimum capacity of chunk when they are allocated.
+    BaseAllocator* baseAllocator_;  //!< base allocator for allocating memory chunks.
+    SharedData *shared_;        //!< The shared data of the allocator
+};
+
+namespace internal {
+    template<typename, typename = void>
+    struct IsRefCounted :
+        public FalseType
+    { };
+    template<typename T>
+    struct IsRefCounted<T, typename internal::EnableIfCond<T::kRefCounted>::Type> :
+        public TrueType
+    { };
+}
+
+template<typename T, typename A>
+inline T* Realloc(A& a, T* old_p, size_t old_n, size_t new_n)
+{
+    RAPIDJSON_NOEXCEPT_ASSERT(old_n <= SIZE_MAX / sizeof(T) && new_n <= SIZE_MAX / sizeof(T));
+    return static_cast<T*>(a.Realloc(old_p, old_n * sizeof(T), new_n * sizeof(T)));
+}
+
+template<typename T, typename A>
+inline T *Malloc(A& a, size_t n = 1)
+{
+    return Realloc<T, A>(a, NULL, 0, n);
+}
+
+template<typename T, typename A>
+inline void Free(A& a, T *p, size_t n = 1)
+{
+    static_cast<void>(Realloc<T, A>(a, p, n, 0));
+}
+
+#ifdef __GNUC__
+RAPIDJSON_DIAG_PUSH
+RAPIDJSON_DIAG_OFF(effc++) // std::allocator can safely be inherited
+#endif
+
+template <typename T, typename BaseAllocator = CrtAllocator>
+class StdAllocator :
+    public std::allocator<T>
+{
+    typedef std::allocator<T> allocator_type;
+#if RAPIDJSON_HAS_CXX11
+    typedef std::allocator_traits<allocator_type> traits_type;
+#else
+    typedef allocator_type traits_type;
+#endif
+
+public:
+    typedef BaseAllocator BaseAllocatorType;
+
+    StdAllocator() RAPIDJSON_NOEXCEPT :
+        allocator_type(),
+        baseAllocator_()
+    { }
+
+    StdAllocator(const StdAllocator& rhs) RAPIDJSON_NOEXCEPT :
+        allocator_type(rhs),
+        baseAllocator_(rhs.baseAllocator_)
+    { }
+
+    template<typename U>
+    StdAllocator(const StdAllocator<U, BaseAllocator>& rhs) RAPIDJSON_NOEXCEPT :
+        allocator_type(rhs),
+        baseAllocator_(rhs.baseAllocator_)
+    { }
+
+#if RAPIDJSON_HAS_CXX11_RVALUE_REFS
+    StdAllocator(StdAllocator&& rhs) RAPIDJSON_NOEXCEPT :
+        allocator_type(std::move(rhs)),
+        baseAllocator_(std::move(rhs.baseAllocator_))
+    { }
+#endif
+#if RAPIDJSON_HAS_CXX11
+    using propagate_on_container_move_assignment = std::true_type;
+    using propagate_on_container_swap = std::true_type;
+#endif
+
+    /* implicit */
+    StdAllocator(const BaseAllocator& allocator) RAPIDJSON_NOEXCEPT :
+        allocator_type(),
+        baseAllocator_(allocator)
+    { }
+
+    ~StdAllocator() RAPIDJSON_NOEXCEPT
+    { }
+
+    template<typename U>
+    struct rebind {
+        typedef StdAllocator<U, BaseAllocator> other;
    };

-    ChunkHeader *chunkHead_;    //!< Head of the chunk linked-list. Only the head chunk serves allocation.
-    size_t chunk_capacity_;     //!< The minimum capacity of chunk when they are allocated.
-    void *userBuffer_;          //!< User supplied buffer.
-    BaseAllocator* baseAllocator_;  //!< base allocator for allocating memory chunks.
-    BaseAllocator* ownBaseAllocator_;   //!< base allocator created by this object.
+    typedef typename traits_type::size_type         size_type;
+    typedef typename traits_type::difference_type   difference_type;
+
+    typedef typename traits_type::value_type        value_type;
+    typedef typename traits_type::pointer           pointer;
+    typedef typename traits_type::const_pointer     const_pointer;
+
+#if RAPIDJSON_HAS_CXX11
+
+    typedef typename std::add_lvalue_reference<value_type>::type &reference;
+    typedef typename std::add_lvalue_reference<typename std::add_const<value_type>::type>::type &const_reference;
+
+    pointer address(reference r) const RAPIDJSON_NOEXCEPT
+    {
+        return std::addressof(r);
+    }
+    const_pointer address(const_reference r) const RAPIDJSON_NOEXCEPT
+    {
+        return std::addressof(r);
+    }
+
+    size_type max_size() const RAPIDJSON_NOEXCEPT
+    {
+        return traits_type::max_size(*this);
+    }
+
+    template <typename ...Args>
+    void construct(pointer p, Args&&... args)
+    {
+        traits_type::construct(*this, p, std::forward<Args>(args)...);
+    }
+    void destroy(pointer p)
+    {
+        traits_type::destroy(*this, p);
+    }
+
+#else // !RAPIDJSON_HAS_CXX11
+
+    typedef typename allocator_type::reference       reference;
+    typedef typename allocator_type::const_reference const_reference;
+
+    pointer address(reference r) const RAPIDJSON_NOEXCEPT
+    {
+        return allocator_type::address(r);
+    }
+    const_pointer address(const_reference r) const RAPIDJSON_NOEXCEPT
+    {
+        return allocator_type::address(r);
+    }
+
+    size_type max_size() const RAPIDJSON_NOEXCEPT
+    {
+        return allocator_type::max_size();
+    }
+
+    void construct(pointer p, const_reference r)
+    {
+        allocator_type::construct(p, r);
+    }
+    void destroy(pointer p)
+    {
+        allocator_type::destroy(p);
+    }
+
+#endif // !RAPIDJSON_HAS_CXX11
+
+    template <typename U>
+    U* allocate(size_type n = 1, const void* = 0)
+    {
+        return RAPIDJSON_NAMESPACE::Malloc<U>(baseAllocator_, n);
+    }
+    template <typename U>
+    void deallocate(U* p, size_type n = 1)
+    {
+        RAPIDJSON_NAMESPACE::Free<U>(baseAllocator_, p, n);
+    }
+
+    pointer allocate(size_type n = 1, const void* = 0)
+    {
+        return allocate<value_type>(n);
+    }
+    void deallocate(pointer p, size_type n = 1)
+    {
+        deallocate<value_type>(p, n);
+    }
+
+#if RAPIDJSON_HAS_CXX11
+    using is_always_equal = std::is_empty<BaseAllocator>;
+#endif
+
+    template<typename U>
+    bool operator==(const StdAllocator<U, BaseAllocator>& rhs) const RAPIDJSON_NOEXCEPT
+    {
+        return baseAllocator_ == rhs.baseAllocator_;
+    }
+    template<typename U>
+    bool operator!=(const StdAllocator<U, BaseAllocator>& rhs) const RAPIDJSON_NOEXCEPT
+    {
+        return !operator==(rhs);
+    }
+
+    //! rapidjson Allocator concept
+    static const bool kNeedFree = BaseAllocator::kNeedFree;
+    static const bool kRefCounted = internal::IsRefCounted<BaseAllocator>::Value;
+    void* Malloc(size_t size)
+    {
+        return baseAllocator_.Malloc(size);
+    }
+    void* Realloc(void* originalPtr, size_t originalSize, size_t newSize)
+    {
+        return baseAllocator_.Realloc(originalPtr, originalSize, newSize);
+    }
+    static void Free(void *ptr) RAPIDJSON_NOEXCEPT
+    {
+        BaseAllocator::Free(ptr);
+    }
+
+private:
+    template <typename, typename>
+    friend class StdAllocator; // access to StdAllocator<!T>.*
+
+    BaseAllocator baseAllocator_;
 };

+#if !RAPIDJSON_HAS_CXX17 // std::allocator<void> deprecated in C++17
+template <typename BaseAllocator>
+class StdAllocator<void, BaseAllocator> :
+    public std::allocator<void>
+{
+    typedef std::allocator<void> allocator_type;
+
+public:
+    typedef BaseAllocator BaseAllocatorType;
+
+    StdAllocator() RAPIDJSON_NOEXCEPT :
+        allocator_type(),
+        baseAllocator_()
+    { }
+
+    StdAllocator(const StdAllocator& rhs) RAPIDJSON_NOEXCEPT :
+        allocator_type(rhs),
+        baseAllocator_(rhs.baseAllocator_)
+    { }
+
+    template<typename U>
+    StdAllocator(const StdAllocator<U, BaseAllocator>& rhs) RAPIDJSON_NOEXCEPT :
+        allocator_type(rhs),
+        baseAllocator_(rhs.baseAllocator_)
+    { }
+
+    /* implicit */
+    StdAllocator(const BaseAllocator& baseAllocator) RAPIDJSON_NOEXCEPT :
+        allocator_type(),
+        baseAllocator_(baseAllocator)
+    { }
+
+    ~StdAllocator() RAPIDJSON_NOEXCEPT
+    { }
+
+    template<typename U>
+    struct rebind {
+        typedef StdAllocator<U, BaseAllocator> other;
+    };
+
+    typedef typename allocator_type::value_type value_type;
+
+private:
+    template <typename, typename>
+    friend class StdAllocator; // access to StdAllocator<!T>.*
+
+    BaseAllocator baseAllocator_;
+};
+#endif
+
+#ifdef __GNUC__
+RAPIDJSON_DIAG_POP
+#endif
+
 RAPIDJSON_NAMESPACE_END

 #endif // RAPIDJSON_ENCODINGS_H_
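
The allocators.h hunks above make MemoryPoolAllocator copyable by moving its state into a reference-counted SharedData block. The sketch below is a heavily simplified illustration of that copy/assign/destroy pattern only. The names SharedPool, State, Use, IsShared and Release are hypothetical and are not part of RapidJSON. It also uses a private Release() helper where the diff calls the destructor explicitly, and it leaves out chunks, user buffers and move support.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical, simplified stand-in for the SharedData/refcount scheme in the diff.
class SharedPool {
    struct State {
        std::size_t refcount;  // number of SharedPool handles pointing at this state
        std::size_t used;      // stand-in for the bytes handed out by the pool
    };

public:
    SharedPool() : shared_(static_cast<State*>(std::malloc(sizeof(State)))) {
        assert(shared_ != 0);
        shared_->refcount = 1;
        shared_->used = 0;
    }

    // Copying shares the state and bumps the count, as the new copy constructor does.
    SharedPool(const SharedPool& rhs) : shared_(rhs.shared_) {
        ++shared_->refcount;
    }

    // Increment the source first so that self-assignment cannot free live state.
    SharedPool& operator=(const SharedPool& rhs) {
        ++rhs.shared_->refcount;
        Release();
        shared_ = rhs.shared_;
        return *this;
    }

    ~SharedPool() { Release(); }

    void Use(std::size_t n) { shared_->used += n; }
    std::size_t Size() const { return shared_->used; }

    // True while more than one handle refers to the same state.
    bool IsShared() const { return shared_->refcount > 1; }

    // Two handles compare equal only when they share the same state.
    bool operator==(const SharedPool& rhs) const { return shared_ == rhs.shared_; }

private:
    // Drop one reference and free the state when the last handle goes away.
    void Release() {
        if (--shared_->refcount == 0)
            std::free(shared_);
    }

    State* shared_;
};
```

As in the diff, equality is identity of the shared block, so a copy compares equal to its source while two independently constructed pools do not.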
--- a/src/3rdparty/rapidjson/cursorstreamwrapper.h
+++ b/src/3rdparty/rapidjson/cursorstreamwrapper.h
@ -1,6 +1,6 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
 //
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
--- a/src/3rdparty/rapidjson/document.h
+++ b/src/3rdparty/rapidjson/document.h
--- a/src/3rdparty/rapidjson/encodedstream.h
+++ b/src/3rdparty/rapidjson/encodedstream.h
@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_ENCODEDSTREAM_H_
@ -41,7 +41,7 @@ class EncodedInputStream {
 public:
    typedef typename Encoding::Ch Ch;

-    EncodedInputStream(InputByteStream& is) : is_(is) { 
+    EncodedInputStream(InputByteStream& is) : is_(is) {
        current_ = Encoding::TakeBOM(is_);
    }

@ -51,7 +51,7 @@ public:

    // Not implemented
    void Put(Ch) { RAPIDJSON_ASSERT(false); }
-    void Flush() { RAPIDJSON_ASSERT(false); } 
+    void Flush() { RAPIDJSON_ASSERT(false); }
    Ch* PutBegin() { RAPIDJSON_ASSERT(false); return 0; }
    size_t PutEnd(Ch*) { RAPIDJSON_ASSERT(false); return 0; }

@ -80,7 +80,7 @@ public:

    // Not implemented
    void Put(Ch) {}
-    void Flush() {} 
+    void Flush() {}
    Ch* PutBegin() { return 0; }
    size_t PutEnd(Ch*) { return 0; }

@ -102,7 +102,7 @@ class EncodedOutputStream {
 public:
    typedef typename Encoding::Ch Ch;

-    EncodedOutputStream(OutputByteStream& os, bool putBOM = true) : os_(os) { 
+    EncodedOutputStream(OutputByteStream& os, bool putBOM = true) : os_(os) {
        if (putBOM)
            Encoding::PutBOM(os_);
    }
@ -143,7 +143,7 @@ public:
        \param type UTF encoding type if it is not detected from the stream.
    */
    AutoUTFInputStream(InputByteStream& is, UTFType type = kUTF8) : is_(&is), type_(type), hasBOM_(false) {
-        RAPIDJSON_ASSERT(type >= kUTF8 && type <= kUTF32BE);        
+        RAPIDJSON_ASSERT(type >= kUTF8 && type <= kUTF32BE);
        DetectType();
        static const TakeFunc f[] = { RAPIDJSON_ENCODINGS_FUNC(Take) };
        takeFunc_ = f[type_];
@ -159,7 +159,7 @@ public:

    // Not implemented
    void Put(Ch) { RAPIDJSON_ASSERT(false); }
-    void Flush() { RAPIDJSON_ASSERT(false); } 
+    void Flush() { RAPIDJSON_ASSERT(false); }
    Ch* PutBegin() { RAPIDJSON_ASSERT(false); return 0; }
    size_t PutEnd(Ch*) { RAPIDJSON_ASSERT(false); return 0; }

@ -258,7 +258,7 @@ public:
    UTFType GetType() const { return type_; }

    void Put(Ch c) { putFunc_(*os_, c); }
-    void Flush() { os_->Flush(); } 
+    void Flush() { os_->Flush(); }

    // Not implemented
    Ch Peek() const { RAPIDJSON_ASSERT(false); return 0;}
@ -271,7 +271,7 @@ private:
    AutoUTFOutputStream(const AutoUTFOutputStream&);
    AutoUTFOutputStream& operator=(const AutoUTFOutputStream&);

-    void PutBOM() { 
+    void PutBOM() {
        typedef void (*PutBOMFunc)(OutputByteStream&);
        static const PutBOMFunc f[] = { RAPIDJSON_ENCODINGS_FUNC(PutBOM) };
        f[type_](*os_);
--- a/src/3rdparty/rapidjson/encodings.h
+++ b/src/3rdparty/rapidjson/encodings.h
@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_ENCODINGS_H_
@ -100,7 +100,7 @@ struct UTF8 {

    template<typename OutputStream>
    static void Encode(OutputStream& os, unsigned codepoint) {
-        if (codepoint <= 0x7F) 
+        if (codepoint <= 0x7F)
            os.Put(static_cast<Ch>(codepoint & 0xFF));
        else if (codepoint <= 0x7FF) {
            os.Put(static_cast<Ch>(0xC0 | ((codepoint >> 6) & 0xFF)));
@ -122,7 +122,7 @@ struct UTF8 {

    template<typename OutputStream>
    static void EncodeUnsafe(OutputStream& os, unsigned codepoint) {
-        if (codepoint <= 0x7F) 
+        if (codepoint <= 0x7F)
            PutUnsafe(os, static_cast<Ch>(codepoint & 0xFF));
        else if (codepoint <= 0x7FF) {
            PutUnsafe(os, static_cast<Ch>(0xC0 | ((codepoint >> 6) & 0xFF)));
@ -276,7 +276,7 @@ struct UTF16 {
    static void Encode(OutputStream& os, unsigned codepoint) {
        RAPIDJSON_STATIC_ASSERT(sizeof(typename OutputStream::Ch) >= 2);
        if (codepoint <= 0xFFFF) {
-            RAPIDJSON_ASSERT(codepoint < 0xD800 || codepoint > 0xDFFF); // Code point itself cannot be surrogate pair 
+            RAPIDJSON_ASSERT(codepoint < 0xD800 || codepoint > 0xDFFF); // Code point itself cannot be surrogate pair
            os.Put(static_cast<typename OutputStream::Ch>(codepoint));
        }
        else {
@ -292,7 +292,7 @@ struct UTF16 {
    static void EncodeUnsafe(OutputStream& os, unsigned codepoint) {
        RAPIDJSON_STATIC_ASSERT(sizeof(typename OutputStream::Ch) >= 2);
        if (codepoint <= 0xFFFF) {
-            RAPIDJSON_ASSERT(codepoint < 0xD800 || codepoint > 0xDFFF); // Code point itself cannot be surrogate pair 
+            RAPIDJSON_ASSERT(codepoint < 0xD800 || codepoint > 0xDFFF); // Code point itself cannot be surrogate pair
            PutUnsafe(os, static_cast<typename OutputStream::Ch>(codepoint));
        }
        else {
@ -406,7 +406,7 @@ struct UTF16BE : UTF16<CharType> {
 ///////////////////////////////////////////////////////////////////////////////
 // UTF32

-//! UTF-32 encoding. 
+//! UTF-32 encoding.
 /*! http://en.wikipedia.org/wiki/UTF-32
    \tparam CharType Type for storing 32-bit UTF-32 data. Default is unsigned. C++11 may use char32_t instead.
    \note implements Encoding concept
@ -498,7 +498,7 @@ struct UTF32BE : UTF32<CharType> {
    static CharType TakeBOM(InputByteStream& is) {
        RAPIDJSON_STATIC_ASSERT(sizeof(typename InputByteStream::Ch) == 1);
        CharType c = Take(is);
-        return static_cast<uint32_t>(c) == 0x0000FEFFu ? Take(is) : c; 
+        return static_cast<uint32_t>(c) == 0x0000FEFFu ? Take(is) : c;
    }

    template <typename InputByteStream>
@ -694,13 +694,13 @@ struct Transcoder<Encoding, Encoding> {
        os.Put(is.Take());  // Just copy one code unit. This semantic is different from primary template class.
        return true;
    }
-    
+
    template<typename InputStream, typename OutputStream>
    static RAPIDJSON_FORCEINLINE bool TranscodeUnsafe(InputStream& is, OutputStream& os) {
        PutUnsafe(os, is.Take());  // Just copy one code unit. This semantic is different from primary template class.
        return true;
    }
-    
+
    template<typename InputStream, typename OutputStream>
    static RAPIDJSON_FORCEINLINE bool Validate(InputStream& is, OutputStream& os) {
        return Encoding::Validate(is, os);  // source/target encoding are the same
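
The UTF8::Encode hunks above only change whitespace, but they show the start of the branch structure: one byte for code points up to 0x7F, two bytes up to 0x7FF. As a rough standalone sketch of the full 1 to 4 byte scheme, the helper below appends the encoding to a std::string. AppendUtf8 is a hypothetical name, not a RapidJSON function, and unlike a validating encoder it does not reject surrogate code points.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of `codepoint` to `out`.
inline void AppendUtf8(std::string& out, unsigned codepoint) {
    if (codepoint <= 0x7F) {
        // 0xxxxxxx
        out.push_back(static_cast<char>(codepoint));
    }
    else if (codepoint <= 0x7FF) {
        // 110xxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xC0 | (codepoint >> 6)));
        out.push_back(static_cast<char>(0x80 | (codepoint & 0x3F)));
    }
    else if (codepoint <= 0xFFFF) {
        // 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xE0 | (codepoint >> 12)));
        out.push_back(static_cast<char>(0x80 | ((codepoint >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (codepoint & 0x3F)));
    }
    else {
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        assert(codepoint <= 0x10FFFF);  // largest valid Unicode code point
        out.push_back(static_cast<char>(0xF0 | (codepoint >> 18)));
        out.push_back(static_cast<char>(0x80 | ((codepoint >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((codepoint >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (codepoint & 0x3F)));
    }
}
```

Each continuation byte carries six payload bits behind the 0x80 marker, which is why every shift is a multiple of six.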
--- a/src/3rdparty/rapidjson/error/en.h
+++ b/src/3rdparty/rapidjson/error/en.h
@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_ERROR_EN_H_
@ -39,13 +39,13 @@ inline const RAPIDJSON_ERROR_CHARTYPE* GetParseError_En(ParseErrorCode parseErro

        case kParseErrorDocumentEmpty:                  return RAPIDJSON_ERROR_STRING("The document is empty.");
        case kParseErrorDocumentRootNotSingular:        return RAPIDJSON_ERROR_STRING("The document root must not be followed by other values.");
-    
+
        case kParseErrorValueInvalid:                   return RAPIDJSON_ERROR_STRING("Invalid value.");
-    
+
        case kParseErrorObjectMissName:                 return RAPIDJSON_ERROR_STRING("Missing a name for object member.");
        case kParseErrorObjectMissColon:                return RAPIDJSON_ERROR_STRING("Missing a colon after a name of object member.");
        case kParseErrorObjectMissCommaOrCurlyBracket:  return RAPIDJSON_ERROR_STRING("Missing a comma or '}' after an object member.");
-    
+
        case kParseErrorArrayMissCommaOrSquareBracket:  return RAPIDJSON_ERROR_STRING("Missing a comma or ']' after an array element.");

        case kParseErrorStringUnicodeEscapeInvalidHex:  return RAPIDJSON_ERROR_STRING("Incorrect hex digit after \\u escape in string.");
@ -65,6 +65,54 @@ inline const RAPIDJSON_ERROR_CHARTYPE* GetParseError_En(ParseErrorCode parseErro
    }
 }

+//! Maps error code of validation into error message.
+/*!
+    \ingroup RAPIDJSON_ERRORS
+    \param validateErrorCode Error code obtained from validator.
+    \return the error message.
+    \note User can make a copy of this function for localization.
+        Using switch-case is safer for future modification of error codes.
+*/
+inline const RAPIDJSON_ERROR_CHARTYPE* GetValidateError_En(ValidateErrorCode validateErrorCode) {
+    switch (validateErrorCode) {
+        case kValidateErrors:                           return RAPIDJSON_ERROR_STRING("One or more validation errors have occurred");
+        case kValidateErrorNone:                        return RAPIDJSON_ERROR_STRING("No error.");
+
+        case kValidateErrorMultipleOf:                  return RAPIDJSON_ERROR_STRING("Number '%actual' is not a multiple of the 'multipleOf' value '%expected'.");
+        case kValidateErrorMaximum:                     return RAPIDJSON_ERROR_STRING("Number '%actual' is greater than the 'maximum' value '%expected'.");
+        case kValidateErrorExclusiveMaximum:            return RAPIDJSON_ERROR_STRING("Number '%actual' is greater than or equal to the 'exclusiveMaximum' value '%expected'.");
+        case kValidateErrorMinimum:                     return RAPIDJSON_ERROR_STRING("Number '%actual' is less than the 'minimum' value '%expected'.");
+        case kValidateErrorExclusiveMinimum:            return RAPIDJSON_ERROR_STRING("Number '%actual' is less than or equal to the 'exclusiveMinimum' value '%expected'.");
+
+        case kValidateErrorMaxLength:                   return RAPIDJSON_ERROR_STRING("String '%actual' is longer than the 'maxLength' value '%expected'.");
+        case kValidateErrorMinLength:                   return RAPIDJSON_ERROR_STRING("String '%actual' is shorter than the 'minLength' value '%expected'.");
+        case kValidateErrorPattern:                     return RAPIDJSON_ERROR_STRING("String '%actual' does not match the 'pattern' regular expression.");
+
+        case kValidateErrorMaxItems:                    return RAPIDJSON_ERROR_STRING("Array of length '%actual' is longer than the 'maxItems' value '%expected'.");
+        case kValidateErrorMinItems:                    return RAPIDJSON_ERROR_STRING("Array of length '%actual' is shorter than the 'minItems' value '%expected'.");
+        case kValidateErrorUniqueItems:                 return RAPIDJSON_ERROR_STRING("Array has duplicate items at indices '%duplicates' but 'uniqueItems' is true.");
+        case kValidateErrorAdditionalItems:             return RAPIDJSON_ERROR_STRING("Array has an additional item at index '%disallowed' that is not allowed by the schema.");
+
+        case kValidateErrorMaxProperties:               return RAPIDJSON_ERROR_STRING("Object has '%actual' members which is more than 'maxProperties' value '%expected'.");
+        case kValidateErrorMinProperties:               return RAPIDJSON_ERROR_STRING("Object has '%actual' members which is less than 'minProperties' value '%expected'.");
+        case kValidateErrorRequired:                    return RAPIDJSON_ERROR_STRING("Object is missing the following members required by the schema: '%missing'.");
+        case kValidateErrorAdditionalProperties:        return RAPIDJSON_ERROR_STRING("Object has an additional member '%disallowed' that is not allowed by the schema.");
+        case kValidateErrorPatternProperties:           return RAPIDJSON_ERROR_STRING("Object has 'patternProperties' that are not allowed by the schema.");
+        case kValidateErrorDependencies:                return RAPIDJSON_ERROR_STRING("Object has missing property or schema dependencies, refer to following errors.");
+
+        case kValidateErrorEnum:                        return RAPIDJSON_ERROR_STRING("Property has a value that is not one of its allowed enumerated values.");
+        case kValidateErrorType:                        return RAPIDJSON_ERROR_STRING("Property has a type '%actual' that is not in the following list: '%expected'.");
+
+        case kValidateErrorOneOf:                       return RAPIDJSON_ERROR_STRING("Property did not match any of the sub-schemas specified by 'oneOf', refer to following errors.");
+        case kValidateErrorOneOfMatch:                  return RAPIDJSON_ERROR_STRING("Property matched more than one of the sub-schemas specified by 'oneOf'.");
+        case kValidateErrorAllOf:                       return RAPIDJSON_ERROR_STRING("Property did not match all of the sub-schemas specified by 'allOf', refer to following errors.");
+        case kValidateErrorAnyOf:                       return RAPIDJSON_ERROR_STRING("Property did not match any of the sub-schemas specified by 'anyOf', refer to following errors.");
+        case kValidateErrorNot:                         return RAPIDJSON_ERROR_STRING("Property matched the sub-schema specified by 'not'.");
+
+        default:                                        return RAPIDJSON_ERROR_STRING("Unknown error.");
+    }
+}
+
 RAPIDJSON_NAMESPACE_END

 #ifdef __clang__
--- a/src/3rdparty/rapidjson/error/error.h
+++ b/src/3rdparty/rapidjson/error/error.h
@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_ERROR_ERROR_H_
@ -152,6 +152,61 @@ private:
 */
 typedef const RAPIDJSON_ERROR_CHARTYPE* (*GetParseErrorFunc)(ParseErrorCode);

+///////////////////////////////////////////////////////////////////////////////
+// ValidateErrorCode
+
+//! Error codes when validating.
+/*! \ingroup RAPIDJSON_ERRORS
+    \see GenericSchemaValidator
+*/
+enum ValidateErrorCode {
+    kValidateErrors    = -1,                   //!< Top level error code when kValidateContinueOnErrorsFlag set.
+    kValidateErrorNone = 0,                    //!< No error.
+
+    kValidateErrorMultipleOf,                  //!< Number is not a multiple of the 'multipleOf' value.
+    kValidateErrorMaximum,                     //!< Number is greater than the 'maximum' value.
+    kValidateErrorExclusiveMaximum,            //!< Number is greater than or equal to the 'maximum' value.
+    kValidateErrorMinimum,                     //!< Number is less than the 'minimum' value.
+    kValidateErrorExclusiveMinimum,            //!< Number is less than or equal to the 'minimum' value.
+
+    kValidateErrorMaxLength,                   //!< String is longer than the 'maxLength' value.
+    kValidateErrorMinLength,                   //!< String is shorter than the 'minLength' value.
+    kValidateErrorPattern,                     //!< String does not match the 'pattern' regular expression.
+
+    kValidateErrorMaxItems,                    //!< Array is longer than the 'maxItems' value.
+    kValidateErrorMinItems,                    //!< Array is shorter than the 'minItems' value.
+    kValidateErrorUniqueItems,                 //!< Array has duplicate items but 'uniqueItems' is true.
+    kValidateErrorAdditionalItems,             //!< Array has additional items that are not allowed by the schema.
+
+    kValidateErrorMaxProperties,               //!< Object has more members than 'maxProperties' value.
+    kValidateErrorMinProperties,               //!< Object has less members than 'minProperties' value.
+    kValidateErrorRequired,                    //!< Object is missing one or more members required by the schema.
+    kValidateErrorAdditionalProperties,        //!< Object has additional members that are not allowed by the schema.
+    kValidateErrorPatternProperties,           //!< See other errors.
+    kValidateErrorDependencies,                //!< Object has missing property or schema dependencies.
+
+    kValidateErrorEnum,                        //!< Property has a value that is not one of its allowed enumerated values.
+    kValidateErrorType,                        //!< Property has a type that is not allowed by the schema.
+
+    kValidateErrorOneOf,                       //!< Property did not match any of the sub-schemas specified by 'oneOf'.
+    kValidateErrorOneOfMatch,                  //!< Property matched more than one of the sub-schemas specified by 'oneOf'.
+    kValidateErrorAllOf,                       //!< Property did not match all of the sub-schemas specified by 'allOf'.
+    kValidateErrorAnyOf,                       //!< Property did not match any of the sub-schemas specified by 'anyOf'.
+    kValidateErrorNot                          //!< Property matched the sub-schema specified by 'not'.
+};
+
+//! Function pointer type of GetValidateError().
+/*! \ingroup RAPIDJSON_ERRORS
+
+    This is the prototype for \c GetValidateError_X(), where \c X is a locale.
+    Users can change the locale dynamically at runtime, e.g.:
+\code
+    GetValidateErrorFunc GetValidateError = GetValidateError_En; // or whatever
+    const RAPIDJSON_ERROR_CHARTYPE* s = GetValidateError(validator.GetInvalidSchemaCode());
+\endcode
+*/
+typedef const RAPIDJSON_ERROR_CHARTYPE* (*GetValidateErrorFunc)(ValidateErrorCode);
+
 RAPIDJSON_NAMESPACE_END

 #ifdef __clang__
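Note on the `GetValidateErrorFunc` typedef added above: the locale-switching pattern it enables can be sketched without RapidJSON itself. The block below is only an illustration under stated assumptions. `DemoErrorCode`, `GetDemoError_En`, `GetDemoError_Code`, `GetDemoErrorFunc` and `Describe` are hypothetical stand-ins invented for this note and are not part of the library; the real types are `ValidateErrorCode` and `GetValidateError_En` shown in the diff.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for ValidateErrorCode (NOT the RapidJSON enum).
enum DemoErrorCode {
    kDemoErrorNone = 0,
    kDemoErrorMinLength,
    kDemoErrorMaxLength
};

// English message table, written as a switch like GetValidateError_En.
inline const char* GetDemoError_En(DemoErrorCode code) {
    switch (code) {
        case kDemoErrorNone:      return "No error.";
        case kDemoErrorMinLength: return "String is shorter than the 'minLength' value.";
        case kDemoErrorMaxLength: return "String is longer than the 'maxLength' value.";
        default:                  return "Unknown error.";
    }
}

// A second, hypothetical "locale" that returns terse machine-readable tags.
inline const char* GetDemoError_Code(DemoErrorCode code) {
    switch (code) {
        case kDemoErrorNone:      return "E_NONE";
        case kDemoErrorMinLength: return "E_MIN_LENGTH";
        case kDemoErrorMaxLength: return "E_MAX_LENGTH";
        default:                  return "E_UNKNOWN";
    }
}

// Same shape as GetValidateErrorFunc: a pointer to a code-to-message function.
typedef const char* (*GetDemoErrorFunc)(DemoErrorCode);

// The caller picks the message function at runtime and passes it along.
inline const char* Describe(GetDemoErrorFunc getError, DemoErrorCode code) {
    return getError(code);
}
```

Because both functions share one signature, swapping the message table is a single pointer assignment, which is the property the doc comment in the diff relies on.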
--- a/src/3rdparty/rapidjson/filereadstream.h
+++ b/src/3rdparty/rapidjson/filereadstream.h
@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_FILEREADSTREAM_H_
@ -41,7 +41,7 @@ public:
        \param buffer user-supplied buffer.
        \param bufferSize size of buffer in bytes. Must >=4 bytes.
    */
-    FileReadStream(std::FILE* fp, char* buffer, size_t bufferSize) : fp_(fp), buffer_(buffer), bufferSize_(bufferSize), bufferLast_(0), current_(buffer_), readCount_(0), count_(0), eof_(false) { 
+    FileReadStream(std::FILE* fp, char* buffer, size_t bufferSize) : fp_(fp), buffer_(buffer), bufferSize_(bufferSize), bufferLast_(0), current_(buffer_), readCount_(0), count_(0), eof_(false) {
        RAPIDJSON_ASSERT(fp_ != 0);
        RAPIDJSON_ASSERT(bufferSize >= 4);
        Read();
@ -53,7 +53,7 @@ public:

    // Not implemented
    void Put(Ch) { RAPIDJSON_ASSERT(false); }
-    void Flush() { RAPIDJSON_ASSERT(false); } 
+    void Flush() { RAPIDJSON_ASSERT(false); }
    Ch* PutBegin() { RAPIDJSON_ASSERT(false); return 0; }
    size_t PutEnd(Ch*) { RAPIDJSON_ASSERT(false); return 0; }

--- a/src/3rdparty/rapidjson/filewritestream.h
+++ b/src/3rdparty/rapidjson/filewritestream.h
@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_FILEWRITESTREAM_H_
@ -33,11 +33,11 @@ class FileWriteStream {
 public:
    typedef char Ch;    //!< Character type. Only support char.

-    FileWriteStream(std::FILE* fp, char* buffer, size_t bufferSize) : fp_(fp), buffer_(buffer), bufferEnd_(buffer + bufferSize), current_(buffer_) { 
+    FileWriteStream(std::FILE* fp, char* buffer, size_t bufferSize) : fp_(fp), buffer_(buffer), bufferEnd_(buffer + bufferSize), current_(buffer_) {
        RAPIDJSON_ASSERT(fp_ != 0);
    }

-    void Put(char c) { 
+    void Put(char c) {
        if (current_ >= bufferEnd_)
            Flush();

--- a/src/3rdparty/rapidjson/fwd.h
+++ b/src/3rdparty/rapidjson/fwd.h
@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_FWD_H_
@ -101,8 +101,8 @@ class PrettyWriter;

 // document.h

-template <typename Encoding, typename Allocator> 
-struct GenericMember;
+template <typename Encoding, typename Allocator>
+class GenericMember;

 template <bool Const, typename Encoding, typename Allocator>
 class GenericMemberIterator;
@ -110,7 +110,7 @@ class GenericMemberIterator;
 template<typename CharType>
 struct GenericStringRef;

-template <typename Encoding, typename Allocator> 
+template <typename Encoding, typename Allocator>
 class GenericValue;

 typedef GenericValue<UTF8<char>, MemoryPoolAllocator<CrtAllocator> > Value;
--- a/src/3rdparty/rapidjson/internal/biginteger.h
+++ b/src/3rdparty/rapidjson/internal/biginteger.h
@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_BIGINTEGER_H_
@ -17,7 +17,7 @@

 #include "../rapidjson.h"

-#if defined(_MSC_VER) && !__INTEL_COMPILER && defined(_M_AMD64)
+#if defined(_MSC_VER) && !defined(__INTEL_COMPILER) && defined(_M_AMD64)
 #include <intrin.h> // for _umul128
 #pragma intrinsic(_umul128)
 #endif
@ -37,7 +37,8 @@ public:
        digits_[0] = u;
    }

-    BigInteger(const char* decimals, size_t length) : count_(1) {
+    template<typename Ch>
+    BigInteger(const Ch* decimals, size_t length) : count_(1) {
        RAPIDJSON_ASSERT(length > 0);
        digits_[0] = 0;
        size_t i = 0;
@ -51,7 +52,7 @@ public:
        if (length > 0)
            AppendDecimal64(decimals + i, decimals + i + length);
    }
-    
+
    BigInteger& operator=(const BigInteger &rhs)
    {
        if (this != &rhs) {
@ -60,9 +61,9 @@ public:
        }
        return *this;
    }
-    
+
    BigInteger& operator=(uint64_t u) {
-        digits_[0] = u;            
+        digits_[0] = u;
        count_ = 1;
        return *this;
    }
@ -95,7 +96,7 @@ public:
            digits_[i] = MulAdd64(digits_[i], u, k, &hi);
            k = hi;
        }
-        
+
        if (k > 0)
            PushBack(k);

@ -118,7 +119,7 @@ public:
            digits_[i] = (p0 & 0xFFFFFFFF) | (p1 << 32);
            k = p1 >> 32;
        }
-        
+
        if (k > 0)
            PushBack(k);

@ -221,7 +222,8 @@ public:
    bool IsZero() const { return count_ == 1 && digits_[0] == 0; }

 private:
-    void AppendDecimal64(const char* begin, const char* end) {
+    template<typename Ch>
+    void AppendDecimal64(const Ch* begin, const Ch* end) {
        uint64_t u = ParseUint64(begin, end);
        if (IsZero())
            *this = u;
@ -236,11 +238,12 @@ private:
        digits_[count_++] = digit;
    }

-    static uint64_t ParseUint64(const char* begin, const char* end) {
+    template<typename Ch>
+    static uint64_t ParseUint64(const Ch* begin, const Ch* end) {
        uint64_t r = 0;
-        for (const char* p = begin; p != end; ++p) {
-            RAPIDJSON_ASSERT(*p >= '0' && *p <= '9');
-            r = r * 10u + static_cast<unsigned>(*p - '0');
+        for (const Ch* p = begin; p != end; ++p) {
+            RAPIDJSON_ASSERT(*p >= Ch('0') && *p <= Ch('9'));
+            r = r * 10u + static_cast<unsigned>(*p - Ch('0'));
        }
        return r;
    }
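Note on the `biginteger.h` change above: templating on the character type lets the same digit loop accept `char`, `wchar_t` or other code-unit types. A minimal standalone sketch of that idea follows. `ParseUint64Sketch` is a hypothetical free function written for this note; it uses plain `assert` in place of `RAPIDJSON_ASSERT` and, like the original, does not check for overflow.

```cpp
#include <cassert>
#include <cstdint>

// Parses the decimal digits in [begin, end) into a 64-bit unsigned value.
// Ch may be any character type whose '0'..'9' code units are contiguous.
template <typename Ch>
uint64_t ParseUint64Sketch(const Ch* begin, const Ch* end) {
    uint64_t r = 0;
    for (const Ch* p = begin; p != end; ++p) {
        // Compare against Ch('0') so no narrow/wide literal mixing occurs.
        assert(*p >= Ch('0') && *p <= Ch('9'));
        r = r * 10u + static_cast<unsigned>(*p - Ch('0'));
    }
    return r;
}
```

Called as `ParseUint64Sketch(s, s + n)`, the template parameter is deduced from the pointer type, so narrow and wide buffers go through the same code path.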
--- a/src/3rdparty/rapidjson/internal/clzll.h
+++ b/src/3rdparty/rapidjson/internal/clzll.h
@ -0,0 +1,71 @@
+// Tencent is pleased to support the open source community by making RapidJSON available.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
+//
+// Licensed under the MIT License (the "License"); you may not use this file except
+// in compliance with the License. You may obtain a copy of the License at
+//
+// http://opensource.org/licenses/MIT
+//
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations under the License.
+
+#ifndef RAPIDJSON_CLZLL_H_
+#define RAPIDJSON_CLZLL_H_
+
+#include "../rapidjson.h"
+
+#if defined(_MSC_VER) && !defined(UNDER_CE)
+#include <intrin.h>
+#if defined(_WIN64)
+#pragma intrinsic(_BitScanReverse64)
+#else
+#pragma intrinsic(_BitScanReverse)
+#endif
+#endif
+
+RAPIDJSON_NAMESPACE_BEGIN
+namespace internal {
+
+inline uint32_t clzll(uint64_t x) {
+    // Passing 0 to __builtin_clzll is UB in GCC and results in an
+    // infinite loop in the software implementation.
+    RAPIDJSON_ASSERT(x != 0);
+
+#if defined(_MSC_VER) && !defined(UNDER_CE)
+    unsigned long r = 0;
+#if defined(_WIN64)
+    _BitScanReverse64(&r, x);
+#else
+    // Scan the high 32 bits.
+    if (_BitScanReverse(&r, static_cast<uint32_t>(x >> 32)))
+        return 63 - (r + 32);
+
+    // Scan the low 32 bits.
+    _BitScanReverse(&r, static_cast<uint32_t>(x & 0xFFFFFFFF));
+#endif // _WIN64
+
+    return 63 - r;
+#elif (defined(__GNUC__) && __GNUC__ >= 4) || RAPIDJSON_HAS_BUILTIN(__builtin_clzll)
+    // __builtin_clzll wrapper
+    return static_cast<uint32_t>(__builtin_clzll(x));
+#else
+    // naive version
+    uint32_t r = 0;
+    while (!(x & (static_cast<uint64_t>(1) << 63))) {
+        x <<= 1;
+        ++r;
+    }
+
+    return r;
+#endif // _MSC_VER
+}
+
+#define RAPIDJSON_CLZLL RAPIDJSON_NAMESPACE::internal::clzll
+
+} // namespace internal
+RAPIDJSON_NAMESPACE_END
+
+#endif // RAPIDJSON_CLZLL_H_
--- a/src/3rdparty/rapidjson/internal/diyfp.h
+++ b/src/3rdparty/rapidjson/internal/diyfp.h
@ -1,6 +1,6 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
 //
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
@ -20,11 +20,11 @@
 #define RAPIDJSON_DIYFP_H_

 #include "../rapidjson.h"
+#include "clzll.h"
 #include <limits>

 #if defined(_MSC_VER) && defined(_M_AMD64) && !defined(__INTEL_COMPILER)
 #include <intrin.h>
-#pragma intrinsic(_BitScanReverse64)
 #pragma intrinsic(_umul128)
 #endif

@ -100,22 +100,8 @@ struct DiyFp {
    }

    DiyFp Normalize() const {
-        RAPIDJSON_ASSERT(f != 0); // https://stackoverflow.com/a/26809183/291737
-#if defined(_MSC_VER) && defined(_M_AMD64)
-        unsigned long index;
-        _BitScanReverse64(&index, f);
-        return DiyFp(f << (63 - index), e - (63 - index));
-#elif defined(__GNUC__) && __GNUC__ >= 4
-        int s = __builtin_clzll(f);
+        int s = static_cast<int>(clzll(f));
        return DiyFp(f << s, e - s);
-#else
-        DiyFp res = *this;
-        while (!(res.f & (static_cast<uint64_t>(1) << 63))) {
-            res.f <<= 1;
-            res.e--;
-        }
-        return res;
-#endif
    }

    DiyFp NormalizeBoundary() const {
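Note on the `clzll.h` / `diyfp.h` change above: `Normalize()` now delegates to a single count-leading-zeros helper instead of carrying three per-compiler branches. The sketch below reproduces only the portable fallback branch and a simplified normalize step. `ClzllPortable`, `Fp` and `NormalizeSketch` are hypothetical names for this note, and `Fp` is a reduced stand-in for `DiyFp`, not the real struct.

```cpp
#include <cassert>
#include <cstdint>

// Counts leading zero bits of a nonzero 64-bit value by shifting left until
// the top bit is set. Mirrors the "naive version" branch of clzll() in the diff.
inline uint32_t ClzllPortable(uint64_t x) {
    assert(x != 0); // x == 0 would never set the top bit, so the loop would not end
    uint32_t r = 0;
    while (!(x & (static_cast<uint64_t>(1) << 63))) {
        x <<= 1;
        ++r;
    }
    return r;
}

// Reduced stand-in for DiyFp: value = f * 2^e.
struct Fp {
    uint64_t f;
    int e;
};

// Shifts the significand so its top bit is set and adjusts the exponent to
// keep the represented value unchanged.
inline Fp NormalizeSketch(Fp v) {
    const int s = static_cast<int>(ClzllPortable(v.f));
    Fp r = { v.f << s, v.e - s };
    return r;
}
```

Keeping the zero check inside the helper means every caller gets the same precondition, which matches the `RAPIDJSON_ASSERT(x != 0)` that moved from `Normalize()` into `clzll()`.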
--- a/src/3rdparty/rapidjson/internal/dtoa.h
+++ b/src/3rdparty/rapidjson/internal/dtoa.h
@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 // This is a C++ header-only implementation of Grisu2 algorithm from the publication:
@ -58,7 +58,11 @@ inline int CountDecimalDigit32(uint32_t n) {
 }

 inline void DigitGen(const DiyFp& W, const DiyFp& Mp, uint64_t delta, char* buffer, int* len, int* K) {
-    static const uint32_t kPow10[] = { 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000 };
+    static const uint64_t kPow10[] = { 1U, 10U, 100U, 1000U, 10000U, 100000U, 1000000U, 10000000U, 100000000U,
+                                       1000000000U, 10000000000U, 100000000000U, 1000000000000U,
+                                       10000000000000U, 100000000000000U, 1000000000000000U,
+                                       10000000000000000U, 100000000000000000U, 1000000000000000000U,
+                                       10000000000000000000U };
    const DiyFp one(uint64_t(1) << -Mp.e, Mp.e);
    const DiyFp wp_w = Mp - W;
    uint32_t p1 = static_cast<uint32_t>(Mp.f >> -one.e);
@ -86,7 +90,7 @@ inline void DigitGen(const DiyFp& W, const DiyFp& Mp, uint64_t delta, char* buff
        uint64_t tmp = (static_cast<uint64_t>(p1) << -one.e) + p2;
        if (tmp <= delta) {
            *K += kappa;
-            GrisuRound(buffer, *len, delta, tmp, static_cast<uint64_t>(kPow10[kappa]) << -one.e, wp_w.f);
+            GrisuRound(buffer, *len, delta, tmp, kPow10[kappa] << -one.e, wp_w.f);
            return;
        }
    }
@ -103,7 +107,7 @@ inline void DigitGen(const DiyFp& W, const DiyFp& Mp, uint64_t delta, char* buff
        if (p2 < delta) {
            *K += kappa;
            int index = -kappa;
-            GrisuRound(buffer, *len, delta, p2, one.f, wp_w.f * (index < 9 ? kPow10[index] : 0));
+            GrisuRound(buffer, *len, delta, p2, one.f, wp_w.f * (index < 20 ? kPow10[index] : 0));
            return;
        }
    }
--- a/src/3rdparty/rapidjson/internal/ieee754.h
+++ b/src/3rdparty/rapidjson/internal/ieee754.h
@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_IEEE754_
--- a/src/3rdparty/rapidjson/internal/itoa.h
+++ b/src/3rdparty/rapidjson/internal/itoa.h
@ -1,6 +1,6 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
 //
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
--- a/src/3rdparty/rapidjson/internal/meta.h
+++ b/src/3rdparty/rapidjson/internal/meta.h
@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_INTERNAL_META_H_
--- a/src/3rdparty/rapidjson/internal/pow10.h
+++ b/src/3rdparty/rapidjson/internal/pow10.h
@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_POW10_
@ -27,8 +27,8 @@ namespace internal {
 */
 inline double Pow10(int n) {
    static const double e[] = { // 1e-0...1e308: 309 * 8 bytes = 2472 bytes
-        1e+0,  
-        1e+1,  1e+2,  1e+3,  1e+4,  1e+5,  1e+6,  1e+7,  1e+8,  1e+9,  1e+10, 1e+11, 1e+12, 1e+13, 1e+14, 1e+15, 1e+16, 1e+17, 1e+18, 1e+19, 1e+20, 
+        1e+0,
+        1e+1,  1e+2,  1e+3,  1e+4,  1e+5,  1e+6,  1e+7,  1e+8,  1e+9,  1e+10, 1e+11, 1e+12, 1e+13, 1e+14, 1e+15, 1e+16, 1e+17, 1e+18, 1e+19, 1e+20,
        1e+21, 1e+22, 1e+23, 1e+24, 1e+25, 1e+26, 1e+27, 1e+28, 1e+29, 1e+30, 1e+31, 1e+32, 1e+33, 1e+34, 1e+35, 1e+36, 1e+37, 1e+38, 1e+39, 1e+40,
        1e+41, 1e+42, 1e+43, 1e+44, 1e+45, 1e+46, 1e+47, 1e+48, 1e+49, 1e+50, 1e+51, 1e+52, 1e+53, 1e+54, 1e+55, 1e+56, 1e+57, 1e+58, 1e+59, 1e+60,
        1e+61, 1e+62, 1e+63, 1e+64, 1e+65, 1e+66, 1e+67, 1e+68, 1e+69, 1e+70, 1e+71, 1e+72, 1e+73, 1e+74, 1e+75, 1e+76, 1e+77, 1e+78, 1e+79, 1e+80,
--- a/src/3rdparty/rapidjson/internal/regex.h
+++ b/src/3rdparty/rapidjson/internal/regex.h
@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_INTERNAL_REGEX_H_
@ -23,7 +23,6 @@
 RAPIDJSON_DIAG_PUSH
 RAPIDJSON_DIAG_OFF(padded)
 RAPIDJSON_DIAG_OFF(switch-enum)
-RAPIDJSON_DIAG_OFF(implicit-fallthrough)
 #elif defined(_MSC_VER)
 RAPIDJSON_DIAG_PUSH
 RAPIDJSON_DIAG_OFF(4512) // assignment operator could not be generated
@ -32,9 +31,6 @@ RAPIDJSON_DIAG_OFF(4512) // assignment operator could not be generated
 #ifdef __GNUC__
 RAPIDJSON_DIAG_PUSH
 RAPIDJSON_DIAG_OFF(effc++)
-#if __GNUC__ >= 7
-RAPIDJSON_DIAG_OFF(implicit-fallthrough)
-#endif
 #endif

 #ifndef RAPIDJSON_REGEX_VERBOSE
@ -106,9 +102,9 @@ class GenericRegexSearch;
    - \c \\t Tab (U+0009)
    - \c \\v Vertical tab (U+000B)

-    \note This is a Thompson NFA engine, implemented with reference to 
-        Cox, Russ. "Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby,...).", 
-        https://swtch.com/~rsc/regexp/regexp1.html 
+    \note This is a Thompson NFA engine, implemented with reference to
+        Cox, Russ. "Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby,...).",
+        https://swtch.com/~rsc/regexp/regexp1.html
 */
 template <typename Encoding, typename Allocator = CrtAllocator>
 class GenericRegex {
@ -117,9 +113,9 @@ public:
    typedef typename Encoding::Ch Ch;
    template <typename, typename> friend class GenericRegexSearch;

-    GenericRegex(const Ch* source, Allocator* allocator = 0) : 
-        ownAllocator_(allocator ? 0 : RAPIDJSON_NEW(Allocator)()), allocator_(allocator ? allocator : ownAllocator_), 
-        states_(allocator_, 256), ranges_(allocator_, 256), root_(kRegexInvalidState), stateCount_(), rangeCount_(), 
+    GenericRegex(const Ch* source, Allocator* allocator = 0) :
+        ownAllocator_(allocator ? 0 : RAPIDJSON_NEW(Allocator)()), allocator_(allocator ? allocator : ownAllocator_),
+        states_(allocator_, 256), ranges_(allocator_, 256), root_(kRegexInvalidState), stateCount_(), rangeCount_(),
        anchorBegin_(), anchorEnd_()
    {
        GenericStringStream<Encoding> ss(source);
@ -151,7 +147,7 @@ private:
    static const unsigned kRangeNegationFlag = 0x80000000;

    struct Range {
-        unsigned start; // 
+        unsigned start; //
        unsigned end;
        SizeType next;
    };
@ -291,6 +287,7 @@ private:
                    if (!CharacterEscape(ds, &codepoint))
                        return; // Unsupported escape character
                    // fall through to default
+                    RAPIDJSON_DELIBERATE_FALLTHROUGH;

                default: // Pattern character
                    PushOperand(operandStack, codepoint);
@ -405,7 +402,7 @@ private:
                }
                return false;

-            default: 
+            default:
                // syntax error (e.g. unclosed kLeftParenthesis)
                return false;
        }
@ -520,6 +517,7 @@ private:
                else if (!CharacterEscape(ds, &codepoint))
                    return false;
                // fall through to default
+                RAPIDJSON_DELIBERATE_FALLTHROUGH;

            default:
                switch (step) {
@ -529,6 +527,7 @@ private:
                        break;
                    }
                    // fall through to step 0 for other characters
+                    RAPIDJSON_DELIBERATE_FALLTHROUGH;

                case 0:
                    {
@ -551,7 +550,7 @@ private:
        }
        return false;
    }
-    
+
    SizeType NewRange(unsigned codepoint) {
        Range* r = ranges_.template Push<Range>();
        r->start = r->end = codepoint;
@ -609,7 +608,7 @@ public:
    typedef typename RegexType::EncodingType Encoding;
    typedef typename Encoding::Ch Ch;

-    GenericRegexSearch(const RegexType& regex, Allocator* allocator = 0) : 
+    GenericRegexSearch(const RegexType& regex, Allocator* allocator = 0) :
        regex_(regex), allocator_(allocator), ownAllocator_(0),
        state0_(allocator, 0), state1_(allocator, 0), stateSet_()
    {
@ -668,7 +667,7 @@ private:
            for (const SizeType* s = current->template Bottom<SizeType>(); s != current->template End<SizeType>(); ++s) {
                const State& sr = regex_.GetState(*s);
                if (sr.codepoint == codepoint ||
-                    sr.codepoint == RegexType::kAnyCharacterClass || 
+                    sr.codepoint == RegexType::kAnyCharacterClass ||
                    (sr.codepoint == RegexType::kRangeCharacterClass && MatchRange(sr.rangeStart, codepoint)))
                {
                    matched = AddState(*next, sr.out) || matched;
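
The regex.h hunks above remove the blanket `RAPIDJSON_DIAG_OFF(implicit-fallthrough)` suppressions and instead mark each intentional fall-through with `RAPIDJSON_DELIBERATE_FALLTHROUGH`. The macro's definition is not part of this section, so the sketch below only assumes that it expands to a compiler fall-through annotation. `DEMO_DELIBERATE_FALLTHROUGH` and `CountOperands` are hypothetical names invented for illustration, not RapidJSON identifiers.

```cpp
#include <cassert>

// Hypothetical stand-in for RAPIDJSON_DELIBERATE_FALLTHROUGH (assumption:
// the real macro resolves to a similar per-compiler annotation).
#if defined(__cplusplus) && __cplusplus >= 201703L
#define DEMO_DELIBERATE_FALLTHROUGH [[fallthrough]]
#elif defined(__GNUC__) && __GNUC__ >= 7
#define DEMO_DELIBERATE_FALLTHROUGH __attribute__((fallthrough))
#else
#define DEMO_DELIBERATE_FALLTHROUGH ((void)0)
#endif

// Hypothetical helper mirroring the shape of the patched switch: a backslash
// is validated first, then deliberately handled like any pattern character.
inline int CountOperands(char c, bool escapeIsValid) {
    int pushed = 0;
    switch (c) {
        case '\\':
            if (!escapeIsValid)
                return -1; // unsupported escape character
            // fall through to default
            DEMO_DELIBERATE_FALLTHROUGH;

        default: // pattern character
            ++pushed;
    }
    return pushed;
}
```

Annotating each site keeps `-Wimplicit-fallthrough` active for the rest of the file, so an accidental missing `break` elsewhere would still be reported.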
--- a/src/3rdparty/rapidjson/internal/stack.h
+++ b/src/3rdparty/rapidjson/internal/stack.h
@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_INTERNAL_STACK_H_
@ -98,7 +98,7 @@ public:

    void Clear() { stackTop_ = stack_; }

-    void ShrinkToFit() { 
+    void ShrinkToFit() {
        if (Empty()) {
            // If the stack is empty, completely deallocate the memory.
            Allocator::Free(stack_); // NOLINT (+clang-analyzer-unix.Malloc)
@ -142,7 +142,7 @@ public:
    }

    template<typename T>
-    T* Top() { 
+    T* Top() {
        RAPIDJSON_ASSERT(GetSize() >= sizeof(T));
        return reinterpret_cast<T*>(stackTop_ - sizeof(T));
    }
--- a/src/3rdparty/rapidjson/internal/strfunc.h
+++ b/src/3rdparty/rapidjson/internal/strfunc.h
@ -1,15 +1,15 @@
 // Tencent is pleased to support the open source community by making RapidJSON available.
-// 
-// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip. All rights reserved.
+//
+// Copyright (C) 2015 THL A29 Limited, a Tencent company, and Milo Yip.
 //
 // Licensed under the MIT License (the "License"); you may not use this file except
 // in compliance with the License. You may obtain a copy of the License at
 //
 // http://opensource.org/licenses/MIT
 //
-// Unless required by applicable law or agreed to in writing, software distributed 
-// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
-// CONDITIONS OF ANY KIND, either express or implied. See the License for the 
+// Unless required by applicable law or agreed to in writing, software distributed
+// under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+// CONDITIONS OF ANY KIND, either express or implied. See the License for the
 // specific language governing permissions and limitations under the License.

 #ifndef RAPIDJSON_INTERNAL_STRFUNC_H_
@ -24,7 +24,7 @@ namespace internal {
 //! Custom strlen() which works on different character types.
 /*! \tparam Ch Character type (e.g. char, wchar_t, short)
    \param s Null-terminated input string.
-    \return Number of characters in the string. 
+    \return Number of characters in the string.
    \note This has the same semantics as strlen(), the return value is not number of Unicode codepoints.
 */
 template <typename Ch>
@ -45,6 +45,20 @@ inline SizeType StrLen(const wchar_t* s) {
    return SizeType(std::wcslen(s));
 }

+//! Custom strcmpn() which works on different character types.
+/*! \tparam Ch Character type (e.g. char, wchar_t, short)
+    \param s1 Null-terminated input string.
+    \param s2 Null-terminated input string.
+    \return 0 if equal
+*/
+template<typename Ch>
+inline int StrCmp(const Ch* s1, const Ch* s2) {
+    RAPIDJSON_ASSERT(s1 != 0);
+    RAPIDJSON_ASSERT(s2 != 0);
+    while(*s1 && (*s1 == *s2)) { s1++; s2++; }
+    return static_cast<unsigned>(*s1) < static_cast<unsigned>(*s2) ? -1 : static_cast<unsigned>(*s1) > static_cast<unsigned>(*s2);
+}
+
 //! Returns number of code points in a encoded string.
 template<typename Encoding>
 bool CountStringCodePoint(const typename Encoding::Ch* s, SizeType length, SizeType* outCount) {
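
The `StrCmp` template added in the strfunc.h hunk above can be exercised in isolation. The following is a minimal sketch that copies its logic under the hypothetical name `StrCmpSketch` and substitutes the standard `assert` for `RAPIDJSON_ASSERT`, so that it compiles without the RapidJSON headers.

```cpp
#include <cassert>

// Standalone copy of the comparison logic from the diff. The name
// StrCmpSketch and the use of assert() are assumptions made for this example.
template <typename Ch>
inline int StrCmpSketch(const Ch* s1, const Ch* s2) {
    assert(s1 != 0);
    assert(s2 != 0);
    // Advance while both strings agree and s1 has not reached its terminator.
    while (*s1 && (*s1 == *s2)) { s1++; s2++; }
    // Compare the first differing units as unsigned values.
    const unsigned a = static_cast<unsigned>(*s1);
    const unsigned b = static_cast<unsigned>(*s2);
    return a < b ? -1 : (a > b ? 1 : 0);
}
```

Unlike `std::strcmp`, which only guarantees the sign of its result, this version returns exactly -1, 0, or 1. For example, `StrCmpSketch("ab", "abc")` yields -1 because the terminator of the shorter string compares below `'c'`.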