From: Ivan S. Titov <iohann.s.titov@gmail.com>
Date: 2026-05-08
Subject: [PATCH] caffe2: scrub MKL MPI/cluster libs from caffe2::mkl public link interface

cmake/public/mkl.cmake calls find_package(MKL) and passes every library
it returns (MKL_LIBRARIES) into caffe2::mkl's INTERFACE_LINK_LIBRARIES
without filtering. On hosts with Intel oneAPI installed, the resolver
is Intel's own MKLConfig.cmake, which by default returns the full HPC /
Cluster Edition lib set, including:

  * libmkl_scalapack_ilp64    (ScaLAPACK, MPI-distributed)
  * libmkl_cdft_core           (Cluster DFT)
  * libmkl_blacs_intelmpi_ilp64 (BLACS over Intel MPI)
  * libmkl_intel_thread        (Intel-OpenMP threading layer)

These libs only exist when the oneAPI Cluster Edition and Compiler
components are also installed; the basic intel-oneapi-mkl package omits
them. They flow through caffe2::mkl into TorchConfig.cmake's public link
interface, so every downstream consumer that links against torch::torch
then fails at link time with errors such as
"cannot find -lmkl_scalapack_ilp64".
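
To see what a downstream consumer inherits, the target's link interface
can be printed from a throwaway project. This is a minimal sketch and
not part of the patch: the project name and the _mkl_iface variable are
made up for illustration, and it assumes find_package(Torch) resolves
on the host and that LANGUAGES CXX is enough for the installed config.

```cmake
cmake_minimum_required(VERSION 3.18)
project(mkl_iface_check LANGUAGES CXX)  # hypothetical project name

# Loads the installed Torch package config; per this message, that is
# where caffe2::mkl enters a consumer's build.
find_package(Torch REQUIRED)

if(TARGET caffe2::mkl)
  # Read the libs that any consumer of caffe2::mkl will be asked to link.
  get_target_property(_mkl_iface caffe2::mkl INTERFACE_LINK_LIBRARIES)
  message(STATUS "caffe2::mkl links: ${_mkl_iface}")
else()
  message(STATUS "caffe2::mkl is not defined by this Torch install")
endif()
```

On an unpatched install the printed list would be expected to contain
the mkl_scalapack / mkl_cdft / mkl_blacs entries named above; with this
patch applied they should be absent.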

torch::torch's public APIs never reach BLACS / ScaLAPACK / distributed
FFT; distributed-tensor paths use NCCL / Gloo / MPI directly. Drop the
ScaLAPACK, Cluster DFT, and BLACS libs from MKL_LIBRARIES before adding
it to the target.
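
The effect of the filter can be checked in isolation with a script-mode
run. The sketch below is illustrative only: the file name is made up,
and the sample list uses bare library names, whereas the real
MKL_LIBRARIES entries may be full paths or other forms.

```cmake
# Run with: cmake -P filter_demo.cmake   (hypothetical file name)
# Sample input only; real find_package(MKL) output will differ.
set(MKL_LIBRARIES
  mkl_scalapack_ilp64
  mkl_cdft_core
  mkl_intel_ilp64
  mkl_gnu_thread
  mkl_core
  mkl_blacs_intelmpi_ilp64)

# Same expression as the patch: drop every entry matching the regex.
list(FILTER MKL_LIBRARIES EXCLUDE REGEX "mkl_(scalapack|cdft|blacs)")

# Leaves: mkl_intel_ilp64;mkl_gnu_thread;mkl_core
message(STATUS "${MKL_LIBRARIES}")
```

The regex is unanchored, so it matches the pattern anywhere in an
entry, which is why it would also catch path-style entries such as
/some/prefix/libmkl_cdft_core.so.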

Force MKL_THREADING=gnu_thread before the find: libmkl_gnu_thread is
in the basic MKL package and uses the same libgomp that gcc-built code
on Gentoo already links against. This avoids mixing two OpenMP runtimes
(libgomp + libiomp5) in one process, which can oversubscribe threads or
deadlock.
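
An optional configure-time guard could flag a leftover Intel OpenMP
dependency. This is a hypothetical sketch and not part of the patch; it
assumes MKL_LIBRARIES holds plain library names or paths, and the
_mkl_lib loop variable is made up for illustration.

```cmake
# Hypothetical sanity check, placed after find_package(MKL):
# warn if any entry still points at the Intel OpenMP runtime or the
# Intel threading layer despite MKL_THREADING=gnu_thread.
foreach(_mkl_lib IN LISTS MKL_LIBRARIES)
  if(_mkl_lib MATCHES "(iomp5|mkl_intel_thread)")
    message(WARNING
      "MKL_LIBRARIES still references Intel OpenMP: ${_mkl_lib}")
  endif()
endforeach()
```

A warning rather than a fatal error keeps the build usable on hosts
where the Intel threading layer is installed on purpose.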

Verified 2026-05-08 against caffe2-2.11.0: fixes the vllm USE=cuda and
USE=cpu cumem_allocator link failures on this overlay's reference host
(intel-oneapi-mkl 2026.0 only; no oneAPI Cluster Edition or Compiler
components installed).

Bug: pytorch's cmake/public/mkl.cmake on main still has the same
unfiltered behaviour as of 2026-05-08, and no upstream PR is open for
this specific scrub. Drop this patch when an equivalent upstream fix
lands.

--- a/cmake/public/mkl.cmake
+++ b/cmake/public/mkl.cmake
@@ -1,3 +1,8 @@
+# stuff overlay scrub: force GNU OpenMP threading. Intel's MKLConfig
+# defaults to MKL_THREADING=intel_thread, but libmkl_intel_thread is
+# not in the basic intel-oneapi-mkl package; libmkl_gnu_thread is
+# always available and pairs with system libgomp.
+set(MKL_THREADING gnu_thread)
 find_package(MKL QUIET)

 if(TARGET caffe2::mkl)
@@ -5,6 +10,12 @@
 endif()

 add_library(caffe2::mkl INTERFACE IMPORTED)
+# stuff overlay scrub: drop MKL MPI / cluster libs from MKL_LIBRARIES.
+# torch::torch's public APIs never reach BLACS / ScaLAPACK / distributed
+# FFT, but exposing these libs in caffe2::mkl breaks downstream linking.
+if(MKL_LIBRARIES)
+  list(FILTER MKL_LIBRARIES EXCLUDE REGEX "mkl_(scalapack|cdft|blacs)")
+endif()
 target_include_directories(caffe2::mkl INTERFACE ${MKL_INCLUDE_DIR})
 target_link_libraries(caffe2::mkl INTERFACE ${MKL_LIBRARIES})
 foreach(MKL_LIB IN LISTS MKL_LIBRARIES)