Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: OptiX linker failure caused by EmptyKernel<void> in cub/util_device.cuh #1923

Closed
1 task done
kririae opened this issue Jun 28, 2024 · 7 comments
Closed
1 task done
Labels
bug Something isn't working right.

Comments

@kririae
Copy link

kririae commented Jun 28, 2024

Is this a duplicate?

Type of Bug

Runtime Error

Component

CUB

Describe the bug

When more than one TU is compiled with CUB's cub/util_device.cuh header included, the OptiX linker will report an error upon the invocation of optixPipelineCreate, which reports the following:

Error: Symbol '_ZN3cub17CUB_200400_860_NS11EmptyKernelIvEEvv' was defined multiple times. First seen in: '__direct_callable__sample_brdf_specular_and_2_more_ID9'

How to Reproduce

The reproduction of this error(?) requires a complete OptiX environment, I forked the official OptiX_Apps as an example.

I can only use the PTX variant(instead of the OptiX IR variant) of OptiX because of this issue, so I slightly modified the code as an alignment.

When executing the example renderer like:

./rtigo3 -s ../../data/system_rtigo3_cornell_box.txt -d ../../data/system_rtigo3_cornell_box.txt

The issue appears

Error: 65547: Wayland: Failed to create standard cursor "sb_v_double_arrow"
Error: 65547: Wayland: Failed to create standard cursor "sb_h_double_arrow"
COMPILER (2) [0]: COMPILE ERROR: 
Info: Pipeline statistics
        module(s)                            :    10
        entry function(s)                    :    20
        trace call(s)                        :     2
        continuation callable call(s)        :     0
        direct callable call(s)              :     4
        basic block(s) in entry functions    :   193
        instruction(s) in entry functions    :  5113
        non-entry function(s)                :     2
        basic block(s) in non-entry functions:     2
        instruction(s) in non-entry functions:     2
        debug information                    :    no
Error: Symbol '_ZN3cub17CUB_200400_860_NS11EmptyKernelIvEEvv' was defined multiple times. First seen in: '__direct_callable__sample_brdf_diffuse_and_1_more_ID8'

ERROR: /tmp/OptiX_Apps/apps/rtigo3/src/Device.cpp(829): m_api.optixPipelineCreate(m_optixContext, &pco, &plo, programGroups.data(), (unsigned int) programGroups.size(), nullptr, nullptr, &m_pipeline) (7251)
ERROR: Application() failed to initialize successfully.

Expected behavior

I temporarily worked around this issue by adding an anonymous namespace around, and the issue disappeared.

/**
 * \brief Empty kernel for querying PTX manifest metadata (e.g., version) for the current device
 */
namespace {
template <typename T>
__global__ void EmptyKernel(void) { }
} // namespace

Reproduction link

kririae/OptiX_Apps@8ee4178

Operating System

Arch Linux

nvidia-smi output

Fri Jun 28 22:10:29 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:05:00.0  On |                  N/A |
| 38%   44C    P0            111W /  350W |    2698MiB /  24576MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off |   00000000:0B:00.0 Off |                  N/A |
| 30%   29C    P8             14W /  350W |      20MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1108      G   /usr/lib/Xorg                                  69MiB |
|    0   N/A  N/A      1693      G   Hyprland                                      768MiB |
|    0   N/A  N/A      1941      G   Xwayland                                       40MiB |
|    0   N/A  N/A      3890      G   ...erProcess --variations-seed-version        802MiB |
|    0   N/A  N/A    198848      G   kitty                                         105MiB |
|    0   N/A  N/A    579683      G   /usr/lib/kactivitymanagerd                      4MiB |
|    0   N/A  N/A    613847      G   ...ures=SpareRendererForSitePerProcess         35MiB |
|    0   N/A  N/A    715261      G   kitty                                          57MiB |
|    0   N/A  N/A   3313613      G   telegram-desktop                              135MiB |
|    0   N/A  N/A   3444323      G   kitty                                          55MiB |
|    0   N/A  N/A   3542696      G   /usr/lib/firefox/firefox                      257MiB |
|    0   N/A  N/A   3543027      G   ...bin/plasma-browser-integration-host          4MiB |
|    1   N/A  N/A      1108      G   /usr/lib/Xorg                                   4MiB |
|    1   N/A  N/A      1693      G   Hyprland                                        4MiB |
+-----------------------------------------------------------------------------------------+

NVCC version

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
@kririae kririae added the bug Something isn't working right. label Jun 28, 2024
@bernhardmgruber
Copy link
Contributor

The definition of EmptyKernel after macro expansion is:

template <typename T>
__global__ __attribute__((__visibility__("hidden")))  void EmptyKernel()
{}

which is the same for many other kernels with have inside CUB. The kernel being a template makes the function EmptyKernel implicitely inline, so it's fine to define it in the header and the linker should discard all but one compiled version. This works correctly as part of the many compilers we test in our CI.

I am not familiar with "the OptiX linker" so here is some speculation: maybe it fails to mark the kernel inline because the template parameter is not used. Can you change the kernel to

template <typename T>
CUB_DETAIL_KERNEL_ATTRIBUTES void EmptyKernel(T*)
{}

and the later definition of EmptyKernelPtr to:

using EmptyKernelPtr        = void (*)(void*);

and report whether this fixes the issue?

@kririae
Copy link
Author

kririae commented Jul 3, 2024

Thank you for your patient reply! I've tried your suggested approach, but unfortunately, the renderer still reports the same error:

Error: Symbol '_ZN3cub17CUB_200400_860_NS11EmptyKernelIvEEvPT_' was defined multiple times. First seen in: '0x71864e8064b77f71__closesthit__radiance-and-6-more'

I'd like to provide some additional information that might be helpful:

  1. We're currently using version v2.4.0. We previously used an older version without the visibility attribute (sorry about not mentioning that..), but the issue persists after upgrading to 2.4.0. Here's the PTX output of the corresponding section for the current CUB version:
// .weak	_ZN3cub17CUB_200400_860_NS11EmptyKernelIvEEvPT_
.weak .entry _ZN3cub17CUB_200400_860_NS11EmptyKernelIvEEvPT_(
	.param .u64 _ZN3cub17CUB_200400_860_NS11EmptyKernelIvEEvPT__param_0
)
{
	.loc	2 284 0
	.loc	2 284 73
	ret;
}
  1. When using the anonymous namespace approach, we get the following PTX (and this works under OptiX):
.entry __nv_static_37__4fac033d_15_bsdf_builtin_cu_c160a2af__ZN3cub17CUB_200400_860_NS48_GLOBAL__N__4fac033d_15_bsdf_builtin_cu_c160a2af11EmptyKernelIvEEvPT_(
	.param .u64 __nv_static_37__4fac033d_15_bsdf_builtin_cu_c160a2af__ZN3cub17CUB_200400_860_NS48_GLOBAL__N__4fac033d_15_bsdf_builtin_cu_c160a2af11EmptyKernelIvEEvPT__param_0
)
{
	.loc	2 285 0
	.loc	2 285 73
	ret;
}
  1. Interestingly, when compiling to OptiX IR (-optix-ir), the linker issue disappears without adding the anonymous namespace. This leads me to suspect that.. it might be an OptiX-specific issue.

  2. I'm not sure if the visibility attribute is actually discarded... compiling the v2.4.0 reports no warning but when I switched back to older version and add the attribute manually, the compiler reports that

[7/10] Building CUDA object flux/CMakeFiles/flux.dir/src/film.cu.o
/home/krr/.cache/CPM/cccl/6b735c39125bb1ca2f7ad9b8fbf6b135f24dd2ed/cub/cub/cmake/../../cub/util_device.cuh:117:88: warning: ‘visibility’ attribute ignored [-Wattributes]
  117 | __global__ __attribute__((__visibility__("hidden"))) void EmptyKernel(void) { }
      |                                                                                        ^
In file included from tmpxft_001ac178_00000000-6_film.cudafe1.stub.c:1:
/tmp/tmpxft_001ac178_00000000-6_film.cudafe1.stub.c:1:115: warning: ‘visibility’ attribute ignored [-Wattributes]
    1 | #pragma GCC diagnostic push
      |                                                                                                                   ^
[8/10] Building CUDA object flux/CMakeFiles/flux.dir/src/light.cu.o
/home/krr/.cache/CPM/cccl/6b735c39125bb1ca2f7ad9b8fbf6b135f24dd2ed/cub/cub/cmake/../../cub/util_device.cuh:117:88: warning: ‘visibility’ attribute ignored [-Wattributes]
  117 | __global__ __attribute__((__visibility__("hidden"))) void EmptyKernel(void) { }

The two files are not related to OptiX kernel.

In any case, we do have a workaround, and it doesn't affect our development (hopefully?). Thank you again for your reply.

@bernhardmgruber
Copy link
Contributor

Here's the PTX output of the corresponding section for the current CUB version:

// .weak	_ZN3cub17CUB_200400_860_NS11EmptyKernelIvEEvPT_
.weak .entry _ZN3cub17CUB_200400_860_NS11EmptyKernelIvEEvPT_(

The .weak is a strong indicator that the compiler understood the function to be inline. The linker should then discard all but one of the weak symbols. There may be indeed a problem with the linker.

  1. When using the anonymous namespace approach, we get the following PTX (and this works under OptiX):
.entry __nv_static_37__4fac033d_15_bsdf_builtin_cu_c160a2af__ZN3cub17CUB_200400_860_NS48_GLOBAL__N__4fac033d_15_bsdf_builtin_cu_c160a2af11EmptyKernelIvEEvPT_(

If I demangle that, I get .entry __nv_static_37__4fac033d_15_bsdf_builtin_cu_c160a2af_void cub::CUB_200400_860_NS::(anonymous namespace)::EmptyKernel<void>(void*)(. I don't know what the compiler makes of entities in an anonymous namespace, but __nv_static_37__4fac033d_15_bsdf_builtin_cu_c160a2af_ sounds like it generated a random enough symbol name that should not conflict with other random enough symbols of other translation units. Anonymous namespaces have a similar effect than marking a symbol static, which is also hinted at by __nv_static_.

So I guess that's your workaround for now. Anonymous namespace or mark as static. The downside is that you will have many versions of EmptyKernel in your binary, so this solution may not scale to big kernels in programs with many translation units.

This leads me to suspect that.. it might be an OptiX-specific issue.

If you can, please report this issue to the OptiX team. Thank you!

@jrhemstad
Copy link
Collaborator

This leads me to suspect that.. it might be an OptiX-specific issue.

I've already been in touch with the internal OptiX team and they are looking into it. Early indications make it sound like this is indeed an OptiX issue setting the wrong linkage.

@kririae
Copy link
Author

kririae commented Jul 4, 2024

Thank you for your explanation and help, hope to see the issue gets figured out and fixed!

@kririae kririae closed this as completed Jul 22, 2024
@jrhemstad
Copy link
Collaborator

Thank you for your explanation and help, hope to see the issue gets figured out and fixed!

Hey @kririae, I heard from the internal OptiX team that this issue has been fixed in the development branch for the OptiX linker. They weren't sure on exact dates of when the fix will be public, but it should be relatively soon!

@kririae
Copy link
Author

kririae commented Jul 31, 2024

Thank you for your explanation and help, hope to see the issue gets figured out and fixed!

Hey @kririae, I heard from the internal OptiX team that this issue has been fixed in the development branch for the OptiX linker. They weren't sure on exact dates of when the fix will be public, but it should be relatively soon!

Happy to see that! Again thank you for your patience and help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working right.
Projects
Archived in project
Development

No branches or pull requests

3 participants