Trying to compile to Nvidia PTX, but Debug adds fns that crash and Release strips everything

Hi,

I’m trying to get PTX output running against the Nvidia CUDA driver on Windows 10, and I get two different results depending on the build mode:

  • Debug build mode produces 450KB of PTX, but it crashes on run, presumably because of all the Debug functions.
  • ReleaseSafe/ReleaseSmall/ReleaseFast strip everything, so only 89 bytes of PTX appear.

I’ve tried marking the PTX main() as export, and adding comptime { _ = &main; } to try to force it to stay; neither works.

Any ideas on how to solve this?

Specific files are here:

Build: VDRProlog/build.zig at main · ghowland/VDRProlog · GitHub
CPU Kernel: VDRProlog/src/test_ptx.zig at main · ghowland/VDRProlog · GitHub
PTX Kernel: VDRProlog/src/vlp_kernel.zig at main · ghowland/VDRProlog · GitHub

I tried SPIR-V initially but had even more build problems; this approach seems like it’s close to being testable.

I am trying to avoid using the full CUDA setup and just use the library directly, which should work if the entry kernel can be reduced in size.

Zig 0.16.0. GPU is 1660 Ti → std.Target.nvptx.cpu.sm_75.

Thanks!

-geoff

Have you tried to use the @export builtin? Documentation link
Another way to do it is to set the kernel_module.entry field to the appropriate Entry type.
Maybe something like:

      const kernel_module = b.createModule(.{
          .root_source_file = b.path("src/vlp_kernel.zig"),
          .target = ptx_target,
          .optimize = .ReleaseSafe,
      });
      kernel_module.addImport("vlp_gpu_shared", gpu_shared_kernel);
+     kernel_module.entry = .{ .symbol_name = "main" };

With a freestanding OS target, you often have to manage the exports/imports yourself. I’ve had to deal with this when using the wasm freestanding target.
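
For the @export route, a minimal sketch of what the kernel file could look like is below. The function name vlpMain, its parameters, and its body are made up for illustration and are not taken from vlp_kernel.zig. The `.nvptx_kernel` calling convention spelling is what I believe recent Zig versions use, so please check it against std.builtin.CallingConvention in your 0.16.0 install.

```zig
// Hypothetical kernel source, built for an nvptx64 freestanding target.

// Placeholder kernel body. The nvptx_kernel calling convention (assumed
// spelling) asks for a PTX `.entry` instead of a device-side `.func`.
fn vlpMain(out: [*]u32, len: u32) callconv(.nvptx_kernel) void {
    if (len > 0) out[0] = 42;
}

comptime {
    // Export the function explicitly under the symbol name the host side
    // looks up through the driver API ("main" in this thread).
    @export(&vlpMain, .{ .name = "main", .linkage = .strong });
}
```

If the exported symbol survives, the Release PTX should contain a `.visible .entry main` line rather than just the header.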

I will try those, thank you!