Hi,
I'm trying to get PTX output from Zig running against the NVIDIA CUDA driver on Windows 10, and I get two different results depending on the build mode:
- Debug produces 450 KB of PTX, but it crashes when run, presumably because of all the Debug-mode functions it pulls in.
- ReleaseSafe/Small/Fast strip everything, leaving only 89 bytes of PTX.
I've tried marking the PTX main() with export, and adding comptime { _ = &main; } to force it to be kept. Neither works.
Any ideas on how to solve this?
The relevant files are:
- Build: VDRProlog/build.zig at main · ghowland/VDRProlog · GitHub
- CPU Kernel: VDRProlog/src/test_ptx.zig at main · ghowland/VDRProlog · GitHub
- PTX Kernel: VDRProlog/src/vlp_kernel.zig at main · ghowland/VDRProlog · GitHub
I tried SPIR-V initially but ran into even more build problems; the PTX route seems close to being testable.
I'm trying to avoid the full CUDA setup and just use the driver library directly, which should work if the entry kernel can be reduced in size.
Zig 0.16.0. The GPU is a GTX 1660 Ti, so the target CPU is std.Target.nvptx.cpu.sm_75.
Thanks!
-geoff