When building CUDA applications with CMake, one of the most common pain points is handling GPU architectures correctly. Let's explore the right way to do this in 2025.
The Problem
You've probably seen (or written) code like this:
set_target_properties(my_cuda_app PROPERTIES
  CUDA_ARCHITECTURES 70)  # Hardcoded for one GPU (Volta / V100)
This works... until you run it on a GPU the binary has no code for, at which point kernels fail at runtime with CUDA's 'no kernel image is available for execution on the device'. It also breaks the moment you distribute your code to others. Not ideal.
The Solution: Let CMake Do The Work
Modern Approach (CMake 3.23+)
The simplest starting point? Don't set it at all.
cmake_minimum_required(VERSION 3.23)
project(MyCudaApp LANGUAGES CXX CUDA)
add_executable(my_app main.cu)
set_target_properties(my_app PROPERTIES
  CUDA_SEPARABLE_COMPILATION ON
  CXX_STANDARD 17)
# That's it! CMake picks a default architecture for you
When the variable is unset, CMake (3.18+) falls back to the default architecture reported by your CUDA compiler. That gets you a clean build out of the box, but the default is whatever nvcc targets by default (often an older architecture), not necessarily the GPU in your machine. To compile for the GPU that's actually present, use the native value described next.
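Curious what default you got? A quick sanity check is to print the variable right after project():

cmake_minimum_required(VERSION 3.23)
project(MyCudaApp LANGUAGES CXX CUDA)
# By now compiler detection has filled in (and cached) a default value
message(STATUS "Default CUDA architectures: ${CMAKE_CUDA_ARCHITECTURES}")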
Using Special Architecture Values
CMake provides three special values for CMAKE_CUDA_ARCHITECTURES:
# Auto-detect GPU on build machine (CMake 3.24+)
set(CMAKE_CUDA_ARCHITECTURES native)
# Compile for ALL supported architectures (CMake 3.23+; slow builds, large binaries)
set(CMAKE_CUDA_ARCHITECTURES all)
# Compile for major architecture versions only (CMake 3.23+; a balanced choice)
set(CMAKE_CUDA_ARCHITECTURES all-major)
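If you want this choice to show up as a documented option in cmake-gui or ccmake, one pattern is to declare it as a cache variable with its allowed values. A sketch (the all-major default here is just an example), placed before project() so compiler detection sees it:

# Sketch: expose the architecture choice as a documented cache option
set(CMAKE_CUDA_ARCHITECTURES "all-major" CACHE STRING "CUDA architectures to compile for")
set_property(CACHE CMAKE_CUDA_ARCHITECTURES PROPERTY STRINGS native all all-major)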
Flexible Configuration
For maximum flexibility, allow users to override while providing sensible defaults:
cmake_minimum_required(VERSION 3.24)

# Default to native detection, but allow override.
# Set this BEFORE project(): enabling CUDA caches a default value,
# so afterwards the variable is always DEFINED and this guard never fires.
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  set(CMAKE_CUDA_ARCHITECTURES native)
endif()

project(MyCudaApp LANGUAGES CXX CUDA)

add_executable(my_app main.cu)
set_target_properties(my_app PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
Users can now override at configure time:
cmake -DCMAKE_CUDA_ARCHITECTURES="80;86" ..
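Note that CMAKE_CUDA_ARCHITECTURES merely initializes each target's CUDA_ARCHITECTURES property at creation time, so you can also override per target, which is useful when one binary needs a newer architecture than the rest of the project. A small sketch (the hopper_tool target is hypothetical):

# Hypothetical extra target, for illustration only
add_executable(hopper_tool hopper.cu)
# Override just this target: Hopper SASS plus embedded PTX
set_property(TARGET hopper_tool PROPERTY CUDA_ARCHITECTURES 90)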
Understanding CUDA Compute Capabilities
Here's a quick reference for common GPU architectures:
| Architecture | Compute Capability | Examples |
|---|---|---|
| Pascal | 60, 61 | P100, GTX 1080 |
| Volta | 70 | V100, Titan V |
| Turing | 75 | RTX 2080, T4 |
| Ampere | 80, 86, 87 | A100, RTX 3090, Orin |
| Ada Lovelace | 89 | RTX 4090, L40 |
| Hopper | 90 | H100, H200 |
Real vs Virtual Architectures
When you specify architectures, you can use suffixes:
# Generate native code only (SASS) - faster runtime, GPU-specific
set(CMAKE_CUDA_ARCHITECTURES 80-real)
# Generate PTX intermediate code only - JIT-compiled by the driver at load time, forward compatible
set(CMAKE_CUDA_ARCHITECTURES 80-virtual)
# Generate both (default if no suffix)
set(CMAKE_CUDA_ARCHITECTURES 80)
Rule of thumb:
- Use `-real` for production builds targeting specific hardware
- Omit the suffix for distribution (includes PTX for forward compatibility)
- Never use `-virtual` alone unless you have a specific reason
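For intuition, here is roughly how the suffixes translate into nvcc's -gencode flags (a sketch; the exact flags CMake emits vary by version):

# 80-real    ->  -gencode arch=compute_80,code=[sm_80]             (SASS only)
# 80-virtual ->  -gencode arch=compute_80,code=[compute_80]        (PTX only)
# 80         ->  -gencode arch=compute_80,code=[sm_80,compute_80]  (SASS + PTX)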
Practical Examples
Development Build (Fast Iteration)
# Just compile for your local GPU
set(CMAKE_CUDA_ARCHITECTURES native)
Distribution Build (Maximum Compatibility)
# Cover common datacenter and consumer GPUs (2017-2024)
set(CMAKE_CUDA_ARCHITECTURES 70-real 75-real 80-real 86-real 89 90)
Note: architectures listed without -real (here 89 and 90) embed PTX alongside SASS, which gives forward compatibility with newer GPUs.
Edge Deployment (Specific Hardware)
# You know your target hardware
set(CMAKE_CUDA_ARCHITECTURES 87) # NVIDIA Orin
CI/CD Pipeline
if(DEFINED ENV{CI})
  # In CI, compile for multiple targets
  set(CMAKE_CUDA_ARCHITECTURES 70 75 80 86 89 90)
else()
  # Local development: use native
  set(CMAKE_CUDA_ARCHITECTURES native)
endif()
Complete Example
Here's a production-ready CMakeLists.txt:
cmake_minimum_required(VERSION 3.24)

# Architecture configuration - must come BEFORE project(), because enabling
# CUDA caches a default value and the guard below would never fire afterwards
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  if(DEFINED ENV{CI})
    # CI: compile for multiple common architectures
    set(CMAKE_CUDA_ARCHITECTURES 70 75 80 86 89 90)
    message(STATUS "CI detected: compiling for multiple architectures")
  else()
    # Local: auto-detect current GPU
    set(CMAKE_CUDA_ARCHITECTURES native)
    message(STATUS "Using native GPU architecture detection")
  endif()
endif()

project(CudaVisionApp VERSION 1.0.0 LANGUAGES CXX CUDA)

message(STATUS "CUDA architectures: ${CMAKE_CUDA_ARCHITECTURES}")
# Your CUDA executable
add_executable(vision_app
  src/main.cu
  src/kernel.cu
)
target_include_directories(vision_app PRIVATE include)
set_target_properties(vision_app PROPERTIES
  CUDA_SEPARABLE_COMPILATION ON
  CXX_STANDARD 17
  CUDA_STANDARD 17
)
# Optional: link against cuDNN, TensorRT, etc.
# find_package(CUDAToolkit REQUIRED)
# target_link_libraries(vision_app PRIVATE CUDA::cudart)
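If you do need toolkit libraries, find_package(CUDAToolkit) has shipped with CMake since 3.17 and provides imported targets. A sketch, with cuBLAS standing in for whatever you actually use:

find_package(CUDAToolkit REQUIRED)
target_link_libraries(vision_app PRIVATE
  CUDA::cudart   # CUDA runtime library
  CUDA::cublas   # illustration only - link it if you actually call cuBLAS
)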
Build Commands
# Use defaults (native detection)
cmake -B build
cmake --build build
# Override for specific architecture
cmake -B build -DCMAKE_CUDA_ARCHITECTURES="80;86"
cmake --build build
# Override for all major architectures
cmake -B build -DCMAKE_CUDA_ARCHITECTURES=all-major
cmake --build build
# Check what architectures were used
cmake -B build -DCMAKE_CUDA_ARCHITECTURES=native --trace-expand | grep CUDA_ARCHITECTURES
Common Pitfalls
❌ Don't hardcode architectures in CMakeLists.txt
set(CMAKE_CUDA_ARCHITECTURES 70) # Bad!
✅ Do provide flexible defaults
# Before project(), so CUDA compiler detection hasn't cached a value yet
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  set(CMAKE_CUDA_ARCHITECTURES native)
endif()
❌ Don't use all for production
set(CMAKE_CUDA_ARCHITECTURES all) # Extremely slow builds, huge binaries
✅ Do target your actual deployment hardware
set(CMAKE_CUDA_ARCHITECTURES 80-real 86-real 89) # A100, RTX30xx, RTX40xx + PTX
Version Compatibility
| CMake Version | Feature |
|---|---|
| 3.18+ | CMAKE_CUDA_ARCHITECTURES variable and CUDA_ARCHITECTURES target property |
| 3.23+ | all and all-major special values |
| 3.24+ | native value (build-machine GPU detection) |
If you're stuck on CMake older than 3.24, there is no native value: you'll either have to detect the GPU yourself or require users to specify architectures.
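One possible fallback, as a hedged sketch: it assumes a single-GPU machine, nvidia-smi on the PATH, and a driver new enough to support the compute_cap query field.

# Approximate `native` on CMake < 3.24; place before project()
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  execute_process(
    COMMAND nvidia-smi --query-gpu=compute_cap --format=csv,noheader
    OUTPUT_VARIABLE detected_cc
    RESULT_VARIABLE smi_result
    OUTPUT_STRIP_TRAILING_WHITESPACE)
  if(smi_result EQUAL 0)
    # Keep the first GPU's capability and drop the dot: "8.6" -> "86"
    string(REGEX MATCH "^[0-9]+\\.[0-9]+" first_cc "${detected_cc}")
    string(REPLACE "." "" first_cc "${first_cc}")
    set(CMAKE_CUDA_ARCHITECTURES "${first_cc}")
  else()
    message(FATAL_ERROR "No GPU detected - pass -DCMAKE_CUDA_ARCHITECTURES explicitly")
  endif()
endif()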
Conclusion
For most projects in 2025:
- Use CMake 3.24 or newer
- Default `CMAKE_CUDA_ARCHITECTURES` to `native` (set it before `project()`)
- Allow override via the command line: `-DCMAKE_CUDA_ARCHITECTURES="..."`
- For distribution, explicitly list your target architectures
This gives you fast local development while maintaining flexibility for CI/CD and distribution.