When building CUDA applications with CMake, one of the most common pain points is handling GPU architectures correctly. Let's explore the right way to do this in 2025.
The Problem
You've probably seen (or written) code like this:
set_target_properties(my_cuda_app PROPERTIES
  CUDA_ARCHITECTURES 70)  # Hardcoded for one GPU (Volta / V100)
This works... until you run it on a GPU the binary has no code for, at which point kernels fail at runtime with CUDA's 'no kernel image is available for execution on the device'. It also breaks the moment you distribute your code to others. Not ideal.
The Solution: Let CMake Do The Work
Modern Approach (CMake 3.23+)
The simplest starting point? Don't set it at all.
cmake_minimum_required(VERSION 3.23)
project(MyCudaApp LANGUAGES CXX CUDA)
add_executable(my_app main.cu)
set_target_properties(my_app PROPERTIES
  CUDA_SEPARABLE_COMPILATION ON
  CXX_STANDARD 17)
# That's it! CMake picks a default architecture for you
When the variable is unset, CMake (3.18+) falls back to the default architecture reported by your CUDA compiler. That gets you a clean build out of the box, but the default is whatever nvcc targets by default (often an older architecture), not necessarily the GPU in your machine. To compile for the GPU that's actually present, use the native value described next.
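Curious what default you got? A quick sanity check is to print the variable right after project():

cmake_minimum_required(VERSION 3.23)
project(MyCudaApp LANGUAGES CXX CUDA)
# By now compiler detection has filled in (and cached) a default value
message(STATUS "Default CUDA architectures: ${CMAKE_CUDA_ARCHITECTURES}")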
Using Special Architecture Values
CMake provides three special values for CMAKE_CUDA_ARCHITECTURES:
# Auto-detect GPU on build machine (CMake 3.24+)
set(CMAKE_CUDA_ARCHITECTURES native)
# Compile for ALL supported architectures (CMake 3.23+; slow builds, large binaries)
set(CMAKE_CUDA_ARCHITECTURES all)
# Compile for major architecture versions only (CMake 3.23+; a balanced choice)
set(CMAKE_CUDA_ARCHITECTURES all-major)
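If you want this choice to show up as a documented option in cmake-gui or ccmake, one pattern is to declare it as a cache variable with its allowed values. A sketch (the all-major default here is just an example), placed before project() so compiler detection sees it:

# Sketch: expose the architecture choice as a documented cache option
set(CMAKE_CUDA_ARCHITECTURES "all-major" CACHE STRING "CUDA architectures to compile for")
set_property(CACHE CMAKE_CUDA_ARCHITECTURES PROPERTY STRINGS native all all-major)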
Flexible Configuration
For maximum flexibility, allow users to override while providing sensible defaults:
cmake_minimum_required(VERSION 3.24)

# Default to native detection, but allow override.
# Set this BEFORE project(): enabling CUDA caches a default value,
# so afterwards the variable is always DEFINED and this guard never fires.
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  set(CMAKE_CUDA_ARCHITECTURES native)
endif()

project(MyCudaApp LANGUAGES CXX CUDA)

add_executable(my_app main.cu)
set_target_properties(my_app PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
Users can now override at configure time:
cmake -DCMAKE_CUDA_ARCHITECTURES="80;86" ..
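Note that CMAKE_CUDA_ARCHITECTURES merely initializes each target's CUDA_ARCHITECTURES property at creation time, so you can also override per target, which is useful when one binary needs a newer architecture than the rest of the project. A small sketch (the hopper_tool target is hypothetical):

# Hypothetical extra target, for illustration only
add_executable(hopper_tool hopper.cu)
# Override just this target: Hopper SASS plus embedded PTX
set_property(TARGET hopper_tool PROPERTY CUDA_ARCHITECTURES 90)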
Understanding CUDA Compute Capabilities
Here's a quick reference for common GPU architectures:
| Architecture | Compute Capability | Examples |
|---|---|---|
| Pascal | 60, 61 | P100, GTX 1080 |
| Volta | 70 | V100, Titan V |
| Turing | 75 | RTX 2080, T4 |
| Ampere | 80, 86, 87 | A100, RTX 3090, Orin |
| Ada Lovelace | 89 | RTX 4090, L40 |
| Hopper | 90 | H100, H200 |
Real vs Virtual Architectures
When you specify architectures, you can use suffixes:
# Generate native code only (SASS) - faster runtime, GPU-specific
set(CMAKE_CUDA_ARCHITECTURES 80-real)
# Generate PTX intermediate code only - JIT-compiled by the driver at load time, forward compatible
set(CMAKE_CUDA_ARCHITECTURES 80-virtual)
# Generate both (default if no suffix)
set(CMAKE_CUDA_ARCHITECTURES 80)
Rule of thumb:
- Use `-real` for production builds targeting specific hardware
- Omit the suffix for distribution (includes PTX for forward compatibility)
- Never use `-virtual` alone unless you have a specific reason
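For intuition, here is roughly how the suffixes translate into nvcc's -gencode flags (a sketch; the exact flags CMake emits vary by version):

# 80-real    ->  -gencode arch=compute_80,code=[sm_80]             (SASS only)
# 80-virtual ->  -gencode arch=compute_80,code=[compute_80]        (PTX only)
# 80         ->  -gencode arch=compute_80,code=[sm_80,compute_80]  (SASS + PTX)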
Practical Examples
Development Build (Fast Iteration)
# Just compile for your local GPU
set(CMAKE_CUDA_ARCHITECTURES native)
Distribution Build (Maximum Compatibility)
# Cover common datacenter and consumer GPUs (2017-2024)
set(CMAKE_CUDA_ARCHITECTURES 70-real 75-real 80-real 86-real 89 90)
Note: architectures listed without -real (here 89 and 90) embed PTX alongside SASS, which gives forward compatibility with newer GPUs.
Edge Deployment (Specific Hardware)
# You know your target hardware
set(CMAKE_CUDA_ARCHITECTURES 87) # NVIDIA Orin
CI/CD Pipeline
if(DEFINED ENV{CI})
  # In CI, compile for multiple targets
  set(CMAKE_CUDA_ARCHITECTURES 70 75 80 86 89 90)
else()
  # Local development: use native
  set(CMAKE_CUDA_ARCHITECTURES native)
endif()
Complete Example
Here's a production-ready CMakeLists.txt:
cmake_minimum_required(VERSION 3.24)

# Architecture configuration - must come BEFORE project(), because enabling
# CUDA caches a default value and the guard below would never fire afterwards
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  if(DEFINED ENV{CI})
    # CI: compile for multiple common architectures
    set(CMAKE_CUDA_ARCHITECTURES 70 75 80 86 89 90)
    message(STATUS "CI detected: compiling for multiple architectures")
  else()
    # Local: auto-detect current GPU
    set(CMAKE_CUDA_ARCHITECTURES native)
    message(STATUS "Using native GPU architecture detection")
  endif()
endif()

project(CudaVisionApp VERSION 1.0.0 LANGUAGES CXX CUDA)

message(STATUS "CUDA architectures: ${CMAKE_CUDA_ARCHITECTURES}")
# Your CUDA executable
add_executable(vision_app
  src/main.cu
  src/kernel.cu
)
target_include_directories(vision_app PRIVATE include)
set_target_properties(vision_app PROPERTIES
  CUDA_SEPARABLE_COMPILATION ON
  CXX_STANDARD 17
  CUDA_STANDARD 17
)
# Optional: link against cuDNN, TensorRT, etc.
# find_package(CUDAToolkit REQUIRED)
# target_link_libraries(vision_app PRIVATE CUDA::cudart)
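If you do need toolkit libraries, find_package(CUDAToolkit) has shipped with CMake since 3.17 and provides imported targets. A sketch, with cuBLAS standing in for whatever you actually use:

find_package(CUDAToolkit REQUIRED)
target_link_libraries(vision_app PRIVATE
  CUDA::cudart   # CUDA runtime library
  CUDA::cublas   # illustration only - link it if you actually call cuBLAS
)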
Build Commands
# Use defaults (native detection)
cmake -B build
cmake --build build
# Override for specific architecture
cmake -B build -DCMAKE_CUDA_ARCHITECTURES="80;86"
cmake --build build
# Override for all major architectures
cmake -B build -DCMAKE_CUDA_ARCHITECTURES=all-major
cmake --build build
# Check what architectures were used
cmake -B build -DCMAKE_CUDA_ARCHITECTURES=native --trace-expand | grep CUDA_ARCHITECTURES
Common Pitfalls
❌ Don't hardcode architectures in CMakeLists.txt
set(CMAKE_CUDA_ARCHITECTURES 70) # Bad!
✅ Do provide flexible defaults
# Before project(), so CUDA compiler detection hasn't cached a value yet
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  set(CMAKE_CUDA_ARCHITECTURES native)
endif()
❌ Don't use all for production
set(CMAKE_CUDA_ARCHITECTURES all) # Extremely slow builds, huge binaries
✅ Do target your actual deployment hardware
set(CMAKE_CUDA_ARCHITECTURES 80-real 86-real 89) # A100, RTX30xx, RTX40xx + PTX
Version Compatibility
| CMake Version | Feature |
|---|---|
| 3.18+ | CMAKE_CUDA_ARCHITECTURES variable and CUDA_ARCHITECTURES target property |
| 3.23+ | all and all-major special values |
| 3.24+ | native value (build-machine GPU detection) |
If you're stuck on CMake older than 3.24, there is no native value: you'll either have to detect the GPU yourself or require users to specify architectures.
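One possible fallback, as a hedged sketch: it assumes a single-GPU machine, nvidia-smi on the PATH, and a driver new enough to support the compute_cap query field.

# Approximate `native` on CMake < 3.24; place before project()
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  execute_process(
    COMMAND nvidia-smi --query-gpu=compute_cap --format=csv,noheader
    OUTPUT_VARIABLE detected_cc
    RESULT_VARIABLE smi_result
    OUTPUT_STRIP_TRAILING_WHITESPACE)
  if(smi_result EQUAL 0)
    # Keep the first GPU's capability and drop the dot: "8.6" -> "86"
    string(REGEX MATCH "^[0-9]+\\.[0-9]+" first_cc "${detected_cc}")
    string(REPLACE "." "" first_cc "${first_cc}")
    set(CMAKE_CUDA_ARCHITECTURES "${first_cc}")
  else()
    message(FATAL_ERROR "No GPU detected - pass -DCMAKE_CUDA_ARCHITECTURES explicitly")
  endif()
endif()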
Conclusion
For most projects in 2025:
- Use CMake 3.24 or newer
- Default `CMAKE_CUDA_ARCHITECTURES` to `native` (set it before `project()`)
- Allow override via the command line: `-DCMAKE_CUDA_ARCHITECTURES="..."`
- For distribution, explicitly list your target architectures
This gives you fast local development while maintaining flexibility for CI/CD and distribution.