# GPU Acceleration
Presidio supports GPU acceleration for NLP models, which can significantly improve performance when processing large volumes of text. GPU is automatically detected and used when available—no code changes required.
## Prerequisites

### Hardware Requirements
| Hardware | Requirement |
|---|---|
| NVIDIA GPU | CUDA Toolkit 11.x or 12.x |
| Apple Silicon | MPS (Metal Performance Shaders) on M1/M2/M3 |
| CPU | Automatic fallback when GPU is unavailable |
### Software Dependencies

For NVIDIA GPUs, install the appropriate CUDA library matching your CUDA version:

```sh
pip install "spacy[cuda12x]"  # For CUDA 12.x
# or
pip install "spacy[cuda11x]"  # For CUDA 11.x
```
!!! warning "Silent Fallback"
    If `cupy` is not installed, or its version does not match your CUDA driver, spaCy will silently fall back to CPU. Always verify your setup using the methods below.
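The silent fallback can also be surfaced programmatically: `spacy.prefer_gpu()` returns `True` only if a usable GPU was actually activated, while `spacy.require_gpu()` raises an error instead of falling back, which is useful in CI or deployment checks. A minimal sketch (the helper name `spacy_device` is illustrative, not a Presidio API):

```python
def spacy_device():
    """Report whether spaCy activated a GPU, instead of relying on
    the silent CPU fallback. Returns "gpu", "cpu", or
    "spacy-not-installed". Illustrative helper, not part of Presidio.
    """
    try:
        import spacy
    except ImportError:  # spaCy not installed in this environment
        return "spacy-not-installed"
    try:
        # prefer_gpu() returns True only when a working GPU
        # (cupy + matching driver) was found and activated.
        return "gpu" if spacy.prefer_gpu() else "cpu"
    except Exception:  # broken cupy install: same outcome as the fallback
        return "cpu"
```

In a deployment script you might call `spacy.require_gpu()` instead, so a misconfigured environment fails loudly rather than silently running on CPU.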
For Apple Silicon, no additional dependencies are required; MPS is detected automatically.
## Verifying GPU Usage

To check whether a GPU is available and being used:

```python
import torch

if torch.cuda.is_available():
    print(f"GPU available: CUDA ({torch.cuda.get_device_name(0)})")
elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
    print("GPU available: MPS (Apple Silicon)")
else:
    print("Using CPU")
```
You can also monitor GPU usage during processing:

```sh
# For NVIDIA GPUs
watch -n 1 nvidia-smi

# For Apple Silicon
sudo powermetrics --samplers gpu_power
```
## GPU-Enabled NLP Engines
| Engine | GPU Support |
|---|---|
| TransformersNlpEngine | ✅ Full |
| GLiNER | ✅ Full |
| Stanza | ✅ Full |
| spaCy (transformer models) | ✅ Full |
| spaCy (standard models) | ⚠️ Not recommended |
!!! note "Standard spaCy Models"
    GPU is recommended for Transformers, Stanza, and GLiNER workloads. Standard spaCy models (e.g., `en_core_web_lg`) may be slower on GPU due to data transfer overhead.
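To use one of the GPU-capable engines explicitly, you can pass a configuration to `NlpEngineProvider` from `presidio-analyzer`. A sketch, with illustrative model names; the `try/except` only keeps the example self-contained where `presidio-analyzer` or the models are not installed:

```python
# Illustrative configuration for a transformers-based NLP engine.
# Model names are examples; substitute the models you actually use.
configuration = {
    "nlp_engine_name": "transformers",
    "models": [{
        "lang_code": "en",
        "model_name": {
            "spacy": "en_core_web_sm",          # tokenization on CPU
            "transformers": "dslim/bert-base-NER",  # NER, GPU when available
        },
    }],
}

try:
    from presidio_analyzer.nlp_engine import NlpEngineProvider
    nlp_engine = NlpEngineProvider(nlp_configuration=configuration).create_engine()
except Exception:
    # presidio-analyzer or the models are not available in this environment
    nlp_engine = None
```

No GPU-specific flags are needed in the configuration itself: the engine picks up the device automatically, as described above.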
## Troubleshooting

### GPU not detected
1. Check that CUDA is installed:

    ```sh
    nvidia-smi
    nvcc --version
    ```

2. Verify that cupy can see your GPU:

    ```python
    import cupy as cp
    print(cp.cuda.runtime.getDeviceCount())  # Should be > 0
    ```

3. Reinstall cupy with the correct CUDA version:

    ```sh
    pip uninstall cupy cupy-cuda11x cupy-cuda12x
    pip install cupy-cuda12x  # Match your CUDA version
    ```
### Out of memory errors

If you see `RuntimeError: CUDA out of memory`:

- Use a smaller transformer model (e.g., `dslim/bert-base-NER` instead of larger models)
- Split longer texts into smaller chunks before processing
- Reduce the maximum sequence length if using Hugging Face transformers
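The chunking suggestion can be sketched in plain Python. This is illustrative, not a Presidio API: `max_chars` is a simple character budget (not a model token limit), and the sentence-boundary heuristic is deliberately naive:

```python
def chunk_text(text, max_chars=500):
    """Split text into chunks of at most max_chars, preferring to
    break at sentence boundaries so entities are not cut mid-sentence.

    Returns (offset, chunk) pairs; add each offset back onto the
    entity positions reported for its chunk to recover positions in
    the original text. Illustrative sketch only.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Back up to the last sentence end inside the window, if any.
            cut = text.rfind(". ", start, end)
            if cut > start:
                end = cut + 1
        chunks.append((start, text[start:end]))
        start = end
    return chunks
```

Each chunk can then be passed to the analyzer separately, keeping per-call GPU memory bounded.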
### CPU fallback

Presidio automatically uses the CPU if:
- No GPU is detected
- CUDA or cupy are not installed
- GPU initialization fails
This ensures code portability across different environments.
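The fallback order above can be sketched as a small device-selection helper. This is illustrative, not Presidio's internal code; it mirrors the CUDA, then MPS, then CPU order and swallows initialization failures the same way:

```python
def pick_device():
    """Sketch of the CUDA -> MPS -> CPU fallback order described
    above. Returns a string usable with torch.device(...).
    Illustrative only, not Presidio's actual implementation.
    """
    try:
        import torch
    except ImportError:  # no torch at all: CPU-only environment
        return "cpu"
    try:
        if torch.cuda.is_available():
            return "cuda"
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return "mps"
    except Exception:  # GPU initialization failed: fall back
        pass
    return "cpu"
```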
## Additional Resources
Related Presidio documentation: