# GPU Acceleration
Presidio supports GPU acceleration for NLP models, which can significantly improve performance when processing large volumes of text. GPU is automatically detected and used when available—no code changes required.
## Prerequisites

### Hardware Requirements
| Hardware | Requirement |
|---|---|
| NVIDIA GPU | CUDA Toolkit 11.x or 12.x |
| Apple Silicon | MPS (Metal Performance Shaders) on M1/M2/M3 |
| CPU | Automatic fallback when GPU is unavailable |
### Software Dependencies

For NVIDIA GPUs, install the appropriate CUDA library matching your CUDA version:

```sh
pip install "spacy[cuda12x]"  # For CUDA 12.x
# or
pip install "spacy[cuda11x]"  # For CUDA 11.x
```
!!! warning "Silent Fallback"
    If `cupy` is not installed, or its version does not match your CUDA driver, spaCy will silently fall back to CPU. Always verify your setup using the methods below.
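The silent fallback can also be surfaced programmatically: `spacy.prefer_gpu()` returns `True` only if a usable GPU was actually activated, while `spacy.require_gpu()` raises an error instead of falling back, which is useful in CI or deployment checks. A minimal sketch (the helper name `spacy_device` is illustrative, not a Presidio API):

```python
def spacy_device():
    """Report whether spaCy activated a GPU, instead of relying on
    the silent CPU fallback. Returns "gpu", "cpu", or
    "spacy-not-installed". Illustrative helper, not part of Presidio.
    """
    try:
        import spacy
    except ImportError:  # spaCy not installed in this environment
        return "spacy-not-installed"
    try:
        # prefer_gpu() returns True only when a working GPU
        # (cupy + matching driver) was found and activated.
        return "gpu" if spacy.prefer_gpu() else "cpu"
    except Exception:  # broken cupy install: same outcome as the fallback
        return "cpu"
```

In a deployment script you might call `spacy.require_gpu()` instead, so a misconfigured environment fails loudly rather than silently running on CPU.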
For Apple Silicon, no additional dependencies are required; MPS is detected automatically.
## Verifying GPU Usage

To check whether a GPU is available and being used:

```python
import torch

if torch.cuda.is_available():
    print(f"GPU available: CUDA ({torch.cuda.get_device_name(0)})")
elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
    print("GPU available: MPS (Apple Silicon)")
else:
    print("Using CPU")
```
You can also monitor GPU usage during processing:

```sh
# For NVIDIA GPUs
watch -n 1 nvidia-smi

# For Apple Silicon
sudo powermetrics --samplers gpu_power
```
## GPU-Enabled NLP Engines
| Engine | GPU Support |
|---|---|
| TransformersNlpEngine | ✅ Full |
| GLiNER | ✅ Full |
| Stanza | ✅ Full |
| spaCy (transformer models) | ✅ Full |
| spaCy (standard models) | ⚠️ Not recommended |
!!! note "Standard spaCy Models"
    GPU is recommended for Transformers, Stanza, and GLiNER workloads. Standard spaCy models (e.g., `en_core_web_lg`) may be slower on GPU due to data transfer overhead.
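To use one of the GPU-capable engines explicitly, you can pass a configuration to `NlpEngineProvider` from `presidio-analyzer`. A sketch, with illustrative model names; the `try/except` only keeps the example self-contained where `presidio-analyzer` or the models are not installed:

```python
# Illustrative configuration for a transformers-based NLP engine.
# Model names are examples; substitute the models you actually use.
configuration = {
    "nlp_engine_name": "transformers",
    "models": [{
        "lang_code": "en",
        "model_name": {
            "spacy": "en_core_web_sm",          # tokenization on CPU
            "transformers": "dslim/bert-base-NER",  # NER, GPU when available
        },
    }],
}

try:
    from presidio_analyzer.nlp_engine import NlpEngineProvider
    nlp_engine = NlpEngineProvider(nlp_configuration=configuration).create_engine()
except Exception:
    # presidio-analyzer or the models are not available in this environment
    nlp_engine = None
```

No GPU-specific flags are needed in the configuration itself: the engine picks up the device automatically, as described above.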
## Troubleshooting

### GPU not detected
1. Check that CUDA is installed:

    ```sh
    nvidia-smi
    nvcc --version
    ```

2. Verify that cupy can see your GPU:

    ```python
    import cupy as cp
    print(cp.cuda.runtime.getDeviceCount())  # Should be > 0
    ```

3. Reinstall cupy with the correct CUDA version:

    ```sh
    pip uninstall cupy cupy-cuda11x cupy-cuda12x
    pip install cupy-cuda12x  # Match your CUDA version
    ```
### Out of memory errors

If you see `RuntimeError: CUDA out of memory`:

- Use a smaller transformer model (e.g., `dslim/bert-base-NER` instead of larger models)
- Split longer texts into smaller chunks before processing
- Reduce the maximum sequence length if using Hugging Face transformers
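The chunking suggestion can be sketched in plain Python. This is illustrative, not a Presidio API: `max_chars` is a simple character budget (not a model token limit), and the sentence-boundary heuristic is deliberately naive:

```python
def chunk_text(text, max_chars=500):
    """Split text into chunks of at most max_chars, preferring to
    break at sentence boundaries so entities are not cut mid-sentence.

    Returns (offset, chunk) pairs; add each offset back onto the
    entity positions reported for its chunk to recover positions in
    the original text. Illustrative sketch only.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Back up to the last sentence end inside the window, if any.
            cut = text.rfind(". ", start, end)
            if cut > start:
                end = cut + 1
        chunks.append((start, text[start:end]))
        start = end
    return chunks
```

Each chunk can then be passed to the analyzer separately, keeping per-call GPU memory bounded.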
### CPU fallback

Presidio automatically uses the CPU if:
- No GPU is detected
- CUDA or cupy are not installed
- GPU initialization fails
This ensures code portability across different environments.
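The fallback order above can be sketched as a small device-selection helper. This is illustrative, not Presidio's internal code; it mirrors the CUDA, then MPS, then CPU order and swallows initialization failures the same way:

```python
def pick_device():
    """Sketch of the CUDA -> MPS -> CPU fallback order described
    above. Returns a string usable with torch.device(...).
    Illustrative only, not Presidio's actual implementation.
    """
    try:
        import torch
    except ImportError:  # no torch at all: CPU-only environment
        return "cpu"
    try:
        if torch.cuda.is_available():
            return "cuda"
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return "mps"
    except Exception:  # GPU initialization failed: fall back
        pass
    return "cpu"
```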
## Additional Resources
Related Presidio documentation: