# GPU Flavors
Safespring Compute offers GPU-enabled flavors for workloads that require hardware-accelerated computing, such as machine learning inference, LLM hosting, and other GPU-intensive applications.
This page includes OpenStack CLI commands. See the API Access documentation for instructions on how to install and configure the command line client.
## Available GPU hardware
GPU flavors on Safespring are equipped with the NVIDIA A2 accelerator:
| Property | Specification |
|---|---|
| GPU model | NVIDIA A2 |
| GPU memory | 16 GB GDDR6 |
| Architecture | Ampere |
| Use case | Inference, lightweight training, video encoding |
## GPU flavor naming
GPU flavors follow the same naming convention as standard flavors, with an additional gA2 suffix indicating the attached GPU. For example:
| Flavor name | VCPUs | RAM | GPU |
|---|---|---|---|
| b2.c4r8.gA2 | 4 | 8 GB | 1x NVIDIA A2 |
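To confirm the exact resources behind a flavor in your own project, you can inspect it with the OpenStack CLI (assuming the client is installed and configured as described in the API Access documentation):

```bash
# Show the full definition of a GPU flavor, including vCPUs, RAM, and extra specs
openstack flavor show b2.c4r8.gA2

# List all flavors with an attached A2 GPU
openstack flavor list | grep gA2
```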
> **Availability**
>
> GPU flavors are currently available at the STO2 site. Contact support to verify availability and to have GPU flavors enabled for your project.
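Once GPU flavors are enabled, launching a GPU instance works like launching any other instance; only the flavor differs. A minimal sketch (the image, key pair, and network names below are placeholders for values from your own project):

```bash
openstack server create \
  --flavor b2.c4r8.gA2 \
  --image ubuntu-24.04 \
  --key-name mykey \
  --network mynetwork \
  gpu-instance-1
```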
## Restrictions
- GPU flavors cannot be converted to non-GPU flavors, and vice versa. See the Flavors documentation for more details on resizing restrictions.
- Each GPU flavor provides a single GPU passthrough to the instance. Multi-GPU configurations are not available through standard flavors.
## Setting up NVIDIA drivers on an instance
GPU flavors provide the hardware, but the instance operating system needs NVIDIA drivers installed to use the GPU. The following example uses Ubuntu 24.04.
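Before installing anything, you can confirm that the GPU has been passed through to the instance:

```bash
# The A2 should appear as an NVIDIA PCI device
lspci | grep -i nvidia
```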
### 1. Install the driver
```bash
sudo apt update && sudo apt upgrade -y
sudo apt install ubuntu-drivers-common
```
List available drivers to find the recommended version:
```bash
ubuntu-drivers devices
```
Install the recommended server driver:
```bash
sudo apt install nvidia-driver-580-server-open
sudo reboot
```
### 2. Verify the GPU
After reboot, verify that the GPU is detected:
```bash
nvidia-smi
```
This should display the NVIDIA A2 GPU, the driver version, and CUDA version. Example output:
```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.172.08             Driver Version: 570.172.08     CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|=========================================+========================+======================|
|   0  NVIDIA A2                      On  |   00000000:00:05.0 Off |                    0 |
|  0%   35C    P8              5W /   60W |       0MiB /  15356MiB |      0%      Default |
+-----------------------------------------+------------------------+----------------------+
```
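For scripted checks, nvidia-smi can also produce machine-readable output, for example:

```bash
# Query selected GPU properties as CSV
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv
```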
## Example: Running a local LLM with Ollama
A common use case for GPU flavors is hosting local LLMs for inference. The following example uses Ollama to run models and Open-WebUI to provide a browser-based chat interface.
### Install Ollama
```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama -v
```
### Pull and run a model
```bash
ollama pull llama3:8b
ollama run llama3:8b
```
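`ollama run` opens an interactive chat session in the terminal. To confirm the model downloaded correctly, or to get a single answer without entering the interactive session, you can also do:

```bash
# List downloaded models
ollama list

# One-shot prompt, no interactive session
ollama run llama3:8b "Explain GPU passthrough in one sentence."
```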
You can monitor GPU utilization while the model is running:
```bash
nvidia-smi
```
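To refresh the view continuously while a prompt is being processed, one option is:

```bash
# Re-run nvidia-smi every second
watch -n 1 nvidia-smi
```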
### Access Ollama remotely via SSH forwarding
Ollama listens on localhost:11434. To access it from your local machine, use SSH port forwarding:
```bash
ssh -L 11434:localhost:11434 ubuntu@<instance-ip>
```
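With the tunnel open, the Ollama API responds on your local machine. A quick sketch using the standard /api/generate endpoint:

```bash
# Send a single prompt through the tunnel; stream=false returns the whole response at once
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3:8b", "prompt": "Hello", "stream": false}'
```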
### Add a web interface with Open-WebUI
Install Docker on the instance:
```bash
sudo apt install -y docker.io
sudo systemctl enable --now docker
sudo usermod -aG docker "$USER"
```
Log out and back in so the group change takes effect, this time also forwarding the port for the web interface:
```bash
ssh -L 8080:localhost:8080 ubuntu@<instance-ip>
```
Start Open-WebUI:
```bash
docker run -d \
  --name open-webui \
  --network=host \
  -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
  -v open-webui:/app/backend/data \
  --restart always \
  ghcr.io/open-webui/open-webui:latest
```
Open http://localhost:8080 in your browser to access the chat interface. All processing happens locally on your instance.
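If the interface does not come up, the container logs are the first place to look:

```bash
# Follow Open-WebUI startup logs
docker logs -f open-webui
```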
## GPU in Kubernetes
If you are using Safespring's On-demand Kubernetes service, GPU support is available through worker nodes with GPU flavors. See the Kubernetes GPU documentation for details on how to use GPUs in Kubernetes workloads.
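As a rough sketch of what a GPU workload looks like there, a pod can request a GPU through the standard nvidia.com/gpu resource. This assumes the cluster's GPU worker nodes run the NVIDIA device plugin and that the image tag is available; see the Kubernetes GPU documentation for the authoritative setup:

```bash
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1   # one A2 scheduled via the NVIDIA device plugin
EOF
```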