Running OpenAI's Open Source LLM with vLLM on an NVIDIA L4 GPU
This guide runs the openai/gpt-oss-20b model locally on an NVIDIA L4 GPU. The model can even run on a consumer RTX-series GPU with about 16 GB of VRAM. I divided the guide into two parts, running the model manually and running it in a container, both on the Ubuntu 24.04 LTS operating system.

Preparation

Installing drivers and dependencies

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb && rm -rf cuda-keyring_1.1-1_all.deb
sudo apt update && sudo apt install -y \
  linux-headers-$(uname -r) \
  libnvidia-compute-580 nvidia-dkms-580-open \
  datacenter-gpu-manager-4-cuda-all \
  datacenter-gpu-manager-exporter \
  cuda-toolkit nvtop build-essential

Reboot the host so that the new GPU driver is loaded.

...
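After the reboot it can help to confirm that the driver is loaded and that the GPU has enough memory for the model. The following is a minimal sketch, not part of the original guide: `check_vram` is a hypothetical helper name, and the 16000 MiB threshold used below is an illustrative value loosely based on the ~16 GB figure above, not a measured requirement. The helper reads the output of `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits` (one "name, MiB" line per GPU) from stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_vram MIN_MIB
# Reads "name, memory_total_mib" lines (nvidia-smi CSV, no header, no units)
# from stdin and reports whether each GPU has at least MIN_MIB MiB.
# Returns 0 if every GPU passes, 1 if any GPU is too small, unparsable,
# or if no GPU lines were read at all (e.g. the driver is not loaded).
check_vram() {
  local min_mib="$1"
  local status=0 found=0
  local name mem
  # "|| [ -n "$name" ]" also handles a last line without a trailing newline
  while IFS=',' read -r name mem || [ -n "$name" ]; do
    mem="${mem//[[:space:]]/}"        # drop spaces around the number
    [ -z "$name$mem" ] && continue    # ignore blank lines
    found=1
    case "$mem" in
      ''|*[!0-9]*)
        echo "??? ${name}: could not parse memory value '${mem}'"
        status=1
        continue
        ;;
    esac
    if [ "$mem" -ge "$min_mib" ]; then
      echo "OK ${name}: ${mem} MiB"
    else
      echo "LOW ${name}: ${mem} MiB (< ${min_mib} MiB)"
      status=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no GPUs reported"
    return 1
  fi
  return "$status"
}
```

Usage after the reboot: `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits | check_vram 16000`. An L4 (24 GB) should pass. If nvidia-smi prints nothing because the driver did not load, the helper reports "no GPUs reported" and returns a non-zero status.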