rXg Knowledge Base

GPU recommendations for rXg LLM

May 16, 2024

rXg includes LLM and RAG capabilities that depend on GPUs.

The recommended production deployment architecture places the GPU(s) in the Fleet Manager: each rXg edge performs local RAG while leveraging the remote LLM on the Fleet Manager. It is recommended to use WireGuard to create an SD-WAN between the rXg edges and the rXg Fleet Manager.
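rXg manages these tunnels through its own configuration, but the underlying link is ordinary WireGuard peering. A minimal sketch of what an edge-side peer definition looks like (the keys, addresses, port, and endpoint hostname below are placeholders for illustration, not rXg defaults):

```ini
# Hypothetical edge-side WireGuard config tunneling to the Fleet Manager.
[Interface]
PrivateKey = <edge-private-key>
Address = 10.99.0.2/24

[Peer]
# Fleet Manager (hosts the GPUs / LLM)
PublicKey = <fleet-manager-public-key>
Endpoint = fleet-manager.example.com:51820
AllowedIPs = 10.99.0.1/32
PersistentKeepalive = 25
```

Each edge gets its own [Peer] entry on the Fleet Manager side, forming a hub-and-spoke SD-WAN.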

For very basic testing purposes, a GPU with 8 GB of VRAM can run quantized Mistral 7b and Llama 3 8b. Examples of suitable GPUs include the Nvidia 3070 and Nvidia 4060.

The minimum PoC configuration is 24 GB of VRAM, which can run fp16 Llama 3 8b, 2-bit Llama 3 70b, and 3-bit Mixtral 8x7b. Examples include the Nvidia 3090 and Nvidia 4090.
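These VRAM figures follow from a rough weight-only estimate: parameter count times bits per weight, divided by 8. A sketch of the arithmetic (parameter counts are approximate, and real usage is higher because of the KV cache and runtime overhead, so treat these as lower bounds):

```python
# Rough weight-only VRAM estimate: parameters (billions) * bits-per-weight / 8.
# Ignores KV cache, activations, and framework overhead.
def weight_vram_gb(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * bits_per_weight / 8

print(f"Llama 3 8b   fp16 : {weight_vram_gb(8, 16):.1f} GB")    # 16.0 GB
print(f"Llama 3 70b  2-bit: {weight_vram_gb(70, 2):.1f} GB")    # 17.5 GB
print(f"Mixtral 8x7b 3-bit: {weight_vram_gb(46.7, 3):.1f} GB")  # 17.5 GB
```

All three land under 24 GB on paper, which is why a single 3090- or 4090-class card is the practical PoC floor.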

The Nvidia 3090 Founders Edition is readily available at a reasonable price and occupies three slots. Most Nvidia 4090 models require four slots and are extremely heavy.

The best "bang for the buck" is to install multiple 24 GB GPUs such as the Nvidia 3090. rXg will automatically utilize multiple cards.

48 GB (2 x 24 GB) of VRAM allows the use of more precise quantizations of Llama 3 70b and Mixtral 8x7b.

72 GB (3 x 24 GB) of VRAM allows the use of Mixtral 8x22b, a model that generates excellent inferences.
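The same weight-only estimate (parameters x bits / 8, ignoring KV-cache and runtime overhead) shows roughly which quantizations each multi-card budget unlocks. The parameter counts and bit-widths below are illustrative assumptions, not rXg-validated figures:

```python
# Approximate weight-only VRAM need per model/quantization, in GB.
MODELS = {
    "Llama 3 70b @ 4-bit":   70 * 4 / 8,     # 35.0 GB
    "Mixtral 8x7b @ 4-bit":  46.7 * 4 / 8,   # ~23.4 GB
    "Mixtral 8x22b @ 3-bit": 141 * 3 / 8,    # ~52.9 GB
}

def fits(budget_gb: float) -> list[str]:
    """Models whose weights fit the budget (no overhead accounted for)."""
    return [name for name, gb in MODELS.items() if gb <= budget_gb]

for cards in (1, 2, 3):
    budget = cards * 24  # e.g. stacked Nvidia 3090s
    print(f"{cards} x 24 GB = {budget} GB:", fits(budget))
```

Borderline fits on paper can still fail in practice once the KV cache and runtime overhead are added, so leave headroom.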

For production environments, the Nvidia professional line of GPUs is highly recommended. The Nvidia RTX 6000 Ada has 48 GB of VRAM, active cooling, and occupies two slots. This is the highest-density Nvidia card with active cooling that will work in any chassis.

The Nvidia L40 (48 GB) and H100 PCIe (80 GB) are the most powerful GPUs that can be considered. These GPUs require a special chassis: their heatsinks are passive and depend on blower fans integrated into the chassis.

