rXg Knowledge Base

What is IDV (Internal Data Plane Virtualization) and how do I deploy it?

May 24, 2024

You have been told that you need to start using IDV. This likely happened because you were trying to support more than 1000 DPL on a single bare metal node.  The most obvious place to start is the question of why.

Why?

  • Lower Backend Login Times
    • The more configurations and devices you have on the rXg, the more complex the ruleset becomes. This ruleset, which encompasses queueing, routing, and the firewall, can become time-consuming to calculate and load. However, with IDV, this process is significantly expedited, allowing for quicker and more efficient operations.

  • Lower Latency
    • Latency can be introduced anytime you reload large rulesets. This is a bigger issue in MDU than in LPV.

  • Better utilize high core count machines
    • Are machines with huge core counts fully utilized? From a network throughput perspective, the answer is often no. There are limitations to how many processors can handle traffic on a given interface through a given operating system instance. However, with IDV, you can optimize your network's performance by dedicating more processors to passing packets, ensuring a more efficient system.

  • Symptoms that IDV may resolve
    • High Backend Login Times
    • Latency introduced by the rXg

Sizing
  • Residential: limit DPL size to 1000 per node.
  • LPV: can get away with 2000 DPL per node.

How do I determine the hardware needed for IDV?

Didn’t you know… There’s an app for that.  Check out our requirements calculator in the tools section of the RG Nets store.

1. Browse to https://store.rgnets.com/tools
2. Log In
3. Make sure that Auto Configure is selected.
4. Let’s take the example of 3000 SUL. We can deploy all 3k SUL on a single bare-metal machine, but we will need to use IDV to break up the data plane into (3) virtual machines.


  1. Click here to expand the overall machine specifications. 
  2. Hover and click to see the specs for the CP.
  3. Hover and click to see the specs for the DP.
  4. Hover and click to see the specs for the DP.
  5. Hover and click to see the specs for the DP.
  6. Hover and click to see the total system specifications.


Note: This is a very simple configuration.  Configurations that support higher SUL counts and high availability are also available by adjusting the calculator's configuration.

How will my licensing change for IDV?

You will need to work with RG Nets support to modify your asset from a standalone configuration to an IDV configuration.  Open a ticket at support.rgnets.com to request this modification.  If you have an idea of how many DPL per node you will require, be sure to include this information in the ticket.

How will my configuration change for IDV?

An IDV configuration is a cluster configuration.  If you have previously built clustered rXgs, you already know the theory for building an IDV cluster.  The difference between traditional clusters and IDV is that IDV nodes are built within an rXg instance leveraging the underlying FreeBSD feature Bhyve.  Traditional clusters use bare metal machines or some other form of virtualization, such as ESXi.

At a high level, we will split the users across all (3) DP Nodes.  For the sake of our example, let’s say we are planning for the following use case.

Resident Network: 2000 Simultaneous Users Max
Guest Network: 500 Simultaneous Users Max
Infrastructure: 200 Devices


On a single node, you would probably do something like this:


Standalone Node
Public IPs: (1 for the rXg WAN, recommended 1 per 100 SUL for NAT) = 31 Public IPs
Guest Network: 500 VLANs / 1 Device per VLAN (/30)
Resident Network: 400 VLANs / 5 Devices per VLAN (/29)
Infrastructure: 200 Devices / 200 Devices per VLAN (/24)
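
The subnet sizes map directly to the device counts above.  Each VLAN consumes a network address, a broadcast address, and a gateway address for the rXg, so the usable client space works out roughly like this:

/30 =   4 addresses - 3 (network, broadcast, gateway) =   1 client  (Guest)
/29 =   8 addresses - 3                               =   5 clients (Resident)
/24 = 256 addresses - 3                               = 253 clients (Infrastructure; 200 devices fit comfortably)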

In an IDV deployment with (1) CC and (3) DPs, you would change to something like this:


CC
Public IPs: (1 for the rXg WAN) = 1 Public IP
Infrastructure: 200 Devices

Node 1
Public IPs: (1 for the rXg WAN, recommended 1 per 100 SUL for NAT) = 11 Public IPs
Resident Network: 200 VLANs / 5 Devices per VLAN (/29) = 1000 SUL

Node 2
Public IPs: (1 for the rXg WAN, recommended 1 per 100 SUL for NAT) = 11 Public IPs
Resident Network: 200 VLANs / 5 Devices per VLAN (/29) = 1000 SUL

Node 3
Public IPs: (1 for the rXg WAN, recommended 1 per 100 SUL for NAT) = 11 Public IPs
Guest Network: 500 VLANs / 1 Device per VLAN (/30) = 500 SUL


This architecture splits the loads across the nodes and keeps similar rulesets grouped together. We can assume the firewall experience for the resident network will be the same for all users, whereas the network experience for the Guest and Infrastructure will be different. As much as possible, we try to group similar user experiences together on the same nodes.

Regarding public IPs, we recommend 1 public IP per 100 SUL.  So for 3000 SUL, we would recommend 30 public IP addresses be dedicated to client NAT.  Note that you still need (1) IP for every rXg WAN.  For a standalone node, this would be (1) IP address.  In this IDV example of a CC and 3 DP nodes, we would need (4) public IPs for the WANs.
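
A quick tally for this example, using the 1-per-100-SUL rule:

WAN IPs:  1 (CC) + 3 (DP nodes)      =  4
NAT IPs:  3000 SUL / 100 SUL per IP  = 30
Total public IPs for the deployment  = 34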

Let’s build this out from scratch…


Make a plan

I am going to build a scaled-down model for the lab.  Here’s the plan.

Resident Network: 200 Simultaneous Users Max
Guest Network: 50 Simultaneous Users Max
Infrastructure: 20 Devices


Prepare the server

I will build a virtual machine in ESXi with the following specifications for testing. This machine could also be bare metal, but I use ESXi for most lab testing.

16 Cores (4 Per Node)
32 GB RAM (8 Per Node)
400 GB SSD (100 Per Node)
24.240.254.7-10/28

It is best practice to use the current official version when building new systems.

Let’s use this video to build the cluster: https://youtu.be/I72giHt7YXA?si=SbmJjDZzpSkbo7T4 

Notes: 
  1. To complete this lab, you must create (1) additional data plane node not mentioned in the video.
  2. In an HA configuration, each bare metal system can be a virtualization host.

If you browse to Services >> Virtualization, your config should look something like this:

If you browse to System >> Cluster, your config should look something like this:


Apply the MDU Template

I will apply the MDU template from our template library to get a head start on the configuration.

Once you download the template, you can apply it by navigating to System >> Backup >> Config Templates.
  1. Click Create New
  2. Name: MDU Template
  3. Choose File: {Upload the file from our template library}
  4. Click Create
  5. Click Edit on the row with the label MDU Template
  6. Update the template with your WAN and LAN interfaces as indicated.  Make sure you copy these directly from the rXg config.  The interface names will include cc1 in a cluster.  For example, ‘cc1 vmx0’ is my WAN interface, and ‘cc1 vmx2’ is my LAN interface.
  7. Click Update.
  8. Click Test on the row with the label MDU Template.
  9. If the test is successful, you can proceed with Apply.


Adapt the MDU configuration to our lab application.


Configure the WAN Interface

1. Let’s update the WAN interfaces.  Browse to Network >> WAN.
2. We must edit each WAN interface and change the configuration from DHCP to static with the proper gateway address.  In my case, that will be 24.240.254.1.

3. Moving forward, a crucial step is configuring the public static IPs for each dataplane. To do this, navigate to Network >> WAN >> Network Addresses and create the necessary new records.
4. If you've followed the steps correctly, your configuration should resemble this:



Configure the LAN interface


Based on our plan above, we need to create enough networks for the following:

Resident Network: 200 Simultaneous Users Max >> 100 on Node 1, 100 on Node 2
Guest Network: 50 Simultaneous Users Max  >> Node 3
Infrastructure: 20 Devices >> CC

We are going to modify the configuration created by the MDU template to look like this:



Some things to take note of:

  1. The VLANs have been distributed to each DP's highest interface (LAN).
  2. I have used auto increment to specify the number of VLANs required.
  3. The network addresses are associated with the VLANs.
  4. Notice that dp3 Post-Auth Accounts have not been added to the IP group.  We need to do that next.

Adjust IP Groups


As mentioned above, we need to be sure to update the IP groups.  
  1. Browse to Identities >> Groups >> IP Groups.
  2. Ensure that all Address pools you created for Guests and Accounts are associated with the correct IP group.  Below is my lab example after making adjustments.
  3. I will also add an IP group for my infrastructure network and associate it with the management policy.  This group should have a higher priority.


Adjust DHCP Pools


Since we made some changes to the network addresses, we need to confirm that the DHCP scopes are set up correctly.  I can see here that they are not.
Adjust the DHCP pools so that they look like this:
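
As a rough sketch, a single /29 resident pool might look like this (the addresses are placeholders; use the network addresses you actually created above):

Network:    10.100.1.0/29
Gateway:    10.100.1.1  (rXg VLAN address)
DHCP range: 10.100.1.2 - 10.100.1.6  (5 client leases, matching 5 devices per VLAN)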

Adjust RADIUS Realms


We also need to ensure that all VLANs are associated with the RADIUS Realms. To do so, browse to Services >> RADIUS >> RADIUS Server Realms.

Currently, my RADIUS realms look like this:

I have adjusted it to look like this:

Note: This is a key step, as RADIUS is responsible for ensuring users are load-balanced across the available dataplanes.

This completes the IDV Configuration.

Troubleshooting

The troubleshooting steps are mostly the same as those for a traditional rXg cluster. In this section, I will highlight helpful additional items. First and foremost, let’s review the GUI.

Status Check

1. You can check the overall cluster health between the nodes by browsing to System >> Cluster >> Cluster Nodes.
All nodes are reachable and have communicated within the last 10 seconds.

2. Next, we can ensure traffic flows to and from the nodes by browsing to Instruments >> Traffic Rates.
3. Checking health notices is another good way to find out if any problems are occurring.  Browse to Archives >> Notification Logs.  Typically, it is a good idea to clear any existing health notices and see which ones come back.  Those that come back should be evaluated and investigated if needed.
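
Beyond the GUI, a quick reachability check from the CC over the CIN is an easy way to confirm that each DP is responding.  The addresses below are placeholders for whatever CIN addressing your cluster uses:

ping -c 3 10.77.0.2   # dp1 CIN address (placeholder)
ping -c 3 10.77.0.3   # dp2 CIN address (placeholder)
ping -c 3 10.77.0.4   # dp3 CIN address (placeholder)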


Virtualization Troubleshooting

rXg leverages the virtualization technology (Bhyve) built into the underlying operating system (FreeBSD).  Additional information beyond this article can be found by searching for the keywords Bhyve and FreeBSD.

How do I access the dp console?

1. SSH to the virtualization host.  In my case, this will be CC1.
2. You must elevate your session to root to access the virtualization commands.
3. Typing vm list will provide you with a list of VM names that will be helpful in the next step.
4. Now, to access the console for any one of the dataplanes, you can type vm console dp2, for example.
5. Now press return once more and you should see a login prompt.  If you press Ctrl D, it will refresh the screen and you will be able to see things like the host name, version, IUI, and interface details.
6. The login will be root.  You can then copy and paste the first block of letters from the IUI for the password.
7. A successful login will look like this:
8. To log out, you can press ~~. (Tilde Tilde Period) 

Note: Sometimes you have to do this a couple of times.  Also, make sure your console window is active when pressing this.  If that does not work, try ~ Ctrl D. (Tilde + Ctrl D)
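
Condensed into a quick reference, the console workflow above looks like this (run on the virtualization host, CC1, as root):

vm list              # list the VM names (dp1, dp2, dp3, ...)
vm console dp2       # attach to the console of dp2
                     # press return for a login prompt; log in as root
~~.                  # detach from the console (or ~ Ctrl D if that does not register)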

Virtualization Host Networking Stack

The host networking stack manages communication between the host system and VMs, as well as routing packets to and from the external network.

1. Host Physical Interfaces - These interfaces on the virtualization server directly connect the rXg application to external networks such as WAN and LAN.

2. Virtual Switch / Bridge - The bridge interface is a virtual switch that connects multiple network interfaces, allowing VMs and the host system to communicate with each other and the external network.

3. Host Taps - TAP interfaces (tap) act as virtual network interfaces that allow VMs to communicate with the host's networking stack.

4. Virtual Interfaces - The vnet mechanism provides network isolation for bhyve VMs. It allows each VM to have its own networking stack and IP configuration, similar to having a separate physical machine.
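
If you want to see how these pieces are laid out on a given host, FreeBSD interface groups make it easy to list just the bridges and taps.  This is a minimal sketch; the interface names will differ per deployment:

ifconfig -g bridge   # list the bridge (virtual switch) interfaces
ifconfig -g tap      # list the tap interfaces backing each VM NIC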



Helpful commands for troubleshooting the virtual network



ifconfig

Running ifconfig from the host provides a good overview of the networking.

This is a sample of the ifconfig output specifically for vmx0.  This same output is available for each virtual switch that has been created.
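
To look at a single interface instead of scrolling the full output, you can name it directly.  The vm-cc1_vmx0 bridge name below follows the naming used in the tcpdump examples that follow and may differ on your host:

ifconfig vmx0          # physical interface on the host
ifconfig vm-cc1_vmx0   # the virtual switch (bridge) built on top of it
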
Note: tcpdump commands can be used with the interfaces identified above to watch the packet flow.  

Here are some examples:

tcpdump -i vm-cc1_vmx1
tcpdump -i tap1
tcpdump -i vmx1

Checking packets at vmx1 and tap1 will show you packets passing between the VM and the host on the CIN network.
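
tcpdump filter expressions can narrow this down further when you are chasing a specific client or protocol.  The address below is a placeholder:

tcpdump -ni vm-cc1_vmx1 icmp       # only ICMP crossing the CIN bridge
tcpdump -ni tap1 host 10.77.0.3    # traffic to/from one CIN address (placeholder)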

Here is a similar view found in the GUI by browsing to Services >> Virtualization >> Virtual Interfaces.



vm list

Running vm list will display a list of the virtual machines on this particular host.


vm info

Running vm info dp2, for example, will provide additional networking information from the perspective of the VM.


vm switch list 

vm switch list will give you a summary of the switches that have been created.  Typically, there is one virtual switch for each interface on the host.


This is the view available in the GUI when browsing to Services >> Virtualization >> Virtual Switches.

vm switch info

vm switch info cc1_vmx0 will provide more details about a particular vswitch.
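
If you want the details for every switch at once, a small shell loop over the list output works.  This assumes vm switch list prints the switch name in the first column under a header row:

for sw in $(vm switch list | awk 'NR>1 {print $1}'); do
  echo "== $sw =="
  vm switch info "$sw"
done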


Frequently Asked Questions

1. What logs can be checked regarding clustering?
  • From the CC, you can monitor the following logs:
    • tfpostgres (database logging)
    • tfr (core rXg log)
  • You can check tfr on each dataplane node using the vm console commands referenced above or SSH directly to the dataplane. 
  • Another good check is to confirm communication between the CCs and DPs. I typically run tcpdump -i vmx1, where vmx1 is my CIN network. From here, I will monitor to make sure I see two-way traffic to each node.
  • Health Alerts are another good source of information.  Check health alerts for any clustering-related notifications. 
2. How can I detect an issue with a single data plane?
  • I normally start by checking how many login sessions exist on that node.  It should be about the same as other nodes with the same configuration.  I would expect dp2 and dp3 to have approximately the same number of login sessions in this lab example.
  • Another good sanity check is to click the Instruments link in the upper right corner and compare the traffic rates between the nodes and interfaces.
This is a lab system, so there is not very much useful information, but in a production system, you would want to look for severe drops in traffic.  This would indicate an issue with the corresponding node.

3. Do you have any general tips for client connectivity troubleshooting in an IDV environment?
  • In a clustered environment, the GUI will be your best friend as it provides a consolidated view of all nodes.  Start by entering the client IP or MAC in the global search window in the upper right corner.  This pulls all the necessary information back to a single view that can help you quickly zero in on any potential issues.
  • Global search can answer questions like:
    • Does the rXg see this client?  (Check the MAC scaffold)
    • Did we get a RADIUS request from the client? (Check the RADIUS log scaffold)
    • Did the client get a DHCP address? (Check the DHCP scaffold)
    • Did the client get NAT’d properly? (Check the NAT scaffold)
  • You can also check the rxgd log.  `tfr | grep VMs`
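
If you need to drop below the GUI, the same host bridges can be tapped to follow a single client by MAC address.  The bridge name follows the naming used earlier and the MAC is a placeholder:

tcpdump -nei vm-cc1_vmx2 ether host aa:bb:cc:dd:ee:ff   # follow one client by MAC on the LAN bridge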
