Onboarding VLAN Conflict Causing Captive Portal Bypass

April 08, 2026

Subsystem: Captive Portal, PF Firewall, VLANs, RADIUS

Severity: High -- Intermittent portal bypass for new onboarding users; hard to diagnose; can affect 5-10% of traffic

Keywords: portal bypass, VLAN conflict, open SSID, members_ACC, no rdr exemption, pf rule, onboarding IP, residential MAC, portal not loading, CNA missing, semi-static cluster, PF anchor

Problem / Question

New onboarding users on the captive portal SSID do not see the CNA (Captive Network Assistant) popup
Test devices browse directly to external sites instead of being redirected to the portal
The issue is intermittent -- some devices get the portal, others don't; restarting rxgd sometimes helps temporarily
curl http://neverssl.com from an onboarding device returns the external page (not a 302 redirect)
Admin UI shows zero-byte login sessions for the affected devices

Environment

Deployment type: Semi-static cluster with 5+ nodes (e.g., 7-node Bhyve cluster)
Onboarding architecture: One dedicated node with disable_pf_triggers=false and dynamic PF rdr rules; all other residential nodes have disable_pf_triggers=true
WLANs: Open (unencrypted) onboarding SSID with auth=none, encryption=none
RADIUS: Separate realms for onboarding and residential accounts; the same device MACs may appear in both realms
Does NOT affect: Standalone single-node deployments; DPU-based deployments; sites using encrypted or MAC-auth onboarding SSIDs

Root Cause

When an open onboarding SSID lets previously authenticated residential devices reconnect, the following chain of events pollutes the PF table:

1. The residential device gets a DHCP lease on the onboarding VLAN with an onboarding IP
2. At the same time it still holds an active lease on its residential VLAN
3. The expire_sessions background process detects this MAC on two VLANs simultaneously and logs it as a VLAN conflict
4. As part of conflict handling, expire_sessions adds the onboarding IP to the residential account group's PF members table (e.g., members_ACC5)
5. The PF anchor hierarchy contains a no rdr rule: no rdr from <members_ACC5> to any
6. This rule is evaluated BEFORE the onboarding rdr redirect rules
7. When a new onboarding device is later assigned one of these polluted onboarding IPs (an address still listed in members_ACC5), it matches the no rdr rule and is exempted from the captive portal redirect
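The effect of the rule ordering can be illustrated with a toy model. The sketch below is not rXg or PF code; it only mimics the idea that the first matching translation rule decides the outcome. The function name would_redirect and the table file are hypothetical, and the file stands in for the contents of <members_ACC5>.

```shell
#!/bin/sh
# Toy model of PF translation rule ordering (illustrative only, not rXg code).
# would_redirect IP TABLE_FILE
#   TABLE_FILE holds one address per line, standing in for <members_ACC5>.
#   Prints "bypass" if the IP hits the no rdr exemption, else "redirect".
would_redirect() {
    ip=$1
    table_file=$2
    # Rule 1 (evaluated first): no rdr from <members_ACC5> to any
    # -F: literal match, -x: whole line, so 10.20.5.15 does not match 10.20.5.156
    if grep -Fxq -- "$ip" "$table_file"; then
        echo "bypass"
        return 0
    fi
    # Rule 2 (only reached when rule 1 did not match): onboarding rdr to the portal
    echo "redirect"
}
```

With a table file that contains 10.20.5.156, calling would_redirect 10.20.5.156 on it prints bypass, which is the situation a new device inherits when DHCP hands it that address.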

Affected code/config:

- rxgd/bin/expire_sessions:388-395 -- VLAN conflict detection
- rxgd/Rxg/App/Pfctl.pm:604 -- PF rule loading and anchor refresh logic
- console/app/models/pf_rule.rb -- PF rule generation; no rdr rule ordering

PART 1: DIAGNOSIS

Step 1: Check if PF rdr rules are loaded and evaluating

Command:

pfctl -a "rGRP/r21IP28" -vsn | grep rdr

What to look for:

- Presence of rdr rules for onboarding HTTP/HTTPS
- If output is empty, the anchor was never loaded

Step 2: Check if onboarding IPs are in the residential account group's PF table

Command:

pfctl -t members_ACC5 -T show

What to look for:

- Presence of onboarding range IPs (e.g., 10.20.x.x) in the table -- these are residential devices stuck on the onboarding VLAN

Problem state:

10.20.5.156
10.20.10.161
10.20.11.88
10.20.12.151

Normal state: output is empty, or contains only residential IPs -- NOT onboarding range
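To pull only the onboarding-range entries out of the table listing, a small filter can help. This is a minimal sketch; the function name onboarding_entries is hypothetical, and the 10.20. prefix is the example range used in this article.

```shell
#!/bin/sh
# onboarding_entries PREFIX
#   Reads `pfctl -t <table> -T show` output on stdin and prints only the
#   addresses that start with PREFIX (e.g. "10.20." for the onboarding range).
onboarding_entries() {
    prefix=$1
    # Strip the indentation around each address, then compare the prefix as a
    # literal string anchored at the start, so 110.20.x.x or 10.200.x.x
    # are not reported by mistake.
    awk -v p="$prefix" '{ gsub(/[ \t]/, ""); if (index($0, p) == 1) print }'
}

# Example (on the onboarding node):
#   pfctl -t members_ACC5 -T show | onboarding_entries "10.20."
```

Anchoring the match at the start of the address is slightly stricter than an unanchored grep "10\.20\.", which also matches the pattern in the middle of an address.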


Step 3: Check VLAN conflict log volume

Command:

grep -c "ignoring IP.*behind another cluster node with a more recent VLAN" /var/log/rxgd.log

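What to look for:

- More than 10,000 entries in 24 hours indicates systemic pollution. grep -c counts every match in the current log file, so relate the count to the time span the file covers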
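A total count says little about the rate, so it can be useful to bucket the conflict messages per hour. The sketch below assumes that each line in rxgd.log starts with a syslog-style timestamp such as "Apr  8 14:23:01"; that format is an assumption, not something confirmed for rxgd, and the field positions need adjusting if the real log differs. The function name conflicts_per_hour is hypothetical.

```shell
#!/bin/sh
# conflicts_per_hour
#   Reads log lines on stdin and prints "<count> <Mon> <DD> <HH>:00" for each
#   hour that contains VLAN conflict messages.
#   ASSUMPTION: lines begin with a syslog-style timestamp ("Apr  8 14:23:01").
conflicts_per_hour() {
    grep "behind another cluster node" |
        awk '{ print $1, $2, substr($3, 1, 2) ":00" }' |   # keep month, day, hour
        sort | uniq -c |                                   # count lines per hour
        awk '{ print $1, $2, $3, $4 }'                     # trim uniq padding
}

# Example:
#   conflicts_per_hour < /var/log/rxgd.log
```

Each output line can then be compared against an hourly alert threshold instead of the 24-hour total.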

Step 4: Check PF state table for affected devices

Command:

pfctl -s state | grep "10\.20\." | head -20

Problem state (bypass -- direct external access):

tcp 10.20.12.176:54321 <-> 93.184.216.34:80 ESTABLISHED:ESTABLISHED

Normal state (redirection working):

tcp 10.20.12.176:54321 <-> 127.0.0.1:80 ESTABLISHED:ESTABLISHED
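Scanning a long state listing by eye is error-prone, so the two patterns above can be separated with a filter. This sketch assumes the simplified line layout of the samples above ("tcp SRC:PORT <-> DST:PORT STATE"); actual pfctl -s state output may carry additional fields such as the interface name, in which case the awk field numbers must be shifted. The function name bypass_states is hypothetical.

```shell
#!/bin/sh
# bypass_states PREFIX
#   Reads state lines on stdin and prints those where a client whose address
#   starts with PREFIX talks to port 80 on anything other than 127.0.0.1,
#   i.e. HTTP traffic that was not redirected to the local portal.
#   ASSUMPTION: fields are "proto src:port <-> dst:port state", as in the
#   samples in this article.
bypass_states() {
    prefix=$1
    awk -v p="$prefix" '
        index($2, p) == 1 && $4 ~ /:80$/ && $4 !~ /^127\.0\.0\.1:/ { print }
    '
}

# Example:
#   pfctl -s state | bypass_states "10.20."
```

Any line this prints corresponds to the problem state; an empty result means every onboarding HTTP state in the listing points at the portal.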

PART 2: IMMEDIATE REMEDIATION

Flush the polluted PF table to unblock new devices; this is temporary and must be combined with monitoring.

Step 1: Flush the polluted residential account group PF table

pfctl -t members_ACC5 -T flush

Step 2: Kill stale PF states from the onboarding VLAN

pfctl -k 10.20.0.0/16

Step 3: Restart rxgd to restore clean PF tables

service rxgd restart
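Steps 1 to 3 can be wrapped so that they always run in the same order and stop at the first failure. The following is a minimal sketch; the helper names run and remediate and the DRY_RUN switch are hypothetical conveniences, while the three commands are the ones listed above.

```shell
#!/bin/sh
# run CMD [ARGS...]
#   Executes the command, or only prints it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# remediate TABLE NETWORK
#   Flushes the polluted table, kills states from the onboarding network,
#   then restarts rxgd. Stops at the first step that fails.
remediate() {
    table=$1
    network=$2
    run pfctl -t "$table" -T flush &&
        run pfctl -k "$network" &&
        run service rxgd restart
}

# Example:
#   DRY_RUN=1; remediate members_ACC5 10.20.0.0/16   # print only
#   DRY_RUN=0; remediate members_ACC5 10.20.0.0/16   # execute
```

Running it once with DRY_RUN=1 is a cheap way to confirm the table name and network before touching the live firewall.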

Step 4: Test portal with a new device

Connect a test device to the onboarding SSID and browse to any HTTP site. Expected: 302 redirect, then portal splash page loads.
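On a test device with a shell, the same check can be scripted. The sketch below maps the HTTP status code to a verdict; the function name portal_verdict is hypothetical, and treating only 302 as a successful redirect follows the expectation stated above (a portal that answers with a different redirect code would show up as inconclusive).

```shell
#!/bin/sh
# portal_verdict HTTP_CODE
#   Maps the HTTP status seen by an onboarding client to a verdict.
portal_verdict() {
    case "$1" in
        302) echo "redirected" ;;     # expected: captive portal redirect
        200) echo "bypass" ;;         # external page was served directly
        *)   echo "inconclusive" ;;   # timeout (curl reports 000) or other code
    esac
}

# Example (run on the test device; curl does not follow redirects by default):
#   code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' http://neverssl.com)
#   portal_verdict "$code"
```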

PART 3: PERMANENT PROTECTION

Step 1: Install a 5-minute auto-flush cron job

On the onboarding node, open root's crontab for editing:

crontab -e

Add this line (adjust table name and IP range):

*/5 * * * * /sbin/pfctl -t members_ACC5 -T flush && /sbin/pfctl -k 10.20.0.0/16
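When the entry is installed from a script rather than interactively, it is safer to append it to the existing crontab and to skip it if it is already there. A minimal sketch follows; the function name ensure_cron_line is hypothetical.

```shell
#!/bin/sh
# ensure_cron_line LINE
#   Reads an existing crontab on stdin and writes it to stdout, appending
#   LINE only if an identical line is not already present.
ensure_cron_line() {
    line=$1
    existing=$(cat)                      # read the whole crontab first
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"        # keep all current entries
    fi
    # -F: literal match (the cron line contains * characters), -x: whole line
    if ! printf '%s\n' "$existing" | grep -Fxq -- "$line"; then
        printf '%s\n' "$line"
    fi
}

# Example (as root on the onboarding node):
#   LINE='*/5 * * * * /sbin/pfctl -t members_ACC5 -T flush && /sbin/pfctl -k 10.20.0.0/16'
#   crontab -l 2>/dev/null | ensure_cron_line "$LINE" | crontab -
```

Because the function reads all of its input before printing anything, the old crontab is fully captured before crontab - installs the new one, and running the command twice leaves a single copy of the entry.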

Step 2: Reduce VLAN conflict rate (medium-term)

Option A: Lower DHCP lease time on onboarding pool

In the rXg Admin UI, navigate to Network > DHCP > Pools > [Onboarding VLAN]:

- Change "Lease Time" from 1800s (30 min) to 600s (10 min) or 300s (5 min)

Step 3: Permanent fix -- Enable MAC-auth on onboarding SSID (long-term)

On the wireless controller:

1. Navigate to WLANs > [Onboarding SSID]
2. Change Security > Authentication from Open to MAC (Open with MAC-auth)
3. Bind the Onboarding RADIUS server
4. Deploy to all APs

What this does:

- Devices with registered MACs (residential accounts) are rejected at association time
- Only truly new devices are allowed on the SSID
- No more VLAN conflicts from dual-connected residential devices

MONITORING COMMANDS REFERENCE

VLAN Conflict Rate:

grep "ignoring IP.*behind another cluster node" /var/log/rxgd.log | wc -l
tail -f /var/log/rxgd.log | grep "behind another cluster node"

Onboarding IP Pollution:

pfctl -t members_ACC5 -T show | grep "10\.20\." | wc -l

Portal Redirect Activity:

pfctl -a "rGRP/r21IP28" -vsn | grep -i "state creations"

PF State Table (onboarding VLAN):

pfctl -s state | grep "10\.20\." | wc -l

PREVENTION CHECKLIST

[ ] Confirm open SSID is not mission-critical -- if possible, require MAC-auth
[ ] Check cron job is installed and running
[ ] Monitor VLAN conflict log volume daily -- alert if rate exceeds 100/hour
[ ] Verify PF table stays clean -- pfctl -t members_ACC5 -T show | wc -l should stay < 5
[ ] Test portal with new devices weekly
[ ] Track zero-byte login sessions
[ ] Schedule permanent MAC-auth rollout on onboarding SSID
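The two numeric thresholds in the checklist (100 conflicts per hour, fewer than 5 table entries) lend themselves to an automated check. The sketch below only evaluates numbers passed to it; how they are collected and where an alert is sent is left open. The function name check_thresholds is hypothetical.

```shell
#!/bin/sh
# check_thresholds CONFLICTS_PER_HOUR TABLE_ENTRIES
#   Prints one ALERT line per exceeded checklist threshold and returns 1,
#   or prints "OK" and returns 0 when both values are within limits.
check_thresholds() {
    conflicts=$1
    entries=$2
    status=0
    if [ "$conflicts" -gt 100 ]; then
        echo "ALERT: VLAN conflicts ${conflicts}/hour (limit 100)"
        status=1
    fi
    if [ "$entries" -ge 5 ]; then
        echo "ALERT: table holds ${entries} entries (should stay below 5)"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "OK"
    fi
    return "$status"
}

# Example:
#   entries=$(pfctl -t members_ACC5 -T show | wc -l)
#   check_thresholds 42 "$entries"
```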

QUICK REFERENCE CARD

DIAGNOSE:

pfctl -a "rGRP/r21IP28" -vsn | grep -i "state creations"
pfctl -t members_ACC5 -T show | grep "10\.20\."
grep -c "behind another cluster node" /var/log/rxgd.log

FIX:

pfctl -t members_ACC5 -T flush
pfctl -k 10.20.0.0/16
service rxgd restart
(crontab -l 2>/dev/null; echo "*/5 * * * * /sbin/pfctl -t members_ACC5 -T flush && /sbin/pfctl -k 10.20.0.0/16") | crontab -  # Appends; a bare echo | crontab - would replace root's entire crontab

VERIFY:

pfctl -t members_ACC5 -T show | wc -l  # Should output 0
pfctl -a "rGRP/r21IP28" -vsn | grep "rdr" | head -3
curl -v http://neverssl.com 2>&1 | grep -E "HTTP/|Location"  # Should show 302
