Onboarding VLAN Conflict Causing Captive Portal Bypass
April 08, 2026
Subsystem: Captive Portal, PF Firewall, VLANs, RADIUS
Severity: High -- Intermittent portal bypass for new onboarding users; hard to diagnose; can affect 5-10% of traffic
Keywords: portal bypass, VLAN conflict, open SSID, members_ACC, no rdr exemption, pf rule, onboarding IP, residential MAC, portal not loading, CNA missing, semi-static cluster, PF anchor
Problem / Question
- New onboarding users on the captive portal SSID do not see the CNA (Captive Network Assistant) popup
- Test devices browse directly to external sites instead of being redirected to the portal
- Issue is intermittent -- some devices get portal, others don't; restarting rxgd sometimes helps temporarily
- curl http://neverssl.com from an onboarding device returns the external page (not a 302 redirect)
- Admin UI shows zero-byte login sessions for the affected devices
Environment
- Deployment type: Semi-static cluster with 5+ nodes (e.g., 7-node Bhyve cluster)
- Onboarding architecture: One dedicated node with disable_pf_triggers=false and dynamic PF rdr rules; all other residential nodes have disable_pf_triggers=true
- WLANs: Open (unencrypted) onboarding SSID with auth=none, encryption=none
- RADIUS: Separate realm for onboarding and residential accounts; same device MACs may appear in both realms
- Does NOT affect: Standalone single-node deployments; DPU-based deployments; sites using encrypted or MAC-auth onboarding SSIDs
Root Cause
When an open onboarding SSID allows previously authenticated residential devices to reconnect, they:
- Get DHCP leases on the onboarding VLAN with onboarding IPs
- While simultaneously having active leases on their residential VLANs
- The expire_sessions background process detects this MAC on two VLANs simultaneously and logs it as a VLAN conflict
- As part of conflict handling, expire_sessions adds the onboarding IP to the residential account group's PF members table (e.g., members_ACC5)
- The PF anchor hierarchy contains a no rdr rule: no rdr from <members_ACC5> to any
- This rule evaluates BEFORE the onboarding rdr redirect rules
- When a new onboarding device gets assigned a polluted onboarding IP, it matches the no rdr rule and is exempted from captive portal redirect
Affected code/config:
- rxgd/bin/expire_sessions:388-395 -- VLAN conflict detection
- rxgd/Rxg/App/Pfctl.pm:604 -- PF rule loading and anchor refresh logic
- console/app/models/pf_rule.rb -- PF rule generation; no rdr rule ordering
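The rule ordering described under Root Cause can be modelled in a few lines of shell. This is only an illustrative sketch of "exemption is checked before redirect", not rXg code: the function name portal_decision and the one-address-per-line table file are assumptions made for the example.

```shell
#!/bin/sh
# Toy model of the translation-rule ordering described above (not rXg code).
# portal_decision IP TABLE_FILE
#   TABLE_FILE holds one address per line, standing in for <members_ACC5>.
portal_decision() {
    ip=$1
    table_file=$2
    # Rule 1: no rdr from <members_ACC5> to any  (checked first)
    if grep -Fxq -- "$ip" "$table_file"; then
        echo "exempt"      # matched the no rdr rule; traffic is not redirected
        return 0
    fi
    # Rule 2: the onboarding rdr rule (only reached when rule 1 did not match)
    echo "redirect"
}
```

With a table file containing 10.20.5.156, a new device that is later leased 10.20.5.156 gets "exempt", which is the bypass described above; any onboarding address not in the table gets "redirect".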
PART 1: DIAGNOSIS
Step 1: Check if PF rdr rules are loaded and evaluating
Command:
bash
pfctl -a "rGRP/r21IP28" -vsn | grep rdr
What to look for:
- Presence of rdr rules for onboarding HTTP/HTTPS
- If output is empty, the anchor was never loaded
Step 2: Check if onboarding IPs are in the residential account group's PF table
Command:
bash
pfctl -t members_ACC5 -T show
What to look for:
- Presence of onboarding range IPs (e.g., 10.20.x.x) in the table -- these are residential devices stuck on the onboarding VLAN
Problem state:
10.20.5.156
10.20.10.161
10.20.11.88
10.20.12.151
Normal state:
Output is empty, or contains only residential IPs -- NOT onboarding range
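An unanchored grep for 10\.20\. also matches addresses such as 110.20.1.1 or 192.168.10.20, so a stricter filter can help when reading the table. The sketch below assumes the table listing has one address per line with optional leading blanks; the function name onboarding_ips and the ONBOARDING_PREFIX variable are hypothetical.

```shell
#!/bin/sh
# Print only addresses that start with the onboarding prefix.
# Reads a table listing on stdin (one address per line, optional leading blanks).
# ONBOARDING_PREFIX is a hypothetical knob; adjust it to the site's range.
onboarding_ips() {
    prefix=${ONBOARDING_PREFIX:-10.20.}
    # Escape dots so they match literally, then anchor at the start of the address.
    pattern=$(printf '%s' "$prefix" | sed 's/\./\\./g')
    grep -E "^[[:space:]]*${pattern}" | sed 's/^[[:space:]]*//'
}

# Example (on the onboarding node):
#   pfctl -t members_ACC5 -T show | onboarding_ips
```

Any output from the example pipeline corresponds to the problem state; no output corresponds to the normal state.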
Step 3: Check VLAN conflict log volume
Command:
bash
grep -c "ignoring IP.*behind another cluster node with a more recent VLAN" /var/log/rxgd.log
What to look for:
- More than 10,000 matches in 24 hours indicates systemic pollution; grep -c counts the entire file, so compare against the period the log covers
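To relate the total to a rate, the matches can be grouped by hour. The following is a rough sketch that assumes each log line begins with a syslog-style timestamp such as "Apr  8 14:03:22"; that format is an assumption, not something confirmed for rxgd.log, and conflicts_per_hour is a hypothetical helper name.

```shell
#!/bin/sh
# Count VLAN-conflict log lines per hour, busiest hour first.
# ASSUMPTION: each line starts with a syslog-style timestamp, e.g.
#   "Apr  8 14:03:22 ...". Adjust the awk fields if rxgd.log differs.
conflicts_per_hour() {
    grep "behind another cluster node" |
    awk '{ key = $1 " " $2 " " substr($3, 1, 2) ":00"; n[key]++ }
         END { for (k in n) print n[k], k }' |
    sort -k1,1nr
}

# Example:
#   conflicts_per_hour < /var/log/rxgd.log | head
```

Each output line is a count followed by the month, day, and hour it belongs to.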
Step 4: Check PF state table for affected devices
Command:
bash
pfctl -s state | grep "10\.20\." | head -20
Problem state (bypass -- direct external access):
tcp 10.20.12.176:54321 <-> 93.184.216.34:80 ESTABLISHED:ESTABLISHED
Normal state (redirection working):
tcp 10.20.12.176:54321 <-> 127.0.0.1:80 ESTABLISHED:ESTABLISHED
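On a busy node it can be easier to count the two cases than to read the first 20 lines. The sketch below assumes state lines look like the samples above (source, then <->, then destination) and that redirected traffic terminates on 127.0.0.1; real pfctl output may differ by version, and classify_states and ONBOARDING_PREFIX are hypothetical names.

```shell
#!/bin/sh
# Summarise onboarding-sourced PF states read from stdin.
# ASSUMPTION: lines look like the samples above, i.e.
#   tcp SRC:PORT <-> DST:PORT STATE
# and redirected traffic terminates on 127.0.0.1.
classify_states() {
    awk -v prefix="${ONBOARDING_PREFIX:-10.20.}" '
        {
            src = ""; dst = ""
            # Locate the "<->" separator; source precedes it, destination follows.
            for (i = 2; i < NF; i++)
                if ($i == "<->") { src = $(i - 1); dst = $(i + 1) }
            if (index(src, prefix) != 1) next        # not an onboarding client
            if (index(dst, "127.0.0.1:") == 1) redirected++
            else bypassed++
        }
        END { printf "redirected=%d bypassed=%d\n", redirected, bypassed }'
}

# Example:
#   pfctl -s state | classify_states
```

A non-zero bypassed count for onboarding sources points at the problem state shown above.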
PART 2: IMMEDIATE REMEDIATION
Flush the polluted PF table to unblock new devices; this is temporary and must be combined with monitoring.
Step 1: Flush the polluted residential account group PF table
pfctl -t members_ACC5 -T flush
Step 2: Kill stale PF states from the onboarding VLAN
pfctl -k 10.20.0.0/16
Step 3: Restart rxgd to restore clean PF tables
service rxgd restart
Step 4: Test portal with a new device
Connect a test device to the onboarding SSID and browse to any HTTP site. Expected: 302 redirect, then portal splash page loads.
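Where the test device has a shell, the check can be scripted. This is a minimal sketch: portal_verdict is a hypothetical helper, and treating 200 as "not redirected" assumes the portal answers the first request with a redirect, as the expected result above states.

```shell
#!/bin/sh
# Map an HTTP status code from the probe to a verdict.
portal_verdict() {
    case $1 in
        301|302|303|307|308) echo "redirected" ;;      # portal intercepted the request
        200)                 echo "not-redirected" ;;  # external page was served directly
        *)                   echo "inconclusive" ;;    # timeout, DNS failure, other codes
    esac
}

# Example, run from a device on the onboarding SSID:
#   code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' http://neverssl.com)
#   portal_verdict "$code"
```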
PART 3: PERMANENT PROTECTION
Step 1: Install a 5-minute auto-flush cron job
On the onboarding node, add to root's crontab:
bash
crontab -e
Add this line (adjust table name and IP range):
*/5 * * * * /sbin/pfctl -t members_ACC5 -T flush && /sbin/pfctl -k 10.20.0.0/16
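A full flush also removes any legitimate residential addresses from the table until rxgd repopulates it. A narrower alternative, which is not part of the procedure above and is offered only as a sketch, is to delete just the onboarding-range entries with pfctl's -T delete. The names purge_onboarding_ips, PFCTL, and ONBOARDING_PREFIX are hypothetical.

```shell
#!/bin/sh
# Remove only onboarding-range addresses from a PF table instead of flushing it.
# This is an alternative to the full flush above, not part of the original procedure.
# PFCTL and ONBOARDING_PREFIX are hypothetical knobs for this sketch.
purge_onboarding_ips() {
    table=$1
    pfctl_cmd=${PFCTL:-/sbin/pfctl}
    prefix=${ONBOARDING_PREFIX:-10.20.}
    "$pfctl_cmd" -t "$table" -T show |
    while read -r addr; do
        case $addr in
            # Only entries inside the onboarding range are deleted.
            "$prefix"*) "$pfctl_cmd" -t "$table" -T delete "$addr" </dev/null ;;
        esac
    done
}

# Example cron-friendly call once the function is saved in a script:
#   purge_onboarding_ips members_ACC5 && /sbin/pfctl -k 10.20.0.0/16
```

Whether the narrower delete is sufficient depends on how quickly expire_sessions re-adds entries, so it should be validated on a test node before it replaces the flush.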
Step 2: Reduce VLAN conflict rate (medium-term)
Option A: Lower DHCP lease time on onboarding pool
In rXg Admin UI, navigate to Network > DHCP > Pools > [Onboarding VLAN]:
- Change "Lease Time" from 1800s (30 min) to 600s (10 min) or 300s (5 min)
Step 3: Permanent fix -- Enable MAC-auth on onboarding SSID (long-term)
On the wireless controller:
1. Navigate to WLANs > [Onboarding SSID]
2. Change Security > Authentication from Open to MAC (Open with MAC-auth)
3. Bind the Onboarding RADIUS server
4. Deploy to all APs
What this does:
- Devices with registered MACs (residential accounts) are rejected at association time
- Only truly new devices are allowed on the SSID
- No more VLAN conflicts from dual-connected residential devices
MONITORING COMMANDS REFERENCE
VLAN Conflict Rate:
bash
grep "ignoring IP.*behind another cluster node" /var/log/rxgd.log | wc -l
tail -f /var/log/rxgd.log | grep "behind another cluster node"
Onboarding IP Pollution:
bash
pfctl -t members_ACC5 -T show | grep "10\.20\." | wc -l
Portal Redirect Activity:
bash
pfctl -a "rGRP/r21IP28" -vsn | grep "state creations"
PF State Table (onboarding VLAN):
bash
pfctl -s state | grep "10\.20\." | wc -l
PREVENTION CHECKLIST
- [ ] Confirm open SSID is not mission-critical -- if possible, require MAC-auth
- [ ] Check cron job is installed and running
- [ ] Monitor VLAN conflict log volume daily -- alert if rate exceeds 100/hour
- [ ] Verify PF table stays clean -- pfctl -t members_ACC5 -T show | wc -l should stay < 5
- [ ] Test portal with new devices weekly
- [ ] Track zero-byte login sessions
- [ ] Schedule permanent MAC-auth rollout on onboarding SSID
QUICK REFERENCE CARD
DIAGNOSE:
pfctl -a "rGRP/r21IP28" -vsn | grep "state creations"
pfctl -t members_ACC5 -T show | grep "10\.20\."
grep -c "behind another cluster node" /var/log/rxgd.log
FIX:
pfctl -t members_ACC5 -T flush
pfctl -k 10.20.0.0/16
service rxgd restart
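(crontab -l 2>/dev/null; echo "*/5 * * * * /sbin/pfctl -t members_ACC5 -T flush && /sbin/pfctl -k 10.20.0.0/16") | crontab - # Appends; a bare echo piped to crontab - would replace root's whole crontab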
VERIFY:
pfctl -t members_ACC5 -T show | grep -c "10\.20\." # Should output 0 (no onboarding IPs in the table)
pfctl -a "rGRP/r21IP28" -vsn | grep "rdr" | head -3
curl -v http://neverssl.com 2>&1 | grep -E "HTTP/|Location" # Should show 302