rXg Knowledge Base

Onboarding VLAN Conflict Causing Captive Portal Bypass

April 08, 2026

Subsystem: Captive Portal, PF Firewall, VLANs, RADIUS

Severity: High -- Intermittent portal bypass for new onboarding users; hard to diagnose; can affect 5-10% of traffic

Keywords: portal bypass, VLAN conflict, open SSID, members_ACC, no rdr exemption, pf rule, onboarding IP, residential MAC, portal not loading, CNA missing, semi-static cluster, PF anchor


Problem / Question

  • New onboarding users on the captive portal SSID do not see the CNA (Captive Network Assistant) popup
  • Test devices browse directly to external sites instead of being redirected to the portal
  • Issue is intermittent -- some devices get portal, others don't; restarting rxgd sometimes helps temporarily
  • curl http://neverssl.com from an onboarding device returns the external page (not a 302 redirect)
  • Admin UI shows zero-byte login sessions for the affected devices

Environment

  • Deployment type: Semi-static cluster with 5+ nodes (e.g., 7-node Bhyve cluster)
  • Onboarding architecture: One dedicated node with disable_pf_triggers=false and dynamic PF rdr rules; all other residential nodes have disable_pf_triggers=true
  • WLANs: Open (unencrypted) onboarding SSID with auth=none, encryption=none
  • RADIUS: Separate realm for onboarding and residential accounts; same device MACs may appear in both realms
  • Does NOT affect: Standalone single-node deployments; DPU-based deployments; sites using encrypted or MAC-auth onboarding SSIDs

Root Cause

When an open onboarding SSID allows previously authenticated residential devices to reconnect, the following sequence occurs:

  1. The residential devices get DHCP leases on the onboarding VLAN with onboarding IPs
  2. They still hold active leases on their residential VLANs at the same time
  3. The expire_sessions background process detects this MAC on two VLANs simultaneously and logs it as a VLAN conflict
  4. As part of conflict handling, expire_sessions adds the onboarding IP to the residential account group's PF members table (e.g., members_ACC5)
  5. The PF anchor hierarchy contains a no rdr rule: no rdr from <members_ACC5> to any
  6. This rule evaluates BEFORE the onboarding rdr redirect rules
  7. When a new onboarding device gets assigned a polluted onboarding IP, it matches the no rdr rule and is exempted from captive portal redirect
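
The effect of steps 5 through 7 can be illustrated with a small shell sketch. It is a simplified model, not rXg code: portal_decision is a hypothetical helper that reads a dump of the members table on stdin and reports whether a given client IP would match the no rdr exemption before the onboarding rdr rules are reached. It compares exact host entries only and does not handle CIDR ranges.

```bash
# Hypothetical helper (not part of rXg): models the first-match order above.
# stdin: output of `pfctl -t members_ACC5 -T show` (one address per line)
# $1:    client IP to evaluate
portal_decision() {
    ip="$1"
    if awk -v ip="$ip" '
        { gsub(/[ \t]/, "") }          # strip the indentation pfctl adds
        $0 == ip { found = 1 }         # exact host match only, no CIDR logic
        END { exit !found }
    '; then
        echo "bypass"      # would match "no rdr from <members_ACC5> to any"
    else
        echo "redirect"    # would fall through to the onboarding rdr rules
    fi
}
```

For example, pfctl -t members_ACC5 -T show | portal_decision 10.20.12.176 prints bypass when that onboarding IP is already in the table. On a live node, pfctl -t members_ACC5 -T test 10.20.12.176 performs an equivalent lookup directly.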

Affected code/config:

  • rxgd/bin/expire_sessions:388-395 -- VLAN conflict detection
  • rxgd/Rxg/App/Pfctl.pm:604 -- PF rule loading and anchor refresh logic
  • console/app/models/pf_rule.rb -- PF rule generation; no rdr rule ordering


PART 1: DIAGNOSIS

Step 1: Check if PF rdr rules are loaded and evaluating

Command:

pfctl -a "rGRP/r21IP28" -vsn | grep rdr

What to look for:

  • Presence of rdr rules for onboarding HTTP/HTTPS
  • If output is empty, the anchor was never loaded


Step 2: Check if onboarding IPs are in the residential account group's PF table

Command:

pfctl -t members_ACC5 -T show

What to look for:

  • Presence of onboarding range IPs (e.g., 10.20.x.x) in the table -- these are residential devices stuck on the onboarding VLAN

Problem state:

10.20.5.156
10.20.10.161
10.20.11.88
10.20.12.151

Normal state: Output is empty, or contains only residential IPs -- NOT onboarding range


Step 3: Check VLAN conflict log volume

Command:

grep -c "ignoring IP.*behind another cluster node with a more recent VLAN" /var/log/rxgd.log

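What to look for:

  • More than 10K entries in 24 hours indicates systemic pollution
  • grep -c counts matches across the whole log file, so compare the result against the period that file covers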


Step 4: Check PF state table for affected devices

Command:

pfctl -s state | grep "10\.20\." | head -20

Problem state (bypass -- direct external access): tcp 10.20.12.176:54321 <-> 93.184.216.34:80 ESTABLISHED:ESTABLISHED

Normal state (redirection working): tcp 10.20.12.176:54321 <-> 127.0.0.1:80 ESTABLISHED:ESTABLISHED
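
The comparison between the two states can be summarized with a short shell sketch. It assumes state lines shaped exactly like the simplified examples above (protocol, source, <->, destination, state) and that redirected traffic terminates on 127.0.0.1 as in the normal-state example; real pfctl -s state output may carry extra fields, so the field positions would need adjusting. classify_states is a hypothetical helper name.

```bash
# Hypothetical helper (not part of rXg).
# stdin: state lines in the simplified form shown in this article, e.g.
#        tcp 10.20.12.176:54321 <-> 127.0.0.1:80 ESTABLISHED:ESTABLISHED
# $1:    onboarding address prefix, e.g. "10.20."
# $2:    address that redirected traffic lands on, e.g. "127.0.0.1"
classify_states() {
    awk -v prefix="$1" -v portal="$2" '
        index($2, prefix) == 1 && $3 == "<->" {
            split($4, dst, ":")                 # dst[1] is the destination address
            if (dst[1] == portal) redirected++
            else bypass++
        }
        END { printf "redirected=%d bypass=%d\n", redirected, bypass }
    '
}
```

A possible invocation is pfctl -s state | classify_states "10.20." 127.0.0.1; a non-zero bypass count for fresh onboarding devices points at the exemption described in Root Cause.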


PART 2: IMMEDIATE REMEDIATION

Flush the polluted PF table to unblock new devices; this is temporary and must be combined with monitoring.

Step 1: Flush the polluted residential account group PF table

pfctl -t members_ACC5 -T flush

Step 2: Kill stale PF states from the onboarding VLAN

pfctl -k 10.20.0.0/16

Step 3: Restart rxgd to restore clean PF tables

service rxgd restart

Step 4: Test portal with a new device

Connect a test device to the onboarding SSID and browse to any HTTP site. Expected: 302 redirect, then portal splash page loads.
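
The expected result can also be checked from a command line on the test device. The sketch below is one way to do it under stated assumptions: portal_check is a hypothetical helper that reads HTTP response headers on stdin and passes only when the status is 302 and a Location header is present.

```bash
# Hypothetical helper (not part of rXg).
# stdin: raw HTTP response headers, e.g. from
#        curl -s -o /dev/null -D - http://neverssl.com
portal_check() {
    awk '
        /^\r?$/ { exit }                              # blank line ends the headers
        /^HTTP\// { status = $2 }                     # status line: HTTP/1.1 302 Found
        tolower($1) == "location:" { loc = $2 }       # redirect target
        END {
            gsub(/\r/, "", status); gsub(/\r/, "", loc)
            if (status == "302" && loc != "") { print "OK redirect to " loc; exit 0 }
            print "FAIL status=" status
            exit 1
        }
    '
}
```

Run as curl -s -o /dev/null -D - http://neverssl.com | portal_check. A FAIL status=200 result corresponds to the bypass symptom listed under Problem / Question.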


PART 3: PERMANENT PROTECTION

Step 1: Install a 5-minute auto-flush cron job

On the onboarding node, add to root's crontab:

crontab -e

Add this line (adjust table name and IP range):

*/5 * * * * /sbin/pfctl -t members_ACC5 -T flush && /sbin/pfctl -k 10.20.0.0/16
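
If the entry is installed from a script rather than through an editor, it helps to append it without overwriting existing jobs or adding duplicates. The following is a minimal sketch of that idea; merge_cron_line is a hypothetical helper that takes the current crontab on stdin and prints it back with the job appended only when it is missing.

```bash
# Hypothetical helper (not part of rXg).
# stdin:  existing crontab contents (may be empty)
# $1:     the cron line to ensure is present
# stdout: crontab contents with the line appended if it was missing
merge_cron_line() {
    CRON_LINE="$1" awk '
        { print; if ($0 == ENVIRON["CRON_LINE"]) seen = 1 }   # keep existing jobs
        END { if (!seen) print ENVIRON["CRON_LINE"] }         # append only once
    '
}
```

A possible usage, with JOB set to the cron line above: crontab -l 2>/dev/null | merge_cron_line "$JOB" | crontab -. Running it a second time leaves the crontab unchanged.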


Step 2: Reduce VLAN conflict rate (medium-term)

Option A: Lower DHCP lease time on onboarding pool

In rXg Admin UI, navigate to Network > DHCP > Pools > [Onboarding VLAN]:

  • Change "Lease Time" from 1800s (30 min) to 600s (10 min) or 300s (5 min)


Step 3: Permanent fix -- Enable MAC-auth on onboarding SSID (long-term)

On the wireless controller:

  1. Navigate to WLANs > [Onboarding SSID]
  2. Change Security > Authentication from Open to MAC (Open with MAC-auth)
  3. Bind the Onboarding RADIUS server
  4. Deploy to all APs

What this does:

  • Devices with registered MACs (residential accounts) are rejected at association time
  • Only truly new devices are allowed on the SSID
  • No more VLAN conflicts from dual-connected residential devices


MONITORING COMMANDS REFERENCE

VLAN Conflict Rate:

grep "ignoring IP.*behind another cluster node" /var/log/rxgd.log | wc -l
tail -f /var/log/rxgd.log | grep "behind another cluster node"

Onboarding IP Pollution:

pfctl -t members_ACC5 -T show | grep "10\.20\." | wc -l

Portal Redirect Activity:

pfctl -a "rGRP/r21IP28" -vsn | grep -i "state creations"

PF State Table (onboarding VLAN):

pfctl -s state | grep "10\.20\." | wc -l
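
The pollution count above can be turned into a simple pass/alert check. This is a sketch only: pollution_alert is a hypothetical helper, and the threshold of 5 is borrowed from the prevention checklist, which applies it to the whole table rather than to onboarding-range entries.

```bash
# Hypothetical helper (not part of rXg).
# stdin: output of `pfctl -t members_ACC5 -T show`
# $1:    onboarding address prefix, e.g. "10.20."
# $2:    alert threshold (entries at or above this count raise an alert)
pollution_alert() {
    awk -v prefix="$1" -v limit="$2" '
        { gsub(/[ \t]/, "") }                  # strip the indentation pfctl adds
        index($0, prefix) == 1 { n++ }         # entry starts with the onboarding prefix
        END {
            if (n + 0 >= limit + 0) print "ALERT " (n + 0)
            else print "OK " (n + 0)
        }
    '
}
```

Example: pfctl -t members_ACC5 -T show | pollution_alert "10.20." 5. The output is a single line such as OK 2 or ALERT 7, which is easy to feed into whatever alerting the site already uses.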


PREVENTION CHECKLIST

  • [ ] Confirm open SSID is not mission-critical -- if possible, require MAC-auth
  • [ ] Check cron job is installed and running
  • [ ] Monitor VLAN conflict log volume daily -- alert if rate exceeds 100/hour
  • [ ] Verify PF table stays clean -- pfctl -t members_ACC5 -T show | wc -l should stay < 5
  • [ ] Test portal with new devices weekly
  • [ ] Track zero-byte login sessions
  • [ ] Schedule permanent MAC-auth rollout on onboarding SSID
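
For the conflict-rate item in the checklist, one way to get an hourly rate without parsing log timestamps is to compare two grep -c samples taken a known interval apart. The sketch below assumes that approach; conflict_rate is a hypothetical helper, and the 100/hour threshold comes from the checklist. If the count drops between samples, the log is assumed to have rotated and the current count is used as the delta.

```bash
# Hypothetical helper (not part of rXg).
# $1: match count from the previous sample
# $2: match count from the current sample
# $3: seconds elapsed between the two samples (must be > 0)
conflict_rate() {
    prev="$1"; cur="$2"; secs="$3"
    if [ "$secs" -le 0 ]; then
        echo "ERROR interval must be positive"
        return 2
    fi
    if [ "$cur" -lt "$prev" ]; then
        delta="$cur"                 # assume the log rotated between samples
    else
        delta=$((cur - prev))
    fi
    rate=$((delta * 3600 / secs))    # integer conflicts per hour
    if [ "$rate" -gt 100 ]; then
        echo "ALERT ${rate}/hour"
    else
        echo "OK ${rate}/hour"
    fi
}
```

Each sample can be taken with grep -c "behind another cluster node" /var/log/rxgd.log and stored by the caller between runs, for example in a small state file.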

QUICK REFERENCE CARD

DIAGNOSE:

pfctl -a "rGRP/r21IP28" -vsn | grep -i "state creations"
pfctl -t members_ACC5 -T show | grep "10\.20\."
grep -c "behind another cluster node" /var/log/rxgd.log

FIX:

pfctl -t members_ACC5 -T flush
pfctl -k 10.20.0.0/16
service rxgd restart
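(crontab -l 2>/dev/null; echo "*/5 * * * * /sbin/pfctl -t members_ACC5 -T flush && /sbin/pfctl -k 10.20.0.0/16") | crontab -  # appends; a bare echo piped to crontab - would replace root's entire crontab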

VERIFY:

pfctl -t members_ACC5 -T show | wc -l  # Should output 0
pfctl -a "rGRP/r21IP28" -vsn | grep "rdr" | head -3
curl -v http://neverssl.com 2>&1 | grep -E "HTTP/|Location"  # Should show 302
