TAC-grade reference · v6
Network & Security
Troubleshooting Commands
IOS · IOS-XE · NX-OS · ACI · SD-WAN · FTD · FMC · Palo Alto · F5 BIG-IP · Kemp — CCNA to CCIE/TAC-level debugging.
18
Categories
23
Scenarios
26
TAC Cases
10
Platforms
Browse by category
Pinned — must-know commands
Real-world scenarios
Production incidents with TAC-style methodology, commands, and root cause analysis.
Cisco TAC Scenarios
26 enterprise-grade production troubleshooting cases — ACI, FTD, Palo Alto, ISE, SD-WAN, BGP, MPLS, STP, AWS, F5, Multicast, UCS. Think like TAC.
Cheatsheet
IOS · IOS-XE · NX-OS · ACI · SD-WAN · FTD · FMC · Palo Alto
About Network Security CMD
TAC-grade reference for network and security engineers — CCNA through CCIE/TAC-level debugging across IOS, IOS-XE, NX-OS, ACI, SD-WAN, Cisco FTD/FMC, UCS, Multicast, Palo Alto, F5 BIG-IP, and Kemp LoadMaster. Built for production RCA under pressure.
26
TAC Cases
Troubleshooting methodology
› OSI bottom-up always — confirm L1 before chasing L3.
› On FTD: show asp drop and packet-tracer before any packet capture.
› On Palo Alto: show session all filter beats a packet capture 80% of the time.
› SD-WAN: always verify control plane (OMP) before data plane (BFD).
› Scope all Cisco debugs with an ACL. Run no debug all immediately after.
› Document every change — the "fix" 3 weeks ago is the root cause today.
› On F5 BIG-IP: start with tmsh show /sys crypto cert before chasing SSL handshake failures — expired cert is the #1 cause.
› On Kemp LoadMaster: set HA health check interval to 5–10 s in production. Default 60 s means a 3-minute outage before failover triggers.
Scenario Labs
Hands-on reference labs with full configurations, topology diagrams, and step-by-step deployment guides. Click a lab to open it.
Advanced
NX-OS · BGP · EVPN
VXLAN BGP EVPN DCI Lab
Full dual-DC VXLAN fabric with BGP EVPN control plane, L3VNI, VRF-lite, anycast gateway, and DCI stretch scenarios.
VXLAN
BGP EVPN
NX-OS
DCI
Open lab →
Coming Soon
More Labs
Additional labs for BGP, SD-WAN, FTD, Palo Alto, F5 BIG-IP, and more will be added here.
VXLAN BGP EVPN DCI Lab
BGP ESTABLISHED
NVE PEER 22.22.22.22 UP
VNIs 10010 · 10020 · 10030 · 50001
eBGP Multihop 3 · OSPF Area 0
L3VNI 50001 · VRF TENANT-A
All Systems Operational · AS 65001 AS 65002
Topology
EVE-NG Setup
DC1 Config
DC2 Config
Lab Tasks
Production
Stretch Examples
⬡ Lab Topology Diagram
DC1 — vxlan-switch-DC1
AS 65001
Lo0: 1.1.1.1/32 (BGP RID)
Lo1: 11.11.11.11/32 (VTEP)
Eth1/1: 10.0.0.1/30
NX-OS 9.3(3) · N9K
DC2 — vxlan-switch-DC2
AS 65002
Lo0: 2.2.2.2/32 (BGP RID)
Lo1: 22.22.22.22/32 (VTEP)
Eth1/1: 10.0.0.2/30
NX-OS 9.3(3) · N9K
VNI Assignments
L2VNI 10010 → VLAN 10
L2VNI 10020 → VLAN 20 (DC1)
L2VNI 10030 → VLAN 30 (DC2)
L3VNI 50001 → VRF TENANT-A
VLAN 999 → L3VNI SVI
Key Fixes Applied
send-community extended
Static RT (1:VNI) — eBGP
VLAN 999 SVI for L3VNI
Anycast GW: 192.168.10.1
route-map: no match stmt
⚙ EVE-NG Build Steps
Recommended image: Cisco Nexus 9K (N9K-C93240YC or similar) on NX-OS 9.3(3)+. Upload the image to EVE-NG via /opt/unetlab/addons/qemu/ before starting.
RAM note: Each Nexus 9K node needs ~4–6 GB minimum. Make sure your EVE-NG server has at least 12 GB free before booting both nodes.
1
Create a New Lab
Log into EVE-NG web UI → click Add New Lab → name it VXLAN-DCI-Lab. Add a description and click Save.
2
Add vxlan-switch-DC1
Right-click canvas → Add Node → select Cisco Nexus 9K
· Name: vxlan-switch-DC1 · RAM: 4096 MB · Ethernet interfaces: 4
3
Add vxlan-switch-DC2
Repeat step 2 → Name: vxlan-switch-DC2 · Same image, same RAM and interface count.
4
Add 4× VPC Host Nodes
Right-click → Add Node → Virtual PC (VPCS)
· Host-A — DC1, VLAN 10 — 192.168.10.10
· Host-B — DC1, VLAN 20 — 192.168.20.10
· Host-C — DC2, VLAN 10 — 192.168.10.20
· Host-D — DC2, VLAN 30 — 192.168.30.10
5
Wire the Topology
Hover a node → drag connector to link:
· DC1 Eth1/1 ↔ DC2 Eth1/1 (DCI Link)
· DC1 Eth1/2 ↔ Host-A e0
· DC1 Eth1/3 ↔ Host-B e0
· DC2 Eth1/2 ↔ Host-C e0
· DC2 Eth1/3 ↔ Host-D e0
6
Add Management Cloud (Optional)
Right-click → Add Network → type Management (Cloud0).
Connect mgmt0 of vxlan-switch-DC1 and vxlan-switch-DC2 to it for SSH access from your laptop.
7
Start All Nodes
Select all (Ctrl+A) → right-click → Start. Wait several minutes for the Nexus 9K nodes to fully boot. Console icon turns green when ready.
8
Configure VPC Host IPs
Open each VPC console and set IP + default gateway:
· Host-A: ip 192.168.10.10 255.255.255.0 192.168.10.1
· Host-B: ip 192.168.20.10 255.255.255.0 192.168.20.1
· Host-C: ip 192.168.10.20 255.255.255.0 192.168.10.1
· Host-D: ip 192.168.30.10 255.255.255.0 192.168.30.1
Interface note: On the Nexus 9Kv in EVE-NG, data interfaces are numbered from Ethernet1/1 (plus mgmt0 for management). The DC1/DC2 config tabs use Eth1/1–Eth1/3 accordingly.
◈ vxlan-switch-DC1 — Full NX-OS Configuration
Platform: Nexus 9K · NX-OS 9.3(3). Apply each block in order. Two loopbacks are required — Lo0 for BGP RID, Lo1 as VTEP source.
01Enable Required Features
nv overlay evpn
feature ospf
feature bgp
feature fabric forwarding
feature interface-vlan
feature vn-segment-vlan-based
feature nv overlay
02Anycast Gateway MAC
! Must be identical on both VTEPs
fabric forwarding anycast-gateway-mac 0001.0001.0001
03VLANs with VN-Segment Bindings
! ── Stretched VLAN ────────────────────────────────────────
vlan 10
  name STRETCHED-VLAN-10
  vn-segment 10010
! ── DC1 local VLAN ────────────────────────────────────────
vlan 20
  name DC1-LOCAL-VLAN-20
  vn-segment 10020
! ── L3VNI dedicated VLAN — required for VRF routing ──────
vlan 999
  name L3VNI-TENANT-A
  vn-segment 50001
04VRF TENANT-A Definition
vrf context TENANT-A
  vni 50001
  rd auto
  address-family ipv4 unicast
    ! Static RTs required for eBGP DCI (auto RT uses ASN — won't match peer)
    route-target import 1:50001
    route-target import 1:50001 evpn
    route-target import 65002:50001
    route-target export 1:50001
    route-target export 1:50001 evpn
    route-target export 65001:50001
05SVIs (Anycast Gateways)
! ── VLAN 10 — identical IP on both VTEPs (anycast) ───────
interface Vlan10
  description TENANT-A-VLAN10-GW
  no shutdown
  vrf member TENANT-A
  ip address 192.168.10.1/24
  fabric forwarding mode anycast-gateway
! ── VLAN 20 — DC1 local subnet ───────────────────────────
interface Vlan20
  description TENANT-A-VLAN20-GW
  no shutdown
  vrf member TENANT-A
  ip address 192.168.20.1/24
  fabric forwarding mode anycast-gateway
! ── VLAN 999 — L3VNI SVI — no IP required ────────────────
interface Vlan999
  no shutdown
  vrf member TENANT-A
06Underlay Interfaces & Loopbacks
! ── DCI uplink to DC2 ─────────────────────────────────────
interface Ethernet1/1
  description DCI-TO-vxlan-switch-DC2
  mtu 9216
  ip address 10.0.0.1/30
  ip router ospf 10 area 0.0.0.0
  no shutdown
! ── BGP Router-ID loopback ────────────────────────────────
interface loopback0
  description BGP-ROUTER-ID
  ip address 1.1.1.1/32
  ip router ospf 10 area 0.0.0.0
! ── VTEP source loopback ──────────────────────────────────
interface loopback1
  description VTEP-SOURCE
  ip address 11.11.11.11/32
  ip router ospf 10 area 0.0.0.0
07Host-Facing Access Ports
interface Ethernet1/2
  description Host-A — VLAN 10
  switchport
  switchport access vlan 10
  spanning-tree port type edge
  no shutdown
interface Ethernet1/3
  description Host-B — VLAN 20
  switchport
  switchport access vlan 20
  spanning-tree port type edge
  no shutdown
08OSPF Underlay
router ospf 10
  router-id 1.1.1.1
09NVE Interface (VTEP)
interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback1
  ! ── L2VNI — VLAN 10 stretched ───────────────────────────
  member vni 10010
    ingress-replication protocol bgp
  ! ── L2VNI — VLAN 20 DC1 local ──────────────────────────
  member vni 10020
    ingress-replication protocol bgp
  ! ── L3VNI — VRF TENANT-A ────────────────────────────────
  member vni 50001 associate-vrf
10Route-Map for Connected Redistribution
! No match statement — permit all connected routes into BGP
! "match route-type local" was the original bug — it only matched /32s
route-map RM-CONNECTED permit 10
11BGP EVPN Control Plane
router bgp 65001
  router-id 1.1.1.1
  address-family l2vpn evpn
    retain route-target all
  ! ── eBGP peer toward DC2 ──────────────────────────────────
  neighbor 2.2.2.2
    remote-as 65002
    update-source loopback0
    ebgp-multihop 3
    address-family l2vpn evpn
      ! Critical — without this, EVPN updates are silently dropped
      send-community extended
  ! ── VRF TENANT-A L3 redistribution ───────────────────────
  vrf TENANT-A
    address-family ipv4 unicast
      advertise l2vpn evpn
      redistribute direct route-map RM-CONNECTED
12EVPN L2VNI Instances
evpn
  vni 10010 l2
    rd auto
    ! Static RT import — required for eBGP cross-AS route acceptance
    route-target import auto
    route-target import 65002:10010
    route-target export auto
    route-target export 65001:10010
  vni 10020 l2
    rd auto
    route-target import auto
    route-target export auto
    route-target export 65001:10020
13Verify & Save
! ── Validation commands ───────────────────────────────────
show ip ospf neighbor
show bgp l2vpn evpn summary
show nve peers
show nve vni
show ip route vrf TENANT-A
ping 22.22.22.22 source-interface loopback1
! NX-OS ping syntax — use source-interface, NOT source loopback1
! Alt: ping 22.22.22.22 source 11.11.11.11
! ── Save ──────────────────────────────────────────────────
copy running-config startup-config
◈ vxlan-switch-DC2 — Full NX-OS Configuration
Mirror of DC1 — different IPs, AS, VLANs. VLAN 10 anycast gateway IP must be identical to DC1 (192.168.10.1). Apply in the same order.
01Enable Required Features
nv overlay evpn
feature ospf
feature bgp
feature fabric forwarding
feature interface-vlan
feature vn-segment-vlan-based
feature nv overlay
02Anycast Gateway MAC
! Must be identical on both VTEPs
fabric forwarding anycast-gateway-mac 0001.0001.0001
03VLANs with VN-Segment Bindings
! ── Stretched VLAN ────────────────────────────────────────
vlan 10
  name STRETCHED-VLAN-10
  vn-segment 10010
! ── DC2 local VLAN ────────────────────────────────────────
vlan 30
  name DC2-LOCAL-VLAN-30
  vn-segment 10030
! ── L3VNI dedicated VLAN — required for VRF routing ──────
vlan 999
  name L3VNI-TENANT-A
  vn-segment 50001
04VRF TENANT-A Definition
vrf context TENANT-A
  vni 50001
  rd auto
  address-family ipv4 unicast
    route-target import 1:50001
    route-target import 1:50001 evpn
    route-target import 65001:50001
    route-target export 1:50001
    route-target export 1:50001 evpn
    route-target export 65002:50001
05SVIs (Anycast Gateways)
! ── VLAN 10 — MUST match DC1 IP exactly (anycast) ────────
interface Vlan10
  description TENANT-A-VLAN10-GW
  no shutdown
  vrf member TENANT-A
  ip address 192.168.10.1/24
  ! 192.168.10.1 — NOT .2 — anycast gateway must be identical on both VTEPs
  fabric forwarding mode anycast-gateway
! ── VLAN 30 — DC2 local subnet ───────────────────────────
interface Vlan30
  description TENANT-A-VLAN30-GW
  no shutdown
  vrf member TENANT-A
  ip address 192.168.30.1/24
  fabric forwarding mode anycast-gateway
! ── VLAN 999 — L3VNI SVI — no IP required ────────────────
interface Vlan999
  no shutdown
  vrf member TENANT-A
06Underlay Interfaces & Loopbacks
! ── DCI uplink to DC1 ─────────────────────────────────────
interface Ethernet1/1
  description DCI-TO-vxlan-switch-DC1
  mtu 9216
  ip address 10.0.0.2/30
  ip router ospf 10 area 0.0.0.0
  no shutdown
! ── BGP Router-ID loopback ────────────────────────────────
interface loopback0
  description BGP-ROUTER-ID
  ip address 2.2.2.2/32
  ip router ospf 10 area 0.0.0.0
! ── VTEP source loopback ──────────────────────────────────
interface loopback1
  description VTEP-SOURCE
  ip address 22.22.22.22/32
  ip router ospf 10 area 0.0.0.0
07Host-Facing Access Ports
interface Ethernet1/2
  description Host-C — VLAN 10
  switchport
  switchport access vlan 10
  spanning-tree port type edge
  no shutdown
interface Ethernet1/3
  description Host-D — VLAN 30
  switchport
  switchport access vlan 30
  spanning-tree port type edge
  no shutdown
08OSPF Underlay
router ospf 10
  router-id 2.2.2.2
09NVE Interface (VTEP)
interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback1
  ! ── L2VNI — VLAN 10 stretched ───────────────────────────
  member vni 10010
    ingress-replication protocol bgp
  ! ── L2VNI — VLAN 30 DC2 local ──────────────────────────
  member vni 10030
    ingress-replication protocol bgp
  ! ── L3VNI — VRF TENANT-A ────────────────────────────────
  member vni 50001 associate-vrf
10Route-Map for Connected Redistribution
! No match statement — permit all connected routes into BGP
route-map RM-CONNECTED permit 10
11BGP EVPN Control Plane
router bgp 65002
  router-id 2.2.2.2
  address-family l2vpn evpn
    retain route-target all
  ! ── eBGP peer toward DC1 ──────────────────────────────────
  neighbor 1.1.1.1
    remote-as 65001
    update-source loopback0
    ebgp-multihop 3
    address-family l2vpn evpn
      send-community extended
  ! ── VRF TENANT-A L3 redistribution ───────────────────────
  vrf TENANT-A
    address-family ipv4 unicast
      advertise l2vpn evpn
      redistribute direct route-map RM-CONNECTED
12EVPN L2VNI Instances
evpn
  vni 10010 l2
    rd auto
    route-target import auto
    route-target import 65001:10010
    route-target export auto
    route-target export 65002:10010
  vni 10030 l2
    rd auto
    route-target import auto
    route-target export auto
    route-target export 65002:10030
13Verify & Save
! ── Validation commands ───────────────────────────────────
show ip ospf neighbor
show bgp l2vpn evpn summary
show nve peers
show nve vni
show ip route vrf TENANT-A
ping 11.11.11.11 source-interface loopback1
! Alt syntax: ping 11.11.11.11 source 22.22.22.22
! ── Save ──────────────────────────────────────────────────
copy running-config startup-config
✓ Lab Validation Tasks
0 / 12 completed
Click each task to mark it complete as you work through the lab. Complete the phases in order — each depends on the previous.
Phase 1 — Underlay Verification
✓
UNDERLAYPing 10.0.0.2 from VTEP-1 — confirm dark fiber link is up
✓
UNDERLAYPing 2.2.2.2 source Loopback0 from VTEP-1 — confirm OSPF loopback reachability
✓
UNDERLAYRun show ip ospf neighbor — confirm FULL adjacency state
Phase 2 — BGP EVPN Control Plane
✓
BGPRun show bgp l2vpn evpn summary — confirm session is Established
✓
BGPRun show nve peers — confirm remote VTEP peer shows UP
✓
BGPRun show l2vpn evpn vni — confirm VNI 10010 and 50001 are active
Phase 3 — L2 Stretch Validation
✓
L2Ping Host-C 192.168.10.20 from Host-A 192.168.10.10 — same VLAN across DCs
✓
L2Run show l2vpn evpn mac on VTEP-1 — confirm Host-C MAC is learned remotely from DC2
✓
L2Run show bgp l2vpn evpn route-type 2 — confirm Type-2 MAC/IP routes are exchanged
Phase 4 — L3 VRF Stretch Validation
✓
L3Ping Host-D 192.168.30.10 from Host-B 192.168.20.10 — cross-DC inter-subnet routing via VRF
✓
L3Run show ip route vrf TENANT-A on VTEP-1 — confirm 192.168.30.0/24 learned via EVPN
✓
L3Run show bgp l2vpn evpn route-type 5 — confirm Type-5 IP prefix routes are present
⬆ Taking This Lab to Production
What you built in the lab is a functional 2-node VXLAN BGP EVPN DCI with a single OSPF underlay, direct eBGP peering, and one tenant VRF. Production deployments require hardening every layer: underlay redundancy, BGP security, scalable route reflection, tenant isolation, monitoring, and change control. Work through each phase below in order.
01 Underlay — Redundancy & Resilience
CRITICAL Replace single DCI link with redundant paths
A single Eth1/1 link is a hard single point of failure. In production, use two independent physical paths (diverse fiber routes, separate providers). Bundle them as a port-channel, or run two separate routed links as OSPF equal-cost paths and confirm maximum-paths under router ospf 10 allows both, so both links carry traffic simultaneously. Verify with show ip route ospf — you should see two equal-cost next-hops per VTEP loopback.
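A minimal sketch of the dual routed-link option on DC1 — the second interface (Eth1/4) and its /30 are placeholders for whatever spare port and addressing you use:

```text
! Second routed DCI link — Eth1/4 and 10.0.0.5/30 are placeholders
interface Ethernet1/4
  description DCI-TO-DC2-PATH-B
  mtu 9216
  ip address 10.0.0.5/30
  ip router ospf 10 area 0.0.0.0
  no shutdown
router ospf 10
  maximum-paths 2      ! install both equal-cost DCI paths in the RIB
! Verify — expect two next-hops toward the remote VTEP loopback:
show ip route 22.22.22.22
```

Keep OSPF costs identical on both links, or traffic will pin to the lower-cost path.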
CRITICAL Set MTU consistently across the entire DCI path
VXLAN adds 50 bytes of overhead (14 Ethernet + 8 UDP + 8 VXLAN + 20 IP outer). Every device in the path — switches, routers, provider handoff equipment — must support jumbo frames. Set mtu 9216 on all DCI-facing interfaces end to end. Validate with: ping <remote-VTEP> source-interface loopback1 df-bit packet-size 8972. Fragmentation here causes silent, intermittent drops that are extremely hard to debug.
HIGH Use BFD for fast failure detection on DCI links
Default OSPF dead intervals (40 seconds) are too slow for production. Enable BFD on the DCI interface and tie it to OSPF for sub-second failure detection: bfd interval 300 min_rx 300 multiplier 3 plus bfd under router ospf 10. This reduces convergence from 40 s to under 1 second when the DCI link fails.
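A sketch of the BFD tie-in on this lab's DC1, assuming the standard NX-OS model (the feature must be enabled first; timer values per the note above give roughly 900 ms detection):

```text
feature bfd
! Global BFD timers — 300 ms TX/RX x3 multiplier ≈ 900 ms detection
bfd interval 300 min_rx 300 multiplier 3
interface Ethernet1/1
  ip ospf bfd          ! tie BFD to the OSPF adjacency on the DCI link
! Or enable BFD for all OSPF interfaces at once:
router ospf 10
  bfd
! Verify the session comes up with the peer:
show bfd neighbors
```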
MEDIUM Separate underlay and overlay loopbacks — already done ✓
You already have Lo0 (BGP RID) and Lo1 (VTEP source) separated. This is correct production practice. In some designs a third loopback is used for management. Keep them logically separated — never reuse the VTEP loopback as a management address.
02 BGP — Security & Scalability
CRITICAL Add BGP MD5 authentication on the EVPN peer
The lab has no BGP password. In production, all BGP sessions must be authenticated. On NX-OS, add password 3 <encrypted-key> under the neighbor stanza. Use type-3 or type-7 encryption minimum; avoid plaintext type-0 in configs. Rotate keys periodically and store them in your secrets manager, not in the config file itself.
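Sketch on DC1 — the key value is a placeholder. Entering it as type 0 lets NX-OS encrypt it; the running config then shows it as type 3:

```text
router bgp 65001
  neighbor 2.2.2.2
    ! Entered in cleartext (type 0); stored as "password 3 <hash>"
    password 0 Dci-Evpn-K3y    ! placeholder — pull the real key from your secrets manager
! Both sides must match or the session tears down — confirm it re-establishes:
show bgp l2vpn evpn summary
```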
CRITICAL Deploy a Route Reflector for scale — don't use direct eBGP for large fabrics
Direct eBGP peering between two VTEPs works fine for a 2-node lab. In production with 10+ VTEPs per DC, every VTEP would need a full mesh of iBGP sessions — that scales as O(n²). Instead, deploy dedicated RR nodes (or use spine switches as RRs) per DC, and have each VTEP peer only with the RRs. DCI then runs eBGP only between the RRs of each DC, not between every VTEP. This is the standard scalable design for any fabric larger than 4 nodes.
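A sketch of the intra-DC side of that design — a DC1 spine acting as RR toward a hypothetical leaf VTEP at 1.1.1.10 (address is a placeholder):

```text
! On the DC1 spine / route reflector (AS 65001)
router bgp 65001
  neighbor 1.1.1.10
    remote-as 65001            ! iBGP to the leaf VTEP
    update-source loopback0
    address-family l2vpn evpn
      send-community extended
      route-reflector-client   ! reflect EVPN routes between leaves
! Leaves peer only with the RRs; only the RRs run eBGP across the DCI.
```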
HIGH Implement BGP prefix limits per neighbor
A misconfigured peer or route leak can flood the RIB with thousands of unexpected prefixes, causing memory exhaustion and BGP instability. Configure neighbor <ip> maximum-prefix <N> <threshold%> warning-only first to baseline normal counts, then enforce hard limits. In a 2-DC lab with 4 subnets, 50 prefixes is a safe hard limit. In production, set it to 2–3× your expected maximum.
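Sketch for this lab's EVPN session on DC1, starting in warning-only mode as suggested above (50/80% are the lab values, not production numbers):

```text
router bgp 65001
  neighbor 2.2.2.2
    address-family l2vpn evpn
      ! Log at 80% of 50 prefixes; no enforcement yet
      maximum-prefix 50 80 warning-only
! After baselining, remove warning-only to make the limit hard.
! Watch prefix counts against the limit:
show bgp l2vpn evpn summary
```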
HIGH Replace auto Route Targets with a documented RT allocation scheme
You already moved to static RTs (1:50001, 65001:10010 etc.) to fix the eBGP mismatch — that's correct. In production, formalize this into a RT allocation policy. A common scheme: <DC-ASN>:<VNI> for export, and explicitly import the peer DC's ASN-based RT. Document this in a network design doc and enforce it via configuration templates (Ansible/Terraform). This avoids RT collisions as you add more tenants.
MEDIUM Enable BGP graceful restart
Configure graceful-restart under the BGP process on both switches. This allows the forwarding plane to continue during a BGP control plane restart (e.g., planned NX-OS ISSU upgrade), preventing a full traffic outage when BGP restarts. Pair with graceful-restart-helper on the peer.
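Sketch on DC1 (mirror on DC2). Restart-time value is an assumption — tune it to your platform's control-plane restart time:

```text
router bgp 65001
  graceful-restart
  graceful-restart restart-time 120   ! seconds the peer keeps routes during restart
! Confirm the GR capability was negotiated — look for the
! "Graceful Restart" capability in the neighbor detail output:
show bgp l2vpn evpn neighbors 2.2.2.2
```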
03 Multi-Tenancy & VRF Segmentation
CRITICAL Assign a unique L3VNI per tenant VRF — never share
The lab has one VRF (TENANT-A) with L3VNI 50001. In production with multiple tenants, each VRF must get its own unique L3VNI, its own VLAN (like VLAN 999), and its own dedicated SVI. Reusing L3VNIs across VRFs causes cross-tenant traffic leakage — a critical security incident. Maintain a VNI allocation table (a spreadsheet or IPAM tool) to track assignments: VNI range, associated VRF, tenant name, DC, and owner.
HIGH Apply VRF-level ACLs or firewall zoning between tenants
VXLAN/EVPN provides L3 separation via VRFs but does not enforce security policy between them. If tenants need internet access or cross-VRF routing (e.g., shared services), route inter-VRF traffic through a firewall. On Nexus, use VRF-aware PBR to redirect inter-tenant traffic to a firewall cluster. Do not use static inter-VRF routes on the switch itself in a multi-tenant environment.
MEDIUM Enable ARP suppression on L2VNIs
In the lab, ARP broadcasts flood across the VXLAN fabric. In production with hundreds of hosts, this generates significant unnecessary traffic. Enable it by adding suppress-arp under each member vni in NVE. The VTEP will respond to ARP requests locally using the EVPN MAC/IP table instead of flooding. This requires EVPN Type-2 routes (MAC+IP) to be populated first — verify with show bgp l2vpn evpn route-type 2.
04 Security Hardening
CRITICAL Lock down management access
The lab uses default admin with no ACL restrictions. In production: create role-based accounts (no shared admin), restrict SSH access via ip access-list MGMT-ACCESS applied to the vty lines and mgmt0, disable Telnet, and enable AAA with TACACS+ or RADIUS for centralized auth and audit logging. Rotate the local admin password and store it in a privileged access manager (CyberArk, HashiCorp Vault, etc.).
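A sketch of the vty lockdown — the 10.99.0.0/24 management subnet is a placeholder; AAA server config is omitted here:

```text
ip access-list MGMT-ACCESS
  10 permit tcp 10.99.0.0/24 any eq 22   ! SSH from the mgmt subnet only
  20 deny ip any any log                 ! log everything else
line vty
  access-class MGMT-ACCESS in
no feature telnet                        ! SSH only
```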
CRITICAL Remove SNMP MD5 community strings — migrate to SNMPv3
The running config shows snmp-server user admin network-admin auth md5 with the hash in plaintext. In production, use SNMPv3 with auth sha and priv aes128 minimum, restrict to a dedicated SNMP VRF, and limit access to your NMS host IPs only. Remove any SNMPv1/v2c community strings.
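Sketch of the SNMPv3 replacement — user name, passphrases, and NMS IP below are all placeholders:

```text
! Remove legacy v2c strings first (adjust to whatever exists):
no snmp-server community public
! SNMPv3 user with SHA auth + AES-128 privacy:
snmp-server user netmon network-operator auth sha Snmp-Auth-Pw priv aes-128 Snmp-Priv-Pw
! Send v3 traps only to the NMS:
snmp-server host 10.99.0.20 traps version 3 priv netmon
```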
HIGH Enable Control Plane Policing (CoPP) — already set to strict ✓
Your config already has copp profile strict — this is correct. Verify the strict policy is not accidentally allowing excessive OSPF/BGP traffic that could overwhelm the control plane. Review with show policy-map interface control-plane and tune drop counters after a week of baseline traffic.
HIGH Harden the OSPF underlay — use authentication
The lab OSPF has no authentication. A rogue device connected to the DCI link could inject false LSAs and redirect traffic. In production, enable OSPF MD5 authentication on the DCI area: ip ospf authentication message-digest on the DCI interface and define ip ospf message-digest-key 1 md5 <key>. For higher security, consider OSPFv3 with IPSec or move the underlay to an authenticated IS-IS.
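Sketch on DC1's DCI interface — the key is a placeholder, and both ends must match key ID and value or the adjacency drops:

```text
interface Ethernet1/1
  ip ospf authentication message-digest
  ip ospf message-digest-key 1 md5 Ospf-Dci-Key   ! placeholder key
! Apply the same key ID/value on DC2 Eth1/1, then confirm FULL:
show ip ospf neighbor
```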
05 Observability & Operations
CRITICAL Configure syslog, SNMP traps, and NetFlow to a central collector
The lab has no logging destination. In production: configure logging server <SIEM-IP> 6 use-vrf management, enable SNMP traps for BGP state changes (snmp-server enable traps bgp) and NVE peer changes. Configure Flexible NetFlow or sFlow on host-facing ports for traffic visibility inside the VXLAN fabric. Forward to an Elastic/Splunk stack or purpose-built NMS.
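Sketch with placeholder collector IP:

```text
! Syslog over the management VRF, severity 6 (informational) and below:
logging server 10.99.0.30 6 use-vrf management
logging timestamp milliseconds
! BGP state-change traps toward your NMS:
snmp-server enable traps bgp
```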
HIGH Build a runbook for common failure scenarios
Document step-by-step recovery procedures for: DCI link failure, BGP session drop, NVE peer loss, L3VNI going down, and incorrect RT causing route black-hole. The troubleshooting path you went through in this lab is exactly the runbook content. Include the correct NX-OS commands (e.g., note that ping X source-interface loopbackN works but ping X source loopbackN does not on NX-OS). Store the runbook in your team wiki and link it from your monitoring alerts.
HIGH Implement configuration management and change control
Use Ansible or Terraform with Cisco NX-OS modules to template and version-control all switch configurations. Store configs in Git. Every change goes through a peer-review MR process. Use NAPALM or Batfish for pre-change config validation — Batfish can simulate route changes before you push them to production. Schedule changes in a maintenance window with a verified rollback plan.
BEST PRACTICE Set up continuous VXLAN fabric health monitoring
Use NX-OS Event Manager (EEM) or streaming telemetry (gRPC/YANG) to alert on: NVE peer count dropping, VNI state changes, BGP prefix count deviations, and NVE TX/RX ucast counters going to zero (as seen in this lab — that was the key indicator the dataplane was broken). Stream to Prometheus + Grafana for dashboarding. Key metrics to track: show nve peers peer count, show nve vni state, show interface nve1 counters.
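A sketch of one such watcher as an NX-OS EEM applet — the syslog pattern is an assumption; match it against the exact NVE peer-down message your NX-OS release actually emits before deploying:

```text
event manager applet NVE-PEER-DOWN
  ! Fires on an NVE peer-down syslog (pattern is release-dependent — verify)
  event syslog pattern "NVE.*peer.*down"
  action 1.0 syslog priority critical msg "NVE peer lost - check underlay and EVPN session"
```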
06 Upgrade Path from This Lab Config
CHECKLIST Pre-production checklist — verify all before go-live
Underlay: Redundant DCI paths · BFD enabled · MTU 9216 end-to-end · OSPF auth · no single points of failure
Overlay: Static RTs documented and templated · ARP suppression enabled · L3VNI per-tenant · VLAN 999 (or equivalent) on all VTEPs
BGP: MD5 auth on all peers · Prefix limits configured · Graceful restart enabled · RR design (not full mesh) for scale
Security: AAA/TACACS+ · SNMPv3 only · CoPP strict · No default credentials · VRF-aware ACLs on mgmt
Ops: Configs in Git · Syslog/SNMP/telemetry to collector · Runbook written · Change control process · Rollback tested
Validation: End-to-end host pings across all VLANs · VRF route tables verified · NVE peers Up · EVPN Type-2 and Type-5 routes exchanged · Failover tested with DCI link pulled
⇔ VXLAN Stretch Scenarios — L2 & L3
Pattern library for common DCI stretch scenarios on this NX-OS lab. Every example builds on the base config already running. For each scenario: add the VNI/VLAN/SVI/NVE/EVPN blocks shown — the underlay, BGP session, and L3VNI (50001) are already in place and don't need to change.
◎ Golden Rules for Any Stretch
Anycast GW IP must be identical on both VTEPs for stretched VLANs. Different IPs = broken return path.
Anycast MAC must be identical — already set to 0001.0001.0001 globally. Don't change per-VLAN.
VNI must be unique across the fabric. Maintain a VNI allocation table. Never reuse a VNI for a different VLAN.
Static RTs required for eBGP DCI. Import the peer's ASN-based RT. Auto-RT uses local ASN and won't match cross-AS.
L3VNI (50001) handles inter-subnet routing automatically once subnets are redistributed. No extra config needed per new VLAN.
route-map RM-CONNECTED has no match stmt — all new SVIs are automatically redistributed into BGP EVPN as Type-5 routes.
L2 STRETCH
Example 1 — Stretch a Single VLAN (e.g. VLAN 100)
Extends an existing VLAN across both DCs so hosts in both sites share the same IP subnet and L2 broadcast domain. The anycast gateway on both VTEPs serves as the default gateway — hosts see the same MAC and IP regardless of which DC they're in. Use this for workload mobility, live VM migration, or active-active host placement.
VM live migration
Active-active compute
Shared services VLAN
Cluster heartbeat
DC1 — vxlan-switch-DC1
vlan 100
  name STRETCHED-VLAN-100
  vn-segment 10100
interface Vlan100
  no shutdown
  vrf member TENANT-A
  ip address 192.168.100.1/24
  fabric forwarding mode anycast-gateway
interface nve1
  member vni 10100
    ingress-replication protocol bgp
evpn
  vni 10100 l2
    rd auto
    route-target import auto
    route-target import 65002:10100
    route-target export auto
    route-target export 65001:10100
! Host-facing port (adjust interface as needed)
interface Ethernet1/4
  switchport
  switchport access vlan 100
  spanning-tree port type edge
  no shutdown
DC2 — vxlan-switch-DC2
vlan 100
  name STRETCHED-VLAN-100
  vn-segment 10100
interface Vlan100
  no shutdown
  vrf member TENANT-A
  ip address 192.168.100.1/24   ! Same IP as DC1 — anycast
  fabric forwarding mode anycast-gateway
interface nve1
  member vni 10100
    ingress-replication protocol bgp
evpn
  vni 10100 l2
    rd auto
    route-target import auto
    route-target import 65001:10100
    route-target export auto
    route-target export 65002:10100
! Host-facing port (adjust interface as needed)
interface Ethernet1/4
  switchport
  switchport access vlan 100
  spanning-tree port type edge
  no shutdown
✓ Verify
show nve vni
! nve1 10100 UnicastBGP Up CP L2 [100]
show bgp l2vpn evpn route-type 3
! IMET routes for VNI 10100 from both VTEPs
show bgp l2vpn evpn route-type 2
! MAC/IP routes once hosts are active
ping 192.168.100.x vrf TENANT-A
! host-to-host across DCs
L2 STRETCH
Example 2 — Stretch Multiple VLANs in One Change (VLAN 200, 201, 202)
Same pattern as Example 1, applied in bulk. Each VLAN gets its own unique VNI and SVI. All three share the same VRF and L3VNI — inter-subnet routing between them works automatically via the existing L3VNI 50001.
Multi-tier app stretch
Web / App / DB tiers
DR site activation
BOTH SWITCHES — apply same block, swap RT ASN (DC1: export 65001:XXXX import 65002:XXXX / DC2: reverse)
! ── VLANs ─────────────────────────────────────────────────
vlan 200
  name WEB-TIER
  vn-segment 10200
vlan 201
  name APP-TIER
  vn-segment 10201
vlan 202
  name DB-TIER
  vn-segment 10202
! ── SVIs — identical IPs on both VTEPs ────────────────────
interface Vlan200
  no shutdown
  vrf member TENANT-A
  ip address 10.200.0.1/24
  fabric forwarding mode anycast-gateway
interface Vlan201
  no shutdown
  vrf member TENANT-A
  ip address 10.201.0.1/24
  fabric forwarding mode anycast-gateway
interface Vlan202
  no shutdown
  vrf member TENANT-A
  ip address 10.202.0.1/24
  fabric forwarding mode anycast-gateway
! ── NVE members ───────────────────────────────────────────
interface nve1
  member vni 10200
    ingress-replication protocol bgp
  member vni 10201
    ingress-replication protocol bgp
  member vni 10202
    ingress-replication protocol bgp
! ── EVPN instances — DC1 shown; DC2: swap 65001↔65002 ────
evpn
  vni 10200 l2
    rd auto
    route-target import auto
    route-target import 65002:10200
    route-target export auto
    route-target export 65001:10200
  vni 10201 l2
    rd auto
    route-target import auto
    route-target import 65002:10201
    route-target export auto
    route-target export 65001:10201
  vni 10202 l2
    rd auto
    route-target import auto
    route-target import 65002:10202
    route-target export auto
    route-target export 65001:10202
L3 ONLY
Example 3 — DC-Local VLAN with L3 Reachability Across DCI
A VLAN that exists only in one DC but whose subnet is routable from the other DC via the L3VNI. No L2 extension — hosts in DC2 can reach the DC1-local subnet via Type-5 prefix routes, but the VLAN itself doesn't exist in DC2. This is the pattern already used for VLAN 20 (DC1) and VLAN 30 (DC2) in this lab.
DC-local storage VLANs
Local management subnets
Asymmetric workloads
Backup/replication targets
DC1 ONLY — VLAN 300 local to DC1
vlan 300
  name DC1-STORAGE
  vn-segment 10300
interface Vlan300
  no shutdown
  vrf member TENANT-A
  ip address 172.16.30.1/24
  fabric forwarding mode anycast-gateway
interface nve1
  member vni 10300
    ingress-replication protocol bgp
evpn
  vni 10300 l2
    rd auto
    route-target import auto
    route-target export auto
    route-target export 65001:10300
! ── DC2 does NOT need VLAN 300 or VNI 10300 ──────────────
! The 172.16.30.0/24 subnet reaches DC2 automatically
! as a Type-5 prefix route via L3VNI 50001
DC2 — no VLAN config needed
! Nothing to configure on DC2 for the VLAN itself.
! The Type-5 route for 172.16.30.0/24 is advertised
! by DC1 via BGP EVPN and installed in DC2's
! TENANT-A VRF routing table automatically.
! Verify on DC2:
show ip route vrf TENANT-A
! Expect:
!   172.16.30.0/24 via 11.11.11.11
!   tunnelid: 0x0b0b0b0b encap: VXLAN
show bgp l2vpn evpn route-type 5
! Expect Type-5 entry:
!   [5]:[0]:[24]:[172.16.30.0]/224
!   via 11.11.11.11 (DC1 VTEP)
!   Extcommunity: RT:65001:10300 RT:1:50001 ENCAP:8
! From DC2, ping DC1-local subnet gateway:
ping 172.16.30.1 vrf TENANT-A
L2 + L3
Example 4 — Add a Second Tenant VRF (TENANT-B)
Adds a completely isolated second tenant with its own VRF, L3VNI, and VLAN — no traffic can cross between TENANT-A and TENANT-B at the switch level. Each tenant gets its own dedicated L3VNI VLAN (VLAN 998 here) and a separate BGP RT space so routes stay fully isolated.
Multi-tenant environments
Dev/Prod isolation
Compliance segmentation
Separate customer VRFs
BOTH SWITCHES — swap ASN in RTs for DC2 (65001↔65002)
! ── Step 1: L3VNI VLAN for TENANT-B ──────────────────────
vlan 998
  name L3VNI-TENANT-B
  vn-segment 50002
! ── Step 2: VRF definition ────────────────────────────────
vrf context TENANT-B
  vni 50002
  rd auto
  address-family ipv4 unicast
    route-target import 2:50002
    route-target import 2:50002 evpn
    route-target import 65002:50002   ! DC1 only — DC2 imports 65001:50002
    route-target export 2:50002
    route-target export 2:50002 evpn
    route-target export 65001:50002   ! DC1 only — DC2 exports 65002:50002
! ── Step 3: L3VNI SVI ─────────────────────────────────────
interface Vlan998
  no shutdown
  vrf member TENANT-B
! ── Step 4: TENANT-B VLAN + SVI ──────────────────────────
vlan 110
  name TENANTB-VLAN110
  vn-segment 10110
interface Vlan110
  no shutdown
  vrf member TENANT-B
  ip address 10.110.0.1/24
  fabric forwarding mode anycast-gateway
! ── Step 5: NVE members ───────────────────────────────────
interface nve1
  member vni 10110
    ingress-replication protocol bgp
  member vni 50002 associate-vrf
! ── Step 6: EVPN L2 instance ──────────────────────────────
evpn
  vni 10110 l2
    rd auto
    route-target import auto
    route-target import 65002:10110
    route-target export auto
    route-target export 65001:10110
! ── Step 7: BGP VRF for TENANT-B redistribution ──────────
router bgp 65001
  vrf TENANT-B
    address-family ipv4 unicast
      advertise l2vpn evpn
      redistribute direct route-map RM-CONNECTED
✓ Verify
show nve vni
! VNI 10110 Up L2, 50002 Up L3 [TENANT-B]
show ip route vrf TENANT-B
! Should see remote subnets via VXLAN
ping 10.110.0.1 vrf TENANT-B
! Cross-DC gateway ping
! Confirm isolation — this must FAIL (cross-VRF blocked):
ping 192.168.10.1 vrf TENANT-B
! Must timeout — TENANT-A unreachable from TENANT-B
L2 OPTIMISATION
Example 5 — Enable ARP Suppression on Stretched VLANs
By default, ARP broadcasts flood across the VXLAN fabric to all remote VTEPs. ARP suppression tells the local VTEP to answer ARP requests from its own EVPN MAC/IP table instead of flooding — dramatically reducing BUM (Broadcast/Unknown/Multicast) traffic across the DCI link. Requires EVPN Type-2 MAC+IP routes to be populated first.
High host-count VLANs
DCI bandwidth reduction
Noisy broadcast domains
BOTH SWITCHES — add suppress-arp to each L2VNI in NVE
interface nve1
  member vni 10010
    ingress-replication protocol bgp
    suppress-arp   ! Add this line to each stretched L2VNI
  member vni 10100
    ingress-replication protocol bgp
    suppress-arp
✓ Verify
show nve vni
! Flags column should now show SA (Suppress ARP) for affected VNIs
! Interface  VNI    State  Mode  Type  Flags
! nve1       10010  Up     CP    L2    SA
show ip arp suppression-cache detail
! Shows local ARP cache entries being served by the VTEP
! Generate ARP traffic then check NVE counters:
show interface nve1 counters
! TX mcast should be significantly lower with ARP suppression enabled
REFERENCE
VNI Allocation Table — This Lab
Keep this table updated as you add VLANs. Never reuse a VNI. Keep L2VNIs and L3VNIs in separate ranges to avoid confusion.
! ── VNI Allocation — vxlan-switch-DC1 / DC2 Lab ──────────
!
! Range Purpose Assigned
! ─────────────────────────────────────────────────────────
! 10010 L2VNI VLAN 10 STRETCHED-VLAN-10 ✓
! 10020 L2VNI VLAN 20 DC1-LOCAL-VLAN-20 ✓
! 10030 L2VNI VLAN 30 DC2-LOCAL-VLAN-30 ✓
! 10100 L2VNI VLAN 100 STRETCHED-VLAN-100 Example 1
! 10110 L2VNI VLAN 110 TENANTB-VLAN110 Example 4
! 10200 L2VNI VLAN 200 WEB-TIER Example 2
! 10201 L2VNI VLAN 201 APP-TIER Example 2
! 10202 L2VNI VLAN 202 DB-TIER Example 2
! 10300 L2VNI VLAN 300 DC1-STORAGE (local) Example 3
!
! 50001 L3VNI VRF TENANT-A ✓
! 50002 L3VNI VRF TENANT-B Example 4
!
! ─────────────────────────────────────────────────────────
! Next available L2VNI: 10301+
! Next available L3VNI: 50003+
! ─────────────────────────────────────────────────────────