Design Methods
Chapter 2 — Core Switch Security Hardening Design Guide
2.1 Design Principles & Methodology
Effective core switch security hardening requires a structured design methodology that balances security controls with operational requirements. The fundamental design principles—Isolate, Minimize, Authenticate, Rate-Limit, Observe, and Rehearse—form the foundation of every hardening decision. These principles are not independent; they reinforce each other to create a defense-in-depth posture that is resilient against both external attacks and internal misconfigurations.
The design methodology follows a four-phase lifecycle: Design (define policies, architecture, and risk assessment), Implement (deploy configurations, apply controls, automate provisioning), Verify (audit configurations, scan for vulnerabilities, run penetration tests), and Monitor (monitor continuously, detect threats, respond to incidents, track performance). This lifecycle is applied to each of the eight hardening domains described in this chapter.
2.2 Eight Hardening Domains
The hardening baseline is organized into eight implementable domains. Each domain addresses a specific attack surface, has defined implementation points, and includes measurable acceptance criteria. The domains are designed to be implemented in sequence, as later domains depend on the foundation established by earlier ones.
Domain 1: Management Access Consolidation
Management access consolidation reduces the attack surface by forcing all administrative access through an out-of-band (OOB) network or a dedicated management VRF, and by restricting it to controlled source IPs. This is the most fundamental hardening control because it prevents an attacker who has compromised a production host from directly reaching the management plane of the core switches.
Acceptance Criteria: Only jump-host IPs can reach the management IP; no in-band access from production VLANs; port scan from unauthorized source shows no open management ports; all management access attempts are logged.
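The allowlist criterion above can be checked offline against logged access attempts. The following is a minimal sketch, assuming a hypothetical pair of jump-host addresses and a log already parsed into `(source_ip, port)` tuples; neither comes from a real deployment.

```python
import ipaddress

# Hypothetical allowlist: jump-host addresses permitted to reach the management plane.
JUMP_HOSTS = [
    ipaddress.ip_network("10.255.0.10/32"),
    ipaddress.ip_network("10.255.0.11/32"),
]

def is_mgmt_source_allowed(source_ip, allowlist=JUMP_HOSTS):
    """Return True only if source_ip falls inside an allowlisted network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in allowlist)

def audit_access_attempts(attempts):
    """Split logged (source_ip, port) attempts into allowed and denied lists."""
    allowed, denied = [], []
    for source_ip, port in attempts:
        target = allowed if is_mgmt_source_allowed(source_ip) else denied
        target.append((source_ip, port))
    return allowed, denied
```

Any entry in the denied list that nevertheless reached a login prompt would indicate that the device-side ACL does not match the intended allowlist.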
Domain 2: AAA + Least Privilege
Central authentication and authorization with role separation and command accounting ensures that every administrative action is attributed to a specific individual, authorized by policy, and recorded for audit. This domain eliminates shared accounts, reduces privilege creep, and provides the evidence chain required for compliance and incident investigation.
Acceptance Criteria: A break-glass local account exists but generates an alert on use; all privileged commands are logged with user attribution; the role matrix is tested with positive and negative test cases; AAA server failure triggers defined fallback behavior (fail-closed preferred for new sessions, fail-open only for sessions that are already active).
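Testing a role matrix with positive and negative cases can be illustrated with a small deny-by-default model. This is a sketch only: the role names and command prefixes are hypothetical, and a real TACACS+ policy would be evaluated on the AAA server rather than in a script.

```python
# Hypothetical role matrix: each role maps to the command prefixes it may run.
ROLE_MATRIX = {
    "noc-readonly": ("show ",),
    "net-operator": ("show ", "ping ", "traceroute "),
    "net-admin": ("show ", "ping ", "traceroute ", "configure ", "copy "),
}

def is_authorized(role, command):
    """Deny by default; permit only commands matching a prefix granted to the role."""
    prefixes = ROLE_MATRIX.get(role, ())
    return command.strip().lower().startswith(prefixes)

def run_role_tests(cases):
    """Return the (role, command, expected) cases whose actual result differs."""
    return [
        (role, cmd, expected)
        for role, cmd, expected in cases
        if is_authorized(role, cmd) != expected
    ]
```

An empty list from `run_role_tests` means every positive case was permitted and every negative case was denied.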
Domain 3: Secure Protocol Baseline
Protocol hardening eliminates legacy and insecure management protocols, enforces strong cryptographic standards, and minimizes the set of services listening on the management plane. Every open port and enabled service is a potential attack vector; the principle of minimal exposure requires disabling everything that is not explicitly required.
| Service | Required Action | Replacement | Acceptance Test |
|---|---|---|---|
| Telnet | Disable completely | SSH v2 with strong ciphers | Port scan shows TCP/23 closed |
| HTTP | Disable or redirect to HTTPS | HTTPS with TLS 1.2+ | Port scan shows TCP/80 closed or 301 redirect |
| SNMPv1/v2c | Disable completely | SNMPv3 authPriv | SNMPv2c query fails; v3 with correct credentials succeeds |
| TFTP | Disable | SCP/SFTP for file transfers | UDP/69 closed; SCP transfer succeeds |
| CDP/LLDP | Disable on untrusted links | Keep on trusted uplinks with controls | No CDP/LLDP frames on untrusted ports |
| Finger/ident | Disable | N/A | Port scan shows closed |
| SSH | Enforce v2 with strong algorithms only | N/A (retain SSH; remove 3DES and RC4 ciphers and MD5-based MACs) | Crypto scanner shows only approved suites |
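The port-based acceptance tests in the table lend themselves to an automated comparison against scan results. The sketch below assumes the scan output has already been reduced to a set of `(protocol, port)` tuples; the mapping is illustrative, and TCP/80 is flagged for review only, because the table also accepts a 301 redirect.

```python
# Ports the table expects to be closed. TCP/79 (Finger) and TCP/113 (ident)
# are the standard assignments; the structure itself is an assumption.
FORBIDDEN = {
    ("tcp", 23): "Telnet",
    ("tcp", 79): "Finger",
    ("tcp", 80): "HTTP",
    ("tcp", 113): "ident",
    ("udp", 69): "TFTP",
}

def find_violations(open_ports):
    """Return 'Service (proto/port)' for every forbidden port found open."""
    hits = sorted(p for p in open_ports if p in FORBIDDEN)
    return [f"{FORBIDDEN[p]} ({p[0]}/{p[1]})" for p in hits]
```

SNMP is deliberately absent from the mapping: SNMPv3 uses the same UDP port as v1/v2c, so that row needs a protocol-level query rather than a port check.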
Domain 4: Config Lifecycle Controls
Configuration lifecycle management ensures that every change to the core switch is approved, versioned, backed up, and reversible. Without this domain, a single unauthorized or erroneous configuration change can cause an outage with no recovery path. Config lifecycle controls also provide the evidence required for compliance audits and post-incident analysis.
Acceptance Criteria: Restore test completes within defined RTO; unauthorized config changes generate SIEM alerts within defined detection time; config diff shows no unauthorized changes; backup job success rate >99% over 30-day period.
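Two of these criteria, the config diff and the backup success rate, are simple to compute. The following sketch assumes configurations are available as plain text and that backup job outcomes are recorded as one boolean per run; both are assumptions about how the data is collected.

```python
import difflib

def config_diff(approved, running):
    """Return the lines removed from ('-') or added to ('+') the approved config."""
    diff = difflib.unified_diff(
        approved.splitlines(), running.splitlines(),
        fromfile="approved", tofile="running", lineterm="", n=0,
    )
    body = list(diff)[2:]  # skip the two file-header lines
    return [line for line in body if line.startswith(("+", "-"))]

def backup_success_rate(results):
    """Fraction of successful backup jobs; results is a list of booleans."""
    if not results:
        raise ValueError("no backup results recorded")
    return sum(results) / len(results)

def meets_backup_criterion(results, threshold=0.99):
    """True when the success rate strictly exceeds the threshold (>99%)."""
    return backup_success_rate(results) > threshold
```

With one job per day, a single failure in a 30-day window already drops the rate below 99%, so the threshold in practice means either zero failures or more frequent backup runs.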
Domain 5: Control Plane Protection (CoPP/CPP)
Control plane policing (CoPP) or control plane protection (CPP) is the primary defense against CPU exhaustion attacks. Without CoPP, a flood of routing protocol packets, ICMP messages, or ARP requests can saturate the control plane CPU, causing routing adjacency flaps, management access failures, and ultimately network outages. CoPP must be carefully tuned to protect the CPU while preserving legitimate protocol traffic.
The CoPP design process begins with baselining actual protocol traffic rates in the production environment, then applying a safety headroom multiplier, and finally enforcing a total CPU protection cap through proportional scaling. See Chapter 9 for the interactive CoPP Rate Sizing Calculator that automates this process.
| Protocol Class | Typical Baseline (pps) | Recommended Policer (pps) | Action on Exceed |
|---|---|---|---|
| BGP keepalives, OSPF/IS-IS hellos | 100–500 | 1,500–3,000 | Drop + count |
| BFD | 200–2,000 | 5,000–10,000 | Drop + count (critical: tune carefully) |
| ICMP to CPU | 50–200 | 500–1,000 | Drop + count |
| ARP to CPU | 100–500 | 1,000–2,000 | Drop + count |
| SSH/HTTPS management | 10–50 | 200–500 | Drop + count |
| SNMPv3 | 10–100 | 300–600 | Drop + count |
| NTP | 1–10 | 100–200 | Drop + count |
| Default/unclassified | Variable | 500–1,000 | Drop + count |
Acceptance Criteria: Controlled traffic flood test (at 10x normal rate) does not spike CPU beyond 60%; routing adjacencies remain stable during flood test; CoPP drop counters increment correctly for each class; BFD sessions do not flap during test.
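The sizing process described for this domain (baseline, headroom multiplier, proportional scaling to a total cap) can be sketched in a few lines. This is an illustration of the arithmetic only, not the Chapter 9 calculator; the class names, headroom value, and cap used in any call are assumptions.

```python
def size_copp_policers(baseline_pps, headroom, total_cap_pps):
    """Multiply each baseline by headroom; if the sum exceeds the cap,
    scale every class down by the same factor so the total fits."""
    raw = {cls: pps * headroom for cls, pps in baseline_pps.items()}
    total = sum(raw.values())
    scale = min(1.0, total_cap_pps / total) if total > 0 else 1.0
    return {cls: int(rate * scale) for cls, rate in raw.items()}

def below_baseline(policers, baseline_pps):
    """Classes whose scaled policer ended up under the measured baseline."""
    return sorted(cls for cls, rate in policers.items() if rate < baseline_pps[cls])
```

For example, baselines of 500, 2,000, and 200 pps with a headroom of 3 give 1,500, 6,000, and 600 pps (8,100 total); a cap of 4,050 pps halves each value. Proportional scaling can push a class under its own baseline when the cap is tight, which is why the second helper is worth running before any policy is applied, especially for BFD.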
Domain 6: L2/L3 Protocol Hardening
Layer 2 and Layer 3 protocol hardening prevents a class of attacks that exploit trust relationships in network protocols. ARP spoofing, rogue DHCP servers, IPv6 neighbor discovery attacks, STP topology manipulation, and routing protocol neighbor spoofing can all cause significant outages or enable man-in-the-middle attacks. Most of these controls are enforced per port or per VLAN, while routing authentication and TTL security are configured per adjacency; all of them must be carefully scoped to avoid disrupting legitimate operations.
| Protection | Threat Mitigated | Implementation | Acceptance Test |
|---|---|---|---|
| Dynamic ARP Inspection (DAI) | ARP spoofing/poisoning | Enable on untrusted VLANs; trust uplinks | Spoofed ARP blocked; legitimate ARP passes |
| DHCP Snooping | Rogue DHCP server | Enable; trust only uplink ports | Rogue DHCP offer dropped; legitimate DHCP works |
| IPv6 ND Inspection | ND spoofing/RA attacks | Enable RA guard on access-facing ports | Rogue RA blocked; legitimate RA passes |
| STP BPDU Guard | Rogue switch topology attack | Enable on all edge/access ports | Rogue BPDU shuts port; legitimate STP unaffected |
| STP Root Guard | Root bridge takeover | Enable on designated ports facing switches that must never become root | Superior BPDU triggers root-inconsistent state |
| Routing Auth (MD5/SHA) | Routing neighbor spoofing | Configure on all routing adjacencies; prefer HMAC-SHA where the platform supports it | Neighbor with wrong key fails to form adjacency |
| BGP TTL Security (GTSM) | Remote BGP attacks | Enable on eBGP sessions | Packets with TTL < threshold dropped |
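The GTSM acceptance test in the last row reduces to a single comparison. The sketch below assumes the sender transmits with TTL 255 and that the receiver accepts anything at or above `255 - max_hops`; the exact off-by-one convention differs between platforms, so the threshold formula here is an assumption to be checked against the device documentation.

```python
MAX_TTL = 255  # GTSM peers send with the maximum TTL value

def gtsm_accept(received_ttl, max_hops=1):
    """Accept a packet only if it cannot have travelled more than max_hops.

    Assumed threshold: MAX_TTL - max_hops (platform conventions vary).
    """
    threshold = MAX_TTL - max_hops
    return received_ttl >= threshold
```

A spoofed packet from a remote attacker arrives with a TTL well under the threshold, because every intermediate router decrements it and the attacker cannot start above 255.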
Domain 7: Logging, Telemetry, and Time
A centralized audit trail with consistent timestamps is the foundation of both security operations and compliance. Without reliable logging, it is impossible to detect attacks, investigate incidents, or demonstrate compliance. The logging domain covers Syslog configuration, streaming telemetry or NetFlow/IPFIX, and NTP synchronization with appropriate restrictions.
Acceptance Criteria: Test events appear in SIEM within defined latency; NTP stratum and drift within policy; log timestamps correlate across devices within 1 second; CoPP drop events generate SIEM alerts; config change events generate SIEM alerts with user attribution.
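The timestamp correlation criterion can be verified by triggering one test event and comparing the time each device recorded for it. A minimal sketch, assuming the timestamps have already been parsed into `datetime` objects in a common time zone and that device names are hypothetical:

```python
from datetime import datetime, timedelta

def max_clock_skew(event_times):
    """event_times maps device name -> timestamp logged for the same test event."""
    stamps = list(event_times.values())
    return max(stamps) - min(stamps)

def within_policy(event_times, tolerance=timedelta(seconds=1)):
    """True when all devices logged the event within the tolerance of each other."""
    return max_clock_skew(event_times) <= tolerance
```

The measured spread includes any difference in when each device actually observed the event, so a failed check calls for a look at NTP status before concluding that the clocks have drifted.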
Domain 8: HA Security Consistency
High-availability designs (MLAG, stacking, chassis with dual supervisors) introduce a risk that is often overlooked: the security posture may change during failover if policies are not properly synchronized between peers. HA security consistency ensures that ACLs, CoPP policies, AAA configuration, and management access controls are identical on both peers and remain intact during and after a failover event.
Acceptance Criteria: During planned switchover, ACL/CoPP/AAA behavior remains consistent; management access is maintained throughout switchover; routing adjacencies recover within defined time; no security policy regression observed post-failover; config diff between peers shows zero differences for security-relevant sections.
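The peer-diff criterion can be approximated by comparing only the security-relevant lines of each peer's configuration. The sketch below is deliberately simplified: the keyword prefixes are hypothetical and loosely modelled on common CLI syntax, and only top-level lines are compared, so nested policy bodies would need a section-aware parser.

```python
# Hypothetical prefixes marking security-relevant top-level config lines.
SECURITY_PREFIXES = (
    "aaa ", "tacacs", "ip access-list ", "control-plane",
    "policy-map ", "snmp-server ", "ntp ", "logging ",
)

def security_lines(config):
    """Extract the set of security-relevant lines from a config text."""
    stripped = (line.strip() for line in config.splitlines())
    return {line for line in stripped if line.startswith(SECURITY_PREFIXES)}

def peer_drift(config_a, config_b):
    """Report security-relevant lines present on one peer but not the other."""
    a, b = security_lines(config_a), security_lines(config_b)
    return {"only_on_a": sorted(a - b), "only_on_b": sorted(b - a)}
```

Both lists being empty corresponds to the "zero differences" criterion; lines such as hostnames are ignored because they legitimately differ between peers.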
2.3 Design Decision Matrix
The following matrix helps practitioners select the appropriate hardening depth for each domain based on the deployment context. Three tiers are defined: Baseline (minimum for any production deployment), Enhanced (recommended for sensitive environments), and Advanced (for critical infrastructure or high-security environments).
| Domain | Baseline | Enhanced | Advanced |
|---|---|---|---|
| Mgmt Access | OOB or mgmt VRF + allowlist | + MFA on jump host | + PAM/session recording + zero-trust |
| AAA | TACACS+ authN + accounting | + command authZ + RBAC | + SOAR integration + anomaly detection |
| Secure Services | Disable Telnet/HTTP/SNMPv1-2c | + SSH key-only auth | + certificate-based auth + FIDO2 |
| Config Lifecycle | Daily backup + manual diff | + automated compliance scan | + GitOps + automated remediation |
| CoPP/CPP | Default platform template | + custom per-protocol tuning | + adaptive policing + telemetry-driven |
| L2/L3 Hardening | DAI + DHCP snooping + STP guards | + routing auth + ND inspection | + MACsec + BGP RPKI |
| Logging | Syslog to SIEM | + streaming telemetry | + behavioral analytics + UEBA |
| HA Consistency | Manual config sync verification | + automated diff + failover test | + continuous compliance monitoring |
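Because each tier in the matrix adds to the one before it (the "+" entries), the full control set for a domain is the union of its tier and all lower tiers. A short sketch of that cumulative lookup, using two rows from the matrix; the data structure and function name are illustrative assumptions.

```python
TIERS = ("baseline", "enhanced", "advanced")

# Two rows transcribed from the matrix above; other domains follow the same shape.
MATRIX = {
    "Mgmt Access": {
        "baseline": ["OOB or mgmt VRF", "allowlist"],
        "enhanced": ["MFA on jump host"],
        "advanced": ["PAM/session recording", "zero-trust"],
    },
    "Logging": {
        "baseline": ["Syslog to SIEM"],
        "enhanced": ["streaming telemetry"],
        "advanced": ["behavioral analytics", "UEBA"],
    },
}

def controls_for(domain, tier):
    """Return every control required at the given tier, including lower tiers."""
    required = []
    for level in TIERS[: TIERS.index(tier) + 1]:
        required.extend(MATRIX[domain][level])
    return required
```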
2.4 Implementation Sequence
The eight hardening domains should be implemented in a specific sequence to minimize operational risk. Implementing CoPP before establishing management access, for example, could cause a management lockout if the CoPP policy is too aggressive. The recommended sequence ensures that each domain builds on a stable foundation established by the previous domains.
1. Establish OOB/management VRF isolation: ensures management access is available throughout the hardening process.
2. Deploy AAA infrastructure: configure TACACS+/RADIUS, test authentication and accounting before enforcing.
3. Apply secure protocol baseline: disable legacy services after verifying SSH/HTTPS access works correctly.
4. Configure config lifecycle controls: establish backup and diff capability before making further changes.
5. Baseline CoPP/CPP traffic rates: measure actual protocol rates before applying policing policies.
6. Apply CoPP/CPP policies: start with permissive rates and tighten based on baseline measurements.
7. Apply L2/L3 protocol hardening: enable DAI, DHCP snooping, STP guards, and routing authentication.
8. Verify logging, telemetry, and NTP: confirm all events are reaching SIEM with correct timestamps.
9. Perform HA consistency validation: verify peer sync and conduct planned failover test.
10. Execute acceptance test plan: document results and store as compliance evidence.
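A change plan can be checked against the recommended sequence before it is scheduled. The sketch below encodes the ten steps above under hypothetical short names and reports any pair of steps that a plan executes in inverted order; it treats the sequence as a strict ordering, which is a simplifying assumption.

```python
# Hypothetical short names for the ten steps, in the recommended order.
SEQUENCE = [
    "mgmt_isolation", "aaa", "secure_protocols", "config_lifecycle",
    "copp_baseline", "copp_policies", "l2l3_hardening",
    "logging_verify", "ha_validation", "acceptance_tests",
]

def out_of_order(plan):
    """Return (should_come_first, was_run_first) pairs that invert the sequence."""
    rank = {step: i for i, step in enumerate(SEQUENCE)}
    violations = []
    for i, earlier in enumerate(plan):
        for later in plan[i + 1:]:
            if rank[earlier] > rank[later]:
                violations.append((later, earlier))
    return violations
```

A plan that applies CoPP policies before management isolation, the lockout scenario mentioned above, yields `("mgmt_isolation", "copp_policies")` as a violation.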