Design Methods

Chapter 2 — Core Switch Security Hardening Design Guide

2.1 Design Principles & Methodology

Effective core switch security hardening requires a structured design methodology that balances security controls with operational requirements. The fundamental design principles—Isolate, Minimize, Authenticate, Rate-Limit, Observe, and Rehearse—form the foundation of every hardening decision. These principles are not independent; they reinforce each other to create a defense-in-depth posture that is resilient against both external attacks and internal misconfigurations.

The design methodology follows a four-phase lifecycle: Design (define policies, architecture, and risk assessment), Implement (deploy configurations, apply controls, automate provisioning), Verify (audit configurations, vulnerability scanning, penetration testing), and Monitor (continuous monitoring, threat detection, incident response, performance tracking). This lifecycle is applied to each of the eight hardening domains described in this chapter.


Figure 2.1: Security Design Principles and Hardening Methodology — Eight Domains with Design-Implement-Verify-Monitor Lifecycle

2.2 Eight Hardening Domains

The hardening baseline is organized into eight implementable domains. Each domain addresses a specific attack surface, has defined implementation points, and includes measurable acceptance criteria. The domains are designed to be implemented in sequence, as later domains depend on the foundation established by earlier ones.

Domain 1: Management Access Consolidation

Management access consolidation reduces the attack surface by forcing all administrative access through an out-of-band (OOB) network or a dedicated management VRF, and only from controlled source IPs. This is the most fundamental hardening control because it prevents attackers who have compromised a production host from directly reaching the management plane of core switches.

Implementation Points: Configure dedicated management VRF or OOB interface; apply source-IP allowlist (deny-all + permit jump-host IPs only); disable management access from production VLANs; configure management firewall rules; verify no route leak between production and management VRFs.

Acceptance Criteria: Only jump-host IPs can reach the management IP; no in-band access from production VLANs; port scan from unauthorized source shows no open management ports; all management access attempts are logged.
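The deny-all plus permit-jump-hosts rule can be illustrated with a minimal Python sketch. The addresses in JUMP_HOSTS and the function name are hypothetical placeholders, not values from this guide; the real allowlist would be enforced by the device's management ACL, and a check like this would only be useful for validating the intended policy offline.

```python
import ipaddress

# Hypothetical jump-host addresses; substitute the real allowlist.
JUMP_HOSTS = [
    ipaddress.ip_network("10.255.0.10/32"),
    ipaddress.ip_network("10.255.0.11/32"),
]

def is_management_source_allowed(source_ip, allowlist=JUMP_HOSTS):
    """Default deny: permit only sources inside an allowlisted network."""
    try:
        addr = ipaddress.ip_address(source_ip)
    except ValueError:
        return False  # malformed addresses are denied
    return any(addr in net for net in allowlist)

if __name__ == "__main__":
    for candidate in ("10.255.0.10", "192.168.1.5"):
        print(candidate, is_management_source_allowed(candidate))
```

Anything that does not match an entry is rejected, which mirrors the expectation that a port scan from an unauthorized source finds no reachable management ports.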

Domain 2: AAA + Least Privilege

Central authentication and authorization with role separation and command accounting ensures that every administrative action is attributed to a specific individual, authorized by policy, and recorded for audit. This domain eliminates shared accounts, reduces privilege creep, and provides the evidence chain required for compliance and incident investigation.

Implementation Points: Configure TACACS+ or RADIUS with command authorization; define RBAC roles (read-only, operator, admin, emergency); implement command accounting to SIEM; configure break-glass local account with monitoring; enforce MFA on jump host; rotate shared secrets on schedule.

Acceptance Criteria: Break-glass local account exists but generates an alert on use; all privileged commands are logged with user attribution; role matrix tested with positive and negative test cases; AAA server failure triggers defined fallback behavior (fail-closed preferred for new sessions, fail-open only for active sessions).
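The role matrix test with positive and negative cases can be sketched as follows. The command prefixes assigned to each role are illustrative assumptions (the guide names the roles but not their permitted commands), and a real test would exercise the TACACS+ or RADIUS server rather than a local dictionary.

```python
# Hypothetical role-to-command-prefix matrix; the permitted prefixes are
# assumptions for illustration, not taken from the guide.
ROLE_MATRIX = {
    "read-only": ("show ",),
    "operator": ("show ", "ping ", "traceroute "),
    "admin": ("",),      # empty prefix matches every command
    "emergency": ("",),
}

def is_authorized(role, command):
    """Unknown roles get no prefixes, so they are denied by default."""
    return any(command.startswith(p) for p in ROLE_MATRIX.get(role, ()))

# (role, command, expected result): positive and negative cases per role.
TEST_CASES = [
    ("read-only", "show running-config", True),
    ("read-only", "configure terminal", False),
    ("operator", "ping 10.0.0.1", True),
    ("operator", "reload", False),
    ("admin", "configure terminal", True),
    ("unknown-role", "show version", False),
]

def run_role_matrix_tests():
    """Return the cases whose actual result differs from the expectation."""
    return [(r, c, e) for r, c, e in TEST_CASES if is_authorized(r, c) != e]

if __name__ == "__main__":
    print("failures:", run_role_matrix_tests())
```

Keeping negative cases alongside positive ones matters because privilege creep usually shows up as a command that should be denied but is not.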

Domain 3: Secure Protocol Baseline

Protocol hardening eliminates legacy and insecure management protocols, enforces strong cryptographic standards, and minimizes the set of services listening on the management plane. Every open port and enabled service is a potential attack vector; the principle of minimal exposure requires disabling everything that is not explicitly required.

Service	Required Action	Replacement	Acceptance Test
Telnet	Disable completely	SSH v2 with strong ciphers	Port scan shows TCP/23 closed
HTTP	Disable or redirect to HTTPS	HTTPS with TLS 1.2+	Port scan shows TCP/80 closed or 301 redirect
SNMPv1/v2c	Disable completely	SNMPv3 authPriv	SNMPv2c query fails; v3 with correct credentials succeeds
TFTP	Disable	SCP/SFTP for file transfers	UDP/69 closed; SCP transfer succeeds
CDP/LLDP	Disable on untrusted links	Keep on trusted uplinks with controls	No CDP/LLDP frames on untrusted ports
Finger/ident	Disable	N/A	Port scan shows closed
SSH	Enforce v2, strong ciphers and MACs only	Remove weak algorithms (3DES and RC4 ciphers, MD5-based MACs)	Crypto scanner shows only approved suites
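The "crypto scanner shows only approved suites" test reduces to comparing what the switch offers against an approved set. The sketch below assumes the offered algorithms have already been collected by a scanner and passed in as a dictionary; the APPROVED set is an example policy, not a recommendation from this guide, though the algorithm names themselves are standard SSH identifiers.

```python
# Example approved policy; adjust to the organization's crypto standard.
APPROVED = {
    "ciphers": {
        "aes256-gcm@openssh.com", "aes128-gcm@openssh.com",
        "aes256-ctr", "aes128-ctr",
    },
    "macs": {"hmac-sha2-256", "hmac-sha2-512"},
}

def find_unapproved(offered):
    """Return offered algorithms that are not in the approved set, per kind."""
    violations = {}
    for kind, names in offered.items():
        extra = sorted(set(names) - APPROVED.get(kind, set()))
        if extra:
            violations[kind] = extra
    return violations

if __name__ == "__main__":
    # Hypothetical scanner output for one switch.
    offered = {
        "ciphers": ["aes256-ctr", "3des-cbc"],
        "macs": ["hmac-sha2-256", "hmac-md5"],
    }
    print(find_unapproved(offered))
```

An empty result means the acceptance test passes; any entry names the exact algorithm still to be removed.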

Domain 4: Config Lifecycle Controls

Configuration lifecycle management ensures that every change to the core switch is approved, versioned, backed up, and reversible. Without this domain, a single unauthorized or erroneous configuration change can cause an outage with no recovery path. Config lifecycle controls also provide the evidence required for compliance audits and post-incident analysis.

Implementation Points: Automated backup triggered by config change events; golden config repository with version control; pre-change and post-change diff generation; rollback procedure tested and documented; change approval workflow integrated with ITSM; automated compliance scan comparing running config to baseline template.

Acceptance Criteria: Restore test completes within defined RTO; unauthorized config changes generate SIEM alerts within defined detection time; config diff shows no unauthorized changes; backup job success rate >99% over 30-day period.
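Pre-change and post-change diff generation, and the backup success-rate criterion, can both be sketched with the Python standard library. The configuration lines shown are illustrative, and the function names are hypothetical; a production workflow would pull the baseline from the golden config repository.

```python
import difflib

def config_diff(baseline, running):
    """Unified diff between baseline and running config; empty list if identical."""
    return list(difflib.unified_diff(
        baseline.splitlines(), running.splitlines(),
        fromfile="baseline", tofile="running", lineterm=""))

def backup_success_rate(results):
    """Percentage of successful backup jobs; results is a list of booleans."""
    return 100.0 * sum(results) / len(results) if results else 0.0

if __name__ == "__main__":
    baseline = "hostname core1\nno ip http server"
    running = "hostname core1\nip http server"
    for line in config_diff(baseline, running):
        print(line)
    # 299 successes out of 300 jobs over the 30-day window.
    print(backup_success_rate([True] * 299 + [False]) > 99.0)
```

Any non-empty diff outside an approved change window is a candidate for the SIEM alert described in the acceptance criteria.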

Domain 5: Control Plane Protection (CoPP/CPP)

Control plane policing (CoPP) or control plane protection (CPP) is the primary defense against CPU exhaustion attacks. Without CoPP, a flood of routing protocol packets, ICMP messages, or ARP requests can saturate the control plane CPU, causing routing adjacency flaps, management access failures, and ultimately network outages. CoPP must be carefully tuned to protect the CPU while preserving legitimate protocol traffic.

The CoPP design process begins with baselining actual protocol traffic rates in the production environment, then applying a safety headroom multiplier, and finally enforcing a total CPU protection cap through proportional scaling. See Chapter 9 for the interactive CoPP Rate Sizing Calculator that automates this process.

Protocol Class	Typical Baseline (pps)	Recommended Policer (pps)	Action on Exceed
BGP keepalives, OSPF/IS-IS hellos	100–500	1,500–3,000	Drop + count
BFD	200–2,000	5,000–10,000	Drop + count (critical: tune carefully)
ICMP to CPU	50–200	500–1,000	Drop + count
ARP to CPU	100–500	1,000–2,000	Drop + count
SSH/HTTPS management	10–50	200–500	Drop + count
SNMPv3	10–100	300–600	Drop + count
NTP	1–10	100–200	Drop + count
Default/unclassified	Variable	500–1,000	Drop + count

Acceptance Criteria: Controlled traffic flood test (at 10x normal rate) does not spike CPU beyond 60%; routing adjacencies remain stable during flood test; CoPP drop counters increment correctly for each class; BFD sessions do not flap during test.
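The sizing process described above (baseline, headroom multiplier, then proportional scaling to a total cap) can be sketched as follows. The headroom factor of 3.0, the cap value, and the sample baselines are assumptions chosen for illustration; this is not the Chapter 9 calculator, only a minimal model of the same three steps.

```python
def size_copp_policers(baseline_pps, headroom=3.0, total_cap_pps=None):
    """Return per-class policer rates in packets per second.

    1. Multiply each measured baseline by the headroom factor.
    2. If the sum exceeds the total cap, scale every class down by the
       same ratio so the relative shares are preserved.
    """
    raw = {cls: rate * headroom for cls, rate in baseline_pps.items()}
    total = sum(raw.values())
    scale = 1.0
    if total_cap_pps is not None and total > total_cap_pps:
        scale = total_cap_pps / total
    # Truncating to int keeps the enforced sum at or below the cap.
    return {cls: int(rate * scale) for cls, rate in raw.items()}

if __name__ == "__main__":
    # Hypothetical measured baselines (pps).
    baseline = {"routing": 500, "bfd": 2000, "icmp": 200, "arp": 500}
    print(size_copp_policers(baseline))
    print(size_copp_policers(baseline, total_cap_pps=4800))
```

Proportional scaling treats all classes alike; in practice it may be preferable to exempt critical classes such as BFD from scaling, which this sketch does not attempt.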

Domain 6: L2/L3 Protocol Hardening

Layer 2 and Layer 3 protocol hardening prevents a class of attacks that exploit trust relationships in network protocols. ARP spoofing, DHCP rogue servers, IPv6 neighbor discovery attacks, STP topology manipulation, and routing protocol neighbor spoofing can all cause significant outages or enable man-in-the-middle attacks. These controls are applied at the data plane level and must be carefully scoped to avoid disrupting legitimate operations.

Protection	Threat Mitigated	Implementation	Acceptance Test
Dynamic ARP Inspection (DAI)	ARP spoofing/poisoning	Enable on untrusted VLANs; trust uplinks	Spoofed ARP blocked; legitimate ARP passes
DHCP Snooping	Rogue DHCP server	Enable; trust only uplink ports	Rogue DHCP offer dropped; legitimate DHCP works
IPv6 ND Inspection	ND spoofing/RA attacks	Enable RA guard on access-facing ports	Rogue RA blocked; legitimate RA passes
STP BPDU Guard	Rogue switch topology attack	Enable on all edge/access ports	Rogue BPDU shuts port; legitimate STP unaffected
STP Root Guard	Root bridge takeover	Enable on designated ports facing switches that must never become root	Superior BPDU triggers root-inconsistent state
Routing Auth (MD5/SHA)	Routing neighbor spoofing	Configure on all routing adjacencies	Neighbor with wrong key fails to form adjacency
BGP TTL Security (GTSM)	Remote BGP attacks	Enable on eBGP sessions	Packets with TTL < threshold dropped
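The DAI acceptance test (spoofed ARP blocked, legitimate ARP passes) follows from how DAI validates ARP packets against the DHCP snooping binding table. The sketch below is a simplified model under stated assumptions: bindings are keyed by VLAN and IP only, and real implementations also consider the ingress port, static ARP ACLs, and rate limits. All names and addresses are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ArpPacket:
    port: str
    vlan: int
    sender_ip: str
    sender_mac: str

def dai_permits(packet, bindings, trusted_ports):
    """Simplified DAI decision.

    Trusted ports bypass inspection. On untrusted ports the sender
    IP/MAC pair must match a DHCP snooping binding for that VLAN.
    """
    if packet.port in trusted_ports:
        return True
    expected_mac = bindings.get((packet.vlan, packet.sender_ip))
    return expected_mac is not None and expected_mac.lower() == packet.sender_mac.lower()

if __name__ == "__main__":
    bindings = {(10, "10.0.10.5"): "aa:bb:cc:00:00:01"}
    trusted = {"uplink1"}
    legit = ArpPacket("access7", 10, "10.0.10.5", "AA:BB:CC:00:00:01")
    spoof = ArpPacket("access9", 10, "10.0.10.5", "de:ad:be:ef:00:01")
    print(dai_permits(legit, bindings, trusted), dai_permits(spoof, bindings, trusted))
```

The model also shows why DHCP snooping must be working before DAI is enabled: with an empty binding table, every ARP packet on an untrusted port is dropped.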

Domain 7: Logging, Telemetry, and Time

A centralized audit trail with consistent timestamps is the foundation of both security operations and compliance. Without reliable logging, it is impossible to detect attacks, investigate incidents, or demonstrate compliance. The logging domain covers Syslog configuration, streaming telemetry or NetFlow/IPFIX, and NTP synchronization with appropriate restrictions.

Implementation Points: Configure Syslog to send severity levels 0–6 to centralized SIEM; use TCP transport with TLS where supported; configure local buffer as fallback; restrict NTP to internal stratum-2 servers with ACL; enable streaming telemetry for interface counters, CPU/memory, and CoPP drops; configure log timestamps with millisecond precision and timezone.

Acceptance Criteria: Test events appear in SIEM within defined latency; NTP stratum and drift within policy; log timestamps correlate across devices within 1 second; CoPP drop events generate SIEM alerts; config change events generate SIEM alerts with user attribution.
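The cross-device timestamp criterion can be checked by logging one test event on every device and comparing the recorded times. This sketch assumes the timestamps have already been parsed into timezone-aware datetime objects; the device names and values are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def max_skew(timestamps):
    """Largest gap between any two devices' timestamps for the same event."""
    values = list(timestamps.values())
    return max(values) - min(values)

def within_policy(timestamps, limit=timedelta(seconds=1)):
    """True if all devices recorded the event within the allowed skew."""
    return max_skew(timestamps) <= limit

if __name__ == "__main__":
    base = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    observed = {
        "core1": base,
        "core2": base + timedelta(milliseconds=400),
    }
    print(max_skew(observed), within_policy(observed))
```

Using timezone-aware values avoids false skew between devices that log in different local time zones, which is one reason the implementation points call for an explicit timezone in log timestamps.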

Domain 8: HA Security Consistency

High-availability designs (MLAG, stacking, chassis with dual supervisors) introduce a risk that is often overlooked: the security posture may change during failover if policies are not properly synchronized between peers. HA security consistency ensures that ACLs, CoPP policies, AAA configuration, and management access controls are identical on both peers and remain intact during and after a failover event.

Implementation Points: Verify that all security-relevant configuration (ACL, CoPP, AAA, NTP, Syslog) is synchronized across MLAG/stack peers; perform planned failover test and verify management access, routing adjacencies, and CoPP behavior are unchanged; document any configuration that is not automatically synchronized and establish manual sync procedure.

Acceptance Criteria: During planned switchover, ACL/CoPP/AAA behavior remains consistent; management access is maintained throughout switchover; routing adjacencies recover within defined time; no security policy regression observed post-failover; config diff between peers shows zero differences for security-relevant sections.
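The zero-differences criterion for security-relevant sections can be approximated by filtering each peer's configuration and comparing the results as sets. The prefix list below is an assumption and must be adapted to the platform's syntax; the sketch only inspects the first line of each stanza, so nested lines inside an ACL or policy map would need a stanza-aware parser.

```python
# Hypothetical prefixes that mark security-relevant lines; platform-specific.
SECURITY_PREFIXES = (
    "aaa ", "tacacs", "ntp ", "logging ",
    "ip access-list", "access-list ", "control-plane", "policy-map ",
)

def security_lines(config):
    """Set of stripped config lines that start with a security-relevant prefix."""
    return {
        line.strip() for line in config.splitlines()
        if line.strip().startswith(SECURITY_PREFIXES)
    }

def peer_differences(peer_a, peer_b):
    """Security-relevant lines present on one peer but not the other."""
    a, b = security_lines(peer_a), security_lines(peer_b)
    return {"only_on_a": sorted(a - b), "only_on_b": sorted(b - a)}

if __name__ == "__main__":
    peer_a = "hostname core1a\nntp server 10.255.0.1\nlogging host 10.255.0.20"
    peer_b = "hostname core1b\nntp server 10.255.0.2\nlogging host 10.255.0.20"
    print(peer_differences(peer_a, peer_b))
```

Lines that legitimately differ between peers, such as hostnames, are ignored because they do not match a security prefix.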

2.3 Design Decision Matrix

The following matrix helps practitioners select the appropriate hardening depth for each domain based on the deployment context. Three tiers are defined: Baseline (minimum for any production deployment), Enhanced (recommended for sensitive environments), and Advanced (for critical infrastructure or high-security environments).

Domain	Baseline	Enhanced	Advanced
Mgmt Access	OOB or mgmt VRF + allowlist	+ MFA on jump host	+ PAM/session recording + zero-trust
AAA	TACACS+ authN + accounting	+ command authZ + RBAC	+ SOAR integration + anomaly detection
Secure Services	Disable Telnet/HTTP/SNMPv1-2c	+ SSH key-only auth	+ certificate-based auth + FIDO2
Config Lifecycle	Daily backup + manual diff	+ automated compliance scan	+ GitOps + automated remediation
CoPP/CPP	Default platform template	+ custom per-protocol tuning	+ adaptive policing + telemetry-driven
L2/L3 Hardening	DAI + DHCP snooping + STP guards	+ routing auth + ND inspection	+ MACsec + BGP RPKI
Logging	Syslog to SIEM	+ streaming telemetry	+ behavioral analytics + UEBA
HA Consistency	Manual config sync verification	+ automated diff + failover test	+ continuous compliance monitoring
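Because each "+" entry in the matrix adds to the tier before it, the full control set for a tier is cumulative. The sketch below encodes three rows of the matrix as an excerpt and resolves the controls required at a chosen tier; the cumulative reading of "+" is an interpretation of the table, and the function name is hypothetical.

```python
TIERS = ("Baseline", "Enhanced", "Advanced")

# Excerpt of the matrix: one list per domain, ordered by tier.
MATRIX = {
    "Mgmt Access": [
        "OOB or mgmt VRF + allowlist",
        "MFA on jump host",
        "PAM/session recording + zero-trust",
    ],
    "CoPP/CPP": [
        "Default platform template",
        "custom per-protocol tuning",
        "adaptive policing + telemetry-driven",
    ],
    "Logging": [
        "Syslog to SIEM",
        "streaming telemetry",
        "behavioral analytics + UEBA",
    ],
}

def controls_for(domain, tier):
    """All controls required at the given tier, including lower tiers."""
    depth = TIERS.index(tier) + 1
    return MATRIX[domain][:depth]

if __name__ == "__main__":
    print(controls_for("Mgmt Access", "Enhanced"))
```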

2.4 Implementation Sequence

The eight hardening domains should be implemented in a specific sequence to minimize operational risk. Applying CoPP before management access is isolated and verified, for example, could cause a management lockout if the CoPP policy is too aggressive. The recommended sequence has ten steps because CoPP is split into a baselining step and an enforcement step, and a final acceptance-test step closes the process; each step builds on the stable foundation established by the previous ones.

1. Establish OOB/management VRF isolation: ensures management access is available throughout the hardening process.
2. Deploy AAA infrastructure: configure TACACS+/RADIUS, test authentication and accounting before enforcing.
3. Apply secure protocol baseline: disable legacy services after verifying SSH/HTTPS access works correctly.
4. Configure config lifecycle controls: establish backup and diff capability before making further changes.
5. Baseline CoPP/CPP traffic rates: measure actual protocol rates before applying policing policies.
6. Apply CoPP/CPP policies: start with permissive rates and tighten based on baseline measurements.
7. Apply L2/L3 protocol hardening: enable DAI, DHCP snooping, STP guards, and routing authentication.
8. Verify logging, telemetry, and NTP: confirm all events are reaching SIEM with correct timestamps.
9. Perform HA consistency validation: verify peer sync and conduct planned failover test.
10. Execute acceptance test plan: document results and store as compliance evidence.
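The ordering constraints in the sequence above can be checked mechanically before a rollout plan is approved. The sketch below encodes only a subset of the dependencies stated in this section (management isolation and CoPP baselining before CoPP enforcement, backup capability before further changes); the step identifiers are hypothetical labels.

```python
# Subset of prerequisites drawn from the sequence; step names are hypothetical.
PREREQUISITES = {
    "aaa": {"mgmt_isolation"},
    "copp_apply": {"mgmt_isolation", "config_lifecycle", "copp_baseline"},
    "l2l3_hardening": {"config_lifecycle"},
}

def ordering_violations(plan):
    """Return (step, missing_or_late_prerequisite) pairs for a planned order."""
    position = {step: index for index, step in enumerate(plan)}
    violations = []
    for step, prereqs in PREREQUISITES.items():
        if step not in position:
            continue
        for pre in prereqs:
            if pre not in position or position[pre] > position[step]:
                violations.append((step, pre))
    return sorted(violations)

if __name__ == "__main__":
    plan = ["mgmt_isolation", "aaa", "secure_protocols", "config_lifecycle",
            "copp_apply", "copp_baseline", "l2l3_hardening"]
    print(ordering_violations(plan))
```

A plan that enforces CoPP before baselining is flagged, which is the failure mode this section warns about most strongly.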

Critical Warning: Never apply CoPP policies in a production environment without first baselining that environment's actual protocol traffic rates and validating the policy in a lab or during a maintenance window. Overly aggressive CoPP can cause routing adjacency flaps, which may trigger cascading failures across the network.