Design Methods

Chapter 2 — Core Switch Security Hardening Design Guide


2.1 Design Principles & Methodology

Effective core switch security hardening requires a structured design methodology that balances security controls with operational requirements. The fundamental design principles—Isolate, Minimize, Authenticate, Rate-Limit, Observe, and Rehearse—form the foundation of every hardening decision. These principles are not independent; they reinforce each other to create a defense-in-depth posture that is resilient against both external attacks and internal misconfigurations.

The design methodology follows a four-phase lifecycle: Design (define policies, architecture, and risk assessment), Implement (deploy configurations, apply controls, automate provisioning), Verify (audit configurations, run vulnerability scans, conduct penetration tests), and Monitor (maintain continuous monitoring, threat detection, incident response, and performance tracking). This lifecycle is applied to each of the eight hardening domains described in this chapter.

Figure 2.1: Security Design Principles and Hardening Methodology — Eight Domains with Design-Implement-Verify-Monitor Lifecycle

2.2 Eight Hardening Domains

The hardening baseline is organized into eight implementable domains. Each domain addresses a specific attack surface, has defined implementation points, and includes measurable acceptance criteria. The domains are designed to be implemented in sequence, as later domains depend on the foundation established by earlier ones.

Domain 1: Management Access Consolidation

Management access consolidation reduces the attack surface by forcing all administrative access through an out-of-band (OOB) network or a dedicated management VRF, and only from controlled source IPs. This is the most fundamental hardening control because it prevents attackers who have compromised a production host from directly reaching the management plane of core switches.

Implementation Points: Configure dedicated management VRF or OOB interface; apply source-IP allowlist (deny-all + permit jump-host IPs only); disable management access from production VLANs; configure management firewall rules; verify no route leak between production and management VRFs.

Acceptance Criteria: Only jump-host IPs can reach the management IP; no in-band access from production VLANs; port scan from unauthorized source shows no open management ports; all management access attempts are logged.
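
The allowlist logic (deny all, permit jump-host IPs only) can be checked offline against observed source addresses, for example those extracted from management access logs. The following is a minimal sketch, not a vendor implementation; the jump-host addresses and the function names are hypothetical placeholders.

```python
from ipaddress import ip_address, ip_network

# Hypothetical jump-host allowlist; substitute values from your own addressing plan.
JUMP_HOST_ALLOWLIST = [ip_network("10.255.0.10/32"), ip_network("10.255.0.11/32")]


def is_mgmt_source_permitted(source_ip, allowlist=JUMP_HOST_ALLOWLIST):
    """Deny by default; permit only sources that fall inside an allowlisted prefix."""
    addr = ip_address(source_ip)
    return any(addr in net for net in allowlist)


def unexpected_sources(observed_ips, allowlist=JUMP_HOST_ALLOWLIST):
    """Return observed source IPs that the allowlist would not permit."""
    return [ip for ip in observed_ips if not is_mgmt_source_permitted(ip, allowlist)]
```

Any address returned by unexpected_sources for a successful management session would indicate a gap between the intended allowlist and the deployed filter.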

Domain 2: AAA + Least Privilege

Central authentication and authorization with role separation and command accounting ensures that every administrative action is attributed to a specific individual, authorized by policy, and recorded for audit. This domain eliminates shared accounts, reduces privilege creep, and provides the evidence chain required for compliance and incident investigation.

Implementation Points: Configure TACACS+ or RADIUS with command authorization; define RBAC roles (read-only, operator, admin, emergency); implement command accounting to SIEM; configure break-glass local account with monitoring; enforce MFA on jump host; rotate shared secrets on schedule.

Acceptance Criteria: Break-glass local account exists but generates an alert on use; all privileged commands are logged with user attribution; role matrix is tested with positive and negative test cases; AAA server failure triggers defined fallback behavior (fail-closed preferred for new sessions; fail-open permitted only for already-active sessions).
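
The role-matrix acceptance test (positive and negative cases per role) can be modeled before it is run against the real AAA server. The sketch below is illustrative only: the command prefixes assigned to each role are assumptions, and real TACACS+ command authorization rules are defined on the AAA server rather than in code like this.

```python
# Hypothetical role matrix: each role maps to the command prefixes it may run.
ROLE_MATRIX = {
    "read-only": ("show ",),
    "operator": ("show ", "ping ", "clear counters"),
    "admin": ("",),      # empty prefix matches every command
    "emergency": ("",),  # break-glass role; use should raise an alert elsewhere
}

# (role, command, expected_result): positive and negative cases for each role.
TEST_CASES = [
    ("read-only", "show running-config", True),
    ("read-only", "configure terminal", False),
    ("operator", "clear counters", True),
    ("operator", "reload", False),
    ("admin", "configure terminal", True),
]


def is_authorized(role, command):
    """Unknown roles get no prefixes, so they are denied by default."""
    prefixes = ROLE_MATRIX.get(role, ())
    return any(command.startswith(prefix) for prefix in prefixes)


def failed_role_tests(cases=TEST_CASES):
    """Return the (role, command) pairs whose result differs from the expectation."""
    return [(r, c) for r, c, expected in cases if is_authorized(r, c) != expected]
```

An empty list from failed_role_tests means the modeled matrix matches the intended policy; the same case list can then drive tests against the live devices.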

Domain 3: Secure Protocol Baseline

Protocol hardening eliminates legacy and insecure management protocols, enforces strong cryptographic standards, and minimizes the set of services listening on the management plane. Every open port and enabled service is a potential attack vector; the principle of minimal exposure requires disabling everything that is not explicitly required.

Service | Required Action | Replacement | Acceptance Test
Telnet | Disable completely | SSH v2 with strong ciphers | Port scan shows TCP/23 closed
HTTP | Disable or redirect to HTTPS | HTTPS with TLS 1.2+ | Port scan shows TCP/80 closed or 301 redirect
SNMPv1/v2c | Disable completely | SNMPv3 authPriv | SNMPv2c query fails; v3 with correct credentials succeeds
TFTP | Disable | SCP/SFTP for file transfers | UDP/69 closed; SCP transfer succeeds
CDP/LLDP | Disable on untrusted links | Keep on trusted uplinks with controls | No CDP/LLDP frames on untrusted ports
Finger/ident | Disable | N/A | Port scan shows closed
SSH | Enforce v2, strong ciphers only | Remove weak algorithms (3DES and RC4 ciphers, MD5-based MACs) | Crypto scanner shows only approved suites
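
The SSH acceptance test (only approved suites offered) can be partly automated once a scanner has produced the list of algorithm names the switch offers. This is a minimal sketch under that assumption; the marker list and function name are illustrative, and the approved set should come from your own cryptographic policy.

```python
# Substrings that mark algorithms treated as weak in this sketch.
# "arcfour" is the name OpenSSH-style implementations use for RC4.
WEAK_MARKERS = ("3des", "rc4", "arcfour", "md5")


def find_weak_algorithms(offered):
    """Return offered algorithm names that contain a weak marker, in input order."""
    return [
        name for name in offered
        if any(marker in name.lower() for marker in WEAK_MARKERS)
    ]
```

For example, passing the names "aes256-ctr", "3des-cbc", "hmac-md5", and "hmac-sha2-256" would flag only "3des-cbc" and "hmac-md5".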

Domain 4: Config Lifecycle Controls

Configuration lifecycle management ensures that every change to the core switch is approved, versioned, backed up, and reversible. Without this domain, a single unauthorized or erroneous configuration change can cause an outage with no recovery path. Config lifecycle controls also provide the evidence required for compliance audits and post-incident analysis.

Implementation Points: Automated backup triggered by config change events; golden config repository with version control; pre-change and post-change diff generation; rollback procedure tested and documented; change approval workflow integrated with ITSM; automated compliance scan comparing running config to baseline template.

Acceptance Criteria: Restore test completes within defined RTO; unauthorized config changes generate SIEM alerts within defined detection time; config diff shows no unauthorized changes; backup job success rate >99% over 30-day period.
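
Pre-change and post-change diff generation, and the backup success-rate criterion, can both be sketched with the Python standard library. The example below assumes configurations are available as plain text and that backup job outcomes are recorded as booleans; both function names are hypothetical.

```python
import difflib


def config_diff(pre_change, post_change):
    """Return unified-diff lines between two configuration snapshots."""
    return list(
        difflib.unified_diff(
            pre_change.splitlines(),
            post_change.splitlines(),
            fromfile="pre-change",
            tofile="post-change",
            lineterm="",
        )
    )


def backup_success_rate(results):
    """Percentage of successful backup jobs; 0.0 when no jobs were recorded."""
    if not results:
        return 0.0
    return 100.0 * sum(1 for ok in results if ok) / len(results)
```

An empty diff indicates that the two snapshots are identical; a success rate computed over the 30-day job history can be compared directly against the 99% threshold.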

Domain 5: Control Plane Protection (CoPP/CPP)

Control plane policing (CoPP) or control plane protection (CPP) is the primary defense against CPU exhaustion attacks. Without CoPP, a flood of routing protocol packets, ICMP messages, or ARP requests can saturate the control plane CPU, causing routing adjacency flaps, management access failures, and ultimately network outages. CoPP must be carefully tuned to protect the CPU while preserving legitimate protocol traffic.

The CoPP design process begins with baselining actual protocol traffic rates in the production environment, then applying a safety headroom multiplier, and finally enforcing a total CPU protection cap through proportional scaling. See Chapter 9 for the interactive CoPP Rate Sizing Calculator that automates this process.

Protocol Class | Typical Baseline (pps) | Recommended Policer (pps) | Action on Exceed
BGP/OSPF/IS-IS hellos | 100–500 | 1,500–3,000 | Drop + count
BFD | 200–2,000 | 5,000–10,000 | Drop + count (critical: tune carefully)
ICMP to CPU | 50–200 | 500–1,000 | Drop + count
ARP to CPU | 100–500 | 1,000–2,000 | Drop + count
SSH/HTTPS management | 10–50 | 200–500 | Drop + count
SNMPv3 | 10–100 | 300–600 | Drop + count
NTP | 1–10 | 100–200 | Drop + count
Default/unclassified | Variable | 500–1,000 | Drop + count

Acceptance Criteria: Controlled traffic flood test (at 10x normal rate) does not spike CPU beyond 60%; routing adjacencies remain stable during flood test; CoPP drop counters increment correctly for each class; BFD sessions do not flap during test.
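
The sizing process described for this domain (measure a baseline, multiply by a safety headroom, then scale all classes proportionally if the total exceeds a CPU protection cap) can be sketched as follows. This is not the Chapter 9 calculator; the headroom value, the cap, and the function name are assumptions chosen for illustration.

```python
def size_copp_policers(baseline_pps, headroom=3.0, total_cap_pps=None):
    """Derive per-class policer rates (pps) from measured baseline rates.

    baseline_pps  : dict of protocol class -> measured peak rate in pps
    headroom      : safety multiplier applied to every class (assumed value)
    total_cap_pps : optional ceiling on the sum of all policers
    """
    rates = {cls: pps * headroom for cls, pps in baseline_pps.items()}
    total = sum(rates.values())
    if total_cap_pps is not None and total > total_cap_pps:
        # Proportional scaling keeps the relative share of each class.
        scale = total_cap_pps / total
        rates = {cls: pps * scale for cls, pps in rates.items()}
    # Truncate to whole packets so the sum never exceeds the cap.
    return {cls: int(pps) for cls, pps in rates.items()}
```

With baselines of 500, 2,000, and 200 pps for routing, BFD, and ICMP, a headroom of 3.0 gives 1,500, 6,000, and 600 pps; a total cap of 4,050 pps then halves each value. Scaling a timing-sensitive class such as BFD this way deserves a manual review before enforcement.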

Domain 6: L2/L3 Protocol Hardening

Layer 2 and Layer 3 protocol hardening prevents a class of attacks that exploit trust relationships in network protocols. ARP spoofing, DHCP rogue servers, IPv6 neighbor discovery attacks, STP topology manipulation, and routing protocol neighbor spoofing can all cause significant outages or enable man-in-the-middle attacks. These controls are applied at the data plane level and must be carefully scoped to avoid disrupting legitimate operations.

Protection | Threat Mitigated | Implementation | Acceptance Test
Dynamic ARP Inspection (DAI) | ARP spoofing/poisoning | Enable on untrusted VLANs; trust uplinks | Spoofed ARP blocked; legitimate ARP passes
DHCP Snooping | Rogue DHCP server | Enable; trust only uplink ports | Rogue DHCP offer dropped; legitimate DHCP works
IPv6 ND Inspection | ND spoofing/RA attacks | Enable RA guard on access-facing ports | Rogue RA blocked; legitimate RA passes
STP BPDU Guard | Rogue switch topology attack | Enable on all edge/access ports | Rogue BPDU shuts port; legitimate STP unaffected
STP Root Guard | Root bridge takeover | Enable on all non-root uplinks | Superior BPDU triggers root-inconsistent state
Routing Auth (MD5/SHA) | Routing neighbor spoofing | Configure on all routing adjacencies | Neighbor with wrong key fails to form adjacency
BGP TTL Security (GTSM) | Remote BGP attacks | Enable on eBGP sessions | Packets with TTL < threshold dropped
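
The trust model behind DHCP snooping can be illustrated with a short decision function: DHCP server messages are accepted only when they arrive on a trusted port. This is a deliberately simplified sketch; real implementations also build a binding table and validate client messages, and the port names used here are hypothetical.

```python
# DHCP message types that only a server should send.
SERVER_MESSAGES = {"DHCPOFFER", "DHCPACK", "DHCPNAK"}


def snooping_verdict(message_type, ingress_port, trusted_ports):
    """Drop server-originated DHCP messages that arrive on an untrusted port."""
    if message_type in SERVER_MESSAGES and ingress_port not in trusted_ports:
        return "drop"
    return "forward"
```

With only the uplink marked as trusted, an offer arriving on an access port is dropped while the same offer arriving on the uplink is forwarded, which mirrors the acceptance test in the table.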

Domain 7: Logging, Telemetry, and Time

A centralized audit trail with consistent timestamps is the foundation of both security operations and compliance. Without reliable logging, it is impossible to detect attacks, investigate incidents, or demonstrate compliance. The logging domain covers Syslog configuration, streaming telemetry or NetFlow/IPFIX, and NTP synchronization with appropriate restrictions.

Implementation Points: Configure Syslog to send severity levels 0–6 to centralized SIEM; use TCP transport with TLS where supported; configure local buffer as fallback; restrict NTP to internal stratum-2 servers with ACL; enable streaming telemetry for interface counters, CPU/memory, and CoPP drops; configure log timestamps with millisecond precision and timezone.

Acceptance Criteria: Test events appear in SIEM within defined latency; NTP stratum and drift within policy; log timestamps correlate across devices within 1 second; CoPP drop events generate SIEM alerts; config change events generate SIEM alerts with user attribution.
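
The timestamp-correlation criterion can be verified by triggering one test event and comparing the timestamps that each device recorded for it. The sketch below assumes timestamps are available as ISO 8601 strings with an explicit UTC offset; the device names and function names are placeholders.

```python
from datetime import datetime, timedelta


def max_skew(timestamps):
    """Largest gap between any two device timestamps for the same event."""
    parsed = [datetime.fromisoformat(ts) for ts in timestamps.values()]
    return max(parsed) - min(parsed)


def within_policy(timestamps, tolerance=timedelta(seconds=1)):
    """True when all devices agree on the event time within the tolerance."""
    return max_skew(timestamps) <= tolerance
```

Because the offsets are parsed along with the time, devices that log in different timezones are still compared on the same absolute timeline.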

Domain 8: HA Security Consistency

High-availability designs (MLAG, stacking, chassis with dual supervisors) introduce a risk that is often overlooked: the security posture may change during failover if policies are not properly synchronized between peers. HA security consistency ensures that ACLs, CoPP policies, AAA configuration, and management access controls are identical on both peers and remain intact during and after a failover event.

Implementation Points: Verify that all security-relevant configuration (ACL, CoPP, AAA, NTP, Syslog) is synchronized across MLAG/stack peers; perform planned failover test and verify management access, routing adjacencies, and CoPP behavior are unchanged; document any configuration that is not automatically synchronized and establish manual sync procedure.

Acceptance Criteria: During planned switchover, ACL/CoPP/AAA behavior remains consistent; management access is maintained throughout switchover; routing adjacencies recover within defined time; no security policy regression observed post-failover; config diff between peers shows zero differences for security-relevant sections.
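
The peer config diff criterion can be approximated by extracting security-relevant lines from each peer's configuration and comparing the two sets. The following sketch is intentionally simple: it matches top-level lines by prefix only, ignores nested stanzas and ordering, and uses a prefix list that is an assumption and will differ by vendor.

```python
# Hypothetical prefixes that identify security-relevant configuration lines.
SECURITY_PREFIXES = (
    "ip access-list", "aaa ", "tacacs", "ntp ", "logging ",
    "control-plane", "policy-map",
)


def security_lines(config):
    """Collect the stripped configuration lines that start with a security prefix."""
    return {
        line.strip()
        for line in config.splitlines()
        if line.strip().startswith(SECURITY_PREFIXES)
    }


def peer_drift(config_a, config_b):
    """Report security-relevant lines present on one peer but not the other."""
    a, b = security_lines(config_a), security_lines(config_b)
    return {"only_on_a": sorted(a - b), "only_on_b": sorted(b - a)}
```

Two empty lists correspond to the zero-differences criterion; any entry identifies a line that should either be synchronized or documented as an intentional per-peer difference.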

2.3 Design Decision Matrix

The following matrix helps practitioners select the appropriate hardening depth for each domain based on the deployment context. Three tiers are defined: Baseline (minimum for any production deployment), Enhanced (recommended for sensitive environments), and Advanced (for critical infrastructure or high-security environments).

Domain | Baseline | Enhanced | Advanced
Mgmt Access | OOB or mgmt VRF + allowlist | + MFA on jump host | + PAM/session recording + zero-trust
AAA | TACACS+ authN + accounting | + command authZ + RBAC | + SOAR integration + anomaly detection
Secure Services | Disable Telnet/HTTP/SNMPv1-2c | + SSH key-only auth | + certificate-based auth + FIDO2
Config Lifecycle | Daily backup + manual diff | + automated compliance scan | + GitOps + automated remediation
CoPP/CPP | Default platform template | + custom per-protocol tuning | + adaptive policing + telemetry-driven
L2/L3 Hardening | DAI + DHCP snooping + STP guards | + routing auth + ND inspection | + MACsec + BGP RPKI
Logging | Syslog to SIEM | + streaming telemetry | + behavioral analytics + UEBA
HA Consistency | Manual config sync verification | + automated diff + failover test | + continuous compliance monitoring
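
The leading "+" in the Enhanced and Advanced columns indicates that each tier adds to the one before it. A minimal sketch of that cumulative reading is shown below, using two rows of the matrix as sample data; the data structure and function name are illustrative assumptions, not part of the guide.

```python
TIERS = ("Baseline", "Enhanced", "Advanced")

# Two rows of the matrix, one list of added controls per tier.
MATRIX = {
    "Mgmt Access": (
        ["OOB or mgmt VRF + allowlist"],
        ["MFA on jump host"],
        ["PAM/session recording", "zero-trust"],
    ),
    "Logging": (
        ["Syslog to SIEM"],
        ["streaming telemetry"],
        ["behavioral analytics", "UEBA"],
    ),
}


def controls_for(domain, tier):
    """Return every control required at the given tier, including lower tiers."""
    depth = TIERS.index(tier) + 1
    return [control for level in MATRIX[domain][:depth] for control in level]
```

Requesting the Advanced tier for a domain therefore returns the Baseline and Enhanced controls as well, which is how the matrix is meant to be applied.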

2.4 Implementation Sequence

The eight hardening domains should be implemented in a specific sequence to minimize operational risk. Implementing CoPP before establishing management access, for example, could cause a management lockout if the CoPP policy is too aggressive. The recommended sequence ensures that each domain builds on a stable foundation established by the previous domains.

  1. Establish OOB/management VRF isolation — ensures management access is available throughout the hardening process.
  2. Deploy AAA infrastructure — configure TACACS+/RADIUS, test authentication and accounting before enforcing.
  3. Apply secure protocol baseline — disable legacy services after verifying SSH/HTTPS access works correctly.
  4. Configure config lifecycle controls — establish backup and diff capability before making further changes.
  5. Baseline CoPP/CPP traffic rates — measure actual protocol rates before applying policing policies.
  6. Apply CoPP/CPP policies — start with permissive rates and tighten based on baseline measurements.
  7. Apply L2/L3 protocol hardening — enable DAI, DHCP snooping, STP guards, and routing authentication.
  8. Verify logging, telemetry, and NTP — confirm all events are reaching SIEM with correct timestamps.
  9. Perform HA consistency validation — verify peer sync and conduct planned failover test.
  10. Execute acceptance test plan — document results and store as compliance evidence.

Critical Warning: Never apply CoPP policies in a production environment without first baselining actual protocol traffic rates. Overly aggressive CoPP can cause routing adjacency flaps, which may trigger cascading failures across the network. Always test CoPP changes in a lab or during a maintenance window before enforcing them.
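
The ordering constraint in the implementation sequence can be enforced in a change-tracking script so that a step such as CoPP enforcement is not marked complete before its predecessors. The sketch below treats every earlier step as a prerequisite of every later one, which is a strict reading of the sequence; the step identifiers and function names are hypothetical.

```python
# Short identifiers for the ten steps, in the recommended order.
SEQUENCE = [
    "mgmt-isolation", "aaa", "secure-protocols", "config-lifecycle",
    "copp-baseline", "copp-policy", "l2l3-hardening", "logging-verify",
    "ha-validation", "acceptance-tests",
]


def next_step(completed):
    """Return the first step in the sequence that is not yet completed."""
    for step in SEQUENCE:
        if step not in completed:
            return step
    return None


def out_of_order(completed_in_order):
    """Return steps that were completed before one of their predecessors."""
    violations = []
    for index, step in enumerate(completed_in_order):
        required = SEQUENCE[:SEQUENCE.index(step)]
        done = set(completed_in_order[:index])
        if any(prerequisite not in done for prerequisite in required):
            violations.append(step)
    return violations
```

A change log in which CoPP policy enforcement appears directly after management isolation would be flagged, since AAA, the secure protocol baseline, config lifecycle controls, and the CoPP baseline had not yet been completed.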