Core Switch Security Hardening Design Guide
Chapter 12 — Operations & Maintenance
Security hardening is not a one-time activity. The threat landscape evolves continuously, new vulnerabilities are discovered in network operating systems, and organizational changes introduce new requirements that may conflict with existing hardening controls. An effective operations and maintenance (O&M) program ensures that the security posture established during initial deployment is maintained, continuously improved, and adapted to changing conditions throughout the lifecycle of the core switch. This chapter defines the recurring O&M activities, their frequencies, and the processes for managing changes, incidents, and end-of-life transitions.
12.1 Recurring Security Maintenance Schedule
The recurring maintenance schedule defines the minimum frequency for each security maintenance activity. Activities are categorized by frequency: daily automated checks, weekly manual reviews, monthly comprehensive audits, and annual lifecycle assessments. All activities must be documented in the organization's ITSM system with completion records retained for audit purposes.
| Frequency | Activity | Method | Owner | Documentation |
|---|---|---|---|---|
| Daily | Review SIEM alerts for security events | SIEM dashboard review; automated alert triage | SOC Analyst | Alert disposition log |
| Daily | Verify NTP synchronization on all switches | Automated monitoring; alert on drift >1 second | NOC Monitoring | Automated alert ticket |
| Daily | Check BGP/OSPF session status | SNMP monitoring; automated alert on session down | NOC Monitoring | Automated alert ticket |
| Daily | Verify configuration backup completed | Backup system status check; verify file timestamp | NOC Monitoring | Backup completion log |
| Weekly | Review authentication failure logs | SIEM query for failed login events; trend analysis | Security Engineer | Weekly security review report |
| Weekly | Review CoPP drop counters | CLI check or SNMP polling; compare to baseline | Network Engineer | Weekly performance report |
| Weekly | Verify physical security (port blockers, tamper seals) | Visual inspection of all switches in scope | Data Center Operations | Physical security inspection log |
| Weekly | Test OOB management connectivity | SSH via OOB path; verify console server access | Network Engineer | OOB test log |
| Monthly | Full configuration audit vs. approved baseline | Automated config diff against baseline template | Security Engineer | Configuration audit report |
| Monthly | Review and rotate credentials (service accounts) | AAA server credential rotation; update documentation | Security Engineer | Credential rotation log |
| Monthly | Review user access rights (AAA authorization) | Export user list from AAA; review against HR records | Security Manager | Access review report |
| Monthly | Check for new CVEs affecting platform software | Vendor security advisories; CVE database query | Security Engineer | CVE review log; patch plan if needed |
| Monthly | Review TCAM utilization trends | SNMP polling data; capacity planning review | Network Engineer | Capacity planning report |
| Annual | Full security hardening re-assessment vs. current standards | Manual review against updated CIS/NIST benchmarks | Security Architect | Annual hardening assessment report |
| Annual | Platform software upgrade planning | Review vendor roadmap; plan upgrade window | Network Architect | Software upgrade plan |
| Annual | Disaster recovery and failover test | Planned failover test; verify security controls post-failover | Network + Security Team | DR test report; sign-off |
12.2 Change Management for Hardening Controls
Changes to hardening controls must follow a formal change management process to prevent unauthorized modifications and ensure that changes do not inadvertently weaken the security posture. The change management process for hardening-related changes is more stringent than for routine network changes, requiring security team review and approval in addition to standard network change approval.
| Change Type | Examples | Approval Required | Testing Required | Rollback Plan |
|---|---|---|---|---|
| Emergency Security Change | Blocking active attack; patching critical CVE | Security Manager (verbal OK); document post-change | Minimal; verify attack blocked; verify no service impact | Pre-staged rollback config; 15-minute rollback window |
| Security Hardening Enhancement | Adding new ACL rule; tightening CoPP rate; enabling new auth | Security Engineer + Security Manager + Change Advisory Board | Full lab testing; acceptance test on production | Config backup before change; tested rollback procedure |
| Routine Network Change | Adding VLAN; updating BGP prefix filter; interface config | Network Engineer + Change Advisory Board | Lab testing if available; production verification | Config backup before change; rollback procedure documented |
| Software Upgrade | NOS version upgrade; security patch | Network Architect + Security Manager + Change Advisory Board | Full lab testing on identical platform; acceptance test | Downgrade procedure tested in lab; rollback window defined |
| Emergency Access | Console access for recovery; local account use | Security Manager approval; two-person rule | N/A (emergency) | Document all actions; review within 24 hours |
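The approval matrix above can also be encoded directly, so a change pipeline can reject under-approved changes automatically. The sketch below is one illustrative way to express this in Python; the change-type keys and role strings are assumptions mirroring the table, not the API of any existing tool.

```python
#!/usr/bin/env python3
"""Encode the 12.2 approval matrix so a change pipeline can enforce it.

A minimal sketch; change-type keys and approver role names mirror the table,
and the validation entry point is a hypothetical integration hook.
"""
from dataclasses import dataclass

@dataclass(frozen=True)
class ChangePolicy:
    approvers: tuple[str, ...]  # roles that must sign off before deployment
    testing: str                # minimum testing required by the matrix

POLICIES = {
    "emergency_security": ChangePolicy(
        ("Security Manager",),
        "minimal; verify attack blocked and no service impact"),
    "hardening_enhancement": ChangePolicy(
        ("Security Engineer", "Security Manager", "CAB"),
        "full lab testing; production acceptance test"),
    "routine_network": ChangePolicy(
        ("Network Engineer", "CAB"),
        "lab testing if available; production verification"),
    "software_upgrade": ChangePolicy(
        ("Network Architect", "Security Manager", "CAB"),
        "full lab testing on identical platform; acceptance test"),
    "emergency_access": ChangePolicy(
        ("Security Manager", "Second Person"),   # two-person rule
        "n/a (emergency); review all actions within 24 hours"),
}

def missing_approvals(change_type: str, granted: set[str]) -> set[str]:
    """Return the approver roles still outstanding for this change type."""
    return set(POLICIES[change_type].approvers) - granted

# Example: a hardening enhancement approved so far only by the engineer
print(missing_approvals("hardening_enhancement", {"Security Engineer"}))
# -> {'Security Manager', 'CAB'}
```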
12.3 Software Lifecycle and Patch Management
Network operating system software contains vulnerabilities that are discovered and disclosed on an ongoing basis. A structured patch management process ensures that critical security patches are applied within defined timeframes while maintaining network stability. The following table defines the patch management SLAs based on vulnerability severity.
| Severity | CVSS Score | Patch SLA | Process | Exceptions |
|---|---|---|---|---|
| Critical | 9.0 – 10.0 | 72 hours (emergency change) | Emergency change process; immediate lab test; expedited production deployment | Requires CISO approval; compensating controls must be documented |
| High | 7.0 – 8.9 | 14 days | Standard change process; lab test; scheduled maintenance window | Extension up to 30 days with documented risk acceptance |
| Medium | 4.0 – 6.9 | 60 days | Standard change process; batch with other changes if possible | Extension up to 90 days with documented risk acceptance |
| Low | 0.1 – 3.9 | Next scheduled maintenance window (up to 180 days) | Batch with other changes; standard change process | May defer to next major software upgrade |
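Because the SLA windows are fixed functions of the CVSS score, the patch deadline can be computed mechanically when a CVE is logged. A minimal sketch, assuming the 72-hour critical window is treated as 3 days and the low-severity window as the 180-day cap:

```python
#!/usr/bin/env python3
"""Compute a patch deadline from a CVSS base score, per the 12.3 SLA table."""
from datetime import datetime, timedelta

# (lower bound, upper bound, SLA in days, label); 72 hours = 3 days
SLA_BANDS = [
    (9.0, 10.0, 3,   "Critical"),
    (7.0, 8.9,  14,  "High"),
    (4.0, 6.9,  60,  "Medium"),
    (0.1, 3.9,  180, "Low"),  # next maintenance window, up to 180 days
]

def patch_deadline(cvss: float, disclosed: datetime) -> tuple[str, datetime]:
    """Return (severity label, latest acceptable patch date)."""
    for low, high, days, label in SLA_BANDS:
        if low <= cvss <= high:
            return label, disclosed + timedelta(days=days)
    raise ValueError(f"CVSS score out of range: {cvss}")

severity, due = patch_deadline(9.8, datetime(2024, 6, 1))
print(f"{severity}: patch due by {due:%Y-%m-%d}")  # Critical: patch due by 2024-06-04
```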
12.4 Hardening Drift Detection and Remediation
Configuration drift — the gradual deviation of a device's running configuration from the approved baseline — is one of the most common causes of security control failures in production environments. Drift can occur due to emergency changes that were not properly documented, software upgrades that reset certain settings, or operator errors. The following table defines the drift detection and remediation process.
| Drift Type | Detection Method | Severity | Remediation SLA | Root Cause Analysis |
|---|---|---|---|---|
| Critical hardening control removed or disabled | Automated config diff; SIEM alert | Critical | Restore within 4 hours | Mandatory RCA within 24 hours; process improvement required |
| ACL rule added or modified outside change process | Config diff; change management audit | High | Review and remediate within 24 hours | RCA within 48 hours; disciplinary process if unauthorized |
| New local user account created | Automated account audit; AAA review | High | Disable and investigate within 4 hours | Mandatory RCA; potential security incident |
| Logging or monitoring configuration changed | Config diff; SIEM gap detection | High | Restore within 24 hours | RCA within 48 hours |
| Minor configuration deviation (non-security-critical) | Monthly config audit | Low | Remediate within next maintenance window | Document and update change record |
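A basic drift detector is a line-level diff of the running configuration against the stored baseline, with each delta classified by severity. The sketch below uses an order-insensitive set comparison and a few illustrative regex patterns to mark critical hardening lines; a production tool should use a context-aware diff (duplicate lines inside interface stanzas are invisible to a set comparison) and draw its classification rules from the hardening standard itself.

```python
#!/usr/bin/env python3
"""Flag configuration drift against the approved baseline (section 12.4).

A minimal sketch; the patterns marking a line as a critical hardening
control are illustrative placeholders, not a complete classification.
"""
import re

# Illustrative patterns for "critical hardening control" lines
CRITICAL_PATTERNS = [
    re.compile(r"^aaa "), re.compile(r"^logging "), re.compile(r"^ntp "),
    re.compile(r"access-(list|group)"),
]

def classify(line: str) -> str:
    return "Critical" if any(p.search(line) for p in CRITICAL_PATTERNS) else "Low"

def drift_report(baseline: str, running: str) -> list[tuple[str, str, str]]:
    """Return (change, severity, line) tuples for lines added or removed."""
    base_lines = set(baseline.splitlines())
    run_lines = set(running.splitlines())
    findings = [("removed", classify(l), l) for l in sorted(base_lines - run_lines)]
    findings += [("added", classify(l), l) for l in sorted(run_lines - base_lines)]
    return findings

# Inline sample configs keep the sketch self-contained; read real files in practice
BASELINE = "aaa new-model\nlogging host 10.0.0.5\nhostname core-sw-01"
RUNNING = "hostname core-sw-01\nip route 0.0.0.0 0.0.0.0 10.0.0.1"
for change, severity, line in drift_report(BASELINE, RUNNING):
    print(f"[{severity}] {change}: {line}")
```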
12.5 End-of-Life and Refresh Planning
Core switches that have reached end-of-life (EoL) status from their vendor no longer receive security patches, making them increasingly vulnerable over time. The EoL planning process must begin at least 18 months before the vendor's announced EoL date to allow sufficient time for procurement, testing, and migration. The following table defines the EoL planning milestones and the security risk escalation associated with each phase.
| Phase | Timeline | Activity | Security Risk | Action Required |
|---|---|---|---|---|
| EoL Announced | 18+ months before EoL | Begin replacement planning; evaluate successor platforms | Low — patches still available | Initiate procurement process; plan migration design |
| End of Software Maintenance | 12 months before EoL | Final software version selected; no new patches expected | Medium — no new patches for new CVEs | Accelerate replacement; implement compensating controls |
| End of Support | 6 months before EoL | No vendor support; no security patches | High — unpatched vulnerabilities accumulate | Replacement must be in progress; risk acceptance required |
| Post-EoL Operation | After EoL date | Operating beyond supported lifecycle | Critical — no security patches; increasing exposure | Immediate replacement required; CISO risk acceptance mandatory |
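The phase and risk columns above can be derived automatically from each platform's announced EoL date, which makes it straightforward to sweep the inventory and escalate aging hardware. A minimal sketch, approximating a month as 30 days:

```python
#!/usr/bin/env python3
"""Derive the 12.5 lifecycle phase and risk level from a vendor EoL date.

A minimal sketch; thresholds mirror the milestone table (18/12/6 months
before EoL, then post-EoL operation).
"""
from datetime import date, timedelta

MONTH = timedelta(days=30)  # approximation for threshold arithmetic

def lifecycle_phase(eol: date, today: date | None = None) -> tuple[str, str]:
    """Return (phase, security risk) for a switch with the given EoL date."""
    today = today or date.today()
    remaining = eol - today
    if remaining < timedelta(0):
        return "Post-EoL Operation", "Critical"
    if remaining <= 6 * MONTH:
        return "End of Support window", "High"
    if remaining <= 12 * MONTH:
        return "End of Software Maintenance window", "Medium"
    return "EoL Announced / replacement planning", "Low"

phase, risk = lifecycle_phase(date(2026, 12, 31), today=date(2026, 9, 1))
print(f"{phase} (risk: {risk})")  # End of Support window (risk: High)
```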