Operations & Maintenance

Chapter 12 — Core Switch Security Hardening Design Guide


Security hardening is not a one-time activity. The threat landscape evolves continuously, new vulnerabilities are discovered in network operating systems, and organizational changes introduce new requirements that may conflict with existing hardening controls. An effective operations and maintenance (O&M) program ensures that the security posture established during initial deployment is maintained, continuously improved, and adapted to changing conditions throughout the lifecycle of the core switch. This chapter defines the recurring O&M activities, their frequencies, and the processes for managing changes, incidents, and end-of-life transitions.

12.1 Recurring Security Maintenance Schedule

The recurring maintenance schedule defines the minimum frequency for each security maintenance activity. Activities are categorized by frequency: daily automated checks, weekly manual reviews, monthly comprehensive audits, and annual lifecycle assessments. All activities must be documented in the organization's IT service management (ITSM) system, with completion records retained for audit purposes.

| Frequency | Activity | Method | Owner | Documentation |
|---|---|---|---|---|
| Daily | Review SIEM alerts for security events | SIEM dashboard review; automated alert triage | SOC Analyst | Alert disposition log |
| Daily | Verify NTP synchronization on all switches | Automated monitoring; alert on drift > 1 second | NOC Monitoring | Automated alert ticket |
| Daily | Check BGP/OSPF session status | SNMP monitoring; automated alert on session down | NOC Monitoring | Automated alert ticket |
| Daily | Verify configuration backup completed | Backup system status check; verify file timestamp | NOC Monitoring | Backup completion log |
| Weekly | Review authentication failure logs | SIEM query for failed login events; trend analysis | Security Engineer | Weekly security review report |
| Weekly | Review CoPP drop counters | CLI check or SNMP polling; compare to baseline | Network Engineer | Weekly performance report |
| Weekly | Verify physical security (port blockers, tamper seals) | Visual inspection of all switches in scope | Data Center Operations | Physical security inspection log |
| Weekly | Test OOB management connectivity | SSH via OOB path; verify console server access | Network Engineer | OOB test log |
| Monthly | Full configuration audit vs. approved baseline | Automated config diff against baseline template | Security Engineer | Configuration audit report |
| Monthly | Review and rotate credentials (service accounts) | AAA server credential rotation; update documentation | Security Engineer | Credential rotation log |
| Monthly | Review user access rights (AAA authorization) | Export user list from AAA; review against HR records | Security Manager | Access review report |
| Monthly | Check for new CVEs affecting platform software | Vendor security advisories; CVE database query | Security Engineer | CVE review log; patch plan if needed |
| Monthly | Review TCAM utilization trends | SNMP polling data; capacity planning review | Network Engineer | Capacity planning report |
| Annual | Full security hardening re-assessment vs. current standards | Manual review against updated CIS/NIST benchmarks | Security Architect | Annual hardening assessment report |
| Annual | Platform software upgrade planning | Review vendor roadmap; plan upgrade window | Network Architect | Software upgrade plan |
| Annual | Disaster recovery and failover test | Planned failover test; verify security controls post-failover | Network + Security Team | DR test report; sign-off |
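
Two of the daily checks in the schedule (NTP drift and backup completion) reduce to simple threshold comparisons. The following is a minimal sketch of how a monitoring job might evaluate them. The record fields (`hostname`, `ntp_offset_s`, `last_backup`), the function name, and the 24-hour backup age limit are assumptions made for illustration; only the 1-second drift threshold comes from the schedule.

```python
from datetime import datetime, timedelta, timezone

NTP_DRIFT_LIMIT_S = 1.0               # from the schedule: alert on drift > 1 second
BACKUP_MAX_AGE = timedelta(hours=24)  # assumption: a "daily" backup is at most 24 hours old


def daily_check_failures(switches, now):
    """Return (hostname, reason) pairs for switches that fail a daily check."""
    failures = []
    for sw in switches:
        # Drift can be positive or negative, so compare the absolute offset.
        if abs(sw["ntp_offset_s"]) > NTP_DRIFT_LIMIT_S:
            failures.append((sw["hostname"], "ntp_drift"))
        # A backup older than the allowed age counts as not completed.
        if now - sw["last_backup"] > BACKUP_MAX_AGE:
            failures.append((sw["hostname"], "backup_stale"))
    return failures


# Hypothetical inventory records, as a monitoring system might export them.
now = datetime(2024, 1, 10, 6, 0, tzinfo=timezone.utc)
inventory = [
    {"hostname": "core-sw-01", "ntp_offset_s": 0.02,
     "last_backup": now - timedelta(hours=5)},
    {"hostname": "core-sw-02", "ntp_offset_s": -1.7,
     "last_backup": now - timedelta(hours=30)},
]

for host, reason in daily_check_failures(inventory, now):
    print(host, reason)
```

In practice each failure would open an automated alert ticket rather than print to the console.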

12.2 Change Management for Hardening Controls

Changes to hardening controls must follow a formal change management process to prevent unauthorized modifications and ensure that changes do not inadvertently weaken the security posture. The change management process for hardening-related changes is more stringent than for routine network changes, requiring security team review and approval in addition to standard network change approval.

| Change Type | Examples | Approval Required | Testing Required | Rollback Plan |
|---|---|---|---|---|
| Emergency Security Change | Blocking active attack; patching critical CVE | Security Manager (verbal OK); document post-change | Minimal; verify attack blocked; verify no service impact | Pre-staged rollback config; 15-minute rollback window |
| Security Hardening Enhancement | Adding new ACL rule; tightening CoPP rate; enabling new auth | Security Engineer + Security Manager + Change Advisory Board | Full lab testing; acceptance test on production | Config backup before change; tested rollback procedure |
| Routine Network Change | Adding VLAN; updating BGP prefix filter; interface config | Network Engineer + Change Advisory Board | Lab testing if available; production verification | Config backup before change; rollback procedure documented |
| Software Upgrade | NOS version upgrade; security patch | Network Architect + Security Manager + Change Advisory Board | Full lab testing on identical platform; acceptance test | Downgrade procedure tested in lab; rollback window defined |
| Emergency Access | Console access for recovery; local account use | Security Manager approval; two-person rule | N/A (emergency) | Document all actions; review within 24 hours |
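
The approval column of the table above can be encoded as a lookup so that a change ticket is checked for missing approvers before it is scheduled. This is only a sketch: the dictionary and function names are hypothetical, and qualifiers such as the verbal approval for emergency changes or the two-person rule for emergency access are not modeled.

```python
# Required approver roles per change type, transcribed from the table above.
REQUIRED_APPROVALS = {
    "Emergency Security Change": {"Security Manager"},
    "Security Hardening Enhancement": {
        "Security Engineer", "Security Manager", "Change Advisory Board"},
    "Routine Network Change": {"Network Engineer", "Change Advisory Board"},
    "Software Upgrade": {
        "Network Architect", "Security Manager", "Change Advisory Board"},
    "Emergency Access": {"Security Manager"},
}


def missing_approvals(change_type, granted):
    """Return the roles that still need to approve, in alphabetical order."""
    required = REQUIRED_APPROVALS[change_type]  # KeyError for an unknown change type
    return sorted(required - set(granted))


print(missing_approvals("Software Upgrade", ["Network Architect"]))
```

An empty list means every required role has signed off.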

12.3 Software Lifecycle and Patch Management

Network operating system software contains vulnerabilities that are discovered and disclosed on an ongoing basis. A structured patch management process ensures that critical security patches are applied within defined timeframes while maintaining network stability. The following table defines the patch management SLAs based on vulnerability severity.

| Severity | CVSS Score | Patch SLA | Process | Exceptions |
|---|---|---|---|---|
| Critical | 9.0 – 10.0 | 72 hours (emergency change) | Emergency change process; immediate lab test; expedited production deployment | Requires CISO approval; compensating controls must be documented |
| High | 7.0 – 8.9 | 14 days | Standard change process; lab test; scheduled maintenance window | Extension up to 30 days with documented risk acceptance |
| Medium | 4.0 – 6.9 | 60 days | Standard change process; batch with other changes if possible | Extension up to 90 days with documented risk acceptance |
| Low | 0.1 – 3.9 | Next scheduled maintenance window (up to 180 days) | Batch with other changes; standard change process | May defer to next major software upgrade |
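
The SLA table maps directly onto a small lookup that turns a CVSS score into a severity and a latest patch date. The sketch below assumes the SLA clock starts when the advisory is published, which the table does not state. For Low severity it uses the 180-day upper bound rather than the date of the next maintenance window. The names `PATCH_SLAS` and `patch_deadline` are illustrative.

```python
from datetime import datetime, timedelta

# (minimum CVSS score, severity, patch SLA) from the table above,
# ordered from most to least severe.
PATCH_SLAS = [
    (9.0, "Critical", timedelta(hours=72)),
    (7.0, "High", timedelta(days=14)),
    (4.0, "Medium", timedelta(days=60)),
    (0.1, "Low", timedelta(days=180)),  # upper bound; the real target is the next window
]


def patch_deadline(cvss_score, published_at):
    """Return (severity, latest patch time) for a CVSS base score."""
    if not 0.0 <= cvss_score <= 10.0:
        raise ValueError("CVSS score must be between 0.0 and 10.0")
    for floor, severity, sla in PATCH_SLAS:
        if cvss_score >= floor:
            return severity, published_at + sla
    # A score of 0.0 falls outside every band in the table, so no SLA applies.
    return "None", None


severity, due = patch_deadline(9.8, datetime(2024, 3, 1, 9, 0))
print(severity, due.isoformat())  # Critical 2024-03-04T09:00:00
```

Approved extensions (for example, up to 30 days for High) would be tracked separately as documented risk acceptances rather than folded into this calculation.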

12.4 Hardening Drift Detection and Remediation

Configuration drift — the gradual deviation of a device's running configuration from the approved baseline — is one of the most common causes of security control failures in production environments. Drift can occur due to emergency changes that were not properly documented, software upgrades that reset certain settings, or operator errors. The following table defines the drift detection and remediation process.

| Drift Type | Detection Method | Severity | Remediation SLA | Root Cause Analysis |
|---|---|---|---|---|
| Critical hardening control removed or disabled | Automated config diff; SIEM alert | Critical | Restore within 4 hours | Mandatory RCA within 24 hours; process improvement required |
| ACL rule added or modified outside change process | Config diff; change management audit | High | Review and remediate within 24 hours | RCA within 48 hours; disciplinary process if unauthorized |
| New local user account created | Automated account audit; AAA review | High | Disable and investigate within 4 hours | Mandatory RCA; potential security incident |
| Logging or monitoring configuration changed | Config diff; SIEM gap detection | High | Restore within 24 hours | RCA within 48 hours |
| Minor configuration deviation (non-security-critical) | Monthly config audit | Low | Remediate within next maintenance window | Document and update change record |
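
The automated config diff named in the table can be sketched with Python's standard `difflib` module: compare the running configuration to the approved baseline, then map each changed line to a drift type and severity. The IOS-style command prefixes, the sample configurations, and the `critical_lines` set are assumptions for illustration and would need to match the actual platform syntax. The sketch also cannot tell whether a change went through the change process; that still requires the change management audit.

```python
import difflib

# Hypothetical IOS-style prefixes: (line prefix, direction, severity, drift type).
RULES = [
    ("username ", "added", "High", "New local user account created"),
    ("logging ", "any", "High", "Logging or monitoring configuration changed"),
    ("access-list ", "any", "High", "ACL rule added or modified"),
]


def config_drift(baseline, running):
    """Return (removed, added) lines of the running config relative to the baseline."""
    removed, added = [], []
    for line in difflib.ndiff(baseline.splitlines(), running.splitlines()):
        tag, text = line[:2], line[2:].strip()
        if tag == "- ":
            removed.append(text)
        elif tag == "+ ":
            added.append(text)
    return removed, added


def classify(removed, added, critical_lines):
    """Map each drifted line to (severity, drift type, line)."""
    findings = []
    for direction, lines in (("removed", removed), ("added", added)):
        for line in lines:
            if direction == "removed" and line in critical_lines:
                findings.append(
                    ("Critical", "Critical hardening control removed or disabled", line))
                continue
            for prefix, applies_to, severity, drift_type in RULES:
                if line.startswith(prefix) and applies_to in ("any", direction):
                    findings.append((severity, drift_type, line))
                    break
            else:  # no rule matched this line
                findings.append(("Low", "Minor configuration deviation", line))
    return findings


baseline = "hostname core-sw-01\nservice-policy input COPP-POLICY\nlogging host 10.0.0.5\n"
running = "hostname core-sw-01\nlogging host 10.0.0.5\nusername temp privilege 15\n"

removed, added = config_drift(baseline, running)
findings = classify(removed, added, critical_lines={"service-policy input COPP-POLICY"})
for finding in findings:
    print(finding)
```

Each finding would then be routed according to the remediation SLA for its severity.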

12.5 End-of-Life and Refresh Planning

Core switches that have reached end-of-life (EoL) status from their vendor no longer receive security patches, making them increasingly vulnerable over time. The EoL planning process must begin at least 18 months before the vendor's announced EoL date to allow sufficient time for procurement, testing, and migration. The following table defines the EoL planning milestones and the security risk escalation associated with each phase.

| Phase | Timeline | Activity | Security Risk | Action Required |
|---|---|---|---|---|
| EoL Announced | 18+ months before EoL | Begin replacement planning; evaluate successor platforms | Low: patches still available | Initiate procurement process; plan migration design |
| End of Software Maintenance | 12 months before EoL | Final software version selected; no new patches expected | Medium: no new patches for new CVEs | Accelerate replacement; implement compensating controls |
| End of Support | 6 months before EoL | No vendor support; no security patches | High: unpatched vulnerabilities accumulate | Replacement must be in progress; risk acceptance required |
| Post-EoL Operation | After EoL date | Operating beyond supported lifecycle | Critical: no security patches; increasing exposure | Immediate replacement required; CISO risk acceptance mandatory |
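
The milestones above can be derived from a single vendor EoL date, which is useful when tracking many switches in an asset inventory. The sketch below assumes that each phase begins exactly at its stated offset (12 or 6 months before EoL) and lasts until the next one starts, and that the EoL date itself still counts as End of Support. The table does not spell out these boundaries, and the function names are illustrative.

```python
import calendar
from datetime import date


def months_before(d, months):
    """Return the date `months` calendar months before `d`, clamping the day."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


def eol_phase(today, eol_date):
    """Return (phase, security risk) for a switch with an announced EoL date."""
    if today > eol_date:
        return "Post-EoL Operation", "Critical"
    if today >= months_before(eol_date, 6):
        return "End of Support", "High"
    if today >= months_before(eol_date, 12):
        return "End of Software Maintenance", "Medium"
    return "EoL Announced", "Low"


eol = date(2026, 6, 30)  # hypothetical vendor EoL date
for today in (date(2025, 1, 15), date(2025, 9, 1), date(2026, 2, 1), date(2026, 7, 1)):
    print(today, *eol_phase(today, eol))
```

Running this against the inventory on a schedule gives early warning when a platform crosses into a higher risk phase.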