Core Switch Security Hardening Design Guide
Chapter 12 — Operations & Maintenance
Security hardening is not a one-time activity. The threat landscape evolves continuously, new vulnerabilities are discovered in network operating systems, and organizational changes introduce new requirements that may conflict with existing hardening controls. An effective operations and maintenance (O&M) program ensures that the security posture established during initial deployment is maintained, continuously improved, and adapted to changing conditions throughout the lifecycle of the core switch. This chapter defines the recurring O&M activities, their frequencies, and the processes for managing changes, incidents, and end-of-life transitions.
12.1 Recurring Security Maintenance Schedule
The recurring maintenance schedule defines the minimum frequency for each security maintenance activity. Activities are categorized by frequency: daily automated checks, weekly manual reviews, monthly comprehensive audits, and annual lifecycle assessments. All activities must be documented in the organization's ITSM system with completion records retained for audit purposes.
| Frequency | Activity | Method | Owner | Documentation |
|---|---|---|---|---|
| Daily | Review SIEM alerts for security events | SIEM dashboard review; automated alert triage | SOC Analyst | Alert disposition log |
| Daily | Verify NTP synchronization on all switches | Automated monitoring; alert on drift >1 second | NOC Monitoring | Automated alert ticket |
| Daily | Check BGP/OSPF session status | SNMP monitoring; automated alert on session down | NOC Monitoring | Automated alert ticket |
| Daily | Verify configuration backup completed | Backup system status check; verify file timestamp | NOC Monitoring | Backup completion log |
| Weekly | Review authentication failure logs | SIEM query for failed login events; trend analysis | Security Engineer | Weekly security review report |
| Weekly | Review CoPP drop counters | CLI check or SNMP polling; compare to baseline | Network Engineer | Weekly performance report |
| Weekly | Verify physical security (port blockers, tamper seals) | Visual inspection of all switches in scope | Data Center Operations | Physical security inspection log |
| Weekly | Test OOB management connectivity | SSH via OOB path; verify console server access | Network Engineer | OOB test log |
| Monthly | Full configuration audit vs. approved baseline | Automated config diff against baseline template | Security Engineer | Configuration audit report |
| Monthly | Review and rotate credentials (service accounts) | AAA server credential rotation; update documentation | Security Engineer | Credential rotation log |
| Monthly | Review user access rights (AAA authorization) | Export user list from AAA; review against HR records | Security Manager | Access review report |
| Monthly | Check for new CVEs affecting platform software | Vendor security advisories; CVE database query | Security Engineer | CVE review log; patch plan if needed |
| Monthly | Review TCAM utilization trends | SNMP polling data; capacity planning review | Network Engineer | Capacity planning report |
| Annual | Full security hardening re-assessment vs. current standards | Manual review against updated CIS/NIST benchmarks | Security Architect | Annual hardening assessment report |
| Annual | Platform software upgrade planning | Review vendor roadmap; plan upgrade window | Network Architect | Software upgrade plan |
| Annual | Disaster recovery and failover test | Planned failover test; verify security controls post-failover | Network + Security Team | DR test report; sign-off |
12.2 Change Management for Hardening Controls
Changes to hardening controls must follow a formal change management process to prevent unauthorized modifications and ensure that changes do not inadvertently weaken the security posture. The change management process for hardening-related changes is more stringent than for routine network changes, requiring security team review and approval in addition to standard network change approval.
| Change Type | Examples | Approval Required | Testing Required | Rollback Plan |
|---|---|---|---|---|
| Emergency Security Change | Blocking active attack; patching critical CVE | Security Manager (verbal OK); document post-change | Minimal; verify attack blocked; verify no service impact | Pre-staged rollback config; 15-minute rollback window |
| Security Hardening Enhancement | Adding new ACL rule; tightening CoPP rate; enabling new auth | Security Engineer + Security Manager + Change Advisory Board | Full lab testing; acceptance test on production | Config backup before change; tested rollback procedure |
| Routine Network Change | Adding VLAN; updating BGP prefix filter; interface config | Network Engineer + Change Advisory Board | Lab testing if available; production verification | Config backup before change; rollback procedure documented |
| Software Upgrade | NOS version upgrade; security patch | Network Architect + Security Manager + Change Advisory Board | Full lab testing on identical platform; acceptance test | Downgrade procedure tested in lab; rollback window defined |
| Emergency Access | Console access for recovery; local account use | Security Manager approval; two-person rule | N/A (emergency) | Document all actions; review within 24 hours |
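The approval matrix above can also be encoded directly, so a change pipeline can reject under-approved changes automatically. The sketch below is one illustrative way to express this in Python; the change-type keys and role strings are assumptions mirroring the table, not the API of any existing tool.

```python
#!/usr/bin/env python3
"""Encode the 12.2 approval matrix so a change pipeline can enforce it.

A minimal sketch; change-type keys and approver role names mirror the table,
and the validation entry point is a hypothetical integration hook.
"""
from dataclasses import dataclass

@dataclass(frozen=True)
class ChangePolicy:
    approvers: tuple[str, ...]  # roles that must sign off before deployment
    testing: str                # minimum testing required by the matrix

POLICIES = {
    "emergency_security": ChangePolicy(
        ("Security Manager",),
        "minimal; verify attack blocked and no service impact"),
    "hardening_enhancement": ChangePolicy(
        ("Security Engineer", "Security Manager", "CAB"),
        "full lab testing; production acceptance test"),
    "routine_network": ChangePolicy(
        ("Network Engineer", "CAB"),
        "lab testing if available; production verification"),
    "software_upgrade": ChangePolicy(
        ("Network Architect", "Security Manager", "CAB"),
        "full lab testing on identical platform; acceptance test"),
    "emergency_access": ChangePolicy(
        ("Security Manager", "Second Person"),   # two-person rule
        "n/a (emergency); review all actions within 24 hours"),
}

def missing_approvals(change_type: str, granted: set[str]) -> set[str]:
    """Return the approver roles still outstanding for this change type."""
    return set(POLICIES[change_type].approvers) - granted

# Example: a hardening enhancement approved so far only by the engineer
print(missing_approvals("hardening_enhancement", {"Security Engineer"}))
# -> {'Security Manager', 'CAB'}
```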
12.3 Software Lifecycle and Patch Management
Network operating system software contains vulnerabilities that are discovered and disclosed on an ongoing basis. A structured patch management process ensures that critical security patches are applied within defined timeframes while maintaining network stability. The following table defines the patch management SLAs based on vulnerability severity.
| Severity | CVSS Score | Patch SLA | Process | Exceptions |
|---|---|---|---|---|
| Critical | 9.0 – 10.0 | 72 hours (emergency change) | Emergency change process; immediate lab test; expedited production deployment | Requires CISO approval; compensating controls must be documented |
| High | 7.0 – 8.9 | 14 days | Standard change process; lab test; scheduled maintenance window | Extension up to 30 days with documented risk acceptance |
| Medium | 4.0 – 6.9 | 60 days | Standard change process; batch with other changes if possible | Extension up to 90 days with documented risk acceptance |
| Low | 0.1 – 3.9 | Next scheduled maintenance window (up to 180 days) | Batch with other changes; standard change process | May defer to next major software upgrade |
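Because the SLA windows are fixed functions of the CVSS score, the patch deadline can be computed mechanically when a CVE is logged. A minimal sketch, assuming the 72-hour critical window is treated as 3 days and the low-severity window as the 180-day cap:

```python
#!/usr/bin/env python3
"""Compute a patch deadline from a CVSS base score, per the 12.3 SLA table."""
from datetime import datetime, timedelta

# (lower bound, upper bound, SLA in days, label); 72 hours = 3 days
SLA_BANDS = [
    (9.0, 10.0, 3,   "Critical"),
    (7.0, 8.9,  14,  "High"),
    (4.0, 6.9,  60,  "Medium"),
    (0.1, 3.9,  180, "Low"),  # next maintenance window, up to 180 days
]

def patch_deadline(cvss: float, disclosed: datetime) -> tuple[str, datetime]:
    """Return (severity label, latest acceptable patch date)."""
    for low, high, days, label in SLA_BANDS:
        if low <= cvss <= high:
            return label, disclosed + timedelta(days=days)
    raise ValueError(f"CVSS score out of range: {cvss}")

severity, due = patch_deadline(9.8, datetime(2024, 6, 1))
print(f"{severity}: patch due by {due:%Y-%m-%d}")  # Critical: patch due by 2024-06-04
```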
12.4 Hardening Drift Detection and Remediation
Configuration drift — the gradual deviation of a device's running configuration from the approved baseline — is one of the most common causes of security control failures in production environments. Drift can occur due to emergency changes that were not properly documented, software upgrades that reset certain settings, or operator errors. The following table defines the drift detection and remediation process.
| Drift Type | Detection Method | Severity | Remediation SLA | Root Cause Analysis |
|---|---|---|---|---|
| Critical hardening control removed or disabled | Automated config diff; SIEM alert | Critical | Restore within 4 hours | Mandatory RCA within 24 hours; process improvement required |
| ACL rule added or modified outside change process | Config diff; change management audit | High | Review and remediate within 24 hours | RCA within 48 hours; disciplinary process if unauthorized |
| New local user account created | Automated account audit; AAA review | High | Disable and investigate within 4 hours | Mandatory RCA; potential security incident |
| Logging or monitoring configuration changed | Config diff; SIEM gap detection | High | Restore within 24 hours | RCA within 48 hours |
| Minor configuration deviation (non-security-critical) | Monthly config audit | Low | Remediate within next maintenance window | Document and update change record |
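A basic drift detector is a line-level diff of the running configuration against the stored baseline, with each delta classified by severity. The sketch below uses an order-insensitive set comparison and a few illustrative regex patterns to mark critical hardening lines; a production tool should use a context-aware diff (duplicate lines inside interface stanzas are invisible to a set comparison) and draw its classification rules from the hardening standard itself.

```python
#!/usr/bin/env python3
"""Flag configuration drift against the approved baseline (section 12.4).

A minimal sketch; the patterns marking a line as a critical hardening
control are illustrative placeholders, not a complete classification.
"""
import re

# Illustrative patterns for "critical hardening control" lines
CRITICAL_PATTERNS = [
    re.compile(r"^aaa "), re.compile(r"^logging "), re.compile(r"^ntp "),
    re.compile(r"access-(list|group)"),
]

def classify(line: str) -> str:
    return "Critical" if any(p.search(line) for p in CRITICAL_PATTERNS) else "Low"

def drift_report(baseline: str, running: str) -> list[tuple[str, str, str]]:
    """Return (change, severity, line) tuples for lines added or removed."""
    base_lines = set(baseline.splitlines())
    run_lines = set(running.splitlines())
    findings = [("removed", classify(l), l) for l in sorted(base_lines - run_lines)]
    findings += [("added", classify(l), l) for l in sorted(run_lines - base_lines)]
    return findings

# Inline sample configs keep the sketch self-contained; read real files in practice
BASELINE = "aaa new-model\nlogging host 10.0.0.5\nhostname core-sw-01"
RUNNING = "hostname core-sw-01\nip route 0.0.0.0 0.0.0.0 10.0.0.1"
for change, severity, line in drift_report(BASELINE, RUNNING):
    print(f"[{severity}] {change}: {line}")
```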
12.5 End-of-Life and Refresh Planning
Core switches that have reached end-of-life (EoL) status from their vendor no longer receive security patches, making them increasingly vulnerable over time. The EoL planning process must begin at least 18 months before the vendor's announced EoL date to allow sufficient time for procurement, testing, and migration. The following table defines the EoL planning milestones and the security risk escalation associated with each phase.
| Phase | Timeline | Activity | Security Risk | Action Required |
|---|---|---|---|---|
| EoL Announced | 18+ months before EoL | Begin replacement planning; evaluate successor platforms | Low — patches still available | Initiate procurement process; plan migration design |
| End of Software Maintenance | 12 months before EoL | Final software version selected; no new patches expected | Medium — no new patches for new CVEs | Accelerate replacement; implement compensating controls |
| End of Support | 6 months before EoL | No vendor support; no security patches | High — unpatched vulnerabilities accumulate | Replacement must be in progress; risk acceptance required |
| Post-EoL Operation | After EoL date | Operating beyond supported lifecycle | Critical — no security patches; increasing exposure | Immediate replacement required; CISO risk acceptance mandatory |
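The phase and risk columns above can be derived automatically from each platform's announced EoL date, which makes it straightforward to sweep the inventory and escalate aging hardware. A minimal sketch, approximating a month as 30 days:

```python
#!/usr/bin/env python3
"""Derive the 12.5 lifecycle phase and risk level from a vendor EoL date.

A minimal sketch; thresholds mirror the milestone table (18/12/6 months
before EoL, then post-EoL operation).
"""
from datetime import date, timedelta

MONTH = timedelta(days=30)  # approximation for threshold arithmetic

def lifecycle_phase(eol: date, today: date | None = None) -> tuple[str, str]:
    """Return (phase, security risk) for a switch with the given EoL date."""
    today = today or date.today()
    remaining = eol - today
    if remaining < timedelta(0):
        return "Post-EoL Operation", "Critical"
    if remaining <= 6 * MONTH:
        return "End of Support window", "High"
    if remaining <= 12 * MONTH:
        return "End of Software Maintenance window", "Medium"
    return "EoL Announced / replacement planning", "Low"

phase, risk = lifecycle_phase(date(2026, 12, 31), today=date(2026, 9, 1))
print(f"{phase} (risk: {risk})")  # End of Support window (risk: High)
```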