Incident And Problem Manager

GP, ZA, South Africa

Job Description

PURPOSE OF THE ROLE



This role sits within the Infrastructure Management Department but is responsible for incidents across the entire business. It covers incident and problem management for Teraco's operational sites.



OBJECTIVES



MAIN FUNCTIONS OF THE JOB



Problem Management:

  • Analyse incidents to identify recurring patterns.
  • Conduct root cause analysis to understand the underlying causes of problems.
  • Develop and implement corrective actions to address root causes and eliminate future incidents.
  • Work with relevant teams to implement solutions and updates to prevent similar problems.
  • Ensure response teams are coordinated and effective in investigating and resolving major complex problems. (The responsible team will assume incident management responsibility for a given event.)
  • Collaborate with subject matter experts to resolve complex problems and track the problem lifecycle from identification to resolution.
  • Track tickets for all corrective actions and validate that the corrective actions are implemented as required.
  • Maintain a problem knowledge base and documentation to share learnings across the organisation and facilitate quicker resolution of similar incidents in the future.
  • Manage problem resolution bridges, provide timely and clear updates to stakeholders, and document critical action items to drive resolutions.
  • Own and lead a structured Root Cause Analysis (RCA) process to resolve major incidents and problems.
  • Facilitate root cause and corrective action plan meetings after the correction has been implemented.
  • Ensure the responsible managers document incident details and post-incident analysis to learn from events, and that incident reports reflect all root causes, corrections and corrective actions.
  • Drive teams to document and submit incident reports within OLA and SLA.
  • Act as signatory on all incident reports across the business.
  • In collaboration with the Client Experience Manager, identify improved reporting formats and templates.
  • Drive consistency across Teraco's operational organisation.
  • Review incident response plans and procedures and identify improvement opportunities using data and metrics.

Incident and Problem Management Framework:

  • Implement a clear and concise Incident and Problem Management framework to ensure incidents are handled in line with established policies and procedures, and to increase the efficiency of incident response.
  • Establish root cause analysis techniques to identify root causes, and coach leadership where required to drive a culture of effective root cause analysis.
  • Ensure communication plans are in place and ready for activation during major incidents.
  • Create a communication and escalation framework to ensure stakeholders are kept up to date about incident status and impact. (DCO staff will assume incident management responsibility for a given incident.)
  • Facilitate communication during incidents to ensure a coordinated response.
  • Collaborate with the Client Experience Manager on client-impacting incidents to ensure clients' interests are central to Teraco's response to incidents, and that there is effective communication with clients.


SKILLS REQUIREMENT

  • Strong root cause analysis (RCA) methodology (e.g., 5 Whys, Fishbone diagram, Fault Tree Analysis)
  • Data analysis and pattern recognition for incident trend identification
  • Excellent written and verbal communication, especially in high-pressure situations
  • Experience drafting, reviewing, and communicating incident reports
  • Ability to facilitate and document cross-functional meetings and corrective action plans
  • Ability to design and implement incident/problem management frameworks
  • Continuous improvement mindset to enhance processes and reporting
  • Leading RCA and post-incident review meetings
  • Driving accountability in corrective action implementation


QUALIFICATIONS AND EXPERIENCE



  • Bachelor's degree in a relevant field (e.g., IT, Engineering, Business Management, or similar) preferred, or equivalent experience

Certifications (highly beneficial):

  • ITIL v3/v4 Foundation or Intermediate Level
  • RCA/Problem Solving training (e.g., Kepner-Tregoe, Six Sigma Yellow/Green Belt)
  • ISO standards familiarity (especially ISO 27001, ISO 50001 or ISO 9001)

Experience Requirements:

  • 5+ years in incident and/or problem management roles, ideally within data centre, IT infrastructure, telecoms, or similar high-availability environments
  • Experience in managing major incidents and leading post-mortems
  • Proven track record of implementing effective corrective and preventive action plans
  • Familiarity with operational workflows in critical facilities (e.g., infrastructure systems, networks)
  • Experience collaborating with client-facing and technical teams
  • Background in managing communication during major service disruptions
  • Experience working within ITIL or other service management frameworks



Job Detail

  • Job Id
    JD1417422
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Contract
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    GP, ZA, South Africa
  • Education
    Not mentioned