We are seeking a detail-oriented and analytical Reliability Engineer to join our team in Johannesburg, South Africa. In this role, you will bridge development and operations by applying a software engineering mindset to system administration. You will focus on operations and on-call duties, and on developing systems and software that increase site reliability and performance. You will also build self-service tools for users who rely on these services, and collaborate with product developers to ensure that designed solutions meet non-functional requirements such as availability, performance, security, and maintainability.
Automate CI/CD pipelines for both legacy architecture and containerised platforms, using infrastructure as code and software development skills, to increase the speed and quality of software delivery.
Define and implement mechanisms to monitor service-level indicators for the underlying service: set units of measurement that define the service level customers can expect of the system, define the desired outputs of the system in terms of availability, and communicate the expected reliability of the service to customers, in order to increase the speed at which the business can release new features and services.
Design and implement monitoring solutions in order to identify performance errors and maintain service availability.
Develop software to automate manual processes to expedite problem detection and mitigation.
Drive the improvement of service performance metrics such as latency, page load speed, and ETL by proactively identifying performance issues across the system, so that customers can make full use of the system.
Provide insights into the design and implementation of services with a focus on security, resiliency, scale, and performance, drawing on a rich understanding of the end-to-end configuration, technical dependencies, and overall behavioural characteristics of the production services.
Collaborate with cross-functional teams to implement reliability improvements and best practices.
Qualifications
Bachelor's or Master's degree in Computer Engineering or Software Engineering
Site reliability engineer certification
Experience:
5-7 years of experience in reliability engineering or a related field
Ability to use structured and object-oriented programming in at least one high-level language such as JavaScript, Ruby, Python, Java, or C++
Coding experience exceeding simple scripts
A proactive approach to troubleshooting bottlenecks, problems, and areas of improvement
Knowledge of Cloud Computing
Data analytics skills
Computer science skills
Linux and Unix, Docker and Kubernetes
Incident Management
DevOps
Strong knowledge of reliability-centered maintenance principles and practices
Excellent analytical and problem-solving skills
Strong attention to detail and ability to manage multiple projects efficiently
Additional Information
Behavioural Competencies:
Adopting Practical Approaches
Articulating Information
Checking Things
Developing Expertise
Documenting Facts
Embracing Change
Examining Information
Interpreting Data
Managing Tasks
Producing Output
Taking Action
Team Working
Technical Competencies:
Application Knowledge for Support
Business Continuity and Disaster Recovery Planning
Information Technology Architecture
Infrastructure and Platforms Support
IT Design Driven Development
Service Management Processes
Use of Build and Test Automation
Use of Version Control
Please note:
All our recruitment processes comply with the applicable local laws and regulations. We will never ask for money or any form of payment as part of our recruitment process. If you experience this, please contact our Fraud line on +27 800222050 or TransactionFraudOpsSA@standardbank.co.za