Reach Digital Health is transforming how public healthcare is delivered. Using innovative digital tools, we connect people, especially those who cannot easily access traditional care, to the information and support they need to live healthier lives. From maternal and child health to HIV/AIDS support and immunisation, our work helps close critical gaps in healthcare and ensures that underserved communities are not left behind.
With more than 16 years of experience, we know that technology alone is not enough. Real impact comes from combining our scalable, multi-channel technology with the partnerships, systems and expertise needed to drive meaningful change. By joining Reach, you will be part of a mission-driven team tackling some of the world's toughest health challenges, making healthcare more inclusive and helping save lives every day.
Why Work With Us
Our team is guided by our values: grit, empathy, collaboration, simplicity, and curiosity.
At Reach, you will do work that matters while enjoying the balance you deserve. We are proud to be one of the first South African companies to embrace a four-day work week, giving our team more time for life outside of work. Alongside competitive salaries, we invest in your growth through ongoing training and career development, creating opportunities to thrive in a supportive and innovative environment.
We put people at the centre of everything we do - both internally and in our work. We are creating an inclusive, diverse environment where everyone feels welcome, accepted, and supported. We are a progressive and equal-opportunity employer.
About the role
Join Reach as our Site Reliability Engineering Lead and play a central role in designing and maintaining the secure infrastructure that powers vital health services. You'll lead the SRE team, automate processes, and improve system reliability, ensuring adherence to data privacy regulations and security best practices on projects that directly impact communities in need. Your ideas and innovations will have real-world effects on healthcare access and outcomes.
The role requires advanced infrastructure engineering and security expertise, along with a passion for healthcare technology and data compliance.
Key Focus Areas
You will primarily be responsible for:
Team Management and Growth:
+ Foster the professional development of the SRE team through mentorship, one-on-one sessions, and skill-building opportunities.
Collaboration:
+ Work closely with cross-functional teams, including development and operations, to implement best practices and foster a culture of collaboration and innovation.
Infrastructure Reliability and Performance:
+ Monitor, measure, and improve the reliability and performance of our systems.
+ Identify and address bottlenecks, optimise system performance, and implement strategies for scaling infrastructure to meet growing demands.
+ Carry out maintenance, upgrades, and security updates.
Automation and Tooling:
+ Design and develop software and scripts that automate and streamline various aspects of infrastructure and operations.
+ Assist other teams with the deployment and updates of their applications and services.
Administration:
+ Administer our infrastructure accounts and critical services, providing strategic oversight for our hosting infrastructure and vendor relationships.
+ Own the hosting and billing lifecycle, from monitoring and analysis to implementing cost-optimisation strategies, ensuring financial efficiency and predictability across our platforms.
Data Management and Security:
+ Lead Information Security Management System (ISMS) compliance initiatives, including policy development, risk assessment processes, and security framework implementation.
+ Manage security tools (antivirus, password management, security awareness training).
+ Ensure that data, security, and infrastructure policies and best practices are adhered to.
+ Work with the Legal and Projects teams to develop and enforce policies and procedures for data collection, storage, and access to ensure compliance with data privacy regulations.
+ Implement and monitor security measures to protect sensitive health information.
+ Manage data backups and disaster recovery.
Innovation:
+ Research and evaluate new technologies and methodologies that can enhance our systems and processes, and implement proof-of-concepts and prototypes to demonstrate their feasibility and value.
Responsibilities and Duties
Lead a team of Site Reliability Engineers, providing mentorship, guidance, and technical expertise.
Establish and enforce SRE best practices to improve system reliability and operational efficiency.
Collaborate with development teams to design, implement, and maintain scalable and reliable infrastructure.
Develop and implement incident response plans, ensuring timely resolution of system outages and performance issues.
Conduct performance reviews, set goals, and facilitate professional development for team members.
Drive the implementation of automation tools, software and processes to improve infrastructure and operational efficiency of our systems and ensure they follow best practices.
Monitor system health, analyse trends, and implement proactive measures to prevent incidents.
Advise on and/or contribute to new or emerging technologies that might be relevant to Reach.
Work closely with the Head of Engineering and other Engineering Leads to ensure alignment within the engineering department.
Design and develop tools and software that automate and improve the infrastructure and operation of our systems and ensure they follow best practices.
Perform code reviews, testing, debugging, and troubleshooting of the software and tools developed by the SRE team, and assist other engineering teams with the same.
Design and implement security features, conduct security audits and risk assessments, manage enterprise security tools, and coordinate penetration testing exercises while serving as technical point of contact for external security audits.
Develop and enforce Information Security Management System (ISMS) compliance policies aligned with POPIA and ISO 27001, including risk treatment processes, data protection policies, and business continuity frameworks. Lead security awareness programs, manage security training and phishing campaigns, and collaborate with teams to ensure alignment between technical and regulatory requirements across organisational systems.
Suggest and implement improvements to current ways of working and processes (including gaps in those processes) that are relevant to the current and future success of the SRE team and Reach as a whole.
Qualifications
An honours degree in Computer Science or Engineering or equivalent experience.
8+ years of experience as a senior site reliability engineer, senior software engineer, or system administrator, working with large-scale, distributed, and cloud-based systems.
4+ years of experience as a team lead, manager, or mentor, leading and developing site reliability engineers or software engineers.
Skills and Experience Required
Proficient in one or more programming languages, such as Python, Go, Java, or C++.
Proficient in one or more scripting languages, such as Bash, Perl, or Ruby.
Proficient in one or more cloud platforms, such as AWS, Azure, or GCP.
Proficient in one or more UNIX-like operating systems.
Proficient in one or more configuration management and deployment tools, such as Ansible, Chef, Puppet, or Terraform.
Proficient in one or more monitoring and alerting tools, such as Prometheus, Grafana, Datadog, or Splunk.
Proficient in one or more container and orchestration tools, such as Docker or Kubernetes.
Proficient in one or more web servers and proxies, such as Apache, Nginx, or Envoy.
Proficient in one or more databases and data stores, such as MySQL, PostgreSQL, MongoDB, or Redis.
Proficient in one or more version control and collaboration tools, such as Git.
Knowledgeable in the concepts and principles of site reliability engineering, such as SLIs, SLOs, error budgets, incident management, postmortems, and blameless culture.
Knowledgeable in the concepts and principles of software engineering, such as design patterns, code quality, testing, debugging, and documentation.
Knowledgeable in the concepts and principles of performance engineering, such as profiling, benchmarking, load testing, and capacity planning.
Knowledgeable in the concepts and principles of distributed computing, such as concurrency, parallelism, synchronisation, and consensus.
Excellent communication and collaboration skills, and ability to work effectively in a cross-functional and remote team environment.
Excellent problem-solving and analytical skills, and ability to troubleshoot and resolve complex issues in a timely and efficient manner.
Excellent learning and innovation skills, and ability to research and evaluate new technologies and methodologies.
Experience implementing ISO 27001 or POPIA standards with expertise in security audits, policy development, and regulatory compliance.
Proficiency managing enterprise security tools (antivirus, password management, SIEM), penetration testing oversight, and incident response procedures.
Experience leading security awareness programs, developing security frameworks, and implementing organisation-wide security policies and training initiatives.
How to Apply
Ready to make a difference in public health? We welcome applicants from all backgrounds and encourage candidates of all genders, races, ages, religions, sexual orientations, and abilities to apply. Reach Digital Health is an equal opportunity and affirmative action employer, committed to creating a diverse and inclusive workplace.
Submit your application today and join our mission-driven team to make a real impact in public health.