Job Summary
Key Responsibilities:
- Contribute to the global design and implementation of scalable and fault-tolerant infrastructure systems that support engineering and operational needs.
- Contribute to the deployment, configuration, and maintenance of distributed storage and database systems.
- Analyse system failures, performance issues, and misconfigurations across hardware, software, and network layers.
- Lead and mentor the computer systems engineers and contribute to strategic technical planning.
Qualification:
- BTech in Computer Science, Software Engineering, Information Systems, Electronic Engineering or equivalent qualifications coupled with 13 years of experience.
- BEng/MTech in Computer Science, Software Engineering, Information Systems, Electronic Engineering or equivalent qualifications coupled with 9 years of experience.
- MEng in Computer Science, Software Engineering, Information Systems, Electronic Engineering or equivalent qualifications coupled with 7 years of experience.
- PhD in Computer Science, Software Engineering, Information Systems, Electronic Engineering or equivalent qualifications coupled with 5 years of experience.
Experience:
- 3+ years in a technical leadership or software/system architecture role with direct responsibility for large-scale or platform-scale distributed systems.
- Demonstrated hands-on experience in infrastructure design and automation, distributed systems, observability, CI/CD, container orchestration (e.g. Kubernetes), DevOps/SRE practices and cloud-native technologies.
- Experience leading teams or initiatives that intersect with data platforms, storage, networking, and systems engineering domains.
Knowledge:
- In-depth understanding of systems engineering principles, including performance optimisation, fault tolerance, and resource scheduling in Linux-based environments.
- Strong knowledge of containerised environments (Docker, Podman), orchestration platforms (Kubernetes, Helm), and runtime architectures (containerd, CRI).
- Expertise in infrastructure-as-code, continuous integration/deployment (CI/CD), and configuration management tools (e.g., GitLab CI, Ansible, Terraform, ArgoCD).
- Advanced understanding of distributed computing and storage architectures, including Ceph, S3, NFS, and local/clustered file systems.
- Operational and architectural fluency in relational and NoSQL database systems (e.g., PostgreSQL, MySQL, MongoDB), including replication, backups, and performance tuning.
- Working knowledge of networking fundamentals, security protocols, and systems-level observability (e.g., Prometheus, Grafana, ELK/EFK stack).
- Familiarity with the HPC ecosystem (e.g., SLURM, job schedulers) is beneficial for environments supporting scientific or research computing.
Please call us on 0100300127
NB: Should you not hear from us within 6 weeks, please consider your application unsuccessful.
The Hiring House