Senior Compute Systems Engineer

Cape Town, Western Cape, South Africa

Job Description

  • Contribute to the global design and implementation of scalable, fault-tolerant infrastructure systems that support engineering and operational needs.
  • Contribute to the deployment, configuration, and maintenance of distributed storage and database systems.
  • Analyse system failures, performance issues, and misconfigurations across hardware, software, and network layers.
  • Lead and mentor the computer systems engineers and contribute to strategic technical planning.
JOB REQUIREMENTS
Qualification:
  • BTech in Computer Science, Software Engineering, Information Systems, Electronic Engineering, or an equivalent qualification, coupled with 13 years' experience.
  • BEng/MTech in Computer Science, Software Engineering, Information Systems, Electronic Engineering, or an equivalent qualification, coupled with 9 years' experience.
  • MEng in Computer Science, Software Engineering, Information Systems, Electronic Engineering, or an equivalent qualification, coupled with 7 years' experience.
  • PhD in Computer Science, Software Engineering, Information Systems, Electronic Engineering, or an equivalent qualification, coupled with 5 years' experience.
Experience:
  • 3+ years in a technical leadership or software/system architectural role with direct responsibility for large-/platform-scale distributed systems.
  • Demonstrated hands-on experience in infrastructure design and automation, distributed systems, observability, CI/CD, container orchestration (e.g. Kubernetes), DevOps/SRE practices and cloud-native technologies.
  • Experience leading teams or initiatives that intersect with data platforms, storage, networking, and systems engineering domains.
Knowledge:
  • In-depth understanding of systems engineering principles, including performance optimisation, fault tolerance, and resource scheduling in Linux-based environments.
  • Strong knowledge of containerised environments (Docker, Podman), orchestration platforms (Kubernetes, Helm), and runtime architectures (containerd, CRI).
  • Expertise in infrastructure-as-code, continuous integration/deployment (CI/CD), and configuration management tools (e.g., GitLab CI, Ansible, Terraform, ArgoCD).
  • Advanced understanding of distributed computing and storage architectures, including Ceph, S3, NFS, and local/clustered file systems.
  • Operational and architectural fluency in relational and NoSQL database systems (e.g., PostgreSQL, MySQL, MongoDB), including replication, backups, and performance tuning.
  • Working knowledge of networking fundamentals, security protocols, and systems-level observability (e.g., Prometheus, Grafana, ELK/EFK stack).
  • Familiarity with the HPC ecosystem (e.g., SLURM, job schedulers) is beneficial for environments supporting scientific or research computing.
Essential Competencies:
  • Demonstrated technical leadership (3+ years), leading cross-functional efforts across systems, storage, and database infrastructure, driving technical decisions from architecture through implementation.
  • Systems engineering expertise, with a focus on Linux administration, infrastructure automation, service orchestration, and performance optimisation across diverse environments.
  • Expertise in distributed systems architecture, including the design and deployment of scalable, resilient services using microservices, event-driven, and cloud-native design patterns.
  • Containerisation and orchestration fluency, including production-grade usage of Kubernetes, Docker, and Helm for system and application-level deployments.
  • Infrastructure automation and CI/CD, using tools such as GitLab CI, ArgoCD, FluxCD, Jenkins, or GitHub Actions to streamline and secure platform operations.
  • Complementary DevOps and SRE practices, blending infrastructure-as-code, configuration management, and release automation (DevOps) with incident response, monitoring, SLIs/SLOs, and system reliability engineering (SRE).
  • Linux expertise, including advanced troubleshooting, kernel tuning, system orchestration, and optimisation at scale.
  • Technical delivery and planning capabilities, including backlog scoping, cross-team collaboration, and Agile sprint execution.
  • Database administration skills, with operational experience in administering relational and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB), including high availability, backups, replication, and performance tuning.
  • Diagnostic skills, with a root-cause-first approach, and a strong bias for ownership, accountability, and long-term operational stability.

Job Detail

  • Job Id
    JD1542968
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Cape Town, Western Cape, South Africa
  • Education
    Not mentioned