Ensure uptime, stability, and incident resolution of our current production SIEM/SOC/IR/Observability systems
This includes sysadmin functions like resource and capacity management
Our customer SIEM tenants are backed by an SLA with punitive measures, so meeting the uptime SLA is critical to our business
Use your AWS & Linux experience to improve architecture
Manage the lifecycle of SIEM infrastructure and applications, including provisioning, scaling, patching, and upgrading
Provide L1 and L2 support for SIEM infrastructure and applications
Perform after-hours break-fix support for the platform when a production instance is affected
Ensure that such occurrences remain rare exceptions
Maintain our current IaC and deployment code
Develop and implement new CI/CD pipelines to:
Scan all containers for vulnerabilities
Securely and reliably deploy code with no service interruption
Improve automation across the board so that you spend less time fixing broken things or deploying, and more time building new features, rules, and enhancements for our customers' security posture
Document, plan and migrate our current SIEM & SOC applications to EKS or similar
Document, plan and implement new SIEM/SOC tools (or improve existing ones) and architect new automations
Own the current DR plan: review it, ensure we can meet the uptime SLA we have promised our customers, identify risks, and propose new architecture where the environment requires it
Apply your industry knowledge to improve our service offering and mentor junior engineers
Strengthen the security controls of our platform
Skills & Experience
Proven experience running and supporting production SIEM/SOC/IR/observability platforms with strict uptime/SLA requirements
Strong Linux systems administration: performance/capacity management, patching/upgrades, troubleshooting, incident response
Strong AWS experience with a track record of improving architecture for reliability, scale, and cost efficiency
Infrastructure lifecycle management: provisioning, scaling, hardening, maintenance, and upgrade planning
Hands-on L1/L2 operational support capability, including after-hours break-fix for production incidents
IaC and deployment automation experience (e.g., Terraform/CloudFormation) and maintaining existing codebases
CI/CD engineering to enable secure, reliable, low/no-downtime deployments
Container and platform security practices: vulnerability scanning, remediation workflows, and security control improvements
Kubernetes/EKS experience (or proven migration planning) including documentation, planning, and execution
DR ownership: DR planning/testing, risk identification, and ensuring SLA commitments are achievable
Ability to mentor junior engineers and drive continuous improvement through automation and best practice
Qualifications
Degree or Diploma in Computer Science, Information Systems, Engineering, or a related technical field (or equivalent practical experience)