Key Responsibilities:
o Design & Develop Data Pipelines: Build and optimize scalable, reliable, and automated ETL/ELT pipelines using AWS services (e.g., AWS Glue, AWS Lambda, Redshift, S3) and Databricks.
o Cloud Data Architecture: Design, implement, and help maintain data infrastructure in AWS, ensuring high availability, security, and scalability. Work with lakehouses, data lakes, data warehouses, and distributed computing.
o DBT Core Implementation: Lead the implementation of DBT Core to automate data transformations, develop reusable models, and maintain efficient ELT processes.
o Data Modelling: Build efficient data models to support the required analytics and reporting.
o Optimize Data Workflows: Monitor, troubleshoot, and optimize data pipelines for performance and cost-efficiency in cloud environments. Use Databricks to process large-scale data sets and streamline data workflows.
o Data Quality & Monitoring: Ensure high-quality data by implementing data validation and monitoring systems. Troubleshoot data issues and create solutions to ensure data reliability.
o Automation & CI/CD: Implement CI/CD practices for data pipeline deployment and maintain automation for monitoring and scaling data infrastructure in AWS and Databricks.
o Documentation & Best Practices: Maintain comprehensive documentation for data pipelines, architectures, and best practices in AWS, Databricks, and DBT Core. Ensure knowledge sharing across teams.
Skills & Qualifications:
Required:
o Bachelor's/Master's degree in Computer Science, Engineering, or a related field.
o 8+ years of experience as a Data Engineer or in a similar role.
o Extensive hands-on experience with AWS services (S3, Redshift, Glue, Lambda, Kinesis, etc.) for building scalable and reliable data solutions.
o Advanced expertise in Databricks, including the creation and optimization of data pipelines, notebooks, and integration with other AWS services.
o Strong experience with DBT Core for data transformation and modelling, including writing, testing, and maintaining DBT models.
o Proficiency in SQL and experience with designing and optimizing complex queries for large datasets.
o Strong programming skills in Python/PySpark, with the ability to develop custom data processing logic and automate tasks.
o Experience with data warehousing and knowledge of concepts related to OLAP and OLTP systems.
o Expertise in building and managing ETL/ELT pipelines, automating data workflows, and performing data validation.
o Familiarity with CI/CD concepts, version control (e.g., Git), and deployment automation.
o Experience working in an Agile project environment.
Preferred:
o Experience with Apache Spark and distributed data processing in Databricks.
o Familiarity with streaming data solutions (e.g., AWS Kinesis, Apache Kafka).
o Knowledge of data governance, data security, and privacy best practices.
Soft Skills:
o Excellent communication skills, with the ability to explain complex technical concepts to non-technical stakeholders.
o Strong analytical and problem-solving skills, capable of troubleshooting complex data pipeline issues.
o Ability to work independently and manage multiple projects and priorities in a fast-paced environment.
Job Type: Full-time
Pay: R250,00 - R550,00 per hour
Expected hours: 8 per week
Work Location: In person