Posted on 18 June 2025 by Reginald Dykes
Job Title: Big Data ETL Engineer (Apache Spark / PySpark / Python)
Location: Westlake, TX
Job Type: Contract
We are seeking a highly skilled Big Data ETL Engineer with a strong background in building high-performance ETL/data pipelines using Apache Spark, PySpark, and Python. The ideal candidate will bring hands-on experience with big data technologies such as Hadoop, HDFS, Hive, Iceberg, or Kafka, as well as a solid understanding of Unix/Linux scripting and modern CI/CD tools.
Responsibilities:
Design, build, and optimize scalable ETL pipelines for processing large datasets using Apache Spark and PySpark (a brief illustrative sketch follows this list).
Collaborate with data engineers, architects, and business stakeholders to define data requirements and ensure data quality.
Develop scripts and automation solutions using Python and Unix/Shell scripting.
Leverage big data frameworks such as Hadoop, HDFS, Hive, Iceberg, or Kafka to support data processing needs.
Integrate data pipelines with CI/CD frameworks using tools such as GitHub, Jenkins, SonarQube, Liquibase, UDeploy, Artifactory, Harness, Maven, or Gradle.
Ensure performance, reliability, and scalability of ETL jobs in production environments.
Maintain robust documentation and adhere to secure coding practices and data governance standards.
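To give candidates a concrete sense of the day-to-day work, here is a minimal PySpark ETL sketch: extract raw data from a landing zone, apply basic cleansing and aggregation, and load partitioned output. All paths, column names, and the job name below are hypothetical placeholders, not details of this role.

    # Minimal PySpark ETL sketch; every path, schema, and name here is a
    # hypothetical placeholder, not a detail of this posting.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("orders-daily-etl")  # hypothetical job name
        .getOrCreate()
    )

    # Extract: read raw events from a (hypothetical) HDFS landing zone.
    raw = spark.read.parquet("hdfs:///data/landing/orders/")

    # Transform: deduplicate, filter, and build a daily aggregate.
    daily = (
        raw.dropDuplicates(["order_id"])
           .filter(F.col("status") == "COMPLETED")
           .withColumn("order_date", F.to_date("created_at"))
           .groupBy("order_date", "region")
           .agg(F.sum("amount").alias("total_amount"),
                F.count("order_id").alias("order_count"))
    )

    # Load: write partitioned output for downstream consumers.
    (daily.write
          .mode("overwrite")
          .partitionBy("order_date")
          .parquet("hdfs:///data/curated/orders_daily/"))

    spark.stop()

In production, the same extract/transform/load shape is typically parameterized by run date and wired into the scheduling and CI/CD tooling described above.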
Required Skills & Qualifications:
4–6 years of experience with Apache Spark
6+ years of experience with PySpark
4–6 years of hands-on development in Python
Experience with at least one Big Data technology: Hadoop, HDFS, Hive, Iceberg, or Kafka
Strong scripting experience using Unix/Linux Shell
Experience working with CI/CD tools such as GitHub, Jenkins, Artifactory, SonarQube, UDeploy, Liquibase, Harness, and Maven/Gradle
Proven ability to write optimized, efficient, and maintainable ETL code (see the short example after this list)
Strong troubleshooting, debugging, and problem-solving skills
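As one illustration of what optimized, efficient ETL code can mean in a Spark setting, the sketch below replaces a shuffle join with a broadcast join: the small reference table is shipped to every executor so the large fact table is never shuffled. Table paths and column names are hypothetical.

    # Hypothetical example of a common Spark optimization: broadcasting a
    # small dimension table to avoid shuffling the large fact table.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("broadcast-join-example").getOrCreate()

    fact = spark.read.parquet("hdfs:///data/curated/orders_daily/")  # large table
    dims = spark.read.parquet("hdfs:///data/reference/regions/")     # small table

    # F.broadcast() hints Spark to ship the small table to each executor,
    # turning a shuffle join into a map-side join.
    enriched = fact.join(F.broadcast(dims), on="region", how="left")

    enriched.write.mode("overwrite").parquet("hdfs:///data/curated/orders_enriched/")

    spark.stop()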