Job Description
Data Engineer
Location: Irving, TX
Duration: 24 Month Contract
Pay: $55/hr W2 ONLY, NO C2C
Responsibilities:
- Design and develop ETL/ELT workflows and data pipelines for batch and real-time processing.
- Build and maintain data pipelines for reporting and downstream applications using open source frameworks and cloud technologies.
- Implement operational and analytical data stores leveraging Delta Lake and modern database concepts.
- Optimize data structures for performance and scalability across large datasets.
- Collaborate with architects and engineering teams to ensure alignment with target state architecture.
- Apply best practices for data governance, lineage tracking, and metadata management, including integration with Google Dataplex for centralized governance and data quality enforcement.
- Develop, schedule, and orchestrate complex workflows using Apache Airflow, including designing and managing DAGs (a brief sketch follows this list).
- Troubleshoot and resolve issues in data pipelines and ensure high availability and reliability.
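As a rough illustration of the Airflow work described above, here is a minimal DAG sketch. The DAG id, schedule, and task callables are hypothetical placeholders rather than part of this role's actual pipelines; it assumes Airflow 2.x, where the `schedule` argument replaces the older `schedule_interval`.

```python
# Minimal Airflow 2.x DAG sketch: a daily extract -> load pipeline.
# All names (dag_id, task ids, callables) are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder extract step: pull source data for the run's logical date.
    print(f"extracting for {context['ds']}")


def load(**context):
    # Placeholder load step: write transformed data to the target store.
    print(f"loading for {context['ds']}")


with DAG(
    dag_id="example_batch_pipeline",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task  # extract must finish before load runs
```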
Skills:
- Data Fundamentals: Strong understanding of data structures, modeling, and lifecycle management.
- ETL/ELT Expertise: Hands-on experience designing and managing data pipelines.
- PySpark: Advanced skills in distributed data processing and transformation (see the sketch after this list).
- Apache Iceberg: Experience implementing open table formats for analytics.
- Hadoop Ecosystem: Knowledge of HDFS, Hive, and related components.
- Cloud Platforms: GCP (BigQuery, Dataflow), Delta Lake, and Dataplex for governance and metadata management.
- Programming: Python, Spark, and SQL.
- Workflow Orchestration: Strong experience with Apache Airflow, including authoring and maintaining DAGs for complex workflows.
- Database & Reporting Concepts: Strong understanding of relational and distributed systems.
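To make the PySpark and Delta Lake expectations concrete, here is a minimal batch-transformation sketch. The bucket paths, column names, and app name are hypothetical, and it assumes the delta-spark package is available on the Spark classpath.

```python
# PySpark sketch: aggregate raw events by day and write a partitioned
# Delta Lake table. Paths and column names are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("example-etl")  # hypothetical app name
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Read raw events and compute daily counts per event type.
events = spark.read.parquet("gs://example-bucket/raw/events/")  # hypothetical path
daily = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date", "event_type")
    .agg(F.count("*").alias("event_count"))
)

# Write the result as a Delta table partitioned by date.
(daily.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .save("gs://example-bucket/curated/daily_events/"))  # hypothetical path
```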