Contract
Posted on 04 June 26 by Jacobi Smith
Location: Irving, TX (Preferred) or Ohio (Hybrid)
Duration: 12 Month Contract
W2 ONLY, NO C2C
We are seeking a skilled Data Engineer to support the design, development, and enhancement of an enterprise IAM Data Lake platform within Google Cloud Platform (GCP).
This role will focus on building scalable data lake solutions, developing data ingestion pipelines, and supporting large-scale data processing initiatives using modern cloud and big data technologies. The ideal candidate will have hands-on experience with Google Cloud Platform, data lake architectures, big data processing frameworks, and Hadoop-based environments.
Experience with Hadoop/HDFS and cloud-native data engineering solutions is highly desirable.
Design, build, and maintain scalable Data Lake solutions within Google Cloud Platform (GCP).
Develop and support batch and streaming data ingestion pipelines using GCP-native services and big data technologies.
Build and optimize data processing workflows to support enterprise-scale analytics and reporting requirements.
Design and implement data models, ingestion frameworks, and data transformation processes.
Develop and maintain PySpark-based data processing applications.
Utilize Apache Airflow to orchestrate and manage complex data workflows.
Implement and maintain CI/CD pipelines to support automated deployment and delivery of data engineering solutions.
Design and manage Pub/Sub-based streaming architectures and event-driven data processing workflows.
Support event schema design, schema evolution, and versioning best practices.
Implement incremental data ingestion strategies and Change Data Capture (CDC) patterns.
Develop APIs and integration solutions to support data consumption and data-sharing requirements.
Create and maintain curated datasets, analytical views, and data exposure layers for downstream consumers.
Collaborate with architecture, engineering, security, and business teams to ensure data solutions align with enterprise standards.
4+ years of experience working with Google Cloud Platform (GCP).
4+ years of experience building and supporting large-scale data processing solutions.
4+ years of experience with PySpark and distributed data processing.
4+ years of experience implementing CI/CD practices and deployment automation.
2+ years of experience building and maintaining data pipelines.
2+ years of experience with Apache Airflow.
Experience developing and integrating APIs.
Experience working with data lake architectures and cloud-based storage solutions.
Understanding of data modeling concepts and best practices.
Strong understanding of data processing frameworks and big data technologies.
Experience with Hadoop Ecosystem technologies and HDFS.
Experience designing and implementing streaming architectures using Google Pub/Sub.
Familiarity with Change Data Capture (CDC) methodologies and incremental ingestion frameworks.
Experience building enterprise-scale IAM or security-related data platforms.
Knowledge of data governance, lifecycle management, and access control best practices within GCP.
Experience supporting analytical data platforms and data consumption frameworks.
Key skills and technologies for this role include the following:
Databases & Data Engineering
Programming & Development
Google Cloud Platform (GCP)
Google Cloud Storage (GCS)
Pub/Sub
Data Lake Architecture
PySpark
Apache Airflow
Data Pipelines
Data Processing
Data Modeling
Change Data Capture (CDC)
Hadoop Ecosystem
HDFS
Parquet
Avro
ORC
APIs
CI/CD Pipelines
Version Control
Automation Frameworks