Job Description
STRATEGIC STAFFING SOLUTIONS HAS AN OPENING!
Job Title: Data Engineer – IAM Data Lake (Google Cloud Platform)
Location: Irving, TX (Preferred) | Ohio (Alternate)
Work Type: Onsite/Hybrid (as required)
Contract Term: 12+ Months
Position Overview
We are seeking an experienced Data Engineer to support IAM Data Lake and Data Engineering initiatives focused on building and enhancing a modern enterprise data lake on Google Cloud Platform (GCP). This role requires hands-on expertise in designing scalable ingestion pipelines, managing big data architectures, and working with both batch and streaming processing frameworks.
The ideal candidate will have strong experience with GCP-native services, big data tooling, and modern data formats; preference will be given to candidates with exposure to the Hadoop/HDFS ecosystem.
Key Responsibilities
- Design, build, and maintain a scalable IAM-focused Data Lake on Google Cloud Platform.
- Develop and optimize data pipelines for both batch and real-time ingestion.
- Implement ingestion frameworks leveraging Airflow, PySpark, and GCP-native tools.
- Build streaming architectures using Pub/Sub, supporting event-driven ingestion patterns.
- Apply best practices for schema design, schema evolution, and versioning across data domains.
- Work with big data file formats such as Parquet, ORC (columnar), and Avro (row-based), including compression and performance tuning.
- Support incremental ingestion and Change Data Capture (CDC) methodologies.
- Ensure data is properly structured and exposed through curated datasets, APIs, and analytical views.
- Collaborate with cross-functional teams to ensure data governance, security, and access controls align with IAM requirements.
- Contribute to enterprise development standards through CI/CD automation and documentation.
Required Skills & Experience
- 4–6+ years of Data Engineering experience in large-scale environments.
- Strong hands-on experience building Data Lakes on Google Cloud Platform (GCP).
- Expertise in developing pipelines using:
  - Apache Airflow
  - PySpark
  - Big data processing frameworks
- Solid knowledge of the Hadoop ecosystem, with HDFS experience highly desirable.
- Experience with:
  - APIs and data exposure patterns
  - CI/CD pipelines in enterprise environments
  - Data modeling concepts
- Strong understanding of GCP architecture, including:
  - Cloud Storage bucket structuring and naming standards
  - Lifecycle management policies
  - Access control and security mechanisms
Preferred Technical Expertise
- Streaming ingestion and event-driven design using Google Pub/Sub
- Schema registry and governance practices
- Knowledge of CDC tools/patterns
- Familiarity with curated analytical dataset development
- Experience supporting Identity & Access Management (IAM) data domains
Tools & Technologies
- Google Cloud Platform (GCP)
- Airflow
- PySpark
- Hadoop / HDFS
- Pub/Sub
- Parquet, Avro, ORC
- CI/CD Automation
- APIs and Data Services
Ideal Candidate Traits
- Strong problem-solving and analytical mindset
- Comfortable working in complex enterprise environments
- Able to collaborate effectively across engineering, security, and governance teams
- Detail-oriented with a focus on data quality and scalability