Job Description
STRATEGIC STAFFING SOLUTIONS HAS AN OPENING!
Job Title: Data Engineer – IAM Data Lake (Google Cloud Platform)
Location: Irving, TX (Preferred) | Ohio (Alternate)
Work Type: Onsite/Hybrid (as required)
Contract Term: 12+ Months
Position Overview
We are seeking an experienced Data Engineer to support IAM Data Lake and Data Engineering initiatives focused on building and enhancing a modern enterprise data lake on Google Cloud Platform (GCP). This role requires hands-on expertise in designing scalable ingestion pipelines, managing big data architectures, and working with both batch and streaming processing frameworks.
The ideal candidate will have strong experience with GCP-native services, big data tooling, and modern data formats; preference will be given to candidates with exposure to the Hadoop/HDFS ecosystem.
Key Responsibilities
- Design, build, and maintain a scalable IAM-focused Data Lake on Google Cloud Platform.
- Develop and optimize data pipelines for both batch and real-time ingestion.
- Implement ingestion frameworks leveraging Airflow, PySpark, and GCP-native tools.
- Build streaming architectures using Pub/Sub, supporting event-driven ingestion patterns.
- Apply best practices for schema design, schema evolution, and versioning across data domains.
- Work with big data file formats such as Parquet, ORC (columnar), and Avro (row-based), including compression and performance tuning.
- Support incremental ingestion and Change Data Capture (CDC) methodologies.
- Ensure data is properly structured and exposed through curated datasets, APIs, and analytical views.
- Collaborate with cross-functional teams to ensure data governance, security, and access controls align with IAM requirements.
- Contribute to enterprise development standards through CI/CD automation and documentation.
Required Skills & Experience
- 4–6+ years of Data Engineering experience in large-scale environments.
- Strong hands-on experience building Data Lakes on Google Cloud Platform (GCP).
- Expertise in developing pipelines using:
  - Apache Airflow
  - PySpark
  - Big data processing frameworks
- Solid knowledge of the Hadoop ecosystem, with HDFS experience highly desirable.
- Experience with:
  - APIs and data exposure patterns
  - CI/CD pipelines in enterprise environments
  - Data modeling concepts
- Strong understanding of GCP architecture, including:
  - Cloud Storage bucket structuring and naming standards
  - Lifecycle management policies
  - Access control and security mechanisms
Preferred Technical Expertise
- Streaming ingestion and event-driven design using Google Pub/Sub
- Schema registry and governance practices
- Knowledge of CDC tools/patterns
- Familiarity with curated analytical dataset development
- Experience supporting Identity & Access Management (IAM) data domains
Tools & Technologies
- Google Cloud Platform (GCP)
- Airflow
- PySpark
- Hadoop / HDFS
- Pub/Sub
- Parquet, Avro, ORC
- CI/CD Automation
- APIs and Data Services
Ideal Candidate Traits
- Strong problem-solving and analytical mindset
- Comfortable working in complex enterprise environments
- Able to collaborate effectively across engineering, security, and governance teams
- Detail-oriented with a focus on data quality and scalability