Job Description
AI Data Engineer
Remote - Ecuador
Full Time
About Us:
We are a global team of engineers, architects, designers, researchers, operators, and innovators who share a passion for achieving client goals. Our engineering services help businesses thrive at the intersection of technology and people. From the latest AI implementations to legacy platform migrations and everything in between, our services span the enterprise technology spectrum. Our world-class experience transformation playbook elevates digital success and increases ROI with a relentless focus on the human experience. Our customer base includes Fortune 500 companies around the globe. We’ve got the skills and insights, and we’re also fun to work with. Our global team spans a diverse cultural spectrum, with wide-ranging interests, enabling us to bring personality and depth to every engagement.
Role Summary:
The Data Engineer is responsible for designing, modernizing, and maintaining scalable, secure, and performant data pipelines to support both enterprise data warehousing and advanced AI/ML workflows. This role ensures that high-quality, well-structured, and reliable data is available throughout the lifecycle, from ingestion, transformation, and warehousing to downstream ML model consumption and analytics. It also includes modernizing legacy SQL Server and SSIS-based ETL systems into cloud-native data lakes and marts, enabling scalable reporting, self-service BI, and AI-driven decision-making.
Key Responsibilities:
• Design, build, and manage batch and real-time data pipelines that feed data warehouses, data lakes, and AI/ML systems.
• Modernize legacy data architecture, including SQL Server, SSIS, and custom C# ETLs, into scalable cloud-native solutions using tools like Apache Spark, Airflow, or Azure Data Factory.
• Migrate from SQL data marts to a multi-tenant or modernized architecture (e.g., Snowflake, BigQuery, or Synapse).
• Use MongoDB or equivalent NoSQL databases as raw data lake layers and build downstream transformations into clean, structured layers.
• Process and manage large-scale datasets, ensuring scalability and performance of transformation and storage layers.
• Design scalable data models and pipelines to support downstream web-based tools such as Adhoc Reporting and Continuum platforms.
• Implement ETL/ELT pipelines and workflow orchestration with robust data validation, schema enforcement, lineage, and audit logging.
• Enable secure client access to data marts without VPN tunneling; explore API-based or direct cloud-native access patterns.
• Partner with ML engineers and data scientists to deliver curated datasets and feature pipelines for AI/ML model training and inference.
• Optimize the storage and querying of structured and unstructured data (e.g., logs, text, images).
• Ensure compliance with data governance, quality, lineage, privacy, and security best practices.
Qualifications:
• Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
• 5+ years of experience in data engineering roles, with a focus on supporting AI/ML applications.
Required Skills:
• Proficiency in Python and SQL
• Experience with ETL/ELT tools and frameworks such as Apache Airflow, dbt, Spark, or Kafka.
• Hands-on experience with cloud data platforms (e.g., AWS Glue, GCP Dataflow, Azure Data Factory).
• Strong knowledge of data formats (e.g., Parquet, Avro, JSON) and analytical databases (e.g., Snowflake, BigQuery, Redshift).
• Familiarity with data versioning and feature stores (e.g., Feast, Tecton) is a plus.
• Excellent collaboration skills to work cross-functionally with ML engineers, data scientists, and DevOps teams.