- Proficiency in the development, maintenance, monitoring, and long-term operation of data pipelines or processing systems running in Cloudera Data Platform. Understanding of data extraction, transformation, loading, and performance tuning for solutions that draw on multiple input data streams, using code-based, Git- and DevOps-enabled technologies in Python or SQL, such as PySpark, pandas, or dbt
- 5+ years of experience in application/data development (e.g., Python)
- 5+ years of experience with data integration/ingestion tools (e.g., Apache NiFi)
- Advanced knowledge of SQL, Java, Microsoft SQL Server, and distributed data/computing platforms (e.g., Apache NiFi, Hadoop, MapReduce, Hive, HBase, Kafka, Spark)
- Experience with Scrum and Kanban methodologies
- Experience with UNIX/Linux, including basic commands and shell scripting
- Experience implementing and maintaining continuous integration/continuous delivery (CI/CD) pipelines and managing data platforms
This position is onsite in Washington, DC, and is open to US citizens only.