The Big Data Engineer must be able to design, build, and maintain enterprise-level data pipelines using public cloud (AWS or Azure) data ingestion services or ETL/ELT tools from the big data ecosystem. The engineer will work closely with data analysts, data scientists, and database and systems administrators to create data and analytics solutions.
Responsibilities include the following:
- Should have experience designing, building, and deploying ETL/ELT data pipelines within big data ecosystems (currently using StreamSets, Impala, Hue, Spark, Python, and Cloudera) to populate cloud-based infrastructure (currently AWS).
- Should have experience developing technical design and ETL specification documents.
- Perform complex parallel loads, cluster batch execution, and dependency creation using jobs/topologies/workflows.
- Convert existing SQL stored procedures for optimal execution in StreamSets.
- Should have strong experience working with relational databases (DB2, Oracle, and SQL Server), including complex SQL constructs and DDL generation.
- Work with web service targets, XML/JSON sources, and RESTful APIs.
- Manage, monitor, and fine-tune Hadoop (HDFS) cluster jobs for performance, security, and resource management.
- Create detailed designs and POCs to enable new workloads and technical capabilities on the platform. Work with platform and infrastructure engineers to implement these capabilities in production.
- Coordinate with Enterprise Analytics to manage workloads and enable workload optimization, including resource allocation and scheduling across multiple tenants.
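The "dependency creation using jobs/topologies/workflows" responsibility above amounts to ordering pipeline jobs by their upstream dependencies. As a minimal sketch (not tied to StreamSets or any specific scheduler), Python's standard-library `graphlib` can compute a valid execution order from a dependency graph; the job names here are purely illustrative:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline dependency graph: each job maps to the set of
# jobs it depends on. Upstream extract jobs are added implicitly.
dependencies = {
    "load_orders": {"extract_orders"},
    "load_customers": {"extract_customers"},
    "build_sales_mart": {"load_orders", "load_customers"},
}

# static_order() yields jobs so every job appears after its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

In a real deployment the same graph would drive parallel batch execution: jobs whose dependencies are all satisfied can run concurrently, which `TopologicalSorter`'s incremental `prepare()`/`get_ready()` API also supports.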
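Working with XML/JSON sources and RESTful APIs usually means flattening nested records into tabular rows before loading. A minimal standard-library sketch, using a hypothetical inline payload in place of a real HTTP response:

```python
import json

# Hypothetical REST response body; in practice this would come from an
# HTTP call to the source API.
payload = json.loads("""
{
  "orders": [
    {"id": 1, "customer": {"name": "Acme", "region": "NA"}, "total": 120.5},
    {"id": 2, "customer": {"name": "Globex", "region": "EU"}, "total": 75.0}
  ]
}
""")

def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column names for tabular loading."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

rows = [flatten(order) for order in payload["orders"]]
print(rows[0])
# {'id': 1, 'customer.name': 'Acme', 'customer.region': 'NA', 'total': 120.5}
```

The resulting flat rows map directly onto relational or columnar targets (e.g. DB2, Impala tables), which is the shape most bulk-load paths expect.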