Role: Big Data Developer
Location: Tysons, VA
Duration: Long Term
Responsibilities
- Design and develop data ingestion and processing code using Python, PySpark, R, and Hive on the Cloudera CDH platform.
- Create and update design specs and reference architecture documents to accelerate solution development.
- Drive innovation on the Cloudera Data Platform: research related technologies, develop new concepts, and prototype and deliver implementations.
- Participate in testing and peer code reviews to identify bugs and ensure code reusability.
- Automate the deployment of solutions using shell scripts, Python, and Oozie.
- Work with internal subject matter experts to define requirements for new demo environments.
- Collaborate with the Apache community on Hadoop and other related open-source projects.
- Work with the IT Change Management group to promote developed code and scripts from non-production to production environments.
- Work with the architecture teams (Application/Security/Infrastructure/Data) to obtain approval of the designed solutions.
Required Tools and Technology Experience
- Programming Languages - Python, PySpark, Java, SQL, Shell Scripting
- Big Data Tools - Spark, HDFS, Kafka, Hive, HBase, Sqoop
- Databases - MySQL, PostgreSQL, SQL Server, Snowflake
- Cloudera Technologies - Cloudera Data Platform & Cloudera Manager
- Cloud Technologies - Amazon Web Services (AWS), including AWS big data services; OpenStack
- Other Software & Tools - Tableau, SAS, Docker, Kubernetes, GitHub
- Unix/Linux expertise