Bachelor’s degree, preferably in Computer Science or a related field.
Lead a team of 4 or more members.
8+ years of experience implementing complex ETL pipelines, preferably with the Spark toolset.
8+ years of experience with Python, particularly in the data space.
Technical expertise in data models, database design and development, data mining, and segmentation techniques.
Solid experience writing complex SQL queries and ETL processes.
Excellent coding and design skills, particularly in Scala or Python.
Strong hands-on experience with Unix scripting in at least one of Python, Perl, or shell (bash or zsh).
Experience with AWS technologies such as EC2, Redshift, CloudFormation, EMR, S3, and AWS analytics services is required.
Experience designing and implementing data pipelines in on-premises and cloud environments is required.
Experience building and implementing data pipelines using Databricks, on-premises platforms, or similar cloud data platforms.
Expert-level knowledge of SQL for writing complex, highly optimized queries across large volumes of data.
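As an illustrative sketch of the kind of SQL work involved (the table, columns, and data here are hypothetical, using Python's built-in sqlite3 for a self-contained demo):

```python
import sqlite3

# Hypothetical example: total completed revenue per region, with an index
# that covers the filter and grouping columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL, status TEXT);
    CREATE INDEX idx_orders_region_status ON orders (region, status);
    INSERT INTO orders (region, amount, status) VALUES
        ('us-east', 100.0, 'complete'),
        ('us-east',  50.0, 'refunded'),
        ('eu-west',  75.0, 'complete');
""")
rows = conn.execute("""
    SELECT region, SUM(amount) AS revenue
    FROM orders
    WHERE status = 'complete'
    GROUP BY region
    ORDER BY revenue DESC
""").fetchall()
print(rows)  # → [('us-east', 100.0), ('eu-west', 75.0)]
```

On a production warehouse the same pattern applies at far larger scale, where index and partition design determines query cost.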
Hands-on object-oriented programming experience using Python is required.
Professional experience building real-time data streams using Spark.
Knowledge of architectural best practices for building data lakes.
Develop and work with APIs.
Develop and maintain scalable data pipelines and build out new API integrations to support continuing increases in data volume and complexity.
Collaborate with analytics and business teams to improve data models that feed business intelligence tools, increase data accessibility, and foster data-driven decision making across the organization.
Implement processes and systems to monitor data quality, ensure production data accuracy, and ensure access for key stakeholders and business processes.
Write unit and integration tests, and contribute to the engineering wiki and documentation.
Perform data analysis to troubleshoot and assist in resolving data-related issues.
Experience developing data integrations and data quality frameworks based on established requirements.
Experience with CI/CD processes and tools (e.g., Concourse, Jenkins).
Experience with test-driven development: writing unit tests and measuring test coverage using PyTest, PyUnit, and pytest-cov.
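A minimal sketch of the TDD style referenced above (the helper function and test are hypothetical; under pytest, test_* functions are discovered automatically and coverage is reported via pytest-cov):

```python
def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return " ".join(text.split())

def test_normalize_whitespace():
    # pytest-style assertions: plain `assert` statements, no framework classes.
    assert normalize_whitespace("  hello   world \n") == "hello world"
    assert normalize_whitespace("") == ""

# pytest would discover and run this test; calling it directly also works.
test_normalize_whitespace()
```

In a TDD workflow the test would be written first, fail, and then drive the implementation; `pytest --cov` (with pytest-cov installed) reports line coverage for the module under test.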
Experience working in an Agile environment.
Good understanding and use of algorithms and data structures.
Experience building reusable frameworks.
AWS certification preferred: AWS Developer, Architect, DevOps, or Big Data.
Excellent verbal and written communication skills.