As a Data Engineer for PreludeSys, you will:
- Bring 7 to 10 years of experience as a data engineer working with big data technologies such as Hadoop, Hive, and the Spark framework.
- Demonstrate strong experience in Python, Scala, or Java.
- Build data source pipelines (ETL/ELT).
- Apply a very good understanding of data modeling to enable the reporting layer.
- Understand advanced concepts such as Delta Lake on Databricks.
You Are Great At:
- Identifying the most appropriate data sources for a given requirement and analyzing their structures and contents, in collaboration with subject matter experts.
- Applying strong knowledge of distributed systems, load balancing and networking, massive data storage, massively parallel processing, and security.
- Creating data pipelines (ETL/ELT) and processing structured and unstructured data.
- Collecting, cleaning, preparing, and loading the necessary data into the big data ecosystem (Hadoop / cloud-based storage) for reporting purposes.
- Applying strong knowledge of data aggregation, deduplication, identity linking, containerization, data streaming, etc.
MUST HAVE:
- IT experience, including Azure and Kubernetes
- Data exploration, cleaning, normalization, feature engineering, and scaling
- Development experience with Python, R, or Scala, and with Spark / SQL / Hive / Synapse / Databricks / Kafka
- Proposal or presales support for all data and analytics initiatives.
- Experience in data-level security and strong knowledge of governance across the data lifecycle.
- End-to-end responsibility for all stages of the data architecture development process for large or complex systems.
- Implementing data quality controls, fixing detected data quality issues, and liaising with data suppliers for joint root cause analysis.
- Very good understanding of open-source tools, including the Apache frameworks Hadoop, YARN, Spark, Airflow, and Kafka streaming, as well as ELK, Neo4j, etc.
- Designing and executing build pipelines using Jenkins, GitLab, etc., through to release activities.