3+ years of hands-on experience working with large structured and unstructured datasets on partitioned cloud storage architectures, using query engines and table formats such as Spark and Delta Lake.
3+ years of experience designing, developing, deploying, and testing in Databricks.
3+ years of hands-on experience with Python/PySpark/Spark SQL.
2+ years of experience with big data pipeline/DAG tools such as Airflow or dbt is required.
2+ years of SQL experience, specifically writing complex, highly optimized queries over large volumes of data.
Experience with the AWS computing environment and storage services such as S3/Glacier is required.
Experience with conceptual, logical, and/or physical database design is required.
Good knowledge of Linux and shell scripting is highly desired.
Prior experience in healthcare data extraction, transformation, and normalization is highly desired.
Experience with data visualization tools such as Looker or Tableau is desired.
Strong communication skills to relay complex data integration requirements to team members.