- Effective at building partnerships with business stakeholders, engineering, and product teams to understand the use cases of intended data consumers
- Able to create and maintain documentation that helps users understand how to use tables and columns
Data Architecture & Data Pipeline Implementation
- Experience creating and evolving dimensional data models and schema designs that structure data for business-relevant analytics
- Strong experience using an ETL orchestration framework (e.g., Airflow) to build and deploy production-quality ETL pipelines (see the DAG sketch after this list)
- Experience ingesting and transforming structured and unstructured data from internal and third-party sources into dimensional models
- Experience distributing data to OLTP stores (e.g., MySQL, Cassandra, HBase) and fast analytics solutions
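As a rough illustration of the pipeline work described above, here is a minimal Airflow DAG sketch (assuming Airflow 2.4+); the DAG id, task names, and the extract/load callables are hypothetical placeholders, not a prescribed design:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_orders(**context):
    """Pull raw order records from a hypothetical source system."""
    # In practice this would call an API or read from object storage (e.g., S3).
    ...


def load_dim_customer(**context):
    """Upsert the customer dimension from the raw extract."""
    ...


def load_fct_orders(**context):
    """Build the order fact table keyed on the customer dimension."""
    ...


with DAG(
    dag_id="orders_dimensional_model",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
    dim = PythonOperator(task_id="load_dim_customer", python_callable=load_dim_customer)
    fct = PythonOperator(task_id="load_fct_orders", python_callable=load_fct_orders)

    # The dependency chain mirrors a typical dimensional load:
    # land raw data, build the dimension, then build the fact table against it.
    extract >> dim >> fct
```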
Data Systems Design
- Strong understanding of distributed storage and compute (e.g., S3, Hive, Spark)
- Knowledge of distributed systems design, such as how map-reduce and distributed data processing work at scale (see the Spark sketch after this list)
- Basic understanding of OLTP systems such as Cassandra, HBase, Mussel, and Vitess
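To make the map-reduce bullet concrete, here is a minimal PySpark sketch of the map/shuffle/reduce pattern; the S3 paths are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mapreduce-sketch").getOrCreate()

# Map phase: each executor tokenizes its partition of the input independently.
lines = spark.sparkContext.textFile("s3://example-bucket/logs/*.txt")  # hypothetical path
pairs = lines.flatMap(lambda line: line.split()).map(lambda word: (word, 1))

# Shuffle + reduce phase: pairs with the same key are routed to the same
# partition and combined; this data movement is what dominates cost at scale.
counts = pairs.reduceByKey(lambda a, b: a + b)

counts.saveAsTextFile("s3://example-bucket/word-counts/")  # hypothetical path
spark.stop()
```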
Coding
- Experience building batch data pipelines in Spark
- Expertise in SQL
- General software engineering skills (e.g., proficiency coding in Python, Java, or Scala)
- Experience writing data quality unit and functional tests (see the test sketch after this list)
- Proficiency with Salesforce and an understanding of its data model (optional)
- Knowledge of Salesforce Bulk Operators (optional)
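As a sketch of the data quality testing called out above, here is a small pytest-based check against a PySpark DataFrame; the table shape and column names are hypothetical, and a real suite would read the pipeline's actual output:

```python
import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark():
    # Local Spark session for tests; production jobs run on a cluster.
    session = SparkSession.builder.master("local[2]").appName("dq-tests").getOrCreate()
    yield session
    session.stop()


def test_fct_orders_keys_are_unique_and_non_null(spark):
    # Small in-memory frame for illustration; a real test would load the
    # pipeline's output table instead.
    df = spark.createDataFrame(
        [(1, "2024-01-01", 9.99), (2, "2024-01-01", 14.50)],
        ["order_id", "order_date", "amount"],
    )
    assert df.filter(df.order_id.isNull()).count() == 0
    assert df.count() == df.select("order_id").distinct().count()
```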