Data Architecture & Design: Design and implement scalable and reliable data pipelines that collect, transform, and load large volumes of data from various sources.
Data Processing: Develop, optimize, and maintain big data processing jobs using technologies such as Hadoop, Apache Spark, and Hive (a short Spark example follows this list).
Data Management: Ensure data is clean, organized, and accessible for analysis, including managing data quality, integrity, and governance.
Performance Optimization: Identify and resolve performance bottlenecks in big data environments, ensuring efficient data processing and storage.
Collaboration: Work closely with data scientists, analysts, and other engineers to understand data needs and deliver solutions that meet business requirements.
Innovation: Stay up to date with the latest big data technologies and trends, and evaluate their applicability to current and future projects.
Automation: Automate data processing tasks to improve efficiency and reduce manual intervention.
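For illustration of the Data Processing responsibility above, the following is a minimal sketch of the kind of Spark batch job this role involves, written in Python (PySpark). The paths, schema, and column names (events, event_ts, user_id) are hypothetical placeholders, not part of this posting.

# Minimal PySpark batch job: read raw events, aggregate, write partitioned output.
# All paths and column names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("daily-event-aggregation")
    .getOrCreate()
)

# Extract: read raw event data (assumed to be stored as Parquet).
events = spark.read.parquet("s3://example-bucket/raw/events/")

# Transform: drop malformed rows and count events per user per day.
daily_counts = (
    events
    .where(F.col("user_id").isNotNull())
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date", "user_id")
    .agg(F.count("*").alias("event_count"))
)

# Load: write results partitioned by date for downstream analysis.
daily_counts.write.mode("overwrite").partitionBy("event_date") \
    .parquet("s3://example-bucket/curated/daily_event_counts/")

spark.stop()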
Technical Skills Required:
Programming Languages: Proficiency in Java, Python, or Scala.
Big Data Technologies: Experience with Hadoop, Apache Spark, Hive, Pig, HBase, or similar tools.
Data Storage Systems: Familiarity with SQL and NoSQL databases such as Cassandra and MongoDB, and with distributed file systems such as HDFS.
Data Warehousing: Experience with data warehousing solutions like Amazon Redshift, Google BigQuery, or Snowflake.
Cloud Platforms: Experience with cloud-based big data solutions on AWS, Azure, or Google Cloud Platform.
ETL Processes: Knowledge of ETL processes and tools such as Apache NiFi, Talend, or Informatica (see the ETL sketch at the end of this section).
Version Control: Proficiency in version control systems like Git.
DevOps: Familiarity with CI/CD pipelines, containerization, and orchestration (Docker, Kubernetes).
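As referenced under ETL Processes, here is a minimal sketch of a hand-coded ETL step with a basic data-quality gate, written in PySpark rather than a dedicated ETL tool. The JDBC connection details, table and column names, and the 95% acceptance threshold are all hypothetical assumptions, not requirements of the role.

# Minimal ETL sketch with a simple data-quality gate.
# Connection details, table and column names, and the threshold are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: pull a source table over JDBC (credentials assumed to come from a secrets store).
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://example-host:5432/sales")
    .option("dbtable", "public.orders")
    .option("user", "etl_user")
    .option("password", "...")  # placeholder; never hard-code real credentials
    .load()
)

# Transform: normalize types and drop obviously invalid rows.
cleaned = (
    orders
    .withColumn("order_total", F.col("order_total").cast("decimal(12,2)"))
    .where(F.col("order_id").isNotNull() & (F.col("order_total") >= 0))
)

# Data-quality gate: fail the job if too many rows were rejected.
total, kept = orders.count(), cleaned.count()
if total > 0 and kept / total < 0.95:
    raise ValueError(f"Rejected {total - kept} of {total} rows; aborting load")

# Load: append the cleaned batch to the warehouse staging area (Parquet here for illustration).
cleaned.write.mode("append").parquet("s3://example-bucket/warehouse/staging/orders/")

spark.stop()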