The Hadoop administrator is responsible for the care, maintenance, administration, and reliability of the Hadoop ecosystem. The role covers system security, stability, reliability, capacity planning, recoverability (protecting business data), and performance, as well as delivering new system and data management solutions to meet the growing and evolving data demands of the enterprise. A Hadoop administrator working with Cloudera administers Cloudera technology and systems, with responsibility for backup, recovery, architecture, performance tuning, security, auditing, metadata management, optimization, statistics, capacity planning, connectivity, and other data solutions for Hadoop systems.
Tasks and Responsibilities
Provides support and maintenance for Hadoop and its ecosystem, including HDFS, YARN, Hive, LLAP, Druid, Impala, Spark, Kafka, HBase, Cloudera Workbench, etc.
Accountable for storage, performance tuning, and volume management of Hadoop clusters and MapReduce routines.
Deploys Hadoop clusters, adds and removes nodes, keeps track of jobs, monitors critical parts of the cluster, configures NameNode high availability, and schedules, configures, and takes backups (a cluster-operations sketch follows this list).
Installs and configures software, installs patches, and upgrades software as needed.
Plans capacity and implements new and upgraded hardware and software releases for the storage infrastructure (see the capacity arithmetic sketch after this list).
Handles cluster design, capacity arrangement, cluster setup, performance fine-tuning, monitoring, structure planning, scaling, and administration.
Communicates with other development, administration, and business teams, including the infrastructure, application, network, database, and business intelligence teams.
Responsible for data lake and data warehouse design and development.
Collaborates with various technical and non-technical resources, such as infrastructure and application teams, on project work, proofs of concept (POCs), and troubleshooting exercises.
Configures and implements Hadoop security, specifically Kerberos integration (a Kerberos verification sketch follows this list).
Creates and maintains job and task schedules and administers jobs (see the job-monitoring sketch after this list).
Moves data in and out of Hadoop clusters and handles data ingestion using Sqoop and/or Flume (a Sqoop import sketch follows this list).
Reviews Hadoop environments and determines compliance with industry best practices and regulatory requirements.
Handles data modelling, design, and implementation based on recognized standards.
Serves as the key contact for vendor escalations.
Participates in an on-call rotation to support the 24/7 environment and is expected to work outside business hours to support corporate needs.
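Several of the responsibilities above lend themselves to small automation sketches; the Python examples that follow are illustrative, not prescriptions. First, a minimal cluster-operations sketch, assuming a cluster with HA NameNodes whose service IDs are nn1 and nn2 (the conventional example IDs from the Hadoop HA documentation) and the hdfs CLI on PATH; it checks which NameNode is active and refreshes the node lists after a host has been added to the dfs.hosts.exclude file to begin decommissioning.

    import subprocess

    def ha_state(service_id: str) -> str:
        # "hdfs haadmin -getServiceState" prints "active" or "standby".
        out = subprocess.run(
            ["hdfs", "haadmin", "-getServiceState", service_id],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()

    def refresh_nodes() -> None:
        # After a host is added to the dfs.hosts.exclude file, this asks the
        # NameNode to re-read it and start decommissioning that DataNode.
        subprocess.run(["hdfs", "dfsadmin", "-refreshNodes"], check=True)

    for nn in ("nn1", "nn2"):    # assumed service IDs
        print(nn, ha_state(nn))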
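Next, the capacity arithmetic behind storage planning. A minimal sketch, assuming HDFS's default replication factor of 3 and an assumed 25% reserve of raw disk for non-DFS use (OS, logs, shuffle/spill space):

    def usable_hdfs_tb(nodes: int, disks_per_node: int, tb_per_disk: float,
                       replication: int = 3, non_dfs_reserve: float = 0.25) -> float:
        # Raw disk, minus the non-DFS reserve, divided by the replication
        # factor, gives the capacity actually available for data sets.
        raw = nodes * disks_per_node * tb_per_disk
        return raw * (1 - non_dfs_reserve) / replication

    # e.g. 20 workers with 12 x 4 TB disks yield about 240 TB of usable space
    print(f"{usable_hdfs_tb(20, 12, 4.0):.0f} TB usable")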
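For the Kerberos item, a minimal verification sketch. The keytab path and principal below are hypothetical, and the Kerberos client tools (kinit, klist) are assumed to be installed; on a secured cluster, core-site.xml sets hadoop.security.authentication to kerberos, so an HDFS listing fails without a valid ticket, which makes it a cheap end-to-end check.

    import subprocess

    KEYTAB = "/etc/security/keytabs/hdfs.keytab"        # hypothetical path
    PRINCIPAL = "hdfs/node1.example.com@EXAMPLE.COM"    # hypothetical principal

    # Obtain a ticket non-interactively from the keytab, then confirm it.
    subprocess.run(["kinit", "-kt", KEYTAB, PRINCIPAL], check=True)
    subprocess.run(["klist"], check=True)

    # End-to-end check: listing HDFS only succeeds with a valid ticket.
    subprocess.run(["hdfs", "dfs", "-ls", "/"], check=True)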
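For job tracking, a minimal monitoring sketch, assuming the yarn CLI is on PATH; yarn application -list filters applications by state, and the filter below keeps only the rows that name an application, skipping YARN's header lines.

    import subprocess

    def running_applications() -> list:
        out = subprocess.run(
            ["yarn", "application", "-list", "-appStates", "RUNNING"],
            capture_output=True, text=True, check=True,
        )
        # Keep only rows naming an application; drop YARN's header lines.
        return [line for line in out.stdout.splitlines()
                if line.startswith("application_")]

    for app in running_applications():
        print(app)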
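Finally, a minimal Sqoop import sketch for the data-ingestion item, assuming a hypothetical MySQL source database and a password file already staged in HDFS; Sqoop runs the import as parallel map tasks, four of them here.

    import subprocess

    subprocess.run([
        "sqoop", "import",
        "--connect", "jdbc:mysql://db1.example.com/sales",  # hypothetical host/db
        "--username", "etl_user",                           # hypothetical user
        "--password-file", "/user/etl/.db_password",        # hypothetical HDFS path
        "--table", "orders",                                # hypothetical table
        "--target-dir", "/data/raw/orders",                 # HDFS destination
        "--num-mappers", "4",                               # parallel map tasks
    ], check=True)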