Are you an Infrastructure Specialist, Site Reliability Engineer or DevOps engineer with strong experience in the Hadoop ecosystem? Do you know how to deploy, upgrade, create disaster plans, and perform system and ecosystem tuning? What about infrastructure architecture, performance analysis, deployment automation, and intelligent monitoring? Do you have solid experience with Site Reliability Engineering and DevOps tooling, processes and best practices? Do you have experience using infrastructure as code, such as CloudFormation or Terraform? Knowledge of CI using Maven, Nexus or Jenkins? How about setting up a Kafka cluster?
This is much more than a Hadoop Administrator role; it is a true SRE job. At Pythian, our team is focused on Hadoop service operations and open source, cloud-enabled infrastructure architecture. If you Love Your Data, enjoy solving complex technical problems and want to Love Your Career, then this could be the job for you!
 
As an SRE on the Hadoop team, you will:
- Deploy, operate, maintain, secure and administer solutions that contribute to the operational efficiency, availability, performance and visibility of Pythian customers' infrastructure and Hadoop platform services, across multiple vendors (e.g. Cloudera, Hortonworks, MapR).
- Gather information and provide performance and root cause analysis and remediation planning for faults, errors, configuration warnings and bottlenecks within our customers' infrastructure, applications and Hadoop ecosystems.
- Deliver well-constructed, explanatory technical documentation for the architectures that we develop, and plan service integration, deployment automation and configuration management to meet business requirements within the infrastructure and Hadoop ecosystem.
- Understand distributed Java container applications and their tuning, monitoring and management, such as logging configuration, garbage collection and heap size tuning, JMX metric collection and general parameter-based Java tuning.
- Observe and provide feedback on the current state of the clients' infrastructure, and identify opportunities to improve resiliency, reduce the occurrence of incidents and automate repetitive administrative and operational tasks.
- Contribute heavily to the development of deployment automation artifacts, such as images, recipes, playbooks, templates, configuration scripts and other open source tooling.
- Be conversant with cloud architecture, service integrations, and operational visibility on common cloud platforms (AWS, Azure, Google).
- Understand ecosystem deployment options and how to automate them via API calls; this is a huge asset.