Deploy, operate, maintain, secure and administer solutions that contribute to the operational efficiency, availability, performance and visibility of our customers' infrastructure, Big Data platforms and related services, including core Hadoop and streaming services such as Kafka and RabbitMQ.
Gather information and provide performance and root cause analysis and remediation planning for faults, errors, configuration warnings and bottlenecks within our customers' infrastructure, applications and Big Data ecosystems.
Deliver well-constructed, explanatory technical documentation for the architectures that we develop, and plan service integration, deployment automation and configuration management to meet business requirements within the infrastructure and Big Data ecosystem.
Understand distributed Java container applications and their tuning, monitoring and management, such as logging configuration, garbage collection and heap size tuning, JMX metric collection and general parameter-based Java tuning.
Observe and provide feedback on the current state of the client's infrastructure, and identify opportunities to improve resiliency, reduce the occurrence of incidents and automate repetitive administrative and operational tasks.
Contribute heavily to the development of deployment automation artifacts, such as images, recipes, playbooks, templates, configuration scripts and other open source tooling.
Be conversant with cloud architecture, service integrations, and operational visibility on common cloud platforms (Google, AWS, Azure). An understanding of ecosystem deployment options and how to automate them via API calls is a huge asset.
What do we need from you?
This is a wish list. If you are interested in this role but not sure if your skills and experience are exactly what we're looking for, please do apply; we'd love to hear from you! There will be plenty of opportunities to learn new skills, contribute to the team's and customers' needs, and grow with Pythian. While we realise you might not have everything to be the successful candidate for the Streaming Big Data Infrastructure Engineer job, you will likely have skills such as:
Understand the end-to-end operations of complex Hadoop-based ecosystems and handle and configure core technologies such as HDFS, MapReduce, Spark, YARN, HBase, and ZooKeeper.
Understand the dependencies and interactions between these core components, alternative configurations (e.g. MRv2 vs Spark, scheduling in YARN), availability characteristics and service recovery scenarios.
Identify workflow and job pipeline characteristics and tune the ecosystem to support high performance and scalability, from the infrastructure platform through to the application layers in the ecosystem.
Understand the security tools and approaches available to configure different use cases based on clients' needs, including being able to manage tools such as Kerberos, AD and LDAP, the encryption-at-rest options for Hadoop services, and PKI concepts.
Understand end-to-end operations, deployment, troubleshooting, and tuning of streaming technologies such as Kafka and RabbitMQ for a wide variety of applications in a wide variety of environments, including on-prem and cloud.
Understand and enable metric collection at all layers of a complex infrastructure, ensuring good visibility for engineering and troubleshooting tasks, and ensure end-to-end monitoring of critical ecosystem components and workflows.
Understand the Hadoop toolset, how to manage and copy data within and between Hadoop clusters, integrate with other ecosystems (for instance, cloud storage), configure replication and plan backups and resiliency strategies for data on the cluster.
Deep understanding of the Kafka ecosystem and the ability to troubleshoot and tune brokers, partition distribution, and topics.
Comprehensive systems hardware and network troubleshooting experience in physical, virtual and cloud platform environments, including the operation and administration of virtual and cloud infrastructure provider frameworks. Experience with at least one cloud provider (GCP, AWS, Azure) is required.
Experience with the design, development and deployment of at least one major configuration management framework (e.g. Puppet, Ansible, Chef) and one major infrastructure automation framework (e.g. Terraform, Spinnaker, CloudFormation). Knowledge of DevOps tools, processes, and culture (e.g. Git, continuous integration, test-driven development, Scrum).
A strong desire to learn and the ability to pick up new technologies and ecosystem components quickly, and establish their relevance, architecture and integration with existing systems.