Work authorization :- US Citizen, Green Card
Employment type :- Corp-to-Corp, W2-Permanent, W2-Contract, Contract to Hire, Consulting/Contract
UG :- Not Required
PG :- Not Required
No. of positions :- 1
Posted :- 5th Jan 2024
•Monitoring - Actively monitor ML jobs through logging, metrics, and alerts.
•Incident Management - Address job failures: rerun failed jobs and maintain model/job versions in production.
•Infrastructure Scalability - Scale infrastructure to handle production workloads through tuning, capacity planning, and cost optimization.
•Compliance and Security - Apply security best practices: encrypt data, manage secrets, and enforce access controls.
•Technical Support - Troubleshoot integration and infrastructure issues.
•Operations Review - Produce monthly summaries of incident management, change management, and other support metrics.
•Model Re-training - Schedule regular retraining of models on new data. Manage code and config changes needed for retraining.
•Model Evaluation - Continuously evaluate model performance on test data and watch for statistical validation issues.
•Data Validation - Monitor production data inputs for statistical changes, missing values, outliers, and errors.
•Dependency Management - Manage libraries, frameworks, and ML pipelines, keeping versions aligned across dev, test, and prod environments.
•Package Updates - Update Python package versions for Airflow and Lambda when libraries are deprecated, and update other components and services as needed to customize the environment.
•DevOps - Use version control, Infrastructure as Code, and configuration management tools to build automation that deploys the code base.
•Automation - Programmatically define repeatable, reliable tasks that reduce operational overhead.
•Testing - Programmatically enforce quality standards for the application, the automation, and the underlying ML models.
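The Data Validation duty above can be pictured with a small check like the one below. It is a minimal sketch, not this team's actual tooling: it assumes reference statistics (mean and standard deviation taken from training data) are available for each numeric feature, and the function name `validate_feature` and its thresholds are hypothetical. The median shift is a deliberately crude drift signal; a proper statistical test (for example Kolmogorov-Smirnov or a population stability index) could be swapped in.

```python
import math
from statistics import median


def _is_missing(value):
    # Treat None and float NaN as missing values.
    return value is None or (isinstance(value, float) and math.isnan(value))


def validate_feature(values, ref_mean, ref_std, z_threshold=3.0, max_shift=1.0):
    """Check one numeric feature of a production batch against reference stats.

    ref_mean/ref_std are assumed to come from training data; the thresholds
    are illustrative defaults, not recommended values.
    """
    if ref_std <= 0:
        raise ValueError("ref_std must be positive")
    present = [v for v in values if not _is_missing(v)]
    missing_rate = (len(values) - len(present)) / len(values) if values else 0.0
    # Outliers: points more than z_threshold reference std devs from the reference mean.
    outliers = [v for v in present if abs(v - ref_mean) / ref_std > z_threshold]
    # Crude drift signal: batch median vs. reference mean, in reference std units.
    # An all-missing batch gets an infinite shift, so it is flagged as drifted.
    shift = abs(median(present) - ref_mean) / ref_std if present else math.inf
    return {
        "missing_rate": missing_rate,
        "outliers": outliers,
        "median_shift": shift,
        "drifted": shift > max_shift,
    }


report = validate_feature([10.0, 11.0, None, 9.5, 50.0], ref_mean=10.0, ref_std=1.0)
print(report)
```

Using the median rather than the mean keeps a single extreme value (here `50.0`) from masquerading as drift; that value is reported separately as an outlier.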
Required qualifications to be successful in this role
Requirements: 3-7 years of proven experience in the following areas:
Cloud:
•AWS CLI
•AWS SDK (boto3)
•Parameter Store & Secrets Manager
•Lambda
Data Engineering:
•SQL (ability to write, debug, and optimize queries)
•DynamoDB
•Redshift
Programming:
•IDE (VS Code / PyCharm)
•Testing (unittest / pytest / mock)
•Python
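The boto3, Parameter Store, and unit testing/mock items above often meet in code like the sketch below. It assumes the SSM client is passed into the function rather than created inside it, so a test can substitute `unittest.mock.MagicMock` and run without AWS credentials. The function name `get_db_password` and the parameter path `/myapp/prod/db_password` are hypothetical; `get_parameter(Name=..., WithDecryption=True)` is the boto3 SSM client call for reading a (SecureString) parameter, and its response carries the value under `["Parameter"]["Value"]`.

```python
import unittest
from unittest import mock


def get_db_password(ssm_client, name="/myapp/prod/db_password"):
    """Return a decrypted SecureString value from SSM Parameter Store.

    The default parameter path is hypothetical.
    """
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


class GetDbPasswordTest(unittest.TestCase):
    def test_returns_decrypted_value(self):
        # MagicMock stands in for boto3.client("ssm"); no AWS call is made.
        fake_ssm = mock.MagicMock()
        fake_ssm.get_parameter.return_value = {"Parameter": {"Value": "s3cret"}}

        self.assertEqual(get_db_password(fake_ssm), "s3cret")
        fake_ssm.get_parameter.assert_called_once_with(
            Name="/myapp/prod/db_password", WithDecryption=True
        )

    def test_propagates_client_errors(self):
        # RuntimeError is a stand-in for the botocore error a real client would raise.
        fake_ssm = mock.MagicMock()
        fake_ssm.get_parameter.side_effect = RuntimeError("ParameterNotFound")
        with self.assertRaises(RuntimeError):
            get_db_password(fake_ssm)
```

In production code the function would be called as `get_db_password(boto3.client("ssm"))`. The test class can be run with `python -m unittest`, and pytest also collects `unittest.TestCase` subclasses.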
Machine Learning:
•ML Development (SageMaker)
•ML Programming (pandas, NumPy, scikit-learn, Matplotlib, LightGBM)
•Modeling
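The Model Evaluation duty listed earlier, combined with the ML Programming skills here, could take the form of an evaluation gate run after retraining. This is a hedged sketch: it computes binary-classification metrics in plain Python to stay self-contained (scikit-learn's `accuracy_score` and `f1_score` in `sklearn.metrics` would normally be used instead), and the baseline values, the 0.02 tolerance, and the names `binary_metrics` and `passes_gate` are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def passes_gate(metrics, baseline, tolerance=0.02):
    """True if no tracked metric fell more than `tolerance` below its baseline."""
    return all(metrics[name] >= floor - tolerance for name, floor in baseline.items())


# Illustrative labels and predictions on a held-out test set.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics, passes_gate(metrics, baseline={"accuracy": 0.65, "f1": 0.65}))
```

A retraining pipeline could refuse to promote a new model version when `passes_gate` returns `False`, keeping the previous version in production.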
DevOps:
•General Linux Terminal (ps, awk, grep, netstat, etc.)
•Shell Scripting (Bash)
•YAML, IaC (Terraform, CloudFormation)
•CI/CD (CodeBuild, CodeDeploy)
•Orchestration (Apache Airflow)
•Containerization (Docker, Kubernetes)
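The Dependency Management duty and the CI/CD items could be tied together with a pipeline step that checks pinned versions for drift between environments. This is a minimal sketch, assuming each environment has a requirements file containing only exact `name==version` pins; the function names and the example versions are illustrative only. Package names are normalized per PEP 503, so `scikit_learn` and `scikit-learn` compare equal.

```python
import re


def _normalize(name):
    # PEP 503 name normalization, e.g. "Scikit_Learn" -> "scikit-learn".
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(text):
    """Parse 'name==version' lines; comments and blank lines are skipped."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError(f"expected an exact '==' pin, got {line!r}")
        pins[_normalize(name.strip())] = version.strip()
    return pins


def find_mismatches(envs):
    """Map each package whose pin differs (or is absent) across envs to its per-env versions."""
    parsed = {env: parse_pins(text) for env, text in envs.items()}
    packages = set().union(*(pins.keys() for pins in parsed.values()))
    mismatches = {}
    for pkg in sorted(packages):
        versions = {env: pins.get(pkg) for env, pins in parsed.items()}
        if len(set(versions.values())) > 1:
            mismatches[pkg] = versions
    return mismatches


# Illustrative pins; in practice each string would be read from that environment's requirements file.
envs = {
    "dev": "pandas==2.1.4\nlightgbm==4.1.0\n",
    "test": "pandas==2.1.4\nlightgbm==4.1.0\n",
    "prod": "pandas==2.0.3  # not yet upgraded\nlightgbm==4.1.0\n",
}
print(find_mismatches(envs))
```

In a CI job (for example a CodeBuild build step), a non-empty result would typically fail the build so that version skew is fixed before deployment.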