Select features, and build and optimize classifiers using machine learning techniques
Conduct data mining and experimental analysis using state-of-the-art methods
Process, cleanse, and verify the integrity of data used for analysis, training, and inference
Collect and interpret business requirements of varying degrees of clarity
Define and design data science techniques and pipelines that address specific business problems
Work with datasets of varying size and complexity, including both structured and unstructured data
Develop pipelines to process massive data streams in distributed computing environments such as Spark and Kubernetes/Docker microservices
Develop proprietary algorithms that go beyond standard industry tools and lead to customized, innovative solutions
Develop sophisticated visualizations of analysis output for business users
Provide controls and analytics for all output produced, to monitor and ensure that established indicators and targets are met, both during initial development and on an ongoing basis
Identify opportunities for continuous improvement of current algorithms, solutions, and methodologies employed
Proactively collaborate with business partners to monitor solution health and changing requirements, and develop actionable plans that balance quality, usability, cost, time-to-market, and other variables
Requirements
Bachelor's degree in Statistics, Computer Science, Mathematics, Machine Learning, Econometrics, Physics, Biostatistics, or a related quantitative discipline, plus 3 or more years of experience in an enterprise data science organization
Graduate degree preferred
Must have advanced expertise in Python and proficiency with JSON and SQL; experience with other programming languages such as R, Java, or C++, and expertise in GraphQL, is preferred
Must have experience working with enterprise data warehouses, data marts, databases, data lakes, or other distributed or cloud-based data storage systems
Must have experience working in cross-functional teams and the ability to communicate results to non-technical audiences
Must have experience doing exploratory data analysis and visualization using state-of-the-art Python libraries such as pandas, NumPy, Matplotlib, seaborn, Plotly, and Streamlit
Must have experience building models and algorithms for training/inference workloads using libraries such as scikit-learn, TensorFlow, and PyTorch
Must have a deep understanding of, and experience working in, at least one of the following NLP problem domains: NER, topic modeling, NLU, Q&A, NMT, or related areas
Exposure to building Cognitive Search (Information Retrieval) or Recommender Systems (Information Filtering) is preferred
Familiarity with synchronous and event-based system, data, and orchestration architectures for batch, streaming/real-time, and/or transactional workloads that employ one or more of the following technologies: message queues, Kafka, RESTful microservices, Spark, Kubernetes/Docker
Experience with cloud platforms and SaaS environments and tools such as Azure, AWS, and GCP preferred
Familiarity with CI/CD and DevOps tools such as Bitbucket, Bamboo, Jira, and Confluence required
Experience with test-driven development, standard logging, and debugging techniques is required
Work experience in Agile (Scrum) development teams required
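
For illustration only, the kind of training/inference workload named in the requirements above (scikit-learn-based modeling) might be sketched as follows; the dataset, model choice, and parameters here are hypothetical stand-ins, not part of the role description:

```python
# Sketch of a scikit-learn training/inference workflow.
# The dataset and model are illustrative, not prescribed by this posting.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small example dataset as a stand-in for enterprise data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Training workload: fit a classifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Inference workload: predict on held-out data and evaluate
preds = model.predict(X_test)
print(f"accuracy: {accuracy_score(y_test, preds):.2f}")
```

In practice the same fit/predict pattern scales to the enterprise data sources listed above once data loading and feature engineering are swapped in.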