1. SQL (Primary Focus - 60%)
- Strong experience in writing complex queries, optimizing performance, and handling large datasets.
- Proficiency in SQL-based data extraction, transformation, and loading (ETL), as well as reporting.
- Knowledge of window functions, indexing, partitioning, and distributed query optimization.
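The window-function requirement above can be illustrated with a small self-contained sketch. It uses Python's built-in sqlite3 module (SQLite ships window-function support) and invented sales data, not any system referenced by the role:

```python
import sqlite3

# In-memory database with hypothetical sales data (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 250.0), ("west", 80.0), ("west", 300.0)],
)

# Window function: rank each sale within its region by amount.
rows = conn.execute("""
    SELECT region, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
    FROM sales
    ORDER BY region, rnk
""").fetchall()

for region, amount, rnk in rows:
    print(region, amount, rnk)
```

The `PARTITION BY` clause restarts the ranking per region, which is the same mechanism behind common interview-style queries such as "top N per group".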
2. Machine Learning (40%)
- Hands-on experience in ML model development, training, and deployment.
- Expertise in anomaly detection using techniques such as Isolation Forest, One-Class SVM, and statistical outlier detection.
- Proficiency in clustering techniques like K-Means, DBSCAN, Hierarchical Clustering, and K-Slot allocation strategies.
- Knowledge of predictive modeling, classification, and regression techniques.
- Experience with unsupervised learning models and feature engineering for anomaly detection.
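As an illustrative sketch of the anomaly-detection bullets, the following applies scikit-learn's IsolationForest to synthetic data with three injected outliers; all values here are invented for the example:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Hypothetical data: 200 normal points plus three obvious outliers.
normal = rng.normal(0, 1, size=(200, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.5], [10.0, -8.0]])
X = np.vstack([normal, outliers])

# contamination sets the expected fraction of anomalies in the data.
model = IsolationForest(contamination=0.02, random_state=0).fit(X)
labels = model.predict(X)  # -1 = anomaly, 1 = normal

n_flagged = int((labels == -1).sum())
print(f"flagged {n_flagged} of {len(X)} points as anomalies")
```

Because Isolation Forest is unsupervised, the same fit/predict pattern applies when no labeled anomalies exist, which is the typical situation this skill set targets.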
3. Python for Data Processing & Web Scraping
- Strong programming skills in Python with libraries such as Pandas, NumPy, Scikit-learn, TensorFlow/PyTorch.
- Experience in automating data pipelines and handling large-scale data ingestion.
- Web scraping expertise using BeautifulSoup, Scrapy, and Selenium for data collection and preprocessing.
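A minimal sketch of the scraping-plus-preprocessing workflow: it parses an inline HTML snippet with BeautifulSoup rather than fetching a live page, and the table and field names are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for a fetched page (no network call here).
html = """
<table id="prices">
  <tr><th>item</th><th>price</th></tr>
  <tr><td>widget</td><td>9.99</td></tr>
  <tr><td>gadget</td><td>24.50</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
records = []
for row in soup.select("#prices tr")[1:]:  # skip the header row
    item, price = (cell.get_text() for cell in row.find_all("td"))
    records.append({"item": item, "price": float(price)})

print(records)
```

In a real pipeline the HTML would come from `requests` or Selenium, and the resulting records would typically be loaded into a Pandas DataFrame for cleaning.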
4. Cluster Computing & Big Data Technologies
- Familiarity with distributed computing frameworks such as Spark (PySpark), Dask, or Hadoop.
- Understanding of parallel processing, batch processing, and real-time streaming solutions.
- Experience with cloud platforms (AWS, GCP, or Azure).
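Spark or Dask would normally handle the partition/map/reduce pattern this section describes; as a dependency-free stand-in, this sketch splits data into batches and aggregates them in parallel with Python's concurrent.futures:

```python
from concurrent.futures import ThreadPoolExecutor

def aggregate_batch(batch):
    # Toy per-batch aggregation standing in for a distributed map task.
    return sum(batch)

data = list(range(1, 101))
# Partition into fixed-size batches, as a cluster framework would shard data.
batches = [data[i:i + 25] for i in range(0, len(data), 25)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(aggregate_batch, batches))

total = sum(partials)  # the reduce step
print(partials, total)
```

The same shape (shard, map in parallel, reduce) is what PySpark's `rdd.map(...).reduce(...)` or a Dask bag expresses at cluster scale.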
5. Dashboarding & Data Visualization
- Experience with dashboarding tools like Tableau, Power BI, Looker, or Streamlit to present insights effectively.
- Ability to automate and integrate data pipelines for real-time monitoring.
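Dashboard tools are largely GUI-driven, but the pipeline feeding them is code. Below is a sketch of the kind of per-service summary a Streamlit tile might render; the event data is invented, and the `st.*` calls are shown only as comments:

```python
import pandas as pd

# Hypothetical monitoring events; in a real pipeline these would stream in.
events = pd.DataFrame({
    "service": ["api", "api", "db", "db", "api"],
    "latency_ms": [120, 95, 40, 55, 300],
})

# Per-service summary a dashboard tile would display.
summary = events.groupby("service")["latency_ms"].agg(["count", "mean", "max"])
print(summary)

# In a Streamlit app the same frame could be rendered directly, e.g.:
# import streamlit as st
# st.dataframe(summary)
# st.bar_chart(summary["mean"])
```

Refreshing such a summary on a schedule (or from a streaming source) is what turns a static report into the real-time monitoring the bullet above calls for.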