Required Skills

Apache Spark, AWS Glue, Google Dataflow, Talend, Hadoop, Kafka, Apache Flink, Hive, Presto, MySQL, PostgreSQL, MongoDB, Cassandra, TensorFlow, PyTorch, Scikit-learn, XGBoost, AWS, Azure, GCP, Databricks, Celery, RQ, RabbitMQ, Apache Pulsar, Google Cloud Pub/Sub, Redshift, BigQuery, Synapse Analytics, NGINX, HAProxy, MLflow, DVC

Work Authorization

  • US Citizen

  • Green Card

  • EAD (OPT/CPT/GC/H4)

  • H1B Work Permit

Preferred Employment

  • Corp-Corp

  • W2-Permanent

  • W2-Contract

  • Contract to Hire

Employment Type

  • Consulting/Contract

Education Qualification

  • UG: Not Required

  • PG: Not Required

Other Information

  • Number of positions: 1

  • Posted: 28th Jan 2025

JOB DETAIL

Data Ingestion: Build scalable ETL pipelines using Apache Spark, Talend, AWS Glue, Google Dataflow, and Apache NiFi. Ingest data from APIs, file systems, and databases.
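
As an illustration, a minimal PySpark ingestion sketch that pulls from a database and a file system and lands both in a curated zone; the connection details, table name, and bucket paths are hypothetical:

from pyspark.sql import SparkSession

# Local session for the sketch; in production this would run on a cluster.
spark = SparkSession.builder.appName("ingest-orders").getOrCreate()

# Ingest from a JDBC database source (all connection values are placeholders).
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/sales")
    .option("dbtable", "public.orders")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)

# Ingest semi-structured events from a file-system source alongside it.
events = spark.read.json("s3a://raw-bucket/events/")

# Land both datasets in a curated zone as Parquet.
orders.write.mode("overwrite").parquet("s3a://curated-bucket/orders/")
events.write.mode("overwrite").parquet("s3a://curated-bucket/events/")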

Data Transformation/Validation: Use Pandas, Apache Beam, and Dask for data cleaning, transformation, and validation. Automate data quality checks with Pytest and Unittest.
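
For example, a short Pandas cleaning function paired with a Pytest-style data quality check; the column names are hypothetical:

import pandas as pd

def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    # Drop duplicates, normalize column names, and coerce types.
    df = df.drop_duplicates()
    df.columns = [c.strip().lower() for c in df.columns]
    df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    # Reject rows whose date or amount failed coercion.
    return df.dropna(subset=["order_date", "amount"])

def test_clean_orders_drops_bad_rows():
    raw = pd.DataFrame({"Order_Date": ["2025-01-01", "bad"], "Amount": ["10.5", "x"]})
    cleaned = clean_orders(raw)
    assert len(cleaned) == 1
    assert cleaned["amount"].dtype == "float64"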

Big Data Systems: Process large datasets with Hadoop, Kafka, Apache Flink, and Apache Hive. Stream real-time data using Kafka and Google Cloud Pub/Sub.
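
As a sketch of the streaming side, a minimal consumer built with the kafka-python package; the topic name and broker address are hypothetical:

import json
from kafka import KafkaConsumer

# Subscribe to a raw-events topic (broker address is a placeholder).
consumer = KafkaConsumer(
    "raw-events",
    bootstrap_servers=["broker-1:9092"],
    group_id="etl-workers",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# Each message is handed to downstream processing; printing stands in for that here.
for message in consumer:
    print(message.value.get("event_type"), message.partition, message.offset)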

Task Queues: Manage asynchronous processing with Celery, RQ, RabbitMQ, or Kafka. Implement retry mechanisms and track task status.
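
A minimal Celery sketch of a retried task follows; the broker URL and the load helper are hypothetical:

from celery import Celery

# RabbitMQ broker URL is a placeholder; the result backend enables status tracking.
app = Celery("pipeline", broker="amqp://guest@rabbitmq-host//", backend="rpc://")

def run_load(batch_id: str) -> None:
    # Stub standing in for the actual load logic.
    pass

@app.task(bind=True, max_retries=3)
def load_batch(self, batch_id: str) -> str:
    try:
        run_load(batch_id)
        return "loaded"
    except ConnectionError as exc:
        # Re-queue with exponential backoff: 30s, 60s, then 120s.
        raise self.retry(exc=exc, countdown=30 * 2 ** self.request.retries)

With the result backend configured, a caller can track task state via load_batch.delay("batch-42").status.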

Scalability: Optimize for performance with distributed processing (Spark, Flink), parallelization (joblib), and data partitioning.
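
For the parallelization piece, a small joblib sketch over a partitioned dataset; the partition paths and the per-partition function are hypothetical:

from joblib import Parallel, delayed

def process_partition(path: str) -> int:
    # Stub standing in for real per-partition work (e.g., parse and aggregate one file).
    return len(path)

partitions = [f"data/part-{i:04d}.parquet" for i in range(8)]

# Fan the partitions out across all available CPU cores.
results = Parallel(n_jobs=-1)(delayed(process_partition)(p) for p in partitions)
print(sum(results))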

Cloud Storage: Work with AWS, Azure, GCP, and Databricks. Store and manage data with S3, BigQuery, Redshift, Synapse Analytics, and HDFS.
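
On the AWS side, for instance, a minimal boto3 round trip; the bucket, key, and local file names are hypothetical:

import boto3

s3 = boto3.client("s3")

# Upload a local Parquet file to a curated bucket (names are placeholders).
s3.upload_file("orders.parquet", "curated-bucket", "orders/2025/01/orders.parquet")

# List objects under the prefix to verify the write.
response = s3.list_objects_v2(Bucket="curated-bucket", Prefix="orders/2025/01/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])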
 

Company Information