Required Skills

PySpark, AWS or Azure, Streaming, Databricks

Work Authorization

  • US Citizen

  • Green Card

  • EAD (OPT/CPT/GC/H4)

  • H1B Work Permit

Preferred Employment

  • Corp-Corp

  • W2-Permanent

  • W2-Contract

  • Contract to Hire

Employment Type

  • Consulting/Contract

Education Qualification

  • UG: Not Required

  • PG: Not Required

Other Information

  • No. of positions: 1

  • Posted: 22nd Nov 2023

JOB DETAIL

· Expert-level knowledge of data frameworks, data lakes and open-source projects such as Apache Spark, MLflow, and Delta Lake

· Expert-level hands-on coding experience in Spark/Scala, Python, or PySpark

· Expert proficiency in Python, C++, Java, R, and SQL

· Mid-level knowledge of code versioning tools such as Git, Bitbucket, or SVN

· In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, RDD caching, and Spark MLlib

· IoT/event-driven/microservices in the cloud: experience with private and public cloud architectures, their pros and cons, and migration considerations.

· Extensive hands-on experience implementing data migration and data processing using AWS/Azure/GCP services

· Expertise in using Spark SQL with various data sources such as JSON, Parquet, and key-value pairs

· Extensive hands-on experience with the technology stack available in the industry for data management, data ingestion, capture, processing, and curation: Kafka, StreamSets, Attunity, GoldenGate, MapReduce, Hadoop, Hive, HBase, Cassandra, Spark, Flume, Impala, etc.

· Experience using Azure DevOps and CI/CD as well as Agile tools and processes including Git, Jenkins, Jira, and Confluence

· Experience in creating tables, partitioning, bucketing, loading, and aggregating data using Spark SQL/Scala

· Able to build ingestion pipelines to ADLS and enable a BI layer for analytics

· Experience in Machine Learning Studio, Stream Analytics, Event/IoT Hubs, and Cosmos DB

· Strong understanding of data modeling and defining conceptual, logical, and physical data models.

· Proficiency in architecture design, build, and optimization of big data collection, ingestion, storage, processing, and visualization

· Working knowledge of RESTful APIs, OAuth2 authorization framework and security best practices for API Gateways

· Familiarity with unstructured data sets (e.g., voice, image, log files, social media posts, email)

· Experience in handling escalations arising from customers’ operational issues.

Responsibilities:

· Work closely with team members to lead and drive enterprise solutions, advising on key decision points, trade-offs, best practices, and risk mitigation

· Guide customers in transforming big data projects, including development and deployment of big data and AI applications

· Educate clients on cloud technologies and influence the direction of the solution.

· Promote, emphasize, and leverage big data solutions to deploy performant systems that appropriately auto-scale, are highly available, fault-tolerant, self-monitoring, and serviceable

· Use a defense-in-depth approach in designing data solutions and AWS/Azure/GCP infrastructure

· Assist and advise data engineers in the preparation and delivery of raw data for prescriptive and predictive modeling

· Aid developers in identifying, designing, and implementing process improvements with automation tools to optimize data delivery

· Build the infrastructure required for optimal extraction, loading, and transformation of data from a wide variety of data sources

· Work with the developers to maintain and monitor scalable data pipelines

· Perform root cause analysis to answer specific business questions and identify opportunities for process improvement

· Build out new API integrations to support continuing increases in data volume and complexity

· Implement processes and systems to monitor data quality and security, ensuring production data is accurate and available for key stakeholders and the business processes that depend on it

· Employ change management best practices to ensure that data remains readily accessible to the business

· Maintain tools, processes and associated documentation to manage API gateways and underlying infrastructure

· Implement reusable design templates and solutions to integrate, automate, and orchestrate cloud operational needs

· Experience with MDM using data governance solutions

 

Qualifications:

· Overall experience of 12+ years in the IT field.

· 2+ years of hands-on experience designing and implementing multi-tenant solutions using Azure Databricks for data governance, data pipelines for near real-time data warehouse, and machine learning solutions.

· 3+ years of design and development experience with scalable and cost-effective Microsoft Azure/AWS/GCP data architecture and related solutions 

· 5+ years’ experience in a software development, data engineering, or data analytics field using Python, Scala, Spark, Java, or equivalent technologies 

· Bachelor’s or Master’s degree in Big Data, Computer Science, Engineering, Mathematics, or similar area of study or equivalent work experience

· Nice to have:

·       Advanced technical certifications: Azure Solutions Architect Expert, AWS Certified Data Analytics, DASCA Big Data Engineering and Analytics, AWS Certified Cloud Practitioner, AWS Certified Solutions Architect, Google Cloud Professional certifications.
