Required Skills

Big Data, Hadoop, ETL/ELT, Data Warehouse

Work Authorization

  • US Citizen

  • Green Card

  • EAD (OPT/CPT/GC/H4)

  • H1B Work Permit

Preferred Employment

  • Corp-Corp

Employment Type

  • Consulting/Contract

Education Qualification

  • UG: Not Required

  • PG: Not Required

Other Information

  • No. of positions: 1

  • Posted: 4th Jan 2021

JOB DETAIL

Data Architect

Columbus, OH 

Visa: GC, USC, GC EAD, H4

JD:

Must be local to Columbus, Ohio, or willing to relocate to Columbus, and be ready to report on-site on Day 1 to pick up IT equipment and attend orientation. The role will initially be remote, but will move on-site shortly thereafter.

 

The Technical Specialist will be responsible for Enterprise Data Warehouse design, development, implementation, migration, maintenance, and operation activities. The candidate will work closely with the Data Governance and Analytics team and will be one of the key technical resources for various Enterprise data warehouse projects, building critical data marts and ingesting data into the Big Data platform for analytics and exchange with our partners. This position is a member of ITS and works closely with the Business Intelligence & Data Analytics team.

 

Responsibilities

  • Participate in team activities, design discussions, stand-up meetings, and planning reviews with the team.

  • Perform data analysis, data profiling, data quality checks, and data ingestion in various layers using big data/Hadoop/Hive/Impala queries, PySpark programs, and UNIX shell scripts.

  • Follow the organization's coding standard document; create mappings, sessions, and workflows as per the mapping specification document.

  • Perform gap and impact analysis of ETL and IOP jobs for new requirements and enhancements.

  • Create jobs in Hadoop using SQOOP, PySpark, and StreamSets to meet business user needs.

  • Create mock-up data, perform unit testing, and capture the result sets against the jobs developed in the lower environment.

  • Update the production support Run Book and the Control-M schedule document as per the production release.

  • Create and update design documents, providing a detailed description of workflows after every production release.

  • Continuously monitor the production data loads, fix issues, update the tracker document with the issues, and identify performance issues.

  • Performance-tune long-running ETL/ELT jobs by creating partitions, enabling full loads, and other standard approaches (see the partitioning sketch after this list).

  • Perform quality assurance checks and reconciliation after data loads, and communicate with the vendor to receive corrected data.

  • Participate in ETL/ELT code reviews and design re-usable frameworks.

  • Create Remedy/ServiceNow tickets to fix production issues, and create Support Requests to deploy Database, Hadoop, Hive, Impala, UNIX, ETL/ELT, and SAS code to the UAT environment.

  • Create Remedy/ServiceNow tickets and/or incidents to trigger Control-M jobs for FTP and ETL/ELT jobs on an ad hoc, daily, weekly, monthly, and quarterly basis as needed.

  • Model and create STAGE/ODS/Data Warehouse Hive and Impala tables as and when needed.

  • Create change requests, work plans, test results, and BCAB checklist documents for code deployment to the production environment, and perform code validation post-deployment.

  • Work with the Hadoop Admin, ETL, and SAS admin teams for code deployments and health checks.

  • Create re-usable UNIX shell scripts for file archival, file validation, and Hadoop workflow looping (see the file-validation sketch after this list).

  • Create a re-usable Audit Balance Control framework to capture reconciliation, mapping parameters, and variables, serving as a single point of reference for workflows.

  • Create PySpark programs to ingest historical and incremental data (see the PySpark sketch after this list).

  • Create SQOOP scripts to ingest historical data from the EDW Oracle database to Hadoop IOP, and create Hive table and Impala view creation scripts for dimension tables (see the Sqoop sketch after this list).

  • Participate in meetings to continuously upgrade functional and technical expertise.
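As a rough illustration of the PySpark ingestion duty above, the sketch below loads a historical (full) extract and a daily incremental extract into a partitioned Hive table. All paths, database, table, and column names (stg.stg_customer, load_dt, the landing directories) are hypothetical placeholders, not details from this project.

```python
# Minimal PySpark ingestion sketch; every name below is a placeholder.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("edw_ingestion_sketch")
    .enableHiveSupport()   # assumes the cluster exposes a Hive metastore
    .getOrCreate()
)

def ingest(path: str, target_table: str, mode: str) -> None:
    """Load a Parquet extract and write it into a partitioned Hive table."""
    df = (
        spark.read.parquet(path)                 # hypothetical landing-zone path
        .withColumn("load_dt", F.current_date()) # partition column
    )
    (
        df.write
        .mode(mode)               # "overwrite" for the one-time history, "append" for increments
        .partitionBy("load_dt")
        .saveAsTable(target_table)
    )

# Historical (one-time) load, then a daily incremental load.
ingest("/data/landing/customer/full/", "stg.stg_customer", mode="overwrite")
ingest("/data/landing/customer/delta/", "stg.stg_customer", mode="append")
```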
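The partition-based performance tuning and the STAGE/ODS/warehouse table modelling duties might look roughly like the following: a Parquet-backed, partitioned Hive table defined through Spark SQL so that Hive and Impala queries can prune partitions instead of scanning full history. The ods.claims_fact table, its columns, and the stg.claims_raw source are assumptions made only for illustration.

```python
# Sketch of a partitioned table definition and a dynamic-partition load via Spark SQL.
# The "ods" and "stg" databases, tables, and columns are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ddl_sketch").enableHiveSupport().getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS ods.claims_fact (
        claim_id     BIGINT,
        member_id    BIGINT,
        claim_amount DECIMAL(18, 2)
    )
    PARTITIONED BY (claim_month STRING)  -- partition pruning avoids full-history scans
    STORED AS PARQUET
""")

# Hive requires nonstrict mode for dynamic-partition inserts; harmless if Spark
# handles the Parquet write natively.
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

spark.sql("""
    INSERT OVERWRITE TABLE ods.claims_fact PARTITION (claim_month)
    SELECT claim_id, member_id, claim_amount, claim_month
    FROM stg.claims_raw
""")
```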
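For the Sqoop-based historical load from the EDW Oracle database, one common pattern is to wrap the sqoop import CLI in a small driver script; the sketch below does that from Python. The JDBC URL, credentials file, source table, and HDFS paths are placeholders, and it assumes the Sqoop client is available on the node running the script.

```python
# Sketch: driving a Sqoop historical import from Python (hypothetical connection details).
import subprocess

SQOOP_CMD = [
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@edw-host.example.com:1521/EDWPRD",  # placeholder JDBC URL
    "--username", "etl_user",
    "--password-file", "/user/etl/.oracle_pwd",    # HDFS path to a password file (placeholder)
    "--table", "CUSTOMER_DIM",                     # placeholder source table
    "--target-dir", "/data/landing/customer_dim",  # HDFS landing directory
    "--num-mappers", "4",
    "--as-parquetfile",
]

# check=True makes the wrapper exit non-zero if the import fails,
# so a scheduler such as Control-M can flag the job.
subprocess.run(SQOOP_CMD, check=True)
```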
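The file archival and validation duty calls for UNIX shell scripts; purely to keep these sketches in a single language, the same idea is shown in Python below. The landing/archive directory layout and the ".ctl" control-file convention holding an expected record count are assumptions.

```python
# Sketch: validate a landing file's record count against a control value, then archive it.
# Directory layout and the ".ctl" control-file convention are hypothetical.
import gzip
import shutil
from datetime import datetime
from pathlib import Path

LANDING_DIR = Path("/data/landing/claims")
ARCHIVE_DIR = Path("/data/archive/claims")

def validate_and_archive(data_file: Path) -> None:
    # The control file is assumed to hold the expected record count as plain text.
    expected = int(data_file.with_suffix(".ctl").read_text().strip())
    with data_file.open() as fh:
        actual = sum(1 for _ in fh) - 1  # minus header row

    if actual != expected:
        raise ValueError(f"{data_file.name}: expected {expected} rows, found {actual}")

    # Compress the validated file into a dated archive folder, then remove the original.
    dest_dir = ARCHIVE_DIR / datetime.now().strftime("%Y%m%d")
    dest_dir.mkdir(parents=True, exist_ok=True)
    with data_file.open("rb") as src, gzip.open(dest_dir / (data_file.name + ".gz"), "wb") as dst:
        shutil.copyfileobj(src, dst)
    data_file.unlink()

for f in sorted(LANDING_DIR.glob("*.csv")):
    validate_and_archive(f)
```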

 

Detailed Day-To-Day Job Duties to be performed

The detailed day-to-day duties are the same as the Responsibilities listed above.

Company Information