Required Skills

AWS, Spark, SQL, Python, Kafka, Data Modeling, Data Analysis

Work Authorization

  • US Citizen

  • Green Card

  • EAD (OPT/CPT/GC/H4)

  • H1B Work Permit

Preferred Employment

  • Corp-Corp

Employment Type

  • Consulting/Contract

Education Qualification

  • UG: Not Required

  • PG: Not Required

Other Information

  • No. of positions: 1

  • Posted: 6th May 2022

JOB DETAIL

Candidate Submission Format – Required from You

Full Legal Name
Personal Cell No. (not a Google Voice number)
Email ID
Skype ID
Interview Availability
Availability to start, if selected
Current Location
Open to Relocate
Work Authorization
Total Relevant Experience
Education / Year of Graduation
University Name, Location
Country of Birth
Contractor Type
Home Zip Code

Assigned Job Details

Job Title: Data Warehouse Admin
Location: Washington, DC
Employment Type: Consulting/Contract (best competitive rate)

Technical Skill Matrix – Required from You

Technical Skill                                                Number of Years of Experience

Data Warehousing                                               -- years

AWS EMR Spark                                                  -- years

Data Visualization                                             -- years

AWS, Spark, SQL, Python, Kafka, Data Modeling, Data Analysis   -- years

Cloud Development                                              -- years

JOB DESCRIPTION:

We are looking for a Data Warehouse Administrator with extensive experience performing database administration in a cloud environment using AWS EMR, with emphasis on WHD (Wage and Hour Division) data. The role develops and maintains an optimal data model and database design in a computing environment that processes a massive volume of WHD data for use by data analysts and data scientists.

Duties and Responsibilities:

This data warehouse contains data from WHD systems (Certification, Enforcement, Financial Management, Wage Determination, USMCA, etc.). Its design should be targeted toward and driven by WHD requirements for data reporting and analytics. Once the warehouse is built, leverage modern interactive visualization tools for data reporting and analytics from it as required.
Migrate data from WHD legacy DB2 databases and from the modernized applications' online transactional Oracle databases, using ETL tools, into a central reporting data warehouse on the DOL DAC (Data Analytics Capability) platform.
Incorporate data from other DOL internal and/or external sources into this data warehouse when required. Sources include, but are not limited to, ETA data, BLS survey data, SAM.gov, CLEAR, IVR data, and census survey data.
Design and build relational databases. Plan and coordinate database administration to ensure accurate, appropriate, and effective use of data, including database definition, structure, documentation, long-range requirements, and operational guidelines.
Deploy ETL procedures using AWS Database Migration Service (DMS); apply experience in data lake design, profiling, and conceptual modeling. Follow database design best practices: logical/physical modeling, version control for ER diagrams, object naming-convention standards, and audit columns.
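
A minimal sketch of this kind of DMS deployment, using boto3 (the region, ARNs, and schema name below are hypothetical placeholders, not values from this posting):

```python
# Minimal sketch: create a DMS replication task (full load + ongoing CDC).
# The region, ARNs, and schema name are hypothetical placeholders.
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

# Table mapping: include every table in one source schema.
table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-whd-schema",
        "object-locator": {"schema-name": "WHD_APP", "table-name": "%"},
        "rule-action": "include",
    }]
}

task = dms.create_replication_task(
    ReplicationTaskIdentifier="whd-oracle-to-warehouse",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SRC",
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TGT",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:INST",
    MigrationType="full-load-and-cdc",  # initial load, then ongoing CDC
    TableMappings=json.dumps(table_mappings),
)
print(task["ReplicationTask"]["Status"])  # e.g. "creating"
```
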
Implement the AWS migration strategy to move on-prem COTS/GOTS applications to the cloud. Conduct data modeling and provide maintenance and troubleshooting for databases.
Review database design and systems integration, develop data warehousing blueprints, and make recommendations regarding enhancements and/or improvements.
Formulate policies, procedures, and standards relating to database management, and monitor transaction activity and utilization.
Prepare and/or review activity, progress, and performance reports.
Translate business needs into long-term architecture solutions. Review and develop object and data models and the metadata repository to structure the data for better management and quicker access.
Support the documentation and maintenance of data strategy, plans, and artifacts, including the logical data model, physical data model, data dictionary, data roadmap, and data security, as needed, in a cloud environment, using the latest cloud-development technology and tools while adhering to industry best practices and Federal standards.
Adhere to AWS cloud data architecture best practices, such as clustered computing resources, massively parallel processing, scalability, high performance, high availability of data, flexibility to support multiple types of business users, load operations and refresh rates (e.g., batch, mini-batch, stream), query operations (e.g., create, read, update, delete), data processing engines (e.g., relational, OLAP, MapReduce, SQL, graphing, mapping, programmatic), and pipelines (e.g., data warehouse, data mart, OLAP cubes, visual discovery, real-time operational applications).
Define and maintain the Data Technology Architecture used to build the data objects, tables, views, and models that make the environment more efficient and effective.
Define and maintain the Data Integration Architecture and metadata.
When necessary and directed by the Government, reverse-engineer business rules and requirements from the legacy system and engage with program leadership to translate past practices into current analytics platform requirements.
Perform EMR Hadoop cluster administration and AWS storage lifecycle management (e.g., S3, Storage Gateway, FSx), including administration of AWS data migration and transfer services (e.g., AWS Glue, AWS DataSync, AWS Database Migration Service) and AWS database services for CDC (e.g., RDS, Redshift).
Proactively track metered cloud resource usage to optimize performance and reduce costs.
Establish and adhere to Service Level Agreements (SLAs) for cloud-resource Key Performance Indicators (KPIs), such as Recovery Point Objective (RPO), Recovery Time Objective (RTO), and Mean Time to Repair (MTTR).
Provide support services to the enterprise Data Strategy Support and Data Governance Support efforts.
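
One way such usage tracking might look in practice, sketched with boto3 and Cost Explorer (the dates are placeholders; this is an illustration, not a tool prescribed by the posting):

```python
# Minimal sketch: report last month's unblended cost per AWS service
# via Cost Explorer. The dates are placeholders.
import boto3

ce = boto3.client("ce", region_name="us-east-1")

result = ce.get_cost_and_usage(
    TimePeriod={"Start": "2022-04-01", "End": "2022-05-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

for group in result["ResultsByTime"][0]["Groups"]:
    service = group["Keys"][0]
    amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
    print(f"{service}: ${amount:,.2f}")
```
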
Provide technical and operational database administration: install, maintain, develop, and implement policies and procedures that ensure the security, availability, and integrity of Government/Agency databases and large data warehouse environments.
Apply experience with ETL development in general, and specifically with AWS Glue.
Build, execute, and maintain the AWS ETL/CDC (Change Data Capture), data processing, and DMS (Database Migration Service) jobs (AWS DataSync, AWS Database Migration Service) that build the data warehouse on the DAC platform, and provide production support for these jobs.
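
A minimal sketch of running and monitoring one such Glue job with boto3 (the job name is a hypothetical placeholder):

```python
# Minimal sketch: start an AWS Glue ETL job and poll it to completion.
# The job name is a hypothetical placeholder.
import time
import boto3

glue = boto3.client("glue", region_name="us-east-1")
job_name = "whd-warehouse-load"

run_id = glue.start_job_run(JobName=job_name)["JobRunId"]

# Poll until the run reaches a terminal state.
while True:
    state = glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]["JobRunState"]
    if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"):
        break
    time.sleep(30)

print(f"Job run {run_id} finished: {state}")
```
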
Provide analysis and definition of technical standards related to database resources and applications.
Apply knowledge of physical database design, performance tuning, and backup and recovery procedures to resolve database performance, capacity, replication, and other distributed-data issues.
Provide support resolving conversion and upgrade data issues. Liaise with project managers, business analysts, and developers to define and clarify requirements, and facilitate progress on project tasks.
Align with the systems engineering team to propose and deploy required new hardware and software environments and to expand existing ones.
Perform cluster maintenance, including node creation and removal, performance tuning, and capacity planning.
Manage and review log files.
Team diligently with the infrastructure, network, database, application, and business intelligence teams to guarantee high data quality and availability.
Collaborate with application teams to install operating system updates, patches, and version upgrades when required.
Perform other duties as assigned.

Desired Experience

Experience working in AWS or on other cloud projects; any relevant cloud certifications.
Specific experience with the following:
AWS CloudFormation
AWS Glue
AWS DynamoDB
AWS Lambda
AWS S3
AWS DMS
AWS Security (Access Control, Permissions, MFA, Data Encryption, Logs & Monitoring)
AWS SNS and SQS
AWS ALB
AWS RDS MySQL
AWS EC2, EBS, and EMR

Requirements:

4+ years of experience with data warehousing activities.
4+ years of experience in cloud data architecture and technologies such as AWS EMR Spark.
Experience working with data visualization tools such as Tableau, R, and Apache Spark.
Experienced in designing data architecture that supports high performance, scalability, and reusability for systems that are shared amongst different organizations.
Experience with these tools and techniques: AWS, Spark, SQL, Python, Kafka, Data Modeling, Data Analysis
Experience working in a cloud development platform that includes tools such as: AWS (server and serverless services), Ansible or Terraform, CloudFormation, Python, Jenkins, Git, security scanning tools (Nessus, Burp Suite, Netsparker, OWASP tools, etc.), IaC (Infrastructure as Code) techniques for the full Dev/Data Analytics/QA stack, SAS Viya, Hadoop, Airflow, AWS EMR HBase/Hive, QuickSight, Apache Ranger/Knox, AWS IAM, Ambari, Hive, SageMaker, Zeppelin, Jupyter, and the Python and Java stacks.

Company Information