Title: Data Engineer Scope of Services: Replicate functioning code from one environment to another (on-premise Mainframe/Bluebird to Google Cloud); the majority of the coding required will be in BigQuery/SQL, YAML, Python, and Spark. The effort will involve using data wrangling skills to understand patterns and perform data discovery that drive algorithmic development and enhancements.
- Create projects in GCP; store coding scripts, log files, and output files in GCP where they may be referenced (not necessarily run)
- Certify data management capabilities in the new system against the legacy system, with the same or improved results
- Review legacy processes and code created on-premise, compare them to the new Google Cloud Platform (GCP) version, ensure the new version is stable, verify results, and recommend best practices (see the comparison sketch after this list)
- Create new processes for new product development
- Develop strategies, standards, and best practices in the areas of data wrangling, data visualization, and data integration
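A legacy-versus-GCP validation step like the one described above could look something like the following minimal Python sketch using the google-cloud-bigquery client. The project, dataset, table, and column names (my-gcp-project, legacy_stage.accounts, curated.accounts, account_id) are illustrative assumptions, not actual environment values, and the real acceptance criteria would be defined by the team.

from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # hypothetical project ID

def table_profile(table: str) -> dict:
    """Return a row count and a simple column checksum for one table."""
    sql = f"""
        SELECT
          COUNT(*) AS row_count,
          SUM(FARM_FINGERPRINT(CAST(account_id AS STRING))) AS id_checksum
        FROM `{table}`
    """
    row = next(iter(client.query(sql).result()))
    return {"row_count": row.row_count, "id_checksum": row.id_checksum}

# Compare the legacy extract loaded into BigQuery with the new pipeline's output.
legacy = table_profile("my-gcp-project.legacy_stage.accounts")   # hypothetical table
migrated = table_profile("my-gcp-project.curated.accounts")      # hypothetical table

for metric in ("row_count", "id_checksum"):
    status = "OK" if legacy[metric] == migrated[metric] else "MISMATCH"
    print(f"{metric}: legacy={legacy[metric]} migrated={migrated[metric]} [{status}]")

Row counts alone rarely certify a migration; the fingerprint checksum is one cheap way to catch dropped or duplicated keys without comparing full tables.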
What is Needed:
- 5-7 years of data management and analysis experience in the credit risk, telecommunications, financial services, marketing, or fraud domains
- 5+ years of experience with data querying/handling tools (Spark, Python, SQL, DataLever/RedPoint DM, Spotfire, Tableau, or other industry-standard tools)
- 2+ years of experience with graph databases
- Experience analyzing large (multi-TB to PB), complex consumer or commercial data sets
- Strong knowledge of credit bureau data
- Experience developing and deploying data matching and entity resolution algorithms, and/or prior work on record linkage capabilities (see the record-linkage sketch after this list)
- Experience with Google Cloud
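As a point of reference for the entity-resolution requirement above, the following is a minimal record-linkage sketch using only the Python standard library: blocking on postal code, then fuzzy name comparison. The sample records, field names, and the 0.85 threshold are illustrative assumptions, not a description of any particular production matcher.

from collections import defaultdict
from difflib import SequenceMatcher

left = [
    {"id": "L1", "name": "Jon A. Smith", "zip": "30301"},
    {"id": "L2", "name": "Acme Holdings LLC", "zip": "10001"},
]
right = [
    {"id": "R1", "name": "John Smith", "zip": "30301"},
    {"id": "R2", "name": "ACME Holdings", "zip": "10001"},
]

def normalize(name: str) -> str:
    # Cheap canonicalization; real pipelines add nickname tables, suffix stripping, etc.
    return " ".join(name.lower().replace(".", "").split())

# Blocking: only compare record pairs that share a postal code.
blocks = defaultdict(list)
for rec in right:
    blocks[rec["zip"]].append(rec)

for l in left:
    for r in blocks.get(l["zip"], []):
        score = SequenceMatcher(None, normalize(l["name"]), normalize(r["name"])).ratio()
        if score >= 0.85:  # illustrative match threshold
            print(f"match {l['id']} <-> {r['id']} (score={score:.2f})")

Blocking keeps the number of pairwise comparisons manageable; without it, every left record is compared against every right record, which does not scale to the multi-TB data sets mentioned above.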