NH regularly receives a large volume of raw, unaltered data that makes its way into the data lake.
Many of the data sources cover the same content (e.g., Centrum might receive medical claim information from over a dozen different insurance companies), but each source sends us its data in its own bespoke format.
We're working on an initiative to land all of this data in a standard set of database tables, i.e., data normalization. For example, take the 12 medical claim spreadsheets/CSVs that arrive on a regular basis and produce a single list of ALL claims.
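To make the idea concrete, here is a minimal Scala sketch of normalization. Everything in it is hypothetical: the `Claim` fields, the two insurer row shapes (`PayerARow`, `PayerBRow`), and their formatting quirks are invented for illustration and are not NH's actual standard tables or source formats. It shows each bespoke source being mapped into one standard record, with the results combined into a single list.

```scala
import java.time.LocalDate

// Hypothetical standard claim record: the one shape every source is normalized into.
final case class Claim(
    claimId: String,
    memberId: String,
    serviceDate: LocalDate,
    billedAmount: BigDecimal,
    payer: String
)

// Hypothetical bespoke row shapes from two different insurers.
// Payer A: ISO dates (yyyy-MM-dd) and dollar amounts as strings.
final case class PayerARow(clmNo: String, mbrId: String, dos: String, chargeUsd: String)
// Payer B: date split into parts and amounts in cents.
final case class PayerBRow(id: String, patientId: String, year: Int, month: Int, day: Int, amountCents: Long)

object ClaimNormalization {
  def fromPayerA(r: PayerARow): Claim =
    Claim(r.clmNo, r.mbrId, LocalDate.parse(r.dos), BigDecimal(r.chargeUsd), "PayerA")

  def fromPayerB(r: PayerBRow): Claim =
    Claim(r.id, r.patientId, LocalDate.of(r.year, r.month, r.day), BigDecimal(r.amountCents) / 100, "PayerB")

  // Made-up sample rows standing in for two insurers' files.
  val payerARows: List[PayerARow] = List(
    PayerARow("A-1", "M-100", "2024-01-15", "125.50"),
    PayerARow("A-2", "M-101", "2024-02-01", "80.00")
  )
  val payerBRows: List[PayerBRow] = List(PayerBRow("B-9", "M-100", 2024, 3, 7, 4999L))

  // A single list of ALL claims, regardless of which format they arrived in.
  val allClaims: List[Claim] = payerARows.map(fromPayerA) ++ payerBRows.map(fromPayerB)

  def main(args: Array[String]): Unit = allClaims.foreach(println)
}
```

Each new source adds one more mapping function; downstream consumers only ever see `Claim`.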
The work / skills:
Overall, we're looking for contracting support at a level somewhere between a skilled analyst and a junior data engineer.
They need hands-on experience with Scala.
The job would be:
input:
a document that specifies how the source file format should be manipulated to match our standard format
a sample source file
a template for how the normalization code/configuration should be set up
output:
a normalization configuration for a given source file format
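The actual template is not described here, so the following is only a minimal sketch, assuming a normalization configuration consists of a column mapping plus date-parsing rules. `NormalizationConfig`, `ConfigDrivenNormalizer`, the standard column names, the semicolon-delimited sample file, and the ISO-8601 output convention are all hypothetical. It shows one config for an invented source and a generic `normalize` function that applies it to a sample file.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical normalization configuration for ONE source file format.
final case class NormalizationConfig(
    source: String,                 // label for the source
    delimiter: Char,
    columnMap: Map[String, String], // standard column -> source column
    dateColumns: Set[String],       // standard columns that hold dates
    datePattern: String             // date pattern used by this source
)

object ConfigDrivenNormalizer {
  // Assumed standard output columns, in order.
  val standardColumns: List[String] = List("claim_id", "member_id", "service_date", "billed_amount")

  // Applies a config to a raw file (header line + data lines) and returns one
  // map per data row, keyed by standard column. A missing column throws.
  def normalize(cfg: NormalizationConfig, rawFile: String): List[Map[String, String]] = {
    val lines = rawFile.trim.linesIterator.toList
    val header = lines.head.split(cfg.delimiter).map(_.trim).toList
    val fmt = DateTimeFormatter.ofPattern(cfg.datePattern)
    lines.tail.map { line =>
      val byHeader = header.zip(line.split(cfg.delimiter).map(_.trim).toList).toMap
      standardColumns.map { std =>
        val v = byHeader(cfg.columnMap(std))
        // Dates are rewritten as ISO-8601 (yyyy-MM-dd) so all sources agree.
        std -> (if (cfg.dateColumns.contains(std)) LocalDate.parse(v, fmt).toString else v)
      }.toMap
    }
  }

  // Example config for an invented semicolon-delimited source.
  val payerCConfig: NormalizationConfig = NormalizationConfig(
    source = "PayerC",
    delimiter = ';',
    columnMap = Map(
      "claim_id" -> "ClaimNumber",
      "member_id" -> "SubscriberID",
      "service_date" -> "DOS",
      "billed_amount" -> "TotalCharge"
    ),
    dateColumns = Set("service_date"),
    datePattern = "MM/dd/yyyy"
  )

  // Made-up sample source file.
  val sampleFile: String =
    """ClaimNumber;SubscriberID;DOS;TotalCharge
      |C-001;M-100;01/15/2024;125.50
      |C-002;M-205;02/03/2024;80.00""".stripMargin

  def main(args: Array[String]): Unit =
    normalize(payerCConfig, sampleFile).foreach(println)
}
```

One reason to prefer this shape is that per-source differences live in data rather than code, so onboarding a new source format would mean writing a new config rather than a new parser.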