Position : Big Data Engineer
Location : McLean, Virginia
Duration : 12+ Months
MOI : Phone + Skype
Visa : GC/USC
Required Experience:
Must Have:
- Python, PySpark, Apache Spark, NumPy, pandas
- AWS EMR is a MUST
Nice to Have:
JOB DESCRIPTION:
- Expertise in Python for developing data engineering (ETL) applications using Spark, pandas, NumPy, pyarrow/fastparquet, PySpark, pytest, and behave.
- Strong experience developing ETL pipelines on Apache Spark; experience with Hadoop, Databricks, etc. is a plus.
- Must have worked with AWS services such as EMR, S3, EC2, Athena, and ECS; knowledge of IAM, Step Functions, and Lambda is a plus.
- Must have advanced working SQL knowledge, including writing complex queries and tuning query performance.
- Experience with analytical stores such as Snowflake would be a plus.
- Experience working on data pipelines that consume a wide variety of data formats, e.g., Parquet, Avro, JSON, and CSV, from AWS S3.
- Experience creating CI/CD pipelines using Jenkins.
- Experience with Git and GitHub for version control.
- Experience working with machine learning models will be a plus.
- Experience with stream-processing systems (Spark Streaming, Storm, etc.) will be a plus.
- Experience building and optimizing big-data pipelines, architectures, and data sets.
- Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- Strong analytic skills related to working with unstructured datasets.
- Build processes supporting data transformation, data structures, metadata, dependency and workload management.
- A successful history of manipulating, processing, and extracting value from large, disconnected datasets.
- Strong project management and organizational skills.
- Experience supporting and working with cross-functional teams in a dynamic environment.
- Prior Capital One experience is a big plus.
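For candidates gauging fit, a minimal sketch of the kind of ETL transform the role describes (cast, clean, aggregate) is shown below in pandas. The column names and data are purely illustrative, not taken from any actual project; the same steps translate directly to PySpark DataFrame operations.

```python
import pandas as pd

# Hypothetical raw records, as might be loaded from a CSV in S3.
raw = pd.DataFrame({
    "trade_id": [1, 2, 3, 4],
    "amount": ["100.5", "200.0", None, "50.25"],  # strings with a null, typical of raw CSV
    "region": ["east", "west", "east", "west"],
})

# Typical ETL steps: drop unparseable rows, cast types, aggregate.
clean = raw.dropna(subset=["amount"]).copy()
clean["amount"] = clean["amount"].astype(float)
summary = clean.groupby("region", as_index=False)["amount"].sum()
```

In a Spark job on EMR the same logic would use `spark.read.csv(...)`, `dropna`, `withColumn` with a cast, and `groupBy(...).sum()`, with the result written back to S3 as Parquet.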