Job description
Requirements:
1. Data engineer with 4 to 6 years of hands-on experience working on Big Data platforms
2. Experience building and optimizing Big Data pipelines and data sets, from data ingestion through processing to data visualization.
3. Good experience in writing and optimizing Spark jobs, Spark SQL, etc. Should have worked on both batch and streaming data processing
4. Good experience in at least one programming language: Scala or Python (Python preferred).
5. Experience in writing and optimizing complex Hive and SQL queries to process large data volumes. Good with UDFs, tables, joins, views, etc.
6. Experience in using Kafka or any other message brokers
7. Configuring, monitoring and scheduling of jobs using Oozie and/or Airflow
8. Processing streaming data directly from Kafka using Spark jobs; experience in Spark Streaming is a must
9. Should be able to handle different file formats (ORC, Avro, and Parquet) and unstructured data
10. Should have experience with at least one NoSQL database or cloud object store (e.g., Amazon S3)
11. Should have worked on at least one data warehouse tool such as AWS Redshift, Snowflake, or BigQuery
12. Work experience on at least one cloud platform: AWS, GCP, or Azure
Job Responsibilities:
Good to have skills:
1. Experience in AWS cloud services like EMR, S3, Redshift, EKS/ECS, etc.
2. Experience in GCP cloud services like Dataproc, Google Cloud Storage, etc.
3. Experience in working with large Big Data clusters processing millions of records
4. Experience in working with the ELK stack, especially Elasticsearch
5. Experience in Hadoop MapReduce, Apache Flink, Kubernetes, etc.