Job description
The Senior Databricks Developer will be responsible for implementing and maintaining solutions on the AWS Databricks platform. You will coordinate data requests from the various teams, reviewing and approving efficient approaches to ingest, extract, transform, and maintain data in a multi-hop (medallion) model. In addition, you’ll mentor other developers to grow their knowledge and expertise. You’ll be working in a fast-paced, high-volume processing environment where quality and attention to detail are vital.
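For orientation, the "multi-hop model" above refers to layered refinement of data (bronze/silver/gold) in Delta tables. A minimal PySpark sketch of one bronze-to-silver hop follows; the table names, paths, and columns are hypothetical placeholders, not part of this posting:

    # Minimal bronze -> silver hop in a multi-hop (medallion) Delta pipeline.
    # Table names, paths, and columns are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Bronze: land raw events as-is, adding ingest metadata.
    raw = spark.read.json("s3://example-bucket/raw/events/")
    (raw.withColumn("_ingested_at", F.current_timestamp())
        .write.format("delta").mode("append")
        .saveAsTable("bronze.events"))

    # Silver: deduplicate and enforce a clean schema for downstream marts.
    bronze = spark.table("bronze.events")
    silver = (bronze
              .dropDuplicates(["event_id"])
              .filter(F.col("event_id").isNotNull())
              .select("event_id", "user_id", "event_type", "event_ts"))
    silver.write.format("delta").mode("overwrite").saveAsTable("silver.events")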
PRIMARY RESPONSIBILITIES
• Design and develop high-performance, secure Databricks solutions using Python, Spark, PySpark, Delta tables, UDFs, and Kafka
• Create high-quality technical documentation, including data mappings, data processes, and operational support guides
• Translate business requirements into data model design and technical solutions
• Develop data ingestion pipelines using Python, Spark, and PySpark to support near-real-time and batch ingestion (see the streaming sketch after this list)
• Maintain data lake and pipeline processes, including troubleshooting issues, tuning performance, and improving data quality
• Work closely with technical leaders, product managers, and the reporting team to gather functional and system requirements
• Work effectively in a fast-paced, agile development environment
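As an illustration of the near-real-time ingestion work above, here is a minimal Structured Streaming sketch that reads from Kafka and appends to a Delta table. The broker address, topic name, and S3 paths are hypothetical placeholders:

    # Minimal near-real-time ingest: Kafka -> Delta via Structured Streaming.
    # Broker address, topic, and paths are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "events")
              .option("startingOffsets", "latest")
              .load())

    # Kafka delivers key/value as binary; cast before parsing downstream.
    parsed = stream.select(
        F.col("key").cast("string"),
        F.col("value").cast("string").alias("payload"),
        F.col("timestamp").alias("kafka_ts"))

    query = (parsed.writeStream
             .format("delta")
             .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
             .outputMode("append")
             .start("s3://example-bucket/bronze/events/"))
    # query.awaitTermination()  # block until the stream is stopped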
KNOWLEDGE AND SKILL REQUIREMENTS
• Bachelor’s degree in Computer Science, Information Systems, or an equivalent field
• Must have 8+ years of experience developing applications using Python, Spark, PySpark, Java, JUnit, Maven, and the surrounding ecosystem
• Must have 4+ years of hands-on experience with AWS Databricks and related technologies such as MapReduce, Spark, Hive, Parquet, and Avro
• Solid experience with end-to-end implementation of DW/BI projects, especially data warehouse and data mart development
• Extensive hands-on experience with RDD, DataFrame, and Dataset operations in Spark 3.x (see the sketch after this list)
• Experience designing and implementing ETL/ELT frameworks for complex warehouses and marts
• Experience with large data sets, including performance tuning and troubleshooting
• A plus: AWS cloud analytics experience with Lambda, Athena, S3, EMR, Redshift, and Redshift Spectrum
• Must have RDBMS experience: Microsoft SQL Server, Oracle, MySQL
• Familiarity with Linux OS
• Understanding of data architecture, replication, and administration
• Experience with real-time data ingestion using any streaming tool
• Strong debugging skills to troubleshoot production issues
• Comfortable working in a team environment
• Hands-on experience with shell scripting, Java, and SQL
• Ability to identify problems and effectively communicate solutions to peers and management
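For candidates gauging the Spark 3.x expectation above, a minimal sketch of equivalent DataFrame and RDD operations follows; the data and column names are hypothetical. (Typed Dataset operations are available in the JVM APIs, while PySpark exposes DataFrames.)

    # Minimal Spark 3.x DataFrame and RDD operations; data is hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    df = spark.createDataFrame(
        [("a", 10), ("a", 5), ("b", 7)], ["key", "amount"])

    # DataFrame API: filter, aggregate, and order via the Catalyst optimizer.
    summary = (df.filter(F.col("amount") > 0)
                 .groupBy("key")
                 .agg(F.sum("amount").alias("total"))
                 .orderBy(F.desc("total")))
    summary.show()

    # RDD API: the lower-level functional interface over the same data.
    totals = (df.rdd
                .map(lambda row: (row["key"], row["amount"]))
                .reduceByKey(lambda a, b: a + b))
    print(totals.collect())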