Principal Data Engineer
Ai Palette
Bengaluru, Karnataka, India
Responsibilities:
Lead a team of data engineers specializing in data crawling, providing technical guidance, mentoring, and performance feedback.
Collaborate with cross-functional teams, including data scientists, analysts, and software engineers, to understand data requirements and develop scalable data crawling solutions.
Design, develop, and maintain data crawling pipelines, ensuring efficient and timely acquisition of data from various sources.
Evaluate and implement appropriate data crawling technologies and tools to optimize the crawling process and ensure data quality and integrity.
Develop and enforce data engineering best practices, standards, and processes related to data crawling.
Identify and resolve issues related to data crawling, such as handling complex data structures, mitigating crawling bottlenecks, and addressing website-specific challenges.
Collaborate with stakeholders to define data engineering project requirements, timelines, and deliverables related to data crawling.
Perform data extraction, transformation, and loading (ETL) tasks to convert crawled data into usable formats for downstream analysis and processing.
Monitor data crawling performance and implement mechanisms to ensure the reliability and scalability of crawling pipelines.
Stay up to date with the latest trends and advancements in data crawling techniques, web scraping frameworks, and related technologies.
Experience in data modeling.

Requirements:
Bachelor's or master's degree in computer science, data engineering, or a related field.
Proven experience (6-10 years) working as a data engineer, with a specialization in data crawling and web scraping.
Strong programming skills in languages such as Python, Java, or Scala, with expertise in web scraping frameworks like Scrapy, Beautiful Soup, or Selenium.
Solid understanding of web protocols (HTTP, HTTPS), HTML, CSS, and JavaScript to effectively crawl and extract data from websites.
Experience with distributed crawling frameworks such as Apache Nutch or Apache Storm is a plus.
Proficiency in SQL and database technologies (e.g., PostgreSQL, MySQL, or Oracle) for data storage and retrieval.
Familiarity with cloud platforms (e.g., AWS, Azure, or Google Cloud) and related data services for scalable and reliable data crawling.
Knowledge of data modeling, data warehousing, and ETL processes.
Strong analytical and problem-solving skills, with a focus on data quality and accuracy.
Excellent leadership and team management abilities, with a proven track record of leading data engineering teams.
Effective communication and collaboration skills, with the ability to explain complex technical concepts to non-technical stakeholders.