Job description
Description:
5-9 years of demonstrable experience designing technological solutions to complex data problems, and developing and testing modular, reusable, efficient, and scalable code to implement those solutions.
Ideally, this would include work on the following technologies:
Experience in at least one of Java, Scala, or Python (preferred); expert-level proficiency in Scala or PySpark is a strong advantage.
Strong understanding of and experience with distributed computing frameworks, particularly Apache Hadoop (YARN, MapReduce, HDFS) and associated technologies such as Hive, Sqoop, Avro, Flume, Oozie, ZooKeeper, and Impala.
Hands-on experience with Apache Spark and its components (Spark Streaming, Spark SQL, MLlib) is a strong advantage; an illustrative sketch of this kind of work follows this list.
Working knowledge of cloud computing platforms (AWS, Azure, or GCP)
Experience working in a Linux environment and with command-line tools, including shell/Python scripting to automate common tasks
Ability to work in a team in an agile setting, familiarity with Jira, and a clear understanding of Git or another version-control tool
In addition, the ideal candidate would have strong problem-solving skills and the ability and confidence to hack their way out of tight corners.
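For illustration only, the following is a minimal PySpark sketch of the kind of distributed data-processing work described above; the input path, column names, and output location are hypothetical placeholders, not part of any actual project.

# Minimal PySpark sketch: read raw events from distributed storage,
# aggregate with Spark SQL functions, and write a partitioned result.
# All paths, columns, and names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-event-rollup").getOrCreate()

# Read source data (e.g. from HDFS or cloud object storage).
events = spark.read.parquet("hdfs:///data/raw/events")

# Count events per user and day.
daily = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("user_id", "event_date")
    .agg(F.count("*").alias("event_count"))
)

# Write the result partitioned by date for downstream consumers.
daily.write.mode("overwrite").partitionBy("event_date").parquet("hdfs:///data/curated/daily_events")

spark.stop()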
Must-have (hands-on) experience:
Scala or Python/PySpark expertise
Distributed computing frameworks (Hadoop Ecosystem & Spark components)
Cloud computing platforms (AWS/Azure/GCP)
Linux environment, SQL, and shell scripting (a brief illustrative example follows)
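Likewise, a minimal sketch of the scripting-for-automation work mentioned above; the local export directory, file pattern, and HDFS target path are hypothetical.

# Illustrative automation script: push yesterday's export files into HDFS.
# The directory layout and HDFS target path are hypothetical.
import subprocess
from datetime import date, timedelta
from pathlib import Path

yesterday = (date.today() - timedelta(days=1)).isoformat()
local_dir = Path("/data/exports") / yesterday
hdfs_dir = f"/landing/exports/{yesterday}"

# Create the target directory and copy each file, failing fast on errors.
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)
for path in sorted(local_dir.glob("*.csv")):
    subprocess.run(["hdfs", "dfs", "-put", "-f", str(path), hdfs_dir], check=True)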