A career is an ongoing journey of discovery: our 52,000 people are shaping how the world lives, works and plays through next generation advancements in healthcare, life science and performance materials. For 350 years and across the world we have passionately pursued our curiosity to find novel and vibrant ways of enhancing the lives of others.
Job Title: IT EIM Data Engineering Process Improvement
Job Location: Bangalore
The Data Engineering ‘Process Improvement’ role is a Hadoop data engineering role responsible for helping the team develop automated end-to-end data pipelines for the data management and analytics platform (also referred to as “MCloud”) in an environment of continuous improvement. You will be an experienced, hands-on software engineer, and you will coach and mentor the team, focusing on improving development processes by leveraging continuous integration/continuous deployment (CI/CD) tools and techniques.
The MCloud platform comprises multiple technology stacks, hosted either on Amazon Web Services (AWS) infrastructure or on-premises in Org’s own data centers. These are:
- Hortonworks Hadoop environment (development cluster and a regulated production cluster)
- ELK (Elasticsearch, Logstash, Kibana) stack
- R and Python servers with connectivity to the Hadoop cluster
- Docker and Docker container technologies
This position will conduct architecture, code and testing reviews and develop documented processes and procedures to ensure consistent quality in the delivery of data engineering outputs. The individual must be capable of complex and creative problem solving with the ability to create an environment for others to develop their skills and techniques.
Roles & Responsibilities:
- Develop, maintain and test data systems and architectures, with an emphasis on automation
- Improve the development process by coaching others on best practice and running regular teaching sessions.
- Debug problems across a full stack of Hadoop tools and code based on Python, Scala and Java.
- Utilize unit testing frameworks in a ‘big data’ environment to ensure full coverage and test automation
- Champion the use of full automation in CI/CD
- Support other members of the organization both within and external to the team with the ability to explain functionality and assist in development
- Document procedures and processes in a professional and transparent way using a variety of tools
Qualifications:
- B.Sc. (or higher) degree in Computer Science, Engineering, Mathematics, Physical Sciences or related fields
- 5+ years of experience in system engineering or software development
- 3+ years of engineering experience in ETL-type work with databases and Hadoop platforms.
Deep knowledge of distributed file system concepts, MapReduce principles and distributed computing. Knowledge of Spark and the differences between Spark and MapReduce. Familiarity with encryption and security in a Hadoop cluster.
HDFS and Hadoop File System Commands
Creating and managing tables; experience building partitioned tables; HQL; controlling YARN queues in Hive operations
Full knowledge of Sqoop, including creating and running Sqoop jobs in incremental and full-load modes
Experience in creating Oozie workflows to control Java, Hive, Spark and Shell actions using
Experience in launching Spark jobs in client mode and cluster mode. Familiarity with the property settings of Spark jobs and their implications for performance.
Must have leveraged source code control and automated build & deploy tools on large projects, integrating continuous unit testing
Experience with or understanding of Docker is required, as is a good understanding of microservices architecture
Must be experienced in Enterprise Linux command line, preferably in SUSE Linux
Ability to write parameterized shell scripts using functions, and familiarity with Unix tools such as sed and awk
Must be an expert in Python or in at least one other high-level language such as Java, C or Scala. Must be familiar with Maven. Knowledge of Python virtual environments and Python package creation is a plus.
Must be an expert in manipulating database tables using SQL. Familiarity with views, functions, stored procedures and exception handling.
General knowledge of AWS Stack (EC2, S3, EBS, …)
IT Process Compliance
SDLC experience and formalized change controls
Fluent English skills
Specific information related to the position:
- Physical presence in primary work location (Bangalore)
- Flexibility to work in CEST and US EST time zones (according to team rotation plan)
What we offer: With us, there are always opportunities to break new ground. We empower you to fulfill your ambitions, and our diverse businesses offer various career moves to seek new horizons. We trust you with responsibility early on and support you to draw your own career map that is responsive to your aspirations and priorities in life. Join us and bring your curiosity to life!
Curious? Apply and find more information at