IT EIM Data Engineering Process Improvement

Posted 01 Feb 2019

Bangalore SBS, Karnataka - India

Req Id 183923


A career is an ongoing journey of discovery: our 52,000 people are shaping how the world lives, works and plays through next-generation advancements in healthcare, life science and performance materials. For 350 years and across the world, we have passionately pursued our curiosity to find novel and vibrant ways of enhancing the lives of others.

Job Title: IT EIM Data Engineering Process Improvement

Job Location: Bangalore

Job Details:

The Data Engineering ‘Process Improvement’ role is a Hadoop Data Engineering role responsible for helping the team develop automated end-to-end data pipelines for the data management and analytics platform (also referred to as “MCloud”) in an environment of continuous improvement. You will be an experienced, hands-on software engineer. In this role, you will coach and mentor the team, focusing on improving the development processes by leveraging continuous integration/continuous deployment (CI/CD) tools and techniques.

The MCloud platform comprises multiple technology stacks, hosted either on Amazon Web Services (AWS) infrastructure or on-premises in Org’s own data centers. These are:

  • Hortonworks Hadoop environment (development cluster and a regulated production cluster) 
  • ELK (Elasticsearch, Logstash, Kibana) stack 
  • R and Python Servers with connectivity to the Hadoop cluster. 
  • Docker and Docker container technologies

    This position will conduct architecture, code and testing reviews and develop documented processes and procedures to ensure consistent quality in the delivery of data engineering outputs. The individual must be capable of complex and creative problem solving with the ability to create an environment for others to develop their skills and techniques.

    Roles & Responsibilities: 

  • Ability to develop, maintain and test data systems and architectures especially with automation
  • Develop improvements to the development process, coaching others on best practice and running regular teaching sessions.
  • Debug problems across a full stack of Hadoop tools and code based on Python, Scala and Java.
  • Utilize unit testing frameworks in a ‘big data’ environment to ensure full coverage and test automation
  • Champion the use of full automation in CI/CD
  • Support other members of the organization both within and external to the team with the ability to explain functionality and assist in development
  • Document procedures and processes in a professional and transparent way using a variety of tools


  • B.Sc. (or higher) degree in Computer Science, Engineering, Mathematics, Physical Sciences or related fields 

    Professional Experience 

  • 5+ years of experience in system engineering or software development 
  • 3+ years of engineering experience, including ETL-type work with databases and Hadoop platforms.


Hadoop General

Deep knowledge of distributed file system concepts, MapReduce principles and distributed computing. Knowledge of Spark and the differences between Spark and MapReduce. Familiarity with encryption and security in a Hadoop cluster.


HDFS and Hadoop File System Commands


Creating and managing tables; experience of building partitioned tables; HQL; controlling YARN queues in Hive operations


Full knowledge of Sqoop, including creating and running Sqoop jobs in incremental and full-load modes


Experience in creating Oozie workflows to control Java, Hive, Spark and Shell actions using


Experience in launching Spark jobs in client mode and cluster mode. Familiarity with the property settings of Spark jobs and their implications for performance.


Must have leveraged source code control and automated build & deploy tools on large projects, integrating continuous unit testing


Experience with or understanding of Docker is required, as is a good understanding of microservices architecture


Must be experienced in Enterprise Linux command line, preferably in SUSE Linux 

Shell Scripting 

Ability to write parameterized shell scripts using functions, and familiarity with Unix tools such as sed, awk, etc.


Must be at expert level in Python or in at least one other high-level language such as Java, C or Scala. Must be familiar with Maven. Knowledge of Python virtual environments and Python package creation will be a plus.


Must be an expert in manipulating database tables using SQL. Familiarity with views, functions, stored procedures and exception handling.


General knowledge of AWS Stack (EC2, S3, EBS, …)

IT Process Compliance

SDLC experience and formalized change controls


Fluent English skills

Specific information related to the position:

  • Physical presence in primary work location (Bangalore)
  • Flexibility to work in CEST and US EST time zones (according to team rotation plan)

What we offer: With us, there are always opportunities to break new ground. We empower you to fulfill your ambitions, and our diverse businesses offer various career moves to seek new horizons. We trust you with responsibility early on and support you to draw your own career map that is responsive to your aspirations and priorities in life. Join us and bring your curiosity to life!

