Data Engineer - Enterprise Hadoop/Spark Platform

Posted 01 Feb 2019

Bangalore SBS, Karnataka - India

Req Id 184204


A career is an ongoing journey of discovery: our 52,000 people are shaping how the world lives, works and plays through next generation advancements in healthcare, life science and performance materials. For 350 years and across the world we have passionately pursued our curiosity to find novel and vibrant ways of enhancing the lives of others. 

Job Title: Data Engineer - Enterprise Hadoop/Spark Platform

Job Location: Bangalore

Job Details:

The Enterprise Data Engineering Team is responsible for designing, developing, testing, and supporting data pipelines on Org's enterprise data management and analytics platform (Hadoop and other components), also referred to as "MCloud". In this role, you will be part of a growing, global team of data engineers who collaborate in a DevOps approach to enable Org's business sectors with state-of-the-art technology to leverage data as an asset and make informed decisions.

The MCloud platform comprises several technology stacks, hosted either on Amazon Web Services (AWS) infrastructure or in Org's own data centers:

  • Hortonworks Hadoop environment (development cluster and GXP-regulated production cluster) 
  • ELK stack (Elasticsearch, Logstash, Kibana)
  • Palantir Foundry platform (proprietary technology stack)

The technology focus of this role is Hadoop, Spark and related technologies; however, you may be required to collaborate with team members whose main focus is one of the other technologies.

It is important to note that this role requires experience working in a strictly regulated IT context and knowledge of the applicable good practices (preferably related to Healthcare regulation). These include, but are not limited to: good documentation practices, software validation, change management for regulated software, and responsible management of deviations/non-conformances. Additional custom training will be provided, but willingness to work in a strictly regulated context is required.

Roles & Responsibilities:

  • Develop data pipelines in a Hadoop-based cluster environment
  • Participate in the end-to-end project lifecycle, from requirements analysis to go-live and operation of an application
  • Review code developed by other data engineers and check against platform-specific standards, cross-cutting concerns, coding and configuration standards and functional specification of the pipeline
  • Create high quality technical documentation; work must be documented in a professional and traceable way
  • Work out the best possible balance between technical feasibility and business requirements (the latter can be quite strict)
  • Advise technical team members and management staff
  • Deploy applications on MCloud platform infrastructure (especially Hortonworks Hadoop) with clearly defined checks
  • Implement changes and bug fixes via Org's change management framework and according to system engineering practices (additional training will be provided)
  • Set up DevOps projects following Agile principles (e.g. Scrum)
  • Besides project work, act as third-level support for critical applications (partly GXP-regulated); analyze and resolve complex incidents/problems together with MCloud support team members


Education:

  • B.Sc. (or higher) degree in Computer Science, Engineering, Physics or a related field

Professional Experience:

  • 5+ years of experience in system engineering or software development 
  • 3+ years of intensive experience working with an Apache Hadoop distribution



Technical Skills:

  • Experience with Big Data / Hadoop platforms (ideally Hortonworks Data Platform)
  • Experience with ELT/ETL tools
  • Data management / data structures: must be proficient in technical data management tasks, i.e. writing code to read, transform and store data
  • XML/JSON knowledge
  • Experience working with REST APIs
  • Shell scripting: ability to write Linux shell scripts
  • Deep experience in software development with Scala or Java; Python or R a plus (preferred)
  • Must be experienced in writing complex SQL statements
  • IT project management / process understanding: SDLC experience; working in DevOps teams based on Agile principles (e.g. Scrum); ITIL knowledge (especially incident, problem and change management)
  • Regulated industry: experience working in a regulated IT context (preferably Healthcare/GXP)
  • Fluent English skills (spoken and written)
  • Experience working with the Unix CLI
  • Basic knowledge of Enterprise Linux, ideally SUSE Linux (preferred)
  • Basic understanding of user authorization (Apache Ranger preferred)
  • General knowledge of the AWS stack: EC2, S3, EBS, etc. (preferred)

Specific information related to the position:

  • Physical presence in primary work location (Bangalore)
  • Must be able to work remotely to support issues during evenings and weekends
  • Flexible to work CEST and US EST time zones (according to project demand/team rotation plan)
  • Willingness to travel to Germany, US and potentially other locations (as per project demand)

What we offer: With us, there are always opportunities to break new ground. We empower you to fulfill your ambitions, and our diverse businesses offer various career moves to seek new horizons. We trust you with responsibility early on and support you to draw your own career map that is responsive to your aspirations and priorities in life. Join us and bring your curiosity to life!

Curious? Apply and find more information at

Apply Now


