A career is an ongoing journey of discovery: our 52,000 people are shaping how the world lives, works and plays through next generation advancements in healthcare, life science and performance materials. For 350 years and across the world we have passionately pursued our curiosity to find novel and vibrant ways of enhancing the lives of others.
Job Title: IT EIM Data Engineer - Hadoop with Containers
Job Location: Bangalore
The Data Engineering – Container Technology role is responsible for developing data products that utilize scalable container technology such as Docker, with integration into Hadoop and/or the ELK stack, for the Org IT Enterprise Information Management (EIM). In this role, you will be part of a growing, global team of DevOps engineers, system admins and infrastructure technicians who collaborate to design, build, test and implement solutions across Life Sciences, Finance, Manufacturing and Healthcare.
The EIM platform currently comprises multiple technology stacks, hosted either on Amazon Web Services (AWS) infrastructure or on-premises in Org’s own data centers. These are:
- Hortonworks Hadoop environment (development cluster and a regulated production cluster)
- ELK (Elasticsearch, Logstash, Kibana) stack
- R and Python Servers with connectivity to the Hadoop cluster.
- Docker and Docker container technologies
This position is expected to help build up Data Engineering expertise in Docker and Kubernetes environments in order to build data products that are scalable and highly available. The role is for a developer with Hadoop/Spark skills who can code and build solutions that are tightly integrated with big data and advanced analytics tools, so knowledge of Hadoop/Spark is essential. The individual must be capable of complex and creative problem solving and able to work in an agile development environment.
Roles & Responsibilities:
- Build and deploy Docker-based applications at scale using a variety of tools; write and maintain Docker repos; configure and write Dockerfiles
- Ability to understand and model data using traditional RDBMS methods and modern NoSQL solutions.
- Work closely with business users, data scientists/analysts to design logical and physical data models and data applications based on container technology
- Utilize automation to create pipeline processes for data ingestion, transformation and access to data catalog solutions
- Support other members of the organization both within and external to the team with the ability to explain functionality and assist in development
- Document technical work in a professional and transparent way
- Code in Python, Java/Scala and shell scripts
Qualifications:
- B.Sc. (or higher) degree in Computer Science, Engineering, Mathematics, Physical Sciences or related fields
- 5+ years of experience in system engineering or software development
- 3+ years of engineering experience with Docker and Hadoop platforms.
Must have experience in creating applications in Docker containers, writing Dockerfiles, using Docker repos and configuring Docker containers to run in scalable environments such as Kubernetes.
Deep knowledge of distributed file system concepts, map-reduce principles and distributed computing. Knowledge of Spark and the differences between Spark and Map-Reduce. Familiarity with encryption and security in a Hadoop cluster.
HDFS and Hadoop File System Commands
Creating and managing tables
Experience in launching Spark jobs in client mode and cluster mode. Familiarity with the property settings of Spark jobs and their implications for performance.
Must be experienced in the use of source code control systems such as Git
Experience developing ELT/ETL processes, including loading data from enterprise-scale RDBMS systems such as Oracle, DB2, MySQL, etc.
Must be experienced in Enterprise Linux command line, preferably in SUSE Linux
Ability to write parameterized shell scripts using functions, and familiarity with Unix tools such as sed, awk, etc.
Must be at expert level in Python and in at least one other high-level language such as Java, C or Scala.
Must be an expert in manipulating database tables using SQL. Familiarity with views, functions, stored procedures and exception handling.
General knowledge of AWS Stack (EC2, S3, EBS, …)
IT Process Compliance
SDLC experience and formalized change controls
Fluent English skills
Specific information related to the position:
- Physical presence in primary work location (Bangalore)
- Flexibility to work in CEST and US EST time zones (according to team rotation plan)
What we offer: With us, there are always opportunities to break new ground. We empower you to fulfill your ambitions, and our diverse businesses offer various career moves to seek new horizons. We trust you with responsibility early on and support you to draw your own career map that is responsive to your aspirations and priorities in life. Join us and bring your curiosity to life!
Curious? Apply and find more information at