IT EIM Data Engineer - Enterprise Pipelines

Posted 01 Feb 2019

Bangalore SBS, Karnataka - India

Req Id 183922

Details

A career is an ongoing journey of discovery: our 52,000 people are shaping how the world lives, works and plays through next generation advancements in healthcare, life science and performance materials. For 350 years and across the world we have passionately pursued our curiosity to find novel and vibrant ways of enhancing the lives of others. 
 


Job Location: Bangalore

Job Details:

The Data Engineering Enterprise Pipeline role is responsible for developing automated end-to-end data pipelines for the enterprise data management and analytics platform, also referred to as “MCloud”. In this role, you will be part of a growing, global team of DevOps engineers, system admins and infrastructure technicians who collaborate to design, build, test and implement solutions across Life Sciences, Finance, Manufacturing and Healthcare.

The MCloud platform comprises multiple technology stacks, which are hosted either on Amazon Web Services (AWS) infrastructure or on premises in Org’s own data centers. These are:

  • Hortonworks Hadoop environment (development cluster and a regulated production cluster) 
  • ELK (Elasticsearch, Logstash, Kibana) stack 
  • R and Python Servers with connectivity to the Hadoop cluster. 
  • Docker and related container technologies

    This position will execute sophisticated operations with data sources to acquire, transform and organize data that is required to develop insightful and actionable information. The individual must be capable of complex and creative problem solving with the ability to work in an agile development environment.

    Roles & Responsibilities: 

  • Develop, maintain and test data systems and architectures, especially through automation
  • Work closely with business users, data scientists/analysts to design logical and physical data models
  • Debug problems across a full stack of Hadoop tools and code based on Python, Scala and Java.
  • Utilize automation to create set processes for specific data ingestion, transformation and access procedures
  • Support other members of the organization both within and external to the team with the ability to explain functionality and assist in development
  • Document technical work in a professional and transparent way

    Education 

  • B.Sc. (or higher) degree in Computer Science, Engineering, Mathematics, Physical Sciences or related fields 

    Professional Experience 

  • 5+ years of experience in system engineering or software development 
  • 3+ years of engineering experience in ETL-type work with databases and Hadoop platforms

    Skills

Hadoop General

Deep knowledge of distributed file system concepts, MapReduce principles and distributed computing. Knowledge of Spark and the differences between Spark and MapReduce. Familiarity with encryption and security in a Hadoop cluster.

HDFS

HDFS and Hadoop File System Commands

Hive

Creating and managing tables; experience building partitioned tables; HQL; controlling YARN queues in Hive operations

Sqoop

Full knowledge of Sqoop, including creating and running Sqoop jobs in incremental and full-load modes

Oozie

Experience in creating Oozie workflows to control Java, Hive, Spark and Shell actions using

Spark

Experience in launching Spark jobs in client mode and cluster mode. Familiarity with the property settings of Spark jobs and their implications for performance.

SCC/Git

Must be experienced in the use of source code control systems such as Git

ETL 

Experience developing ELT/ETL processes, including loading data from enterprise-sized RDBMS systems such as Oracle, DB2, MySQL, etc.

Linux 

Must be experienced in Enterprise Linux command line, preferably in SUSE Linux 

Shell Scripting 

Ability to write parameterized shell scripts using functions, and familiarity with Unix tools such as sed, awk, etc.

Programming 

Must be at expert level in Python or in at least one other high-level language such as Java, C or Scala.

SQL 

Must be an expert in manipulating database tables using SQL. Familiarity with views, functions, stored procedures and exception handling.

AWS 

General knowledge of the AWS stack (EC2, S3, EBS, …)

IT Process Compliance

SDLC experience and formalized change controls

Languages 

Fluent English skills

Specific information related to the position:

  • Physical presence in primary work location (Bangalore)
  • Flexibility to work in CEST and US EST time zones (according to team rotation plan)


What we offer: With us, there are always opportunities to break new ground. We empower you to fulfill your ambitions, and our diverse businesses offer various career moves to seek new horizons. We trust you with responsibility early on and support you to draw your own career map that is responsive to your aspirations and priorities in life. Join us and bring your curiosity to life!

