Sr. Site Reliability Engineer

San Francisco, CA

About:

 

My client is the world’s leading personalization provider. Their cloud-based solution empowers the world’s best companies to personalize their customers’ experiences, resulting in over $24B in attributable sales.

 

Job Description:

 

We are looking for a Site Reliability Engineer to join our Site Operations team, which is responsible for the support and maintenance of our SaaS platform. This position will help my client scale to use the petabytes of data retailers bring into our systems, all while increasing efficiency, speed of delivery, and reliability at every level of service. One of the most common problems with big data sets is facilitating access to them (especially with nascent technologies), and our team constantly identifies and vets new ways to enable our customers, internal and external, to see and use data.

 

Responsibilities:

 

· Support of production infrastructure & services, including our worldwide APIs & Hadoop-based storage & processing.
· Automation of configuration, management & deployment (Puppet, Docker, etc.).
· Troubleshooting production and development issues, top to bottom, including application, network, & hardware.
· Management of clusters of all types: Hadoop, Mesos, Riak, Cassandra.
· Implementation of service monitoring.
· Elimination of repetitive tasks.
· Java application support.
· Participation in the 24/7 on-call rotation.
· Documentation of new projects and internal education about them.

 

Required Skills:

 

· BS/MS in CS or Engineering discipline or equivalent experience.
· Linux administration (preferably RHEL/CentOS).
· 3-5 years of production Internet SaaS systems management & administration.
· Understanding of distributed storage and computing frameworks, with experience in at least one of Hadoop, Mesos, or Riak.
· Basic SQL and familiarity with basic database administration (PostgreSQL a plus).
· Familiarity with Nagios, Datadog, Ganglia, and other monitoring tools.
· Scripting in Bash, Perl, Python, or Ruby.
· Experience implementing open-source packages. Active contribution a plus.
· Team player: it’s a small team, and we rely on one another.
· Great communicator: we act as a team, celebrate successes together, and brainstorm our way through challenges together.