As a Site Reliability Engineer, you will drive core SRE and DevOps values. You’ll bring thought leadership and disciplined execution to spearhead the Site Reliability Engineering role responsible for an enterprise LIMS.
What you can expect:
Manage unplanned downtime - a system in motion (Production)
• Perform advanced troubleshooting and triage to insulate build teams
• Partner with teams to define service level objectives, and implement monitoring tools, alerting, and dashboards
• Tune alerting and logging to reduce false positives and false negatives
• Collaborate with the Infrastructure team on service capacity planning, monitoring, and demand planning
• Analyze system health, errors, and run-time statistics to provide input into the development roadmap
• Spend 50% of your time developing system configurations and resolving defects to improve overall stability, performance, and scalability
Manage unplanned downtime - a system at rest (Non-Production)
• Create dashboards, metrics, and code analysis tooling for early detection and prevention of defects
• Perform and direct peer reviews to ensure compliance with best practices and adoption of the technical roadmap
• Automate, create, review, and execute deployment plans
Manage Technical Debt work – Establish Governance Model
• Put rigorous quality gates in place to keep a strong focus on release quality
• Build and run capacity tests to manage growth in the number of users on the application
• Ensure scalability, availability, reliability, and security are delivered for all internal and client-serving systems
Foster DevOps
• Work on the DevOps toolchain (Git repositories; DevOps tools such as Jenkins, UCD, Docker, Kubernetes, and JFrog Artifactory; and cloud monitoring tools)
• Collaborate with Development, QA, and Release teams to achieve continuous integration and delivery
• Handle build, release, and configuration management
• Suggest architecture improvements and recommend process improvements
• Engineer automation solutions within DevSecOps
• Implement continuous build, integration, and deployment systems, as well as infrastructure-as-code systems
What you will need to succeed:
Experience with building & maintaining complex, scalable, and distributed systems
Good understanding of Software Engineering and Computer Science principles
Ability to formulate SRE & DevOps scripting solutions from scratch
Experience working with the AWS platform and serverless technologies
Experience with Maven and CI build tools such as Jenkins is a must
Hands-on experience in Linux administration and troubleshooting
Ability to define SLIs and SLOs
Additional experience in networking, security, or storage is an advantage
Experience designing and building REST APIs
Nice to Have Experience:
Experience working with Google Cloud Platform (App Engine, Cloud Datastore, BigQuery)
Education: Bachelor’s degree