Site Reliability Engineer (London) Moonpig
Moonpig is an innovative tech business with a mission to create closer connections by making moments real for our customers. Our culture is central to our success. We’re driven to sustain our phenomenal growth year on year, and this means we’re always working closely and collaboratively to turn our ideas into reality. It’s this sense of pace, innovation and continuous improvement of pretty much everything we do that makes Moonpig so exciting and unique. We truly believe our work has a genuine impact.
The DevOps team are responsible for the overall availability, performance and reliability of the entire Moonpig platform. At peak times, we process up to 300 orders a minute. As each product is highly personalised and emotively connected, we cannot afford to get things wrong.
We are a company that loves new technology. We run a diverse platform that is being migrated to be 100% hosted on AWS, utilising the best of what it has to offer. Coupled with our own tooling, this allows us to embrace Continuous Delivery, DevOps and Cloud environments to our full potential.
What you'll be working on:
As a Site Reliability Engineer at Moonpig, you will play a critical role in the delivery and optimisation of our eCommerce and Production Engineering infrastructure. The Technical Operations team at Moonpig are responsible for the end-to-end health of our systems and services, which are required to be available 24x7. You will have direct communication lines to solution architects and software engineers operating in an agile environment to drive availability, performance, security and cost.
As a leading member of the Photobox Group Technical Operations team, you will be expected to work closely with colleagues in Software Engineering, Product Management and Architecture. You must be able to work as part of a team or independently and be able to prioritise work items effectively. This is an excellent opportunity to work within an extremely dynamic environment, and the role expects you to be efficient and engaging. You will have the opportunity to have a direct influence on day-to-day operations, driving automation and the transition of infrastructure to cloud service providers.
As a Site Reliability Engineer, you will be responsible for delivering a secure, performant, resilient, cost-optimised, high-availability environment. You will be involved in the design and architecture of the infrastructure required to deliver and sustain our portfolio of online services, and you will be responsible for the build and configuration of tools to enable automated deployment, management and monitoring of those services.
- Enterprise Technology: Experience with highly available, high transactional websites and applications within microservices architecture, clustered systems, N+1 architecture, automated deployments, disaster recovery and business continuity.
- Operating Systems: Microsoft Windows Server (including Active Directory, DNS, DHCP, IIS), Linux
- AWS: EC2, S3, Lambda, VPC, CloudWatch, ALB, Terraform
- Automation, Scripting and Infrastructure as Code: Octopus Deploy, TeamCity, PowerShell, Ansible/AWX, Bash, Python, Ruby, GitHub
- Monitoring: Prometheus, PRTG, AppDynamics/New Relic Insights/APM, Elasticsearch, Kibana, Grafana
- Production experience with frontend web services including IIS, Apache and NGINX.
- Experience with e-commerce and website operations (WebOps).
- Experience with monitoring & web analytics tools.
- Experience working with Public DNS and Certificates.
- Akamai / CDN experience.
- Understanding of Networking, TCP/IP, Firewalls, NAT Instances, application load balancers and traffic management.
- Firm grasp of security and its importance within a cloud environment (PCI-DSS/SecOps).
- Understanding of database technologies such as MSSQL, DynamoDB.
- Understanding of DevOps and Agile methodologies.
Want to hear more about Moonpig and our benefits? Take a look at our dedicated hiring site https://hiring.moonpig.io/