Ryan Cocks
Verified
Cleared

Engineering

Transitive

4.0
London, United Kingdom
~19.4 yrs in the field
DevOps
Cloud Engineering
Site Reliability
AWS DevOps
DNS
AWS Cloud
Git
Linux
Agile Development
Unix
AWS
Terraform
Trading Software
Microservices Development
About

Ryan is experienced in developing reliable and scalable production cloud systems. He specializes in SRE, DevOps, microservices, cloud architecture, and observability. He has a solid technical background as a back-end developer. He has good soft skills, is self-motivated, and is comfortable networking to achieve project goals. Ryan has an excellent ability to understand the business needs behind requirements and is able to program in several languages.

Experience

About 19.4 yrs of professional experience, estimated from the roles below (overlaps counted once).

  1. Resident Architect

    Honeycomb

    Jan 2025 - Present

    Onboarding large enterprise organisations onto Honeycomb. OpenTelemetry migrations and custom instrumentation. Honeycomb.io setup and refinement. Educating clients on Honeycomb and observability 2.0.

  2. DevOps Developer

    Pfizer - PGS Operations And Insights

    Jan 2025 - Jan 2025

    Added OpenTelemetry instrumentation across services. Set up Sysdig dashboards to monitor deployed services. Introduced root cause analysis for production issues and got the project owner on board. Instrumented Elasticsearch monitoring for shard issues. Onboarded a project onto Honeycomb for traces and metrics.

  3. Site Reliability Engineer (Datadog Specialist)

    BCG - Gamma

    Jan 2021 - Jan 2023

    Worked with multiple product teams within the organization, designing their observability (monitoring) solutions. Guided teams on architectural considerations for observability. Defined observability best practices and coached the various teams. Worked to get as close to real-time awareness of customer-visible issues as possible. Segmented alerting into different paths for different levels of severity. Developed Terraform to set up Datadog dashboards and alerting for Kubernetes clusters and canonical-architecture (front end, back end, and database) applications.

  4. Site Reliability Engineer (ECS)

    Toptal Project

    Jan 2020 - Jan 2021

    Re-architected parts of the system that were vulnerable to high load, resulting in no performance degradation during peak Black Friday traffic. Launched the new version of their website on the new infrastructure with only 10 minutes of planned downtime. The total downtime over two years on the project was less than three hours. Implemented alerting and monitoring for the new clusters. Customized Fastly CDN to provide outage mitigation. Wrapped the endpoint for an unreliable third-party API with a CDN-managed endpoint that redirected to a backup if latency was high on the main API. Coached the team to improve their architectural designs according to the twelve-factor app principles and SRE best practices. Created Terraform-managed AWS Fargate clusters for deployed services.

  5. Site Reliability Engineer (EKS)

    Global Fashion Group

    Jan 2019 - Jan 2019

    Created new Terraform-managed AWS EKS Kubernetes clusters (multi-region). Executed live cluster migrations to new Kubernetes clusters with zero downtime. Broke up a PHP back end into microservices, which improved reliability and scalability. Moved self-hosted services, including Redis and SQL databases, to AWS-managed equivalents, improving reliability. Replaced Jenkins with AWS CodePipeline, which reduced maintenance costs. Replaced legacy storage with S3, resulting in improved reliability. Reworked database usage, eliminating bottlenecks under high load.

  6. DevOps Engineer and Release Manager

    HERE Technologies

    Jan 2016 - Jan 2018

    Designed and developed Jenkins deployment pipelines into AWS. Contributed to the programmatic generation of Jenkins pipelines using Job DSL. Set up production Docker on Amazon EC2 instances. Ran AWS autoscaling, microservices, Kafka, Flink, and windowed stream processing. Developed IoT-specific testing that fed continuous test data into production, which made it possible to build real-time dashboards identifying which part of a complex microservices system was failing.

  7. Test Lead

    HERE Technologies

    Jan 2015 - Jan 2016

    Oversaw analytics and A/B testing using Apptimize and Amplitude. Developed test strategies for mobile devices.

  8. Test Lead

    Auckland Transport

    Jan 2013 - Jan 2014

    Defined and executed test strategies for citywide critical infrastructure. Created tooling to optimize work methods.

  9. Test Lead

    Serato, Inc.

    Jan 2012 - Jan 2013

    Oversaw and mentored junior developers. Introduced tools and processes for bug tracking, test management, peer review, crash report collection and analysis, and beta test cycles, and improved communication between the customer support and product management teams. Tested iOS apps. Helped Scrum teams adopt best practices in their testing and quality control.

  10. Test Team Manager

    IBM

    Jan 2011 - Jan 2012

    Oversaw the management and technical rigor of a team of 11 testers, covering five products in flight from IBM's virtualization, security, operating system performance, and failover stacks. Changed the way the development and QA teams interacted by focusing on rapid iterative feedback, which reduced release cycles from 2-3 months to 2-3 weeks. Successfully oversaw two new major product launches.

  11. Project Manager

    IBM

    Jan 2010 - Jan 2011

    Managed the development and release cycle for a small software team.

  12. C++ Developer

    Transitive

    Jan 2001 - Jan 2009

    Developed automated testing infrastructure, including toolchains (cross-linking and bootstrapping build systems), assembly, linkers, CPU, and memory management architecture (SPARC, x86, x86_64, ARM, Itanium), and Linux kernel patching and building. Developed dynamic binary translators that would load binaries for one processor and execute them on another using the UNIX kernel interface (syscalls). Acted as the lead engineer on a specialist performance analysis team. Studied the principles of performance analysis and improvement and applied them to solve performance issues when clients experienced lower-than-expected on-site performance.