Moving the Insurance Marketplace Infrastructure to a New Cloud
Mafin is an insurance aggregator and car insurance marketplace that collects offers from various insurance companies and helps calculate the cost of policies. On the website, you can buy insurance for cars, real estate, health, and travel.
The company turned to Evrone to develop a new infrastructure project and to help create a B2B platform for the finance and insurance industry. Our task was to transfer all services to a single cloud and reduce the cost of supporting it.
The client had a large cloud and many virtual machines. Previously, developers set up an environment for each pull request, which consumed many resources that were not released in a timely manner. Therefore, the company decided to switch to static environments: a sandbox for developers, staging for testing releases, and a full production environment.
Initially, we developed sizing based on the client's brief, which included transferring a single Rails application, a version control system, continuous integration, automation, and most of the infrastructure.
At this stage, we created the infrastructure from scratch in Yandex Cloud. We set up a Managed Kubernetes cluster hosting several environments, each with its own databases and services and completely isolated from the others. For development processes, we used a continuous integration approach, and for application release automation, GitOps. Flux CD became the main tool for setting up environments and delivering applications automatically.
We exported basic infrastructure components, set up a database operator for the environments, and configured a closed network perimeter. For security reasons, all services in internal environments were placed on internal domains. The company already used a VPN for access to the cluster, so we built the same closed infrastructure and set up a convenient interface for issuing VPN keys to users.
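The core idea behind GitOps is that the desired state lives in Git and a controller keeps the cluster in line with it. The following Python sketch illustrates only that comparison step. It is not how Flux CD is implemented, and the `reconcile` function, the service names, and the image tags are hypothetical. Deleting resources that are absent from Git is modeled here as an assumption.

```python
def reconcile(desired, actual):
    """Compare desired state (from Git) with actual state (in the cluster)
    and return the list of changes needed to make them match."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    # Assumption: anything running that is not declared in Git gets removed.
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical example data
desired = {"web": {"image": "web:1.4"}, "worker": {"image": "worker:2.0"}}
actual = {"web": {"image": "web:1.3"}, "legacy-cron": {"image": "cron:0.9"}}

print(reconcile(desired, actual))
# [('update', 'web'), ('create', 'worker'), ('delete', 'legacy-cron')]
```

Because the function depends only on the two states, running it again after the changes are applied returns an empty list, which is the property that makes this style of delivery repeatable.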
When we moved on to transferring the main Rails service to the new infrastructure, it turned out that it depended on many PHP services, and it was impossible to transfer it separately. The scale of the problem was not initially understood, as the service was a legacy of another project.
After a thorough audit of the application and its dependent services with the client's team, we found that we needed to transfer nine services instead of one. They came with their own MySQL, MongoDB, Elasticsearch, and ClickHouse databases. The client approved the new plan, and we got to work.
Together with the in-house team, we started debugging. For some services, the configuration was hardcoded directly in the code, while others were not configured at all. Some services belonged to the backend, others to the frontend. Thanks to our joint efforts, we managed to understand the legacy code, transfer everything necessary to the new infrastructure, configure it, and set up deployment automation for each service in a unified style.
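One common way to replace configuration hardcoded in the code is to read it from environment variables at startup, so the same build can run in the sandbox, staging, and production. The following is a minimal Python sketch of that idea, not the code used in the project. The `ServiceConfig` class, the `load_config` function, and the variable names `DATABASE_URL`, `SEARCH_URL`, and `LOG_LEVEL` are all hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    """Settings a single service needs; the field names are illustrative."""
    database_url: str
    search_url: str
    log_level: str = "INFO"


def load_config(env=None):
    """Build the config from environment variables.

    Fails early with a clear message if a required variable is missing,
    instead of silently falling back to a hardcoded value.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("DATABASE_URL", "SEARCH_URL") if not env.get(name)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return ServiceConfig(
        database_url=env["DATABASE_URL"],
        search_url=env["SEARCH_URL"],
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to check in isolation, for example `load_config({"DATABASE_URL": "...", "SEARCH_URL": "..."})`.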
For the new infrastructure, we connected Sentry for runtime error tracking, which also provided performance metrics. During the first production launch, we found that not all partners had reconnected to our new IP addresses. Because of this, a large part of the business logic failed: calculators did not work, and responses from insurance companies did not arrive. Once the formalities were settled, the system worked correctly.
Evrone engineers focused on the new environment: we set up the infrastructure, backup, and disaster recovery processes, and wrote the documentation. The client's engineers reconfigured the old environment to work with the new cluster, including interaction with the security system, firewalls, and routing. Along the way, they learned the processes, templates, and methodologies of the new infrastructure, so they can now support it on their own.
Although the project did not go as planned, we solved the main task of reducing spending on cloud resources. Monthly maintenance costs fell to a third of their previous level. Today, 30 virtual machines are running instead of the hundred at the start. Of the old machines, only the proxies for traffic to the new environment and Nexus, used as artifact storage, remain. We transferred about 2.5 TB of data to the new cluster and set up monitoring and analytics. ETL processes are managed with Apache Airflow.
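A check of this kind can be run before a launch to catch the problem earlier. The sketch below assumes, hypothetically, that each partner confirms which source addresses or networks it accepts, and it reports the partners that do not yet allow every new outbound IP. The `partners_not_ready` function, the partner names, and the addresses are invented for illustration (the IPs come from ranges reserved for documentation), and this is not the procedure used in the project.

```python
import ipaddress


def partners_not_ready(new_egress_ips, partner_allowlists):
    """Return {partner: [new IPs its allowlist does not cover]}.

    Partners whose allowlist already covers every new IP are omitted.
    """
    not_ready = {}
    for partner, networks in partner_allowlists.items():
        # A bare address such as "198.51.100.7" is treated as a /32 network.
        allowed = [ipaddress.ip_network(n, strict=False) for n in networks]
        missing = [
            ip for ip in new_egress_ips
            if not any(ipaddress.ip_address(ip) in net for net in allowed)
        ]
        if missing:
            not_ready[partner] = missing
    return not_ready


# Hypothetical data: the new cluster's outbound IPs and what each partner allows
new_egress_ips = ["203.0.113.10", "203.0.113.11"]
partner_allowlists = {
    "insurer-a": ["203.0.113.0/24"],
    "insurer-b": ["198.51.100.7", "203.0.113.10"],
}

print(partners_not_ready(new_egress_ips, partner_allowlists))
# {'insurer-b': ['203.0.113.11']}
```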
Thanks to this project, we have restructured our approach to work. We now begin by auditing the current infrastructure or the entire project, so that we can assess risks and give clients an accurate estimate of deadlines, workload, and costs. In this case, even the client did not know all the nuances of the project, so in the future we will try to reduce the number of surprises for both sides.
Contact us if you are looking for an experienced DevOps team to support you with cloud migration, moving servers to the cloud, or creating a new cloud architecture for your project. We will help solve problems and advise on reasons to migrate an on-premises solution to the cloud.