Developing an A/B testing platform for a streaming service

Here’s how we developed an A/B testing platform for a multimedia content provider that allows analysts to manage test data, monitor metrics, and track overlaps between experiments.

September 2022

START is an online cinema partly owned by Megafon. In 2021, START turned to Evrone to build a separate service for A/B testing. The service accepts a request containing a user ID, assigns that user to the currently running experiments, and returns the resulting assignments.
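
The article does not describe how the service distributes users, but a common approach is deterministic hashing: hash the user ID together with a per-experiment salt, so the same user always lands in the same group without the assignment having to be stored. Below is a minimal sketch under that assumption; `Experiment`, `bucket`, and `assign` are hypothetical names, not the service's real API.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Experiment:
    """Hypothetical experiment: a name plus groups with positive integer traffic weights."""
    name: str
    groups: dict[str, int]  # e.g. {"control": 50, "red_button": 50}


def bucket(user_id: str, salt: str, buckets: int) -> int:
    """Map a user to a stable bucket in [0, buckets) by hashing the ID with a salt."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def assign(user_id: str, experiments: list[Experiment]) -> dict[str, str]:
    """Return {experiment name: group name} for every currently running experiment."""
    result = {}
    for exp in experiments:
        # Salting with the experiment name decorrelates assignments across experiments.
        point = bucket(user_id, exp.name, sum(exp.groups.values()))
        # Walk the groups in order; the first one whose weight covers `point` wins.
        for group, weight in exp.groups.items():
            if point < weight:
                result[exp.name] = group
                break
            point -= weight
    return result
```

Because the assignment is a pure function of the user ID and the experiment name, repeated requests return the same groups, which keeps each experiment's audience stable.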

START needed a user-friendly interface where analysts could set parameters themselves, create groups, and schedule experiments to start and end at given times. The platform also had to connect to a service that collects user data, which is used to form the test groups.

How A/B tests work

A/B (split) tests show whether, and how, changes to a service's behavior or design affect its users. Take button color: during the test, some users see a red button instead of the usual green one.

Tests are run on small groups of users with specific characteristics, which helps estimate whether a change will catch on before it reaches everyone. The more complex and mature the product, the more careful you need to be with changes, so as not to lose audience and revenue. Conversely, A/B tests are usually not run shortly after launch, because a new service does not yet have enough baseline data.

You also need to account for how noticeable the tested change is. For example, conclusions about a drastic change in button color can be drawn quickly, because the change is striking. Subtler adjustments require a longer experiment. It is also important to consider how user behavior evolves over the course of the test: sometimes people start using a new feature but eventually return to their usual routine, so the initial spike does not translate into a lasting, statistically meaningful effect.
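
Why subtle changes take longer can be made concrete with the standard sample-size formula for comparing two means: to detect an absolute difference δ in a metric with standard deviation σ, at significance level α and power 1 − β, each group needs roughly n ≈ 2(z_{1−α/2} + z_{1−β})² σ² / δ² users. This is textbook statistics, not necessarily the calculation START's analysts use; the sketch below assumes a two-sided z-test, and `sample_size_per_group` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(sigma: float, delta: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed in each group to detect an absolute difference `delta`
    in the mean of a metric with standard deviation `sigma` (two-sided z-test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)
```

Halving δ quadruples n: with σ = 1, detecting δ = 0.1 needs about 1,570 users per group, while δ = 0.05 needs about 6,280. With a fixed daily audience, that translates directly into a longer experiment.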

Metrics, or more precisely the difference in their values between the compared groups, are the main tool for drawing conclusions from an experiment. The metrics themselves can be validated too: apply a new metric to a group that has received no changes and compare it with the control group. If the results disagree, either something is wrong with the metric or it is sensitive to factors that are irrelevant to the experiment.
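
This kind of check is often called an A/A test. A minimal sketch, assuming the metric is a per-user number and the groups are large enough for a normal approximation: it compares the two group means with a two-sided z-test and reports whether they are consistent. The function name `aa_check` and the 0.05 threshold are illustrative choices, not part of the platform.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def aa_check(group_a: list[float], group_b: list[float], alpha: float = 0.05) -> bool:
    """Return True if a metric looks consistent across two groups that received no changes.

    Two-sided z-test on the difference of means, a large-sample approximation of
    Welch's t-test. Each group needs at least two values and some spread overall.
    """
    standard_error = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    z = (fmean(group_a) - fmean(group_b)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value >= alpha  # False means: investigate the metric before trusting it
```

At α = 0.05, roughly one A/A comparison in twenty will fail by chance alone, so a metric is better judged over repeated checks than a single run.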

How the service works

Analysts know all these nuances well, but for this project our developers also had to master them and choose the right tools for A/B testing in media and streaming. This is why we used Domain-Driven Design (DDD). Although the service does not calculate test results itself, it has to make sure that all inputs are accounted for and used correctly; otherwise, the results will be meaningless.

Backend

The backend of the split testing platform is written in Python with the FastAPI web framework. For test data management, we chose a standard stack: SQLAlchemy and PostgreSQL.

SQLAlchemy also helped us reduce the amount of logic processed on the backend. Usually, loading the root entity (in our case, an experiment) also pulls all of its sub-entities from the database. With SQLAlchemy, you can specify exactly which parts of the experiment you need and fetch them in just a couple of queries.

The backend communicates with the frontend through an API. A second API connects it to the service from which it retrieves user config values; the test groups are formed based on these values.

Dynamic admin panel

Our original plan for the A/B testing platform was to find the simplest possible solution for the admin panel and make it static, with the page reloading after every change. But we quickly realized that a dynamic admin panel would be a better fit.

Because there are so many parameters, entering them all can take a long time. We therefore made it possible to fill in the experiment form in stages: an analyst can enter some of the parameters, save them, and come back later to add the rest.
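
Staged form filling implies that an experiment can exist as an incomplete draft. A minimal sketch, assuming a Python data model on the backend; `ExperimentDraft`, its fields, and the choice of required fields are hypothetical. Every field starts empty, partial edits are saved as they arrive, and `missing_fields` reports what is still needed, the kind of information a "which fields are filled" hint in the interface could display.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime
from typing import Optional


@dataclass
class ExperimentDraft:
    """Hypothetical experiment form: every field may stay empty until the analyst fills it in."""
    name: Optional[str] = None
    starts_at: Optional[datetime] = None
    ends_at: Optional[datetime] = None
    groups: dict[str, int] = field(default_factory=dict)
    filters: dict[str, object] = field(default_factory=dict)

    # Plain class attribute (no annotation), so it is not a dataclass field.
    REQUIRED = ("name", "starts_at", "ends_at", "groups")

    def update(self, **values) -> None:
        """Save a partial edit; unknown keys raise instead of being silently dropped."""
        known = {f.name for f in fields(self)}
        for key, value in values.items():
            if key not in known:
                raise KeyError(key)
            setattr(self, key, value)

    def missing_fields(self) -> list[str]:
        """Names of required fields that are still empty."""
        return [name for name in self.REQUIRED if not getattr(self, name)]

    def is_ready(self) -> bool:
        """A draft can be launched only when all required fields are set and dates are ordered."""
        return not self.missing_fields() and self.starts_at < self.ends_at
```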

We chose to build the admin panel in React, a modern solution that will be easy to maintain in the future, and laid it out with Bootstrap. The interface includes experiments (names), epochs (time periods), and filters on user attributes such as gender, age, and acquisition source. A hint on the right shows which fields are already filled in.
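
Filters like these can be modeled as predicates over the user attributes the backend receives from the user-data service. A minimal sketch, assuming attributes arrive as a plain dictionary with `gender`, `age`, and `source` keys (hypothetical names) and that an empty filter field means "any"; `AudienceFilter` is likewise illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AudienceFilter:
    """Hypothetical targeting filter; empty sets and None bounds mean 'no restriction'."""
    genders: frozenset[str] = frozenset()
    min_age: Optional[int] = None
    max_age: Optional[int] = None
    sources: frozenset[str] = frozenset()  # acquisition channel, e.g. "ads", "partner"

    def matches(self, user: dict) -> bool:
        """Check one user's attributes against every constrained field."""
        if self.genders and user.get("gender") not in self.genders:
            return False
        age = user.get("age")
        # A user with an unknown age is excluded whenever age is constrained.
        if self.min_age is not None and (age is None or age < self.min_age):
            return False
        if self.max_age is not None and (age is None or age > self.max_age):
            return False
        if self.sources and user.get("source") not in self.sources:
            return False
        return True
```

Excluding users whose constrained attribute is missing is a conservative choice: it shrinks the audience slightly but keeps the test groups matching the stated criteria.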

Intersections

When many experiments are running, they can overlap, that is, affect the same parameters. This is normal, but analysts need to take it into account to draw the right conclusions: if they cannot see which experiments collide, strange artifacts can appear in the results. Our service tracks such intersections and warns the user about them.
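
The article does not say exactly how the service defines an intersection. One plausible rule, used in the sketch below as an assumption, is that two experiments collide when their active periods overlap and they change at least one of the same parameters. `Planned` and `intersections` are hypothetical names; periods are treated as half-open, so an experiment that starts exactly when another ends does not count as overlapping.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass(frozen=True)
class Planned:
    """Hypothetical experiment summary: active period plus the config keys it changes."""
    name: str
    starts_at: datetime
    ends_at: datetime
    params: frozenset[str]


def intersections(experiments: list[Planned]) -> list[tuple[str, str, set[str]]]:
    """Return (a, b, shared params) for every pair that runs at the same time
    and changes at least one of the same parameters."""
    found = []
    for a, b in combinations(experiments, 2):
        # Half-open intervals [starts_at, ends_at) overlap iff each starts before the other ends.
        overlap_in_time = a.starts_at < b.ends_at and b.starts_at < a.ends_at
        shared = a.params & b.params
        if overlap_in_time and shared:
            found.append((a.name, b.name, set(shared)))
    return found
```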

Expanding the experiments

When an experiment on a limited sample shows that one variant significantly outperforms another, you want to bring it to more users, sometimes to all of them. For this, it is convenient to have a mechanism for gradually increasing the initial sample size.

In a classic A/B test, however, this is undesirable: until the experiment is complete, its user base should not change significantly, or the statistics will be skewed.

Gradual rollout therefore requires a parallel entity that closely resembles an experiment but has only one group, whose configuration consists of variables that have proven themselves in completed experiments.
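
A single-group rollout can reuse the hashing idea: each user gets a stable bucket, and the rollout covers every bucket below a configurable percentage. A minimal sketch under that assumption, with hypothetical names (`Rollout`, `bucket`, `percent`). Because a user's bucket never changes, raising the percentage only adds users and never reshuffles those already included.

```python
import hashlib
from dataclasses import dataclass


def bucket(user_id: str, salt: str) -> int:
    """Stable bucket in [0, 10000), so coverage can be set in steps of 0.01%."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


@dataclass
class Rollout:
    """Hypothetical rollout: a single group carrying a configuration proven in experiments."""
    name: str
    config: dict[str, object]
    percent: float  # share of users covered, 0..100, raised step by step

    def includes(self, user_id: str) -> bool:
        # The bucket depends only on the user and the rollout name, so raising
        # `percent` only adds users; nobody already included drops out.
        return bucket(user_id, self.name) < self.percent * 100
```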

What adds to the complexity is that this new entity has to be expressed in existing terms, reusing existing components only. It is also necessary to guarantee a deterministic order in which different configurations are applied, so as not to introduce bugs that are hard to track down.
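
One way to make the order deterministic is to sort every configuration that applies to a user by a fixed key before merging. The sketch below assumes a precedence rule the article does not specify: rollouts are applied first and running experiments last, so an experiment's settings are never overridden mid-test, with ties broken by start time and then name. `Layer`, `PRIORITY`, and `resolve` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumed precedence (not specified in the article): rollouts first, experiments last,
# so a running experiment's settings are never overridden by a rollout.
PRIORITY = {"rollout": 0, "experiment": 1}


@dataclass
class Layer:
    """Hypothetical config source that matched a user: a rollout or an experiment group."""
    kind: str                  # "rollout" or "experiment"
    name: str
    started_at: datetime
    config: dict[str, object]  # config key -> value set by this layer


def resolve(layers: list[Layer]) -> dict[str, object]:
    """Merge matching layers in a fixed order so the result never depends on input order."""
    ordered = sorted(layers, key=lambda layer: (PRIORITY[layer.kind], layer.started_at, layer.name))
    merged: dict[str, object] = {}
    for layer in ordered:
        merged.update(layer.config)  # later layers win on conflicting keys
    return merged
```

As long as names are unique within each kind, the sort key fully determines the order, so the merged configuration is the same no matter how the layers were loaded.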

Result

We completed the A/B testing platform for the multimedia content provider: a split-testing service that lets users not only run experiments but also monitor their progress, deadlines, and possible intersections. As an additional feature, users can roll out tested changes to the entire audience. The product is fully tested and comes with the necessary documentation; the client handled the integration.

If you are interested in building a dedicated split-testing service for a streaming app or another product, send us a message! With our experience in engineering A/B testing software, we can advise you and offer the best solution for your case.

Client Review

Thanks to Evrone, we now have a convenient service to test our theories and make START better for our users. Thank you for helping to choose the optimal technology stack and promptly agreeing to implement more functionality!

Kirill Evseenko

CTO, START