Why MVP is Important
- Register;
- Log in;
- View ads or find a specialist;
- See integration with CRM;
- Receive SMS or push notifications, etc.
All the main user steps should work without bugs, especially the payment process. Any bug in an MVP is painful, so there should be few of them. Rough edges in the process and manual steps are forgivable, but everything must work.
That's why an MVP should have a minimal number of viable features. You quickly launch a quality product with limited functionality, which allows the business to test a theory or capture a piece of the market, while also preventing you from burning out due to a 120-hour work week.
What challenges await the team
- Vague project requirements at the start.
In the beginning, the client usually does not fully understand what the project will look like, not just in terms of design, but also business logic. They may not know the payment flow, what sections should be included, and what aspects of the initial scope are truly important. Almost always, nuances are lost at the start, and the final scope expands.
The fact is that the business is also evolving. Initial ideas do not always make it to the end. They develop and change in priority. Therefore, new important features periodically arise, without which the project cannot be launched. Sometimes, the opposite happens: a feature that the team struggled with for a week suddenly becomes unnecessary.
- Errors in implementing integrations or object models.
It is important to understand that the decisions you make at the start will live with the project for a long time. Changing the object model is difficult, as you do not have time to rewrite all services tied to it. The same goes for architecture. If you make a mistake, you will likely have to live with it until the work is completed.
- Hidden issues: network infrastructure problems, database usage not allowed per company policy, sanctions, weaker servers than required, etc.
There can be many non-obvious problems. For example, in a project in Kazakhstan, we were surprised to learn that the local cloud provider did not have Kubernetes. We had to test and deploy it together with them, which took 2 weeks.
- Due to the problems above, the initial architecture of solutions often suffers.
Architecture is probably the key factor in your project: how you build the system, which databases you choose, which message broker you use, how services communicate, and which modules are used. A mistake at this stage significantly increases the number of hours spent on debugging and refactoring later.
- Any mistake in CORE is very costly.
By CORE, I mean the modules or services of your system: MDM, IDM, and other abbreviations on which you build service wrappers for various needs, or key packages that integrate into all system services.
The problems above may seem like issues for a manager, team lead, or architect. However, at this stage, the foundation for future developer problems is laid, so understanding all these circumstances is essential.
Main mistakes in MVP
- Incorrectly defining the scope of work.
A recent example from practice: at the start of the project, we agreed with the client that a review service was not critical, because there would not be significant traffic at first, so reviews could be postponed. We agreed to leave a simple form where one could rate the session and write a comment.
Later, a second mistake occurred.
- Not fixing the scope of work.
Eventually, this service expanded even before leaving the MVP stage. The ability to choose multiple feedback options was added, each with its own problem checkboxes. As a result, we first implemented a not-so-important feature for the MVP and then expanded it further.
It may seem like a quick task, but there are many small tasks like this. A developer spends a couple of hours on it, a tester spends half an hour testing it, front-end developers spend an hour and a half fixing things, and the team lead takes about as long to validate it. Suddenly, a "30-minute" task consumes 4-6 hours.
When there are 30 such tasks, you will spend weeks on what could have been done without stress and haste.
- Incorrectly choosing the architecture.
- Spending too much or too little time on specifications.
From this list, you can only influence the final scope and its quality. A developer is a highly skilled specialist whose opinion should be considered. Do not hesitate to tell the team lead that half of the services can be removed from the MVP because they are not important, or that a service can be simplified: made worse, but working. After the MVP, you can return to it and refactor.
The essentials must be done well and done quickly; the rest can be caught up later.
Packages and Utilities
SSO and Internal Requests
- On whose behalf we are making the request from the cart to the acquiring service — on behalf of the user or the service.
- Where the neighboring service is located.
- Ensuring token proxying,
- Ensuring communication between services,
- Encapsulating service addresses.
For configuration, we need to obtain the addresses of neighboring services from the outside, i.e., from our DevOps engineers. Environment variables passed to the container are perfect for this. Inside the code, we only need to define a settings class and load our envs into it. Pydantic models are great for parsing and validating this data.
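As an illustration, such a settings class might look like this (sketched with stdlib dataclasses; in our stack a Pydantic model plays this role, and the variable names SSO_URL and CART_URL are hypothetical):

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Settings:
    """Reads service addresses and knobs from the container environment."""
    sso_url: str = field(
        default_factory=lambda: os.environ.get("SSO_URL", "http://sso.local"))
    cart_url: str = field(
        default_factory=lambda: os.environ.get("CART_URL", "http://cart.local"))
    request_timeout: float = field(
        default_factory=lambda: float(os.environ.get("REQUEST_TIMEOUT", "5.0")))


settings = Settings()
```

DevOps sets the variables in the deployment manifest; the code never hardcodes a neighbor's address.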
Proxying the user's token is a fairly simple task: take the header from the incoming request and pass it along with the outgoing one. But what if a background task is running and we need to make an interservice request without the user's token? This is more complicated. You could simply configure internal networks in Kubernetes that are only accessible for communication between containers, but your information security specialist is unlikely to allow this.
To speed things up, we also decided to store the token in memory and work with it directly. If the token has expired or is absent from memory, the client re-authorizes the service in SSO and refreshes the token. We also leave the option of storing the token in a cache instead of in memory.
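A minimal sketch of that in-memory token store (the class name and the fetch_token callable, which authorizes the service in SSO and returns a token with its lifetime, are assumptions for illustration):

```python
import time


class ServiceTokenStore:
    """Keeps the service's SSO token in process memory, refreshing it lazily."""

    def __init__(self, fetch_token, skew: float = 30.0):
        self._fetch_token = fetch_token  # () -> (token, lifetime_seconds)
        self._skew = skew                # refresh slightly before real expiry
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Re-authorize only when the token is missing or about to expire.
        if self._token is None or time.monotonic() >= self._expires_at - self._skew:
            self._token, lifetime = self._fetch_token()
            self._expires_at = time.monotonic() + lifetime
        return self._token
```

Swapping the storage backend (memory vs. cache) only changes where `_token` lives; the refresh logic stays the same.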
Logging
But we should not simply set up a standard logger. In microservices, it is good to have the ability to view request tracing from entry to exit:
- through which services the request has passed,
- how many requests were made to neighboring services,
- how long they took to execute, etc.
Unlike in a monolithic Django application, it is not so easy to find out where the most execution time was spent, since there is interservice interaction.
"Integration finished. Time %%%%"
"Integration error: %errors"
The same principle should be used for the regular logs that you write. Replace the standard logger with one that collects more metadata and serializes it into JSON strings. Without logs, debugging the system will be difficult.
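For instance, a logger that serializes records into JSON lines with extra metadata such as a request ID might look like this (a sketch on top of the stdlib logging module; the field names are illustrative):

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Renders each record as one JSON line; extra fields survive serialization."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Populated via logger.info(..., extra={"request_id": ...})
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload, ensure_ascii=False)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```

A call like `logger.info("Integration finished", extra={"request_id": "abc-123"})` then emits a single JSON line that log collectors can parse without extra effort.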
In fact, logs are always written in our systems for all interservice interactions. This is also implemented in a separate client, which is the basic component of our SDKs.
If you expand the init of the S2S client, you can see that several clients are created inside it during initialization: one for communication with SSO and one for the target service. These clients inherit from BaseClient.
BaseClient, in turn, is responsible for the standard behavior of requests: SSL certificates, retry policy, timeouts, and so on. This is another abstraction over requests that allows for unified communication between services. And another important task it solves is logging messages.
I described one of the ways you can wrap your requests. For each request, a record is made of where it was sent and what it contained. All errors are also logged with exhaustive information about the data used in the request. Both the request and the response are logged.
Then, all this is output to stdout and stored in ELK, and in the future, this log is used for debugging and monitoring.
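A rough sketch of what such a BaseClient can look like. The transport callable here stands in for the actual HTTP library call (with SSL settings applied); the retry and logging behavior is deliberately simplified:

```python
import logging
import time

logger = logging.getLogger("s2s")


class BaseClient:
    """Unified timeouts, retries, and request/response logging for S2S calls."""

    def __init__(self, transport, retries: int = 3, backoff: float = 0.1):
        self._transport = transport  # (method, url, **kw) -> response
        self._retries = retries
        self._backoff = backoff

    def request(self, method: str, url: str, **kwargs):
        last_exc = None
        for attempt in range(1, self._retries + 1):
            logger.info("S2S request", extra={"method": method, "url": url,
                                              "attempt": attempt})
            try:
                response = self._transport(method, url, **kwargs)
                logger.info("S2S response", extra={
                    "url": url, "status": getattr(response, "status_code", None)})
                return response
            except ConnectionError as exc:  # retry only transient network errors
                last_exc = exc
                logger.warning("S2S retry", extra={"url": url, "error": str(exc)})
                time.sleep(self._backoff * attempt)
        raise last_exc
```

Clients for SSO and for specific services then inherit from this class, so every interservice request is logged and retried the same way.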
Auto-tests
In the Django world, there is the pytest-django library. It provides convenient tooling, primarily for working with databases. Since Django has its own ORM, pytest-django integrates deeply with it, providing various handy database tools out of the box, including when running tests with Xdist. Frameworks like FastAPI/Flask/aiohttp do not dictate tools and architecture to us; they imply an independent choice of components to build an application. Unlike with Django, writing your own test framework takes a considerable amount of time. We built our framework with pytest-django in mind, so we had to implement the following functionality:
- creating a test database;
- setting up migrations;
- correct transaction handling for rolling back changes in each test;
- cloning the database for each worker when running with Xdist;
- various convenient JWT token generators with necessary permissions/roles.
Creating a good test framework with all the necessary fixtures and behavior takes time, but later it gives a strong boost to writing tests. Moreover, it is an excellent way to teach the team TDD. Without a good framework this is difficult, but when it is well done, even junior-level specialists can create their own tests by looking at the existing ones. Another non-obvious advantage is that debugging with tests becomes significantly simpler, since reproducing a problem in code, in a test, is easier than reproducing it manually.
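For example, the JWT generators from the list above can be built on the stdlib alone (a test-only HS256 signer; the secret and claim names are illustrative, and in the real framework such helpers are wrapped in pytest fixtures):

```python
import base64
import hashlib
import hmac
import json
import time

TEST_SECRET = b"test-secret"  # test-only signing key, never the production one


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def make_jwt(permissions: list[str], ttl: int = 3600) -> str:
    """Minimal HS256 JWT for tests; services under test verify it like a real one."""
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps({"permissions": permissions,
                               "exp": int(time.time()) + ttl}).encode())
    signature = _b64(hmac.new(TEST_SECRET, f"{header}.{payload}".encode(),
                              hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"
```

A fixture can then hand out tokens with exactly the permissions a given test needs, e.g. `make_jwt(["contracts:read", "contracts:write"])`.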
- Unit tests for key functions and classes of the system.
These are needed primarily for confidence that unnecessary changes won't break all services.
- Tests for API endpoints.
This choice is primarily due to the fact that testing the API allows us to test as many application layers as possible. Then, if finances and time permit, each layer can be tested separately.
- Deterministic data.
When testing API endpoints, we try to make all input data required for processing user requests as deterministic as possible.
- Comparing with reference results.
As a result of the previous point, ideally, we should compare the HTTP status and response text completely, one-to-one. If the tested endpoints start returning more or fewer fields, or discrepancies in the format of some fields are found, the tests will immediately signal this to us.
- Mocks.
Lots of mocks of responses from various services. It is better to mock service responses rather than classes/functions, so that you also test how your clients work.
For example, our standard test looks like this: it loads the main fixtures, where all external requests are mocked. After that, we simulate a request to create a contract with specific data. We check the response status and compare the received data with the expected contract.
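Schematically, such a test might look like this (all names here are hypothetical; the real version goes through the HTTP layer and loads fixtures, while this sketch shows only the mock-and-compare pattern):

```python
from unittest import mock


class AcquiringClient:
    """Stand-in for the real S2S client; in tests its responses are always mocked."""

    def create_payment(self, amount: int) -> dict:
        raise RuntimeError("real network call: must be mocked in tests")


acquiring = AcquiringClient()


def create_contract(tariff: str, amount: int) -> dict:
    """Hypothetical service-layer function under test."""
    payment = acquiring.create_payment(amount)
    return {"tariff": tariff, "payment": payment}


def test_create_contract():
    # Reference result compared one-to-one, as described above.
    reference = {"tariff": "basic",
                 "payment": {"payment_id": "42", "status": "pending"}}
    with mock.patch.object(acquiring, "create_payment",
                           return_value={"payment_id": "42", "status": "pending"}):
        assert create_contract("basic", 100) == reference
```

Mocking the acquiring service's response (rather than `create_contract` itself) means the test still exercises our own client code end to end.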
- tests for Healthcheck endpoints;
- tests for checking endpoints with different debug information;
- tests for endpoints that always "fail" to check Sentry;
- ladder migration tests (the idea is borrowed from Alexander Vasin of Yandex): apply one migration, rollback, apply two migrations, rollback, etc.;
- tests for checking the correctness of permissions at endpoints.
- when requesting without a JWT token;
- when requesting with an incorrect authorization header format;
- when requesting with an incorrect JWT token (expired, mismatched signature);
- in the absence of access rights.
The last point is the most important, as it helps us find places where we forgot to apply the necessary permission or check rights correctly. Such tests are especially good at mitigating the human factor, or simply showing a new person on the team that they did something but did not apply the correct rights check.
In this test, all_routes is a function that returns a list of tuples like ('url', 'method'). Moreover, it generates URLs with correct path_params: for example, for a URL like "/payments/{payment_id:uuid}/", it generates a random UUID. Even if an object with this ID does not exist in the system, it's not that important. What matters is that it is a correct existing URL for the framework, and we won't get a 404 error.
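A sketch of what such an all_routes helper can do (the route table here is hypothetical; in the real project the routes are collected from the application's router):

```python
import re
import uuid

# Hypothetical route table; really this would be read from the app's router.
ROUTES = [
    ("/payments/{payment_id:uuid}/", "GET"),
    ("/contracts/{contract_id:uuid}/cancel/", "POST"),
]


def all_routes() -> list[tuple[str, str]]:
    """Expands path params so every URL resolves and never 404s on routing."""
    result = []
    for template, method in ROUTES:
        # Substitute each uuid path param with a fresh random UUID.
        url = re.sub(r"\{[^}:]+:uuid\}", lambda _: str(uuid.uuid4()), template)
        result.append((url, method))
    return result
```

A permission test can then be parametrized over `all_routes()` and assert that every endpoint rejects a missing, malformed, or expired token before touching business logic.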
Asynchronous tasks
- Dramatiq works under Windows;
- Middleware can be created for Dramatiq;
- Subjectively, Dramatiq's source code is more understandable than Celery's;
- Dramatiq supports reloading when the code changes.
Even so, our task code was monstrous, with various hacks and hard-to-debug bugs. Therefore, we wrote our own producer-consumer implementation based on the aio-pika library. The hacks for running asynchronous code disappeared, and the ability to add our own middleware appeared, but now for the worker.
And since we can now natively work with Python's asynchrony, our worker can now process not just one task but several at once. It looks more or less the same as it would have been in Celery or Dramatiq.
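aio-pika specifics aside, the concurrency core of such a worker can be sketched with plain asyncio (the queue here stands in for a RabbitMQ channel; in the real worker the concurrency limit corresponds to the channel's prefetch setting):

```python
import asyncio


async def worker(queue: asyncio.Queue, handler, concurrency: int = 5) -> None:
    """Consumes messages and processes up to `concurrency` of them at once."""
    semaphore = asyncio.Semaphore(concurrency)
    pending: set[asyncio.Task] = set()

    async def handle(message) -> None:
        async with semaphore:
            await handler(message)

    while True:
        message = await queue.get()
        if message is None:  # shutdown sentinel
            break
        task = asyncio.create_task(handle(message))
        pending.add(task)
        task.add_done_callback(pending.discard)

    if pending:  # drain in-flight tasks before exiting
        await asyncio.gather(*pending)
```

Because tasks overlap on I/O waits, one worker process handles several messages at a time instead of one, which is exactly the gain over the classic synchronous model.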
Recommendations
There are always many issues with them; you need to constantly scan the code to prevent critical vulnerabilities from appearing.
Load testing.
It is essential to conduct it. It is difficult to predict in advance how much load your system can withstand. Often, problems arise after exceeding a certain RPS.
S3 storage.
In monolithic architectures, you can always manage files in a separate package. Most often, the file size, allowed extension, file executability check, and limits on the number of file uploads are checked. We also need to check whether the file has expired, clean the storage on time, etc.
In microservices, this package becomes a separate microservice, which means you will have to aggregate all file-processing logic in it. This incurs overhead: we need to know which service the file was uploaded from, along with its metadata. The file has to be uploaded directly to the S3 service. Thus, the microservice whose business logic requires uploading a file (photos, docx, excel, etc.) knows nothing about the file itself and only has its metadata.
Consequently, we will need a separate asynchronous data synchronization procedure: informing the microservice where the file is located (its URL), the identifier of the file, whether everything is okay with it, etc.
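Such a synchronization message might carry, for example, the following fields (a hypothetical event shape; the names are illustrative):

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FileUploadedEvent:
    """Hypothetical message the S3 service publishes after an upload, so the
    business microservice can attach the file to its own entity."""
    file_id: str
    source_service: str   # which service the file was uploaded from
    url: str              # where the file is located
    content_type: str
    size_bytes: int
    checks_passed: bool   # extension/size/executability checks are okay

    def to_json(self) -> str:
        return json.dumps(asdict(self))
```

The consuming microservice stores only this metadata next to its business entity and never handles the file bytes itself.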
Be sure to use auto-generated documentation.
FastAPI comes with it out of the box. In the Django world, there is drf-yasg. Ready-made auto-docs save a lot of time for frontend and mobile developers.
Do not neglect typing.
An excellent way to ensure that you are using packages and classes correctly and not making mistakes.
Write automated tests.
Where there is a lot of communication, they are indispensable. For you, this will be protection from extra bugs and an excellent tool for development.
Ask colleagues for help if you are stuck on a question.
It's not shameful; it's necessary. This way, you will complete the task faster, not miss deadlines, and not engage in self-flagellation. Brainstorming is an excellent practice; use it.
Set up Sentry.
It is a simple and powerful tool that is easily set up in standalone mode. Sentry is easy to configure in any framework. Implementing it in a project takes no more than 30 minutes.
Lock library versions.
By default, our projects use Poetry. In it, as in other dependency managers, you can specify only the minimum required version of a library, and usually that is all people specify. But this is bad: when there are many libraries, especially popular ones, the chance of hitting a package conflict is much higher.
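For example, instead of bare minimum bounds, constrain versions tightly (the package versions below are illustrative) and always commit the poetry.lock file, which records the exact resolved versions:

```toml
[tool.poetry.dependencies]
python = "^3.11"
fastapi = "0.110.0"   # exact pin instead of ">=0.110"
pydantic = "~2.6"     # caret/tilde constraints allow only compatible updates
```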
If you have any questions, feel free to ask in the comments.