How to survive the end of the month in the direct selling industry: a few words about performance optimization
Today, nobody needs to be persuaded that technological issues have a direct and profound impact on the execution of business objectives. However, as our experience tells us, each industry faces its own characteristic and unique challenges which often give engineers a real headache.
In the case of the direct selling model, that unique challenge is handling the increased online traffic (the traffic peak) that occurs at the end of a settlement period. Despite the difficulties it poses, this problem is a welcome one, as it comes with a high volume of sales. Performance optimization therefore becomes a key factor.
During the peak, the number of simultaneous users of the system drastically increases. Visiting the system, they execute a variety of business scenarios that are critical to the direct selling business, such as exploration of the product catalogue, completion of huge orders, intensive monitoring of their own businesses, and motivating their partners to participate in the structure they build.
As we’ve noted, such an accumulation of activity at the end of a settlement period has its justification: large-volume shopping also means achieving the much sought-after revenue. But it cannot happen without intuitive, useful tools for distributors that work flawlessly under heavy load. Any reduction in the system’s responsiveness may lead to the loss of valuable users.
Peak - reward and challenge
The increased volume of shopping in the e-commerce system is a reward for successful marketing campaigns that have attracted new system users. Without online distribution support tools it would be almost impossible to achieve this increase. High traffic at the end of a settlement period is an invaluable benefit of a properly digitalized direct selling business.
But what if our system is not responsive enough? It’s common knowledge that a long delay in the loading of a website is an unwelcome inconvenience to potential users. What’s worse, they may turn to another seller. This is a very serious threat to the business, so we should never underestimate the role of optimization. The process of identifying and removing errors that may slow down customer-system interactions must be continuous - performed on a regular basis with a rich set of state-of-the-art tools supporting software profiling and optimization.
To better understand the essence of this issue we should consider what, exactly, performance means in the context of an IT system. We define it as the number of requests the software can handle per unit of time. For web applications, it can be measured either in RPS (requests per second) or - considering the physical size of the transferred data - in Mbps (megabits per second).
Another important issue is the software’s responsiveness. In the internet world, responsiveness is measured as the time that elapses between a user’s HTTP request (e.g. clicking a link, or filling in and submitting a form) and that user’s reception of a response (as rendered by the browser).
There is a close correlation between responsiveness and performance. Improving the system’s responsiveness is synonymous with increasing its throughput and parallel request processing capabilities. How do we approach the problem of performance at e-point?
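As a toy illustration of these two metrics (the function names are ours, not from any particular library), throughput and responsiveness can be computed from raw request counts and timings:

```python
import statistics

def throughput_rps(request_count: int, duration_s: float) -> float:
    """Throughput as requests handled per second (RPS)."""
    return request_count / duration_s

def throughput_mbps(bytes_transferred: int, duration_s: float) -> float:
    """Throughput as megabits of payload transferred per second (Mbps)."""
    return (bytes_transferred * 8) / 1_000_000 / duration_s

def responsiveness_ms(response_times_ms: list[float]) -> float:
    """Median time between the user's request and the rendered response."""
    return statistics.median(response_times_ms)

# A system that served 12_000 requests in 60 s ran at 200 RPS:
print(throughput_rps(12_000, 60))  # 200.0
```

In practice these numbers would come from HTTP server logs or the monitoring stack rather than being computed by hand, but the definitions are exactly these ratios.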
Optimization as a continuous process
When it comes to ensuring a highly responsive system, we must understand that performance optimization is a continuous, structured process. Our process includes an iterative cycle consisting of 5 stages:
- Monitoring the system’s parameters: During this stage we obtain information about the health of the system and possible causes for concern in the area of performance.
- Analysis of the causes of problems: Determination of which components are unable to handle heavy loads, what are the underlying problems, and how to fix them.
- Optimization of the application: Introduction of corrections appropriate to the diagnosis made in stage 2. Engineers today have a wide and constantly expanding variety of corrective options at their disposal.
- Performance tests: Conducted skilfully, performance tests allow us not only to learn the system’s limits but, above all, to verify whether the corrections introduced have the expected effect without creating any new threat to the system’s responsiveness.
- Implementation of optimization to the production system.
Monitoring of the system
As the old saying goes, a chain is only as strong as its weakest link. It’s the same with IT systems. You cannot perform efficient optimizations without knowing which element of the infrastructure processes user requests most slowly and what is the cause of the slowdown. Today’s large, complex systems consist of so many components that, without a precise monitoring mechanism, any attempt to speed up the software is like groping in the dark.
That is why at e-point we build very accurate monitoring and early warning systems for programmers and administrators on duty. These systems not only make it possible to pinpoint an ailing element of the existing infrastructure, but also to anticipate failures and head them off by introducing performance improvements, scaling, and configuration changes.
To build a good stack of monitoring tools you don’t need costly licenses or expensive paid software. The most important things are knowledge of the technology used in the project and the experience of our engineers. A group of open source projects written in Go by InfluxData offers excellent software support here. It includes the so-called TICK stack, i.e. four components that together form a full-fledged monitoring mechanism: a plugin-based Telegraf agent for collecting data from various sources, an InfluxDB database optimized for storing time series, Chronograf - a visualization tool (which may be replaced with the more functional Grafana), and Kapacitor, which sends notifications about emergency situations. It’s an absolutely basic, minimal set, without which one should not even think about releasing an e-commerce system to a wider group of customers.
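To give a flavour of how little is needed to get started, here is a minimal, purely illustrative Telegraf configuration (the URLs and database name are placeholders) that samples basic host metrics and ships them to InfluxDB:

```toml
# Minimal, illustrative Telegraf agent configuration (placeholder values).
[agent]
  interval = "10s"          # how often inputs are sampled

# Inputs: basic host health metrics
[[inputs.cpu]]
  percpu = true
  totalcpu = true

[[inputs.mem]]

# Output: write the collected time series to a local InfluxDB instance
[[outputs.influxdb]]
  urls = ["http://localhost:8086"]
  database = "telegraf"
```

A real deployment would add many more input plugins (JVM, database, HTTP server metrics) and point the output at a dedicated monitoring host.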
What should be monitored? All the most important technical aspects of the software’s life and health. Examples of measured objects include, among others, thread pools and database connections, the degree and nature of memory and processor use, the behaviour of garbage collection mechanisms, class loading, and control of locks and deadlocks when accessing shared resources - and much, much more. In addition to the TICK stack mentioned here, developers and administrators now have at their disposal a wealth of additional tools for profiling, debugging, and extracting information from running systems. Such a workshop should be constantly improved and expanded with modern solutions.
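To show what such application-level measurements can look like, here is a small, purely illustrative Python sketch (the metric names are ours) that samples a few in-process health indicators of the kinds listed above:

```python
import gc
import threading
import time

def collect_process_metrics() -> dict:
    """Sample a few in-process health indicators: live threads,
    garbage-collection activity, and a timestamp for the time series."""
    gc_stats = gc.get_stats()  # per-generation GC counters (CPython)
    return {
        "timestamp": time.time(),
        "live_threads": threading.active_count(),
        "gc_collections": sum(gen["collections"] for gen in gc_stats),
        "gc_uncollectable": sum(gen["uncollectable"] for gen in gc_stats),
    }

# In a real setup these samples would be pushed to the monitoring stack
# (e.g. via a Telegraf input plugin) at a fixed interval.
sample = collect_process_metrics()
```

On the JVM the equivalent data would come from JMX beans; the principle - sample, timestamp, ship to the time series database - is the same.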
Mechanisms of optimization
Once we've isolated the infrastructure components problematic to the processing of user requests and identified the causes for those problems, we can begin to make improvements. Today's engineers are limited only by their imaginations - they have at their disposal a very wide choice of solutions that can be used to improve the system’s responsiveness. Performance optimization is a constant pursuit of excellence and speed, and improving system performance gives developers great satisfaction.
It is impossible to list here all the techniques that are commonly used in optimization, but it’s worth mentioning that they can be applied at different levels of the software architecture. Certain modifications can be made directly to the source code, e.g. cache integration, reactive programming, Hystrix circuit breaker systems, or asynchronous processing with the use of message brokers. Others may involve optimizing the database itself through the introduction of views and indexes, advanced DBMS configuration, and the use (in the code) of non-standard transaction isolation levels, materialized views, and even custom analyses. In recent times, we have also practiced decomposing monolithic systems into easily scalable microservices, and even separating read and write operations - the so-called CQRS. You can reach even further: Content Delivery Network (CDN) systems are commonly used to cache sub-pages of our systems in geographically distributed locations, as well as to protect against DDoS attacks.
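To make one of these techniques concrete, here is a minimal sketch of cache integration, the first option mentioned above - a hand-rolled TTL cache (in a real system a library or a distributed cache such as Redis would usually be used instead; the function and variable names are ours):

```python
import time
from functools import wraps

def ttl_cache(ttl_seconds: float):
    """Cache a function's results for ttl_seconds, so that repeated identical
    requests during a traffic peak skip the expensive computation."""
    def decorator(fn):
        store = {}  # key -> (expiry_time, value)

        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]               # fresh cached value: fast path
            value = fn(*args)               # slow path: recompute
            store[args] = (now + ttl_seconds, value)
            return value
        return wrapper
    return decorator

backend_calls = {"count": 0}  # counts how often the slow path actually runs

@ttl_cache(ttl_seconds=30.0)
def product_catalogue_page(page: int) -> str:
    # Placeholder for an expensive database or rendering call.
    backend_calls["count"] += 1
    return f"rendered page {page}"
```

During a peak, thousands of distributors browsing the same catalogue page would then hit the cache rather than the database, which is exactly the kind of load reduction these techniques aim for.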
Of course the aforementioned optimization mechanisms are only a few among the multitude of actions that can be taken to improve responsiveness. Please be aware that each of them must be selected by an experienced engineer and adapted to problems of a particular nature. Only then can the end-effect be really impressive.
Performance tests
It’s not easy to create a good performance test. To provide reliable information about the system’s resistance to increased load, a test must meet several important requirements. The first and most important is a realistic reflection of user behaviour during periods of increased traffic. To prepare realistic scenarios we must first explore the HTTP server logs and identify the portions that correspond to the specific circumstances we wish to test. At e-point we have written a number of scripts to automate this process, which we now use successfully in various projects.
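As a rough illustration of this log-mining step (the log format and field layout are assumptions; real access-log layouts vary), the request mix for a chosen peak window can be extracted along these lines:

```python
import re
from collections import Counter

# Assumed common-log-style line, e.g.:
# 10.0.0.1 - - [30/Nov/2023:23:55:01 +0000] "GET /catalogue?page=2 HTTP/1.1" 200 5120
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

def request_mix(log_lines):
    """Count requests per (method, path) to derive realistic
    weights for performance-test scenarios."""
    mix = Counter()
    for line in log_lines:
        m = LOG_LINE.search(line)
        if m:
            path = m.group("path").split("?")[0]  # drop query strings
            mix[(m.group("method"), path)] += 1
    return mix

sample = [
    '10.0.0.1 - - [30/Nov/2023:23:55:01 +0000] "GET /catalogue?page=2 HTTP/1.1" 200 5120',
    '10.0.0.2 - - [30/Nov/2023:23:55:02 +0000] "POST /orders HTTP/1.1" 201 310',
    '10.0.0.3 - - [30/Nov/2023:23:55:03 +0000] "GET /catalogue?page=3 HTTP/1.1" 200 5090',
]
mix = request_mix(sample)
```

The resulting weights (e.g. two catalogue views per order submission) would then drive the scenario mix in the load-testing tool, so that the synthetic traffic resembles a real end-of-period peak.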
Well-prepared performance tests also reproduce, in a maximally repeatable way, errors in the synchronization of access to critical sections in concurrent code. At the pre-production stage, possible starvation or deadlocks can be detected only through such testing. This kind of test allows us to prepare for failure-free handling of the increased traffic long before the system reaches the desired popularity.
Never forget that performance tests are as important as any other type of test: manual, unit, or integration. They should not be neglected. I recommend that their fine-tuning and maintenance be carried out continuously, and that they be run before each production deployment. It is also imperative that the software provider has appropriate machines on which to run these tests, incrementally increasing the load until the system goes belly up, thereby allowing us to identify its limits and locate its weakest links (least efficient components). This is invaluable knowledge when planning the expansion of the system’s infrastructure.
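This incremental load increase can be sketched as a simple step-load loop. The sketch below runs against a simulated service (all names and numbers are illustrative); in practice the requests would go over HTTP and the failure criterion would be a response-time or error-rate threshold:

```python
def find_saturation_point(measure_latency, load_steps, max_latency_ms):
    """Increase load step by step and return the last load level (in RPS)
    at which the service still met its latency target."""
    last_healthy = None
    for rps in load_steps:
        latency = measure_latency(rps)      # observed latency at this load
        if latency > max_latency_ms:
            break                           # the system "went belly up"
        last_healthy = rps
    return last_healthy

def simulated_service(rps, capacity=500):
    """Toy model: latency is flat below capacity, then climbs steeply."""
    overload = max(0, rps - capacity)
    return 50.0 * (1 + overload / 20)

limit = find_saturation_point(
    simulated_service, [100, 200, 300, 400, 500, 600], max_latency_ms=200.0
)
```

The returned limit is precisely the "weakest link" figure mentioned above: the highest load the system sustains before responsiveness collapses, which directly informs infrastructure planning.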
Paradox of optimization - the better, the less visible
In conclusion, I’d like to share a reflection on performance optimization from the business point of view. Technologically, this issue is very complex, and despite the wide range of support tools, engineers have to spend a lot of time to properly improve the systems they’re building. This work does not provide the client with any new system features, and, to a less experienced manager, the time devoted to it may seem wasted. Nothing could be further from the truth. By constantly optimizing the software, we guarantee our clients peace of mind and confidence that, when the purchase craze reaches its peak, no performance issue will discourage their customers from using the software. The resulting revenue will be the crowning achievement of the investment made.