Paweł Wieleba

DevOps Engineer, System Security Specialist

Why We Implemented Invoice Financing Using OpenShift

While building a microservices-based Invoice Financing system for ING, we had to choose an appropriate runtime environment technology. To reduce costs and ensure a high level of security and accessibility, we decided to implement containers.

The entire system was deployed on Red Hat’s OpenShift platform using Kubernetes and Docker.

Requirements and Project Context

First, we need to say a few words about the project. This will help explain the decisions we made:

  • The project was implemented using Scrum/Agile methodology.
  • The system architecture was based on microservices, which reduces the workload and the time required for implementation.
  • The project included very strict security requirements, which is typical for systems in the financial sector.
  • The infrastructure would be maintained on the AWS Cloud.

Choosing a Container Technology

We began by analyzing the service maintenance models available to satisfy ING’s needs and those of the e-point implementation and hosting teams. Among others, we considered the classic model, in which services are launched on physical or virtual machines and separate accounts are used for individual microservices. However, this solution would generate high maintenance and development costs. That’s why we decided to use containers with the Docker engine.

How Containerization Works

Containerization allows us to package an application, its own operating system, and any dependencies inside an image. Based on these images, containers containing a running application are created at startup. The same images (i.e. the same operating system with the application and its dependencies) are launched on different test and production environments, which guarantees that applications have an identical runtime environment and reduces the risk of errors. At the same time, isolating entire systems for each process, application instance, and microservice significantly improves security.

Why Not Use Standalone Docker Engines on Multiple Machines?

Once we made the decision to use containers, we had to choose an architecture model for the container infrastructure. One option was using standalone Docker engines on separate physical machines. This approach, however, has several drawbacks. Node failures can lead to consistency and application update problems: if an application is updated while a node is turned off, the node will still hold the outdated application version at its next startup. We had to avoid such situations.
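The stale-node problem can be illustrated with a minimal sketch of desired-state reconciliation, the general idea an orchestrator applies and standalone engines lack. The function name, the application name, and the version strings are hypothetical and serve only as an illustration; this is not how any particular platform implements it.

```python
def reconcile(desired, running):
    """Return the actions needed to bring one node in line with the desired state.

    Both arguments map an application name to a version string.
    """
    actions = []
    for app, version in sorted(desired.items()):
        current = running.get(app)
        if current is None:
            actions.append(("start", app, version))
        elif current != version:
            # The node missed an update while it was turned off.
            actions.append(("replace", app, version))
    for app in sorted(set(running) - set(desired)):
        actions.append(("stop", app, running[app]))
    return actions


if __name__ == "__main__":
    desired = {"invoice-api": "1.4.0"}   # hypothetical service and versions
    running = {"invoice-api": "1.3.0"}   # state of a node that was offline
    print(reconcile(desired, running))
```

With standalone engines, nothing performs this comparison automatically when a node comes back, so someone has to do it by hand.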

There was another problem with this approach. If we needed to change the number of applications, scale up the system, or implement additional microservices, manual changes in configuration would have to be performed by developers and DevOps engineers; similar work would be needed for hosting on front-end HTTP servers and on firewalls. To prevent conflicts, we would also have to keep records on things like open TCP ports.

A third risk with this approach involved granting and maintaining proper access rights. Access to the entire engine would be possible (e.g. via the Docker API), which would also provide access to the physical or virtual machine’s operating system. Due to the number of people and teams involved, we wanted to avoid such situations.

As far as security was concerned, passwords for databases, external systems, and other components would be stored in plain text on the servers’ hard drives, just as they would be in the classic model with separate accounts for individual microservices. Both of these models could also cause problems with maintaining application configurations.

We considered the above-mentioned limitations and difficulties in the implementation of a high-availability environment and decided to use a platform for running containers. We examined Docker Swarm, but its stability left a lot to be desired, especially in terms of networking. Apart from that, Swarm’s access control granularity, authorization policies, firewall settings, and container management capabilities were insufficient for our purposes. The lack of dynamic storage management was also an issue.

The Infrastructure Architecture

OpenShift

Having rejected the above-mentioned solutions, we decided to use a PaaS (Platform as a Service). Still, we had to take several factors into account.

First, the platform would have to integrate with Cloud services to provide dynamic provisioning and other functionalities.

Second, all of our nodes would be running on Linux systems. For security reasons, we could only consider distributions that support SELinux. SELinux is supported natively in Red Hat systems and is developed largely by the Red Hat team, which ensures its stable operation.

Two additional factors in our decision were the stability of the Docker engine itself and the need for efficient cooperation with the OS kernel and the base system.

We narrowed down the choice of orchestration technologies to Kubernetes-based solutions. After further analysis, we chose OpenShift. The following criteria were considered during the selection process:

  • Security
  • Stability
  • Efficiency
  • Functionality

We noted the following advantages:

  • Kubernetes is stable and provides high availability of maintained systems.
  • Numerous extensions enhance the functionality of the platform, including Routers, DeploymentConfigs, a built-in image registry, Hawkular Metrics, and the EFK stack (Elasticsearch, Fluentd, Kibana).
  • There is good integration between the elements of the entire ecosystem: operating system, PaaS platform, and management software.
  • Scalability is relatively easy. There is no need to change the application configuration or other parts of the architecture in order to increase the number of Pods (deployment units comprising one or more containers). All of this can be done by a DevOps engineer with no infrastructure administration involved.
  • Adding computing power to the cluster can be achieved by simply adding another machine. There is no need to reconfigure the application or involve developers, and no system interruptions are caused.
  • New environments (projects) can be quickly and easily added to the cluster.
  • Resources assigned to services can be limited so that one service doesn’t starve the others.
  • Kubernetes supports centralized user identity management and extensive RBAC (Role Based Access Control).
  • Updates are uninterrupted and rollbacks are available if there is a need to quickly return to a previous version of the application.
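The points about scaling and resource limits can be sketched with a manifest built as a plain Python dictionary. This is only an illustration: it uses a plain Kubernetes Deployment rather than the OpenShift DeploymentConfig mentioned above, and the service name, image, and limit values are hypothetical.

```python
import copy
import json


def build_deployment(name, image, replicas, cpu_limit, memory_limit):
    """Build a Kubernetes Deployment manifest as a plain dictionary."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Limits keep one service from starving the others.
                        "resources": {
                            "limits": {"cpu": cpu_limit, "memory": memory_limit},
                        },
                    }],
                },
            },
        },
    }


def scale(manifest, replicas):
    """Return a copy of the manifest that differs only in the replica count."""
    scaled = copy.deepcopy(manifest)
    scaled["spec"]["replicas"] = replicas
    return scaled


if __name__ == "__main__":
    # Hypothetical name, image, and limits.
    base = build_deployment("invoice-api", "registry.example/invoice-api:1.4.0",
                            replicas=2, cpu_limit="500m", memory_limit="512Mi")
    print(json.dumps(scale(base, 4), indent=2))
```

The design point is that scaling touches a single field; the image, the application configuration, and the limits stay as they were.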

Also, OpenShift is developed by a well-known manufacturer of enterprise-class solutions, which increases its credibility and reduces design risks.

Diagram 1. OpenShift Projects View

The Red Hat family of Linux distributions was chosen as the node operating system for the following reasons:

  • The high stability of the Docker engine available in Red Hat systems, as compared to other engines tested at that time.
  • Native SELinux support.
  • Integration with the OpenShift platform.

As the e-point team had previous experience with the implementation and maintenance of the OpenShift platform, we deployed OpenShift Origin, the community version of the platform, on CentOS systems. This option doesn’t include paid Red Hat support, which helped us reduce ING’s maintenance costs.

AWS Cloud

The architecture is based on three AWS availability zones, which ensures high availability. Each zone contains a single master node. Compute nodes are also distributed across the three zones. SSL/TLS encryption is terminated on AWS load balancers. We also used separate routers within the OpenShift cluster for production traffic, cluster services, and test services.

Diagram 2. OpenShift Cloud Infrastructure View

Note that we implemented dynamic provisioning of EBS cloud disks and EFS shares. This means we don’t need to involve infrastructure administrators when adding services that require persistent storage, and the entire operation can be performed without interrupting the system.
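With dynamic provisioning, a service asks for storage by declaring a claim, and the platform creates the volume behind it. The following is a minimal sketch of such a claim, built as a Python dictionary. The claim names and sizes are hypothetical, and mapping a shared claim to an EFS share is an assumption about how the storage classes would be configured.

```python
import json


def build_volume_claim(name, size, shared=False, storage_class=None):
    """Build a PersistentVolumeClaim manifest as a plain dictionary.

    shared=False requests ReadWriteOnce (single node, as with an EBS disk);
    shared=True requests ReadWriteMany (many nodes, as with an EFS share).
    """
    claim = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany" if shared else "ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }
    if storage_class is not None:
        # The class name depends on the cluster setup; none is assumed here.
        claim["spec"]["storageClassName"] = storage_class
    return claim


if __name__ == "__main__":
    print(json.dumps(build_volume_claim("invoice-archive", "10Gi"), indent=2))
```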

The system contains a PostgreSQL database. We decided to use the RDS service with replication. The replica is located in a different zone than the master database and in a different physical and network infrastructure, which meets the requirements of a high availability system. Access to the database from microservices running in the cluster is implemented with OpenShift external services. Thanks to this, we reduced the risk of connecting the client’s application to the wrong database, such as a database from another project. This also improved project visibility.
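One way to define an external service is a Kubernetes Service of type ExternalName, which gives an external host a stable name inside the project. The sketch below assumes that variant; the article does not say which mechanism was used, and the service name, host name, and database name are hypothetical.

```python
import json


def build_external_service(name, external_host):
    """Build a Service manifest that maps an in-cluster name to an external host."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {"type": "ExternalName", "externalName": external_host},
    }


def database_url(service_name, database, port=5432):
    """Build the connection URL an application would use inside the project."""
    return f"postgresql://{service_name}:{port}/{database}"


if __name__ == "__main__":
    # Hypothetical names; the real RDS endpoint is not given in the article.
    service = build_external_service("invoice-db", "invoice-db.example.internal")
    print(json.dumps(service, indent=2))
    print(database_url("invoice-db", "invoices"))
```

Because applications refer only to the service name, each project resolves that name to its own database, which is what lowers the risk of pointing at another project's database.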

Internal Image Registry

We also deployed an internal Docker image registry with an object storage back-end. Each OpenShift project has space inside this registry where images are uploaded and stored. Users and services are given appropriate permissions to access the correct images from the registry.

It is worth remembering that an image or its layer located in two projects’ spaces does not take twice as much space. If an image with the same identifier (i.e. the same SHA256 hash) belongs to two or more projects, it physically occupies the same space as one image. Thanks to this, we can ensure a high level of security while reducing the cost of object storage.
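The deduplication described above follows from content addressing: a layer is stored under its SHA256 digest, so pushing the same bytes from two projects creates one blob and two references. The class below is a toy model of that idea, with hypothetical project names; it is not the registry's actual implementation.

```python
import hashlib


class LayerStore:
    """Toy content-addressable store: one blob per digest, many references."""

    def __init__(self):
        self.blobs = {}       # digest -> layer bytes, stored once
        self.references = {}  # project -> set of digests it may access

    def push(self, project, layer):
        digest = "sha256:" + hashlib.sha256(layer).hexdigest()
        self.blobs.setdefault(digest, layer)  # no second copy if already present
        self.references.setdefault(project, set()).add(digest)
        return digest

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


if __name__ == "__main__":
    store = LayerStore()
    layer = b"base image layer contents"
    store.push("invoice-test", layer)   # hypothetical project names
    store.push("invoice-prod", layer)
    print(store.stored_bytes() == len(layer))
```

Access control stays per project (the references), while storage cost is paid once per unique layer (the blobs).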

Meeting Security Needs with OpenShift Secrets

We paid particular attention to password security in this project. Passwords should not be stored on machines that run applications. Incorporating passwords into images would compromise the security of the entire system. That’s why we chose OpenShift Secrets, i.e. objects that allow passwords (and other data requiring special protection) to 1) be mounted as a directory or file in the container running the application, or 2) be transferred as environment variables. In either case, the passwords are not stored on the drive of the node with the container.
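From the application's point of view, both delivery methods are simple to consume. The sketch below reads a secret from an environment variable first and falls back to a mounted file. The mount directory, the secret name, and the naming convention for the variable are assumptions made for illustration, and Python is used only as an example language.

```python
import os
from pathlib import Path


def read_secret(name, mount_dir="/etc/secrets", env=os.environ):
    """Return a secret delivered as an environment variable or a mounted file.

    The variable name is derived from the secret name, for example
    "db-password" becomes "DB_PASSWORD" (an assumed convention).
    """
    env_key = name.upper().replace("-", "_")
    if env_key in env:
        return env[env_key]
    secret_file = Path(mount_dir) / name
    if secret_file.is_file():
        return secret_file.read_text().strip()
    raise KeyError(f"secret {name!r} not found in environment or {mount_dir}")


if __name__ == "__main__":
    try:
        password = read_secret("db-password")  # hypothetical secret name
        print("secret loaded, length", len(password))
    except KeyError as error:
        print(error)
```

Either way, the password never appears in the image or in the application's configuration files.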

To ensure security at the OS level, OpenShift defaults to running containers under high user IDs (e.g. 1000100000) in a restricted security context. Each Pod has a separate hostname and IP address. Moreover, files such as /etc/passwd or /etc/shadow are not shared. For these reasons, it’s essential to plan and properly prepare images (and the way an application runs) in advance.
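One practical consequence is that the user ID a container runs under may have no entry in the image's /etc/passwd, so code that looks up the current user name can fail. The sketch below shows a defensive lookup. The fallback name is hypothetical, and Python is used only as an example language.

```python
import os
import pwd


def resolve_username(uid=None, lookup=pwd.getpwuid, fallback="app"):
    """Return the user name for a UID, tolerating a missing passwd entry.

    pwd.getpwuid raises KeyError when the UID is not listed, which can happen
    when the platform assigns an arbitrary high UID to the container.
    """
    if uid is None:
        uid = os.getuid()
    try:
        return lookup(uid).pw_name
    except KeyError:
        return fallback


if __name__ == "__main__":
    print(resolve_username())
```

The same reasoning applies to file permissions: directories the application writes to should not assume a specific user ID baked into the image.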

Monitoring and Logs

Without monitoring and application logs, maintenance teams are blind and cannot act proactively. This makes it impossible to detect extraordinary situations, find bugs, or carry out post-incident/post-attack analysis (i.e. forensics).

The OpenShift console enables its users to track applications when containers write logs to the standard output, which makes analysis much easier. We configured the project’s applications and containers in this way. Logs are also saved in persistent resources and – thanks to DaemonSet Pods – transferred to the log aggregator, synchronized, and archived to object storage via the S3 protocol. An established log policy ensures proper log retention and protects against disk resource overflow.
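Writing logs to the standard output usually takes only a small piece of configuration. The project's services run on the JVM; the sketch below uses Python's standard logging module purely to show the idea, and the log format and logger name are assumptions.

```python
import logging
import sys


def configure_logging(level=logging.INFO):
    """Send all log records to standard output, where the platform collects them."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    root = logging.getLogger()
    root.handlers = [handler]  # replace any file or stderr handlers
    root.setLevel(level)
    return root


if __name__ == "__main__":
    configure_logging()
    logging.getLogger("invoice-api").info("service started")  # hypothetical logger name
```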

The following stack of monitoring tools was implemented within OpenShift to monitor application parameters and the state of Java Virtual Machines: Grafana, InfluxDB, Telegraf, Jolokia, and Spring Boot Actuator.

OpenShift will autoheal a properly configured and deployed application. Health checks are defined for each container to monitor whether it works as expected. Based on these checks, the controller decides whether to allow production traffic into a given container (application instance) or to request a restart.
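The decision described above can be modeled with two check results per container, corresponding to the liveness and readiness probes in Kubernetes. The sketch below is a deliberately simplified model: real probes also involve intervals, timeouts, and failure thresholds, and the names used here are hypothetical.

```python
from enum import Enum


class Action(Enum):
    ROUTE_TRAFFIC = "route traffic to the container"
    HOLD_TRAFFIC = "keep the container running but send it no traffic"
    RESTART = "restart the container"


def decide(alive, ready):
    """Map health check results to an action for one container.

    alive: result of the liveness check (is the process still healthy?)
    ready: result of the readiness check (can it serve requests right now?)
    """
    if not alive:
        return Action.RESTART
    if not ready:
        return Action.HOLD_TRAFFIC
    return Action.ROUTE_TRAFFIC


if __name__ == "__main__":
    for alive, ready in [(True, True), (True, False), (False, False)]:
        print(alive, ready, decide(alive, ready).value)
```

Separating the two checks matters: an instance that is still starting up should receive no traffic, but restarting it would only delay it further.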

Conclusion

By choosing OpenShift, we were able to provide a high level of security and reliability while reducing project costs. The implementation and maintenance of applications in the OpenShift cluster:

  • Unifies application and architecture development processes, system implementation, and maintenance.
  • Increases the coherence and transparency of the project.
  • Reduces the differences between development, CI, test, and production environments; this is essentially impossible in the classic model.
  • Facilitates the control of a large number of microservices.
  • Streamlines the process of introducing a new employee to application implementation and deployment.

Considering all of the above, we concluded that the use of OpenShift improved the quality of this project.