Overview
Before you install any hardware or software, you must know what you're trying to achieve. This section looks at the basic components of an OpenStack infrastructure and organizes them into one of the more common reference architectures. You'll then use that architecture as a basis for installing OpenStack in the next section.
As you know, OpenStack provides the following basic services:
- Compute:
- Compute servers are the workhorses of your installation; they're the servers on which your users' virtual machines are created. nova-compute controls the life cycle of these VMs.
- Networking:
- Typically, an OpenStack environment includes multiple servers that need to communicate with each other and with the outside world. Fuel supports both the older nova-network and the newer Neutron-based OpenStack Networking implementations:
- With nova-network, Flat-DHCP and VLAN modes are available.
- With neutron, GRE tunnels or VLANs can be used for network segmentation.
- Storage:
- OpenStack requires block and object storage to be provisioned. Fuel provides the following storage options out of the box:
- Cinder LVM provides persistent block storage to virtual machines over the iSCSI protocol.
- The Swift object store can be used by Glance to store VM images and snapshots; it may also be used directly by applications.
- Ceph combines object and block storage and can replace either one or both of the above.
Compute, Networking, and Storage services can be combined in many different ways. Out of the box, Fuel supports the following deployment configurations:
Multi-node Deployment
In a production environment, it is unlikely that you will ever have a Multi-node deployment of OpenStack, partly because it forces you to make a number of compromises as to the number and types of services that you can deploy. It is, however, extremely useful if you just want to see how OpenStack works from a user's point of view.
More commonly, your OpenStack installation will consist of multiple servers. Exactly how many is up to you, of course, but the main idea is that your controller(s) are separate from your compute servers, on which your users' VMs will actually run. One arrangement that will enable you to achieve this separation while still keeping your hardware investment relatively modest is to house your storage on your controller nodes.
Multi-node with HA Deployment
Production environments typically require high availability, which involves several architectural requirements. Specifically, you will need at least three controllers, and certain components will be deployed in multiple locations to prevent single points of failure. That's not to say, however, that you can't reduce hardware requirements by combining your storage, network, and controller nodes.
We'll take a closer look at the details of this deployment configuration in the Details of Multi-node with HA Deployment section.
Details of Multi-node with HA Deployment
OpenStack services are interconnected by RESTful HTTP-based APIs and AMQP-based RPC messages. Redundancy for stateless OpenStack API services is therefore implemented through a combination of Virtual IP (VIP) management using Pacemaker and load balancing using HAProxy. Stateful OpenStack components, such as the state database and messaging server, rely on their respective active/active and active/passive modes for high availability. For example, RabbitMQ uses built-in clustering capabilities, while the database uses MySQL/Galera replication.
Let's take a closer look at what an OpenStack deployment looks like, and what it takes to achieve high availability for that deployment.
Red Hat OpenStack Architectures
Red Hat has partnered with Mirantis to offer an end-to-end supported distribution of OpenStack powered by Fuel. Because Red Hat offers support for a subset of all available open source packages, the reference architecture has been slightly modified to meet Red Hat's support requirements while still providing a highly available OpenStack environment.
Below is the list of modifications:
- Database backend:
- MySQL with Galera has been replaced with native MySQL replication in a Master/Slave configuration. The MySQL master is elected via Corosync, and master/slave status is managed via Pacemaker.
- Messaging backend:
- RabbitMQ has been replaced with QPID. QPID is an AMQP provider that Red Hat offers, but it cannot be clustered in Red Hat's offering. As a result, Fuel configures three non-clustered, independent QPID brokers. Fuel still provides HA for the messaging backend via virtual IP management provided by Corosync.
- Nova networking:
- Neutron (Quantum) is not available for Red Hat OpenStack because the Red Hat kernel lacks GRE tunneling support for Open vSwitch. This issue should be fixed in a future release. As a result, Fuel for Red Hat OpenStack Platform supports only Nova networking.
Multi-node Red Hat OpenStack Deployment
In a production environment, it is unlikely that you will ever have a Multi-node deployment of OpenStack, partly because it forces you to make a number of compromises as to the number and types of services that you can deploy. It is, however, extremely useful if you just want to see how OpenStack works from a user's point of view.
More commonly, your OpenStack installation will consist of multiple servers. Exactly how many is up to you, of course, but the main idea is that your controller(s) are separate from your compute servers, on which your users' VMs will actually run. One arrangement that will enable you to achieve this separation while still keeping your hardware investment relatively modest is to house your storage on your controller nodes.
Multi-node with HA Red Hat OpenStack Deployment
Production environments typically require high availability, which involves several architectural requirements. Specifically, you will need at least three controllers, and certain components will be deployed in multiple locations to prevent single points of failure. That's not to say, however, that you can't reduce hardware requirements by combining your storage, network, and controller nodes.
OpenStack services are interconnected by RESTful HTTP-based APIs and AMQP-based RPC messages. Redundancy for stateless OpenStack API services is therefore implemented through a combination of Virtual IP (VIP) management using Corosync and load balancing using HAProxy. Stateful OpenStack components, such as the state database and messaging server, rely on their respective active/passive modes for high availability. For example, MySQL uses built-in replication capabilities (with the help of Pacemaker), while QPID runs as three independent brokers behind a virtual IP to provide high availability.
HA Logical Setup
An OpenStack Multi-node HA environment involves three types of nodes: controller nodes, compute nodes, and storage nodes.
Controller Nodes
The first order of business in achieving high availability (HA) is redundancy, so the first step is to provide multiple controller nodes.
As you may recall, the database uses Galera to achieve HA, and Galera is a quorum-based system. That means that you should have at least 3 controller nodes.
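To make the quorum requirement concrete, here is a small illustrative calculation of the majority rule that Galera relies on (a sketch only; Galera's weighted quorum options are not shown):

```python
# Majority quorum: more than half of the nodes must remain reachable
# for the cluster to keep accepting writes.

def quorum(cluster_size: int) -> int:
    """Smallest number of nodes that still forms a majority."""
    return cluster_size // 2 + 1

def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can fail before quorum is lost."""
    return cluster_size - quorum(cluster_size)

for n in (1, 2, 3, 4, 5):
    print(f"{n} controllers: quorum={quorum(n)}, "
          f"tolerated failures={tolerated_failures(n)}")

# With 2 controllers a single failure stops writes (tolerated failures = 0);
# with 3 controllers one node can fail safely -- hence the minimum of three.
```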
Every OpenStack controller runs HAProxy, which manages a single External Virtual IP (VIP) for all controller nodes and provides HTTP and TCP load balancing of requests going to OpenStack API services, RabbitMQ, and MySQL.
When an end user accesses the OpenStack cloud using Horizon or makes a request to the REST API for services such as nova-api, glance-api, keystone-api, quantum-api, nova-scheduler, MySQL or RabbitMQ, the request goes to the live controller node currently holding the External VIP, and the connection gets terminated by HAProxy. When the next request comes in, HAProxy handles it and may send it to the original controller or to another controller in the environment, depending on load conditions.
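To illustrate that flow from the client side, the sketch below requests a Keystone token through the external VIP (using the Identity v3 API for illustration). The VIP address, port, and credentials are placeholders; the point is that the client only ever talks to the VIP, and HAProxy decides which controller's keystone-api answers:

```python
# Sketch: a client authenticates against Keystone via the external VIP.
# Address and credentials are placeholders; HAProxy on the controller that
# currently holds the VIP forwards the request to a healthy keystone-api backend.
import requests

VIP = "172.16.0.10"                          # hypothetical external VIP
AUTH_URL = f"http://{VIP}:5000/v3/auth/tokens"

payload = {
    "auth": {
        "identity": {
            "methods": ["password"],
            "password": {
                "user": {
                    "name": "demo",                  # placeholder user
                    "domain": {"id": "default"},
                    "password": "secret",            # placeholder password
                }
            },
        }
    }
}

resp = requests.post(AUTH_URL, json=payload, timeout=10)
resp.raise_for_status()
token = resp.headers["X-Subject-Token"]
print("Token issued; which controller served the request is invisible to the client.")
```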
Each of the services housed on the controller nodes has its own mechanism for achieving HA:
- nova-api, glance-api, keystone-api, quantum-api and nova-scheduler are stateless services that do not require any special attention besides load balancing.
- Horizon, as a typical web application, requires sticky sessions to be enabled at the load balancer.
- RabbitMQ provides active/active high availability using mirrored queues.
- MySQL high availability is achieved through Galera active/active multi-master deployment and Pacemaker.
- Quantum agents are managed by Pacemaker.
- Ceph monitors implement their own quorum-based HA mechanism and require time synchronization between all nodes. Clock drift higher than 50 ms may break the quorum or even crash the Ceph service (a clock-check sketch follows this list).
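Given that sensitivity to clock drift, it can be useful to verify each controller's offset against your time source. Below is a minimal sketch using the third-party ntplib package; the reference server name is a placeholder and the script would be run on every controller:

```python
# Sketch: measure this node's clock offset against an NTP reference.
# Run on each controller; requires the third-party "ntplib" package.
import ntplib

REFERENCE = "ntp.example.com"   # placeholder: use your local NTP server
MAX_DRIFT_MS = 50.0             # Ceph monitor quorum tolerates roughly 50 ms

response = ntplib.NTPClient().request(REFERENCE, version=3)
offset_ms = abs(response.offset) * 1000.0

if offset_ms > MAX_DRIFT_MS:
    print(f"WARNING: offset {offset_ms:.1f} ms exceeds the {MAX_DRIFT_MS:.0f} ms budget")
else:
    print(f"OK: offset {offset_ms:.1f} ms is within the {MAX_DRIFT_MS:.0f} ms budget")
```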
Compute Nodes
OpenStack compute nodes are, in many ways, the foundation of your environment; they are the servers on which your users will create their Virtual Machines (VMs) and host their applications. Compute nodes need to talk to controller nodes and reach out to essential services such as RabbitMQ and MySQL. They use the same approach that provides redundancy to the end-users of Horizon and REST APIs, reaching out to controller nodes using the VIP and going through HAProxy.
Storage Nodes
Depending on the storage options you select for your environment, you may have Ceph, Cinder, and Swift services running on your storage nodes.
Ceph implements its own HA; all you need is enough controller nodes running the Ceph Monitor service to form a quorum, and enough Ceph OSD nodes to satisfy the object replication factor.
The Swift API relies on the same HAProxy setup with a VIP on the controller nodes as the other REST APIs. If you don't expect too much data traffic in Swift, you can also deploy the Swift Storage and Proxy services on the controller nodes. For a larger production environment you'll need dedicated nodes: two for the Swift Proxy and at least three for Swift Storage.
Whether or not you want separate Swift nodes depends primarily on how much data you expect to keep there. A simple test is to fully populate your Swift object store with data and then fail one controller node. If replication of the degraded Swift objects between the remaining controller nodes generates enough network traffic, CPU load, or disk I/O to impact the performance of other OpenStack services running on the same nodes, you should separate Swift from the controllers.
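Here is a hedged sketch of the "populate, then fail a controller" test using the python-swiftclient library; the auth endpoint, credentials, object count, and object sizes are all placeholders to adjust for your environment:

```python
# Sketch: fill Swift with test objects before powering off one controller.
# Endpoint, credentials, and sizes are placeholders; requires python-swiftclient.
import os
from swiftclient.client import Connection

conn = Connection(
    authurl="http://172.16.0.10:5000/v2.0/",   # Keystone endpoint behind the VIP
    user="admin",
    key="secret",
    tenant_name="admin",
    auth_version="2.0",
)

conn.put_container("replication-test")
for i in range(1000):                          # tune count/size to your capacity
    conn.put_object(
        "replication-test",
        f"object-{i:05d}",
        contents=os.urandom(1024 * 1024),      # 1 MiB of random data per object
    )

# Then power off one controller and watch network, CPU, and disk I/O on the
# remaining controllers while Swift re-replicates the degraded objects.
```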
If you select Cinder LVM as the block storage backend for Cinder volumes, you should have at least one Cinder LVM node. Unlike Swift and Ceph, Cinder LVM doesn't implement data redundancy across nodes: if a Cinder node is lost, volumes stored on that node cannot be recovered from the data stored on other Cinder nodes. If you need your block storage to be resilient, use Ceph for volumes.
Cluster Sizing
This reference architecture is well suited for production-grade OpenStack deployments at medium and large scale, where you can afford to allocate several servers for your OpenStack controller nodes in order to build a fully redundant and highly available environment.
The absolute minimum requirement for a highly-available OpenStack deployment is to allocate 4 nodes:
- 3 controller nodes, combined with storage
- 1 compute node
If you want to run storage separately from the controllers, you can do that as well by raising the bar to 9 nodes:
- 3 controller nodes
- 3 storage nodes
- 2 Swift Proxy nodes
- 1 compute node
Of course, you are free to choose how to deploy OpenStack based on the amount of available hardware and on your goals (such as whether you want a compute-oriented or storage-oriented environment).
For a typical OpenStack compute deployment, you can use this table as high-level guidance to determine the number of controllers, compute, and storage nodes you should have:
| # of Nodes | Controllers | Computes | Storage |
|------------|-------------|----------|---------|
| 4-10       | 3           | 1-7      | 3 (on controllers) |
| 11-40      | 3           | 3-32     | 3+ (Swift) + 2 (proxy) |
| 41-100     | 4           | 29-88    | 6+ (Swift) + 2 (proxy) |
| >100       | 5           | >84      | 9+ (Swift) + 2 (proxy) |
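If you want to script capacity planning, the same guidance can be encoded as a small helper. This is only a transcription of the table above, not a Fuel tool:

```python
# Sketch: transcription of the sizing table above. Compute nodes take whatever
# remains after controllers, storage, and proxies (not computed here).
def suggest_layout(total_nodes: int) -> dict:
    if total_nodes < 4:
        raise ValueError("An HA deployment needs at least 4 nodes")
    if total_nodes <= 10:
        return {"controllers": 3, "storage": "3 (on controllers)", "swift_proxies": 0}
    if total_nodes <= 40:
        return {"controllers": 3, "storage": "3+ (Swift)", "swift_proxies": 2}
    if total_nodes <= 100:
        return {"controllers": 4, "storage": "6+ (Swift)", "swift_proxies": 2}
    return {"controllers": 5, "storage": "9+ (Swift)", "swift_proxies": 2}

print(suggest_layout(25))
# {'controllers': 3, 'storage': '3+ (Swift)', 'swift_proxies': 2}
```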
Network Architecture
For better network performance and manageability, Fuel places different types of traffic on separate networks. This section describes how to distribute the network traffic in an OpenStack cluster.
Public Network
This network allows inbound connections to VMs from the outside world (allowing users to connect to VMs from the Internet) as well as outbound connections from VMs to the outside world. For security reasons, the public network is usually isolated from the other networks in the cluster. The word "public" means that these addresses can be used to communicate with the cluster and its VMs from outside the cluster.
To enable external access to VMs, the public network provides the address space for the floating IPs assigned to individual VM instances by the project administrator. The Nova Network or Neutron services can then configure this address on the public network interface of the Network controller node. For example, environments based on Nova Network use iptables to create a Destination NAT from this address to the private IP of the corresponding VM instance, through the appropriate virtual bridge interface on the Network controller node.
In the other direction, the public network provides connectivity to the globally routed address space for VMs. The IP address from the public network that has been assigned to a compute node is used as the source for the Source NAT performed on traffic going from VM instances on that compute node to the Internet.
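To make the two NAT directions concrete, the sketch below prints the general shape of such rules with entirely made-up addresses; the chains and options that Nova Network actually creates differ in detail:

```python
# Illustration only: the shape of floating-IP NAT rules, with made-up addresses.
# Nova Network manages its own iptables chains; this is a simplified sketch.
floating_ip = "172.16.0.50"      # hypothetical floating IP on the public network
fixed_ip = "10.0.0.5"            # hypothetical fixed (private) IP of the VM
node_public_ip = "172.16.0.2"    # hypothetical public IP of the compute node

# Inbound: traffic addressed to the floating IP is DNAT'ed to the VM's fixed IP.
dnat_rule = (f"iptables -t nat -A PREROUTING -d {floating_ip} "
             f"-j DNAT --to-destination {fixed_ip}")

# Outbound: traffic from the VM is SNAT'ed to the node's public address.
snat_rule = (f"iptables -t nat -A POSTROUTING -s {fixed_ip} "
             f"-j SNAT --to-source {node_public_ip}")

print(dnat_rule)
print(snat_rule)
```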
The public network also provides Virtual IPs for the Endpoint nodes, which are used to connect to the OpenStack service APIs.
Internal (Management) Network
The internal network connects all OpenStack nodes in the environment. All components of an OpenStack environment communicate with each other using this network. This network must be isolated from both the private and public networks for security reasons.
The internal network can also carry iSCSI traffic between the Compute and Storage nodes.
Private Network
The private network facilitates communication between each tenant's VMs. Private network address spaces are not a part of the enterprise network address space; the fixed IPs of virtual instances are not directly accessible from the rest of the enterprise network.
NIC usage
The current architecture assumes the presence of 3 NICs, but it can be customized for 2 or 4+ network interfaces. Most servers are built with at least two network interfaces. Here, let's consider a typical example with three NICs, utilized as follows:
- eth0:
- The internal management network, used for communication with Puppet & Cobbler
- eth1:
- The public network, and floating IPs assigned to VMs
- eth2:
- The private network, for communication between OpenStack VMs, and the bridge interface (VLANs)
The figure below illustrates the relevant nodes and networks in Neutron VLAN mode.