In Machbase Cluster Edition, several nodes residing on one or more hosts form a single cluster.
Service continues even if one of the internal nodes fails.
Data storage is distributed, and the distributed data can be analyzed in parallel, so
performance increases as the cluster grows.
Classifications of Node

Each Node is classified as follows:

| Node | Description |
|------|-------------|
| Coordinator | The process that manages all general-purpose servers and Nodes |
| Deployer | The process that resides on each Host; responsible for installing, upgrading, and monitoring the Broker / Warehouse |
| Broker | The process that accepts connections from actual client programs; distributes client data insert / data lookup queries to the Warehouses |
| Leader Broker | The designated Broker that can also perform DDL |
| Warehouse | The process that stores the actual data; each stores part of the entire cluster data and executes the commands received from the Broker |
The Coordinator is a process that manages the state of all Nodes; at most two can exist.
The Coordinator created first is called the Primary Coordinator and the other the Secondary Coordinator, and only the Primary Coordinator manages the state of all Nodes.
When the Primary Coordinator goes down, the Secondary Coordinator is promoted to Primary Coordinator.
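The failover rule above can be sketched as a toy model in Python (the class and function names are hypothetical, not the Machbase implementation):

```python
# Hypothetical sketch of Coordinator failover: when the Primary goes
# down, the Secondary is promoted to Primary.

class Coordinator:
    def __init__(self, name, role):
        self.name = name
        self.role = role          # "primary" or "secondary"
        self.alive = True

def promote_on_failure(primary, secondary):
    """Promote the secondary to primary if the primary is down."""
    if not primary.alive and secondary.alive:
        secondary.role = "primary"
    return secondary

c1 = Coordinator("coord1", "primary")
c2 = Coordinator("coord2", "secondary")
c1.alive = False                  # primary crashes
promote_on_failure(c1, c2)
print(c2.role)                    # -> primary
```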
Special Node: Deployer
It is managed by the Coordinator, but it is simply a process that installs / removes Broker / Warehouse Nodes.
Normally one Deployer per Host is enough to install Nodes, but multiple Deployers can be added to improve installation performance.
Example of Installation on Servers
Figure (a) below shows two Coordinators, two Brokers, four Warehouse Active Nodes, and four Warehouse Standby Nodes installed on four generic servers.
As shown in the figure, each Node is identified as 'hostname:port', i.e. the host name of the general-purpose server followed by the assigned port number.
The Broker delivers the Client's commands to the Warehouses, then collects the Warehouse results and transmits them back to the Client.
- When inserting data, the Broker makes sure the data is distributed evenly across the Warehouses.
- When retrieving data, the Broker sends the query to the Warehouses and collects and delivers all the results.
The Broker does not hold Log Table data, but it does hold Volatile Table data.
If all Brokers could run DDL at the same time, the overall consistency of the cluster would be at risk.
If DDL commands were instead controlled at the cluster level to solve this, the performance of DDL itself would deteriorate.
Therefore, a single designated Broker performs DDL: the Leader Broker.
Only the Leader Broker can perform DDL; other Brokers cannot.
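The Leader Broker rule can be illustrated with a small sketch (hypothetical names; Machbase enforces this internally):

```python
# Illustrative sketch: only the Leader Broker may execute DDL;
# other Brokers reject it, while non-DDL statements run anywhere.

def is_ddl(stmt):
    return stmt.strip().upper().startswith(("CREATE", "ALTER", "DROP"))

class Broker:
    def __init__(self, name, is_leader=False):
        self.name = name
        self.is_leader = is_leader

    def execute(self, stmt):
        if is_ddl(stmt) and not self.is_leader:
            raise PermissionError(
                f"{self.name}: DDL is allowed only on the Leader Broker")
        return f"{self.name}: OK"

leader = Broker("broker1", is_leader=True)
follower = Broker("broker2")
print(leader.execute("CREATE TABLE t (v INTEGER)"))   # broker1: OK
print(follower.execute("SELECT * FROM t"))            # broker2: OK
```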
The Warehouse stores Log Table data directly and actually executes the commands delivered by the Broker.
As with the Broker, a client can connect directly to a Warehouse, but data cannot be inserted / updated / deleted there; Warehouse data can only be retrieved.
A Warehouse can be assigned to a Group.
- When the Broker inputs data, all Warehouses in the same Group receive the same records.
- Even if a specific Warehouse in the Group goes down, data retrieval is unaffected.
- When a new Warehouse is added to a Group, the same records are maintained through Replication.
Warehouse Group State
| State | INSERT / APPEND | SELECT |
|-------|-----------------|--------|
| Normal | O | O |
| Readonly | X | O |
The conditions that change a Group to the Readonly state are as follows:
- Some Warehouses in the Group fail to receive input during INSERT / APPEND
  - Because the data of the failed Warehouses is now inconsistent with that of the successful Warehouses, the failed Warehouses are placed in the Scrapped state and the Group is moved to the Readonly state so that it receives no further input.
- A new Warehouse is added to the Group
  - If input were accepted while replication is still in progress, the point at which replication ends could not be determined, so the Group is changed to the Readonly state.
Node Port Management
Each Node must have several ports open, distinguished as follows:

| Purpose | Node |
|---------|------|
| Communication between Nodes | All Nodes |
| Direct client connections | Broker / Warehouse |
| Communication for management purposes | Coordinator / Deployer |
| Replication between Warehouses | Warehouse |
Commands That Can Be Executed After Direct Connection
The following table lists which commands are and are not possible when connecting directly to each Node.
All Nodes are accessible from a client, but depending on the Node type some queries are not possible.

| Broker (Leader) | Broker (non-leader) | Warehouse Standby |
|-----------------|---------------------|-------------------|
Save / Lookup Data
Machbase Cluster Edition distributes data and collects the results computed by distributed query execution. This section explains how data is stored and retrieved for each table type.
When data is inserted into a Log Table through a Broker, it is distributed across all Warehouses. (The data is not stored in the Broker that performs the input.) The Coordinator tracks the database size of each Warehouse, and the Broker distributes the data based on that.
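One way such size-based distribution could work is sketched below (a hypothetical least-loaded policy; the actual balancing algorithm is internal to Machbase):

```python
# Hypothetical sketch: route each record to the warehouse the
# Coordinator currently reports as smallest ("least loaded").

def distribute(records, sizes):
    """sizes: {warehouse: current stored size}.
    Returns {warehouse: [records assigned to it]}."""
    placement = {w: [] for w in sizes}
    for rec in records:
        target = min(sizes, key=sizes.get)   # least-loaded warehouse
        placement[target].append(rec)
        sizes[target] += 1                   # assume unit-sized records
    return placement

# wh2 starts with more data, so wh1 receives the larger share
# until the two sizes even out.
out = distribute(list(range(6)), {"wh1": 0, "wh2": 2})
print({w: len(v) for w, v in out.items()})   # {'wh1': 4, 'wh2': 2}
```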
If data is inserted into a Log Table directly through a Warehouse, it is stored only in that Warehouse. This can be chosen to avoid the performance cost of the distribution algorithm and network bottlenecks.
When a Broker inserts data into a Volatile Table, it is stored only in that Broker. In other words, no data is sent to or synchronized with other Brokers.
Replication is not supported for Volatile Tables because supporting DELETE, which is characteristic of Volatile Tables, would hurt replication performance.
Volatile Tables are created only in Brokers, so data cannot be inserted into them through a Warehouse.
When you view the data in the Log Table through the Broker, queries are distributed to all Warehouses. Each Warehouse actually performs the query, exchanging intermediate results between the Warehouses if necessary. The Broker collects the partial results generated in this way and returns the final result.
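The scatter/gather pattern described above can be illustrated with a simple distributed aggregate (hypothetical data and function names):

```python
# Scatter/gather sketch: each warehouse computes a partial result
# over its own partition, and the broker merges the partials.

warehouses = [
    [3, 1, 4],        # data partition on wh1
    [1, 5],           # wh2
    [9, 2, 6],        # wh3
]

def warehouse_partial(rows):
    # e.g. SELECT COUNT(*), SUM(v): each warehouse returns its partial.
    return len(rows), sum(rows)

def broker_gather(partials):
    # COUNT and SUM merge by simple accumulation across warehouses.
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return count, total

partials = [warehouse_partial(p) for p in warehouses]
print(broker_gather(partials))    # (8, 31)
```

Note that this merge-by-accumulation step is exactly what fails for functions like GROUP_CONCAT with ORDER BY, as described under "Not Supported Features" below.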
When viewing the data in the Log Table through the Warehouse, the query is executed only in the corresponding warehouse. This process is identical to query execution in the Fog Edition.
When viewing data in a Volatile Table through a Broker, the query is executed only by the Broker. This process is identical to query execution in the Fog Edition.
JOIN Between Two Tables
When joining a Log Table and a Volatile Table through a Broker, the connected Broker and the Warehouses execute the query at the same time. The Broker distributes the contents of the Volatile Table to the Warehouses.
Each Warehouse joins the delivered data against its own Log Table data and returns the result. The Broker collects the partial results generated in this way and returns the final result.
A JOIN cannot be performed through a Warehouse, because Volatile Tables are not created there.
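The broadcast-join flow above can be sketched as follows (toy data; all names are illustrative, not the Machbase implementation):

```python
# Broadcast join sketch: the broker ships the small Volatile table
# to every warehouse, each warehouse joins it against its own Log
# partition, and the broker concatenates the partial results.

volatile = {1: "a", 2: "b"}                  # key -> value, held by the broker
log_partitions = [[(1, 10), (3, 30)],        # (key, measure) rows on wh1
                  [(2, 20), (1, 11)]]        # rows on wh2

def warehouse_join(partition, broadcast):
    # Each warehouse joins its local rows with the broadcast table.
    return [(k, m, broadcast[k]) for k, m in partition if k in broadcast]

result = []
for part in log_partitions:                  # broker broadcasts + gathers
    result.extend(warehouse_join(part, volatile))
print(sorted(result))   # [(1, 10, 'a'), (1, 11, 'a'), (2, 20, 'b')]
```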
Replication refers to the process of creating an identical copy of a node in preparation for failure of the existing node.
Up to two Coordinators can be created in Cluster Edition.
Both Coordinators continuously maintain Cluster Node information.
Even if either terminates abnormally, the remaining Coordinator can continue managing the Cluster Nodes.
The Broker is not a Replication target.
Therefore, the data records of a Volatile Table in Broker A are not kept identical in Broker B (they are not synchronized).
However, because the table / index schema is the same across the entire Cluster, if Volatile Table VOL_TBL1 exists in Broker A, a Volatile Table VOL_TBL1 also exists in Broker B.
If a new Warehouse is added to a Group, it is replicated through the following process:
1. The Coordinator starts DDL replication to the new Warehouse.
2. The Group switches to the Readonly state.
3. One of the Warehouses in the Group starts data replication to the new Warehouse.
4. When data replication is complete, the Group switches back to the Normal state.
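The four steps above can be traced with a toy sequence (names and data structures are illustrative only):

```python
# Toy trace of adding a warehouse to a group, following the four
# documented steps. The dict layout is hypothetical.

def add_warehouse(group, donor, new_wh):
    log = []
    log.append(f"coordinator: replicate DDL to {new_wh}")    # step 1
    group["state"] = "Readonly"                              # step 2
    group["data"][new_wh] = list(group["data"][donor])       # step 3: copy data
    group["state"] = "Normal"                                # step 4
    return log

group = {"state": "Normal", "data": {"wh1": [1, 2, 3]}}
add_warehouse(group, "wh1", "wh2")
print(group["state"], group["data"]["wh2"])    # Normal [1, 2, 3]
```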
In the case of data insertion, the Broker guarantees redundancy by sending the same data to every Warehouse in the same Group.
How To Recover
Even if a Node terminates abnormally, service can continue in the following ways.
For more information, refer to the Operations Guide.
Even if the Primary Coordinator terminates abnormally, the Secondary Coordinator becomes the Primary Coordinator and cluster management continues.
Even if every Coordinator is terminated in the worst case, the entire service (data insert / query) can continue, only without cluster management:
Node operations (ADD, REMOVE, ...) cannot be performed on the corresponding Host, and statistical information of the Host cannot be collected.
Even if a Broker is terminated, service can continue as long as another Broker exists.
However, clients connected to the terminated Broker are disconnected and must reconnect to another Broker.
If the Leader Broker is stopped, the Coordinator selects a new Leader Broker from the remaining Brokers so that DDL execution remains possible.
Note that Lookup Table data stored in the Broker is not replicated.
If there are other Warehouses in the Group, they continue to participate in SELECT and APPEND.
Not Supported Features
Currently, the Cluster Edition does not distinguish between tablespaces.
BACKUP / MOUNT
Currently, the Cluster Edition does not distinguish between databases.
LOAD IN FILE
The ability to read and distribute CSV files is currently not implemented.
ALTER TABLE FORGERY CHECK
Because the client's user data must be checked for alteration, the result files cannot be collected in one place.
Clause / Function
The execution units are complex and are currently not supported:
- The GROUP_CONCAT results for the subgroups collected by each Warehouse cannot be merged by simple accumulation (ORDER BY in GROUP_CONCAT).
- The TS_CHANGE_COUNT results for the subgroups collected in each Warehouse cannot be merged by simple accumulation.
  In addition, TS_CHANGE_COUNT() is only meaningful when the entire result is sorted; when the results are distributed across Warehouses, it loses its meaning.
Supported Hardware and Operating Systems

| Item | Requirement |
|------|-------------|
| CPU | Intel Core i Series (Nehalem or later) or higher recommended |
| Memory | 2 GB or more per installed Node recommended |
| OS | Linux (any distribution) |