The CREATE TABLESPACE statement creates a tablespace in $MACHBASE_HOME/dbs/ where the indexes of the log table or log table will be stored.
Tablespace can have multiple disks. When each Partition File that stores data of Table and Index is stored, it is distributed and stored in Data Disks belonging to Tablespace.
If two or more disks are used, the index and table files are distributed and stored on each disk, and I/O is performed in parallel on each device. As the number of disks increases, disk I / O throughput increases, and a large amount of data can be stored on the disk quickly
Also, if tables and index tablespace are separately created and different disks are defined, I/O of table and index can be logically separated without reconfiguration of physical disk.
Defines disk belonging to a tablespace. Each Disk has the following properties.
Specifies the name of the Disk object. It is used to change the attributes of the Disk object through Alter Tablespace syntax later.
Specifies the attributes of the disk.
Specifies the Directory Path of the disk. This Directory must be created. When a path is specified as a relative path, PATH is searched based on $MACHBASE_HOME/dbs. For example, if PATH = 'disk1', Disk Path is recognized as $MACHBASE_HOME/dbs/disk1.
Determines how many disk IO requests are allowed to be paralleled. (DEF: 3, MIN: 1, MAX: 128)
If there is no keyword between CREATE TABLE, a log table is created.
VOLATILE_TABLE is a temporary table in which all data resides in temporary memory and joins the log table to improve the results,
The Machbase server disappears as soon as it is shut down.
Like VOLATILE_TABLE, LOOKUP_TABLE can perform fast query processing by storing all the data in memory.
A table column is usually a string, but it can be enclosed in single quotes (') or double quotes (") to contain special characters.
When querying the column through a SELECT query, it must be enclosed in either single or double quotes.
- When enclosed in single quotation marks, the name of the SELECT query result column is output with single quotation marks.
- When enclosed in double quotes, the name of the SELECT query result column is output as it is.
Specifies the attributes for the table.
Available Table Types
This is a property supported only for Disk Column Table. If appended data is maliciously changed later, it can check whether data is changed or not. Check the ALTER TABLE CHECK FORGERY RESULTFILE syntax.
Specifies the attribute for the column.
Available Table Types
This property represents the number of pages a partition has. The number of values that a partition has is PART_PAGE_COUNT * PAGE_VALUE_COUNT.
This property represents the number of values that a page has.
MAX_CACHE_PART_COUNT (Default : 0)
This property sets the cache area for performance.
When Machbase accesses a partition, it first looks for a structure that contains the meta information of that partition in memory. It determines how many partition information it contains in memory. Larger size will help performance, but memory usage will increase. The minimum value is 1 and the maximum value is 65535.
MINMAX_CACHE_SIZE (Default : 10240)
This property specifies how much cache memory to use for the MINMAX of the corresponding column. The default is 100MB for _ARRIVAL_TIME, the 0th hidden column. However, other columns are specified as 10KB by default. This size can be changed after the creation of the table through the "ALTER TABLE MODIFY" statement.
NOT NULL Constraint
Specifies NOT NULL if the column value does not allow NULL, and omit it if it is allowed (Default).
You can change the constraint with the ALTER TABLE MODIFY COLUMN command to drop or add this constraint defined after the creation of the table.
Pre-defined System Columns
When you create a table using the Create Table statement, the system creates two additional predefined system columns. _ARRIVAL_TIME and _RID columns.
The _ARRIVAL_TIME column is inserted into the DATETIME column based on the system time at which data is inserted into the INSERT statement or AppendData, and the value can be used as the unique key of the generated record. The value of this column can be inserted by specifying the value in the machloader or INSERT statement if the order is guaranteed (in the order of past-present). When data is retrieved using the DURATION conditional expression, data is retrieved based on the value of this column.
The _RID column is created by the system as a unique value for a particular record. The data type of this column is a 64-bit integer. For this column, the user can not specify a value and can not create an index. It is automatically generated at the time of data INSERT. You can retrieve records by the value of the _RID column.
MINMAX Cache Concept
In general, in the Disk DBMS, when a specific value is searched using the index, the disk is accessed to access the disk area including the index, and the final disk page including the corresponding value is searched.
On the other hand, Machbase is a chronologically partitioned structure in order to maintain time series information, which means that a particular piece of index information is divided into chunks of files in chronological order. Therefore, when a Machbase index is used, an index file fragmented by such a partition is sequentially searched.
If the range of data to be searched is divided into 1000 partitions, it means that 1000 files should be opened and retrieved every time. Although it is designed as an efficient columnar database structure, the MINMAX_CACHE structure is a way to improve the performance because the I/O cost is proportional to the number of index partitions.
MINMAX_CACHE is a structure that holds the index file information of the partition in memory, and is a contiguous memory space that keeps the minimum and maximum values of the column in memory. By maintaining such structure, when a partition containing a specific value is searched, if the value is smaller than the minimum value of the index or is larger than the maximum value, the corresponding partition can be skipped altogether, thereby enabling high-performance data analysis.
As shown in the figure above, to find the value 85, only the partitions 1 and 5 included in MIN/MAX among the 5 partitions are actually searched, and the partitions 2, 3 and 4 are skipped altogether.
MINMAX Cache Column
You can decide whether to use MINMAX Cache for a particular column when creating the table.
If the minmax_cache_size is set to a value other than 0, the MINMAX Cache will be active when the index is searched for that column and will not be active if
MINMAX_CACHE_SIZE = 0.
Please note the following when using this MINMAX Cache.
MINMAX Cache does not need to explicitly create an index on the column.
As default for all columns,
MINMAX_CACHE_SIZEis set to 10KB and the Alter Table syntax can be used to reset the memory size to a reasonable size.
The hidden column _arrival_time is 100MB by default and automatically uses MINMAX Cache memory.
In the case of VARCHAR type, MINMAX Cache is not covered. Therefore, if you explicitly specify whether the VARCHAR type is cached, an error will occur.
When the corresponding table is created, the
MINMAX_CACHE_SIZEmaximum memory can be used as much as the property is set. As the number of partitions grows, the memory grows gradually and increases by the maximum memory above.
If there are no records in the table, MINMAX Cache memory is not allocated at all.
Below is an example of table creation using actual MINMAX.
This is a constraint that can be assigned to a Volatile/Lookup table column. The Volatile / Lookup table does not always need to have a primary key, but you can not use the INSERT ON DUPLICATE KEY UPDATE statement without a primary key.
When a primary key is assigned, a red-black tree index corresponding to the primary key is generated.
Supported version 5.5 or later.
This feature limits the amount of data to maintain disk storage space. It is only supported for log table. It is set by using AUTO_DEL clause before specifying table property in CREATE TABLE statement. The AUTO_DEL clause can be set based on the storage time or the number of records.
The above example deletes data that is more than 30 days old if there is more input five seconds after the automatic deletion is performed. If the interval specified is too long, auto delete is performed for a long time, which may affect the input. If it is too short, it may affect overall system performance.
The following example shows how to specify the automatic deletion function as the number of data to save.
It checks the number of records in the table for every 5 input data and automatically deletes it and keep only 3 records if there are more than three.
CREATE TAG TABLE
The primary key, basetime, and summarized must be included in the tag table creation.
Specifies the Index Type to be created. If it is not Keyword Index, Index Type is created as Default Index Type according to Table Type if Index Type is not specified.
Default Index Type
This can be created only for varchar and text column of log table. It can be created for only one column.
LSM (Log Structure Merge) Index is an index optimized for storing and searching Big Data. The partitions of the LSM indexes are maintained for each level, and the lower level partitions are merged to move to the upper level. Lower partitions used to create a higher level partition are deleted.
This Index Level Partition Building is performed by Background Thread. The upper level partitions are merged with the lower level partitions and are created as one partition, so there are the following advantages when searching through the index.
If the key is duplicated, the disk space for key storage is saved because it is stored only once.
Searching for multiple partitions reduces the cost of opening and closing the file when searching for one index partition, and the number of index pages accessed is also reduced.
LSM Index Property
(DEFAULT = 3, MIN = 0, MAX = 3 )
The maximum level of the LSM Index, and the current value of 3 is the maximum value. And the maximum number of records of one partition can not exceed 200 million. The partition size of each level is the number of values of the previous partition * 10. For example, if MAX_LEVEL = 3 and PART_VALUE_COUNT is 100,000, then Level 0 = 100,000, Level 1 = 1,000,0000, Level 2 = 10,000,000, and Level 3 = 100,000,000. If the Partition Size of the last level exceeds 200 million, index creation will fail.
(DEFAULT = 512 * 1024, MIN = 32 * 1024,
MAX = 1 * 1024 * 1024)
Specifies the size of the page in which the index key value and bitmap value are stored. Default is 512K.
(DEFAULT = 1, DISABLE = 0, ENABLE(DEFAULT) = non_zero_integer)
Sets whether to set Bloom filter on index. Setting the Bloom filter allows you to quickly search for values that do not exist in the index, but it will increase the time it takes to create the index.
If you use only Range conditions, Bloom filters are not available and need not be created.
(DEFAULT = EQUAL, RANGE)
Sets the bitmap type of the index.
If BITMAP_ENCODE = EQUAL (default), generates a bitmap for the same value as the key value. If BITMAP = RANGE, generates a bitmap according to the range of the key value.
This is an index for data analysis and can be created only in the log table. It can be created on all columns except varchar, text, and binary, and can only be created on a single column.
This is a memory index for real-time data retrieval. It can be created only in the Volatile/Lookup table. It can be created in all columns of this table and can only be created for a single column.
The properties that can be applied in the LSM Index are as follows.
Indicates the number of rows stored in the Partition of Index.
Deletes the specified tablespace. However, if the object created in Tablespace exists, deletion fails.
Deletes the specified table. However, if there is another session in which the table is being searched, it fails with an error.
Deletes the specified index. However, if there is another session in which the table is being searched, it fails with an error.
The ALTER TABLESPACE statement is used to change the information associated with the specified tablespace.
ALTER TABLESPACE MODIFY DATADISK
This syntax is used to change the properties of DATADISK in Tablespace.
The ALTER TABLE statement is used to change the schema information of the specified table, and only the Log Table is available.
ALTER TABLE SET
This syntax changes the properties of a table. Currently there are no dynamically changeable properties.
ALTER TABLE ADD COLUMN
This syntax is the ability to add a specific column to the table in real time. You can add the name and type of the column, and set the default data values through the DEFAULT clause.
ALTER TABLE DROP COLUMN
This syntax is to delete a specific column in the table in real time.
ALTER TABLE RENAME COLUMN
This syntax is a function that changes a specific column name in a table.
ALTER TABLE MODIFY COLUMN
This syntax changes the properties of a particular column of a table. Currently it is possible to modify MINMAX CACHE attributes and NOT NULL constraints for column lengths and other types of VARCHAR types.
This syntax supports changing the column length of VARCHAR type only. This operation can not be reduced in length to preserve existing data, and should always be increased.
This syntax changes MINMAX_CACHE_SIZE for a particular column.
Adds a NOT NULL constraint to the column. If you add a NOT NULL constraint, the DDL operation fails for columns with NULL values.
If you want to allow NULL values in a column, use the MODIFY COLUMN NULL command in the next section.
Releases the NOT NULL constraint. Performance improvement due to min_max cache of LSM index can not be obtained. NULL values can be input.
ALTER TABLE FLUSH
Waits until the input data for the specified table is fully reflected in the data file.
ALTER TABLE FLUSH INDEX
Waits until the index data of the specified table is completely reflected in the index file.
ALTER TABLE CHECK FORGERY RESULT FILE
Checks whether the table has changed since it was appended to the specified table (Forgery). Forgery Check is only supported for Disk Column Table, and when creating Disk Column Table, it should be created by setting FORGERY_CHECK attribute to 1. When the Forgery is detected, a file corresponding to result_filename is created under $MACHBASE_HOME/trc and the file shows which part of the table has been changed.
ALTER TABLE RENAME TO
Changes the name of the table.
Metatables can not be renamed, and you can not use the $ character in the name to be changed. Table renaming is only possible for log tables.
Deletes all data in the specified table. However, if there is another session in which the table is being searched, it fails with an error.