This chapter describes how to define user-defined log data and how the collector collects it.

Data Conversion Concept


The following figure shows the order in which the collector operates. First, the original data is read and parsed using the regular expression stored in a regular expression (rgx) file. (The location of this file is specified in the template file.) Each parsed token is then assigned to a column according to the COL_LIST value defined in the rgx file.

To collect custom log data, create a regular expression that parses the original data, define the regular expression and COL_LIST in an rgx file, specify the location of that rgx file in the template file, and then create and run the collector from the template.
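
The flow described above can be sketched in Python. This is an illustration only, not the collector's implementation: the COL_LIST list, the to_row helper, and the myuser column name are hypothetical, and Python's re module stands in for the PCRE library.

```python
import re

# Hypothetical stand-in for the COL_LIST of an rgx file:
# each entry maps a regex token number to a column name.
COL_LIST = [
    {"REGEX_NO": 0, "NAME": "tm"},
    {"REGEX_NO": 1, "NAME": "myuser"},
    {"REGEX_NO": 2, "NAME": "msg"},
]

# Stand-in for the REGEX value of an rgx file.
REGEX = re.compile(r"\[([0-9-: ]+)\]\s(\S+)\s+([^\0]*)")


def to_row(line):
    """Parse one raw log line and assign its tokens to columns via COL_LIST."""
    match = REGEX.search(line)
    if match is None:
        return None  # the line does not fit the pattern
    # Token 0 is the first capturing group, which Python calls group 1.
    return {col["NAME"]: match.group(col["REGEX_NO"] + 1) for col in COL_LIST}


row = to_row("[2014-08-18 13:51:19] spiderman message-1 : hello")
print(row)
```

A line that does not match yields None, which mirrors the idea that only successfully parsed records can be bound to columns.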

machregex


To collect data with the Collector, you must create a regular expression to parse the input data. Machbase provides machregex, a tool for checking whether a regular expression you create can correctly parse the desired input data.

Machbase provides examples of regular expressions that can parse SYSLOG, the ACCESS log of the Apache web server, and the TRACE LOG of the Machbase server. Machbase uses Perl Compatible Regular Expressions (PCRE) libraries to support regular expressions.


Run machregex

[mach@localhost ~/mach_collector_home/bin]$ ./machregex 
=================================================================
     Machbase Collector Regex Utility
     Release Version 3.0.0.8634.official
     Copyright 2015, Machbase Inc. or its subsidiaries.
     All Rights Reserved.
=================================================================

Usage> ./machregex Pattern NewlinePattern

Result file : machregex.ok machregex.err

<< APACHE access log >>
  => machregex "^([0-9.:]+)\\s([\\w.-]+)\\s([\\w.-]+)\\s(\\[[^\\[\\]]+\\])\\s\"((?:[^\"]|\")+)\"\\s(\\d{3})\\s(\\d+|-)\\s\"((?:[^\"]|\")*)\"\\s\"((?:[^\"]|\")*)\"$" "^([0-9.:]+)\s" < DATA.LOG

<< MACH trace log >>
  => machregex "^\\[(\\d+[-]\\d+[-]\\d+\\s\\d+[:]\\d+[:]\\d+)+\\s([P][-]\\d+)+\\s([T][-]\\d+)+\\]\\s((?:[^\\0])*)$" "^\\[" < DATA.LOG

<< syslog >>
  => machregex "^(([a-zA-Z]+)\\s+([0-9]+)\\s+([0-9:]*))\\s(\\S*)\\s+((?:[^\\0])*)$" ".*" < DATA.LOG

The screen above is what machregex prints when it is run without arguments.

machregex Test

The following test uses machregex to parse syslog data with a regular expression.

[mach@localhost bin]$ machregex "^(([a-zA-Z]+)\\s+([0-9]+)\\s+([0-9:]*))\\s(\\S*)\\s+((?:[^\\0])*)$" ".*" </var/log/syslog
Pattern => (^(([a-zA-Z]+)\s+([0-9]+)\s+([0-9:]*))\s(\S*)\s+((?:[^\0])*)$)
========================================================================
.............
========================================================================
SUCCESS[107] (rc=7)(Aug 19 18:17:01 localhost CRON[6553]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
)
  ALL (0:110) => [Aug 19 18:17:01 localhost CRON[6553]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
]
  0 (0:15) => [Aug 19 18:17:01]
  1 (0:3) => [Aug]
  2 (4:6) => [19]
  3 (7:15) => [18:17:01]
  4 (16:37) => [localhost]
  5 (38:110) => [CRON[6553]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
]
=======================================================================
SUCCESS[107] (rc=7)(Aug 19 18:39:01 localhost CRON[6616]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && 
[ -d /var/lib/php5 ] && /usr/lib/php5/sessionclean /var/lib/php5 $(/usr/lib/php5/maxlifetime))
)
  ALL (0:232) => [Aug 19 18:39:01 localhost CRON[6616]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && 
[ -d /var/lib/php5 ] && /usr/lib/php5/sephp5/maxlifetime))
]
  0 (0:15) => [Aug 19 18:39:01]
  1 (0:3) => [Aug]
  2 (4:6) => [19]
  3 (7:15) => [18:39:01]
  4 (16:37) => [localhost]
  5 (38:232) => [CRON[6616]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [ -d /var/lib/php5 ] && /usr/lib/php5/sphp5/maxlifetime))
]
Summary : Success(107), Failure(0) <== It shows that all of them were successfully completed.

In the example above, machregex parses the syslog text file with the given regular expression and splits each record into six tokens, numbered 0 to 5. To use tokens 0, 4, and 5 as database input, associate each of them with a database column through the COL_LIST variable in the rgx file.
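
The token numbering can be reproduced with a short Python sketch. It assumes Python's re module behaves like PCRE for this pattern, and the sample line is shortened from the output above; machregex counts tokens from 0, while Python counts capturing groups from 1.

```python
import re

# Syslog pattern from the machregex example, written as a Python raw string.
SYSLOG = re.compile(
    r"^(([a-zA-Z]+)\s+([0-9]+)\s+([0-9:]*))\s(\S*)\s+((?:[^\0])*)$"
)

line = "Aug 19 18:17:01 localhost CRON[6553]: (root) CMD (cd /)\n"
match = SYSLOG.match(line)

# groups() returns a tuple indexed from 0, matching the machregex numbering.
tokens = match.groups()
selected = {no: tokens[no] for no in (0, 4, 5)}
for no, text in selected.items():
    print(no, repr(text))
```

Token 5 keeps the trailing newline, just as the last token in the machregex output does.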


Example of Creating Custom Template


In this section, we use a sample text log file to create a collector template that collects data from that file.

test.log

The input sample text file looks like this:

[2014-08-18 13:51:19] spiderman message-1 : This is the best machine data DBMS ever.
[2014-08-18 13:51:19] superman  message-2 : This is the best machine data DBMS ever.
[2014-08-18 13:51:33] spiderman message-3 : This is the best machine data DBMS ever.
[2014-08-18 13:51:33] superman  message-4 : This is the best machine data DBMS ever.
[2014-08-18 13:51:34] batman    message-5 : This is the best machine data DBMS ever.
[2014-08-18 13:52:34] superman  message-6 : This is the best machine data DBMS ever.
[2014-08-18 13:53:34] batman    message-7 : This is the best machine data DBMS ever.
[2014-08-18 13:54:31] superman  message-8 : This is the best machine data DBMS ever.
[2014-08-18 13:55:30] batman    message-9 : This is the best machine data DBMS ever.
[2014-08-18 13:56:44] spiderman message-10 : This is the best machine data DBMS ever.
[2014-08-18 13:57:59] superman  message-11 : This is the best machine data DBMS ever.

The sample file above can be converted into three columns: tm, user, and msg. Their data types can be specified as datetime, varchar(16), and varchar(512), respectively.


Example of Creating Regular Expression



Creating Regular Expression

\[([0-9-: ]+)\]: The first token is the date and time enclosed in square brackets. This expression captures only the digits, hyphens, colons, and spaces inside the brackets, not the brackets themselves.
(\S+): The second token is the user name, a string of non-blank characters.
([^\0]*): The third token is the remaining text up to the end of the record.
\[([0-9-: ]+)\]\s(\S+)\s+([^\0]*): The three tokens are combined, with whitespace patterns between them.
"\\[([0-9-: ]+)\\]\\s(\\S+)\\s+([^\\0]*)": Each backslash is doubled so that the string can be passed on the shell command line.
"^\\[": The new-line regular expression matches the square bracket that opens the timestamp at the start of each record.

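The build-up of the pattern can be checked with a small Python sketch. The names TIME, USER, MSG, and PATTERN are hypothetical, and Python's re module is assumed to treat this pattern the same way PCRE does.

```python
import re

# The three building blocks of the regular expression.
TIME = r"\[([0-9-: ]+)\]"   # date and time inside square brackets
USER = r"(\S+)"             # user name: a run of non-blank characters
MSG = r"([^\0]*)"           # everything up to the end of the record

# Join the parts with the whitespace that separates them in the log.
PATTERN = TIME + r"\s" + USER + r"\s+" + MSG

line = "[2014-08-18 13:51:34] batman    message-5 : This is the best machine data DBMS ever."
groups = re.match(PATTERN, line).groups()
print(groups)

# Doubling each backslash gives the form used inside double quotes
# on the shell command line.
shell_form = PATTERN.replace("\\", "\\\\")
print(shell_form)
```

Keeping the parts separate makes it easier to find which token fails when machregex reports a failure.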

Checking Regular Expression

[mach@localhost ~/mach_collector_home/bin]$ machregex "\\[([0-9-: ]+)\\]\\s(\\S+)\\s+([^\\0]+)" "\\[" <test.log
Pattern => (\[([0-9-: ]+)\]\s(\S+)\s+([^\0]+))
============================================================================
SUCCESS[2] (rc=4)([2014-08-18 13:51:19] spiderman message-1 : This is the best machine data DBMS ever.
)
  ALL (0:85) => [[2014-08-18 13:51:19] spiderman message-1 : This is the best machine data DBMS ever.
]
  0 (1:20) => [2014-08-18 13:51:19]
  1 (22:31) => [spiderman]
  2 (32:85) => [message-1 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[3] (rc=4)([2014-08-18 13:51:19] superman  message-2 : This is the best machine data DBMS ever.
)
  ALL (0:85) => [[2014-08-18 13:51:19] superman  message-2 : This is the best machine data DBMS ever.
]
  0 (1:20) => [2014-08-18 13:51:19]
  1 (22:30) => [superman]
  2 (32:85) => [message-2 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[4] (rc=4)([2014-08-18 13:51:33] spiderman message-3 : This is the best machine data DBMS ever.
)
  ALL (0:85) => [[2014-08-18 13:51:33] spiderman message-3 : This is the best machine data DBMS ever.
]
  0 (1:20) => [2014-08-18 13:51:33]
  1 (22:31) => [spiderman]
  2 (32:85) => [message-3 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[5] (rc=4)([2014-08-18 13:51:33] superman  message-4 : This is the best machine data DBMS ever.
)
  ALL (0:85) => [[2014-08-18 13:51:33] superman  message-4 : This is the best machine data DBMS ever.
]
  0 (1:20) => [2014-08-18 13:51:33]
  1 (22:30) => [superman]
  2 (32:85) => [message-4 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[6] (rc=4)([2014-08-18 13:51:34] batman    message-5 : This is the best machine data DBMS ever.
)
  ALL (0:85) => [[2014-08-18 13:51:34] batman    message-5 : This is the best machine data DBMS ever.
]
  0 (1:20) => [2014-08-18 13:51:34]
  1 (22:28) => [batman]
  2 (32:85) => [message-5 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[7] (rc=4)([2014-08-18 13:52:34] superman  message-6 : This is the best machine data DBMS ever.
)
  ALL (0:85) => [[2014-08-18 13:52:34] superman  message-6 : This is the best machine data DBMS ever.
]
  0 (1:20) => [2014-08-18 13:52:34]
  1 (22:30) => [superman]
  2 (32:85) => [message-6 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[8] (rc=4)([2014-08-18 13:53:34] batman    message-7 : This is the best machine data DBMS ever.
)
  ALL (0:85) => [[2014-08-18 13:53:34] batman    message-7 : This is the best machine data DBMS ever.
]
  0 (1:20) => [2014-08-18 13:53:34]
  1 (22:28) => [batman]
  2 (32:85) => [message-7 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[9] (rc=4)([2014-08-18 13:54:31] superman  message-8 : This is the best machine data DBMS ever.
)
  ALL (0:85) => [[2014-08-18 13:54:31] superman  message-8 : This is the best machine data DBMS ever.
]
  0 (1:20) => [2014-08-18 13:54:31]
  1 (22:30) => [superman]
  2 (32:85) => [message-8 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[10] (rc=4)([2014-08-18 13:55:30] batman    message-9 : This is the best machine data DBMS ever.
)
  ALL (0:85) => [[2014-08-18 13:55:30] batman    message-9 : This is the best machine data DBMS ever.
]
  0 (1:20) => [2014-08-18 13:55:30]
  1 (22:28) => [batman]
  2 (32:85) => [message-9 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[11] (rc=4)([2014-08-18 13:56:44] spiderman message-10 : This is the best machine data DBMS ever.
)
  ALL (0:86) => [[2014-08-18 13:56:44] spiderman message-10 : This is the best machine data DBMS ever.
]
  0 (1:20) => [2014-08-18 13:56:44]
  1 (22:31) => [spiderman]
  2 (32:86) => [message-10 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[11] (rc=4)([2014-08-18 13:57:59] superman  message-11 : This is the best machine data DBMS ever.)
  ALL (0:85) => [[2014-08-18 13:57:59] superman  message-11 : This is the best machine data DBMS ever.]
  0 (1:20) => [2014-08-18 13:57:59]
  1 (22:30) => [superman]
  2 (32:85) => [message-11 : This is the best machine data DBMS ever.]
Summary : Success(11), Failure(0)
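
The success and failure counts can be approximated in Python. This sketch assumes that the new-line pattern marks the first line of each record and that any other line continues the previous record; machregex itself may split records differently. The split_records and summarize helpers and the sample text are hypothetical.

```python
import re

PATTERN = re.compile(r"\[([0-9-: ]+)\]\s(\S+)\s+([^\0]+)")
NEWLINE = re.compile(r"^\[")  # a record starts at a line beginning with "["


def split_records(text):
    """Group physical lines into records using the new-line pattern."""
    records = []
    for line in text.splitlines(keepends=True):
        if NEWLINE.search(line) or not records:
            records.append(line)      # start a new record
        else:
            records[-1] += line       # continuation of the previous record
    return records


def summarize(text):
    """Count records that match PATTERN and records that do not."""
    success = failure = 0
    for record in split_records(text):
        if PATTERN.search(record):
            success += 1
        else:
            failure += 1
    return success, failure


sample = (
    "[2014-08-18 13:51:19] spiderman message-1 : first line\n"
    "  continuation of message-1\n"
    "[2014-08-18 13:51:19] superman  message-2 : second record\n"
    "[this record has no timestamp\n"
)
print("Summary : Success(%d), Failure(%d)" % summarize(sample))
```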


Creating test.rgx

Once machregex confirms that the regular expression parses the data correctly, write an rgx file that holds the regular expression and the column bindings, as follows. Save the file as $MACHBASE_HOME/collector/samples/test.rgx.

###############################################################################
# Copyright of this product 2013-2023,
# Machbase Corporation (Incorporation) or its subsidiaries.
# All Rights reserved
###############################################################################

#
# This file is for Machbase trace collector regex file.
#

LOG_TYPE=custom

COL_LIST= (
     (
        REGEX_NO = 0
        NAME = tm
        TYPE = datetime
        SIZE = 8
        DATE_FORMAT="%Y-%m-%d %H:%M:%S"
         ),
     (
        REGEX_NO = 1
        NAME = user
        TYPE = varchar
        SIZE = 16
        USE_INDEX = 1
         ),
     (
        REGEX_NO = 2
        NAME = msg
        TYPE = varchar
        SIZE = 512
        USE_INDEX = 1
         )
)

REGEX="\[([0-9-: ]+)\]\s(\S+)\s+([^\0]+)"

END_REGEX="\["
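
The column settings above can be tried out on one parsed record in Python. This sketch assumes that the DATE_FORMAT directives mean the same as in Python's strptime and that SIZE can be compared with the character count; both are assumptions, not statements about how the collector converts data.

```python
from datetime import datetime

# Values taken from the COL_LIST above.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"
USER_SIZE = 16
MSG_SIZE = 512

# Tokens 0, 1, and 2 of the first record in test.log.
tokens = (
    "2014-08-18 13:51:19",
    "spiderman",
    "message-1 : This is the best machine data DBMS ever.\n",
)

tm = datetime.strptime(tokens[0], DATE_FORMAT)   # token 0 -> datetime
user_fits = len(tokens[1]) <= USER_SIZE          # token 1 -> varchar(16)
msg_fits = len(tokens[2]) <= MSG_SIZE            # token 2 -> varchar(512)

print(tm.isoformat(), user_fits, msg_fits)
```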


Creating test.tpl

Copy $MACHBASE_HOME/collector/custom.tpl to $MACHBASE_HOME/collector/test.tpl and modify the copy as follows:

###############################################################################
# Copyright of this product 2013-2023,
# Machbase Corporation(Incorporation) or its subsidiaries.
# All Rights reserved
###############################################################################

#
# This file is for Machbase collector template file.
#

###################################################################
# Collect setting
###################################################################

COLLECT_TYPE=FILE
LOG_SOURCE=/home/mach/machbase_home/collector/samples/test.log

###################################################################
# Process setting
###################################################################

REGEX_PATH=/home/mach/machbase_home/collector/samples/test.rgx

###################################################################
# Output setting
###################################################################

DB_TABLE_NAME = "custom_table"
DB_ADDR       = "127.0.0.1"
DB_PORT       = 5656
DB_USER       = "SYS"
DB_PASS       = "MANAGER"

# 0: Direct insert
# 1: Prepared insert
# 2: Append
APPEND_MODE=2

# 0: None, just append.
# 1: Truncate.
# 2: Try to create table. If table already exists, warn it and proceed.
# 3: Drop and create.
CREATE_TABLE_MODE=2
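
Before creating the collector, the template settings can be read back and checked with a few lines of Python. The parse_template helper is hypothetical and only handles the simple KEY=VALUE lines and # comments seen above; it is not the parser the collector uses, and the template text is an abbreviated copy.

```python
def parse_template(text):
    """Read KEY=VALUE settings, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


template = '''
COLLECT_TYPE=FILE
LOG_SOURCE=/home/mach/machbase_home/collector/samples/test.log
REGEX_PATH=/home/mach/machbase_home/collector/samples/test.rgx
DB_TABLE_NAME = "custom_table"
DB_PORT       = 5656
# 2: Append
APPEND_MODE=2
'''

settings = parse_template(template)
# Sanity check: REGEX_PATH should name the rgx file, not the template itself.
print(settings["REGEX_PATH"].endswith(".rgx"), settings["DB_TABLE_NAME"])
```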

Create and Execute a Collector

Create/Run Collector

Create a "myclt" collector and run it.

Mach> create collector localhost.myclt from "/home/mach/mach_collector_home/collector/samples/test.tpl";
Created successfully.
Elapsed Time : 0.106
Mach>
Mach> alter collector localhost.myclt start;
Altered successfully.

Debugging Collector

The custom_table table, which should hold the input data, was not created.

Mach> select * from custom_table;
[ERR-02025 : Table CUSTOM_TABLE does not exist.]


When the collector is started in trace mode, it writes its errors to a trace file, which helps to find the cause of the problem. Execute the following commands to restart the collector in trace mode.

Mach> alter collector localhost.myclt stop;
Altered successfully.
Mach> alter collector localhost.myclt start trace;
Altered successfully.


Problem Detection/Resolution Through Trace Log

If an error occurs while the Collector is running, check the $MACHBASE_HOME/trc/machbase.trc file for database execution errors. To capture errors raised by the collector itself, run the collector in TRACE mode.

[2016-03-13 23:44:35 P-29741 T-139982693979904][INFO] PREPARE Error [create table custom_table ( collector_type varchar(32), collector_addr ipv4, collector_origin varchar(512), 
collector_offset long, tm datetime, user varchar(16), msg varchar(512))] (100007DA:Error in parse (syntax): near token (user varchar(16), msg varchar(512))).)

The message above shows that the table creation query failed because user, which was given as a column name, is a reserved keyword and cannot be used as a column name. Therefore, in the COL_LIST section of the rgx file, change the user column to myuser and run the collector again.
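
Error lines like the one above can be picked out of a trace file with a short Python sketch. The header layout is inferred from the single trace line shown here, the find_errors helper is hypothetical, and the sample line is abbreviated.

```python
import re

# Leading "[YYYY-MM-DD HH:MM:SS P-<pid> T-<tid>]" header of a trace line.
HEADER = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) P-(\d+) T-(\d+)\]"
)


def find_errors(lines):
    """Return (timestamp, text) for every trace line that mentions an error."""
    found = []
    for line in lines:
        header = HEADER.match(line)
        if header and "Error" in line:
            found.append((header.group(1), line[header.end():].strip()))
    return found


# Abbreviated copy of the trace line above.
trace = [
    "[2016-03-13 23:44:35 P-29741 T-139982693979904][INFO] PREPARE Error "
    "[create table custom_table ( ... )] (100007DA:Error in parse (syntax): ...)",
]
errors = find_errors(trace)
for when, text in errors:
    print(when, text)
```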

Partial contents of "test.rgx"
...........

COL_LIST= (
     (
        REGEX_NO = 0
        NAME = tm
        TYPE = datetime
        SIZE = 8
        DATE_FORMAT="%Y-%m-%d %H:%M:%S"
         ),
     (
        REGEX_NO = 1
        NAME = myuser   <== Modified part
        TYPE = varchar
        SIZE = 16
        USE_INDEX = 1
         ),
     (
        REGEX_NO = 2
        NAME = msg
        TYPE = varchar
        SIZE = 512
        USE_INDEX = 1
         )
)
..................

Check Run/Results

Rerun it with the modified rgx file.

Mach> alter collector localhost.myclt stop; <== Stop the TRACE mode.
Altered successfully.
Mach> alter collector localhost.myclt start; <== Execute it again in a normal mode after the modification 
Altered successfully.


If the collector runs normally, you can query the table in which the data is stored.

Mach> select tm, myuser, msg from custom_table;
tm                              myuser            
-----------------------------------------------------
msg                                                                               
------------------------------------------------------------------------------------
2014-08-18 13:57:59 000:000:000 superman          
message-11 : This is the best machine data DBMS ever.

2014-08-18 13:56:44 000:000:000 spiderman         
message-10 : This is the best machine data DBMS ever.

2014-08-18 13:55:30 000:000:000 batman            
message-9 : This is the best machine data DBMS ever.

2014-08-18 13:54:31 000:000:000 superman          
message-8 : This is the best machine data DBMS ever.

2014-08-18 13:53:34 000:000:000 batman            
message-7 : This is the best machine data DBMS ever.

2014-08-18 13:52:34 000:000:000 superman          
message-6 : This is the best machine data DBMS ever.

2014-08-18 13:51:34 000:000:000 batman            
message-5 : This is the best machine data DBMS ever.

2014-08-18 13:51:33 000:000:000 superman          
message-4 : This is the best machine data DBMS ever.

2014-08-18 13:51:33 000:000:000 spiderman         
message-3 : This is the best machine data DBMS ever.

2014-08-18 13:51:19 000:000:000 superman          
message-2 : This is the best machine data DBMS ever.

2014-08-18 13:51:19 000:000:000 spiderman         
message-1 : This is the best machine data DBMS ever.

[11] row(s) selected.