30
loading...
This website collects cookies to deliver better user experience
mzcli
, a CLI tool similar to psql
used to connect to Materialize and execute SQL on it.materialized
(d for daemon, following Unix conventions). Since we're running on Linux, we'll just install Materialize directly. To install it, run the following command:sudo apt install materialized
sudo materialized
materialized
running, we need to open a new terminal to install and run a CLI tool that we use to interact with our Materialize instance! mzcli
tool lets us connect to Materialize similar to how we would use a SQL client to connect to any other database.psql
already installed you could use it instead of mzcli
, but with mzcli
you get nice syntax highlighting and autocomplete when writing your queries.mzcli
is via pipx
, so first run:apt install pipx
pipx
is installed, install mzcli
with:pipx install mzcli
mzcli
we can connect to materialized
with:mzcli -U materialize -h localhost -p 6875 materialize
sudo apt install nginx
for i in {1..200} ; do curl -s 'localhost/materialize' > /dev/null ; echo $i ; done
If you have an actual nginx access.log
, you can skip the step above.
/var/log/nginx/access.log
access log file that we would be able to able to feed into Materialize.mzcli
command:mzcli -U materialize -h localhost -p 6875 materialize
CREATE SOURCE nginx_log
FROM FILE '/var/log/nginx/access.log'
WITH (tail = true)
FORMAT REGEX '(?P<ipaddress>[^ ]+) - - \[(?P<time>[^\]]+)\] "(?P<request>[^ ]+) (?P<url>[^ ]+)[^"]+" (?P<statuscode>\d{3})';
CREATE SOURCE
: First we specify that we want to create a sourceFROM FILE
: Then we specify that this source will read from a local file, and we provide the path to that fileWITH (tail = true)
: Continually check the file for new contentFORMAT REGEX
: as this is an unstructured file we need to specify regex as the format so that we could get only the specific parts of the log that we need. The Materialize-specific behavior to note here is the ?P<NAME_HERE>
pattern extracts the matched text into a column named NAME_HERE
.
123.123.123.119 - - [13/Oct/2021:10:54:22 +0000] "GET / HTTP/1.1" 200 396 "-" "Mozilla/5.0 zgrab/0.x"
(?P<ipaddress>[^ ]+)
: With this pattern we match the IP address for each line of the nginx log, e.g. 123.123.123.119
.\[(?P<time>[^\]]+)\]
: the timestamp string from inside square brackets, e.g. [13/Oct/2021:10:54:22 +0000]
"(?P<request>[^ ]+)
: the type of request like GET
, POST
etc.(?P<url>[^ ]+)
: the relative URL, eg. /favicon.ico
(?P<statuscode>\d{3})
: the three digit HTTP status code.mz> SHOW SOURCES;
// Output
+-----------+
| name |
|-----------|
| nginx_log |
+-----------+
SELECT 1
Time: 0.021s
CREATE MATERIALIZED VIEW aggregated_logs AS
SELECT
ipaddress,
request,
url,
statuscode::int,
COUNT(*) as count
FROM nginx_log GROUP BY 1,2,3,4;
CREATE MATERIALIZED VIEW aggregated_logs
which identifies that we want to create a new Materialized view named aggregated_logs
.SELECT
statement that we are interested in keeping track of over time. In this case we are aggregating the data in our log file by ipaddress
, request
, url
and statuscode
, and we are counting the total instances of each combo with a COUNT(*)
We specified a simple SELECT
that we want the view to be based on but this could include complex operations like JOIN
s, however for the sake of this tutorial we are keeping things simple.
SELECT
on this Materialized view, we get a nice aggregated summary of stats:SELECT * FROM aggregated_logs;
ipaddress | request | url | statuscode | count
----------------+---------+--------------------------+------------+-------
127.0.0.1 | GET | /materialize | 404 | 200
SELECT url, SUM(count) as total FROM aggregated_logs GROUP BY 1 ORDER BY 2 DESC;
+--------------------------+-------+
| url | total |
|--------------------------+-------|
| /materialize/demo-page-2 | 1255 |
| /materialize/demo-page | 1957 |
| /materialize | 400 |
+--------------------------+-------+
psql
together with the watch
command to see this in action. psql
already isntalled you can install it with the following command:sudo apt install postgresql-client
SELECT * FROM aggregated_logs
statement every second using the watch
command:watch -n1 "psql -c 'select * from aggregated_logs' -U materialize -h localhost -p 6875 materialize"
for
loop to generate some new nginx logs and see how the results change:for i in {1..2000} ; do curl -s 'localhost/materialize/demo-page-2' > /dev/null ; echo $i ; done
watch
command would look like this: