Impact on binary log of bulk row update query

Overview

I suddenly wondered how it would affect the binary log of bulk row update queries.

For example, if a table of several tens of giga-bytes is updated at once, and single update row event is sent, OOM will occur in the target database. Extremely dangerous.

How to handle this in MySQL, let’s talk lightly.

Bulk row update

In MySQL, the data actually written to the DB for the Insert query is written to the binlog as it is.

If you use a syntax like insert into .. select .. All of these data are recorded in the binlog in the form of an Array. This means that multiple rows are written to one write row event.

So, using mysql-binlog-connector-java , I simply printed out the row count applied to WriteRowsEvent and UpdateRowsEvent. Each event is passed in the form of or in the data area.

Each event is passed in the form of <Serializable[]> or <Serializable[],Serializable[]> in the data area.

// Write row event processing
WriteRowsEventData eventData = (WriteRowsEventData) event.getData();
System.out.println("WriteRowsEventData>>"+eventData.getRows().size());

// Update row event processing
UpdateRowsEventData eventData = (UpdateRowsEventData) event.getData();
System.out.println("UpdateRowsEventData>>"+eventData.getRows().size());

I simply checked the rows count.

mysql> CREATE TABLE `binlog_test` (
    ->   `i` int NOT NULL AUTO_INCREMENT,
    ->   `v` text,
    ->   PRIMARY KEY (`i`)
    -> );

mysql> INSERT INTO `binlog_test` (v) VALUES (repeat(md5(rand()), 10));
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO `binlog_test` (v) SELECT repeat(md5(rand()), 10) FROM binlog_test;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> INSERT INTO `binlog_test` (v) SELECT repeat(md5(rand()), 10) FROM binlog_test;
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> INSERT INTO `binlog_test` (v) SELECT repeat(md5(rand()), 10) FROM binlog_test;
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> INSERT INTO `binlog_test` (v) SELECT repeat(md5(rand()), 10) FROM binlog_test;
Query OK, 8 rows affected (0.00 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql> INSERT INTO `binlog_test` (v) SELECT repeat(md5(rand()), 10) FROM binlog_test;
Query OK, 16 rows affected (0.00 sec)
Records: 16  Duplicates: 0  Warnings: 0

mysql> INSERT INTO `binlog_test` (v) SELECT repeat(md5(rand()), 10) FROM binlog_test;
Query OK, 32 rows affected (0.01 sec)
Records: 32  Duplicates: 0  Warnings: 0

mysql> INSERT INTO `binlog_test` (v) SELECT repeat(md5(rand()), 10) FROM binlog_test;
Query OK, 64 rows affected (0.00 sec)
Records: 64  Duplicates: 0  Warnings: 0

This is a query that repeats INSERT as many as the table count each time based on the first one. (1 -> 1-> 2 -> .. -> 64)

WriteRowsEventData>>1
======================
WriteRowsEventData>>1
======================
WriteRowsEventData>>2
======================
WriteRowsEventData>>4
======================
WriteRowsEventData>>8
======================
WriteRowsEventData>>16

## devided ##
======================
WriteRowsEventData>>25
WriteRowsEventData>>7
======================
WriteRowsEventData>>25
WriteRowsEventData>>25
WriteRowsEventData>>14

Interestingly, up to 16 rows cases are transmitted as write rows events as intended. From 32 rows case, two write rows events are transmitted. As a result, it is divided into approximately 25 pieces.

mysql> update binlog_test set v = repeat(md5(rand()), 10);

This time, approximately 12 rows are included in single update rows event. This is because the update rows event includes additional before content (this format <Serializable[],Serializable[]>).

UpdateRowsEventData>>12
UpdateRowsEventData>>12
UpdateRowsEventData>>12
UpdateRowsEventData>>12
UpdateRowsEventData>>12
UpdateRowsEventData>>12
UpdateRowsEventData>>12
UpdateRowsEventData>>12
UpdateRowsEventData>>12
UpdateRowsEventData>>12
UpdateRowsEventData>>8
======================

If you roughly calculate this size, it is calculated to be about 8K. This is calculated by --binlog-row-event-max-size. And Variable 'binlog_row_event_max_size' is a read only variable, so you need to add in my.cnf before startup mysql server.

Specify the maximum size of a row-based binary log event, in bytes. Rows are grouped into events smaller than this size if possible. The value should be a multiple of 256. The default is 8192.

binlog_row_event_max_size

You might ask if one operation is unstable if data is split and passed in this way. However, in binlog, it is recorded in the binlog in the form below and data is processed.

In other words, there is no problem because data is processed in a single transaction.

query event(begin)
table map event
> update row event (12)
> update row event (12)
> update row event (12)
> update row event (12)
> update row event (12)
> update row event (8)
xid event (commit)

Conclusion

This is something I checked out of personal curiosity. Since the case of using binlog is frequent, this content is meaningful for me.

In a situation where a single row exceeds 8k, if 5-6 rows are frequently updated at a time, binlog data can be received efficiently by adjusting binlog_row_event_max_size upward.

Each time only a single update rows event is delivered on default 8k binlog_row_event_max_size, one update rows event will be delivered as an array. (Of course, as long as memory allows.)

Hope this helps. 🙂

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s