cbrake
January 18, 2024, 9:53pm
1
We run several WordPress sites on a Linode server. Lately MariaDB has been getting killed by the OOM (out-of-memory) killer:
[1308796.534092] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/mariadb.service,task=mariadbd,pid=162181,uid=973
[1308796.534492] Out of memory: Killed process 162181 (mariadbd) total-vm:667280kB, anon-rss:27808kB, file-rss:128kB, shmem-rss:0kB, UID:973 pgtables:692kB oom_score_adj:0
[1308797.835946] systemd-journald[187]: Under memory pressure, flushing caches.
The main software components involved are MariaDB, Caddy2, and php-fpm.
The Linode metrics are interesting, but not very granular.
To get more insight into what is happening, I installed SIOT and pointed it to our portal instance so I can view the data with Grafana.
No smoking guns yet, but the php-fpm CPU usage looks interesting.
cbrake
January 19, 2024, 8:24pm
2
Got a few more events in the last 12h:
[root@web3 cbrake]# journalctl --since yesterday -u mariadb | grep "OOM killer"
Jan 19 14:15:54 web3 systemd[1]: mariadb.service: A process of this unit has been killed by the OOM killer.
Jan 19 14:17:57 web3 systemd[1]: mariadb.service: A process of this unit has been killed by the OOM killer.
Jan 19 18:34:54 web3 systemd[1]: mariadb.service: A process of this unit has been killed by the OOM killer.
Interestingly, php-fpm is consuming a lot of memory but mariadb is getting killed, hmm …
Looking at dmesg, it appears there are many php-fpm tasks. SIOT sums the memory of all those tasks, whereas the OOM killer probably scores each task individually.
[72420.250119] Tasks state (memory values in pages):
[72420.250120] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[72420.250126] [ 186] 0 186 20808 56 176128 784 -250 systemd-journal
[72420.250132] [ 188] 0 188 8683 24 81920 512 -1000 systemd-udevd
[72420.250138] [ 251] 977 251 5534 64 86016 576 0 systemd-resolve
[72420.250141] [ 252] 976 252 22950 69 90112 736 0 systemd-timesyn
[72420.250143] [ 260] 81 260 2298 60 61440 160 -900 dbus-broker-lau
[72420.250146] [ 261] 81 261 1094 53 49152 128 -900 dbus-broker
[72420.250151] [ 262] 0 262 32905 49 167936 6656 0 firewalld
[72420.250153] [ 263] 0 263 4493 64 73728 222 0 systemd-logind
[72420.250158] [ 265] 980 265 4759 64 77824 256 0 systemd-network
[72420.250161] [ 280] 33 280 317218 515 184320 2920 0 caddy2
[72420.250165] [ 282] 0 282 66515 197 208896 1760 0 php-fpm
[72420.250168] [ 283] 0 283 2775 64 65536 256 -1000 sshd
[72420.250173] [ 287] 0 287 1495 32 49152 64 0 agetty
[72420.250175] [ 288] 0 288 1374 32 53248 32 0 agetty
[72420.250178] [ 337] 1003 337 316761 1964 221184 4214 0 siot
[72420.250180] [ 9464] 973 9464 168356 5425 729088 32085 0 mariadbd
[72420.250183] [ 9897] 33 9897 103392 13502 376832 6825 0 php-fpm
[72420.250185] [ 9900] 33 9900 103241 13051 376832 6634 0 php-fpm
[72420.250188] [ 9902] 33 9902 103382 12550 376832 7210 0 php-fpm
[72420.250191] [ 11104] 0 11104 3630 32 73728 352 0 sshd
[72420.250196] [ 11121] 1000 11121 3702 57 73728 416 0 sshd
[72420.250199] [ 11122] 1000 11122 1944 32 53248 192 0 bash
[72420.250204] [ 11128] 1000 11128 4516 32 73728 288 0 sudo
[72420.250206] [ 11130] 1000 11130 4516 18 61440 256 0 sudo
[72420.250211] [ 11131] 0 11131 2464 32 61440 96 0 su
[72420.250214] [ 11133] 0 11133 1969 32 57344 192 0 bash
[72420.250216] [ 11134] 0 11134 5109 28 81920 224 0 journalctl
[72420.250219] [ 11135] 0 11135 1692 32 49152 64 0 less
[72420.250221] [ 11137] 33 11137 83562 9405 303104 9033 0 php-fpm
[72420.250224] [ 11139] 33 11139 78314 7718 266240 6014 0 php-fpm
[72420.250226] [ 11141] 33 11141 79850 10173 278528 4691 0 php-fpm
[72420.250229] [ 11143] 33 11143 84392 7472 311296 11491 0 php-fpm
[72420.250231] [ 11145] 33 11145 83368 10080 303104 8195 0 php-fpm
[72420.250234] [ 11147] 33 11147 83368 11570 303104 6691 0 php-fpm
[72420.250236] [ 11149] 33 11149 79607 8385 278528 6563 0 php-fpm
[72420.250241] [ 11151] 33 11151 79607 8382 278528 6211 0 php-fpm
[72420.250244] [ 11152] 33 11152 79095 10519 274432 3907 0 php-fpm
[72420.250246] [ 11155] 33 11155 77046 9562 253952 2759 0 php-fpm
[72420.250248] [ 11156] 33 11156 72806 4711 221184 3079 0 php-fpm
[72420.250251] [ 11159] 33 11159 72294 5933 217088 1479 0 php-fpm
[72420.250253] [ 11160] 33 11160 72294 5741 217088 1575 0 php-fpm
[72420.250255] [ 11163] 33 11163 72294 5989 217088 1380 0 php-fpm
[72420.250258] [ 11164] 0 11164 2825 192 61440 128 0 sshd
[72420.250260] [ 11166] 33 11166 72219 5967 217088 1188 0 php-fpm
[72420.250265] [ 11168] 33 11168 70683 4875 204800 1124 0 php-fpm
[72420.250268] [ 11170] 33 11170 70683 4927 204800 1092 0 php-fpm
[72420.250270] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/mariadb.service,task=mariadbd,pid=9464,uid=973
[72420.250356] Out of memory: Killed process 9464 (mariadbd) total-vm:673424kB, anon-rss:21572kB, file-rss:128kB, shmem-rss:0kB, UID:973 pgtables:712kB oom_score_adj:0
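A rough sketch of why mariadbd is picked, assuming a recent kernel where the per-task OOM score is roughly rss + swapents + page-table pages (all in pages), shifted by oom_score_adj. The function name `rank_tasks`, the `total_pages` parameter, and the 4 KiB page size are assumptions made for illustration, not anything taken from the kernel log. It parses the task dump lines and ranks the tasks by that score:

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages (typical on x86-64)

# Matches one task line of the dump:
# [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
TASK_RE = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+\d+\s+\d+\s+\d+\s+"
    r"(?P<rss>\d+)\s+(?P<pgtables_bytes>\d+)\s+(?P<swapents>\d+)\s+"
    r"(?P<adj>-?\d+)\s+(?P<name>\S+)\s*$"
)


def rank_tasks(dump_text, total_pages=0):
    """Return (score, pid, name) tuples, highest approximate OOM score first.

    total_pages is RAM + swap in pages (hypothetical parameter); with the
    default of 0 the oom_score_adj shift is ignored, except that tasks with
    oom_score_adj == -1000 are always skipped because they are exempt.
    """
    ranked = []
    for line in dump_text.splitlines():
        m = TASK_RE.search(line)
        if not m:
            continue  # header, oom-kill summary, or unrelated line
        adj = int(m["adj"])
        if adj == -1000:
            continue
        score = (
            int(m["rss"])
            + int(m["swapents"])
            + int(m["pgtables_bytes"]) // PAGE_SIZE
        )
        score += adj * total_pages // 1000
        ranked.append((score, int(m["pid"]), m["name"]))
    return sorted(ranked, reverse=True)
```

Under those assumptions, the dump above gives mariadbd (pid 9464) a score of 5425 + 32085 + 178 = 37688 pages, while the largest php-fpm worker (pid 9897) gets 13502 + 6825 + 92 = 20419. Most of mariadbd's footprint is in swap, which still counts toward the score, so it ranks highest even though its resident memory is small.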
cbrake
February 7, 2024, 1:44pm
3
Recent activity:
[root@web3 cbrake]# journalctl --since -7d | grep "invoked oom-killer"
Feb 01 17:26:25 web3 kernel: systemd-timesyn invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 02 16:59:32 web3 kernel: sshd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 02 17:00:05 web3 kernel: siot invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 03 14:01:19 web3 kernel: systemd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 04 06:27:45 web3 kernel: siot invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 06 01:08:39 web3 kernel: php-fpm invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 06 01:09:01 web3 kernel: sshd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=-1000
Feb 06 02:49:31 web3 kernel: mariadbd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 06 02:50:02 web3 kernel: systemd-journal invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=-250
Feb 06 10:13:19 web3 kernel: siot invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 06 10:14:22 web3 kernel: mariadbd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 07 06:46:47 web3 kernel: sshd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 07 06:47:38 web3 kernel: systemd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 07 06:48:39 web3 kernel: siot invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 07 06:48:47 web3 kernel: mariadbd invoked oom-killer: gfp_mask=0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO), order=0, oom_score_adj=0
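Note that the process named in "invoked oom-killer" is only the one whose allocation happened to fail, not necessarily the one using the most memory. A small sketch for tallying those lines, assuming the journalctl output has been saved as text (the helper name `count_invokers` is made up for illustration):

```python
import re
from collections import Counter

# Captures the process name in lines like:
# "... web3 kernel: siot invoked oom-killer: gfp_mask=..."
INVOKED_RE = re.compile(r"kernel: (?P<name>\S+) invoked oom-killer")


def count_invokers(journal_text):
    """Count how often each process name triggered the OOM killer."""
    return Counter(m.group("name") for m in INVOKED_RE.finditer(journal_text))
```

For the 15 events listed above this gives siot 4, sshd 3, mariadbd 3, systemd 2, and one each for systemd-timesyn, php-fpm, and systemd-journal. The triggers are spread across unrelated processes, which is consistent with a system-wide memory shortage.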
php-fpm CPU spikes, but everything else looks pretty flat.
This problem could likely be solved very easily by adjusting the php-fpm setup, but I'm using this as an exercise to learn how to instrument things better. The next task is to collect metrics from Caddy:
Should I spin up a Prometheus instance, or add metrics scraping to SIOT? It would be nice to have everything in one system (SIOT), but Prometheus also has some nice advantages.
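Either way, the scraping side is small. A minimal sketch, assuming Caddy exposes Prometheus text-format metrics on its admin endpoint at http://localhost:2019/metrics (the URL is an assumption about this setup, and `fetch_metrics` and `sum_by_metric` are hypothetical helper names). The parser is deliberately naive: it sums samples per metric name and ignores labels and timestamps.

```python
from collections import defaultdict
from urllib.request import urlopen


def fetch_metrics(url="http://localhost:2019/metrics", timeout=5):
    """Fetch the raw metrics text (URL is an assumption, adjust as needed)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def sum_by_metric(text):
    """Sum sample values per metric name from Prometheus text format."""
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # name{label="..."} value [timestamp]
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1 :].split()
        else:
            # name value [timestamp]
            name, *rest = line.split()
        if rest:
            totals[name] += float(rest[0])
    return dict(totals)
```

Calling `sum_by_metric(fetch_metrics())` on a timer and forwarding the totals would be the do-it-yourself route; a Prometheus instance does the same scrape but also keeps labels, histograms, and retention.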