Debugging Out of Memory issues on a WordPress Server

We run several WordPress sites on a Linode server. Lately MariaDB has been getting killed by the OOM (out-of-memory) killer:

[1308796.534092] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/mariadb.service,task=mariadbd,pid=162181,uid=973
[1308796.534492] Out of memory: Killed process 162181 (mariadbd) total-vm:667280kB, anon-rss:27808kB, file-rss:128kB, shmem-rss:0kB, UID:973 pgtables:692kB oom_score_adj:0
[1308797.835946] systemd-journald[187]: Under memory pressure, flushing caches.

The main software components involved are MariaDB, Caddy2, and php-fpm.

The Linode metrics are interesting, but they are not very granular.

To get more insight into what is happening, I installed SIOT and pointed it to our portal instance so I can view the data with Grafana.

No smoking guns yet, but the php-fpm CPU usage looks interesting.

Got a few more events in the last 12h:

[root@web3 cbrake]# journalctl --since yesterday -u mariadb | grep "OOM killer"
Jan 19 14:15:54 web3 systemd[1]: mariadb.service: A process of this unit has been killed by the OOM killer.
Jan 19 14:17:57 web3 systemd[1]: mariadb.service: A process of this unit has been killed by the OOM killer.
Jan 19 18:34:54 web3 systemd[1]: mariadb.service: A process of this unit has been killed by the OOM killer.

Interestingly, php-fpm is consuming a lot of memory but mariadb is getting killed – hmm …

Looking at dmesg, it appears there are many php-fpm tasks. SIOT is adding the total from all those tasks and the OOM killer probably just looks at single tasks.

[72420.250119] Tasks state (memory values in pages):
[72420.250120] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[72420.250126] [    186]     0   186    20808       56   176128      784          -250 systemd-journal
[72420.250132] [    188]     0   188     8683       24    81920      512         -1000 systemd-udevd
[72420.250138] [    251]   977   251     5534       64    86016      576             0 systemd-resolve
[72420.250141] [    252]   976   252    22950       69    90112      736             0 systemd-timesyn
[72420.250143] [    260]    81   260     2298       60    61440      160          -900 dbus-broker-lau
[72420.250146] [    261]    81   261     1094       53    49152      128          -900 dbus-broker
[72420.250151] [    262]     0   262    32905       49   167936     6656             0 firewalld
[72420.250153] [    263]     0   263     4493       64    73728      222             0 systemd-logind
[72420.250158] [    265]   980   265     4759       64    77824      256             0 systemd-network
[72420.250161] [    280]    33   280   317218      515   184320     2920             0 caddy2
[72420.250165] [    282]     0   282    66515      197   208896     1760             0 php-fpm
[72420.250168] [    283]     0   283     2775       64    65536      256         -1000 sshd
[72420.250173] [    287]     0   287     1495       32    49152       64             0 agetty
[72420.250175] [    288]     0   288     1374       32    53248       32             0 agetty
[72420.250178] [    337]  1003   337   316761     1964   221184     4214             0 siot
[72420.250180] [   9464]   973  9464   168356     5425   729088    32085             0 mariadbd
[72420.250183] [   9897]    33  9897   103392    13502   376832     6825             0 php-fpm
[72420.250185] [   9900]    33  9900   103241    13051   376832     6634             0 php-fpm
[72420.250188] [   9902]    33  9902   103382    12550   376832     7210             0 php-fpm
[72420.250191] [  11104]     0 11104     3630       32    73728      352             0 sshd
[72420.250196] [  11121]  1000 11121     3702       57    73728      416             0 sshd
[72420.250199] [  11122]  1000 11122     1944       32    53248      192             0 bash
[72420.250204] [  11128]  1000 11128     4516       32    73728      288             0 sudo
[72420.250206] [  11130]  1000 11130     4516       18    61440      256             0 sudo
[72420.250211] [  11131]     0 11131     2464       32    61440       96             0 su
[72420.250214] [  11133]     0 11133     1969       32    57344      192             0 bash
[72420.250216] [  11134]     0 11134     5109       28    81920      224             0 journalctl
[72420.250219] [  11135]     0 11135     1692       32    49152       64             0 less
[72420.250221] [  11137]    33 11137    83562     9405   303104     9033             0 php-fpm
[72420.250224] [  11139]    33 11139    78314     7718   266240     6014             0 php-fpm
[72420.250226] [  11141]    33 11141    79850    10173   278528     4691             0 php-fpm
[72420.250229] [  11143]    33 11143    84392     7472   311296    11491             0 php-fpm
[72420.250231] [  11145]    33 11145    83368    10080   303104     8195             0 php-fpm
[72420.250234] [  11147]    33 11147    83368    11570   303104     6691             0 php-fpm
[72420.250236] [  11149]    33 11149    79607     8385   278528     6563             0 php-fpm
[72420.250241] [  11151]    33 11151    79607     8382   278528     6211             0 php-fpm
[72420.250244] [  11152]    33 11152    79095    10519   274432     3907             0 php-fpm
[72420.250246] [  11155]    33 11155    77046     9562   253952     2759             0 php-fpm
[72420.250248] [  11156]    33 11156    72806     4711   221184     3079             0 php-fpm
[72420.250251] [  11159]    33 11159    72294     5933   217088     1479             0 php-fpm
[72420.250253] [  11160]    33 11160    72294     5741   217088     1575             0 php-fpm
[72420.250255] [  11163]    33 11163    72294     5989   217088     1380             0 php-fpm
[72420.250258] [  11164]     0 11164     2825      192    61440      128             0 sshd
[72420.250260] [  11166]    33 11166    72219     5967   217088     1188             0 php-fpm
[72420.250265] [  11168]    33 11168    70683     4875   204800     1124             0 php-fpm
[72420.250268] [  11170]    33 11170    70683     4927   204800     1092             0 php-fpm
[72420.250270] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/mariadb.service,task=mariadbd,pid=9464,uid=973
[72420.250356] Out of memory: Killed process 9464 (mariadbd) total-vm:673424kB, anon-rss:21572kB, file-rss:128kB, shmem-rss:0kB, UID:973 pgtables:712kB oom_score_adj:0
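
To compare the per-name totals (what SIOT reports) with the per-task numbers, the task table can be summed by process name. The following is only a rough sketch: the function name `summarize_tasks` is made up for illustration, the sample rows are copied from the dump above, and the conversion assumes 4 KiB pages (which matches the mariadbd line: 5425 pages is about the 21700 kB of anon-rss plus file-rss in the kill message).

```python
import re
from collections import defaultdict

PAGE_KB = 4  # assumption: 4 KiB pages, as on typical x86_64 kernels

# One row of the "Tasks state" table:
# [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*(\d+)\s*\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(\S+)\s*$"
)


def summarize_tasks(dmesg_text):
    """Sum task count, RSS and swap (in kB) per process name."""
    totals = defaultdict(lambda: {"count": 0, "rss_kb": 0, "swap_kb": 0})
    for line in dmesg_text.splitlines():
        m = ROW.search(line)
        if m is None:
            continue  # header, oom-kill summary, or unrelated line
        rss_pages = int(m.group(5))
        swap_pages = int(m.group(7))
        entry = totals[m.group(9)]
        entry["count"] += 1
        entry["rss_kb"] += rss_pages * PAGE_KB
        entry["swap_kb"] += swap_pages * PAGE_KB
    return dict(totals)


# A few rows copied from the dump above.
SAMPLE = """\
[72420.250161] [    280]    33   280   317218      515   184320     2920             0 caddy2
[72420.250180] [   9464]   973  9464   168356     5425   729088    32085             0 mariadbd
[72420.250183] [   9897]    33  9897   103392    13502   376832     6825             0 php-fpm
[72420.250185] [   9900]    33  9900   103241    13051   376832     6634             0 php-fpm
"""

if __name__ == "__main__":
    rows = sorted(summarize_tasks(SAMPLE).items(), key=lambda kv: -kv[1]["rss_kb"])
    for name, t in rows:
        print(f"{name:10s} n={t['count']:2d} rss={t['rss_kb']} kB swap={t['swap_kb']} kB")
```

As far as I understand it, the kernel scores each task individually, roughly by its rss plus swap entries plus page tables. In the full dump, mariadbd has the largest rss + swapents of any single task (5425 + 32085 pages), which would be consistent with it being picked even though the php-fpm workers together use more.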

Recent activity:

[root@web3 cbrake]# journalctl --since -7d | grep "invoked oom-killer"
Feb 01 17:26:25 web3 kernel: systemd-timesyn invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 02 16:59:32 web3 kernel: sshd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 02 17:00:05 web3 kernel: siot invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 03 14:01:19 web3 kernel: systemd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 04 06:27:45 web3 kernel: siot invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 06 01:08:39 web3 kernel: php-fpm invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 06 01:09:01 web3 kernel: sshd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=-1000
Feb 06 02:49:31 web3 kernel: mariadbd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 06 02:50:02 web3 kernel: systemd-journal invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=-250
Feb 06 10:13:19 web3 kernel: siot invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 06 10:14:22 web3 kernel: mariadbd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 07 06:46:47 web3 kernel: sshd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 07 06:47:38 web3 kernel: systemd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 07 06:48:39 web3 kernel: siot invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 07 06:48:47 web3 kernel: mariadbd invoked oom-killer: gfp_mask=0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO), order=0, oom_score_adj=0

php-fpm CPU spikes, but everything else looks pretty flat.

This problem could likely be solved very easily by adjusting the php-fpm setup, but I'm using this as an exercise to learn how to instrument things better. The next task is to collect metrics from Caddy.

Should I spin up a Prometheus instance, or add metrics scraping to SIOT? It would be nice to have everything in one system (SIOT), but Prometheus also has some nice advantages.
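
Either way, the scraping side is fairly small. Caddy serves its metrics in the Prometheus text exposition format, and as far as I know the default location is the admin endpoint at `localhost:2019/metrics`. Below is a minimal sketch of fetching and parsing that format with only the Python standard library. The URL default, the function names, and the sample values are assumptions for illustration, and the label parsing is deliberately simplistic rather than a full implementation of the format.

```python
import re
import urllib.request

# name, optional {labels}, value, optional timestamp
SAMPLE_LINE = re.compile(
    r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$"
)
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, value) from Prometheus text format."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_LINE.match(line)
        if m is None:
            continue  # ignore anything this simple parser does not understand
        name, raw_labels, value = m.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url="http://localhost:2019/metrics"):
    """Fetch and parse metrics; the default URL is an assumption about the setup."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Made-up sample in the exposition format, not real output from this server.
SAMPLE = """\
# HELP caddy_http_requests_total Counter of HTTP(S) requests made.
# TYPE caddy_http_requests_total counter
caddy_http_requests_total{handler="file_server",server="srv0"} 42
process_resident_memory_bytes 3.2e+07
"""

if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

A collector along these lines could push the parsed samples into SIOT on a timer, while a Prometheus instance would scrape the same endpoint directly without any custom code.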