Hello everyone,
We installed two ASG425 units in an Active/Passive configuration about 11 days ago. At the moment, we only have Firewall, HTTP/FTP Proxy, Antivirus, AntiSpyware and HA running. We have v7.502 installed.
The CPU load was never really high; we had an average load of about 10%. That changed today: the CPU load now often peaks at 100% for several minutes, resulting in an average load of over 70% over the past few hours. After reading several posts, I saw that there have been some postgres problems. It seems that the high load always occurs when postgres is working. A ps aux | grep postgres shows:
Code:
postgres 3571 0.0 0.2 48820 5340 ? S Jan20 0:08 /usr/bin/postgres -D /var/storage/pgsql/data
postgres 3637 0.0 1.6 49036 34996 ? Ss Jan20 0:41 postgres: writer process
postgres 3638 0.0 0.0 48820 1136 ? Ss Jan20 0:07 postgres: wal writer process
postgres 3639 0.0 0.0 49148 1400 ? Ss Jan20 0:01 postgres: autovacuum launcher process
postgres 3640 0.0 0.0 6972 1048 ? Ss Jan20 0:47 postgres: stats collector process
postgres 3880 0.0 0.1 6084 3556 ? Ss Jan20 0:03 slon_control
postgres 7198 0.0 1.8 50332 37848 ? Ss Jan20 6:24 postgres: ha_sync pop3 198.19.250.2(49612) idle
postgres 7362 0.0 0.6 50128 13352 ? Ss Jan20 0:00 postgres: postgres smtp 198.19.250.2(49621) idle
postgres 7379 0.0 0.9 50320 19024 ? Ss Jan20 0:00 postgres: postgres smtp 127.0.0.1(44289) idle
postgres 13437 0.0 0.0 2036 696 ? S Jan21 0:00 slon asg_cluster dbname=reporting user=ha_sync
postgres 13438 0.0 0.0 2040 700 ? S Jan21 0:00 slon asg_cluster dbname=pop3 user=ha_sync
postgres 13439 0.0 1.8 96476 37540 ? Sl Jan21 0:51 slon asg_cluster dbname=pop3 user=ha_sync
postgres 13440 0.0 1.9 99092 40380 ? Sl Jan21 2:04 slon asg_cluster dbname=reporting user=ha_sync
postgres 13450 0.0 1.8 50208 37236 ? Ss Jan21 1:21 postgres: ha_sync reporting [local] idle
postgres 13451 0.0 1.7 50208 36272 ? Ss Jan21 0:02 postgres: ha_sync pop3 [local] idle
postgres 13458 0.0 1.9 52560 40208 ? Ss Jan21 1:52 postgres: ha_sync reporting [local] idle
postgres 13459 0.0 1.9 52448 40416 ? Ss Jan21 14:01 postgres: ha_sync reporting [local] idle
postgres 13460 0.0 1.8 50636 38700 ? Ss Jan21 4:39 postgres: ha_sync reporting [local] idle
postgres 13465 0.0 1.9 52500 40044 ? Ss Jan21 1:14 postgres: ha_sync pop3 [local] idle
postgres 13468 0.0 1.9 51936 39668 ? Ss Jan21 1:33 postgres: ha_sync pop3 [local] idle
postgres 13469 0.0 1.8 50628 38156 ? Ss Jan21 0:07 postgres: ha_sync pop3 [local] idle
postgres 13482 0.0 1.8 50188 37288 ? Ss Jan21 2:22 postgres: ha_sync pop3 198.19.250.2(38350) idle
postgres 13483 0.3 2.0 55804 42452 ? Ds Jan21 54:33 postgres: ha_sync reporting 198.19.250.2(38351) FETCH
postgres 13485 0.9 1.9 51616 39696 ? Ss Jan21 165:21 postgres: reporting reporting [local] idle
postgres 16150 0.0 1.8 50240 37560 ? Ss Jan23 10:19 postgres: ha_sync reporting 198.19.250.2(47747) idle
postgres 9952 0.0 0.9 50032 18680 ? Ss 00:15 0:31 postgres: postgres smtp 198.19.250.2(40711) idle
postgres 27381 0.0 0.7 50032 15068 ? Ss 14:21 0:01 postgres: postgres smtp 127.0.0.1(42425) idle
postgres 11410 0.0 0.3 52016 7204 ? Ss 14:54 0:00 postgres: reporting reporting [local] idle
postgres 11441 0.0 0.2 50116 4608 ? Ss 14:54 0:00 postgres: reporting reporting [local] idle
postgres 24225 20.2 4.9 125704 102444 ? Ds 15:17 2:25 postgres: reporting reporting [local] SELECT
postgres 31679 0.1 0.6 181760 14328 ? Ds 15:24 0:00 postgres: autovacuum worker process reporting
postgres 2112 0.0 0.3 165428 7356 ? Ss 15:26 0:00 postgres: autovacuum worker process reporting
postgres 4800 0.0 0.1 49676 3792 ? Ds 15:29 0:00 postgres: autovacuum worker process pop3
which seems like quite a lot of processes to me. In addition, top shows a high wa value (time spent waiting for I/O to complete) while postgres is using a lot of CPU. This results in 0% CPU idle.
Code:
top - 15:23:06 up 11 days, 15:30, 1 user, load average: 11.15, 8.85, 6.91
Tasks: 144 total, 1 running, 141 sleeping, 0 stopped, 2 zombie
Cpu0 : 35.4%us, 5.1%sy, 0.0%ni, 0.0%id, 51.5%wa, 2.0%hi, 6.1%si, 0.0%st
Cpu1 : 47.5%us, 12.1%sy, 0.0%ni, 0.0%id, 32.3%wa, 2.0%hi, 6.1%si, 0.0%st
Mem: 2067084k total, 2012572k used, 54512k free, 6164k buffers
Swap: 1052248k total, 132k used, 1052116k free, 1135420k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13483 postgres 16 0 55976 40m 35m D 47 2.0 53:53.52 postgres
2205 chroot 16 0 332m 109m 7588 S 30 5.4 95:47.67 httpproxy
13485 postgres 16 0 51616 38m 36m S 10 1.9 165:07.92 postgres
7277 root 14 -1 27736 25m 708 S 7 1.3 126:07.40 ulogd
24225 postgres 18 0 122m 72m 8448 D 5 3.6 1:07.88 postgres
7790 root 14 -1 3720 2736 472 S 4 0.1 58:56.98 ctsyncd
13459 postgres 16 0 52448 39m 36m D 3 2.0 14:00.68 postgres
12112 postgres 15 0 177m 158m 35m D 3 7.9 2:45.18 postgres
9188 root 15 0 14140 11m 2732 S 2 0.6 12:31.78 websec-reporter
5174 root 15 0 2936 1984 684 S 1 0.1 12:34.63 syslog-ng
30795 root 15 0 13908 8404 7032 S 1 0.4 6:11.78 winbindd
1 root 16 0 720 280 244 S 0 0.0 0:01.07 init
2 root RT 0 0 0 0 S 0 0.0 0:00.70 migration/0
[...]
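Several of the postgres processes in the output above are in state D (uninterruptible sleep, which usually means waiting on disk I/O), and that would fit the high wa value. A minimal sketch for listing only those processes, assuming a procps-style ps aux where STAT is the eighth column; the function name d_state is made up for illustration and is not part of the ASG tooling:

```shell
# d_state: hypothetical helper, not part of the ASG tooling.
# Reads `ps aux`-style output on stdin and keeps the header line plus
# every process whose STAT field (column 8) starts with D.
d_state() {
  awk 'NR == 1 || $8 ~ /^D/'
}

# Example: show D-state processes on the current machine.
# ps aux | d_state
```

Running ps aux | d_state a few times during a load peak should show whether the same postgres backends stay blocked on I/O or whether different ones take turns.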
Is this some kind of bug, or is there a known workaround for this problem? I really do not like this behavior, and I can't imagine why this problem occurred out of nowhere. We have not changed anything in the last few days. Any help would be much appreciated.
Thanks