Topic: Pandora_Server Process Runaway

Murigar
Pandora_Server Process Runaway
« on: December 15, 2016, 01:56:28 PM »
I have had my Pandora server for many years, and it has been through many successful upgrades.
In the last year or so, the pandora_server process has occasionally started acting up and consuming 100% of CPU resources
(anywhere from every day to every other month).
A reboot resolves the issue for an unpredictable amount of time.
Restarting the pandora_server service does not resolve it.

Within CentOS:
# top shows 120-200% utilization.
# iostat -x 1 shows basically no disk utilization.
Memory utilization does not really increase either.

It really appears to be strictly the CPU utilization that gets out of hand.

I would appreciate any assistance in troubleshooting this recurring issue.
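One tool-independent way to confirm which process is burning the CPU is to read its cumulative CPU time (utime + stime) from /proc/<pid>/stat twice and divide by the elapsed time. Below is a minimal Python sketch, assuming a Linux /proc filesystem. The script and function names are hypothetical. The PID would be passed as an argument, for example from pgrep -f /usr/bin/pandora_server. As in top's per-process column, 100% means one fully busy CPU.

```python
import os
import sys
import time

# Kernel clock ticks per second; utime/stime in /proc/<pid>/stat use this unit.
CLK_TCK = os.sysconf("SC_CLK_TCK")


def read_cpu_ticks(pid):
    """Return utime + stime (clock ticks) for a process from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so start after the last ')'. fields[0] is then field 3 (state),
    # which puts utime (field 14) at fields[11] and stime (field 15) at fields[12].
    fields = data[data.rindex(")") + 2:].split()
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, interval, clk_tck=CLK_TCK):
    """CPU use over the interval on top's scale: 100% == one fully busy CPU."""
    return (ticks_after - ticks_before) / clk_tck / interval * 100.0


def sample(pid, interval=5.0):
    """Measure a process's CPU use over `interval` seconds."""
    before = read_cpu_ticks(pid)
    time.sleep(interval)
    return cpu_percent(before, read_cpu_ticks(pid), interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    pid = int(sys.argv[1])
    print(f"PID {pid}: {sample(pid):.1f}% CPU over 5 s")
```

Running it once during normal operation and again during a runaway gives two directly comparable numbers for the same process, regardless of how a particular top build formats its output.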

Murigar
Re: Pandora_Server Process Runaway
« Reply #1 on: December 15, 2016, 02:44:16 PM »
Output of # ./pandora_diagnostic.sh is below:

Code:
Information gathered at 20161215_152819
Linux pandorafms.████████████ 2.6.32-642.11.1.el6.x86_64 #1 SMP Fri Nov 18 19:25:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
=========================================================================
-----------------------------------------------------------------
CPUINFO
-----------------------------------------------------------------
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU            5160  @ 3.00GHz
stepping : 6
microcode : 210
cpu MHz : 2992.499
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc arch_perfmon pebs bts tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni ssse3 cx16 x2apic hypervisor lahf_lm dtherm
bogomips : 5984.99
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU            5160  @ 3.00GHz
stepping : 6
microcode : 210
cpu MHz : 2992.499
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc arch_perfmon pebs bts tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni ssse3 cx16 x2apic hypervisor lahf_lm dtherm
bogomips : 5984.99
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

-----------------------------------------------------------------
MEMINFO
-----------------------------------------------------------------
MemTotal:       13924732 kB
MemFree:        13056804 kB
Buffers:          148744 kB
Cached:           160592 kB
SwapCached:            0 kB
Active:           562368 kB
Inactive:         133996 kB
Active(anon):     387192 kB
Inactive(anon):      228 kB
Active(file):     175176 kB
Inactive(file):   133768 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       3096572 kB
SwapFree:        3096572 kB
Dirty:                44 kB
Writeback:             0 kB
AnonPages:        387084 kB
Mapped:            26084 kB
Shmem:               400 kB
Slab:              46208 kB
SReclaimable:      18372 kB
SUnreclaim:        27836 kB
KernelStack:        3152 kB
PageTables:        10784 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    10058936 kB
Committed_AS:     810156 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      179744 kB
VmallocChunk:   34359547264 kB
HardwareCorrupted:     0 kB
AnonHugePages:     40960 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       10240 kB
DirectMap2M:    14325760 kB
-----------------------------------------------------------------
Other System Parameters
-----------------------------------------------------------------
Uptime:  15:28:19 up  2:01,  1 user,  load average: 0.61, 0.71, 0.57
-----------------------------------------------------------------
PROC INFO (Pandora)
-----------------------------------------------------------------
root      1903  0.0  0.0 145536  7416 ?        SN   13:28   0:02 /usr/bin/perl /usr/bin/pandora_agent /etc/pandora
pandora   2343  0.0  0.0  18604   960 ?        Ss   13:28   0:00 /usr/bin/anytermd --port 8023 --user pandora -c telnet %p
pandora   2345  0.0  0.0  18604   956 ?        Ss   13:28   0:00 /usr/bin/anytermd --port 8022 --user pandora -c ssh %p
root      2388 25.7  2.2 1661464 310408 ?      Ssl  13:28  30:52 /usr/bin/perl /usr/bin/pandora_server /etc/pandora/pandora_server.conf -D
root      2393  0.0  0.0 191152  2268 ?        Ss   13:28   0:00 /usr/sbin/snmptrapd -t -On -n -a -Lf /var/log/pandora/pandora_snmptrap.log -p /var/run/pandora_snmptrapd.pid --format1=SNMPv1[**]%4y-%02.2m-%l[**]%02.2h:%02.2j:%02.2k[**]%a[**]%N[**]%w[**]%W[**]%q[**]%v\n --format2=SNMPv2[**]%4y-%02.2m-%l[**]%02.2h:%02.2j:%02.2k[**]%b[**]%v\n
pandora   2433  0.0  0.0 228124  9692 ?        Ss   13:28   0:01 /usr/bin/perl /usr/bin/tentacle_server -a 0.0.0.0 -p 41121 -s /var/spool/pandora/data_in -i.*\.conf:conf;.*\.md5:md5;.*\.zip:collections -d
root      6852  0.0  0.0 106112  1224 pts/0    S+   15:28   0:00 /bin/bash ./pandora_diagnostic.sh
root      6860  0.0  0.0 103320   840 pts/0    S+   15:28   0:00 grep pandora
-----------------------------------------------------------------
MySQL Configuration file
-----------------------------------------------------------------
-----------------------------------------------------------------
Pandora FMS Server Configuration file
-----------------------------------------------------------------
#############################################################################
# Pandora FMS Server Parameters
# Pandora FMS, the Flexible Monitoring System.
# Version 5.0SP2
# Licensed under GPL license v2,
# (c) 2003-2013 Artica Soluciones Tecnologicas
# http://www.pandorafms.com
# Please change it for your setup needs
#############################################################################

# Servername: Name of this server
# if not given, it takes hostname. It's preferable to setup one
# because machine name could change by some reason.

#servername adama

# incomingdir:  Defines directory where incoming data packets are stored
# You could set directory relative to base path or absolute, starting with /

incomingdir /var/spool/pandora/data_in

# log_file: Main logfile for pandora_server
# You could set file relative to base path or absolute, starting with /

log_file /var/log/pandora/pandora_server.log

# Log file for Pandora FMS SNMP console. Its generated by NetSNMP Trap daemon

snmp_logfile /var/log/pandora/pandora_snmptrap.log

# Error logfile: aux logfile for pandora_server errors (in Daemon mode)
# You could set file relative to base path or absolute, starting with /

errorlog_file /var/log/pandora/pandora_server.error

# daemon: Runs in daemon mode (background) if 1, if 0 runs in foreground
# this could be also configured on commandline with -D option

# daemon 1

# dbengine: mysql, postgresql or oracle (mysql by default)

dbengine mysql

# Database credentials. A VERY important configuration.
# This must be the same credentials used by your Pandora FMS Console
# but could be different if your console is not running in the same
# host than the server. Check your console setup in /include/config.php

# dbname: Database name (pandora by default)

dbname pandora

# dbuser:  Database user name (pandora by default)

dbuser pandora



# dbhost: Database hostname or IP address

dbhost localhost

# dbport: Database port number
# Default value depends on the dbengine (mysql: 3306, postgresql: 5432, oracle: 1521)

#dbport 3306

# By default, parent agent will not update

#update_parent 0

# verbosity: level of detail on errors/messages (0 default, 1 verbose, 2 debug.... 10 noisy)
# -v in command line (verbose) or -d (debug). Set this to 10 when try to locate problems and
# set to 0 or 1 on production enviroments.

verbosity 0

# Master Server, 1 if master server (normal mode), 0 for slave mode (slave in multi-server setup)

master 1

# Activate Pandora SNMP console (depending on snmptrapd)

snmpconsole 1

# snmptrapd will ignore authenticationFailure traps if set to 1.

snmp_ignore_authfailure 1

# snmptrapd will read the PDU source address instead of the agent-addr field is set to 1.

snmp_pdu_address 0

# Activate (1) Pandora Network Server

networkserver 1

# Activate (1) Pandora Data Server

dataserver 1

# Activate (1) Pandora FMS Recon server

reconserver 1

# pluginserver : 1 or 0. Set to 1 to activate plugin server with this setup

pluginserver 1

# Pandora FMS Plugin exec tool filepath (by default at /usr/bin)

plugin_exec /usr/bin/timeout

# predictionserver : 1 or 0. Set to 1 to activate prediction server with this setup
# DISABLED BY DEFAULT

predictionserver 0

# wmiserver : 1 or 0. Set to 1 to activate WMI server with this setup
# DISABLED BY DEFAULT

wmiserver 1

# Network timeout (in seconds) for timeout in network connections for Network agents

network_timeout 3

# Server keepalive (in seconds)

server_keepalive 45

# Server Threshold: defines number of seconds of main loop (in sec)

server_threshold 10

# Network threads: Do not set too high (~40). Each threads make a network module check.

network_threads 5

# icmp_checks x : defines number of pings for each icmp_proc module type. at least one of
# that ping should be 1 to report 1

icmp_checks 1

# tcp specific options :
# tcp_checks: number of tcp retries if first attempt fails.
# tcp_timeout: specific timeout for tcp connections

tcp_checks 1
tcp_timeout 30

# snmp specific options :
# snmp_checks: number of snmp request retries if first attempt fails.
# snmp_timeout: specific timeout for snmp request.

snmp_checks 1
snmp_timeout 5

# snmp_proc_deadresponse 1 (default): Return DOWN if cannot contact
# or receive NULL from a SNMP PROC module.

snmp_proc_deadresponse 1

# plugin_threads: Specify number of plugin server threads for processing plugin calls

plugin_threads 1

# plugin_timeout: Specify number of seconds calling plugin exec waiting for response
# after this time, call is aborted and result is "unknown".

plugin_timeout 15

# wmi_timeout : specific timeout for wmi request.

wmi_timeout 20

# wmi_threads: Specify number of WMI server threads for processing WMI remote calls

wmi_threads 2

# recon_threads. Each thread will scan a different scantask.

recon_threads 1

# dataserver_threads: Number of threads for data server (XML processing threads)

dataserver_threads 1

# mta_address: External Mailer (MTA) IP Address to be used by Pandora FMS internal email capabilities

mta_address localhost

# mta_port, this is the mail server port (default 25)

#mta_port 25

# mta_user MTA User (if needed for auth, FQD or simple user, depending on your server)

#mta_user myuser@mydomain.com



# mta_auth MTA Auth system (if needed, it supports LOGIN, PLAIN, CRAM-MD5, DIGEST-MD)

#mta_auth LOGIN

# mta_from Email address that sends the mail, by default is pandora@localhost
#           probably you need to change it to avoid problems with your antispam

#mta_from Pandora FMS <pandora@mydomain.com>

# Set 1 if want eMail deliver alert in separate mail  (default).
# Set 0 if want eMail deliver shared mail by all destination.
mail_in_separate 1


# xprobe2: Optional package to detect OS types using advanced TCP/IP
# fingerprinting tecniques, much more accurates than stadard nmap.
# If not provided, nmap is used insted xprobe2

xprobe2 /usr/bin/xprobe2

# nmap: If provided, is used to detect OS type with recon server using
# advanded OS fingerprint technique. Xprobe2 gives more accurate results
# Nmap is also used to do TCP port scanning in detected host.

nmap /usr/bin/nmap

# snmpget: Needed to do SNMP checks. By default is on /usr/bin/snmpget

snmpget /usr/bin/snmpget

# Location of the braa binary needed by the Enterprise SNMP Server (/usr/bin/braa by default) (PANDORA FMS ENTERPRISE ONLY).

braa /usr/bin/braa

# Number of retries before braa hands a module over to the Network Server (PANDORA FMS ENTERPRISE ONLY).

braa_retries 3

# Default group id for new agents created with Pandora FMS Data Server

autocreate_group 2

# Set to 1 if want to autocreate agents with Pandora FMS Data Server,
# set to 0 to disable

autocreate 1

# max_log_size: Specify max size of Pandora FMS server log file (1MB by default). If
# log file grows above this limit, is renamed to "pandora_server.log.old".

max_log_size 65536

# max_queue_files (500 by default)
# When server have more than max_queue_files in incoming directory, skips the read   
# the directory to avoid filesystem overhead.

max_queue_files 500

# Use the XML file last modification time as timestamp.

# use_xml_timestamp 1

# Pandora FMS will autorestart itself each XXX seconds, use this if you experience problems with
# shutting down threads, or other stability problems.

# auto_restart 86400

# Pandora FMS will restart after restart_delay seconds on critical errors.

# restart 0
# restart_delay 60

# More information about GIS Setup in /usr/share/pandora_server/util/gis.README
# Flag to activate GIS (positional information for agents and maps)
# by default it is desactivated

activate_gis 0

# Radius of error in meters to consider two gis locations as the same location.

#location_error 50

# Recon reverse geolocation mode [disabled, sql, file]
#   disabled    The recon task doesn't try to geolocate the ip discovered.
#   sql         The recon task trys to query the SQL database to geolocate the
#               ip discovered
#   file        The recon task trys to find the geolocation information of the
#               ip discovered in the file indicated in the
#                recon_reverse_geolocation_file parameter

# recon_reverse_geolocation_mode disabled

# Recon reverse geolocation file. This is the database with the reverse
# geolocation information using MaxMind GPL GeoLiteCity.dat format).

#recon_reverse_geolocation_file /usr/local/share/GeoIP/GeoIPCity.dat

# Radius (in meters) of the circle in where the agents will be place randomly
# when finded by a recon task. Center of the circle is guessed
# by geolocating the IP.

#recon_location_scatter_radius 1000

# Pandora Server self-monitoring (embedded agent) (by default enabled)

self_monitoring 1

# Update parent from the agent xml

#update_parent 1
#
#
# This enable realtime reverse geocoding using Google Maps public api.
# This requires internet access, and could have performance penalties processing GIS
# information due the connetion needed to resolve all GIS input.
# NOTE: If you dont pay the service to google, they will ban your IP in a few days.

 google_maps_description 1

# This enable realtime reverse geocoding using Openstreet Maps public api.
# This requires internet access, and could have performance penalties processing GIS
# information due the connetion needed to resolve all GIS input.
# You can alter the code to use a local (your own) openstreet maps server.

# openstreetmaps_description 1

# Enable (1) or disable (0) Pandora FMS Event Web Server (PANDORA FMS ENTERPRISE ONLY).

webserver 1

# Number of threads for the Web Server (PANDORA FMS ENTERPRISE ONLY).

web_threads 1

# Uncomment to perform web checks with CURL instead of LWP.
#web_engine curl

# Enable (1) or disable (0) Pandora FMS Inventory Server (PANDORA FMS ENTERPRISE ONLY).

inventoryserver 1

# Number of threads for the Web Server (PANDORA FMS ENTERPRISE ONLY).

inventory_threads 1

# Enable (1) or disable (0) Pandora FMS Export Server (PANDORA FMS ENTERPRISE ONLY).

exportserver 0

# Number of threads for the Export Server (PANDORA FMS ENTERPRISE ONLY).

export_threads 1

# Enable (1) or disable (0) Pandora FMS Event Server (PANDORA FMS ENTERPRISE ONLY).

eventserver 0

# Event Server event window in seconds (3600 by default) (PANDORA FMS ENTERPRISE ONLY).

event_window 3600

# Enable (1) or disable (0) Pandora FMS Enterprise ICMP Server (PANDORA FMS ENTERPRISE ONLY).
# You need nmap 5.20 or higher in order to use this !

icmpserver 0

# Number of threads for the Enterprise ICMP Server (PANDORA FMS ENTERPRISE ONLY).

icmp_threads 2

# Enable (1) or disable (0) Pandora FMS Enterprise SNMP Server (PANDORA FMS ENTERPRISE ONLY).
# Check braa tool is running and operative.

snmpserver 0

# Number of threads for the Enterprise SNMP Server (PANDORA FMS ENTERPRISE ONLY).

snmp_threads 2

# Block size for block producer/consumer servers, that is, the number of modules
# per block (15 by default) (PANDORA FMS ENTERPRISE ONLY).

block_size 15

# If set to 1, process XML data files in a stack instead of a queue. 0 by default.
# WARNING: Incremental modules will not work properly if dataserver_lifo is set to 1!!!

dataserver_lifo 0

# If set to 1, the policy manager is enabled and the server is listening the policy queue.
# 0 by default (PANDORA FMS ENTERPRISE ONLY)

policy_manager 1

# If set to 1, the event replicate process is enabled. 0 by default. (PANDORA FMS ENTERPRISE ONLY)
# WARNING: This process doesn't do anything if is not properly configured from the console setup

event_replication 0

# If set to 1, new events validate older event for the same module. This will
# affect the performance of the server. This was the "normal behaviour" on previous (4.x) versions.
# disable only if you really know what you are doing !!.

event_auto_validation 1

# If defined, events generated by Pandora FMS will be written to the specified text file.
#event_file /var/log/pandora/pandora_events.txt

# Set the maximum number of traps that will be processed from a single source in a
# configured time interval.
#snmp_storm_protection 10

# Time interval for snmp_storm protection (in seconds).
#snmp_storm_timeout 600

# Default texts for some events. The macros _module_ and _data_ are supported.
#text_going_down_normal Module '_module_' is going to NORMAL (_data_)
#text_going_up_critical Module '_module_' is going to CRITICAL (_data_)
#text_going_up_warning Module '_module_' is going to WARNING (_data_)
#text_going_down_warning Module '_module_' is going to WARNING (_data_)
#text_going_unknown Module '_module_' is going to UNKNOWN

# Events older that the specified time (in seconds) will be auto-validated. Set to 0 to disable this feature.
event_expiry_time 0

# Only events more recent than the specified time window (in seconds) will be auto-validated. This value must
# be greater than event_expiry_time.
#event_expiry_window 86400


mta_address 127.0.0.1
mta_port 25
mta_from Pandora FMS <pandorafms@████████████>

-----------------------------------------------------------------
Pandora FMS Logfiles information
-----------------------------------------------------------------
total 264
drwxr-xr-x.  2 pandora root  4096 Dec 15 13:28 .
drwxr-xr-x. 13 root    root  4096 Dec 15 15:12 ..
-rw-r--r--.  1 root    root  1007 Dec 15 13:28 pandora_agent.log
-rw-rw-rw-   1 root    root    43 Jul 26 16:17 pandora_alert.log
-rw-rw-rw-.  1 root    root 86058 Dec 15 15:01 pandora_server.error
-rw-rw-rw-   1 root    root 10812 Nov  4 03:13 pandora_server.error-20161104.gz
-rw-rw-rw-   1 root    root 10733 Nov 24 03:17 pandora_server.error-20161124.gz
-rw-rw-rw-   1 root    root 11294 Dec 12 04:14 pandora_server.error-20161212.gz
-rw-rw-rw-   1 root    root 40159 Apr 13  2016 pandora_server.log
-rw-rw-rw-   1 root    root 65576 May 29  2014 pandora_server.log.old
-rw-rw-rw-   1 root    root    21 Dec 15 13:28 pandora_snmptrap.log
-rw-rw-rw-   1 root    root     4 Dec 15 13:28 pandora_snmptrap.log.index
-----------------------------------------------------------------
System disk
-----------------------------------------------------------------
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_pandorafms-lv_root
                       50G  4.3G   45G   9% /
tmpfs                 6.7G     0  6.7G   0% /dev/shm
/dev/sda1             477M  173M  279M  39% /boot
/dev/mapper/vg_pandorafms-lv_home
                       11G  462M  9.3G   5% /home
-----------------------------------------------------------------
Vmstat (5 execs)
-----------------------------------------------------------------
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 13056788 148744 160592    0    0    13    97  685  185  5 13 80  2  0
 0  0      0 13056556 148744 160628    0    0     0    96  395  244  2  5 92  2  0
 0  0      0 13056556 148744 160628    0    0     0  1560  157  167  1  1 92  7  0
 1  0      0 13056400 148760 160612    0    0     0   200 6450 1000 11 49 37  3  0
 2  1      0 13056540 148852 160556    0    0     0   400 8832 1290 16 59 23  3  0
-----------------------------------------------------------------
System dmesg
-----------------------------------------------------------------
DMESG OMITTED
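The vmstat samples in the diagnostic output above can also be scanned programmatically. In the last two rows, idle time (id) drops to 37 and 23 while system time (sy) rises to 49 and 59. Below is a minimal Python sketch that parses vmstat's default layout, assuming the 17 columns shown here (r b swpd free buff cache si so bi bo in cs us sy id wa st). It flags rows whose idle share falls below a threshold. The 50% threshold and the function names are arbitrary choices for illustration.

```python
# Sample copied verbatim from the vmstat section of the diagnostic output.
VMSTAT_SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 13056788 148744 160592    0    0    13    97  685  185  5 13 80  2  0
 0  0      0 13056556 148744 160628    0    0     0    96  395  244  2  5 92  2  0
 0  0      0 13056556 148744 160628    0    0     0  1560  157  167  1  1 92  7  0
 1  0      0 13056400 148760 160612    0    0     0   200 6450 1000 11 49 37  3  0
 2  1      0 13056540 148852 160556    0    0     0   400 8832 1290 16 59 23  3  0
"""


def parse_vmstat(text):
    """Return one dict per sample row, keyed by vmstat's column headers."""
    header = None
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "r":  # the column-name line
            header = parts
        elif header and parts[0].isdigit():  # a data row
            rows.append(dict(zip(header, map(int, parts))))
    return rows


def busy_samples(rows, min_idle=50):
    """Rows where the CPU idle percentage is below min_idle."""
    return [row for row in rows if row["id"] < min_idle]


for row in busy_samples(parse_vmstat(VMSTAT_SAMPLE)):
    print(f"us={row['us']} sy={row['sy']} id={row['id']} in={row['in']} cs={row['cs']}")
```

vmstat's first row is an average since boot, so the later rows are the ones that reflect current activity. Piping live vmstat 1 output through the same parser during a runaway would show whether the time goes to user (us) or system (sy) CPU.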

ayad99
Re: Pandora_Server Process Runaway
« Reply #2 on: January 15, 2017, 08:42:24 AM »
Hi Murigar

CPU utilisation above 100% in the output of the top command is normal. The percentage is relative to a single processor, so the total available is 100% per processor: if you have 8 processors, you have 800% available. So even if your CPU utilisation goes to 200-400%, I think it's normal.
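The arithmetic behind this: in top's default (Irix) mode, a process's %CPU is relative to one processor, so dividing by the processor count gives its share of the whole machine. The CPUINFO section of the diagnostic output earlier in the thread lists two processors (0 and 1). The helper name below is hypothetical. This is a minimal Python sketch of the conversion:

```python
def machine_share(top_percent, ncpu):
    """Convert top's per-process %CPU (100% == one CPU) to % of all CPUs."""
    if ncpu < 1:
        raise ValueError("ncpu must be at least 1")
    return top_percent / ncpu


# Two processors, as listed in the CPUINFO section of the diagnostic output.
for pct in (120, 200):
    print(f"{pct}% in top -> {machine_share(pct, 2):.0f}% of a 2-CPU machine")
```

On that 2-CPU VM, the 120-200% from the first post works out to 60-100% of total CPU. On the 8-processor example above, 400% would be 50%.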

What issue are you facing specifically that makes you think it's a CPU issue?

Regards

antonio.s (Administrator)
Re: Pandora_Server Process Runaway
« Reply #3 on: January 16, 2017, 04:17:29 AM »
Hello Murigar,

Have you checked whether a large number of pandora_server threads or processes are open when your server reaches high CPU usage?
Have you made any changes to the configuration, such as increasing the number of server threads (dataserver_threads, network_threads...)?
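The configured thread counts can be pulled out of pandora_server.conf directly. That file uses one key value pair per line, with # for comments, as in the configuration dump above. Below is a minimal Python sketch, assuming that format. The script name and function names are hypothetical. When a key appears more than once (mta_address does in the dump above), this sketch keeps the last value; that is a choice made here, not necessarily how Pandora FMS resolves duplicates.

```python
import sys


def parse_conf(lines):
    """Parse 'key value' lines, skipping blanks and '#' comments.

    If a key appears more than once, this sketch keeps the last value.
    """
    settings = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def thread_settings(settings):
    """Return only the numeric *_threads settings."""
    return {
        key: int(value)
        for key, value in settings.items()
        if key.endswith("_threads") and value.isdigit()
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 conf_threads.py /etc/pandora/pandora_server.conf
    with open(sys.argv[1]) as f:
        threads = thread_settings(parse_conf(f))
    for key in sorted(threads):
        print(f"{key} = {threads[key]}")
    print(f"total configured = {sum(threads.values())}")
```

This only reports what is configured. Settings for disabled servers (for example exportserver 0 with export_threads 1) are still listed, so the total is not a prediction of how many threads the running process will have.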

Also, keep in mind what ayad99 accurately explains. A different version of the top program may even display CPU utilization differently.

Kind regards,
Antonio.

Murigar
Re: Pandora_Server Process Runaway
« Reply #4 on: January 16, 2017, 12:29:27 PM »
ayad99 - Here is why I believe this is abnormal. The "Lag" listed under "Pandora Servers" is generally 0-5 seconds, with 0-10 items queued.
Generally, CPU usage as reported by VMware is about 10%.

When this runaway occurs, the "Lag" spikes up to 20 minutes, with hundreds of items in the queue.
CPU usage as reported by VMware sits at 100%.
With lag this large, things start triggering, causing false alerts.

It stays this way for hours, and only a reboot clears it up.

antonio.s -
During normal operation there is one process with 19 threads. I believe this does not increase when it acts up.
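To see whether a single thread accounts for the extra CPU during a runaway (top -H -p <pid> gives a similar interactive view), each thread's CPU ticks can be read from /proc/<pid>/task/<tid>/stat. Below is a minimal Python sketch, assuming Linux /proc. The function names are hypothetical, and the PID is passed as an argument.

```python
import os
import sys
import time

# Kernel clock ticks per second; utime/stime in the stat files use this unit.
CLK_TCK = os.sysconf("SC_CLK_TCK")


def thread_ticks(pid):
    """Map each thread id of a process to its utime + stime in clock ticks."""
    ticks = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            with open(f"/proc/{pid}/task/{tid}/stat") as f:
                data = f.read()
        except FileNotFoundError:
            continue  # the thread exited between listdir() and open()
        # Skip past "(comm)"; utime and stime are then fields[11] and fields[12].
        fields = data[data.rindex(")") + 2:].split()
        ticks[int(tid)] = int(fields[11]) + int(fields[12])
    return ticks


def busiest_threads(pid, interval=5.0, top_n=5):
    """Return (thread count, [(tid, % of one CPU), ...] busiest first)."""
    before = thread_ticks(pid)
    time.sleep(interval)
    after = thread_ticks(pid)
    usage = {
        tid: (after[tid] - before.get(tid, 0)) / CLK_TCK / interval * 100.0
        for tid in after
    }
    ranked = sorted(usage.items(), key=lambda kv: kv[1], reverse=True)
    return len(after), ranked[:top_n]


if __name__ == "__main__" and len(sys.argv) > 1:
    pid = int(sys.argv[1])
    count, ranked = busiest_threads(pid)
    print(f"PID {pid}: {count} threads")
    for tid, pct in ranked:
        print(f"  TID {tid}: {pct:.1f}% of one CPU")
```

Comparing a run during normal operation (where 19 threads were observed) with a run during a spike would show both whether the thread count changes and which thread IDs are consuming the CPU.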