2014/12/06

ElasticSearch internals monitoring by Zabbix (v2 traps)

Here is a more resource-oriented version of the ElasticSearch monitoring from the previous article, this time using Zabbix traps. It also comes with a very basic template, which was often requested in the comments:



Graphs included:
  • Shards and nodes status
  • Indices tasks speed
  • Indices tasks time spent


Installation:
  1. Save this as /etc/zabbix/zabbix_agentd.d/elasticsearch.conf
    #Key jvm.uptime_in_millis used to trigger trap sending
    UserParameter=es[*],/etc/zabbix/elasticsearch.py $1
    
    
  2. And here is the data getter:
    /etc/zabbix/elasticsearch.py

  3. Then import template
    template_app_elasticsearch.xml
How to add a new counter:
  • Browse the JSON output of your server's
    http://localhost:9200/_nodes/_local/stats?all=true
  • Write the path to the value of interest using dots as separators, for example
    indices.docs.count
  • Create new counter in zabbix with
    key name = es[path.you.found]
    and type = zabbix_trap
  • And here is the difference from the previous version: you also need to add the path of your new counter to the traps2 section of the elasticsearch.py file. Then execute elasticsearch.py without any parameters; the zabbix_sender debug output will be printed to the console. In the top section you should find your new counter key (if you don't, the key was not found or is empty in the JSON output), and in the bottom section the number of failed items should be zero (if it isn't, no such key is configured for this server in the Zabbix web interface)
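The dot-separated path from the steps above is resolved by walking the nested JSON dicts, roughly like this (a sketch, not the exact code from elasticsearch.py):

```python
import json

def get_by_path(stats, path):
    """Follow a dot-separated path like 'indices.docs.count' through nested dicts."""
    for key in path.split('.'):
        stats = stats[key]
    return stats

# same shape as a fragment of the /_nodes/_local/stats?all=true output
sample = json.loads('{"indices": {"docs": {"count": 573692, "deleted": 0}}}')
print(get_by_path(sample, "indices.docs.count"))  # -> 573692
```

A KeyError while walking the path corresponds to the "key is not found or empty in JSON output" case above.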

49 comments:

  1. Hi, can you please add the steps to monitor another host instead of localhost?
    If I want to monitor a host like 10.0.1.10 from my Zabbix server 10.0.1.5, what steps do I need to take on the remote host and on the Zabbix server?

    Replies
    1. Just the usual monitoring of a remote host by zabbix-agent. Something like this:
      - install zabbix-agent on the remote host
      - add this host to the zabbix-server, attach some "generic linux template" and check that basic data starts being collected (the zabbix-agent must report the same "hostname" as is configured for this host on the zabbix-server side)
      - then proceed with adding the elasticsearch checks: import the template from this article on the zabbix-server side and assign it to your host
      - on host side configure
      /etc/zabbix/zabbix_agentd.d/elasticsearch.conf and
      /etc/zabbix/elasticsearch.py

      So, the template should be imported only once on the server side and then assigned to the hosts that need it. But the config for the agent check and the data-collection script must be installed on each elasticsearch host (and they send data about localhost back to the zabbix-server)
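For the addresses in the question, the agent-side config would look roughly like this (a sketch; the Hostname value is a placeholder and must match the host's name in the Zabbix web interface):

```
# /etc/zabbix/zabbix_agentd.conf on the elasticsearch host 10.0.1.10
Server=10.0.1.5
ServerActive=10.0.1.5
Hostname=es-node-01
```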

  2. when I run script I get this error:

    12539:20150107:102815.007 item "nw-esclient-201.prod.pcln.com:es[jvm.uptime_in_millis]" became not supported: Received value [File "/usr/local/zabbix/bin/elasticsearch.py", line 22 "status", ^SyntaxError: invalid syntax] is not suitable for value type [Numeric (float)]

    Replies
    1. Maybe there is some EOL issue from when you downloaded the script?
      What is shown when you try:
      # python elasticsearch.py
      ?

    2. bash-4.1$ python elasticsearch.py
      File "elasticsearch.py", line 22
      "status",
      ^
      SyntaxError: invalid syntax

    3. Dunno, the syntax looks valid. At least for the python2 I have installed on my home box (there is no elasticsearch there):
      # wget https://github.com/sepich/zabbix/raw/master/elasticsearch.py &>/dev/null
      # python elasticsearch.py
      Unable to load JSON data!

      So, no syntax errors. What's your OS and python?

    4. Thanks for helping out, OS=Red Hat Enterprise Linux Server release 6.5
      with Python 2.6.6

    5. Hi Tom,
      add None value to all traps1 and traps2 keys

      i.e. (line 22): "status": None,

      Worked for me in the same environment

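The fix above most likely works because set literals ({"status", "unassigned_shards", ...}) were only added in Python 2.7, so on RHEL 6's Python 2.6 the bare comma-separated keys are a SyntaxError, while a dict with None placeholder values parses on both. A sketch (assuming this is how traps1 is declared; the real contents are in elasticsearch.py):

```python
# Python 2.7+ set-literal form -- a SyntaxError on Python 2.6:
#   traps1 = {
#       "status",
#       "unassigned_shards",
#   }
# Portable form -- a dict whose None values are just placeholders:
traps1 = {
    "status": None,
    "unassigned_shards": None,
}
print(sorted(traps1))  # -> ['status', 'unassigned_shards']
```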
  4. Thanks Federico, that helped:

    bash-4.1$ ./elasticsearch.py jvm.uptime_in_millis
    10116541216

    But this errors out:

    bash-4.1$ ./elasticsearch.py jvm_heap_p_used
    zabbix_sender [18783]: Warning: [line 1] '-' encountered as 'Hostname', but no default hostname was specified
    Sending failed.

    Replies
    1. I hardcoded my hostname
      line 71
      out += "YOURHOSTNAMEHERE es.{0} {1}\n".format(t,s)
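The '-' warning comes from zabbix_sender's input format, which expects one "<hostname> <key> <value>" line per item; '-' means "use the default hostname", which fails when none is configured. Instead of hardcoding, the local hostname can be filled in (a sketch; sender_line is a hypothetical helper, not a function from the script):

```python
import os

def sender_line(key, value, hostname=None):
    # zabbix_sender reads lines of "<hostname> <key> <value>";
    # using the OS hostname avoids the '-' placeholder entirely
    host = hostname or os.uname()[1]
    return "{0} es.{1} {2}\n".format(host, key, value)

print(sender_line("jvm.uptime_in_millis", 10116541216, hostname="es-node-01"))
# -> es-node-01 es.jvm.uptime_in_millis 10116541216
```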

  5. Hi,
    Can you help me:

    Traceback (most recent call last):
      File "./elasticsearch.py", line 70, in <module>
        main()
      File "./elasticsearch.py", line 64, in main
        stats=stats[c.pop(0)]
    UnboundLocalError: local variable 'stats' referenced before assignment

    Replies
    1. There is no such code (stats=stats[c.pop(0)]) anywhere in the file:
      https://github.com/sepich/zabbix/blob/master/elasticsearch.py
      Maybe you are trying with some older version?
      Could you try with the latest?

  6. Hi There,

    This is a great solution, thank you for sharing. I have a small problem with it: the elasticsearch.py script runs OK on my other nodes, but on my master node it returns the "Unable to load JSON data!" error. I don't know Python, so I cannot really figure out what the code does, but I assume something fails here:

    for node_id in all['nodes']:
        if all['nodes'][node_id]['host'].startswith(os.uname()[1]):
            node = all['nodes'][node_id]
            if len(sys.argv) == 1:
                print "node found"
    except:
        print "Unable to load JSON data!"
        sys.exit(1)

    Any ideas?

  7. Looks like the cluster node name differs from the hostname on that server. Tell me what they are and I'll think about how the script should be modified to handle this

    Replies
    1. I don't know if you got the reply, if not, let me know and I'll send it again.

  8. Hmm, that's strange :) I have the same naming convention on all nodes.
    hostnames: eslog001.abc.local to eslog005.abc.local (first one is the master and the problem is there)
    Node names: ES_ONE, ES_TWO, ES_THREE, ES_FOUR, ES_FIVE

    Replies
    1. In this case it should not work on the other nodes either ;)
      if all['nodes'][node_id]['host'].startswith(os.uname()[1]):
      Basically, this code searches all cluster nodes for the one whose name starts with the current hostname. So, for eslog001.abc.local the hostname would be eslog001.
      If your cluster node names are not equal to the hostnames, you can just hardcode it in the script like this:
      if all['nodes'][node_id]['host'].startswith('ES_ONE'):

  9. This is really strange. I have created a script as you advised with the hard coded node address and hard coded node name. It still does not work on the problematic server (unable to load JSON) but if I run the exact same script on others with the problematic server's address and name hardcoded, it works :)

    Replies
    1. Actually, python scripts are not so hard to debug.
      Just open a python prompt:
      # python
      And then copy-paste everything from the script that is above the
      def main():
      line (actually only the imports and getKeys are needed here). Then copy-paste the beginning of main():
      f = requests.get("http://localhost:9200/_cluster/health")
      health = f.json()
      f = requests.get("http://localhost:9200/_nodes/_local/stats?all=true")
      all = f.json()
      Now we are coming to the problem place. Let's check which nodes exist in the report:
      >>> for node_id in all['nodes']: print all['nodes'][node_id]['host']
      All the node names should be printed out. And we are looking for the current server, with this name:
      >>> print os.uname()[1]
      Do you see a node in the previous output whose name starts with exactly the same characters?

      Oh, I remember the error was "Unable to load JSON data!", so it is an exception during the execution of one of those commands. When you copy-pasted the lines from the script, where did you see errors?

  10. Hi

    I have tried using this script but keep getting this error when I run it:

    File "/etc/zabbix/elasticsearch.py", line 22
    "status",
    ^
    SyntaxError: invalid syntax

  11. Hi all,

    If you receive the error "Unable to load JSON data!" try running the command:

    curl -XGET 'http://localhost:9200/_cluster/health'

    In my case, the elasticsearch process was not listening on localhost (127.0.0.1) and I got the error:

    curl: (7) couldn't connect to host

    Then running "sudo netstat -ntlp" revealed the IP address and port elasticsearch is listening on, and replacing "localhost" with this IP address in the script solved the problem.

  12. zabbix_sender [6743]: DEBUG: answer [{"response":"success","info":"processed: 0; failed: 33; total: 33; seconds spent: 0.000243"}]
    info from server: "processed: 0; failed: 33; total: 33; seconds spent: 0.000243"
    sent: 33; skipped: 0; total: 33

  13. Hello.
    I performed the process but do not have the /etc/zabbix/zabbix_agentd.d directory, just /etc/zabbix/agentd_conf.d.
    I use version 2.4 of zabbix, but monitoring is not bringing any results.

    Can you help me?

    Replies
    1. Sure, you need to use the '/etc/zabbix/agentd_conf.d' folder if that is what the 'Include' statement in your /etc/zabbix/zabbix_agentd.conf points to

    2. Tks for your fast reply.

      Look at this:
      /etc/zabbix/zabbix_agentd.conf is all right.

      ### Option: Include
      # You may include individual files or all files in a directory in the configuration file.
      # Installing Zabbix will create include directory in /etc/zabbix, unless modified during the compile time.
      #
      # Mandatory: no
      # Default:
      # Include=
      # Include=/etc/zabbix/zabbix_agentd.userparams.conf
      # Include=/etc/zabbix/zabbix_agentd.conf.d/
      Include=/etc/zabbix/zabbix_agentd.conf.d/




      I also changed the /etc/zabbix/elasticsearch.py

      cfg = '/etc/zabbix/zabbix_agentd.conf.d' # path to zabbix-agent config


      Nothing occurs

    3. Did you try running:
      # /etc/zabbix/elasticsearch.py
      Did you try running:
      # zabbix_agentd -k 'es[indices.docs.count]'
      What are the results?

    4. Running /etc/zabbix/elasticsearch.py, nothing occurs.

      The other command returns this result:

      aryel@backend-elasticsearch-dev-data-igm-pri-instance-l7d5:/usr/sbin$ ./zabbix_agentd -t 'es[indices.docs.count]'
      es[indices.docs.count] [m|ZBX_NOTSUPPORTED]

    5. Retrying the command:

      - es.status 503
      - es.indices.search.fetch_total 0
      - es.indices.merges.total 11950
      - es.indices.refresh.total_time_in_millis 1394850
      - es.indices.indexing.index_time_in_millis 372322
      - es.os.mem.actual_used_in_bytes 3064516608
      - es.indices.merges.total_time_in_millis 3231557
      - es.indices.flush.total 253
      - es.indices.indexing.index_total 328279
      - es.indices.indexing.delete_total 0
      - es.indices.docs.deleted 0
      - es.jvm.mem.heap_committed_in_bytes 3203792896
      - es.indices.indexing.delete_time_in_millis 0
      - es.indices.get.missing_total 0
      - es.indices.search.query_time_in_millis 0
      - es.indices.get.exists_time_in_millis 0
      - es.indices.search.fetch_time_in_millis 0
      - es.indices.search.query_total 0
      - es.indices.docs.count 573692
      - es.jvm.mem.heap_used_in_bytes 756227784
      - es.indices.refresh.total 108393
      - es.indices.store.throttle_time_in_millis 25133
      - es.indices.warmer.total_time_in_millis 1250
      - es.indices.get.exists_total 0
      - es.indices.get.missing_time_in_millis 0
      - es.indices.flush.total_time_in_millis 15132
      - es.os.mem.actual_free_in_bytes 4799635456
      - es.indices.warmer.total 5491

    6. ZBX_NOTSUPPORTED means the zabbix agent does not know what the 'es[]' item key means. The agent should be restarted after you put the config file in place.
      - es.indices.merges.total 11950 ... means the data is being collected OK.

    7. I reinstalled the agent, but something is still wrong.
      Could you help me?
      aryel@backend-elasticsearch-dev-data-igm-pri-instance-l7d5:/etc/zabbix$ ./elasticsearch.py
      node found
      - es.status yellow
      - es.unassigned_shards 0
      - es.active_primary_shards 10
      - es.initializing_shards 1
      - es.active_shards 19
      - es.relocating_shards 0
      - es.indices.search.fetch_total 0
      - es.indices.merges.total 13889
      - es.indices.refresh.total_time_in_millis 1618108
      - es.indices.indexing.index_time_in_millis 441191
      - es.os.mem.actual_used_in_bytes 3084935168
      - es.indices.merges.total_time_in_millis 3596862
      - es.indices.flush.total 311
      - es.indices.indexing.index_total 395003
      - es.indices.indexing.delete_total 0
      - es.indices.docs.deleted 0
      - es.jvm.mem.heap_committed_in_bytes 3203792896
      - es.indices.indexing.delete_time_in_millis 0
      - es.indices.get.missing_total 0
      - es.indices.search.query_time_in_millis 0
      - es.indices.get.exists_time_in_millis 0
      - es.indices.search.fetch_time_in_millis 0
      - es.indices.search.query_total 0
      - es.indices.docs.count 401643
      - es.jvm.mem.heap_used_in_bytes 1063793304
      - es.indices.refresh.total 124231
      - es.indices.store.throttle_time_in_millis 94
      - es.indices.warmer.total_time_in_millis 2
      - es.indices.get.exists_total 0
      - es.indices.get.missing_time_in_millis 0
      - es.indices.flush.total_time_in_millis 17301
      - es.os.mem.actual_free_in_bytes 4779216896
      - es.indices.warmer.total 19
      zabbix_sender [2106]: DEBUG: answer [{"response":"success","info":"processed: 0; failed: 33; total: 33; seconds spent: 0.000331"}]
      info from server: "processed: 0; failed: 33; total: 33; seconds spent: 0.000331"
      sent: 33; skipped: 0; total: 33

  15. One bug: it doesn't find the node if the case differs between the hostname and the node name; inserting a couple of .lower() calls into the comparison fixes that

    I also think it's a good idea to modify line 72 to be:

    out += "{0} es.{1} {2}\n".format(os.uname()[1],t,s)
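The case-insensitive comparison described above could look like this (a sketch; is_local_node is a hypothetical helper, the real check is inline in the script):

```python
def is_local_node(node_host, local_hostname):
    """Case-insensitive prefix match between a cluster node's host and the local hostname."""
    return node_host.lower().startswith(local_hostname.lower())

print(is_local_node("ESLOG001.abc.local", "eslog001"))  # -> True
```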

  16. Hi there, I had elasticsearch 1.5 and this script was working like a charm, but after updating to elasticsearch 2.0 it stopped working, giving this error:

    Traceback (most recent call last):
      File "/etc/zabbix/elasticsearch.py", line 117, in <module>
        main()
      File "/etc/zabbix/elasticsearch.py", line 94, in main
        out += getKeys(node,traps2) #getting stats values
    UnboundLocalError: local variable 'node' referenced before assignment

    I don't see any difference in the output of http://localhost:9200/_nodes/_local/stats?all=true
    Any idea?

    Replies
    1. I've found the issue. Previously I was using hostnames now I am using IP addresses. I forgot to change this in the script. :)

  17. Having a problem.

    If I run the script directly as root or zabbix (with a shell):

    ./elasticsearch.py `hostname`
    : No such file or directory

    or ./elasticsearch.py
    : No such file or directory

    If I run it as python ./elasticsearch.py it produces the output fine.

    Suggestions?

    Replies
    1. If I run it manually just once, the graphs build without issue, so I added a cronjob to give it a kick.

      I still don't understand exactly why this is.

  18. Please consider Elasticsearch 2.0 compatibility patch:
    https://github.com/islepnev/zabbix/commit/f413717ce5c3a4b9ead0c8f417a2ba2a536b78d9

  20. Dear Sepa, please explain: how do you make your script send traps periodically? It works fine once, but then nothing happens. Do you use cron to schedule it?

    Replies
    1. There is one 'active check' which triggers trap sending when it is requested by the agent. There is even a note about it in elasticsearch.conf:
      #Key jvm.uptime_in_millis used to trigger trap sending
      So you can tune the time period of this active check to also change how often the traps are sent.
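In template terms, the setup looks roughly like this (a sketch; item types follow the article, and the 60s interval is illustrative, not taken from the template):

```
Trigger item (drives the sending):
  Key:              es[jvm.uptime_in_millis]
  Type:             Zabbix agent (active)
  Update interval:  60s   <- change this to change how often traps are sent
All other counters:
  Key:              es[path.in.json]   (e.g. es[indices.docs.count])
  Type:             Zabbix trapper     (values are pushed by zabbix_sender)
```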

  21. Sorry, I didn't get it.
    I put
    sudo zabbix_agentd -t 'es[jvm.uptime_in_millis]'
    and get
    elasticsearch% sudo zabbix_agentd -t 'es[jvm.uptime_in_millis]'
    es[jvm.uptime_in_millis] [t|8242898]

    and that's all; zabbix_agentd doesn't start with this parameter

  22. Well, finally I got it. I added the jvm.uptime_in_millis param to the trap list that is sent to the server in the .py file and set a time period for this trap, which is actually a 'Zabbix agent (active)' item

    Replies
    1. So I have tried, let's call it option 1

      "jvm.mem.heap_used_in_bytes",
      "os.mem.actual_free_in_bytes",
      "os.mem.actual_used_in_bytes",
      "jvm.uptime_in_millis",


      and this option 2

      "jvm.mem.heap_used_in_bytes",
      "os.mem.actual_free_in_bytes",
      "os.mem.actual_used_in_bytes",
      "es[jvm.uptime_in_millis]",

      Changed the key on the Zabbix server to match both the above.

      I get this if I use the first option:
      zabbix_sender [21571]: DEBUG: answer [{"response":"success","info":"processed: 33; failed: 1; total: 34; seconds spent: 0.000328"}]
      info from server: "processed: 33; failed: 1; total: 34; seconds spent: 0.000328"
      sent: 34; skipped: 0; total: 34

      I get this result with option 2:

      zabbix_sender [19248]: DEBUG: answer [{"response":"success","info":"processed: 33; failed: 0; total: 33; seconds spent: 0.000305"}]
      info from server: "processed: 33; failed: 0; total: 33; seconds spent: 0.000305"
      sent: 33; skipped: 0; total: 33

      For the life of me I can't get it to update the trap stats automatically; it does a great job if I run the script manually.

      Is there a bug somewhere?


  23. Hi, can you give me some more hints about this, please? I added this to the traps:

    out += "- {0} {1}\n".format("es[jvm.uptime_in_millis]","5000")


    but I still don't get how this is supposed to work.

  24. Hi,
    I am struggling with elasticsearch.py.
    Everything has worked well for months with RHEL 6.4, zabbix 2.4 client, Java 1.7 and Elasticsearch 1.2.4.
    Now we upgraded and have RHEL 6.7, zabbix client 3.0.4, Java 1.8 and ES 2.1.2.
    In this new environment elasticsearch.py does not return any info when started from the command line. It does in the old environment. Do you have any idea what I can do to get this working?
