Graphs included:
- Shard's nodes status
- Indices tasks speed
- Indices tasks time spent
Installation:
- Save this as /etc/zabbix/zabbix_agentd.d/elasticsearch.conf
#Key jvm.uptime_in_millis used to trigger trap sending
UserParameter=es[*],/etc/zabbix/elasticsearch.py $1
- And here is data-getter
/etc/zabbix/elasticsearch.py
- Then import template
template_app_elasticsearch.xml
- Browse the JSON output of your server's stats endpoint:
http://localhost:9200/_nodes/_local/stats?all=true
- Write down the path to the value of interest, using a dot as a separator, for example:
indices.docs.count
- Create a new counter in Zabbix with
key name = es[path.you.found]
and type = zabbix_trap
- And here is the difference from the previous version: note that you also need to add this counter's path to the traps2 section of the elasticsearch.py file. Then execute elasticsearch.py without any parameters, and the zabbix_sender debug output will be written to the console. In the top section you should find your new counter key (if it isn't there, the key was not found or is empty in the JSON output), and in the bottom section the number of failed items should be zero (if it isn't, no such key is configured for this server in the Zabbix web interface).
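To illustrate the dotted-path convention, resolving a path like indices.docs.count is just a walk through the nested JSON. A minimal sketch (the function name get_path is illustrative, not taken from the script):

```python
# Sketch of resolving a dotted path against the stats JSON (not the
# actual elasticsearch.py code; it does this with its own helper).
def get_path(stats, path):
    value = stats
    for key in path.split('.'):
        value = value[key]  # raises KeyError if the path does not exist
    return value

# Structure shaped like a fragment of the _nodes/_local/stats output
sample = {"indices": {"docs": {"count": 573692}}}
print(get_path(sample, "indices.docs.count"))
```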
Hi, can you please add the steps to monitor another host instead of localhost?
If I want to monitor host 10.0.1.10 from my Zabbix server 10.0.1.5, what steps do I need to perform on the remote host and on the Zabbix server?
Just the usual monitoring of a remote host by zabbix-agent. Something like this:
- install zabbix-agent on the remote host
- add this host to zabbix-server, assign it to some generic Linux template, and check that basic data starts to be collected (zabbix-agent must return the same hostname as is configured for this host on the zabbix-server side)
- then proceed with expanding the data with elasticsearch checks: import the template from this article on the zabbix-server side and add it to your host
- on the host side, configure
/etc/zabbix/zabbix_agentd.d/elasticsearch.conf and
/etc/zabbix/elasticsearch.py
So the template is installed only once on the server side and then assigned to the hosts that need it. But the config for the agent check and the script that collects the data must be installed on each elasticsearch host (and they send data about localhost back to zabbix-server).
when I run script I get this error:
12539:20150107:102815.007 item "nw-esclient-201.prod.pcln.com:es[jvm.uptime_in_millis]" became not supported: Received value [File "/usr/local/zabbix/bin/elasticsearch.py", line 22 "status", ^SyntaxError: invalid syntax] is not suitable for value type [Numeric (float)]
Maybe there was some EOL issue from when you downloaded the script?
What is shown when you try:
# python elasticsearch.py
?
bash-4.1$ python elasticsearch.py
File "elasticsearch.py", line 22
"status",
^
SyntaxError: invalid syntax
Dunno, the syntax looks valid. At least for the python2 I have installed on my home box (there is no elasticsearch there):
# wget https://github.com/sepich/zabbix/raw/master/elasticsearch.py &>/dev/null
# python elasticsearch.py
Unable to load JSON data!
So, no syntax errors at all. What are your OS and Python versions?
Thanks for helping out. OS = Red Hat Enterprise Linux Server release 6.5
with Python 2.6.6
Hi Tom,
add a None value to all keys in the traps1 and traps2 dictionaries,
i.e. (line 22) "status": None,
Worked for me on the same environment
Thanks Federico, that helped:
bash-4.1$ ./elasticsearch.py jvm.uptime_in_millis
10116541216
But this one errors out:
bash-4.1$ ./elasticsearch.py jvm_heap_p_used
zabbix_sender [18783]: Warning: [line 1] '-' encountered as 'Hostname', but no default hostname was specified
Sending failed.
I hardcoded my hostname at line 71:
out += "YOURHOSTNAMEHERE es.{0} {1}\n".format(t,s)
Hi,
Can you help me:
Traceback (most recent call last):
File "./elasticsearch.py", line 70, in <module>
main()
File "./elasticsearch.py", line 64, in main
stats=stats[c.pop(0)]
UnboundLocalError: local variable 'stats' referenced before assignment
There is no such code (stats=stats[c.pop(0)]) anywhere in the file:
https://github.com/sepich/zabbix/blob/master/elasticsearch.py
Maybe you are trying with some older version?
Could you try with the latest?
Hi There,
This is a great solution, thank you for sharing. I have a small problem with it: the elasticsearch.py script runs OK on my other nodes, but on my master node it returns the "Unable to load JSON data!" error. I don't know Python, therefore I cannot really figure out what the code does, but I assume something fails here:
    for node_id in all['nodes']:
        if all['nodes'][node_id]['host'].startswith(os.uname()[1]):
            node = all['nodes'][node_id]
            if len(sys.argv) == 1:
                print "node found"
except:
    print "Unable to load JSON data!"
    sys.exit(1)
Any ideas?
Looks like the cluster node name differs from the hostname on that server. Tell me what they are and I'll think about how the script should be modified to handle this.
I don't know if you got the reply; if not, let me know and I'll send it again.
Hmm, that's strange :) I have the same naming convention on all nodes.
hostnames: eslog001.abc.local to eslog005.abc.local (the first one is the master, and the problem is there)
Node names: ES_ONE, ES_TWO, ES_THREE, ES_FOUR, ES_FIVE
In this case it should not work on the other nodes either ;)
if all['nodes'][node_id]['host'].startswith(os.uname()[1]):
Basically, what this code does is search all cluster nodes looking for the one whose name starts with the current hostname. So, for eslog001.abc.local the hostname would be eslog001.
If your cluster node names are not equal to the hostnames, you can just hardcode the name in the script like this:
if all['nodes'][node_id]['host'].startswith('ES_ONE'):
This is really strange. I have created a script as you advised, with the hardcoded node address and hardcoded node name. It still does not work on the problematic server (unable to load JSON), but if I run the exact same script on the others with the problematic server's address and name hardcoded, it works :)
Actually, Python scripts are not so hard to debug.
Just open a Python prompt:
# python
And then copy-paste everything from the script that is above the
def main():
line (actually only the imports and getKeys are needed here). Then copy-paste the beginning of main():
f = requests.get("http://localhost:9200/_cluster/health")
health = f.json()
f = requests.get("http://localhost:9200/_nodes/_local/stats?all=true")
all = f.json()
Now we are coming to the problem place. Let's check which nodes exist in the health report:
>>> for node_id in all['nodes']: print all['nodes'][node_id]['host']
all node names should be printed out. And here we are looking for the current server, with this name:
>>> print os.uname()[1]
Do you see some node in the previous output whose name starts with exactly the same characters?
Oh, I've remembered that the error was "Unable to load JSON data!", so it is an exception during the execution of one of those commands. When you copy-pasted lines from the script, where did you see errors?
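The node-lookup part of this check can also be exercised against a canned response, without a live cluster; a sketch, where the sample dict and the helper name find_node are illustrative, not from the script:

```python
# Shape of the 'nodes' section of _nodes/_local/stats (values illustrative)
all_stats = {"nodes": {"abc123": {"host": "eslog001.abc.local"}}}

def find_node(all_stats, hostname):
    # Same search the script performs: first node whose host starts
    # with the local hostname
    for node_id in all_stats['nodes']:
        if all_stats['nodes'][node_id]['host'].startswith(hostname):
            return all_stats['nodes'][node_id]
    return None

print(find_node(all_stats, "eslog001") is not None)
print(find_node(all_stats, "ES_ONE") is None)
```

This reproduces the mismatch discussed above: a lookup by the cluster node name (ES_ONE) finds nothing, while a lookup by the hostname prefix (eslog001) succeeds.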
Hi
I have tried using this script but keep getting this error when I run it:
File "/etc/zabbix/elasticsearch.py", line 22
"status",
^
SyntaxError: invalid syntax
Hi all,
If you receive the error "Unable to load JSON data!", try running the command:
curl -XGET 'http://localhost:9200/_cluster/health'
In my case, the elasticsearch process was not listening on localhost (127.0.0.1) and I got the error:
curl: (7) couldn't connect to host
Then running "sudo netstat -ntlp" revealed the IP address and port elasticsearch is listening on, and replacing "localhost" with this IP address in the script solved the problem.
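One way to avoid hardcoding the address is to make the base URL configurable; a sketch, where the ES_URL environment variable name is my own assumption, not part of the original script:

```python
import os

# Allow overriding the Elasticsearch address via an environment variable
# (ES_URL is an illustrative name); fall back to the default the script uses.
ES_URL = os.environ.get("ES_URL", "http://localhost:9200")

def health_url(base=ES_URL):
    # Endpoint used by the script's first request
    return base + "/_cluster/health"

print(health_url())
```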
zabbix_sender [6743]: DEBUG: answer [{"response":"success","info":"processed: 0; failed: 33; total: 33; seconds spent: 0.000243"}]
info from server: "processed: 0; failed: 33; total: 33; seconds spent: 0.000243"
sent: 33; skipped: 0; total: 33
Incorrect zabbix-agent settings?
Hello.
I performed the process but do not have the /etc/zabbix/zabbix_agentd.d directory, just /etc/zabbix/agentd_conf.d.
I use version 2.4 of Zabbix, but monitoring is not bringing any results.
Can you help me?
Sure, you need to use the '/etc/zabbix/agentd_conf.d' folder if it is in the 'Include' statement of your /etc/zabbix/zabbix_agentd.conf
Thanks for your fast reply.
Look at that:
/etc/zabbix/zabbix_agentd.conf is all right.
### Option: Include
# You may include individual files or all files in a directory in the configuration file.
# Installing Zabbix will create include directory in /etc/zabbix, unless modified during the compile time.
#
# Mandatory: no
# Default:
# Include=
# Include=/etc/zabbix/zabbix_agentd.userparams.conf
# Include=/etc/zabbix/zabbix_agentd.conf.d/
Include=/etc/zabbix/zabbix_agentd.conf.d/
I also changed /etc/zabbix/elasticsearch.py:
cfg = '/etc/zabbix/zabbix_agentd.conf.d' # path to zabbix-agent config
Nothing occurs.
Did you try to run:
# /etc/zabbix/elasticsearch.py
Did you try to run:
# zabbix_agentd -k 'es[indices.docs.count]'
What are the results?
Running /etc/zabbix/elasticsearch.py, nothing occurs.
The other command returns this result:
aryel@backend-elasticsearch-dev-data-igm-pri-instance-l7d5:/usr/sbin$ ./zabbix_agentd -t 'es[indices.docs.count]'
es[indices.docs.count] [m|ZBX_NOTSUPPORTED]
Retrying the command:
- es.status 503
- es.indices.search.fetch_total 0
- es.indices.merges.total 11950
- es.indices.refresh.total_time_in_millis 1394850
- es.indices.indexing.index_time_in_millis 372322
- es.os.mem.actual_used_in_bytes 3064516608
- es.indices.merges.total_time_in_millis 3231557
- es.indices.flush.total 253
- es.indices.indexing.index_total 328279
- es.indices.indexing.delete_total 0
- es.indices.docs.deleted 0
- es.jvm.mem.heap_committed_in_bytes 3203792896
- es.indices.indexing.delete_time_in_millis 0
- es.indices.get.missing_total 0
- es.indices.search.query_time_in_millis 0
- es.indices.get.exists_time_in_millis 0
- es.indices.search.fetch_time_in_millis 0
- es.indices.search.query_total 0
- es.indices.docs.count 573692
- es.jvm.mem.heap_used_in_bytes 756227784
- es.indices.refresh.total 108393
- es.indices.store.throttle_time_in_millis 25133
- es.indices.warmer.total_time_in_millis 1250
- es.indices.get.exists_total 0
- es.indices.get.missing_time_in_millis 0
- es.indices.flush.total_time_in_millis 15132
- es.os.mem.actual_free_in_bytes 4799635456
- es.indices.warmer.total 5491
ZBX_NOTSUPPORTED means the zabbix agent does not know what the 'es[]' item key means. The agent should be restarted after you put the config file in place.
- es.indices.merges.total 11950 ... means data is being collected OK.
I reinstalled the agent, but something is still wrong.
Could you help me?
aryel@backend-elasticsearch-dev-data-igm-pri-instance-l7d5:/etc/zabbix$ ./elasticsearch.py
node found
- es.status yellow
- es.unassigned_shards 0
- es.active_primary_shards 10
- es.initializing_shards 1
- es.active_shards 19
- es.relocating_shards 0
- es.indices.search.fetch_total 0
- es.indices.merges.total 13889
- es.indices.refresh.total_time_in_millis 1618108
- es.indices.indexing.index_time_in_millis 441191
- es.os.mem.actual_used_in_bytes 3084935168
- es.indices.merges.total_time_in_millis 3596862
- es.indices.flush.total 311
- es.indices.indexing.index_total 395003
- es.indices.indexing.delete_total 0
- es.indices.docs.deleted 0
- es.jvm.mem.heap_committed_in_bytes 3203792896
- es.indices.indexing.delete_time_in_millis 0
- es.indices.get.missing_total 0
- es.indices.search.query_time_in_millis 0
- es.indices.get.exists_time_in_millis 0
- es.indices.search.fetch_time_in_millis 0
- es.indices.search.query_total 0
- es.indices.docs.count 401643
- es.jvm.mem.heap_used_in_bytes 1063793304
- es.indices.refresh.total 124231
- es.indices.store.throttle_time_in_millis 94
- es.indices.warmer.total_time_in_millis 2
- es.indices.get.exists_total 0
- es.indices.get.missing_time_in_millis 0
- es.indices.flush.total_time_in_millis 17301
- es.os.mem.actual_free_in_bytes 4779216896
- es.indices.warmer.total 19
zabbix_sender [2106]: DEBUG: answer [{"response":"success","info":"processed: 0; failed: 33; total: 33; seconds
spent: 0.000331"}]
info from server: "processed: 0; failed: 33; total: 33; seconds spent: 0.000331"
sent: 33; skipped: 0; total: 33
One bug: it doesn't find the node if the case differs between the hostname and the node name; inserting a couple of .lower() calls into the comparison fixes that.
I also think it's a good idea to modify line 72 to be:
out += "{0} es.{1} {2}\n".format(os.uname()[1],t,s)
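Both suggestions can be sketched together as standalone helpers (the function names node_matches and sender_line are mine; the original script inlines this logic):

```python
import os

def node_matches(node_host, hostname):
    # Case-insensitive prefix match, per the .lower() suggestion above
    return node_host.lower().startswith(hostname.lower())

def sender_line(key, value, hostname=None):
    # Real hostname instead of '-' in the zabbix_sender input line
    if hostname is None:
        hostname = os.uname()[1]
    return "{0} es.{1} {2}\n".format(hostname, key, value)

print(node_matches("ESLOG001.abc.local", "eslog001"))
print(sender_line("indices.docs.count", 573692, "eslog001"))
```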
Hi there, I had elasticsearch 1.5 and this script was working like a charm, but after updating to elasticsearch 2.0 it stopped working and gives this error:
Traceback (most recent call last):
File "/etc/zabbix/elasticsearch.py", line 117, in <module>
main()
File "/etc/zabbix/elasticsearch.py", line 94, in main
out += getKeys(node,traps2) #getting stats values
UnboundLocalError: local variable 'node' referenced before assignment
I don't see any difference in the output of http://localhost:9200/_nodes/_local/stats?all=true
Any idea?
I've found the issue. Previously I was using hostnames; now I am using IP addresses. I forgot to change this in the script. :)
Having a problem.
If I run the script directly as root or zabbix (with a shell):
./elasticsearch.py `hostname`
: No such file or directory
or ./elasticsearch.py
: No such file or directory
If I run it as python ./elasticsearch.py it produces the output fine.
Suggestions?
If I run it manually just once, the graphs build without issue, so I added a cronjob to give it a kick.
I still don't understand exactly why this is.
Please consider Elasticsearch 2.0 compatibility patch:
https://github.com/islepnev/zabbix/commit/f413717ce5c3a4b9ead0c8f417a2ba2a536b78d9
Thank you
Dear Sepa, please explain how you make your script send traps periodically. It works fine once, but then nothing happens. Do you use cron to schedule it?
There is one 'active check' which triggers trap sending when it is requested by the agent. You see, there is even a note about it in elasticsearch.conf:
#Key jvm.uptime_in_millis used to trigger trap sending
So you can tune the time period of this active check to also change the period at which traps are sent.
Sorry, I didn't get it.
I put
sudo zabbix_agentd -t 'es[jvm.uptime_in_millis]'
and get
elasticsearch% sudo zabbix_agentd -t 'es[jvm.uptime_in_millis]'
es[jvm.uptime_in_millis] [t|8242898]
and that's all; zabbix_agentd doesn't start with this parameter
Well, I finally got the thing. I added the jvm.uptime_in_millis param to the trap list that is sent to the server in the .py file, and set the time period for this trap, which is actually a Zabbix agent (active) check
ReplyDeleteSo i have tried, lets call it option 1
Delete"jvm.mem.heap_used_in_bytes",
"os.mem.actual_free_in_bytes",
"os.mem.actual_used_in_bytes",
"jvm.uptime_in_millis",
and this is option 2:
"jvm.mem.heap_used_in_bytes",
"os.mem.actual_free_in_bytes",
"os.mem.actual_used_in_bytes",
"es[jvm.uptime_in_millis]",
Changed the key on the Zabbix server to match both the above.
I get this if I use the first option:
zabbix_sender [21571]: DEBUG: answer [{"response":"success","info":"processed: 33; failed: 1; total: 34; seconds spent: 0.000328"}]
info from server: "processed: 33; failed: 1; total: 34; seconds spent: 0.000328"
sent: 34; skipped: 0; total: 34
I get this result with option 2:
zabbix_sender [19248]: DEBUG: answer [{"response":"success","info":"processed: 33; failed: 0; total: 33; seconds spent: 0.000305"}]
info from server: "processed: 33; failed: 0; total: 33; seconds spent: 0.000305"
sent: 33; skipped: 0; total: 33
For the life of me I can't get it to update the trap stats automatically; it does a great job if I run the script manually.
Is there a bug somewhere?
Hi, can you give me some more hints about this, please? I added to traps:
out += "- {0} {1}\n".format("es[jvm.uptime_in_millis]","5000")
but I still don't get how this is supposed to work.
Hi,
I am struggling with elasticsearch.py.
Everything has worked well for months with RHEL 6.4, Zabbix client 2.4, Java 1.7 and Elasticsearch 1.2.4.
Now we upgraded and have RHEL 6.7, Zabbix client 3.0.4, Java 1.8 and ES 2.1.2.
In this new environment elasticsearch.py does not return any info when started from the command line. It does in the old environment. Do you have any idea what I can do to get this working?