blog.sepa.spb.ru: linux

Showing posts with label linux. Show all posts

2016/01/03

Speed up Zabbix Graphs with Nginx caching

After installing zabbixGrapher or implementing Zabbix graphs improvements patch you might face with an issue of slow image loading on graphs page which contains 24 pics at once. And this problem could get worse depending on how much online users you have in Zabbix. In our case solution was to cache images for 1 minute, as we have usual Item interval=60sec. This will help when multiple users looking at the Graphs for same Host (happens when it appears in Monitoring). Also, by default Users in Zabbix have setting to update graphs each 30sec, so caching for 60sec would reduce load twice.
This is how usual URL to graph image looks:

chart2.php?graphid=62014&screenid=1&width=600&height=200&legend=1&updateProfile=1&profileIdx=web.screens&profileIdx2=62014&period=604800&stime=20161226030400&sid=f3df43d8c3f401ec

Nginx cache is fast key-value store, so we need to decide on string Key based on URL to uniquely identify each image.

First issue is that same parameters in URL could be at any place, thus making different string Keys pointing to the same image. So, we need to always store parameters in the same order in the Key.
Another thing is that we do not need all the parameters. For example for different users 'sid' would have different values, but we want to show same image from cache to all the users.

This will leave us with such stripped down URL:
chart2.php?period=604800&stime=20161226030400&width=600&height=200&graphid=62014

For ad-hoc graphs URL would contain two more parameters and point to chart.php:

chart.php?period=604800&stime=20161226030400&width=600&height=200&type=0&itemids%5B0%5D=34843&itemids%5B1%5D=34844&itemids%5B2%5D=34845

And here is resulting nginx configuration for such case:

fastcgi_cache_path /tmp/cache levels=1:2 keys_zone=cache:10m max_size=1G;
upstream fpm {
  server unix:/var/run/php5-fpm.sock;
  server another.fpm.servers:9000;
}
server {
  location ~ \.php$ {
    include snippets/fastcgi-php.conf;
    fastcgi_pass unix:/var/run/php5-fpm.sock;

    location ~ chart2?\.php {
      fastcgi_pass fpm;

      if ($request_uri ~ (period=[0-9]+)) { set $period $1; }
      if ($request_uri ~ (stime=[0-9]+)) { set $stime $1; }
      if ($request_uri ~ (width=[0-9]+)) { set $width $1; }
      if ($request_uri ~ (height=[0-9]+)) { set $height $1; }
      if ($request_uri ~ (graphid=[0-9]+)) { set $graphid $1; }
      if ($request_uri ~ (itemids.*?)&(?!itemids)) { set $itemids $1; }
      if ($request_uri ~ (type=[0-9]+)) { set $type $1; }

      expires 2m;
      set $xkey $period$stime$width$height$graphid$type$itemids;
      add_header "X-key" $xkey;
      fastcgi_cache_key  $xkey;
      fastcgi_ignore_headers Cache-Control Expires Set-Cookie;
      fastcgi_cache cache;
      fastcgi_cache_valid 2m;
      fastcgi_cache_lock on;
    }
  }
}

Main thing is in location 'chart2?\.php' which is regex corresponding to both chart2.php and chart.php. We strip $request_uri to parts we care of, and setting variables to values of those parts.
Then we collect all variables in predefined order, to make consistent Key for same image, this will be stored in $xkey variable.
Then we also adding custom header "X-key" for debugging. It is shown in server response:

We also setting 'Expires' to 2 minutes, and ignoring all Cache-Control headers sent by php (as they are disabling client-side caching setting Expires to year ago)
There is no need to cache graphs for more than 2min, as each image has 'start time' and 'period'. Thus having Key updated each minute, we do not need to store old outdated pics for longer time.

Cache should be working now, you should see folder /tmp/cache increasing in size. But there is no any speedup of page load at all. Having page with all pics loaded you press F5 and they do load slowly again. But you've expected they would be quickly loaded from cache as minute is not passed yet. Answer is javascript Zoom Timeline, which is generate images url based on current time in second. So, each time you refresh the page - stime=20161226030423 value is also changing. As we do not want to show each second images, and only want to show per-minute ones - we also need to fix js to floor values like 20161226030423 to 20161226030400. This is done in gtlc.js

+++ ./js/gtlc.js        2015-11-22 13:11:02.306277281 -0800
@@ -181,6 +182,8 @@
                        period = this.timeline.period(),
                        stime = new CDate((this.timeline.usertime() - this.timeline.period()) * 1000).getZBXDate();

+                       stime = stime - stime % 60;
+
                // image
                var imgUrl = new Curl(obj.src);
                imgUrl.setArgument('period', period);

If you are also using "Zabbix graphs improvements patch" - you might also want to fix generating php side too:

+++ ./include/classes/screens/CScreenGraph.php  2015-11-22 13:02:29.014493480 -0800
@@ -161,7 +161,7 @@
                                .'&height='.$this->screenitem['height'].'&legend='.$legend.$this->getProfileUrlParams();
                        $timeControlData['src'] .= ($this->mode == SCREEN_MODE_EDIT)
                                ? '&period=3600&stime='.date(TIMESTAMP_FORMAT, time())
-                               : '&period='.$this->timeline['period'].'&stime='.$this->timeline['stimeNow'];
+                               : '&period='.$this->timeline['period'].'&stime='.($this->timeline['stimeNow'] - $this->timeline['stimeNow'] % 100);
                }

                // output

Check zabbixGrapher again by moving through pages back and forth, or selecting and deselecting the same Host - and images should appear immediately.

2015/09/23

AWS ELB monitoring by Zabbix using CloudWatch, LLD and traps

It is a short note on getting monitoring data for Elastic Load Balancer to your Zabbix installation.
All monitoring in AWS including ELB is handled and exposed by CloudWatch service. Free tier include 5-minute frequency data gathering. Which then could be increased to 1-minute for money. For ELB we can get such counters from CloudWatch:

BackendConnectionErrors
HTTPCode_Backend_2XX
HTTPCode_Backend_3XX
HTTPCode_Backend_4XX
HTTPCode_ELB_5XX
HealthyHostCount
Latency
RequestCount
SurgeQueueLength
UnHealthyHostCount

Read more details on each item in the docs. One thing to note, that each counter could be accessed as Average, Min, Max, Sum and Count. So, for RequestCount Min and Max would be always 1 but Sum would be equal to Count and mean number or request per interval (1min or 5min). In other case Sum would not have meaning for HealthyHostCount but you would be more interested in Average. That complicate things a little comparing to Zabbix.
But there is one more thing (c) - CloudWatch do store items only when events happens. So, if you have small requests numbers on some ELB you could face with SurgeQueueLength stuck at 1k or something. Which is not meaningful, because it happened once, an hour ago, and there just were no much requests from that time.

Passing this data to Zabbix directly you would end up with line at 900 connecting all the dots. Which is not true, line should be at 0 with intermittent spikes to 900.
Ok, at least we know how to get current data, and we will just return 0 to zabbix when there is no value collected by CloudWatch with current timestamp. I used python and boto and get results pretty easy. Also, there are multiple cloudwatch-to-zabbix scripts around. But they all works as zabbix agent checks (passive or active). So, for example to get those 10 counters for one ELB each minute, zabbix would fire the script 10 times/min, and each time script would connect to AWS to get the data. But API query to get the data is the same, even more - you can get up to 1440 points by one query. That's why it's better to make this monitoring to use zabbix traps. This way zabbix would do only one query to agent per minute, and it would get all 10 counters in one call.
Usually ELB stats are not host bound, so this script should be not 'zabbix agent extension', but 'external check' on server/proxy. To use it, you would create dummy server in zabbix (with pretty name like "ELB"), and attach template to it.

Installation

1. Place script from:
https://github.com/sepich/zabbix/raw/master/cloudwatch.py
to your 'external scripts' directory on zabbix server or proxy. You could get the path of this folder in zabbix_proxy.conf looking for 'ExternalScripts' value. (You might need to do 'apt-get install python-boto' if you don't have it yet)
2. Fix script with your AWS key.

aws_key='INSERT KEY'                    # AWS API key id
aws_secret='INSERT SECRET'              # AWS API key

If you do not have API key yet - you could read on how to generate it here. Due to it is stored in script in clear text you might wish to at least limit script access by chmod/chown. Better way would be if you have zabbix proxy EC2 VM - just grant necessary API rights to it directly without using key at all.
3. Check path to zabbix_sender and zabbix-agent config:

sender = '/usr/bin/zabbix_sender'       # path zabbix_sender
cfg = '/etc/zabbix/zabbix_agentd.conf'  # path to zabbix-agent config

Check that zabbix_sender is installed, and config has valid zabbix-server specified. Trap data would be send there.
4. Open zabbix web interface and create dummy server named, say "ELB". Set corresponding zabbix-proxy for it, which has our script in externalscripts folder.
5. Import template from:
https://github.com/sepich/zabbix/raw/master/templates/template_elb.xml
and assign it to created dummy server. Go to discovery and fix refresh time for the only active check prototype (everything else are traps) to 1min or 5 min depending on if you use detailed CloudWatch checks or not. (Template has 1min set as we are using detailed checks). Also, check filter tab for discovery, as we are filtering ELBs having 'test' in their name.
6. Discovery should create items for all found ELBs.
ELB names are passed through Filter, which is configured on Filter Tab of Discovery rule

In this case it is pointing to Global Regex named "ELB discovery", which is configured in Administration -> General -> Regular Expressions

This will skip all ELBs which name contains 'test'. Configure to your needs or just delete Filter.

Bonus: Importing 2-week data

CloudWatch stores all the collected items for 2 weeks timeframe. Each item has corresponding timestamp. So, it is possible to get all the archive data and put it to zabbix, as zabbix_sender also support providing timestamps along with values. Only issue is as described above, when there were lack of events and items would be unmeaningful, without any drops to zero.
Before importing, check that all your ELBs get discovered in zabbix, and trap items are created. Then go to server with script and run for each ELB command like this:

cloudwatch.py -e NAME -s ELB -i 1209600 -v | tail
info from server: "processed: 250; failed: 0; total: 250; seconds spent: 0.001387"
info from server: "processed: 250; failed: 0; total: 250; seconds spent: 0.001380"
info from server: "processed: 250; failed: 0; total: 250; seconds spent: 0.001391"
info from server: "processed: 250; failed: 0; total: 250; seconds spent: 0.001383"
info from server: "processed: 250; failed: 0; total: 250; seconds spent: 0.001403"
info from server: "processed: 250; failed: 0; total: 250; seconds spent: 0.001389"
info from server: "processed: 189; failed: 0; total: 189; seconds spent: 0.001050"
sent: 102939; skipped: 0; total: 102939

NAME - is your ELB name
ELB - name of dummy server in zabbix with trap items
1209600 - number of seconds in 2 weeks
This process could take up to 5min to run, and should end up with no errors. Wait 5min more and take a look at zabbix graph history for this ELB - you should see data for 2 weeks ago from now.

Usage

Running script with no arguments or '-h' would display usage help :

cloudwatch.py --help
usage: cloudwatch.py [-h] [-e NAME] [-i N] [-s NAME] [-r NAME] [-d {elb}] [-v]

Zabbix CloudWatch client

optional arguments:
  -h, --help            show this help message and exit
  -e NAME, --elb NAME   ELB name
  -i N, --interval N    Interval to get data back (Default: 60)
  -s NAME, --srv NAME   Hostname in zabbix to receive traps
  -r NAME, --region NAME
                        AWS region (Default: eu-west-1)
  -d {elb}, --discover {elb}
                        Discover items (Only discover for ELB supported now)
  -v, --verbose         Print debug info

Appending '-v' argument would display human output. For example this is raw data for zabbix_sender and result of sending:

cloudwatch.py -e NAME -v
ELB cw[NAME,BackendConnectionErrors] 1442923904 0.000000
ELB cw[NAME,HTTPCode_Backend_2XX] 1442923904 0.000000
ELB cw[NAME,HTTPCode_Backend_3XX] 1442923904 0.000000
ELB cw[NAME,HTTPCode_Backend_4XX] 1442923904 0.000000
ELB cw[NAME,HTTPCode_ELB_5XX] 1442923904 0.000000
ELB cw[NAME,HealthyHostCount] 1442923800 2.000000
ELB cw[NAME,Latency] 1442923800 0.000012
ELB cw[NAME,RequestCount] 1442923800 57.000000
ELB cw[NAME,SurgeQueueLength] 1442923800 1.000000
ELB cw[NAME,UnHealthyHostCount] 1442923800 0.000000
info from server: "processed: 10; failed: 0; total: 10; seconds spent: 0.000095"
sent: 10; skipped: 0; total: 10

To check json discovery data:

cloudwatch.py -d elb

2015/08/30

Zabbix graphs improvements patch

Update: You'd better check out zabbixGrapher

Here is the cumulative patch to fix some Zabbix graphs viewing issues. Ideas are not new, a lot of zabbix users complains on current out-of-the-box implementation:

ZBXNEXT-1120 - Enable viewing a graph for all hosts in a given group
ZBXNEXT-75 - Add a "show all" option for viewing all graphs for a host on one page
ZBXNEXT-1262 - Nested host groups
Minor graph appearance fix

Full patch is for Zabbix 2.4.3. You can open it on github and read below what each change do:

include/views/monitoring.charts.php (Javascript in the beginning)

This adds groups filter. Issue is when you have a lot of groups you'd become tired to scroll them. (We have hosts automatically registering to Zabbix and attached to group). For example in this case groups "EXRMF BC", "EXRMF CO", "EXRMF DC3" etc. are merged to one group "EXRMF >". When you select such group another select appears on the right side allowing to specify exact group.

This only happens when user allowed to view more than 50 groups, tweak this line if you need to change it:

if(jQuery('#groupid option').length>50){

include/views/monitoring.charts.php (the rest PHP code)

This implements both ZBXNEXT-1120 and ZBXNEXT-75. So, now you can select host and do not specify graph to view all its graphs on one page. Or select graph to view and do not specify host (or even a group) to view this graph for multiple hosts.

As it is possible to have a lot of graphs attached to one server, or a lot of servers having the same graph (eth0 traffic) - paging is used here. Tweak this line to determine how many graphs should be displayed per page:

CWebUser::$data['rows_per_page'] = 20;

js/class.csuggest.js

This change is for search field. You start typing servers and got list of suggestions. Pressing Enter previously just selects server from list filling in search field. You have to press Search button to do action. Now action is done automatically.

include/defines.inc.php

This changing font to much smaller one "Calibri". You can take .ttf from Windows and place to /usr/share/zabbix/fonts/

The rest of files

Minor changes for single graph appearance to make it more clean and simplier when multiple graphs are displayed on one page. Example of single graph after change:

Also, you might want to set theme graph background to white. Unfortunately, I do not know how to do it from Web Interface, so here are DB queries:

update graph_theme set backgroundcolor='FFFFFF' where graphthemeid='1';
update graph_theme set graphbordercolor='FFFFFF' where graphthemeid='1';

This patch is not depends but meant to be applied after ZBXNEXT-599 "Logarithmic scale for Y-axis in graphs" like this:

wget https://support.zabbix.com/secure/attachment/35716/logarithmic-graphs-zabbix-2.4.5.patch
wget https://github.com/sepich/zabbix/raw/master/patches/graphs.patch
cd /usr/share/zabbix/
patch -p 1 -i ~/logarithmic-graphs-zabbix-2.4.5.patch
patch -p 1 -i ~/graphs.patch

2014/12/06

ElasticSearch internals monitoring by Zabbix (v2 traps)

Here is more resource oriented version of ElasticSearch monitoring from previous article with using zabbix-traps. Also, it comes with very basic template, which was so asked in comments:

Graphs included:

Shard's nodes status
Indices tasks speed
Indices tasks time spend

RabbitMQ internals monitoring by Zabbix

Continuation of extending zabbix-agent to monitor internals of applications. Now it's a RabbitMQ turn:

What's supported:

File descriptors, Memory, Sockets watermarks monitoring
Low level discovery of vhosts/queues
Monitoring for messages, unack, consumers per queue
Triggers for important counters
Data sent in chunks, not one by one, using zabbix traps

Network socket state statistics monitoring by Zabbix

It's strange that zabbix-agent lacks for information about network socket states. At least it would be nice to monitor number of ESTAB, TIME_WAIT and CLOSE_WAIT connections.
Good that we can extend zabbix-agent - so I made this:

putty-nd vs SuperPuTTY

I've been using putty-nd for a long time. Starting from putty_nd2.0 it started to support chrome-like tabs and development was going quite rapidly. Lu Dong (author) was always responding to emails and bugs were fixed fast. But unfortunately this stopped at Feb 2014 with few annoying bugs left:

When some tab connection drops, it can randomly freeze some other tab, so you need to reconnect both of them
Tab names are right aligned. So, when you have a lot of tabs opened, their width is short and you see only endings of names like '...ain.local' instead of 'server1.doma..'
When you click 'open new session' button and starting to type search query - it always skips first letter
And the most unfortunate was that putty-nd sources weren't available. Latest one I was able to find was v6.0_nd1.11 back from 2011.

Nevertheless of those issues I still was using putty-nd. Because other clients like MobaXterm, Xshell, MTPuTTY, mRemoteNG were even more inconvenient. Here is what I liked in putty-nd so much:

When there are >1000 sessions configured - you will never click by mouse in tree-like menu to open a new session. Preferably to have quick live search bind to hotkey
Having only hostname in clipboard and no such session configured, open new one based on some predefined settings in couple of keypress
When session was dropped, restart it without touching mouse (like pressing Enter in it)

SuperPuTTY has almost 2 bullets from 3 above. And most importantly it is opensource. So, next time I got mad from putty-nd frozen tabs I'd decided to move to SuperPuTTY. A little patching and it became a usable client for me ;)

Here is what was done:

1. Open Session dialog

Now search field always stay focused, pressing Up/Down you changing selection in table. Second column shows only folder name of session in tree. To search for all sessions in some folder start searching with '/' (as in example). Searching for hostname was changed to be matched from beginning, to search for any part of hostname - prepend search with '%'.
For example to find connection 'i.sepa.spb.ru' from screenshot, one could search for '%sepa'

2. Added detection of dropped connection. For such tabs icon will be changed (to icon from putty-nd ;)

For those first two tabs context menu will be also reduced. When you switch to such tab and press Enter in console, session will try to reconnect.

For other changes see commit history: github.com/sepich/superputty/commits/master
Download precompiled binaries here: github.com/sepich/superputty/releases

Main patches were submitted back to SuperPuTTY community - hope some of them would be merged upstream.

2014/02/15

ElasticSearch internals monitoring by Zabbix

NOTE: New version of this article with use of zabbix_traps is here

There is quite a lot of Zabbix monitoring agent extensions for ElasticSearch monitoring. But they are limited and provide just some predefined counters. What if you need to collect internal data?

Debian Wheezy and Brocade 1010/1020/1007 10Gbps CNA

There are two ways to make Brocade 1010/1020/1007 10Gbps CNA working in Debian Wheezy:
1. Right way: Go to official Brocade site, download 3Gb(!) iso image with binaries for kernel-2.6 (!) and sources. Compile sources for Debian's kernel-3.2.
2. Quick way: It's not right but just works, if you need quickly enable networking with new kernel.

Precompiled hd-Idle armhf deb-package for Debian

There is still no hd-idle package in standard Debian repository. Even in testing branch. WTF!
You can get sources at official site but you'll need to install a lot of dependencies to compile it.
Or here is already compiled version:
Download hd-idle_1.04_armhf.deb

Installation:

# cd /tmp
# wget http://i.sepa.spb.ru/_/mele/hd-idle_1.04_armhf.deb
# dpkg -i hd-idle_1.04_armhf.deb
# nano /etc/default/hd-idle

Installing HDD inside of Mele A1000/2000

I have a Mele A2000G and use it as home media server (with Debian installed instead of Android). And HDD is the main thing of this set. But i don't have an HDD case (I heard that it goes with A1000 model) And it looks not so accurate to place this box near a TV with a raw HDD without cover installed on top of it. Let's check if it possible to install HDD right inside Mele A2000 box:

D-Link DIR-620 OpenWRT installation

I've use OpenWRT for my D-Link DIR-620 A1 for almost a year now. Good news that OpenWRT community released already compiled beta versions (Latest for now is 12.09-rc1) So now it's no need to compile trunk sources to get firmware for ramips hardware. My custom image is getting old, so I decide to install latest community release.
This article is mostly for myself. Just want to leave some notes in case I'll need to update firmware again. So I'll make it very brief.

Asterisk sync with Active Directory

In my case, all users information is stored in AD (yes, we use Exchange GAL). It'll be great if asterisk get this information directly from AD without need in configuration editing.

Let's look what we have now:
AD user have fields 'telephoneNumber' - for internal extension number, 'fax' - for direct inward dialing number (DID) like this:

bash

Some bash hotkeys:

Ctrl + A  Go to the beginning of the line you are currently typing on
Ctrl + E  Go to the end of the line you are currently typing on
Ctrl + L  Clears the Screen, similar to the clear command
Ctrl + U  Clears the line before the cursor position. If you are at the end of the line, clears the entire line.
Ctrl + R  Let’s you search through previously used commands
Ctrl + C  Kill whatever you are running
Ctrl + D  Exit the current shell
Ctrl + Z  Puts whatever you are running into a suspended background process. "fg" restores it.
Ctrl + W  Delete the word before the cursor
Ctrl + K  Clear the line after the cursor
Ctrl + T  Swap the last two characters before the cursor
Esc + T  Swap the last two words before the cursor

Some notes about reverse history search. It'll iterative show last command for your search query. For next item press Ctrl-R again. To search forward press Ctrl-S. (On some terminals this bind to freeze, press Ctrl-Q to defreeze) More about history navigation here

Using screen:

Debian domain join

How to join Debian Squeeze to Active Directory Domain (winbind method).This makes possible to login as domain user, make samba shares with domain security, ntlm SSO to Apache sites, ntlm auth with Squid proxy.

Mail notifications by GMail from Debian

Описание настройки Debian Exim4 для рассылки почты через GMail apps for domain для получения уведомлений на почту из cron и скриптов.

blog.sepa.spb.ru