Category Archives: Server load

Apache/http monitoring: monitor http traffic in realtime using httptop

Server monitoring is a big part of running a solid web site.  As an admin, you must know what is going on on your server.  One of the tools most Linux/Unix admins are used to is “top”, which is very powerful by itself.  Here is a quick guide on how to read output from top:  introduction to load averages under top.  It only makes sense that somebody went and created httptop to monitor http traffic the same way.

Install the required Perl modules.  From the CPAN shell (started with perl -MCPAN -e shell), run:

install Term::ReadKey
install File::Tail
install Time::HiRes

Now copy and paste the script below, save it somewhere on your path, and set the +x attribute on it so you can execute it.  On my setup, I have the script under /usr/bin/httptop:

#!/usr/bin/perl -w
use Time::HiRes qw( time );
use File::Tail (  );
use Term::ReadKey;
use Getopt::Std;
use strict;
### Defaults you might be interested in adjusting.
my $Update = 2; # update every n secs
my $Backtrack = 250; # backtrack n lines on startup
my @Paths = qw(
%
/title/%/logs/access_log
/var/log/httpd/%/access_log
/usr/local/apache/logs/%/access_log
);
my $Log_Format = "combined";
my %Log_Fields = (
combined => [qw/ Host x x Time URI Response x Referer Client /],
vhost => [qw/ VHost Host x x Time URI Response x Referer Client /]
);
### Constants & other thingies. Nothing to see here. Move along.
my $Version = "0.4.1";
sub by_hits_per (  ) { $b->{Rate} <=> $a->{Rate} }
sub by_total (  ) { $b->{Total} <=> $a->{Total} }
sub by_age (  ) { $a->{Last} <=> $b->{Last} }
my $last_field = "Client";
my $index = "Host";
my $show_help = 0;
my $order = \&by_hits_per;
my $Help = "htlwufd?q";
my %Keys = (
h => [ "Order by hits/second" => sub { $order = \&by_hits_per } ],
t => [ "Order by total recorded hits" => sub { $order = \&by_total } ],
l => [ "Order by most recent hits" => sub { $order = \&by_age } ],
w => [ "Show remote host" => sub { $index = "Host" } ],
u => [ "Show requested URI" => sub { $index = "URI" } ],
f => [ "Show referring URL" => sub { $index = "Referer" } ],
d => [ "Show referring domain" => sub { $index = "Domain" } ],
'?' => [ "Help (this thing here)" => sub { $show_help++ } ],
q => [ "Quit" => sub { exit } ]
);
my @Display_Fields = qw/ Host Date URI Response Client Referer Domain /;
my @Record_Fields = qw/ Host URI Referer Domain /;
my $Max_Index_Width = 50;
my $Initial_TTL = 50;
my @Months = qw/ Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec /;
my %Term = (
HOME => "\033[H",
CLS => "\033[2J",
START_TITLE => "\033]0;", # for xterms etc.
END_TITLE => "\007",
START_RV => "\033[7m",
END_RV => "\033[m"
);
my ( %hist, %opt, $spec );
$SIG{INT} = sub { exit };
END { ReadMode 0 };
### Subs.
sub refresh_output
{
my ( $cols, $rows ) = GetTerminalSize;
my $show = $rows - 3;
my $count = $show;
my $now = (shift || time);
for my $type ( values %hist ) {
for my $peer ( values %$type ) {
# if ( --$peer->{_Ttl} > 0 ) {
my $delta = $now - $peer->{Start};
if ( $delta >= 1 ) {
$peer->{ Rate } = $peer->{ Total } / $delta;
} else {
$peer->{ Rate } = 0
}
$peer->{ Last } = int( $now - $peer->{ Date } );
# } else {
# delete $type->{$peer}
# }
}
}
$count = scalar( values %{$hist{$index}} ) - 1 if $show >= scalar values %{$hist{$index}};
my @list = ( sort $order values %{$hist{$index}} )[ 0 .. $count ];
my $first = 0;
$first = ( $first <= $_ ? $_ + 1 : $first ) for map { $_ ? length($_->{$index}) : 0 } @list;
$first = $Max_Index_Width if $Max_Index_Width < $first;
print $Term{START_TITLE}, "Monitoring $spec at: ", scalar localtime, $Term{END_TITLE} if $ENV{TERM} eq "xterm"; # UGLY!!!
my $help = "Help/?";
my $head = sprintf( "%-${first}s %6s %4s %4s %s (%d total)",
$index, qw{ Hits/s Tot Last }, $last_field,
scalar keys %{$hist{$index}}
);
#
# Truncate status line if need be
#
$head = substr($head, 0, ($cols - length($help)));
print @Term{"HOME", "START_RV"}, $head, " " x ($cols - length($head) - length($help)), $help, $Term{END_RV}, "\n";
for ( @list ) {
# $_->{_Ttl}++;
my $line = sprintf( "%-${first}s %6.3f %4d %3d %s",
substr( $_->{$index}, 0, $Max_Index_Width ), @$_{(qw{ Rate Total Last }, $last_field)} );
if ( length($line) > $cols ) {
substr( $line, $cols - 1 ) = "";
} else {
$line .= " " x ($cols - length($line));
}
print $line, "\n";
}
print " " x $cols, "\n" while $count++ < $show;
}
sub process_line
{
my $line = shift;
my $now = ( shift || time );
my %hit;
chomp $line;
@hit{@{$Log_Fields{$Log_Format}}} = grep( $_, split( /"([^"]+)"|\[([^]]+)\]|\s/o, $line ) );
$hit{ URI } =~ s/HTTP\/1\S+//gos;
$hit{ Referer } = "<unknown>" if not $hit{Referer} or $hit{Referer} eq "-";
( $hit{Domain} = $hit{Referer} ) =~ s#^\w+://([^/]+).*$#$1#os;
$hit{ Client } ||= "<none>";
$hit{ Client } =~ s/Mozilla\/[\w.]+ \(compatible; /(/gos;
$hit{ Client } =~ s/[^\x20-\x7f]//gos;
# if $now is negative, try to guess how old the hit is based on the time stamp.
if ( $now < 0 ) {
my @hit_t = ( split( m![:/\s]!o, $hit{ Time } ))[ 0 .. 5 ];
my @now_t = ( localtime )[ 3, 4, 5, 2, 1, 0 ];
my @mag = ( 3600, 60, 1 );
# If the hit didn't parse right, or didn't happen today, the hell with it.
return unless $hit_t[2] == ( $now_t[2] + 1900 )
and $hit_t[1] eq $Months[ $now_t[1] ]
and $hit_t[0] == $now_t[0];
splice( @hit_t, 0, 3 );
splice( @now_t, 0, 3 );
# Work backward to the UNIX time of the hit.
$now = time;
$now -= (shift( @now_t ) - shift( @hit_t )) * $_ for ( 3600, 60, 1 );
}
$hit{ Date } = $now;
for my $field ( @Record_Fields ) {
my $peer = ( $hist{$field}{$hit{$field}} ||= { Start => $now, _Ttl => $Initial_TTL } );
@$peer{ @Display_Fields } = @hit{ @Display_Fields };
$peer->{ Total }++;
}
}
sub display_help {
my $msg = "httptop v.$Version";
print @Term{qw/ HOME CLS START_RV /}, $msg, $Term{END_RV}, "\n\n";
print " " x 4, $_, " " x 8, $Keys{$_}[0], "\n" for ( split "", $Help );
print "\nPress any key to continue.\n";
}
### Init.
getopt( 'frb' => \%opt );
$Backtrack = $opt{b} if $opt{b};
$Update = $opt{r} if $opt{r};
$Log_Format = $opt{f} if $opt{f};
$spec = $ARGV[0];
die <<End unless $spec and $Log_Fields{$Log_Format};
Usage: $0 [-f <format>] [-r <refresh_secs>] [-b <backtrack_lines>] <logdir | path_to_log>
Valid formats are: @{[ join ", ", keys %Log_Fields ]}.
End
for ( @Paths ) {
last if -r $spec;
( $spec = $_ ) =~ s/%/$ARGV[0]/gos;
}
die "No access_log $ARGV[0] found.\n" unless -r $spec;
my $file = File::Tail->new(
name => $spec,
interval => $Update / 2,
maxinterval => $Update,
tail => $Backtrack,
nowait => 1
) or die "$spec: $!";
my $last_update = time;
my ( $line, $now );
# Backtracking.
while ( $Backtrack-- > 0 ) {
last unless $line = $file->read;
process_line( $line, -1 );
}
$file->nowait( 0 );
ReadMode 4; # Echo off.
print @Term{"HOME", "CLS"}; # Home & clear.
refresh_output;
### Main loop.
while (defined( $line = $file->read )) {
$now = time;
process_line( $line, $now );
while ( $line = lc ReadKey(-1) ) {
$show_help = 0 if $show_help;
$Keys{$line}[1]->(  ) if $Keys{$line};
}
if ( $show_help == 1 ) {
display_help;
$show_help++; # Don't display help again.
} elsif ( $now - $last_update > $Update and not $show_help ) {
$last_update = $now;
refresh_output( $now );
}
}

Save and exit, then make the script executable (chmod +x httptop).

Now you can run httptop by typing:  httptop -f combined -r 1 /usr/local/apache2/logs/access_log

NOTE:  Your access_log file might be in a different location, so point httptop at the right path.  The -r 1 option sets the refresh rate to 1 second.  Now you can run httptop any time you want to check how your http traffic is doing.  Remember to press “?” to get help once you are in.
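If you want a feel for what httptop does under the hood, here is a rough Python sketch of the core idea: parse each "combined" log line and count hits per remote host (or per any other field).  It is not a port of the Perl script; the regular expression, the tally function and the sample lines are my own assumptions for illustration, and real logs may need a more forgiving pattern.

```python
import re
from collections import Counter

# Assumed shape of an Apache "combined" log line; real logs can vary.
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<client>[^"]*)"'
)

def tally(lines, field="host"):
    """Count hits per value of `field`, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match:
            counts[match.group(field)] += 1
    return counts

# Made-up sample lines, not taken from a real server.
sample = [
    '10.0.0.1 - - [24/Apr/2008:12:05:01 -0700] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '10.0.0.2 - - [24/Apr/2008:12:05:02 -0700] "GET /a.png HTTP/1.1" 200 900 "http://example.com/" "Mozilla/5.0"',
    '10.0.0.1 - - [24/Apr/2008:12:05:03 -0700] "GET /b HTTP/1.1" 404 0 "-" "curl/7.0"',
]

# Busiest hosts first, like httptop's "total hits" ordering.
for host, hits in tally(sample).most_common():
    print(host, hits)
```

Passing field="referer" or field="request" gives roughly what the “f” and “u” keys show in httptop.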

—————

DISCLAIMER: As always, if you find any inaccurate information, please comment and let me know. When you do comment, make sure you give me some references to confirm.

Apache2 gzip compression: How do I speed up my website download time?

One of the things people tend to forget is that web servers can compress content before sending it back to the client. The client’s browser then uncompresses the data and displays it to the user. Pretty much all recent browsers support gzip compression. In this post, I will go over how to set up apache2 to use compression. First let’s see if your Apache installation has “deflate” enabled. You can check by typing:

# /usr/local/apache2/bin/apachectl -t -D DUMP_MODULES
Loaded Modules:
...
deflate_module (static)
...
Syntax OK

If you don’t have deflate_module, you will have to recompile your apache with the “--enable-deflate” option.

Going forward, I am going to assume you have deflate_module. Add the following to your apache conf file:

<Location />
SetOutputFilter DEFLATE
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html
# Don't compress images
SetEnvIfNoCase Request_URI \
\.(?:gif|jpe?g|png)$ no-gzip dont-vary
</Location>

The main thing you need to configure is the line ending in “no-gzip dont-vary”. This tells apache not to compress certain types of files. I have noticed on some of my sites that swf (flash) files do not work as expected if they are compressed. So if you have swf files on your site, you may want to add |swf right after png.
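If you are unsure which requests that pattern will catch, you can try it outside of Apache first. Below is a small Python sketch using the same expression with |swf added. Python’s re module is not the regex engine Apache uses, so treat this as an approximation; the function name and the sample URIs are made up for illustration.

```python
import re

# The pattern from the SetEnvIfNoCase line above, with |swf added.
# re.IGNORECASE stands in for the "NoCase" part of the directive.
NO_GZIP = re.compile(r"\.(?:gif|jpe?g|png|swf)$", re.IGNORECASE)

def skips_compression(uri):
    """Return True if the request URI would be tagged no-gzip."""
    return NO_GZIP.search(uri) is not None

# Hypothetical request URIs to try against the pattern.
for uri in ["/logo.PNG", "/photo.jpeg", "/intro.swf", "/index.html", "/png/readme.txt"]:
    print(uri, skips_compression(uri))
```

Only the file extension at the very end of the URI counts, so a directory named png does not switch compression off.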

That is all it takes to enable gzip compression in Apache2. Once you restart apache so it rereads the conf file, you can test whether your site is getting compressed by using this tool: http://www.gidnetwork.com/tools/gzip-test.php

Here are the results for my blog:

Results for: http://crazytoon.com
Web page compressed?       Yes
Compression type?          gzip
Size, Markup (bytes):      57,337
Size, Compressed (bytes):  11,666
Compression %:             79.7
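The compression percentage is just the share of bytes saved. Here is a short Python sketch that redoes the arithmetic with the two sizes reported above and then runs the same calculation on real gzip output. The helper name and the sample page are my own inventions, and Python’s gzip module is only a stand-in for what mod_deflate does on the server.

```python
import gzip

def compression_percent(original_bytes, compressed_bytes):
    """Percentage of bytes saved by compression."""
    return (1 - compressed_bytes / original_bytes) * 100

# The figures reported for the blog above; this prints 79.7.
print(round(compression_percent(57337, 11666), 1))

# A made-up, repetitive page to show the same calculation on real gzip output.
page = b"<html><body>" + b"<p>Hello, world!</p>" * 500 + b"</body></html>"
packed = gzip.compress(page)
print(len(page), len(packed), round(compression_percent(len(page), len(packed)), 1))
```

Repetitive markup like this compresses far better than a typical page, so do not expect the second number on a real site.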

————————————-
DISCLAIMER: Please be smart and use code found on internet carefully. Make backups often. And yeah.. last but not least.. I am not responsible for any damage caused by this posting. Use at your own risk.

Ramdisk: How do you install and set up Ramdisk under Linux (CentOS, RHEL, Fedora)?

A ramdisk is very good to have if you want something to stay in memory.  Keeping files in memory lets you access them without having to hit the hard drive all the time.  Perfect candidates would be things which do not change, e.g. web images or downloadable files.  If you have Linux kernel 2.4 or later, you already have ramdisk support built in.  You can check if ramdisk is set up by doing:

# dmesg | grep RAMDISK
RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize

You should get the above output on CentOS and RHEL.  Other linux flavors will have similar output as well.  If you would like to see how the ramdisks are named and what you would need to refer to, do the following:

# ls -l /dev/ram*
lrwxrwxrwx 1 root root 4 Apr 24 12:05 /dev/ram -> ram1
brw-rw---- 1 root disk 1, 0 Apr 24 12:05 /dev/ram0
brw-rw---- 1 root disk 1, 1 Apr 24 12:05 /dev/ram1
brw-rw---- 1 root disk 1, 10 Apr 24 12:05 /dev/ram10
brw-rw---- 1 root disk 1, 11 Apr 24 12:05 /dev/ram11
brw-rw---- 1 root disk 1, 12 Apr 24 12:05 /dev/ram12
brw-rw---- 1 root disk 1, 13 Apr 24 12:05 /dev/ram13
brw-rw---- 1 root disk 1, 14 Apr 24 12:05 /dev/ram14
brw-rw---- 1 root disk 1, 15 Apr 24 12:05 /dev/ram15
brw-rw---- 1 root disk 1, 2 Apr 24 12:05 /dev/ram2
brw-rw---- 1 root disk 1, 3 Apr 24 12:05 /dev/ram3
brw-rw---- 1 root disk 1, 4 Apr 24 12:05 /dev/ram4
brw-rw---- 1 root disk 1, 5 Apr 24 12:05 /dev/ram5
brw-rw---- 1 root disk 1, 6 Apr 24 12:05 /dev/ram6
brw-rw---- 1 root disk 1, 7 Apr 24 12:05 /dev/ram7
brw-rw---- 1 root disk 1, 8 Apr 24 12:05 /dev/ram8
brw-rw---- 1 root disk 1, 9 Apr 24 12:05 /dev/ram9
lrwxrwxrwx 1 root root 4 Apr 24 12:05 /dev/ramdisk -> ram0

All those ramdisks listed have the same size.  In the above example, they are all 16MB.  Let us change that so we have more space allowed.  Note that I say allowed and not allocated.  We allocate space in one of the later steps by formatting one of the drives above.  Let us set it up so we have 128 MB.  Since the size is given in kilobytes (1024K per MB), we will set up the ramdisk to have 131072K.
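If you want a different size, the conversion is easy to script.  Here is a tiny Python sketch of the arithmetic; the function name is made up, and it assumes the size is given in units of 1024 bytes, which matches the 16384K shown for the 16MB default above.

```python
def ramdisk_size_kb(megabytes):
    """Convert a size in MB to the kilobyte value used for ramdisk_size."""
    return megabytes * 1024

print(ramdisk_size_kb(16))   # the default size shown in the dmesg output
print(ramdisk_size_kb(128))  # the size used in this post
print(f"ramdisk_size={ramdisk_size_kb(128)}")
```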

vi /etc/grub.conf

Find the first line which looks similar to the following:

kernel /vmlinuz-2.6.9-42.0.10.EL ro root=/dev/VolGroup00/LogVol00

Add ramdisk_size=131072 to the end of the line.  Now your line should look like:

kernel /vmlinuz-2.6.9-42.0.10.EL ro root=/dev/VolGroup00/LogVol00 ramdisk_size=131072

Save and exit grub.conf.  At this point you have the ramdisk configured with the new size, but it does not take effect until you reboot your system.  Once you have rebooted, we can do the rest of the configuration.

mke2fs -m 0 /dev/ram0

This will format the ram0 ramdisk for us to use.  At this point, the kernel will allocate space for you.  Let us set up a mount point so we can use it.  We will also have it owned by user “sunny” so that user can read/write to that mount.

mkdir /home/ramdisk
mount /dev/ram0 /home/ramdisk
chown sunny.sunny /home/ramdisk

At this point you should be able to type mount and see your new ramdisk mounted on /home/ramdisk.
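If you would rather check the mount from a script than by eye, something along these lines would do.  This is only a sketch: the find_mount helper and the sample text are made up, and it assumes the usual /proc/mounts layout of device, mount point, filesystem type and options on each line.

```python
def find_mount(mounts_text, mount_point):
    """Return the device mounted on mount_point, or None if nothing is."""
    device = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            device = fields[0]  # a later entry for the same path wins
    return device

# Hypothetical excerpt in /proc/mounts format: device, mount point, type, ...
sample = """\
/dev/mapper/VolGroup00-LogVol00 / ext3 rw 0 0
/dev/ram0 /home/ramdisk ext2 rw 0 0
"""
print(find_mount(sample, "/home/ramdisk"))

# On a live Linux box you could pass the real table instead:
#   with open("/proc/mounts") as f:
#       print(find_mount(f.read(), "/home/ramdisk"))
```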

Remember that everything you put on this drive will be gone if you reboot your server.  If you unmount the ramdisk and remount it, your files will still be there, because your system has that much RAM set aside for the ramdisk and will not use it for anything else.  If you would like to set up the ramdisk the same way next time you boot, add these lines to your /etc/rc.local file:

mke2fs -m 0 /dev/ram0
mount /dev/ram0 /home/ramdisk
chown sunny.sunny /home/ramdisk


MySQL wait_timeout setting

We were having issues with MySQL threads which would sit in sleep mode and not die off for a long time. At the same time we started having issues with our servers where the load would spike and eventually the server would come to a halt unless we killed all the apache processes and restarted apache (which seemed to be the hung application). We eventually traced it back and noticed that the server hung when it had burned through all the RAM and was using up all the swap as well.

So we started to work backwards and tried to resolve one thing at a time. We started with MySQL. We put wait_timeout = 30 into my.cnf and restarted mysql. Then I closely watched the server for a few hours and noticed that we didn’t have any more of those sleeping connections. GREAT! A workaround until we get to the bottom of what’s causing this.

That was on Friday. On Saturday we started noticing a different problem. It got worse, so we looked into what might have caused it and found out that we had a script which was pulling a row at a time, processing it, and deleting the row. Except it was never getting to delete the row, because the timeout would kick in and close the connection. We found this out when we watched the error logs and saw the “MySQL server has gone away” message.

We took out the wait_timeout and everything seems to have started working fine. Has anybody else seen this behavior, where you lose the connection to the mysql server due to the timeout? The script which processes and deletes line by line takes a fraction of a second to process each line. Does wait_timeout start counting from the start of the connection? Does it mean that wait_timeout is actually a max connection time limit? Suggestions/comments?
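For what it is worth, the MySQL manual describes wait_timeout as the number of seconds the server waits for activity on a non-interactive connection before closing it, which suggests an idle timer rather than a cap on total connection time. The Python sketch below is a toy model of that reading only. It does not talk to MySQL, and the function name and timings are made up for illustration.

```python
def first_timeout(query_times, wait_timeout):
    """Toy model of an idle timer: return the time at which the connection
    would be closed for inactivity, or None if every gap between
    consecutive queries stays under wait_timeout.

    query_times: ascending seconds since connect at which a query finishes.
    """
    last_activity = 0
    for t in query_times:
        if t - last_activity >= wait_timeout:
            return last_activity + wait_timeout
        last_activity = t  # any activity resets the idle clock
    return None

# A query every 10 seconds for five minutes: never idle for 30 seconds.
print(first_timeout(range(10, 301, 10), wait_timeout=30))

# 45 seconds of silence between two queries trips the 30 second timer.
print(first_timeout([5, 10, 55, 60], wait_timeout=30))
```

Under this model, a script that fetches a row, goes quiet for longer than the timeout, and then comes back to delete the row would find its connection already closed, no matter how recently it connected.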

Edit 5/31/09: A friend of mine was getting this error: database error: Lost connection to MySQL server during query. He got around it by adjusting the wait_timeout setting.

What is this “load average” I keep hearing about?

I have been asked numerous times what “load average” means in top. If you don’t know what top is and you have access to a linux machine, go type top now and see what it shows.

load average: 2.05, 2.17, 1.93

Quick answer: the first number (2.05) is the 1 minute average, the second number (2.17) is the 5 minute average, and the third number (1.93) is the 15 minute average. Generally system admins look at these numbers to see how their server is doing. But now you wonder, if these are the numbers you look at, why is there a cpu %? Isn’t that computer load also? Of course it is. BUT, the cpu % shown in [ Cpu(s): 14.2% us, 1.7% sy, 0.0% ni, 80.7% id, 3.1% wa, 0.0% hi, 0.3% si, 0.0% st ] just means what % of time was spent doing work on the cpu. Load average, on the other hand, takes other things into account, such as how many cpus were being used and how many processes had to wait for their turn to use a cpu. That’s why sometimes you will see a high cpu % but a low load average: things didn’t get queued much and the cpu just spiked a bit at the time you looked at it. You can also have a slow responding server with a high cpu % and a low load average.

So what is an ok number to see in load average and what is not? This is actually simpler to answer than explaining what is what. For each cpu you add, you add 1 to your ceiling. For example, if you have dual cpus, it’s ok to see load up to 2, which basically says both of the cpus were doing 100% of the work and that’s ok. If that number is above 2, let’s say 4, that means your system is working twice as hard as it should. So let’s say you have dual cpus with hyperthreading, what is the optimal number to see in load average? If you said 4, you are correct!
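The rule of thumb above is easy to turn into a quick check. Here is a rough Python sketch; the function names and the “ok”/“overloaded” labels are mine, and the cutoff of 1.0 per cpu is just the rule from this post, not a hard limit.

```python
import os

def load_per_cpu(load, cpus):
    """Load average divided by the number of (logical) cpus."""
    return load / cpus

def verdict(load, cpus):
    """Apply the rule of thumb: up to 1.0 per cpu is ok, above is overloaded."""
    return "ok" if load_per_cpu(load, cpus) <= 1.0 else "overloaded"

# The examples from the text: dual cpu at load 2 and at load 4,
# then dual cpu with hyperthreading (4 logical cpus) at load 4.
print(verdict(2.0, 2))
print(verdict(4.0, 2))
print(verdict(4.0, 4))

# On a live Unix system the real values are available from the os module.
try:
    one_minute = os.getloadavg()[0]
    cpus = os.cpu_count() or 1
    print(one_minute, cpus, verdict(one_minute, cpus))
except (AttributeError, OSError):
    print("load average not available on this platform")
```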

So now you know! Here are a couple of other commands which will show you load averages:

w
18:58:26 up 438 days, 13:32, 1 user, load average: 1.83, 2.26, 2.24

uptime
18:58:44 up 438 days, 13:33, 1 user, load average: 1.59, 2.18, 2.21
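If you want to pull those three numbers out of the output in a script, a regular expression is enough. The Python sketch below parses the uptime line shown above; the function name is made up, and the pattern assumes the English “load average:” wording with a dot as the decimal separator.

```python
import re

def parse_load_average(line):
    """Extract the 1, 5 and 15 minute averages from a w/uptime/top header line."""
    match = re.search(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", line)
    if match is None:
        raise ValueError("no load average found in: " + line)
    return tuple(float(x) for x in match.groups())

line = "18:58:44 up 438 days, 13:33, 1 user, load average: 1.59, 2.18, 2.21"
one, five, fifteen = parse_load_average(line)
print(one, five, fifteen)
```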

To learn about the commands used in this post, see man w, man uptime, man top
