Author Archives: Philip Kearns

Streaming rm, a faster alternative to find … exec rm

Anyone who has done Linux or Unix administration will be very familiar with the “find … -exec rm …” idiom used for cleaning up temporary files. For example, if an application is creating temporary files, we can remove those which are more than seven days old with a command like this:

find /var/spool/thisapp -name thisapp\*.tmp -mtime +7 -exec rm "{}" ";"

Some years ago I came across a situation where an application was creating large numbers of temporary files in a huge directory structure and someone was using a find command like the one above to clear them down. It occurred to me that it would be easy to write a command which took the names of files on the standard input and removed them. I wrote a crude version and ran the command like this (using the above example):

find /var/spool/thisapp -name thisapp\*.tmp -mtime +7 | strrm

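It was ten times faster than the original form!  It took thirty seconds versus five minutes for find … -exec.  Admittedly there were 46,000 files to remove in that example.  The difference comes from process creation: with -exec … “;” find starts a separate rm process for every file, whereas strrm is a single process which removes each file as its name arrives.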

A more comprehensive strrm

Recently I decided it was worth re-writing my streaming rm and you can download the source code.  These are its options:

  • -n|--dryrun Don’t do the remove, but echo the file(s) to the standard output.
  • -v|--verbose Normally strrm will run silently; using this option will echo each file name as it’s removed.

Removing files with awkward names

Apart from speed, strrm has another advantage: it doesn’t care about weird file names.  So consider this rather artificial example of a file name which has three backspaces in it:

vger:~/tmp(99)+>- ls Annoy*
Annoy???ingFileName
vger:~/tmp(100)+>- ls Annoy* | strrm --dryrun
Annoy\010\010\010ingFileName
vger:~/tmp(101)+>- ls Annoy* | strrm --verbose
Annoy\010\010\010ingFileName
vger:~/tmp(102)+>- ls Annoy*
ls: No match.
vger:~/tmp(103)+>-

Removing empty directory paths

For various reasons it is possible to end up with empty directory paths, i.e. paths which contain only directories and no files.  This will delete all the empty directories working back from the longest path:

find testdir -type d | awk '{printf("%04d %s\n",length($0),$0)}' | sort -rn | sed 's/[0-9][0-9][0-9][0-9] //' | strrm --dryrun

Even though it cannot delete non-empty directories, I’ve shown it with “--dryrun” just in case it doesn’t do exactly what you expect.  Once you’re happy you can remove the “--dryrun”.

 

How Red Hat made Linux palatable for business

A recent blog on TechRepublic mentioned the importance of Red Hat for Linux; however, I don’t think the blogger, Jack Wallen, quite hit the nail on the head.

The first worthwhile Linux distribution was Debian and it continues to be the most important base for user-accessible Linux, most notably Ubuntu. Ubuntu is a superb desktop, but it is exasperating as a server OS. Debian is based on the principle of constant updates which just doesn’t work in a business environment, where configuration management is critical. Constant updating is particularly perilous with Open Source; I have seen point updates break things. Further, in a professional environment you want to stage updates. So for instance you might introduce Apache 2.2 in Test, but leave it on 2.0 for Staging and Production. Later you can introduce it in Staging and Production. This is just too difficult on Debian to bother even trying.

Enter Red Hat. RH understood the needs of business, and created a much more controllable Linux. It also introduced a new, extremely valuable, facility: the ability to stay on old versions of software without the associated risks. For instance, Apache 2.0 has a few known vulnerabilities and the only remedy from the Apache Software Foundation was to upgrade to the latest version of Apache. This left businesses in a quandary: upgrade and almost certainly break the corporate web site, or just hope no one notices it’s running a vulnerable web server. Red Hat had the solution: it back-ported the security fixes into Apache 2.0 and all was well. This is a service it provides for all its packages.

RHEL isn’t a flawless business OS—for instance, patch auditing is unsatisfactory—but it’s what made Linux acceptable to the business community.

LinkedIn streamlining its apps into oblivion

Just before Christmas LinkedIn announced changes to its profile pages, adding that it would also be “streamlining” its app offerings, resulting in its link to WordPress® being streamlined into oblivion.  This was bad news for me because it is important to me that my blog is associated with my LinkedIn identity.  Anyone who was using LinkedIn’s apps was told in the same email that they would have new ways to “showcase rich content” on their profile; there was no follow-up email to suggest how or when this might work.  An Internet search for the term “linkedin showcase rich content on your profile” turned up a lot of leads, one of which was hosted on wordpress.com, describing in more detail how the rich content feature would work.

That blog described the nature of LinkedIn’s new offerings—which is more than LinkedIn itself did—and I confirmed that none of those new features was any use to me. However a subsequent link in the search results revealed something that really should have been on LinkedIn’s notification: it’s possible to get your WordPress blog to write to LinkedIn.

DROP TABLE failing due to table lock

When a DROP TABLE fails due to a database table lock error it is not necessarily the table that is being dropped that is locked.

The background

Recently I wrote some code for searching a database from a web interface and I wanted to add an option to graph the data.  The first task was to take the search data and reformat it for pChart.  I settled on code that created two temporary tables and at the end I dropped the two tables, even though it’s not necessary since they will cease to exist once the connection is closed.  However, when I started using it in anger (so to speak) I was getting an error complaining that a table was locked.  Since it wasn’t doing any harm I continued with the rest of the coding.  Once I had finished the main code I couldn’t bring myself to leave the DROP TABLE problem unsolved.

The investigation

Even though I have been programming for over thirty years, I am still relatively inexperienced at SQL programming, so I searched the Internet for clues.  One example came quite close but didn’t contain enough context; it did, however, indicate where I should be looking.  The first step was to demonstrate that there was no problem with the code which created and used the temporary tables, GraphExpenditure().  This could be done by closing and re-opening the database connection just before calling the function:

$ExpHandle = 0 ; // Close the database connection
if(($ExpHandle=ExpendConn($DBname))==FALSE) {
        printf("Cannot open database file %s\n",$DBname) ;
} 
else 
        GraphExpenditure($ExpHandle,$SQL_Common,$OutputType) ;

Once I did that the DROP TABLE error went away.  This proved that the problem lay with the code that existed before the graphing code was even introduced.

To debug where that problem was I changed from the PDO class to the PDOStatement class because that would allow me to explicitly close each query’s cursor (with closeCursor()).  There are three pieces of SQL code before graphing:

  1. SELECT SUM() of the data which was searched for.
  2. Much more elaborate SELECT SUM() of the data.
  3. Search for the data itself and display it as a table.

I discovered that if either of the first two sets of code doesn’t have closeCursor() then the DROP TABLE later would fail.

The solution

The solution turned out to be relatively simple: just complete all the queries until they return FALSE.  In PHP (the normal checks have been removed to make the code easier to read), the important part is the while loop on the last line:

$CntDataQry = $DBhandle->query($CntDataSQL = "SELECT SUM(Counter) FROM SampleTable") ;
$row = $CntDataQry->fetch() ; // Fetch first (and only) row of the result
printf("SUM(Counter) = %d\n",$row[0]) ;
while($CntDataQry->fetch()!=FALSE) ; // This will fetch nothing

So the table that was locked was the system table.  In a previous job the DBAs regularly uttered oaths about Oracle’s system table, which if it locks, will quickly bring the whole system to a grinding halt.  Trying to compare my situation (SQLite) with Oracle is a bit like comparing a boat with an outboard motor to an ocean-going liner, but it’s interesting to see that they have some things in common.

Example code

I’ve written some sample code which will demonstrate a failure to DROP TABLE, followed by its resolution.  It’s intended to be run from the command line.

 

Missing disk space Linux/Unix: when df disagrees with du -s

A common situation many admins find themselves in is having to clear down disk space quickly.  So for instance, say /u01 is filling up.  The Oracle admin knows that the database will simply stop if he doesn’t take action quickly.  With the judicious use of du -s he finds some large directories and quickly deletes a few temporary files he knows the database doesn’t immediately need.  He does a ‘df -h’ to find that it hasn’t made any difference!  He then repeats his ‘du -s’ and it shows the space has been freed up.  He doesn’t know it, but he has deleted at least one open file, whose space won’t be freed up until the process holding it open closes it or exits.  What he should have done is this:

echo "" > offendingfile

where offendingfile is the huge file.

In the case of the Oracle admin it’s likely his only choice is to restart the database.  Consider a more general case where a Linux/Unix admin has deleted files but has lost track of where the files were and what might be using them.  Or one admin deleted the files and scarpered leaving the other trying to clean up the mess.  He is left with the bigger challenge of trying to find what process is holding what files open.

A starting point: lsof

The lsof command can be a good starting point; however, you are now looking for a needle in a smaller haystack, so you will have to do some further filtering.  On CentOS 6 it will mark files which have been deleted, but it seems to throw up quite a few false positives.

To illustrate the problem of open files I have created some C code which will create a big file and sleep for 1,000 seconds.  Compiling and running the binary I will get a 10 Mbyte file:

/var/tmp/SampleBigFile

If I then remove the file I have created the situation described above.  On CentOS 6 I could run:

lsof | fgrep '(deleted)'

but that produces 24 results (among which are files that haven’t been deleted, like /usr/bin/gnome-screensaver), so it would be a good idea to narrow the search.  For instance it’s likely in this situation that there is just one file system that is full, so you could grep for its mount point.  That does it nicely in our example:

[root@centos6 ~]# lsof | fgrep '(deleted)' | fgrep /var
createope 11012 admin 3u REG 253,3 10485761 693 /var/tmp/SampleBigFile (deleted)
[root@centos6 ~]#

In MacOS (Darwin) there is no ‘(deleted)’ label so go straight for checking for /var:

vger:~ root# lsof | egrep 'REG.*/var/tmp'
mysqld    346 _mysql 4u  REG 14,18        0 6217706 /private/var/tmp/ibu4Nw9X
mysqld    346 _mysql 5u  REG 14,18        0 6217707 /private/var/tmp/ib6jCfyT
mysqld    346 _mysql 6u  REG 14,18        0 6217708 /private/var/tmp/ibu9Zqxb
mysqld    346 _mysql 7u  REG 14,18        0 6217709 /private/var/tmp/iboukiVq
mysqld    346 _mysql 11u REG 14,18        0 6217710 /private/var/tmp/ibLRW39J
createope 42775 admin 3u REG 14,18 10485761 6308941 /private/var/tmp/SampleBigFile
vger:~ root#

(REG indicates a regular file.)  While our big file is clearly identifiable here, if it wasn’t you could try something like sort -n -k7 to sort numerically on file size.

When all debugging routes have failed: network scans and/or code tracing

In the world of car, bike and motorbike mechanics there is a versatile tool which is something of a last resort: the vice-grips (sometimes referred to as the bodger’s tool, because of people’s tendency to shear bolts with them).  In the world of operating systems there are two tools I have found to be like vice-grips, but not potentially harmful: Network scanning and code tracing.

Network scanning

Most operating systems have a way of scanning the network:

  • Linux: tcpdump, Wireshark
  • Darwin (MacOS): tcpdump, Wireshark
  • Solaris: snoop, tcpdump, Wireshark
  • Windows: Wireshark (there is also a version of tcpdump for Windows)

So, why is network scanning useful?  Well, consider the situation where you have installed the monitoring software, Xymon.  The server is already working and most of the clients are responding, but the server isn’t receiving data from one of the clients.  Xymon uses port 1984, so you can watch the traffic going to and from the server:

[root@host1 etc]# tcpdump port 1984
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
15:08:00.857457 IP host2.linuxtech.ie.32821 > xymonserver.linuxtech.ie.1984: S 1387852790:1387852790(0) win 5840 <mss 1460,sackOK,timestamp 119364978 0,nop,wscale 2>
15:08:00.864380 IP xymonserver.linuxtech.ie.1984 > host2.linuxtech.ie.32821: S 3491816971:3491816971(0) ack 1387852791 win 5792 <mss 1460,sackOK,timestamp 8108268 119364978,nop,wscale 0>
15:08:00.864553 IP host2.linuxtech.ie.32821 > xymonserver.linuxtech.ie.1984: . ack 1 win 1460 <nop,nop,timestamp 119364993 8108268>
15:08:00.865187 IP host2.linuxtech.ie.32821 > xymonserver.linuxtech.ie.1984: . 1:1449(1448) ack 1 win 1460 <nop,nop,timestamp 119364993 8108268>
15:08:00.865419 IP host2.linuxtech.ie.32821 > xymonserver.linuxtech.ie.1984: . 1449:2897(1448) ack 1 win 1460 <nop,nop,timestamp 119364994 8108268>
15:08:00.867342 IP xymonserver.linuxtech.ie.1984 > host2.linuxtech.ie.32821: . ack 1449 win 8688 <nop,nop,timestamp 8108268 119364993>
15:08:00.867486 IP host2.linuxtech.ie.32821 > xymonserver.linuxtech.ie.1984: P 2897:4345(1448) ack 1 win 1460 <nop,nop,timestamp 119364996 8108268>
15:08:00.867684 IP host2.linuxtech.ie.32821 > xymonserver.linuxtech.ie.1984: . 4345:5793(1448) ack 1 win 1460 <nop,nop,timestamp 119364996 8108268>
15:08:00.868361 IP xymonserver.linuxtech.ie.1984 > host2.linuxtech.ie.32821: . ack 2897 win 11584 <nop,nop,timestamp 8108268 119364994>
15:08:00.869032 IP host2.linuxtech.ie.32821 > xymonserver.linuxtech.ie.1984: . 5793:7241(1448) ack 1 win 1460 <nop,nop,timestamp 119364997 8108268>

So in this example the traffic is going from host1 to the Xymon server’s port, so it looks like Xymon is receiving the data.  What’s wrong is that DNS knows this host as host2.linuxtech.ie, not host1, so Xymon doesn’t realise it’s receiving data for host1.  There are a few solutions; for example, you can configure host1 to explicitly tell Xymon that it is host1.

Another example was when I was trying to get some commercial software working behind a firewall, where the DNS servers were locked down to resolve only the addresses we allowed them to.  The documentation said that it would need to be able to resolve, say, swcheck.sweet.ie, but it still wasn’t working.  So I gave it just one DNS server and watched what addresses it asked for, and sure enough it was asking for swcheck.sweet.ie, but also for, say, dwnld.sweet.ie.  So I needed to make sure that was added to the list of addresses it could resolve.

Another nice thing about tcpdump in particular is its data can be saved to a file which can be imported into Wireshark on another server.  This is very handy if you have a sensitive host where you can’t run the GUI of Wireshark.

There’s a lot to this subject but I hope this helps.

Code tracing

When I say code tracing, I mean tracing system and library calls.  Most operating systems have a way to do this:

  • Linux: strace, ltrace
  • Darwin: dtruss, dtrace (both require root/sudo)
  • Solaris: truss, dtrace
  • Windows: (none that I can find)

In my opinion Linux has the best implementation of code tracing.  (Darwin/FreeBSD/Solaris’s DTrace  and Linux’s SystemTap are exceedingly powerful, but beyond the scope of this post.)  Suppose you want to see what environment variables a program is using:

[admin2@centos6 ~]$ ltrace -e getenv -o /tmp/tmp.adm2.ltrace vi
[admin2@centos6 ~]$ ls -l /tmp/tmp.adm2.ltrace
-rw-rw-r--. 1 admin2 admin2 1777 Oct 2 05:08 /tmp/tmp.adm2.ltrace
[admin2@centos6 ~]$ vim /tmp/tmp.adm2.ltrace
[admin2@centos6 ~]$ cat /tmp/tmp.adm2.ltrace
(0, 0, 0, 0x7fcf69b6d918, 88) = 0x3b6ec21160
getenv("HOME") = "/home/admin2"
getenv("VIM_POSIX") = NULL
getenv("SHELL") = "/bin/bash"
getenv("TMPDIR") = NULL
getenv("TEMP") = NULL
getenv("TMP") = NULL
getenv("VIMRUNTIME") = NULL
getenv("VIM") = NULL
getenv("VIM") = NULL
getenv("VIMRUNTIME") = "/usr/share/vim/vim72"
getenv("VIM") = "/usr/share/vim"
getenv("TERM") = "xterm"
getenv("COLORFGBG") = NULL
getenv("VIMINIT") = NULL
getenv("HOME") = "/home/admin2"
getenv("EXINIT") = NULL
getenv("HOME") = "/home/admin2"
(0x3b6ec21160, 0, 0, 0x3b6ec21160, 0) = 140608
(0, 0, 0, 3, 0x963cf85) = 0x3b6ec21160
+++ exited (status 0) +++
[admin2@centos6 ~]$

Now suppose you have a program which is reading a configuration file from somewhere but you can’t figure out where.  The best thing is to check its open() (which will cover fopen() too), stat() and lstat() calls.  stat() and lstat() check the existence, permissions etc. of a file without opening it.  This example uses vi (even though the esteemed Meneer Bram Moolenaar has documented vim so extensively that this is a redundant example):

[admin2@centos6 ~]$ strace -e stat,lstat,open -o /tmp/tmp.adm2.strace vi
[admin2@centos6 ~]$ cat /tmp/tmp.adm2.strace
open("/etc/ld.so.cache", O_RDONLY) = 3
open("/lib64/libm.so.6", O_RDONLY) = 3
open("/lib64/libselinux.so.1", O_RDONLY) = 3
open("/lib64/libncurses.so.5", O_RDONLY) = 3
open("/lib64/libacl.so.1", O_RDONLY) = 3
open("/lib64/libc.so.6", O_RDONLY) = 3
open("/lib64/libtinfo.so.5", O_RDONLY) = 3
open("/lib64/libdl.so.2", O_RDONLY) = 3
open("/lib64/libattr.so.1", O_RDONLY) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
stat("/usr/share/vim/vim72", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/usr/share/vim", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/home/admin2/.terminfo", 0x7fff4b7dbb00) = -1 ENOENT (No such file or directory)
stat("/etc/terminfo", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/usr/share/terminfo", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("/usr/share/terminfo/x/xterm", O_RDONLY) = 3
open(".", O_RDONLY) = 3
stat("/etc/virc", {st_mode=S_IFREG|0644, st_size=1962, ...}) = 0
open("/etc/virc", O_RDONLY) = 3
open(".", O_RDONLY) = 3
stat("/home/admin2/.vimrc", 0x7fff4b7dd460) = -1 ENOENT (No such file or directory)
open("/home/admin2/.vimrc", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/admin2/_vimrc", O_RDONLY) = -1 ENOENT (No such file or directory)
open(".", O_RDONLY) = 3
stat("/home/admin2/.exrc", 0x7fff4b7dd460) = -1 ENOENT (No such file or directory)
open("/home/admin2/.exrc", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/nsswitch.conf", O_RDONLY) = 3
open("/etc/ld.so.cache", O_RDONLY) = 3
open("/lib64/libnss_files.so.2", O_RDONLY) = 3
open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 3
[admin2@centos6 ~]$

It is my belief that the true mastery of a skill is to go from the specific to the general and back to the specific again.  So these are specific examples of using these tools, which I hope give you an insight into the general principles so you can apply them to your specific problems.

 

Mac OS temporary storage area for App Store installs

Mac OS uses a temporary storage area for downloading apps before installing them.  Apple made it very easy to find in Snow Leopard:

~/Library/Application Support/AppStore

Where ~ is your home directory.

However on Lion (and Mountain Lion) it decided to get creative, so trying to find the temporary files is a little trickier.  This is how I found the temporary storage for a download of iPhoto:

vger.local:~(84)+>- date ; sudo find /var/folders -name \*.pkg -mtime -1 -ls
Sat 22 Sep 2012 21:12:02 IST
6021879 50368 -rw-r--r-- 1 pkearns staff 25788416 22 Sep 21:12 /var/folders/n8/fcgpxxl55dgdwz6fphbd5f_40000gn/C/com.apple.appstore/408981381/mzm.ivjumeji.pkg
vger.local:~(85)+>- sudo ls -l /var/folders/n8/fcgpxxl55dgdwz6fphbd5f_40000gn/C/com.apple.appstore/408981381/
total 57368
-rw-r--r-- 1 pkearns staff 410345 22 Sep 16:00 flyingIcon
-rw-r--r-- 1 pkearns staff 28934144 22 Sep 21:12 mzm.ivjumeji.pkg
-rw-r--r-- 1 pkearns staff 13466 22 Sep 15:59 preflight.pfpkg
-rw-r--r-- 1 pkearns staff 4642 22 Sep 15:59 receipt
vger.local:~(86)+>-

It will be there only during the download.

Objective-C for C programmers

Anyone from a C background like me will find parts of Objective-C—Apple’s primary development language—nigh-on indecipherable.  It’s an amalgam of C and Smalltalk which appeared in 1983.  It seems to have been developed around the same time as C++, but for my money Bjarne Stroustrup et al did a better job.  Maybe I’m comparing Apples and Oranges (apologies for the pun).

Fortunately a helpful man by the name of Tristan O’Tierney has written a very good introduction to Objective-C for C programmers.  However he doesn’t mention how to compile Objective-C using the command line.

Compiling Objective-C from the command line

I’m afraid I’m the computer equivalent of a survivalist and always like to know where possible how to revert to the simplest form.  Here’s a Scripting Bridge example which is derived from code on Stack Overflow.

#import <Foundation/Foundation.h>
#import <ScriptingBridge/ScriptingBridge.h>
#import <stdio.h>
// Finder.h is generated with:
//   sdef /System/Library/CoreServices/Finder.app | sdp -fh --basename Finder
#import "Finder.h"

int main(int argc,const char *argv[])
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    FinderApplication *finder = [SBApplication applicationWithBundleIdentifier:@"com.apple.finder"];

    SBElementArray *windows = [finder windows]; // array of Finder windows
    NSArray *targetArray = [windows arrayByApplyingSelector:@selector(target)]; // array of targets of the windows

    // objectAtIndex:0 raises an exception on an empty array, so check first
    if ([targetArray count] == 0) {
        fprintf(stderr,"No Finder windows are open\n");
        [pool drain];
        return 1;
    }

    // gets the first object from the targetArray, gets its URL, and converts it to a POSIX path
    NSString *newURLString = [[NSURL URLWithString: (id) [[targetArray objectAtIndex:0] URL]] path];

    printf("POSIX path for first Finder window: %s\n",[newURLString UTF8String]);

    [pool drain];
    return 0;
}

This can be compiled on the command line like this:

cc -Wall -O -l objc -framework Foundation -framework ScriptingBridge ScrBridge.m -o ScrBridge

The key parts of that command line are the framework options.  Now, I’ll concede that as applications get more complicated you need a development environment, such as Xcode, and of course it is all but impossible to do any but the most trivial graphical applications without such an environment.  But for systems programming I’ll stick with the command line for as long as possible.

Sleep command for a random amount of time

Most Unix/Linux users will be familiar with the sleep command, which you can use to delay for the specified number of seconds.  A few years ago I had need for a sleep command which would sleep for a random amount of time, so I came up with some code which as it happens is a nice example of interrupt handling in Unix/Linux.

The code I’ve written to do this random sleep can be freely used but I would like you to leave the reference to this site, http://linuxtech.ie.  It has no external dependencies, so this should compile it:

cc -Wall -O randsleep.c -o randsleep

The two options I use are to warn about any dodgy coding (-Wall) and -O for optimisation (not really an issue here!).

It is used like this:

randsleep [-v] <lower limit> <upper limit>

The -v option will echo the random time it has calculated, e.g.:

vger:~(217)+>- randsleep -v 2 7
Sleeping for 3.49 seconds
vger:~(218)+>- randsleep -v 2 7
Sleeping for 5.30 seconds
vger:~(219)+>-