
DROP TABLE failing due to table lock

When a DROP TABLE fails with a database table lock error, it is not necessarily the table being dropped that is locked.

The background

Recently I wrote some code for searching a database from a web interface, and I wanted to add an option to graph the data. The first task was to take the search data and reformat it for pChart. I settled on code that created two temporary tables and dropped them at the end, even though that is not strictly necessary since they cease to exist once the connection is closed. However, when I started using it in anger (so to speak) I got an error complaining that a table was locked. Since it was doing no harm I continued with the rest of the coding, but once I had finished the main code I couldn't bring myself to leave the DROP TABLE problem unsolved.

The investigation

Even though I have been programming for over thirty years, I am still relatively inexperienced at SQL programming, so I searched the Internet for clues. One example came quite close but didn't contain enough context; it did, however, indicate where I should be looking. The first step was to demonstrate that there was no problem with the code that created and used the temporary tables, GraphExpenditure(). This could be done by closing and re-opening the database connection just before calling the function:

$ExpHandle = null ; // Close the database connection (PDO closes when the last reference is gone)
if(($ExpHandle = ExpendConn($DBname)) == FALSE) {
    printf("Cannot open database file %s\n", $DBname) ;
}
else
    GraphExpenditure($ExpHandle, $SQL_Common, $OutputType) ;

Once I did that, the DROP TABLE error went away. This proved that the problem lay in code that existed before the graphing code was even introduced.

To debug where that problem was, I changed from using the PDO class directly to using the PDOStatement class, because that allowed me to explicitly close each cursor (with closeCursor()). There are three pieces of SQL code before graphing:

  1. SELECT SUM() of the data which was searched for.
  2. Much more elaborate SELECT SUM() of the data.
  3. Search for the data itself and display it as a table.

I discovered that if either of the first two queries lacked a closeCursor() call, the later DROP TABLE would fail.
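For illustration, here is a minimal sketch of that debugging pattern; the table and column names are hypothetical, not from my actual code:

$stmt = $DBhandle->prepare("SELECT SUM(Amount) FROM Expenditure WHERE Year = ?") ;
$stmt->execute(array(2012)) ;
$row = $stmt->fetch() ; // Fetch the single row of the result
printf("Total = %0.2f\n", $row[0]) ;
$stmt->closeCursor() ; // Explicitly release the result set and, with it, the lock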

The solution

The solution turned out to be relatively simple: just keep fetching from each query until fetch() returns FALSE. In PHP (the normal checks have been removed to make the code easier to read), the important part is the final while loop:

$CntDataQry = $DBhandle->query($CntDataSQL = "SELECT SUM(Counter) FROM SampleTable") ;
$row = $CntDataQry->fetch() ; // Fetch first (and only) row of the result
printf("SUM(Counter) = %d\n", $row[0]) ;
while($CntDataQry->fetch() != FALSE) ; // Drain the result set; this releases the lock (closeCursor() does the same)

So the table that was locked was the system table (sqlite_master, in SQLite's case). In a previous job the DBAs regularly uttered oaths about Oracle's system table, which, if it locks, will quickly bring the whole system to a grinding halt. Comparing my situation (SQLite) with Oracle is a bit like comparing a boat with an outboard motor to an ocean-going liner, but it's interesting to see that they have some things in common.

Example code

I’ve written some sample code which will demonstrate a failure to DROP TABLE, followed by its resolution.  It’s intended to be run from the command line.
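That sample code is not reproduced here, but a minimal sketch along the same lines might look like the following (it assumes PHP with the PDO SQLite driver and uses a throwaway in-memory database; on the SQLite builds I am aware of, the first DROP TABLE fails with a "database table is locked" error):

<?php
// Minimal sketch (not the original sample code): demonstrate a DROP TABLE
// failing because a cursor is still open, then fix it by draining the
// result set. Assumes the PDO SQLite driver; run from the command line.
$db = new PDO("sqlite::memory:") ;
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION) ;

$db->exec("CREATE TABLE SampleTable (Counter INTEGER)") ;
$db->exec("INSERT INTO SampleTable VALUES (1), (2), (3)") ;

$qry = $db->query("SELECT Counter FROM SampleTable") ;
$row = $qry->fetch() ; // Fetch only the first row, leaving the cursor open

try {
    $db->exec("DROP TABLE SampleTable") ; // Should fail: a cursor is still open
} catch (PDOException $e) {
    printf("DROP TABLE failed: %s\n", $e->getMessage()) ;
}

while($qry->fetch() != FALSE) ; // Drain the result set (or call $qry->closeCursor())

$db->exec("DROP TABLE SampleTable") ; // Now succeeds
printf("DROP TABLE succeeded after draining the result set\n") ;
?>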


Finding a severe resource hog on your server

Have you ever experienced the situation where a server becomes bafflingly unresponsive, to the point where even your monitoring services fail to report, and then starts responding again, sometimes after a reboot? You suspect a process went berserk, but which one? The monitoring software itself was out of action during the crisis, so it can't tell you anything.

I have come across this quite a few times in my career, most recently on clustered database servers. Sometimes these outages are so severe that even very lightweight monitoring software like sar and Xymon (formerly known as Hobbit and, before that, Big Brother) can be taken out of action. In the past I have resorted to a loop that saved the output of the top command to a file every 30 seconds, hoping to catch an event, but that is ugly for so many reasons.

SQLite to the rescue

SQLite is a serverless implementation of SQL. It is a tiny binary (less than 50 kbyte on Mac OS X) and it stores its data in a single file. The way SQLite helps us here is that it lets us store the monitoring data and then analyse and/or trim it using all sorts of criteria.

So first create the database file:

sqlite3 storetops.sqlite3

That will give you a sqlite> prompt at which you can type this:

CREATE TABLE savetop (
HostName TEXT, -- So we know which host this data came from
DateAndTime TEXT,
Load REAL,
TopProcesses TEXT
) ;

Type .quit to exit. (SQL professionals will balk at my not using the DATETIME type, but SQLite has no such storage class; dates are stored as TEXT here.) Next you want a simple script which writes a row to that table at intervals of, say, 30 seconds. I've written a basic performance-monitoring script which you can use on Linux or Mac OS X.
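That script is not reproduced here, but a minimal sketch of the idea might look like this (the database and table names match the example above; the load-average parsing and the top invocations are assumptions that may need adjusting for your platform):

#!/bin/sh
# Minimal sketch of a collector: every 30 seconds, record the hostname,
# timestamp, 1-minute load average and a snapshot of top into SQLite.
DB=storetops.sqlite3
HOST=`hostname -s`
while true ; do
    NOW=`date '+%Y-%m-%d %H:%M:%S'`
    # 1-minute load average, parsed from uptime (Linux and Mac OS X)
    LOAD=`uptime | sed 's/.*load average[s]*: *//' | awk '{print $1}' | tr -d ,`
    # Snapshot of top in batch mode: Linux uses -b -n 1, Mac OS X uses -l 1
    TOPS=`top -b -n 1 2>/dev/null || top -l 1`
    # Keep the first 20 lines and escape single quotes for SQL
    TOPS=`echo "$TOPS" | head -20 | sed "s/'/''/g"`
    sqlite3 "$DB" "INSERT INTO savetop VALUES ('$HOST','$NOW',$LOAD,'$TOPS');"
    sleep 30
done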

Here is a sample of data I collected on my Mac:

sqlite> SELECT HostName,DateAndTime,Load FROM savetop ;
vger|2012-09-03 19:51:30|0.93
vger|2012-09-03 19:51:43|1.3
vger|2012-09-03 19:51:55|1.17
vger|2012-09-03 19:52:08|1.79
vger|2012-09-03 19:52:20|1.66
vger|2012-09-03 19:52:33|1.44
vger|2012-09-03 19:52:45|1.22
vger|2012-09-03 19:52:57|1.34
vger|2012-09-03 19:53:10|1.36
vger|2012-09-03 19:53:22|1.23
vger|2012-09-03 22:10:11|1.06
vger|2012-09-03 22:10:24|1.59
vger|2012-09-03 22:10:36|1.46
vger|2012-09-03 22:10:49|1.24
vger|2012-09-03 22:11:01|1.2
vger|2012-09-03 22:11:13|1.01
vger|2012-09-03 22:11:26|1.38
vger|2012-09-03 22:11:38|1.48
vger|2012-09-03 22:11:51|1.33
vger|2012-09-03 22:12:03|1.71
sqlite>

If there is an item of interest, I can examine its top processes:

SELECT TopProcesses FROM savetop WHERE DateAndTime='2012-09-03 22:10:36' ;

Suppose I want to count the number of events where the load was greater than 1.4:

sqlite> SELECT COUNT(HostName) FROM savetop WHERE Load>1.4 ;
7
sqlite>

Of course this is artificial (normally we are looking at much higher loads) but it illustrates the advantage of this approach. If you are monitoring over a long period it's likely the SQLite file will get very large, but that is also very easy to remedy (run VACUUM after the DELETE if you also want the file itself to shrink on disk):

sqlite> SELECT COUNT(HostName) FROM savetop ;
20
sqlite> DELETE FROM savetop WHERE Load<1.4 ;
sqlite> SELECT COUNT(HostName) FROM savetop ;
7
sqlite>

Use of load rather than CPU usage

While this is beyond the immediate scope of this post, some of you might be wondering why I am using load instead of CPU usage. Most operating systems (including Windows) have built-in strategies for handling CPU hogs; for instance, in Linux and Unix the process priority of a CPU hog is automatically downgraded. The result is that a server whose CPU is flat out can still be quite usable. Load, which represents the number of processes waiting to execute, is a much more reliable indicator of a server in distress. High load can be caused by too many processes competing for the CPUs and/or by I/O delays, which in turn can be caused by busy or slow disks. On Solaris and Windows you can determine whether a server is CPU-bound by checking the percentage of time runnable processes (those not waiting for I/O or sleeping for other reasons) spend waiting for a CPU; if this is higher than, say, 5% then the server is CPU-bound.
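As a rough illustration of using load this way, here is a Linux-specific sketch; comparing the load against the CPU count, and the implied threshold, are my assumptions rather than a universal rule:

#!/bin/sh
# Rough sketch: flag the server when the 1-minute load average exceeds
# the number of CPUs. Linux-specific (/proc/loadavg and nproc).
LOAD1=`cut -d' ' -f1 /proc/loadavg`
CPUS=`nproc`
if [ `echo "$LOAD1 > $CPUS" | bc` -eq 1 ] ; then
    echo "High load: $LOAD1 with only $CPUS CPUs"
fi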