![]() |
![]() |
Home / Documentation / 1.0 / mod_perl 1.0 User Guide / | ![]() |
|
![]() |
||||
![]() |
![]() |
|||
![]() |
![]() |
|||
![]() |
||||
![]() |
![]() |
|||
![]() |
Debugging mod_perl | ![]() |
||
![]() |
||||
![]() |
![]() |
![]() |
||
![]() |
||||
![]() |
||||
![]() |
![]() |
![]() |
![]() |
||
![]() |
||
![]() |
![]() |
![]() |
![]() |
||
![]() |
||
![]() |
![]() |
![]() |
![]() |
||
![]() |
||
![]() |
![]() |
![]() |
![]() |
||
![]() |
||
![]() |
![]() |
![]() |
![]() |
||
![]() |
||
![]() |
![]() |
![]() |
![]() |
||
![]() |
||
![]() |
||
![]() |
||
![]() |
||
![]() |
||
![]() |
||
|
|
||
![]() |
||
Tired of Internal Server Errors? Find out how to debug your mod_perl applications, thanks to a number of features of Perl and mod_perl.
Let's talk first about things that bother most web (and non-web) programmers. The bothering things are warning and errors reported by Perl. We are going to learn how to take the best out of both, by turning this obvious to the newbie programmer enemies into our best friends.
You have just installed this new CGI script and when you try it out you see the grey screen of death saying "Internal Server Error"... Or even worse you have a script running on a production server for a long time without problems, when the same grey screen starts to show up occasionally for no apparent reason.
How can we find out what the problem is?
First problem:
You have been coding in Perl for years, and whenever an error occurred in the past it was displayed in the same terminal window that you started the script from. But when you work with a webserver there is no terminal to show you the errors, since the server in most cases has no terminal to send the error messages to.
Actually, the error messages don't disappear, they end up in the
error_log file. It is located in the directory specified by the
ErrorLog directive in httpd.conf. The default setting is
generally:
ErrorLog /usr/local/apache/logs/error_log
So whenever you see "Internal Server Error" it's time to look at this file.
First problem solved!
There are cases when errors don't go to the error_log file. For example, some errors go to the httpd process' STDERR. If you haven't redirected httpd's STDERR then the messages are printed to the console (tty, terminal) from which you executed the httpd. This happens when the server didn't get as far as opening the error_log file for writing before it needed to write an error message.
For example, if you have entered a non-existent directory path in your
ErrorLog directive, the error message will be printed to STDERR.
If the error happens when the server executes a PerlRequire or
PerlModule directive you might also see output sent to STDERR.
You are probably wondering where all the errors go when you are
running the server in single process mode (httpd -X). They go to
STDERR. This is because the error logging for all the httpd children
is normally done by the parent httpd. When httpd runs in single
process mode, it has no parent httpd process to perform all the
logging. The output to the terminal includes all the status messages
that normally go to the error_log file.
Finally with a PerlLogHandler you can take away from Apache its
control of the error logging process for all HTTP transactions. If
you do this, then you are responsible for generating and storing the
error messages. You can do whatever you like with the information,
(including throwing it away -- don't do it!) and, depending on how you
implement you LogHandler, the ErrorLog directive may have no
effect. But you can also do something at this handler and then return
DECLINED status, so the default Apache LogHandler will do the work
as usual.
Second problem:
The usefulness of the error message depends to some extent on the programmer's coding style. An uninformative message might not help you to spot and fix the error.
For example, let's take a function which opens a file passed to it as a parameter. It does nothing else with the file. Here's our first version of the code:
my $r = shift;
$r->send_http_header('text/plain');
sub open_file{
my $filename = shift || '';
die "No filename passed!" unless $filename;
open FILE, $filename or die;
}
open_file("/tmp/test.txt");
Let's assume that /tmp/test.txt doesn't exist so the open() will
fail to open the file. When we call this script from our browser, the
browser returns an "internal error" message and we see the
following error appended to error_log:
Died at /home/httpd/perl/test.pl line 9.
We can use the hint Perl kindly gave to us to find where in the code the die() was called. However, we still don't know what filename was passed to this subroutine to cause the program termination.
If we have only one function call as in the example above, the task of finding the problematic filename will be trivial. Now let's add two more open_file() function calls and assume that among the three files only /tmp/test2.txt exists:
open_file("/tmp/test.txt");
open_file("/tmp/test2.txt");
open_file("/tmp/test3.txt");
When you execute the above call, you will see the same error message twice:
Died at /home/httpd/perl/test.pl line 9. Died at /home/httpd/perl/test.pl line 9.
Based on this error message, can you tell what files your program failed to open? Probably not. Let's fix it by passing the name of the file to die():
sub open_file{
my $filename = shift || '';
die "No filename passed!" unless $filename;
open FILE, $filename or die "failed to open $filename";
}
open_file("/tmp/test.txt");
When we execute the above code, we see:
failed to open /tmp/test.txt at /home/httpd/perl/test.pl line 9.
which makes a big difference.
By the way, if you append a newline to the end of the message you pass to die(), Perl won't report the line number the error has happened at, so if you code:
open FILE, $filename or die "failed to open a file\n";
The error message will be:
failed to open a file
Which gives you very little to go on. It's very hard to debug with such uninformative error messages.
The warn() function, a kinder sister of die(), which logs the message but doesn't cause program termination, behaves in the same way. If you add a newline to the end of the message, the line number warn() was called at won't be logged, otherwise it will.
You might want to use warn() instead of die() if the failure isn't critical. Consider the following code:
if(open FILE, $filename){
# do something with file
} else {
warn "failed to open $filename";
}
# more code here...
Now we've improved our code, by reporting the names of the problematic
files, but we still don't know the reason for the failure. Let's try
to improve the warn() example. The -r operator tests whether the
file is readable:
if(-r $filename){
open FILE, $filename;
# do something with file
} else {
warn "Couldn't open $filename - doesn't exist or is not readable";
}
Now if we cannot read the file we do not even try to open it. But we still see a warning in error_log:
Couldn't open /tmp/test.txt - doesn't exist or is not readable at /home/httpd/perl/test.pl line 9.
The warning tells us the reason for the failure, so we don't have to go to the code and check what it was trying to do with the file.
It could be quite a coding overhead to explain all the possible
failure reasons that way, but why reinvent the wheel? We already have
the reason for the failure stored in the $! variable. Let's go
back to the open_file() function:
sub open_file{
my $filename = shift || '';
die "No filename passed!" unless $filename;
open FILE, $filename or die "failed to open $filename: $!";
}
open_file("/tmp/test.txt");
This time, if open() fails we see:
failed to open /tmp/test.txt: No such file or directory at /home/httpd/perl/test.pl line 9.
Now we have all the information we need to debug these problems: we
know what line of code triggered die(), we know what file we were
trying to open, and last but not least we know the reason, given to us
through Perl's $! variable.
Now let's create the file /tmp/test.txt.
% touch /tmp/test.txt
When we execute the latest version of the code, we see:
failed to open /tmp/test.txt: Permission denied at /home/httpd/perl/test.pl line 9.
Here we see a different reason: we created a file that doesn't belong to the user which the server runs as (usually nobody). It does not have permission to read the file.
Now you can see that it's much easier to debug your code if you validate the return values of the system calls, and properly code arguments to die() and warn() calls. The open() function is just one of the many system calls perl provides to your convenience.
So now you can code and debug CGI scripts and modules as easily as if they were plain Perl scripts that you execute from a shell.
Second problem solved!
It's a good idea to keep it open all the time in a dedicated terminal with the help of tail -f or less -S, whichever you prefer (the latter allows you to page around the file, search etc.)
% tail -f /usr/local/apache/logs/error_log
or
% less -S /usr/local/apache/logs/error_log
So you will see all the errors and warning as they happen.
Another tip is to create a shell alias, to make it easier to execute the above command. In tcsh you would do something like this:
% alias err "tail -f /usr/local/apache/logs/error_log"
For bash users the command is:
% alias err='tail -f /var/log/apache/error_log'
and from now on in the shell you set the alias in, executing
% err
will call tail -f /usr/local/apache/logs/error_log. Since you want this alias to be available to you all the time, you should put it into your .tcshrc file or its equivalent. For bash users this is .bashrc, or you can put it in /etc/profile for use by all users.
If you cannot access your error_log file because you are unable to telnet to your machine (generally the case when an ISP provides user CGI support but no telnet access), you might want to use a CGI script I wrote to fetch the latest lines from the file (with a bonus of colored output for easier reading). You might need to ask your ISP to install this script for general use. See Watching the error_log file without telneting to the server .
Just like errors, Perl's mandatory warnings go to the error_log file, if the they are enabled. Of course you have enabled them in your development server, haven't you?
The code you write lives a dual life. In the first life it's being written, tested, debugged, improved, tested, debugged, rewritten, tested, debugged. In the second life it's just used.
A significant part of the script's first life is spent on the developer's machine. The other part is spent on the production server where the creature is supposed to be perfect.
So when you develop the code you want all the help in the world to help you spot possible problems, and that's where enabling warnings is a must. Whenever you see an error or warning in the error_log, you want to get rid of it. That's very important.
Why?
If there are warnings, your code is not clean. If they are waved away, expect them to come back on the production server in the form of errors, when it's too late.
If each invocation of a script generates more than about five lines of warnings, it will be very hard to catch real problems. You just can't see them among all the other warnings which you used to think were unimportant.
On the other hand, on a production server, you really want to turn warnings off. And there are good reasons for that:
There is no added value in having the same warning showing up, again and again, triggered by thousands of script invocations. If your code isn't very clean and generates even a single warning per script invocation, on the heavily loaded server you will end up with a huge error_log file in a short time.
The warning elimination phase is supposed to be a part of the development process, and should be done before the code goes live.
In any Perl script, not just under mod_perl, enabling runtime warnings has a performance impact.
mod_perl gives you a very simple solution to this warnings saga, don't enable warnings in the scripts unless you really have to. Let mod_perl control this mode globally. All you need to do is put the directive
PerlWarn On
in httpd.conf on your development machine and the directive
PerlWarn Off
on the live box. Here is a complete description on how to manipulate warning modes under mod_perl.
If there is a piece of code that generates warnings and you want to
disable them only in this code, you can do that too. The Perl special
variable $^W allows you dynamically to turn on and off warnings
mode. So just put the code into a block, and disable the warnings in
the scope of this block. The original value of $^W will be restored
upon exit from the block.
{
local $^W=0;
# some code that generates innocuous warnings
}
Unless you have a really good reason, for your own sake the advice is avoid this technique.
Don't forget the local() operand! If you do, setting $^W will
affect all the requests handled by the Apache child that changed
this variable. And for all the scripts it executes, not just the
one which changed $^W!
The diagnostics pragma can shed more light on errors and warnings,
as you will see in a moment.
This module extends the terse diagnostics normally emitted by both the
Perl compiler and the Perl interpreter, augmenting them with the more
verbose and endearing descriptions found in the perldiag manpage.
Like the other pragmata, it affects the compilation phase of your
scripts as well as the execution phase.
To use in your program as a pragma, merely invoke
use diagnostics;
at or near the start of your program. This also turns on -w mode.
This pragma is especially useful when you are new to perl, and want a better explanation of the errors and warnings. It's also helpful when you encounter some warning you've never seen before, e.g. when a new warning has been introduced in an upgraded version of Perl.
You may not want to leave diagnostics mode On for your production
server. For each warning, diagnostics mode generates ten times
more output than warnings mode. If your code generates warnings, with
the diagnostics pragma you will use disk space much faster.
diagnostics mode adds a large performance overhead in comparison
with just having warnings mode On. You can see the benchmark results
in the section 'Code Profiling Techniques'.
When a user presses a STOP or RELOAD button, the current socket connection goes broken (aborted). It would be nice if Apache could always immediately detect this event. Unfortunately there is no way to tell whether the connection is still valid unless an attempt to read from or write to connection is made.
Unfortunately the detection technique we are going to present doesn't work if the connection to the back-end mod_perl server is coming from the front-end mod_proxy, as the latter doesn't break the connection to the back-end when user has aborted the connection.
If the reading of the request's data is completed and the code does
processing without writing anything back to the client the broken
connection won't be noticed. When an attempt is made to send at least
one character to the client, the broken connection would be noticed
and the SIGPIPE signal (Broken pipe) would be sent to the
process. The program could then halt its execution and perform all the
cleanup stuff it has to do.
Prior to Apache version 1.3.6, SIGPIPE was handled by Apache.
Currently Apache is not handling SIGPIPE anymore and mod_perl takes
care of it.
Under mod_perl, $r->print (or just print()) returns a true
value on success, a false value on failure. The latter usually
happens when the connection is broken.
If you want a similar to the old SIGPIPE behaviour (as it was
before Apache version 1.3.6), add the following configuration
directive:
PerlFixupHandler Apache::SIG
When Apache's SIGPIPE handler is used, Perl may be left in the
middle of it's eval context, causing bizarre errors during subsequent
requests are handled by that child. When Apache::SIG is used, it
installs a different SIGPIPE handler which rewinds the context to
make sure Perl is back to normal state, preventing these bizarre
errors.
But in general case, you don't need to use the above setting.
If you use this setting and you would like to log when a request was
canceled by a SIGPIPE in your Apache access_log, you must define a
custom LogFormat in your httpd.conf, like so:
PerlFixupHandler Apache::SIG
LogFormat "%h %l %u %t \"%r\" %s %b %{SIGPIPE}e"
If the server has noticed that the request was canceled via a
SIGPIPE, then the log line will end with 1, otherwise it will
just be a dash. e.g.:
127.0.0.1 - - [09/Jan/2001:10:27:15 +0100] "GET /perl/stopping_detector.pl HTTP/1.0" 200 16 1 127.0.0.1 - - [09/Jan/2001:10:28:18 +0100] "GET /perl/test.pl HTTP/1.0" 200 10 -
Let's use the knowledge we have acquired to trace the execution of the code and watch all the events as they happen.
Let's take a little script that obviously "hangs" the server process:
stopping_detector.pl
--------------------
my $r = shift;
$r->send_http_header('text/plain');
print "PID = $$\n";
$r->rflush;
while(1){
$i++;
sleep 1;
}
The script gets a request object $r by shift()ing it from the @_
argument list passed by the handler() subroutine. (This magic is done
by Apache::Registry). Then the script sends a Content-type
header, telling the client that we are going to send a plain text as a
response.
Next the script prints out a single line telling us the id of the process that handles this request, which we need to know in order to run the tracing utility. Then we flush Apache's buffer. (If we don't flush the buffer we will never see this short information printed. That's because our output is shorter than the buffer size and the script intentionally hangs, so the buffer won't be auto-flushed as the script hangs at the end.)
Then we enter an infinite while(1) loop, which just increments a
dummy variable and sleeps for a second.
Running strace -p PID, where PID is the process ID as printed to
the browser, we see the following output printed every second:
SYS_175(0, 0xbffff41c, 0xbffff39c, 0x8, 0) = 0
SYS_174(0x11, 0, 0xbffff1a0, 0x8, 0x11) = 0
SYS_175(0x2, 0xbffff39c, 0, 0x8, 0x2) = 0
nanosleep(0xbffff308, 0xbffff308,
0x401a61b4, 0xbffff308, 0xbffff41c) = 0
time([941281947]) = 941281947
time([941281947]) = 941281947
Let's leave strace running and press the STOP button. Did
anything change? No, the same system calls trace is printed every
second. Which means that Apache didn't detect the broken connection.
Now we are going to write the \0 (NULL) character to the client in
attempt to detect the broken connection as close as possible to the
time the Stop button is pressed at. Therefore we modify the loop
code in the following way:
while(1){
$r->print("\0");
last if $r->connection->aborted;
$i++;
sleep 1;
}
We add a print() statement to print a NULL character and then we check
whether the connection was aborted with help of the
$r->connection->aborted method. If the connection is
broken, we break out of the loop.
We run this script and strace on it as before, but we see that it still doesn't work. The trouble is we aren't flushing the buffer, which leaves the characters in the buffer and they won't be printed before the buffer will get full and will be autoflushed. Since we want to attempt to write to the connection pipe all the time, after printing the NULL, we add $r->rflush(). Here is a new version of the code:
stopping_detector2.pl
---------------------
my $r = shift;
$r->send_http_header('text/plain');
print "PID = $$\n";
$r->rflush;
while(1){
$r->print("\0");
$r->rflush;
last if $r->connection->aborted;
$i++;
sleep 1;
}
After starting the strace utility on the running process as we did
before and pressing the Stop button, we have seen the following
output.
SYS_175(0, 0xbffff41c, 0xbffff39c, 0x8, 0) = 0
SYS_174(0x11, 0, 0xbffff1a0, 0x8, 0x11) = 0
SYS_175(0x2, 0xbffff39c, 0, 0x8, 0x2) = 0
nanosleep(0xbffff308, 0xbffff308, 0x401a61b4, 0xbffff308, 0xbffff41c) = 0
time([941284358]) = 941284358
write(4, "\0", 1) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) ---
select(5, [4], NULL, NULL, {0, 0}) = 1 (in [4], left {0, 0})
time(NULL) = 941284358
write(17, "127.0.0.1 - - [30/Oct/1999:13:52"..., 81) = 81
gettimeofday({941284359, 39113}, NULL) = 0
times({tms_utime=9, tms_stime=8, tms_cutime=0, tms_cstime=0}) = 41551400
close(4) = 0
SYS_174(0xa, 0xbffff4e0, 0xbffff454, 0x8, 0xa) = 0
SYS_174(0xe, 0xbffff46c, 0xbffff3e0, 0x8, 0xe) = 0
fcntl(18, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}
Apache detects the broken pipe as you see from this snippet:
write(4, "\0", 1) = -1 EPIPE (Broken pipe) --- SIGPIPE (Broken pipe) ---
Then it stops the script and does all the cleanup work, like access logging:
write(17, "127.0.0.1 - - [30/Oct/1999:13:52"..., 81) = 81
where 17 is a file descriptor of the opened access_log file
Cleanup code is a critical issue with aborted scripts.
What happens to locked resources if there are any? Will they be freed or not? If not, scripts using these resources and the same locking scheme will hang, waiting for them to be freed.
First let's go one step back and recall what are the problems and solutions for this issue under mod_cgi.
Under mod_cgi the resource locking issue is a problem only if you happened to create external lock files and use them for lock indication, instead of using flock(). If the script running under mod_cgi is aborted between the lock and the unlock code, and you didn't bother to write cleanup code to remove old dead locks then you are in big trouble.
The solution is to use an END block to place the cleanup code in:
END {
# some code that ensures that locks are removed
}
When the script is aborted, Apache will run the END blocks.
If you use flock() things are much simpler, since all opened files
will be closed when the script exits. When the file is closed, the
lock is removed as well--all the locked resources get freed. There
are systems where flock(2) is unavailable, and for those you can use
Perl's emulation of this function.
With mod_perl things can be more complex when you use global variables as a filehandlers. Because the processes don't exit after processing a request, files won't be closed unless you explicitly close() them or reopen with the open() call, which first closes a file. Let's see what problems we might encounter, and possible solutions for them.
First we want to make a little detour to discuss the "critical section" issue.
Let's start with a resource locking scheme. A schematic representation of a proper locking technique is as follows:
1. lock a resource
<critical section starts>
2. do something with the resource
<critical section ends>
3. unlock the resource
If the locking is exclusive, only one process can hold the resource at any given time, which means that all the other processes will have to wait, therefore the code between the locking and unlocking functions can become a service bottleneck. That's why this code section is called critical and once started it should be finished as soon as possible.
Even if you use a shared locking scheme, where many processes are allowed to concurrently access the resource, if there are processes that sometimes want to get an exclusive lock it's also important to keep the critical section as short as possible.
The next example uses a shared lock, but has a poorly-designed critical section:
critical_section_sh.pl
-------------------
use Fcntl qw(:flock);
use Symbol;
my $fh = gensym;
open $fh, "/tmp/foo" or die $!;
flock $fh, LOCK_SH;
# start critical section
seek $fh, 0, 0;
my @lines = <$fh>;
for(@lines){
print if /foo/;
}
# end critical section
close $fh; # close unlocks the file
The code opens the file for reading, locks and rewinds it to the beginning, reads all the lines from the file and prints out the lines that contain the string 'foo'.
The gensym() function imported by the Symbol module creates an
anonymous glob and returns a reference to it. Such a glob reference
can be used as a file or directory handle. and therefore allows using
lexically scoped variables as filehandlers. Fcntl imports into the
script's namespace file locking symbols like: LOCK_SH, LOCK_EX
and more. Refer to the Fcntl manpage for more information.
If the file the script reads is big, it'd take a relatively long time for this code to complete. All this time the file remains open and locked. While it's other processes may access this file for reading (shared lock), the process that wants to modify the file (which requires an acquisition of the exclusive lock), will be blocked waiting for this section to complete.
We can optimize the critical section this way:
Once the file has been read, we have all the information we need from it. In order to make the example simpler we've chosen to just print out the matching lines. In reality the code might be much longer.
We don't need the file to be open while the loop executes, because we don't access it inside the loop. If we close the file before we start the loop, we will allow other processes to have an exclusive access to the file if they need it, instead of blocking them for no reason.
In the following corrected version of the previous example, we only read the content of the file during the critical section and process it afterwards, without creating a possible bottleneck.
critical_section_sh2.pl
--------------------
use Fcntl qw(:flock);
use Symbol;
my $fh = gensym;
open $fh, "/tmp/foo" or die $!;
flock $fh, LOCK_SH;
# start critical section
seek $fh, 0, 0;
my @lines = <$fh>;
# end critical section
close $fh; # close unlocks the file
for(@lines){
print if /foo/;
}
Here is another similar example, but now it uses an exclusive lock. The script reads in a file and writes it back, adding a number of new text lines to the head of the file.
critical_section_ex.pl
--------------------
use Fcntl qw(:flock);
use Symbol;
my $fh = gensym;
open $fh, "+>>/tmp/foo" or die $!;
flock $fh, LOCK_EX;
# start critical section
seek $fh, 0, 0;
my @add_lines =
(
qq{Complete documentation for Perl, including FAQ lists,\n},
qq{should be found on this system using `man perl' or\n},
qq{`perldoc perl'. If you have access to the Internet, point\n},
qq{your browser at http://www.perl.com/, the Perl Home Page.\n},
);
my @lines = (@add_lines, <$fh>);
seek $fh, 0, 0;
truncate $fh, 0;
print $fh @lines;
# end critical section
close $fh; # close unlocks the file
Since we want to read the file, modify and write it back, without anyone else changing it on the way, we open it for read and write with the help of +>> and lock it with an exclusive lock. You cannot safely accomplish this task by opening the file first for read and then reopening for write, since another process might change the file between the two events. (You could get away with +< as well, please refer to the perlfunc manpage for more information about the open() function.)
Next, the code prepares the lines of text it wants to add to the
head of the file, and assigns them and the content of the file to the
@lines array. Now we have our data ready to be written back to the
file, so we seek() to the start of the file and truncate() it to zero
size. In our example the file always grows, so in this case there is
no need to truncate it, but if there was a chance that the file might
shrink then truncating would be necessary. However it's good practice
to always use truncate(), as you never know what changes your code
might undergo in the future. The truncate() operation does not carry
any significant performance penalty. Finally we write the data back
to the file and close it, which unlocks it as well.
Did you notice that we created the text lines to be added as close to the place of usage as possible? This complies with good "locality of code" style, but it makes the critical section longer. In such cases you should sacrifice style, in order to make the critical section as short as possible. An improved version of this script with a shorter critical section looks like this:
critical_section_ex2.pl
--------------------
use Fcntl qw(:flock);
use Symbol;
my @lines =
(
qq{Complete documentation for Perl, including FAQ lists,\n},
qq{should be found on this system using `man perl' or\n},
qq{`perldoc perl'. If you have access to the Internet, point\n},
qq{your browser at http://www.perl.com/, the Perl Home Page.\n},
);
my $fh = gensym;
open $fh, "+>>/tmp/foo" or die $!;
flock $fh, LOCK_EX;
# start critical section
seek $fh, 0, 0;
push @lines, <$fh>;
seek $fh, 0, 0;
truncate $fh, 0;
print $fh @lines;
# end critical section
close $fh; # close unlocks the file
There are two important differences. First, we prepare the text lines
to be added before the file is locked. Second, instead of
creating a new array and copying lines from one array to another, we
append the file directly to the @lines array.
Let's get back to the main issue of this section, which is safe resource locking.
Unless you use the Apache::PerlRun handler that does the cleanup
for you, if you don't make a habit of closing all the files that you
open--in some cases you will encounter lots of problems. If you open
a file but don't close it, you may have file descriptor leakage.
Since the number of file descriptors available to you is finite, at
some point you may run out of them and your service will fail. This
is bad, but you can live with it until you run out of file descriptors
(which will happen much faster on a heavily used server).
You can use system utilities to observe the opened and locked files, as well as the processes that has opened (and locked) the files. On FreeBSD you would use the fstat(1) utility. On many other UN*X flavors the lsof(1) utility is available.
But this is nothing compared to the trouble you will give yourself if the code terminates and the file stays locked. Any other process requesting a lock on the same file (or resource) will wait indefinitely for it to become unlocked. Since this will not happen until the server reboots, all these processes trying to use this resource will hang.
Here is an example of such a terrible mistake:
flock.pl
--------
use Fcntl qw(:flock);
open IN, "+>>filename" or die "$!";
flock IN, LOCK_EX;
# do something
# quit without closing and unlocking the file
Is this safe code? No - we forgot to close the file. So let's add the close():
flock2.pl
---------
use Fcntl qw(:flock);
open IN, "+>>filename" or die "$!";
flock IN, LOCK_EX;
# do something
close IN;
Is it safe code now? Unfortunately it is not. There is a chance that
the user may abort the request (for example by pressing his browser's
Stop or Reload buttons) during the critical section. The script
will be aborted before it has had a chance to close() the file, which
is just as bad as if we forgot to close it.
In fact if the same process will run the same code again, an open()
call will close the file first, which will unlock the resource. This
is because IN is a global variable. But it's quite possible that
the process that created the lock, will not serve the same request for
a while, since it would be busy serving other requests. So relying on
it to reopen the file is a bad idea.
This problem happens only if you use global variables as file handles. The following example has the same problem.
flock3.pl
---------
use Fcntl qw(:flock);
use Symbol ();
use vars qw($fh);
$fh = Symbol::gensym();
open $fh, "+>>filename" or die "$!";
flock $fh, LOCK_EX;
# do something
close $fh;
$fh is still a global variable and therefore the code using it
suffers from the same problem.
The simplest solution to this problem is to always use lexically scoped variables (created with my ()). Whether script gets aborted before close() is called or you forgot the use close() the lexically scoped variable will always go out of scope and therefore if the file was locked it will be unlocked. Here is a good version of the code:
flock4.pl
---------
use Fcntl qw(:flock);
use Symbol ();
my $fh = Symbol::gensym();
open $fh, "+>>filename" or die "$!";
flock $fh, LOCK_EX;
# do something
close $fh;
Please don't conclude from this example that you don't have to close files anymore, since they will be automatically closed for you. It's a bad style and should be avoided.
mod_perl comes with its own implementation of gensym(), so you don't
even need to load the Symbol module in order to use this function. In
mod_perl this function resides in the Apache package. For example:
use Apache; my $fh = Apache::gensym(); open $fh, "+>>filename" or die "$!"; ...
If you insist on using the file globs, at least make sure that you local()'ize these, and then if the flow of the code is interrupted before close() was called the filehandle will be automatically closed, since the local()'ized variable will go out of the scope. The following example shows that the file is indeed closed even when there is no close():
/tmp/io.pl
--------------
#!/usr/bin/perl
# /dev/null so strace output is more readable
open my $fh, ">/dev/null";
select $fh;
$| = 1;
{
print "enter";
local *FH;
open FH, $0;
print "leave"
}
print "done";
This simple script opens the /dev/null and tells Perl to send all
the STDOUT there, which is also made unbuffered. Then the block is
created in which the FH file glob is localized. Then it's used to
open the source code of the script (which resides in $0). In order
to separate event of entering the block scope and leaving it, the
debug print statements are used. Now let's run the script under
strace(1), which proves once again to be very useful in the tool bag
of the mod_perl programmer:
% strace /tmp/io.pl
write(3, "enter", 5) = 5
-> open("/tmp/io.pl", O_RDONLY) = 4
fstat(4, {st_mode=S_ISGID|S_ISVTX|0401, st_size=0, ...}) = 0
fcntl(4, F_SETFD, FD_CLOEXEC) = 0
write(3, "leave", 5) = 5
-> close(4) = 0
write(3, "done", 4) = 4
So you can see that /tmp/io.pl is actually close()'d.
Under Perl version 5.6 Symbol.pm-like functionality is a built-in
feature, so you can do:
open my $fh, ">/tmp/foo" or die $!;
and $fh will be automatically vivified as a valid filehandle, so
you don't need to use the Symbol module anymore, if backward
compatibility is not a requirement.
You can also use the IO::* modules, such as IO::File or
IO::Dir. These are much bigger than the Symbol module, and
worth using for files or directories only if you are already using
them for the other features which they provide. As a matter of fact,
these modules use the Symbol module themselves. Here is an example
of their usage:
use IO::File;
use IO::Dir;
my $fh = IO::File->new(">filename");
my $dh = IO::Dir->new("dirname");
If you still have to use global filehandles, there are a few approaches we can take to solving the locking problem.
If you are running under Apache::Registry and friends, the END
block will perform the cleanup work for you. You might use END in
the same way for scripts running under mod_cgi, or in plain Perl
scripts. Just add the cleanup code to this block and you are safe.
For example if you work with dbm files just like with locking it's important to flush the dbm buffers, by calling a sync() method:
END{
# make sure that the DB is flushed
$dbh->sync();
}
Normally the END blocks will not be executed after the completion
of a request, but only when an Apache child process exits, then if you
are writing your own handlers you will need to use the
register_cleanup() function to supply cleanup code similar to that
used in END blocks instead of using END blocks.
Under mod_perl, the above will work only for Apache::Registry
scripts. Otherwise execution of the END block will be postponed
until the process terminates. If you write a handler in the Perl API
use the register_cleanup() method instead. It accepts a reference
to a subroutine as an argument:
$r->register_cleanup(sub { $dbh->sync() });
Even better would be to check whether the client connection has been aborted. If you don't check, the cleanup code will always be executed and for normally terminated scripts this may not be what you want:
$r->register_cleanup(
# make sure that the DB is flushed
sub{
$dbh->sync() if Apache->request->connection->aborted();
}
);
So in the case of END block usage you would use:
END{
# make sure that the DB is flushed
$dbh->sync() if Apache->request->connection->aborted();
}
Note that if you use register_cleanup() it should be called at the
beginning of the script, or as soon as the variables you want to use
in this code become available. If you use it at the end of the
script, and the script happens to be aborted before this code is
reached, there will be no cleanup performed.
For example CGI.pm registers the cleanup subroutine in its new()
method:
sub new {
# code snipped
if ($MOD_PERL) {
Apache->request->register_cleanup(\&CGI::_reset_globals);
undef $NPH;
}
# more code snipped
}
There is another way to register a section of cleanup code for Perl
API handlers. You may use PerlCleanupHandler in the configuration
file, like this:
<Location /foo>
SetHandler perl-script
PerlHandler Apache::MyModule
PerlCleanupHandler Apache::MyModule::cleanup()
Options ExecCGI
</Location>
Apache::MyModule::cleanup() performs the cleanup, obviously.
A similar situation to Pressed Stop button disease
happens when the
browser times out the connection (is it about 2 minutes?). There are
cases when your script is about to perform a very long operation and
there is a chance that its duration will be longer than the client's
timeout. One example is database interaction, where the DB engine
hangs or needs a long time to return the results. If this is the
case, use $SIG{ALRM} to prevent the timeouts:
$timeout = 10; # seconds
eval {
local $SIG{ALRM} =
sub { die "Sorry timed out. Please try again\n" };
alarm $timeout;
... db stuff ...
alarm 0;
};
die $@ if $@;
It was recently discovered that local $SIG{'ALRM'} does not restore
the original underlying C handler. This was fixed in mod_perl
1.19_01. As a matter of fact none of the local $SIG{FOO} signals
restores the original C handler - read Debugging Signal Handlers ($SIG{FOO}) for a
debug technique and a possible workaround.
Your server is up and running, but something appears to be wrong. You want to see the numbers to tune your code or server configuration. You just want to know what's really going on inside the server.
How do you do it?
There are a few tools that allow you to look inside the server.
This is a very useful module. It lets you watch what happens to the Perl parts of the server. You can see the size of all subroutines and variables, variable dumps, lexical information, OPcode trees, and more.
You shouldn't use it on production server as it adds quite a bit of overhead for each request.
This configuration enables the Apache::Status module with its
minimum feature set. Add this to httpd.conf:
<Location /perl-status>
SetHandler perl-script
PerlHandler Apache::Status
order deny,allow
#deny from all
#allow from
</Location>
If you are going to use Apache::Status it's important to put it as
the first module in the start-up file, or in httpd.conf:
# startup.pl use Apache::Status (); use Apache::Registry (); use Apache::DBI ();
If you don't put Apache::Status before Apache::DBI, you won't
get the Apache::DBI menu entry in the status. For more about
Apache::DBI see Persistent DB Connections.
There are several variables which you can use to modify the behaviour
of Apache::Status.
This single directive will enable all of the options described below.
When you are browsing symbol tables, you can view the values of your
arrays, hashes and scalars with Data::Dumper.
With this option On and the Apache::Peek module installed,
functions and variables can be viewed in Devel::Peek style.
With this option On and the B::LexInfo module installed, subroutine
lexical variable information can be viewed.
With this option On and B::Deparse version 0.59 or higher (included
in Perl 5.005_59+), subroutines can be "deparsed".
Options can be passed to B::Deparse::new like so:
PerlSetVar StatusDeparseOptions "-p -sC"
See the B::Deparse manpage for details.
With this option On, text-based op tree graphs of subroutines can be
displayed, thanks to B::Terse.
With this option On and the B::TerseSize module installed,
text-based op tree graphs of subroutines and their size can be
displayed. See the B::TerseSize docs for more info.
With this option On and the B::TerseSize module installed,
"Memory Usage" will be added to the Apache::Status main menu. This
option is disabled by default, as it can be rather cpu intensive to
summarize memory usage for the entire server. It is strongly
suggested that this option only be used with a development server
running in -X mode, as the results will be cached.
Remember to preload B::TerseSize with:
PerlModule B::Terse
When StatusDumper (see above) is enabled, another link "OP Tree
Graph" will be present with the dump if this configuration variable
is set to On.
This requires the B module (part of the Perl compiler kit) and the
B::Graph module version 0.03 or higher to be installed along with
the `dot' program. Dot is part of the graph visualization toolkit from
AT&T: http://www.research.att.com/sw/tools/graphviz/.
WARNING: Some graphs may produce very large images, and some graphs
may produce no image if B::Graph's output is incorrect.
There is more information about Apache::Status in its manpage.
Assuming that your mod_perl server listens on port 81, fetch http://www.myserver.com:81/perl-status
Embedded Perl version 5.00502 for Apache/1.3.2 (Unix) mod_perl/1.16 process 187138, running since Thu Nov 19 09:50:33 1998
Below all the sections are links when you view them through /perl-status
Signal Handlers Enabled mod_perl Hooks PerlRequire'd Files Environment Perl Section Configuration Loaded Modules Perl Configuration ISA Tree Inheritance Tree Compiled Registry Scripts Symbol Table Dump
Let's follow, for example, PerlRequire'd Files. We see:
PerlRequire Location /home/perl/apache-startup.pl /home/perl/apache-startup.pl
From some menus you can move deeper to peek into the internals of the server, to see the values of the global variables in the packages, to see the cached scripts and modules, and much more. Just click around...
Sometimes when you fetch /perl-status and look at the Compiled
Registry Scripts you see no listing of scripts at all. This is
correct: Apache::Status shows the registry scripts compiled in the
httpd child which is serving your request for /perl-status. If the
child has not yet compiled the script you are asking for,
/perl-status will just show you the main menu.
The Status module allows a server administrator to find out how well the server is performing. An HTML page is presented that gives the current server statistics in an easily readable form. If required, given a compatible browser this page can be automatically refreshed. Another page gives a simple machine-readable list of the current server state.
This Apache module is written in C. It is compiled by default, so all you have to do to use it is enable it in your configuration file:
<Location /status>
SetHandler server-status
</Location>
For security reasons you will probably want to limit access to it. If you have installed Apache according to the instructions you will find a prepared configuration section in httpd.conf: to enable use of the mod_status module, just uncomment it.
ExtendedStatus On
<Location /status>
SetHandler server-status
order deny,allow
deny from all
allow from localhost
</Location>
You can now access server statistics by using a Web browser to access the page http://localhost/status (as long as your server recognizes localhost:).
The details given by mod_status are:
This module is covered in the section "Apache::* Modules"
When the code doesn't perform as expected, either never or just sometimes, we say that the code needs debugging. There are several levels of debugging complexity.
The basic level is when Perl terminates the program during the compilation phase, before it tries to run the resulting byte-code. This usually happens because there are syntax errors in the code, or perhaps a module is missing. Sometimes it takes quite an effort to solve these problems, since code that uses Apache CORE modules generally won't compile when executed from the shell. We will learn how to solve syntax problems in mod_perl code quite easily.
Once the program compiles and begins to run, there might be logical
problems, when the program doesn't do what you thought you had
programmed it to do. These are somewhat harder to solve, especially
when there is a lot of code to be inspected and reviewed, but it's
just a matter of time. Perl can help a lot, for example to locate
typos, when we enable warnings. For example, if you wanted to compare
two numbers, but you omitted the second '=' character so that you had
something like if $yes = 1 instead of if $yes == 1, it warns us
about the missing '='.
The next level is when the program does what it's expected to do most of the time, but occasionally misbehaves. Often you find that print() statements or the Perl debugger can help, but inspection of the code generally doesn't. Often it's quite easy to debug with print(), but sometimes typing the debug messages can become very tedious. That's where the Perl debugger comes into its own.
While print() statements always work, running the perl debugger for CGI scripts might be quite a challenge. But with the right knowledge and tools handy the debug process becomes much easier. Unfortunately there is no one easy way to debug your programs, as the debugging depends entirely on your code. It can be a nightmare to debug really complex code, but as your style matures you can learn ways to write simpler code that is easier to debug. You will probably find that when you write simpler clearer code it does not need so much debugging in the first place.
One of the most difficult cases to debug, is when the process just terminates in the middle of processing a request and dumps core. Often when there is a bug the program tries to access a memory area that doesn't belong to it. The operating system halts the process, tidies up and dumps core (it creates a file called core in the current directory of the process that was running). This is something that you rarely see with plain perl scripts, but it can easily happen if you use modules written in C or C++ and something goes wrong with them. Occasionally you will come across a bug in mod_perl itself (mod_perl is written in C), that was in a deep slumber before your code awakened it.
In the following sections we will go through in detail each of the problems presented, thoroughly discuss them and present a few techniques to solve them.
While developing code we often make syntax mistakes, like forgetting to put a comma in a list, or a semicolon at the end of a statement.
Even at the end of a {} block, where a semicolon is not required at the end of the last statement, it may be better to put one in: there is a chance that you will add more code later, and when you do you might forget to add the now required semicolon. Similarly, more items might be added later to a list; unlike many other languages, Perl has no problem when you end a list with a redundant comma.
One approach to locating syntactically incorrect code is to execute
the script from the shell with the -c flag. This tells Perl to
check the syntax but not to run the code (actually, it will execute
BEGIN, END blocks, and use() calls, because these are
considered as occurring outside the execution of your program). (Note
also that Perl 5.6.0 has introduced a new special variable, $^C,
which is set to true when perl is run with the -c flag; this
provides an opportunity to have some further control over BEGIN and
END blocks during syntax checking.) Also it's a good idea to add
the -w switch to enable warnings:
perl -cw test.pl
If there are errors in the code, Perl will report the errors, and tell you at which line numbers in your script the errors were found.
The next step is to execute the script, since in addition to syntax errors there may be run time errors. These are the errors that cause the "Internal Server Error" page when executed from a browser. With plain CGI scripts it's the same as running plain Perl scripts -- just execute them and see that they work.
The whole thing is quite different with scripts that use Apache::*
modules which can be used only from within the mod_perl server
environment. These scripts rely on other code, and an environment
which isn't available when you attempt to execute the script from the
shell. There is no Apache request object available to the code when
it is executed from the shell.
If you have a problem when using Apache::* modules, you can make a
request to the script from a browser and watch the errors and warnings
as they are logged to the error_log file. Alternatively you can
use the Apache::FakeRequest module.
Apache::FakeRequest is used to set up an empty Apache request
object that can be used for debugging. The Apache::FakeRequest
methods just set internal variables with the same names as the methods and
return the value of the internal variables. Initial values for
methods can be specified when the object is created. The print method
prints to STDOUT.
Subroutines for Apache constants are also defined so that you can use
Apache::Constants while debugging, although the values of the
constants are hard-coded rather than extracted from the Apache source
code.
Let's write a very simple module, which prints "OK" to the client's browser:
package Apache::Example;
use Apache::Constants;
sub handler{
my $r = shift;
$r->send_http_header('text/plain');
print "You are OK ", $r->get_remote_host, "\n";
return OK;
}
1;
You cannot debug this module unless you configure the server to run it, by calling its handler from somewhere. So for example you could put in httpd.conf:
<Location /ex>
SetHandler perl-script
PerlHandler Apache::Example
</Location>
Then after restarting the server you could start a browser, request the location http://localhost/ex and examine the output. Tedious, no?
But with the help of Apache::FakeRequest you can write a little
script that will emulate a request and return the output.
#!/usr/bin/perl
use Apache::FakeRequest ();
use Apache::Example ();
my $r = Apache::FakeRequest->new('get_remote_host'=>'www.foo.com');
Apache::Example::handler($r);
when you execute the script from the command line, you will see the following output:
You are OK www.foo.com
Perl has no problem with the line numbers and file names for modules
that are read from disk in the normal way, but modules that are
compiled via eval() such as Apache::Registry and Apache::PerlRun
sometimes with some versions of Perl get confused.
There is the Perl <<HEREDOC inside eval "" problem that confuses the Perl current linenumber counter, newer Perls fix this. For older Perls compiling with the experimental PERL_MARK_WHERE=1 should solve this.
Also if you happen to use line continuation \ your script, the line
numbering will be off as well, since eval eats those \'s and
combines the lines together. The solution: do not use \'s to mark
lines as continued in Perl. Unlike shell languages, Perl doesn't
require you to do that.
There are compiler directives to reset its counter to some value that you decide. You can always pepper your code with these to help you locate the problem. At the beginning of the line you could write something of the form:
#line nnn label
For example:
#line 298 myscript.pl or #line 890 some_label_to_be_used_in_the_error_message
The '#' must be in the first column, so if you cut and paste from this text you must remember to remove any leading white space.
The label is optional - the filename of the script will be used by default. This directive sets the line number of the following line, not the line the directive is on. You can use a little script to stuff every N lines of your code with these directives, but then you will have to remember to rerun this script every time you add or remove code lines. The script:
#!/usr/bin/perl
# Puts Perl line markers in a Perl program for debugging purposes.
# Also takes out old line markers.
die "No filename to process.\n" unless @ARGV;
my $filename = shift;
my $lines = 100;
open IN, $filename or die "Cannot open file: $filename: $!\n";
open OUT, ">$filename.marked"
or die "Cannot open file: $filename.marked: $!\n";
my $counter = 1;
while (<IN>) {
print OUT "#line $counter\n" unless $counter++ % $lines;
next if /^#line /;
print OUT $_;
}
close OUT;
close IN;
chmod 0755, "$filename.marked";
Another way of narrowing down the area to be searched is to move most of the code into a separate modules. This ensures that the line number will be reported correctly.
To have a complete trace of calls add:
use Carp ();
local $SIG{__WARN__} = \&Carp::cluck;
The universal debugging tool across nearly all platforms and programming languages is printf() or the equivalent output function. This can send data to the console, a file, an application window and so on. In perl we generally use the print() function. With an idea of where and when the bug is triggered, a developer can insert print() statements in the source code to examine the value of data at certain stages of execution.
However, it is rather difficult to anticipate all possible directions a program might take and what data to suspect of causing trouble. In addition, inline debugging code tends to add bloat and degrade the performance of an application and can also make the code harder to read and maintain. And you have to comment out or remove the debugging print() calls when you think that you have solved the problem. But if later you discover that you need to debug the same code again, you need at best to uncomment the debugging code lines or, at worst, to write them again from scratch.
Let's see a few examples where we use print() to debug some problem. In one of my applications I wrote a function that returns the date that was one week ago. Here it is:
print "Content-type: text/plain\r\n\r\n";
print "A week ago the date was ",date_a_week_ago(),"\n";
# return a date one week ago as a string in format: MM/DD/YYYY
####################
sub date_a_week_ago{
my @month_len = (31,28,31,30,31,30,31,31,30,31,30,31);
my ($day,$month,$year) = (localtime)[3..5];
for (my $j = 0; $j < 7; $j++) {
$day--;
if ($day == 0) {
$month--;
if ($month == 0) {
$year--;
$month = 12;
}
# there are 29 days in February in a leap year
$month_len[1] =
(($year % 4 or $year % 100 == 0) and $year % 400 )
? 28 : 29;
# set $day to be the last day of the previous month
$day = $month_len[$month - 1];
} # end of if ($day == 0)
} # end of for ($i = 0;$i < 7;$i++)
return sprintf "%02d/%02d/%04d",$month,$day,$year+1900;
}
This code is pretty straightforward. We get today's date and subtract one from the value of the day we get, updating the month and the year on the way if boundaries are being crossed (end of month, end of year). If we do it seven times in loop then at the end we should get a date that was a week ago.
Note that since locatime() returns the year as a value of
current_four_digits_format_year-1900 (which means that we don't
have a century boundary to worry about) then if we are in the middle
of the first week of the year 2000, the value of year returned by
localtime() will be 100 and not 0 as you might mistakenly
assume. So when the code does $year-- it becomes 99 and not
-1. At the end we add 1900 to get back the correct four-digit year
format. (This is all correct as long as you don't go to the years
prior to 1900)
Also note that we have to account for leap years where there are 29 days in February. For the other months we have prepared an array containing the month lengths.
Now when we run this code and check the result, we see that something
is wrong. For example, if today is 10/23/1999 we expect the above
code to print 10/16/1999. In fact it prints 09/16/1999, which
means that we have lost a month. The above code is buggy!
Let's put a few debug print() statements in the code, near the
$month variable:
sub date_a_week_ago{
my @month_len = (31,28,31,30,31,30,31,31,30,31,30,31);
my ($day,$month,$year) = (localtime)[3..5];
print "[set] month : $month\n"; # DEBUG
for (my $j = 0; $j < 7; $j++) {
$day--;
if ($day == 0) {
$month--;
if ($month == 0) {
$year--;
$month = 12;
}
print "[loop $i] month : $month\n"; # DEBUG
# there are 29 days in February in a leap year
$month_len[1] =
(($year % 4 or $year % 100 == 0) and $year % 400 )
? 28 : 29;
# set $day to be the last day of the previous month
$day = $month_len[$month - 1];
} # end of if ($day == 0)
} # end of for ($i = 0;$i < 7;$i++)
return sprintf "%02d/%02d/%04d",$month,$day,$year+1900;
}
When we run it we see:
[set] month : 9
It is supposed to be the number of the current month (10), but actually
it is not. We have spotted a bug, since the only code that sets the
$month variable consists of a call to localtime(). So did we find
a bug in Perl? let's look at the manpage of the localtime() function:
% perldoc -f localtime Converts a time as returned by the time function to a 9-element array with the time analyzed for the local time zone. Typically used as follows:
# 0 1 2 3 4 5 6 7 8
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
localtime(time);
All array elements are numeric, and come straight out of a struct tm. In particular this means that C<$mon> has the range C<0..11> and C<$wday> has the range C<0..6> with Sunday as day C<0>. Also, C<$year> is the number of years since 1900, that is, C<$year> is C<123> in year 2023, and I<not> simply the last two digits of the year. If you assume it is, then you create non-Y2K-compliant programs--and you wouldn't want to do that, would you? [more info snipped]
Which reveals to us that if we want to count months from 1 to 12 and
not 0 to 11 we are supposed to increment the value of $month. Among
other interesting facts about locatime() we also see an explanation of
$year, which as I've mentioned before is set to the number of years
since 1900.
We have found the bug in our code and learned new things about localtime(). To correct the above code we just increment the month after we call localtime():
my ($day,$month,$year) = (localtime)[3..5];
$month++;
Sometimes you need to peek into complex data structures, and trying to
print them out can be tricky. That's where Data::Dumper comes to
our rescue. For example if we create this complex data structure:
$data =
{
array => [qw(a b c d)],
hash => {
foo => "oof",
bar => "rab",
},
};
How do we print it out? Very easily:
use Data::Dumper; print Dumper \$data;
What we get is a pretty-printed $data:
$VAR1 = \{
'hash' => {
'foo' => 'oof',
'bar' => 'rab'
},
'array' => [
'a',
'b',
'c',
'd'
]
};
While writing this example I made a mistake and wrote qw(a b c d)
instead of [qw(a b c d)]. When I pretty-printed the contents of
$data I immediately saw my mistake:
$VAR1 = \{
'b' => 'c',
'd' => 'hash',
'HASH(0x80cd79c)' => undef,
'array' => 'a'
};
That's not what I wanted of course, but I spotted the bug and corrected it, as you saw in the original example from above.
Of course you can use
print STDERR $variable;
or:
warn $variable;
instead of print to have all the debug messages in the error_log, which makes it even easier to debug your code.
Don't strive for elegant, clever code. Try to develop a good coding style by writing code which is concise yet easy to understand. It's much easier to find bugs in concise, simple code. And such code tends to have less bugs.
The 'one week ago' example from the previous section is not concise. There is a lot of redundancy in it, and as a result it is harder to debug than it needs to be. Here is a condensed version of the main loop. As you can see, this version won't make it easier to understand the code:
for (0..6) {
next if --$day;
$year--,$month=12 unless --$month;
$day = $month != 1
? $month_len[$month-1]
: (($year % 4 or $year % 100 == 0) and $year % 400 )
? 28
: 29;
}
Don't do that at home :)
Why did I present this version? Because it is too obscure, which makes it difficult to understand and maintain. On the other hand a part of this code is easier to understand.
Larry Wall, the author of Perl, is a linguist. He tried to define the syntax of Perl in a way that makes working in Perl much like working in English. So it can be a good idea to learn Perl coding idioms, some of which might seem odd at first but once you get used to them, you will find it difficult to understand how you could have lived without them before. I'll show just a few of the most common Perl coding idioms.
It's a good idea to write code which is more readable but which avoids redundancy, so it's better to write:
unless ($i) {...}
rather than:
if ($i == 0) {...}
if you want to test for trueness only.
Use a much more concise, Perlish style:
for my $j (0..6) {...}
instead of the syntax used in some other languages:
for (my $j=0; $j<=6; $j++) {...}
It's much simpler to write and comprehend code like this:
print "something" if $debug;
than this:
if($debug){
print "something";
}
A good style that improves understanding, readability and reduces the chances of having a bug is shown below in the form of yet another rewrite of our `one week ago' code:
for (0..6) {
$day--;
next if $day;
$month--;
unless ($month){
$year--;
$month=12
}
if($month == 1){
$day = (($year % 4 or $year % 100 == 0) and $year % 400 )
? 28 : 29;
} else {
$day = $month_len[$month-1];
}
}
which is a happy medium between the excessively verbose style of the first version and very obscure second version.
And of course a two liner, which is much faster and easier to understand is:
sub date_a_week_ago{
my ($day,$month,$year) = (localtime(time-604800))[3..5];
return sprintf "%02d/%02d/%04d",$month+1,$day,$year+1900;
}
Just take the current date in seconds since epoch as time() returns, subtract a week in seconds (7*24*60*60 = 604800) and feed the result to localtime() - voila we've got the date of one week ago!
Why is the last version important, when the first one works just fine? Not because of performance issues (although this last one is twice as fast as the first), but because there are more ways to put a bug in the first version than there are in the last one.
As we saw earlier, it's almost always possible to debug code with the help of print(). However, it is impossible to anticipate all the possible directions which a program might take, and difficult to know what code to suspect when trouble occurs. In addition, inline debugging code tends to add bloat and degrade the performance of an application, although most applications offer inline debugging as a compile time option to avoid these hits. In any case, this information tends to only be useful to the programmer who added the print statements in the first place.
Sometimes you have to debug tens of thousands lines of Perl in an application, and while you may be a very experienced Perl programmer who can understand Perl code quite well by just looking at it, no mere mortal can even begin to understand what will actually happen in such a large application, until the code is running. So you just don't know where to start adding your trusty print() statements to see what is happening inside.
The most effective way to track down a bug is to run the program inside an interactive debugger. The majority of programming languages have such a tool available, allowing one to see what is happening inside an application while it is running. The basic features of an interactive debugger allow you to:
Stop at a certain point in the code, based on a routine name or source file and line number
Stop at a certain point in the code, based on conditions such as the value of a given variable
Perform an action without stopping, based on the criteria above
View and modify the value of variables at any given point
Provide context information such as stack traces and source windows
It does take practice to learn the most effective ways of using an interactive debugger, but the time and effort will be paid back many-fold in the long run.
Most C and C++ programmers are familiar with the interactive GNU
debugger (gdb). gdb is a stand-alone program that requires your
code to be compiled with debugging symbols to be useful. While gdb
can be used to debug the Perl interpreter itself, it cannot be used to
debug your Perl scripts.
Not to worry, Perl provides its own interactive debugger, called
perldb. Giving control of your Perl program to the interactive
debugger is simply a matter of specifying the -d command line
switch. When this switch is used, Perl inserts debugging hooks into
the program syntax tree, but