OK, this topic is discussed far too often; let's try to put it to rest. Many times I was looking for answers to my questions, and a moment before going to scream for help and take up other people's time, I remembered that while reading my favorite newsgroups and mailing lists I kept seeing people ask the same questions over and over. Nice folks who answered the same questions a few times eventually went crazy and refused to do it again. Take a minute to type the magic words www.dejanews.com, and believe me, 90% of your questions have already been answered in the past.
So do yourself a favor: use www.dejanews.com first, and if you fail to find an answer, then come and ask your question and we will gladly help you out.
Another nice page to read is Why Questions go Unanswered? (http://www.PLOVER.com/~mjd/perl/Questions.html) by Mark-Jason Dominus.
File locking? A must-understand technique when you write a CGI script (in any language). Don't cry if your DB or some file gets corrupted or broken because you never heard about file locking! Before you go any further, learn how to protect yourself from these annoying disasters.
How can it happen? Very simple: you wrote a guestbook, which works by reading the file, modifying it and writing it back. Now imagine that two users, delighted to submit their appreciation of your site, happen to press the submit button at the same time. As a result two instances of your guestbook script start to run; both read the original file, modify it and write it back. What happens next? There are a few possibilities:
There are a few other scenarios to describe, but this one is the most important. So how do you solve it? Easy, if you have a working flock(3): use flock (100% clean) or emulate it (not 100% clean). Let's rewrite the guestbook:
OK, here are two solutions to file locking: one with flock(3), and another with an external lock file, which can fail sometimes (I warned you!).
Notes:
# specify the filename to open
my $file = "filename";
open FH, "+<$file"
or die "Cannot open $file: $!";
flock FH, 2; # wait for exclusive lock
## from here to the close FH, we have critical region
## be sure to minimize this time
## get current content
seek FH, 0, 0; # rewind to beginning
my @content = <FH>; # get current content
<Change your @content lines here>
## write the new content and release the file
seek FH, 0, 0;     # rewind again
truncate FH, 0;    # empty the file
print FH @content; # write the new content
close FH;          # note that close releases the flock as well!
open FH, "+<$file"
or die "Cannot open $file: $!";
flock FH, 2; # wait for exclusive lock
seek FH, 0, 2; # end of file
print FH "Anything you want to append";
close FH;

If you have the flock(3) function working, use the method described in Locking with flock and modifying the file by opening it only once.
If not, you need to create a lockfile when you enter a critical section and remove it when you leave. You also have to handle the case where your CGI gets killed in the middle of the critical section, so the lockfile never gets removed; another instance of the program should then do the cleanup (but a wise one!). Here is the code. It is not very good, since the test and set operations are not performed atomically; flock(3) does both at once.
sub get_file_lock
{
local ($lock_file) = @_;
local ($timeout);
$timeout=20; # in seconds
while (-e $lock_file &&
(stat($lock_file))[9]+$timeout>time)
{ sleep(1);}
open LOCK_FILE, ">$lock_file"
or die "Can't open $lock_file: $!";
}
sub release_file_lock
{
local ($lock_file) = @_;
close(LOCK_FILE);
unlink($lock_file);
}

So how do you use it? Like flock(), but you need to pass a lockfile name to the function!
Important: it should be a static name like /tmp/myapplication.lock. If you think about $$.lock ($$ is the process number) or something similar, you break the concept and your code: two programs will easily overwrite each other's changes, since each tests for a different file and each succeeds, so you will fail to lock!
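Putting it together, here is a typical call sequence around a critical section (the two subs repeated from above, with the quoting tidied up; the lockfile path is just an example):

```perl
sub get_file_lock {
    my ($lock_file) = @_;
    my $timeout = 20;   # seconds; lets us steal stale locks
    while (-e $lock_file
           && (stat($lock_file))[9] + $timeout > time) {
        sleep 1;
    }
    open LOCK_FILE, ">$lock_file"
        or die "Can't open $lock_file: $!";
}

sub release_file_lock {
    my ($lock_file) = @_;
    close LOCK_FILE;
    unlink $lock_file;
}

my $lock = "/tmp/guestbook.lock";   # static name!
get_file_lock($lock);
# ... read, modify and write the guestbook file here ...
release_file_lock($lock);
```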
Where can this code fail? In the test-and-write sequence. There is a chance (you decide whether you can afford it) that two programs will try to get the lock file at the same time (let's name them A and B). Now imagine this:
while (-e $lock_file) { sleep 1 }
open LOCK_FILE, ">$lock_file"
or die "Can't open $lock_file: $!";

How do you fix that? How about writing the PID (process ID), which is unique for any concurrently running process, into the lockfile? It still doesn't work, since you still don't have an atomic test-and-write operation, and interrupting the process after the test but before the write brings back the scenario above. You can think of many workarounds; as far as I know, none of them works in all cases: all of them have their breaking points. So it's better to get an OS that supports flock(3).
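One workaround that does give you an atomic test-and-set on a local filesystem is sysopen() with the O_EXCL flag: the existence test and the creation happen in a single system call, so the window between the test and the write disappears. A minimal sketch (the lockfile name and helper name are my examples; note that O_EXCL is known not to be reliable over NFS):

```perl
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# O_EXCL makes the open fail if the file already exists,
# so the test and the creation are one atomic operation
sub try_file_lock {
    my ($lock_file) = @_;
    return sysopen(my $fh, $lock_file, O_WRONLY | O_CREAT | O_EXCL);
}

my $lock = "/tmp/myapplication.lock";
if (try_file_lock($lock)) {
    # ... critical section ...
    unlink $lock;   # release the lock
} else {
    # someone else holds the lock; sleep and retry
}
```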
The following module was written by Tom Christiansen. This technique should be much better than the one described above, and as he says, it would be even better still if the links were replaced with directories (but it still leaves some issues unsolved):
Date: Fri, 29 May 1998 06:28:42 -0500
From: Tom Christiansen <tchrist@jhereg.perl.com>
Well, I developed this, but then decided it would be
more portable if it used mkdir.
--tom
package File::NetLock;
use Exporter;
@ISA = qw(Exporter);
@EXPORT = qw(nflock nunflock);
use Cwd;
use Sys::Hostname;
use File::Basename;
use File::stat;
use strict;
sub nflock($;$) {
my $pathname = shift;
my $naptime = shift || 5;
my $lockname = name2lock($pathname);
while (1) {
link($pathname, $lockname); # XXX: ignore return
last if stat($pathname)->nlink == 2;
sleep $naptime;
}
print "lock granted on $lockname\n";
}
sub nunflock($) {
my $pathname = shift;
my $lockname = name2lock($pathname);
print "lock dissolved on $lockname\n";
unlink($lockname); # XXX: ignore return
}
sub name2lock {
my $pathname = shift;
my $dir = dirname($pathname);
my $file = basename($pathname);
$dir = getcwd() if $dir eq '.';
my $hostname = hostname();
my $lockname = "$dir/$file.lock-$hostname.$$";
warn $lockname;
return $lockname;
}
1;
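The heart of the module is the link()/nlink test, which is atomic even over NFS. Here is a self-contained sketch of just that trick, outside the module (file names are examples):

```perl
use File::stat;   # makes stat() return an object

my $file = "/tmp/nflock_demo.$$";
my $lock = "$file.lock";

# the file to be locked must already exist
open FH, ">$file" or die "Can't create $file: $!";
close FH;

# link() either succeeds completely or fails completely;
# if the file's link count is 2 afterwards, we own the lock
link($file, $lock);
my $got_lock = (stat($file)->nlink == 2);
print "lock granted\n" if $got_lock;
# ... critical section ...
unlink $lock;   # release the lock
unlink $file;   # clean up the demo file
```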
Here are pointers to other good resources covering file locking issues:
Consider this little script, grep.cgi, which receives a string from the user, greps some file for it and returns the matching lines:
#!/usr/bin/perl -w
use CGI;
my $q = new CGI;
my $pat = $q->param('pattern_to_grep');
# or whatever code you use to extract
# the form field
print $q->header,
$q->start_html('test'),
`grep $pat somefile`;

Everything is OK until some user tries to grep for:
stupid man ; rm -rf *
Your CGI script will be happy to execute:

grep stupid man; rm -rf * somefile

First it greps for 'stupid' in the file 'man', then rm recursively removes everything under the directory your script runs from; and if your script happens to run as root, rm -rf /* will delete everything!
You can argue that you can check the input and not let this happen, but you never know how smart the cracker probing your scripts for holes will be. To be on the safe side, always use perl's -T option or the Safe.pm module. For information about Safe.pm read its man page; here I will discuss the -T option and data untainting. When you change this script to start as
#!/usr/bin/perl -wT
and run it you will get an error message:
Insecure dependency in `` while running with -T switch
It means that you have to untaint $pat first. You can do it easily with a regexp. Basically:
$pat =~ /(.*)/; $pat = $1;
will shut up the warning about the insecure dependency. But this is not what you want (this is just how you fool perl's -T switch). We want to untaint the data and let only valid characters through.
$pat =~ /([\w-]*)/; $pat = $1;
Now you can safely call `grep $pat somefile`. If you know of other characters you want to allow, add them to the class, e.g. [\w.-] to allow '.' as well (keep '-' first or last in the class so it isn't treated as a range).
You also have to set $ENV{PATH} = "/bin:/usr/bin"; at the beginning of your script.
For more information please refer to the perlsec man page (run perldoc perlsec)
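Putting the pieces together, the untainting part of a fixed grep.cgi might look like this (a sketch: the helper name is my invention, and the character class is the one from above extended with '.'):

```perl
use strict;

# under -T, tainted data must be laundered through a regexp
# capture before it may reach the shell
$ENV{PATH} = "/bin:/usr/bin";   # -T also requires a clean PATH

sub untaint_pattern {
    my ($raw) = @_;
    # keep only word characters, '.' and '-'
    $raw =~ /([\w.-]*)/;
    return $1;
}

my $pat = untaint_pattern('stupid man ; rm -rf *');
print "$pat\n";   # only "stupid" survives
# now `grep $pat somefile` is safe to call
```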
It is known that you can't validate all email addresses. But many of them you can.
Use the CPAN modules: Email::Valid (http://search.cpan.org/search?dist=Email-Valid) or Email::Valid::Loose (http://search.cpan.org/search?dist=Email-Valid-Loose)
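A short example of how Email::Valid is typically used (assuming the module is installed; by default address() checks the syntax only, not whether the mailbox actually exists):

```perl
use Email::Valid;

# address() returns the address on success, undef on failure
my $checked = Email::Valid->address('someone@example.com');
print $checked ? "looks valid\n" : "bad address\n";
```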
OK, I want to call program b.pl from a.pl as
% b.pl $who $where
where variables $who $where are:
$who="<a href=\"a.html\">It's Me</a>";
$where="<a href=\"b.html\">There</a>";
You say it's not a problem, but I say it is, because of the [It's]: ' is a special shell character, used to separate parameters like 'a b' 'c d', where a b is the first parameter and c d the second. So what if I need to pass a parameter like e'f?
So here we go. The classic Bourne-shell idiom is to close the quote, emit an escaped ' and reopen the quote:

$who =~ s/'/'\\''/g;
$where =~ s/'/'\\''/g;
system(" b.pl '$who' '$where' ");

Inside single quotes the shell treats everything else literally, so once the embedded ' characters are handled, characters like $, " and ! can do no harm. (If you wrap the arguments in double quotes instead, you would also have to escape ", $ and ` yourself; the single-quote approach is simpler.)
It will work for calling other programs as well...
But you can also save yourself all the hassle by using the String::ShellQuote (http://www.argon.org/~roderick/) perl module (by Roderick Schertler).
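By the way, if you don't actually need a shell in between, the safest route of all is the list form of system(), where no quoting is needed because no shell ever parses the arguments. A sketch (the perl one-liner stands in for b.pl and just reports how many arguments it received):

```perl
my $who   = q{<a href="a.html">It's Me</a>};
my $where = q{<a href="b.html">There</a>};

# each list element becomes exactly one argument of the
# called program, so ', " and $ arrive intact
my $rc = system('perl', '-e', 'exit scalar @ARGV', $who, $where);
my $nargs = $rc >> 8;   # the one-liner's exit status
print "the script received $nargs arguments\n";
```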
During fast and dirty URL building you can end up with '//' where you meant '/'. How do you fix that? Here is the fast and dirty fix:
@urls=qw(
http://www.a.com/file/in.html
http://www.a.com//big-dash-dash/in.html
http://www.a.com/file.with.dot//in.html
http://www.a.com//file-dash.dot//in.html
);
foreach (@urls){
s#([\-\.\w]+)//#$1/#g;
print $_,"\n";
}

or

foreach (@urls){
# collapse any run of // into a single /, except
# right after the scheme's ':' (so http:// is left alone)
s#(?<!:)//+#/#g;
print $_,"\n";
}

You end up with 4 valid URLs:
Benchmark: timing 50000 iterations of try_a, try_b
try_a: 29 secs (26.31 usr 0.00 sys = 26.31 cpu)
try_b:  8 secs ( 5.95 usr 0.00 sys =  5.95 cpu)
4 times faster!
While we are already talking about efficiency: my beloved map {} construct is slower than foreach. How much slower?
Benchmark: timing 50000 iterations of try_b, try_c
try_b:  8 secs ( 5.95 usr 0.00 sys =  5.95 cpu)
try_c: 12 secs (10.35 usr 0.00 sys = 10.35 cpu)
Almost twice as slow! What a pity :)
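Numbers like the above come from the standard Benchmark module. Here is a minimal sketch of how such a comparison can be run; the two subs are stand-ins (a capture-based substitution versus a lookbehind-based one), and your timings will differ:

```perl
use Benchmark qw(timethese);

my @urls = (
    'http://www.a.com//big-dash-dash/in.html',
    'http://www.a.com/file.with.dot//in.html',
);

# each sub works on a fresh copy so every iteration does work
timethese(50_000, {
    try_a => sub { my @u = @urls; s#([\w.-]+)//#$1/#g for @u },
    try_b => sub { my @u = @urls; s#(?<!:)//+#/#g     for @u },
});
```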
Need to remove the nasty ^M characters when you parse user-entered data?
$data = 'This is multiline^M
input data^M
Yup^M.';
$data =~ s/\cM//g; # now it's clean
# g stands for global (remove all of them);
# \cM matches the carriage return character (^M)

A snippet to automatically %-encode URLs (Contributed by Ryan Alyn Porter)
If you try to feed the following URL into your browser's
location window:
http://www.you.com/cgi-bin/add?category=Computers and Internet
The CGI will receive only category=Computers; the rest will be ignored. Here is how to solve this problem:
$strURL = 'http://www.you.com/cgi-bin/add?category=Computers and Internet';
$strURL =~ s/(\W)/sprintf("%%%02x", ord($1))/eg;
print $strURL;
http%3a%2f%2fwww%2eyou%2ecom%2fcgi%2dbin
%2fadd%3fcategory%3dComputers%20and%20Internet
(split to make it fit)
Now you can submit this encoded URL to any search engine. Another technique is to use the URI::Escape module:
use URI::Escape;
$strURL = uri_escape($strURL ,
'\x00-\x20"#\@%;<>{}|\\\\^~`\[\]\x7F-\xFF' );

I have just learned a nice snippet to use for CGI debugging. Put it at the beginning of your code, or even better in a BEGIN{} block. Then any time your script calls die() or warn(), all the STDERR output (the warnings and the errors) will be printed to your browser's window. So from now on there is (almost :-) no need to dig through the error log files.
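Note that both snippets escape the whole string, scheme and all, which is what you want when the complete URL is itself a parameter value. If you only need to make a query string legal, escape just the value; uri_escape() with its default character set handles the space for you (URL and field name taken from the example above):

```perl
use URI::Escape;

# escape only the value, not the whole URL,
# so the scheme and path stay intact
my $category = 'Computers and Internet';
my $strURL = 'http://www.you.com/cgi-bin/add?category='
           . uri_escape($category);
print "$strURL\n";
```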
## death handler, presumes no output yet
$SIG{__DIE__} = $SIG{__WARN__} = sub {
my $error = shift;
chomp $error;
$error =~ s/[<&>]/"&#" . ord($&) . ";"/ge;
print "Content-type: text/html\n\n[$error]\n";
exit 0;
};
Let's assume this code line: open FILE, $file or die "Can't open $file:$!\n";
If the script fails to open the file, the message you passed to die() will show up in your browser's window.
I learned about this wonderful technique from Web Techniques (http://www.stonehenge.com/merlyn/WebTechniques/) Column 9 by Randal L. Schwartz.
Now I have extended it to provide more information, and it knows when to print the HTML code and when not to (when I debug from the shell):
$SIG{"__DIE__"} = sub {
my $error = shift;
chomp $error;
# this will handle the case when script is being run from
# shell, so we don't want to see all the HTML but only the
# error message
print($error), return unless defined $ENV{GATEWAY_INTERFACE};
$error =~ s/\n/<BR>\n/gs; # render \n as <BR> for HTML
# other parameters
my $cgi_url = $ENV{SCRIPT_NAME} || '';
my $referer = $ENV{HTTP_REFERER} || '';
my $remote_user = $ENV{REMOTE_USER} || '';
# OK, report the error to the browser
# in HTML format
print "Content-type: text/html\n\n";
print qq{
<HTML>
<HEAD>
<TITLE>Error Occurred!</TITLE>
</HEAD>
<BODY BGCOLOR="white">
$PROJECT_HEADER
<CENTER><H1>An Error Occurred!</H1>
</CENTER>
<P><HR>
<CENTER><B>The errors are:</B>
</CENTER>
<P>
$error
<P><HR>
<CENTER><B> Additional info:</B>
</CENTER><P>
<DT>CGI PATH - $0
<DT>CGI URL - $cgi_url
<DT> HTTP_REFERER - $referer
<DT> REMOTE_USER - $remote_user
<P><B><HR>
Please report it to
<a
href="mailto:you\@yourserver.com">
webadmin
</a></B>
<P>Thanks!
</BODY>
</HTML>
};
exit 0;
};

Of course you don't have to set $SIG{__WARN__}, since the code above stops the execution and exits. You might want to write a separate signal handler for warn() calls, with no exit() call in it.
If you aren't root and you try to install perl5 modules on your machine, you will be disappointed to find that you can't, just as you can't install many other things into the system (non-user) area. So what you are going to do is install things locally.
To install perl5 modules into a local directory, instead of the usual:
perl Makefile.PL
make
make test
make install
use:

perl Makefile.PL PREFIX=/path_to_your_home \
INSTALLPRIVLIB=/path_to_your_home/lib/perl5 \
INSTALLSCRIPT=/path_to_your_home/bin \
INSTALLSITELIB=/path_to_your_home/lib/perl5/site_perl \
INSTALLBIN=/path_to_your_home/bin \
INSTALLMAN1DIR=/path_to_your_home/man \
INSTALLMAN3DIR=/path_to_your_home/man3
make
make test
make install
Where /path_to_your_home can be something like /home/your_name.
Note that in newer versions of MakeMaker, running:
perl Makefile.PL PREFIX=/path_to_your_home
would be sufficient, so you might want to try this first.
You have to create the /path_to_your_home/lib/perl5 directory prior
to running the commands above:
mkdir -p /path_to_your_home/lib/perl5
To save yourself some typing, create a file (e.g. ~/.perl_dirs) and put there:
PREFIX=/path_to_your_home \
INSTALLPRIVLIB=/path_to_your_home/lib/perl5 \
INSTALLSCRIPT=/path_to_your_home/bin \
INSTALLSITELIB=/path_to_your_home/lib/perl5/site_perl \
INSTALLBIN=/path_to_your_home/bin \
INSTALLMAN1DIR=/path_to_your_home/man \
INSTALLMAN3DIR=/path_to_your_home/man3

and now any time you want to install some modules do:
perl Makefile.PL `cat ~/.perl_dirs`
make
make test
make install

This way you can also have a few different local module installations, for example one for a production perl and another for development, so you call either:
% perl Makefile.PL `cat ~/.perl_dirs.production`

or

% perl Makefile.PL `cat ~/.perl_dirs.develop`
Basically, when you install perl into /usr/lib/perl, modules get installed into these 4 directories (note that these are the directories for an RS/6000 machine running AIX and perl 5.00502; the aix and 5.00502 parts will differ if you have a different OS or perl version).
These 4 directories are already preset in your @INC so you shouldn't worry about them.
Now if you install perl5 modules locally, let's say into /home/stas/lib, you will have to append these new directories to the @INC array. In most cases you add only 2 directories:
The architecture-specific directories are searched by perl automatically (almost; see the notes below).
Each time you want to use modules in that path you should add to your scripts:
use lib qw(/home/stas/lib/perl5/5.00502
/home/stas/lib/perl5/site_perl/5.005);

You don't have to put it into a BEGIN block; the lib.pm module takes care of doing its work at BEGIN time. It also adds the architecture-specific directories.
You can also use:
BEGIN { unshift @INC, qw(/home/stas/lib/perl5/5.00502
/home/stas/lib/perl5/5.00502/aix
/home/stas/lib/perl5/site_perl/5.005
/home/stas/lib/perl5/site_perl/5.005/aix);
}

but use lib seems cleaner, and unshift @INC doesn't add the architecture-specific directories to @INC, so we have to add them explicitly.
OK, you have installed module A in /home/stas/lib. Now you want to install a module B that requires module A to be installed already. You know that you have installed module A, but amazingly B can't locate it. You wonder why? Because when you try to install module B, it doesn't know that you have module A installed in a different place; it searches only the basic directories defined by default in @INC, and your local dirs aren't listed there.
The solution is simple: the PERL5LIB environment variable does the same job as 'use lib' does in your script. So if you use csh/tcsh do:
% setenv PERL5LIB /home/stas/lib/perl5/5.00502: /home/stas/lib/perl5/5.00502/aix: /home/stas/lib/perl5/site_perl/5.005: /home/stas/lib/perl5/site_perl/5.005/aix
(Must be a single argument! I've split it so it would fit in width.)
at your shell prompt. Notice that a colon separates the directories. Check your favorite shell's man page for how to set environment variables if you use a shell other than csh/tcsh. Put this setenv statement into .login or another file that gets sourced each time you log into your account, and you won't have to remember to set it again and again.
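For Bourne-flavored shells (sh, bash, ksh) the equivalent of the setenv line is (same directories, still one colon-separated value, built up in steps only for width):

```shell
# same four directories as in the csh/tcsh setenv above
PERL5LIB=/home/stas/lib/perl5/5.00502:/home/stas/lib/perl5/5.00502/aix
PERL5LIB=$PERL5LIB:/home/stas/lib/perl5/site_perl/5.005
PERL5LIB=$PERL5LIB:/home/stas/lib/perl5/site_perl/5.005/aix
export PERL5LIB
```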
To make downloading/installing/upgrading easier and faster, consider using the wonderful CPAN.pm module (see www.perl.com/CPAN/). If you have perl5.004 or higher, it comes bundled with the distribution; otherwise download it from CPAN.
When you install it for the first time, it asks you a few questions; one of them is the PREFIX dir as above (so you can define a different PREFIX dir if you want local installations).
After configuration is over, you run it as
% perl -MCPAN -e shell
> install CGI

will fetch the latest CGI module, unpack it, build it, test it and install it into your local area or any other directory you told it to use (as defined by the PREFIX dir).

> i /CGI/

will return the list of modules that match that pattern.
It has more functionality, like checking for newer versions of installed modules; just run perldoc CPAN to read the man page. It's a real time saver!
Note: All 7.* sections were adapted by iserver.com (http://new.iserver.com/support/contrib/perl5/modules.html) for their support pages; you might want to check out a cleaner version of this quick and dirty doc there.